
Consider moving metadata to the Grant object and removing package metadata in a future version #414

@mrshll1001

I think we should consider dropping the package metadata, and moving as much of it as possible to the Grant object. Doing so has a number of benefits, both technical and practical. This does have some drawbacks in the short term for 1.x publishers, but has long-term benefits as we look towards a MAJOR upgrade.

The discussion at the 2025-06 SC meeting raised a point about version numbers being used in the package metadata to identify which version of the Standard a file is using. This is done with the version field, which was added in version 1.1. As a result, version is an optional field.

The version field is currently used by 23 publishers (list available upon request) across 45 files. These figures come from looking at the datastore's copies of the JSON data this morning (2025-06-19T09:35:41+01:00).

I think moving this field, as well as other appropriate metadata, to the Grant object has the following benefits:

  • Enables CSV publishers to publish metadata without updating our tool infrastructure. Metadata is just added as columns to the grants file.
  • Makes it easier to distinguish metadata about the file vs metadata about the grant (see metadata - whats the difference between "title" and "distribution:title" ? #254).
  • Makes it easier to publish large grants datasets as JSON Lines in the future, which has a number of technical benefits for data use compared to the current package format (although the two are not mutually exclusive). I think this specific discussion is slightly out of scope for this issue, but it's worth noting that moving metadata to the grant objects sets the Standard up for this in the future.
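
To illustrate the JSON Lines point, here is a minimal Python sketch. It assumes grants are plain dicts that already carry their own metadata; the function names, ids, and field values are made up for illustration and are not part of the Standard.

```python
import json


def to_json_lines(grants):
    """Serialise grants as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(g, sort_keys=True) for g in grants) + "\n"


def read_json_lines(text):
    """Parse grants one line at a time; there is no enclosing package to read first."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


# Hypothetical grants that carry their own metadata (values are illustrative).
grants = [
    {"id": "example-001", "title": "Community garden", "version": "1.1"},
    {"id": "example-002", "title": "Youth club", "version": "1.1"},
]

text = to_json_lines(grants)
round_tripped = list(read_json_lines(text))
```

Because each line is self-describing, a data user can stream or split the file without first loading a package wrapper to find out which version applies.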

Some drawbacks:

  • Repeated information for tabular publishers. Metadata fields are repeated across grants.
  • If the SC approved a proposal to remove the package metadata, this would be a MAJOR change. We could implement grant-level metadata in a MINOR upgrade and issue a deprecation notice for the package metadata, but this might cause some confusion while the fields exist in two places, and it would require careful documentation and guidance.
  • Publishers might be unclear whether they need to update the version number of historic grants or not, resulting in a grants file that contains multiple different versions.
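
The mixed-version drawback is at least easy to detect. A rough sketch, assuming each grant is a dict with an optional version key (the helper name, ids, and version values are hypothetical):

```python
from collections import Counter


def version_counts(grants):
    """Count how many grants declare each version; None means undeclared."""
    return Counter(grant.get("version") for grant in grants)


# Illustrative data only: a file where historic grants were not updated.
grants = [
    {"id": "example-001", "version": "1.1"},
    {"id": "example-002", "version": "1.1"},
    {"id": "example-003", "version": "1.2"},  # hypothetical later version
    {"id": "example-004"},                    # no version declared
]

counts = version_counts(grants)
is_mixed = len(counts) > 1  # more than one distinct value, including "undeclared"
```

A check like this could sit in validation tooling and warn publishers, rather than treating a mixed file as an error.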

Of the existing fields in the package schema, I think the following would be most useful moved over:

  • version – explained above and discussed in the SC. Allows grants to declare the version of the standard they're using.
  • extensions – each grant would then declare its own list of extensions. This means it's easier to create JSON files where grants all use different extensions.
  • publisher – This ties the publisher to each grant specifically. This makes it easier to redistribute data as each grant is cited properly. This means each grant would have a funder and a publisher, which might be different. If we moved this to the grant schema, we could also make it use the Organization definition to standardise it (at the moment it's its own custom thing inside the package schema).
  • license – see above. This makes it easier to distribute packages of grants from multiple sources with different licenses. It also covers the (theoretical) possibility that certain publishers might have to publish different grants under different licenses.
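
As a rough idea of what migrating an existing file could look like, here is a Python sketch that copies these four fields from the package onto each grant. It assumes the package is a dict holding its grants under a "grants" key; that key, the function name, and the sample values are assumptions for illustration only.

```python
import copy

# Package-level fields proposed above for moving to the Grant object.
MOVED_FIELDS = ("version", "extensions", "publisher", "license")


def push_metadata_to_grants(package):
    """Return new grant dicts, each carrying a copy of the package metadata.

    A value already present on a grant is left untouched, and fields missing
    from the package are simply skipped. The input package is not modified.
    """
    grants = []
    for grant in package.get("grants", []):
        grant = copy.deepcopy(grant)
        for field in MOVED_FIELDS:
            if field in package and field not in grant:
                grant[field] = copy.deepcopy(package[field])
        grants.append(grant)
    return grants


# Illustrative package; the shapes of publisher and license are assumptions.
package = {
    "title": "Example grants",  # not copied: it would clash with Grant/title
    "version": "1.1",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "publisher": {"name": "Example Funder"},
    "grants": [
        {"id": "example-001", "title": "Community garden"},
        {"id": "example-002", "title": "Youth club", "version": "1.2"},  # hypothetical
    ],
}

grants = push_metadata_to_grants(package)
```

Leaving grant-level values in place is one possible design choice: it lets a grant that already declares its own version or license override the package default during a transition period.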

The other fields in the package schema are:

  • title – this would conflict with the Grant/title field. I don't think it's necessary to bring this across to grant level.
  • description – this would conflict with the Grant/description field. I don't think it's necessary to bring this across to grant level.
  • issued – This is the date the package was issued. I don't think this is necessary to bring across, as Grant/dateModified could be used to track the latest modified date of the grant record, and this wouldn't add much except if people had a specific use case of analysing when a grant was published vs its updated date.
  • modified – This is the date the package was last modified. This is handled at grant level already with Grant/dateModified, so not necessary to bring this across.
  • identifier – This is a string identifier for the data package. The job this does for the package is already accomplished by Grant/id, and wouldn't need bringing over. I'm not certain many use cases depend on being able to track which package a grant was published in?
  • downloadURL – this might not be a good idea to move to the grant object wholesale as most grants are not identified by URLs in published files; the files are. Possibly relates to JSON-LD / schema.org integration? #352
  • accessURL – Same as above, although it could be refactored into a URL to access the containing file, named something like distributedIn. This way, individual grants could link to their original containing file, which might be nice in cases where grants data is repackaged/distributed.

Of course, if our goal is simply to increase uptake of the version field then this might not help us reach that goal in the short term and we can consider other options – but I think this is still a worthwhile discussion to have re how the standard's model and packaging system work. It's worth noting that OCDS have deprecated most of their package metadata, but have kept version and extensions.

We should decide what works best for our publishers and data users.
