We need help suggesting, preparing and maintaining a set of "core" datasets as Data Packages. Note that:
- We package data rather than create it – our focus is to take source data and ensure it is of high quality and in a standard form
- We preserve a clean separation between the data source, the data package and this registry – for example, data packages are stored in git repos hosted separately (preferably on GitHub)
The list of datasets shortlisted for "core" is kept as a list of GitHub issues:
Official list of shortlisted "core" datasets
Many of the shortlisted datasets need a "Packager" – someone to help tidy up the data and turn it into a Data Package (see instructions below). If you are interested in helping prepare a dataset, just comment on the issue and we'll assign it to you.
Suggest a Dataset to Shortlist
To propose a dataset for addition, open an issue in the Registry with the details of the proposed dataset.
Preparing and Submitting a Dataset
The key steps are:
Preparing a Dataset
All datasets MUST be provided in source form as "data packages" and, if tabular, SHOULD be in Tabular Data Package format. We also recommend storing them in a git repo on GitHub.
Read the Publishing Data Packages guide to find out more.
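To make the target format more concrete, here is a minimal sketch of how a `datapackage.json` descriptor for a tabular dataset might be generated. The dataset name, file path, field names and licence identifier are hypothetical placeholders, and the property names (`profile`, `licenses`, `resources`, `schema`) reflect our reading of the Data Package specification; check the guide above for the authoritative layout.

```python
import json


def build_descriptor(name, title, csv_path, fields, license_name):
    """Return a minimal Tabular Data Package descriptor as a plain dict.

    `fields` is a list of (field_name, field_type) pairs describing the
    columns of the CSV file at `csv_path`.
    """
    return {
        "name": name,
        "title": title,
        "profile": "tabular-data-package",
        "licenses": [{"name": license_name}],
        "resources": [
            {
                "name": name,
                "path": csv_path,
                "profile": "tabular-data-resource",
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# All values below are hypothetical and only illustrate the shape.
descriptor = build_descriptor(
    name="example-indicator",
    title="Example Indicator",
    csv_path="data/example-indicator.csv",
    fields=[("year", "year"), ("value", "number")],
    license_name="ODC-PDDL-1.0",
)

# Serialise the descriptor; in a real repo this text would be saved
# as datapackage.json at the repository root.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Keeping the descriptor next to the CSV files in the same git repo means the data and its description are versioned together.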
For a dataset to be designated as "core" it should meet the following criteria:
- Quality - the dataset must be well structured
- Relevance and importance - the focus at present is on indicators and reference data
- Ongoing support - it should have a maintainer
- Openness - data should be open data and openly licensed in accordance with the Open Definition
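Some of the criteria above can be partially pre-checked from a package descriptor before a human review. The sketch below is only an illustration, not an official validator: the function name is hypothetical, the mapping from criteria to descriptor properties (`resources`, `schema`, `contributors`, `licenses`) is our assumption, and relevance cannot be judged automatically at all.

```python
def check_core_criteria(descriptor):
    """Return a list of human-readable problems found in a descriptor dict.

    An empty list means no problems were detected by these rough checks;
    it does not mean the dataset qualifies as "core".
    """
    problems = []

    # Quality (rough proxy): every resource should declare a schema.
    resources = descriptor.get("resources", [])
    if not resources:
        problems.append("quality: no resources listed")
    for resource in resources:
        if not resource.get("schema", {}).get("fields"):
            label = resource.get("name", "?")
            problems.append(f"quality: resource {label!r} has no schema fields")

    # Ongoing support (assumption): a contributor with role "maintainer".
    roles = {c.get("role") for c in descriptor.get("contributors", [])}
    if "maintainer" not in roles:
        problems.append("ongoing support: no contributor with role 'maintainer'")

    # Openness (rough proxy): at least one licence is declared.
    if not descriptor.get("licenses"):
        problems.append("openness: no licenses declared")

    return problems


# Hypothetical descriptors used only to exercise the checks.
good = {
    "resources": [{"name": "data", "schema": {"fields": [{"name": "year"}]}}],
    "contributors": [{"title": "Jane Doe", "role": "maintainer"}],
    "licenses": [{"name": "ODC-PDDL-1.0"}],
}
bad = {"resources": [{"name": "data"}]}

print(check_core_criteria(good))
print(check_core_criteria(bad))
```

A declared licence still has to be checked by hand against the Open Definition, so a clean result here is a starting point for review rather than a verdict.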