An Open Knowledge Foundation Labs Project
This project is a community-driven effort from OKFN Labs – sign up now to get involved

Core Datasets as Data Packages
Important, commonly-used datasets in high quality, easy-to-use & open form

High Quality & Reliable

Sourcing, normalizing and quality checking a set of key reference and indicator datasets such as country codes, currencies, GDP and population.

Standard Form & Bulk Access

All datasets are provided in a standardized form and can be accessed in bulk as CSV, together with a simple JSON schema.
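As a rough illustration of how the CSV and its JSON schema work together, the sketch below uses the schema's declared field types to cast CSV values. The field names, sample rows and the `load_rows` helper are hypothetical, and the schema layout (`fields` with `name` and `type`) is an assumption based on the usual JSON Table Schema shape rather than something taken from a specific dataset in this registry.

```python
import csv
import io
import json

# Hypothetical schema and data, inlined here for illustration only.
SCHEMA_JSON = """
{"fields": [
  {"name": "code", "type": "string"},
  {"name": "year", "type": "integer"},
  {"name": "population", "type": "number"}
]}
"""
CSV_TEXT = "code,year,population\nGB,2010,62.3\nFR,2010,65.0\n"

# Map schema type names to Python casts; unknown types fall back to str.
CASTS = {"string": str, "integer": int, "number": float}


def load_rows(csv_text, schema):
    """Parse CSV text and cast each column using the schema's field types."""
    casts = {f["name"]: CASTS.get(f["type"], str) for f in schema["fields"]}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{key: casts[key](value) for key, value in row.items()} for row in reader]


rows = load_rows(CSV_TEXT, json.loads(SCHEMA_JSON))
```

In practice the CSV and schema would be read from the files in a dataset's repository rather than from inline strings.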

Versioned & Packaged

All data is in data packages and is versioned using git so all changes are visible and data can be collaboratively maintained.

Contribute

We need help suggesting, preparing and maintaining a set of "core" datasets as Data Packages. Note that:

  • We package data rather than create it – our focus is to take source data and ensure it is of high quality and in a standard form
  • We preserve a clean separation between the data source, the data package and this registry – for example, data packages are stored in git repos hosted separately (preferably on GitHub)

Shortlisted Datasets

The datasets shortlisted for "core" are tracked as a list of GitHub issues:

Official list of shortlisted "core" datasets

Many of the shortlisted datasets need a "Packager" – someone to help tidy up the data and turn it into a Data Package (see instructions below). If you are interested in helping prepare a dataset, just comment on the issue and we'll assign it to you.

Suggest a Dataset to Shortlist

To propose a dataset for addition, open an issue in the Registry with the details of the proposed dataset.

Preparing and Submitting a Dataset

The key steps are:

Preparing a Dataset

All datasets MUST be provided in source form as "data packages" and, if tabular, SHOULD be in Tabular Data Package format. We also recommend storing them in a git repo on GitHub.

Read the Publishing Data Packages guide to find out more.
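For orientation, here is a minimal sketch of what a descriptor for a tabular data package might look like, built as a Python dictionary and serialized to JSON. The dataset name, title, file path and the `make_descriptor` helper are hypothetical, and the keys shown (`name`, `title`, `resources`, `path`, `schema`) reflect one reading of the Data Package specification; the Publishing Data Packages guide remains the authority on required fields.

```python
import json


def make_descriptor(name, title, csv_path, fields):
    """Build a minimal descriptor dict for a single-CSV data package."""
    return {
        "name": name,
        "title": title,
        "resources": [
            # One resource per data file, with its schema alongside.
            {"path": csv_path, "schema": {"fields": fields}}
        ],
    }


# Hypothetical example values, not a real registry entry.
descriptor = make_descriptor(
    "example-country-codes",
    "Example country codes",
    "data/country-codes.csv",
    [
        {"name": "code", "type": "string"},
        {"name": "name", "type": "string"},
    ],
)

# The JSON text that would typically be saved as datapackage.json.
descriptor_json = json.dumps(descriptor, indent=2)
```

A real package would normally also carry licence and source information, which this sketch leaves out.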

Assessment Criteria

For a dataset to be designated as "core", it should meet the following criteria:

  • Quality - the dataset must be well structured
  • Relevance and importance - the focus at present is on indicators and reference data
  • Ongoing support - it should have a maintainer
  • Openness - data should be open data and openly licensed in accordance with the Open Definition