An Open Knowledge Foundation Labs Project
This project is a community-driven effort from OKFN Labs – sign up now to get involved

Core Datasets as Data Packages
Important, commonly-used datasets in high quality, easy-to-use & open form

High Quality & Reliable

We source, normalize and quality-check a set of key reference and indicator datasets such as country codes, currencies, GDP and population.

Standard Form & Bulk Access

All datasets are provided in a standardized form and can be accessed in bulk as CSV together with a simple JSON schema.

Versioned & Packaged

All data is in data packages and is versioned using git so all changes are visible and data can be collaboratively maintained.


We need help suggesting, preparing and maintaining a set of "core" datasets as Data Packages. Note that:

  • We package data rather than create it – our focus is to take source data and ensure it is of high quality and in a standard form
  • We preserve a clean separation between the data source, the data package and this registry – for example, data packages are stored in git repos hosted separately (preferably on GitHub)

Shortlisted Datasets

The list of datasets shortlisted for "core" is kept as a list of GitHub issues:

Official list of shortlisted "core" datasets

Many of the shortlisted datasets need a "Packager" – someone to help tidy up the data and turn it into a Data Package (see instructions below). If you are interested in helping to prepare a dataset, just comment on the issue and we'll assign it to you.

Suggest a Dataset to Shortlist

To propose a dataset for addition, open an issue in the Registry with the details of the proposed dataset.

Preparing and Submitting a Dataset

The key steps are:

Preparing a Dataset

All datasets MUST be provided in source form as "data packages" and, if tabular, SHOULD be in Tabular Data Package format. We also recommend storing them in a git repo on GitHub.

Read the Publishing Data Packages guide to find out more.
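
As a rough illustration of what a Tabular Data Package involves, here is a minimal Python sketch that builds a datapackage.json descriptor for one CSV file and checks that the CSV header matches the schema. The dataset name, file path, field names and both helper functions are hypothetical and chosen only for this example. The descriptor includes just the basic "name", "resources", "path" and "schema" properties, so treat it as a starting point rather than a complete or validated package.

```python
import csv
import io
import json

# Hypothetical sample data: the values and column names are illustrative only.
CSV_TEXT = "code,name\nGB,United Kingdom\nFR,France\n"


def build_descriptor(name, csv_path, fields):
    """Return a minimal Tabular Data Package descriptor as a dict.

    `fields` is a list of (field_name, field_type) pairs.
    """
    return {
        "name": name,
        "resources": [
            {
                "path": csv_path,
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


def header_matches_schema(csv_text, descriptor):
    """Check that the CSV header row equals the schema field names, in order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    schema = descriptor["resources"][0]["schema"]
    expected = [field["name"] for field in schema["fields"]]
    return header == expected


# Assumed package name and path, not taken from the registry.
descriptor = build_descriptor(
    "country-codes-example",
    "data/countries.csv",
    [("code", "string"), ("name", "string")],
)

# This string is what would be saved as datapackage.json next to the data.
datapackage_json = json.dumps(descriptor, indent=2)
```

In practice the descriptor would be committed to the git repo alongside the CSV file, and a check like header_matches_schema could be run before submitting the dataset.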

Assessment Criteria

For a dataset to be designated as "core" it should meet the following criteria:

  • Quality - the dataset must be well structured
  • Relevance and importance - the focus at present is on indicators and reference data
  • Ongoing support - it should have a maintainer
  • Openness - data should be open data and openly licensed in accordance with the Open Definition