High-level overview of the components. A detailed overview is below.
Lightweight, RFC-style proposals and patterns focused on "Data Package" and associated formats.
v1.0b. We have a pretty good working version. To get to 1.0 requires some consolidation, but most of the work is in sub-profiles, e.g. JSON Table Schema
v1.0b. Just about to add primary key support etc
Tabular Profile for Data Package (tabular data)
v1.0b. Only issue is that JSON Table Schema is still being updated
Important for validation and checking implementations against spec
A profile / mini-spec for geo data specifically (similar to tabular data package spec)
Have several examples done
e.g. interactive doc for key fields etc
Priority is lower because we also have stuff under outreach ...
Tooling and Integration
Tooling and integration to make it easy to publish and use data (packages). Divided into:
- Core Tools - libraries and specific infrastructure such as a registry
- Integration - making it easy to share and use "packaged" data from standard tools
Library that supports:
- create (init) data packages
- view data packages
JS chosen for the core library because of the potential to use it both in the browser and in a classic code environment
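To make the create/view items above concrete, here is a minimal sketch in JS. This is illustrative only, not the library's actual API: function names and the descriptor shape (a `datapackage.json`-style object with a `name` and a list of `resources`) are assumptions.

```javascript
// Hypothetical sketch -- not the actual library API.
// A Data Package is described by a datapackage.json-style descriptor.

// Create a minimal descriptor for a package with CSV resources.
function createDataPackage(name, resourcePaths) {
  return {
    name: name,
    resources: resourcePaths.map(function (path) {
      return { path: path, format: 'csv' };
    })
  };
}

// "Viewing" a package, at its simplest, is summarizing the descriptor.
function viewDataPackage(descriptor) {
  return descriptor.name + ' (' + descriptor.resources.length + ' resource(s))';
}

var pkg = createDataPackage('gdp-uk', ['data/gdp.csv']);
console.log(JSON.stringify(pkg, null, 2));
console.log(viewDataPackage(pkg)); // "gdp-uk (1 resource(s))"
```

Because the descriptor is plain JSON, the same code runs unchanged in the browser and in Node, which is the point of choosing JS.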
Create data packages
Infer JSON Table Schema from a CSV
This is needed to do data package init / create effectively
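A rough sketch of what the inference step might look like. This is a simplification for illustration: it only guesses `integer`/`number`/`string` from one sample row, whereas real tooling would sample many rows, handle quoting, dates, booleans, and the full JSON Table Schema type set.

```javascript
// Illustrative type guess from a single string value (simplified).
function inferType(value) {
  if (/^-?\d+$/.test(value)) return 'integer';
  if (/^-?\d*\.\d+$/.test(value)) return 'number';
  return 'string';
}

// Infer a (simplified) JSON Table Schema from header + first data row.
function inferSchema(csvText) {
  var lines = csvText.trim().split('\n');
  var headers = lines[0].split(',');
  var sample = lines[1].split(',');
  return {
    fields: headers.map(function (name, i) {
      return { name: name, type: inferType(sample[i]) };
    })
  };
}

var csv = 'year,gdp\n2013,2678.3';
console.log(JSON.stringify(inferSchema(csv)));
// {"fields":[{"name":"year","type":"integer"},{"name":"gdp","type":"number"}]}
```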
Command line tool to
- [ ] create (init) data packages
- [ ] install
- [ ] publish
- [ ] view data packages
Online tool that supports (API and human user interface):
- create (init) data packages (including tabular packages, with support for interactive specification of field types etc)
Note: should ensure the API is accessible from the browser (CORS). This item has sub-items (see next items)
Online validation app for data packages (plus integration into dpm)
Validate data in the data package against the schema in the tabular data package
Online app for validating tabular data
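The core of the validation items above is checking each data row against the field types declared in the schema. A simplified sketch (the type checks and error format are illustrative; JSON Table Schema defines more types and constraints):

```javascript
// Simplified per-type checks on string cell values (illustrative only).
var checks = {
  integer: function (v) { return /^-?\d+$/.test(v); },
  number:  function (v) { return /^-?\d+(\.\d+)?$/.test(v); },
  string:  function (v) { return true; }
};

// Validate one CSV row (array of strings) against a schema's fields.
function validateRow(row, schema) {
  var errors = [];
  schema.fields.forEach(function (field, i) {
    if (!checks[field.type](row[i])) {
      errors.push('field "' + field.name + '": "' + row[i] +
                  '" is not a valid ' + field.type);
    }
  });
  return errors;
}

var schema = { fields: [{ name: 'year', type: 'integer' },
                        { name: 'gdp', type: 'number' }] };
console.log(validateRow(['2013', '2678.3'], schema)); // []
console.log(validateRow(['201x', 'abc'], schema));    // two errors
```

An online app would run this over every row of every tabular resource and report the accumulated errors.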
Build on basic library to provide full integration (publish, consume etc)
Publish and consume from Excel (via WebQueries; consuming, in the simplest sense, is quite straightforward)
Service or deployable software app which takes a Tabular Data Package and automatically generates a web service with a nice API
Can do this via other items e.g. via CKAN DataStore
Data Catalog as a single-file JS app
- List of data packages
- datacatalog.js - see existing code and the "Git and Github for Data" draft post
- host on s3 / file system / dropbox etc or similar ...
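The key idea is that the catalog itself can be nothing more than a static JSON list of data package URLs, which is what makes S3/Dropbox/file-system hosting work. A sketch of the listing core (catalog shape and rendering are illustrative, not datacatalog.js itself; the URLs reuse the data.okfn.org packages mentioned elsewhere in this document):

```javascript
// A catalog is just a static JSON file listing datapackage.json URLs.
var catalog = {
  packages: [
    'http://data.okfn.org/data/gdp-uk/datapackage.json',
    'http://data.okfn.org/data/house-prices-uk/datapackage.json'
  ]
};

// Render a plain-text listing; a real app would fetch each descriptor
// and show its title, description, resources etc.
function listPackages(catalog) {
  return catalog.packages.map(function (url, i) {
    return (i + 1) + '. ' + url;
  }).join('\n');
}

console.log(listPackages(catalog));
```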
Support for Dat and general work on syncing APIs
Documentation and outreach to engage, evangelize and build adoption around the concepts, standards and tooling
Reasonably complete ...
Project IRC channel. Located at #hyperdata
Example: House prices regressed on long interest rate and GDP (and population)
- [x] House price data - http://data.okfn.org/data/house-prices-uk
- [x] UK Long interest rate - http://data.okfn.org/data/bond-yields-uk-10y
- [x] UK GDP - http://data.okfn.org/data/gdp-uk
Existing: example merging 2 datasets to get deflated data. http://explorer.okfnlabs.org/#rgrp/e3e0b0f18dfe151f9f7e
High priority because this can drive everything else.
Related, Additional Activities
There are a variety of additional related activities we would like to see undertaken but which are not part of the primary roadmap. We list some of these here.
How to use github to store data packages
[Placeholder] Convert data between different formats
Tabular Stats tool
Given a Tabular Data Package, compute stats for each tabular resource (sum, avg etc)
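The per-resource stats are straightforward once rows are loaded; a sketch (rows are represented as plain objects, column name and figures are illustrative):

```javascript
// Compute sum / average / count for one column of a tabular resource.
function columnStats(rows, column) {
  var values = rows.map(function (row) { return row[column]; });
  var sum = values.reduce(function (a, b) { return a + b; }, 0);
  return { sum: sum, avg: sum / values.length, count: values.length };
}

var rows = [{ gdp: 2611 }, { gdp: 2678 }, { gdp: 2711 }];
console.log(columnStats(rows, 'gdp')); // sum: 8000, count: 3
```

The tool would run this over every numeric field declared in the resource's JSON Table Schema.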
Data visualizer from data packages online - "Data Explorer"
Pull together different data packages and visualize online