Showing posts with label Open Data. Show all posts

German train monitor provides access to train delay data

The German newspaper Süddeutsche Zeitung (SZ) worked together with OpenDataCity to create Zugmonitor, an online train monitor for the German rail network. This is another great example of the new form of data journalism.

The project provides access to train delay data collected over 150 days between 2 October 2011 and 1 March 2012, and lets you analyse the delays in more detail.

Here is an example showing the delays by station.

This SZ article (in German) gives you an overview of the data and how to access it. I believe the most convenient way to query the data is via Google Fusion Tables, which lets you import the data into R with the read.csv function. The file name to use is a URL mixed with a little bit of SQL syntax.

Here is an example extracting the station data from the map above (Fusion table 3166152):
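A minimal sketch of such a call. The base URL is my assumption of the legacy Fusion Tables query endpoint and the helper name fusion_url is hypothetical; neither is taken from the SZ article, and the endpoint may no longer respond.

```r
# Build the query URL for a Fusion table: a base URL plus a small SQL statement.
# ASSUMPTION: the endpoint below is the legacy Fusion Tables query URL;
# it is not given in the text and may no longer respond.
fusion_url <- function(table_id,
                       base = "https://www.google.com/fusiontables/api/query") {
  sql <- paste("SELECT * FROM", table_id)
  # URLencode replaces the spaces in the SQL statement with %20
  paste0(base, "?sql=", URLencode(sql))
}

# Table ID of the delays-by-station data
station_url <- fusion_url("3166152")
print(station_url)

# read.csv accepts a URL in place of a file name.
# Not run here, as it needs network access:
# stations <- read.csv(station_url, stringsAsFactors = FALSE)
# head(stations)
```

The table ID is passed as a string so that it is pasted into the SQL statement unchanged.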

The other sources can be accessed in the same way:

Delay                               Fusion table ID
by station                          3166152
between stations (all trains)       3166064
between stations (ICE trains only)  3166328
by country                          3166042
by cause                            3165200
by daytime                          3164289
by train type                       3165124
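The IDs listed above can be kept in a named vector, so that every query URL is built the same way. Again this is only a sketch: the base URL is my assumption of the legacy Fusion Tables query endpoint, and the vector names are my own labels.

```r
# Fusion table IDs as listed above; the names are my own labels
table_ids <- c(
  by_station           = "3166152",
  between_stations_all = "3166064",
  between_stations_ice = "3166328",
  by_country           = "3166042",
  by_cause             = "3165200",
  by_daytime           = "3164289",
  by_train_type        = "3165124"
)

# ASSUMPTION: legacy query endpoint, not given in the text
base_url <- "https://www.google.com/fusiontables/api/query"

# One URL-encoded "SELECT * FROM <id>" query per table
queries <- vapply(paste("SELECT * FROM", table_ids), URLencode,
                  character(1), USE.NAMES = FALSE)
query_urls <- paste0(base_url, "?sql=", queries)
names(query_urls) <- names(table_ids)

# Not run here, as it needs network access:
# delays <- lapply(query_urls, read.csv, stringsAsFactors = FALSE)
# str(delays$by_cause)
```

With the lapply call, delays would be a named list holding one data frame per table.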

I am curious what people will make of the data. Apparently more data will be made available in the future. I will keep an eye on the project page.


Data is the new gold

We need more data journalism. How else will we find the nuggets of data and information worth reading?

Life should become easier for data journalists, as the Guardian, one of the data journalism pioneers, points out in this article about the new open data initiative of the European Union (EU). The aims of the EU's open data strategy are bold. Data is seen as the new gold of the digital age. The EU estimates that public data already generates economic value of €32bn each year, with the potential to grow to €70bn if more data is made available. Here is the link to the press statement, which I highly recommend reading:

EUROPA - Press Releases - Neelie Kroes Vice-President of the European Commission responsible for the Digital Agenda, Data is the new gold, Opening Remarks, Press Conference on Open Data Strategy Brussels, 12th December 2011



I am particularly impressed that the EU even aims to harmonise the way data will be published by the various bodies. We know that working with data, open or proprietary, often means spending a lot of time cleaning, reshaping and transforming it in order to join it with other sources and make sense of it.

Data standards would really help in this respect, and the EU is pushing for them as well. I can already observe this in the insurance industry, where new European regulatory requirements (Solvency II) force companies to increase their data management capabilities. This is often a huge investment and has to be seen as a long-term project.

Although the press statement doesn't mention anything about open source software projects, I think that they are essential for unfolding the full potential of open data.

Open source projects like R provide a platform to share new ideas. I'd say that R, and equally other languages, provides an interface between minds and hands. Packages, libraries, etc. make it possible to spread ideas and knowledge. Having access to scientific papers is great, but being able to test the ideas in practice shortens the time it takes to embed new developments from academia into the business world.