Accessing and plotting World Bank data with R
25 Sept 2011
15:41
Google Public Data Explorer, googleVis, Motion Chart, R, RJSONIO, Tutorials, World Bank
Over the past couple of days I have been playing around with the World Bank's data sets, and I have to admit that I am blown away. It is amazing to see what is available on their web site, and their Data Visualisation Tools page is well worth a visit. Even better, they provide an API to their data. They have used it to build an iPhone app, which is pretty cool: you can have the world's data in your pocket.

In this post I will show how to access World Bank data from R. As an example we create a motion chart in the Hans Rosling style, as you find it on the Google Public Data Explorer site, which also uses data from the World Bank. Reproducing that chart should give us confidence that we understand the World Bank's interface. You can find this example as demo WorldBank in the googleVis package from version 0.2.10 onwards.

So let's try to replicate the initial plot of the Google Public Data Explorer, which shows fertility rate against life expectancy for each country from 1960 to today. The countries are represented as bubbles, with the size reflecting the population and the colour the region.
Duncan Temple Lang provides examples for accessing the World Bank's data with his RJSONIO and RCurl packages. The World Bank data is available via their API as either XML or JSON. We will use JSON, as it is straightforward to read a JSON data set into R with the fromJSON function of the RJSONIO package and then transform the result into a data frame. To query the database we have to know which indicator variable we want and what its key is. Thankfully, the World Bank provides a page which lists all indicator variables. Clicking on any of them reveals the indicator key in the URL. For our example we get the following mappings:

Indicator | Key |
fertility rate | SP.DYN.TFRT.IN |
life expectancy | SP.DYN.LE00.IN |
population | SP.POP.TOTL |
GDP per capita (current US$) | NY.GDP.PCAP.CD |
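As a minimal sketch of how an indicator key turns into a query, the snippet below stores the four mappings in a named character vector and builds one request URL per key. The helper name buildWorldBankUrl is hypothetical; the URL pattern is the one used in the demo code of this post, and nothing is downloaded here.

```r
## Indicator keys from the table above, stored as a named character vector
indicators <- c(
  fertility.rate  = "SP.DYN.TFRT.IN",
  life.expectancy = "SP.DYN.LE00.IN",
  population      = "SP.POP.TOTL",
  GDP.per.capita  = "NY.GDP.PCAP.CD"
)

## Hypothetical helper: build the query URL for one indicator key.
## The URL pattern follows the demo code in this post; no request is sent.
buildWorldBankUrl <- function(id, date = "1960:2010", per.page = 12000) {
  paste("http://api.worldbank.org/countries/all/indicators/", id,
        "?date=", date, "&format=json&per_page=", per.page, sep = "")
}

## One URL per indicator; the names of the vector are carried over
urls <- sapply(indicators, buildWorldBankUrl)
print(urls[["population"]])
```

Keeping the keys in a named vector means the names can later double as column names for the downloaded values.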
That's about it. From Duncan we have learned how to create the URL string to query the database and how to transform the query result from JSON into a data frame. The rest is re-arranging the data and combining the various data sets to get the final table. We display it as a motion chart using the gvisMotionChart function of the googleVis package. You find the detailed R code below.

## This demo shows how country level data can be accessed from
## the World Bank via their API and displayed with a Motion Chart.
## Inspired by Google's Public Data Explorer, see
## http://www.google.com/publicdata/home
##
## For the World Bank Data terms of use see:
## http://data.worldbank.org/summary-terms-of-use
##
## To run this demo an internet connection and Flash are required.
## This demo is part of the googleVis R package.
##
## Markus Gesmann, 24 September 2011
## Distributed under GPL 2 or later

getWorldBankData <- function(id='SP.POP.TOTL', date='1960:2010',
                             value="value", per.page=12000){
  require(RJSONIO)
  url <- paste("http://api.worldbank.org/countries/all/indicators/", id,
               "?date=", date, "&format=json&per_page=", per.page,
               sep="")

  ## The second list element holds the records; the first is paging metadata
  wbData <- fromJSON(url)[[2]]

  wbData <- data.frame(
    year = as.numeric(sapply(wbData, "[[", "date")),
    ## Missing observations arrive as NULL and are turned into NA
    value = as.numeric(sapply(wbData, function(x)
      ifelse(is.null(x[["value"]]), NA, x[["value"]]))),
    country.name = sapply(wbData, function(x) x[["country"]]['value']),
    country.id = sapply(wbData, function(x) x[["country"]]['id'])
  )

  ## Rename the value column to the name requested by the caller
  names(wbData)[2] <- value

  return(wbData)
}

getWorldBankCountries <- function(){
  require(RJSONIO)
  wbCountries <-
    fromJSON("http://api.worldbank.org/countries?per_page=12000&format=json")
  wbCountries <- data.frame(t(sapply(wbCountries[[2]], unlist)))

  ## as.character() first, so that factor columns are converted
  ## by label rather than by level code
  wbCountries$longitude <- as.numeric(as.character(wbCountries$longitude))
  wbCountries$latitude <- as.numeric(as.character(wbCountries$latitude))

  ## Strip the suffix from the region names; works for factor and
  ## character columns alike
  wbCountries$region.value <- gsub(" \\(all income levels\\)", "",
                                   wbCountries$region.value)
  return(wbCountries)
}

## Create a string 1960:this year, e.g. 1960:2011
years <- paste("1960:", format(Sys.Date(), "%Y"), sep="")

## Fertility rate
fertility.rate <- getWorldBankData(id='SP.DYN.TFRT.IN', date=years,
                                   value="fertility.rate")

## Life Expectancy
life.exp <- getWorldBankData(id='SP.DYN.LE00.IN', date=years,
                             value="life.expectancy")

## Population
population <- getWorldBankData(id='SP.POP.TOTL', date=years,
                               value="population")

## GDP per capita (current US$)
GDP.per.capita <- getWorldBankData(id='NY.GDP.PCAP.CD', date=years,
                                   value="GDP.per.capita.Current.USD")

## Merge data sets
wbData <- merge(life.exp, fertility.rate)
wbData <- merge(wbData, population)
wbData <- merge(wbData, GDP.per.capita)

## Get country mappings
wbCountries <- getWorldBankCountries()

## Add regional information
wbData <- merge(wbData, wbCountries[c("iso2Code", "region.value",
                                      "incomeLevel.value")],
                by.x="country.id", by.y="iso2Code")

## Filter out the aggregates and country id column
subData <- subset(wbData, !region.value %in% "Aggregates",
                  select= -country.id)

## Create a motion chart
library(googleVis)
M <- gvisMotionChart(subData, idvar="country.name", timevar="year",
                     options=list(width=700, height=600))

## Display the chart in your browser
plot(M)
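The flattening and merging steps in the code above can be tried offline. The sketch below uses a small hand-written list that only mimics the shape of the parsed records; the country names and numbers are invented for illustration and are not World Bank data, and the helper name toDataFrame is hypothetical.

```r
## Hand-written stand-ins for the list that fromJSON(url)[[2]] returns.
## All values are invented for illustration; they are not World Bank data.
fertilityRecords <- list(
  list(country = c(id = "AA", value = "Country A"), date = "2000", value = "2.5"),
  list(country = c(id = "AA", value = "Country A"), date = "2001", value = NULL),
  list(country = c(id = "BB", value = "Country B"), date = "2000", value = "1.8")
)
lifeRecords <- list(
  list(country = c(id = "AA", value = "Country A"), date = "2000", value = "70.1"),
  list(country = c(id = "BB", value = "Country B"), date = "2000", value = "75.3")
)

## Hypothetical helper: flatten the records into a data frame,
## mapping a NULL observation to NA
toDataFrame <- function(recs, value = "value") {
  df <- data.frame(
    year = as.numeric(sapply(recs, "[[", "date")),
    value = as.numeric(sapply(recs, function(x)
      if (is.null(x[["value"]])) NA else x[["value"]])),
    country.name = sapply(recs, function(x) x[["country"]][["value"]]),
    country.id = sapply(recs, function(x) x[["country"]][["id"]]),
    stringsAsFactors = FALSE
  )
  names(df)[2] <- value
  df
}

fertility.rate <- toDataFrame(fertilityRecords, value = "fertility.rate")
life.exp <- toDataFrame(lifeRecords, value = "life.expectancy")

## merge() joins on all shared columns: year, country.name, country.id
merged <- merge(life.exp, fertility.rate)
print(merged)
```

Without a by argument, merge() joins on every column name the two data frames share and, by default, keeps only rows present in both. That is why the record for 2001, which has no life expectancy counterpart in this toy input, drops out of the merged table.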
Addition: We could simplify the code by using the WDI package by Vincent Arel-Bundock, as Diego points out in his comment below.
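As a rough sketch of what the WDI route might look like, the wrapper below requests the same four indicators in one call. The function name getWdiData is hypothetical, and the filter assumes that the data returned with extra=TRUE has a region column in which aggregates are labelled "Aggregates"; check the output of your installed WDI version before relying on it.

```r
## Hypothetical wrapper around WDI::WDI(); nothing is downloaded
## until the function is actually called.
getWdiData <- function(indicators = c("SP.DYN.TFRT.IN", "SP.DYN.LE00.IN",
                                      "SP.POP.TOTL", "NY.GDP.PCAP.CD"),
                       start = 1960, end = 2010) {
  if (!requireNamespace("WDI", quietly = TRUE)) {
    stop("Please install the WDI package first")
  }
  ## extra = TRUE asks WDI to add country metadata such as the region
  wdiData <- WDI::WDI(country = "all", indicator = indicators,
                      start = start, end = end, extra = TRUE)
  ## Assumption: aggregates carry the label "Aggregates" in the region column
  wdiData[!wdiData$region %in% "Aggregates", ]
}

## Usage (requires an internet connection):
## wdiData <- getWdiData()
```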
7 comments :
Nice post. You can also use the WDI package to access the World Bank's World Development Indicators
How long does this script take to run on your sys?
The entire script, including the data download, took the following time on a 1.6GHz Intel Core i5:
   user  system elapsed
 22.468   0.656 125.799
Liked the plots and plotting. Good data visualization.
Great post!
I need a suggestion on how to apply this to JSON data like this:
http://apistat.istat.it/?q=getdatajson&dataset=DCIS_POPSTRBIL&dim=1,0,0,0&lang=1&tr=&te=
Could you please help me?
I'd appreciate it very much.
Thank you
Superb post, mage... the animation (stepping through the years) was awesome, and not just because it conformed to my prejudices by showing that as life expectancy (and GDP per capita) grows over time, fertility falls as a natural (and welcome) concomitant.
You people have actually provided the best blogs that are easy to understand for the folks.home insurance