Accessing and plotting World Bank data with R

Over the past couple of days I have been playing with the World Bank's data sets, and I have to admit that I am blown away. It is amazing to see what is available on their web site, and their Data Visualisation Tools page is well worth a visit. Better still, they provide an API to their data, which they have used to build a pretty cool iPhone app: you can have the world's data in your pocket.

In this post I will show how to access World Bank data from R. As an example we will create a motion chart in the Hans Rosling style, as found on the Google Public Data Explorer site, which also uses World Bank data. Reproducing that chart should give us confidence that we understand the World Bank's interface. The example is available as the demo WorldBank in the googleVis package from version 0.2.10 onwards.

So let's try to replicate the initial plot of the Google Public Data Explorer, which shows fertility rate against life expectancy for each country from 1960 to today. Countries are represented as bubbles, with the size reflecting the population and the colour the region.

Duncan Temple Lang provides examples for accessing the World Bank's data using his RJSONIO and RCurl packages. The World Bank makes its data available via the API as either XML or JSON. We will use JSON, as the fromJSON function of the RJSONIO package makes it straightforward to read a JSON data set into R and transform it into a data frame. To query the database we have to know which indicator variable we want and what its key is. Thankfully, the World Bank provides a page which lists all indicator variables. Clicking on any of them reveals the indicator key in the URL. For our example we get the following mappings:

Indicator                       Key
fertility rate                  SP.DYN.TFRT.IN
life expectancy                 SP.DYN.LE00.IN
population                      SP.POP.TOTL
GDP per capita (current US$)    NY.GDP.PCAP.CD
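
As a small illustration of how these keys end up in a query, the sketch below builds one URL per indicator. The helper name buildWorldBankURL and the column labels are my own choices for this example; the URL layout is simply the one used in the listing further down, and no request is sent.

```r
## Indicator keys from the table above, named by the column label
## we would like to use later (the labels are our own choice).
indicators <- c(fertility.rate  = "SP.DYN.TFRT.IN",
                life.expectancy = "SP.DYN.LE00.IN",
                population      = "SP.POP.TOTL",
                GDP.per.capita  = "NY.GDP.PCAP.CD")

## Hypothetical helper: paste together the query URL for one key.
## Nothing is downloaded here, we only build the string.
buildWorldBankURL <- function(id, date="1960:2010", per.page=12000){
  paste("http://api.worldbank.org/countries/all/indicators/", id,
        "?date=", date, "&format=json&per_page=", per.page, sep="")
}

## One URL per indicator; sapply keeps the names of 'indicators'
urls <- sapply(indicators, buildWorldBankURL)
print(urls[["population"]])
```

Keeping the keys in a named vector means the same names can later be reused as column names for the downloaded values.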

That's about it. From Duncan we have learned how to create the URL string to query the database, and how to transform the query result from JSON into a data frame. The rest is re-arranging the data and combining the various data sets into the final table, which we display as a motion chart using the gvisMotionChart function of the googleVis package. You will find the detailed R code below.
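
Before the full listing, here is a minimal offline sketch of the step that turns the parsed JSON into a data frame. The list below is written by hand as a stand-in for what fromJSON(url)[[2]] returns; its exact shape is an assumption based on the fields the listing accesses, the values are made up, and the helper name recordsToDataFrame is hypothetical.

```r
## Hand-written stand-in for the parsed query result: one list entry
## per country-year record. Note the missing value in the second record.
records <- list(
  list(country=c(id="AA", value="Country A"), value="2.5", date="2009"),
  list(country=c(id="AA", value="Country A"), value=NULL,  date="2010"),
  list(country=c(id="BB", value="Country B"), value="1.4", date="2010")
)

## Hypothetical helper: flatten the records into a data frame and
## rename the value column, replacing missing values by NA.
recordsToDataFrame <- function(records, value="value"){
  df <- data.frame(
    year = as.numeric(sapply(records, "[[", "date")),
    value = as.numeric(sapply(records, function(x)
      if(is.null(x[["value"]])) NA else x[["value"]])),
    country.name = sapply(records, function(x) x[["country"]][["value"]]),
    country.id = sapply(records, function(x) x[["country"]][["id"]]),
    stringsAsFactors = FALSE
  )
  names(df)[2] <- value
  df
}

fertility <- recordsToDataFrame(records, value="fertility.rate")
print(fertility)
```

The explicit NULL check matters because a record without an observation would otherwise be dropped by sapply, leaving the columns with different lengths.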


## This demo shows how country level data can be accessed from
## the World Bank via their API and displayed with a Motion Chart.
## Inspired by Google's Public Data Explorer, see
## http://www.google.com/publicdata/home
##
## For the World Bank Data terms of use see:
## http://data.worldbank.org/summary-terms-of-use
##
## To run this demo an internet connection and Flash are required.
## This demo is part of the googleVis R package.
##
## Markus Gesmann, 24 September 2011
## Distributed under GPL 2 or later
 
 
getWorldBankData <- function(id='SP.POP.TOTL', date='1960:2010',
                             value="value", per.page=12000){ 
  require(RJSONIO)
  url <- paste("http://api.worldbank.org/countries/all/indicators/", id,
               "?date=", date, "&format=json&per_page=", per.page,
               sep="")
 
  wbData <- fromJSON(url)[[2]]
 
  wbData <- data.frame(
    year = as.numeric(sapply(wbData, "[[", "date")),
    value = as.numeric(sapply(wbData, function(x)
      ifelse(is.null(x[["value"]]), NA, x[["value"]]))),
    country.name = sapply(wbData, function(x) x[["country"]]['value']),
    country.id = sapply(wbData, function(x) x[["country"]]['id'])
    )
 
  names(wbData)[2] <- value
 
  return(wbData)
}
 
getWorldBankCountries <- function(){
  require(RJSONIO)
  wbCountries <-
    fromJSON("http://api.worldbank.org/countries?per_page=12000&format=json") 
  wbCountries <- data.frame(t(sapply(wbCountries[[2]], unlist)))
  ## Convert via character so that factor columns yield the actual
  ## coordinates rather than the internal factor level codes
  wbCountries$longitude <- as.numeric(as.character(wbCountries$longitude))
  wbCountries$latitude <- as.numeric(as.character(wbCountries$latitude))
  wbCountries$region.value <- gsub(" \\(all income levels\\)", "",
                                   wbCountries$region.value)
  return(wbCountries)
}
 
## Create a string 1960:this year, e.g. 1960:2011
years <- paste("1960:", format(Sys.Date(), "%Y"), sep="")
 
## Fertility rate
fertility.rate <- getWorldBankData(id='SP.DYN.TFRT.IN',
                                   date=years, value="fertility.rate")
 
## Life Expectancy
life.exp <- getWorldBankData(id='SP.DYN.LE00.IN',  date=years,
                             value="life.expectancy") 
 
## Population
population <- getWorldBankData(id='SP.POP.TOTL',  date=years,
                               value="population")
 
## GDP per capita (current US$)
GDP.per.capita <- getWorldBankData(id='NY.GDP.PCAP.CD',
                                   date=years,
                                   value="GDP.per.capita.Current.USD") 
 
## Merge data sets
wbData <- merge(life.exp, fertility.rate)
wbData <- merge(wbData, population)
wbData <- merge(wbData, GDP.per.capita)
 
## Get country mappings
wbCountries <- getWorldBankCountries()
 
## Add regional information
wbData <- merge(wbData, wbCountries[c("iso2Code", "region.value", 
                                      "incomeLevel.value")],
                by.x="country.id", by.y="iso2Code")
 
## Filter out the aggregates and country id column
subData <- subset(wbData, !region.value %in% "Aggregates",
                  select=-country.id)
 
## Create a motion chart (gvisMotionChart is provided by googleVis)
library(googleVis)
M <- gvisMotionChart(subData, idvar="country.name", timevar="year",
                     options=list(width=700, height=600))
 
## Display the chart in your browser
plot(M)
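
The merge steps in the listing rely on the default behaviour of merge(), which joins on all columns the two data frames share. The toy example below, with made-up values and placeholder country codes, sketches that behaviour and the subsequent filtering of aggregates without needing an internet connection.

```r
## Toy data frames shaped like the output of getWorldBankData():
## shared key columns plus one value column each (values made up).
keys <- data.frame(year=c(2009, 2010, 2010, 2010),
                   country.name=c("Country A", "Country A",
                                  "Country B", "Aggregate X"),
                   country.id=c("AA", "AA", "BB", "XX"),
                   stringsAsFactors=FALSE)
life.exp <- cbind(keys, life.expectancy=c(70.1, 70.4, 80.2, 75.0))
fertility.rate <- cbind(keys, fertility.rate=c(2.5, 2.4, 1.4, 2.0))

## merge() without 'by' joins on the columns both inputs have in common
print(intersect(names(life.exp), names(fertility.rate)))
merged <- merge(life.exp, fertility.rate)

## Made-up country lookup, keyed by a differently named id column
countries <- data.frame(iso2Code=c("AA", "BB", "XX"),
                        region.value=c("Region 1", "Region 2", "Aggregates"),
                        stringsAsFactors=FALSE)
merged <- merge(merged, countries, by.x="country.id", by.y="iso2Code")

## Drop the aggregate rows and the id column, as in the listing
subToy <- subset(merged, !region.value %in% "Aggregates",
                 select=-country.id)
print(subToy)
```

Because every data set carries the same year, country.name and country.id columns, no explicit by argument is needed until the country lookup, where the key columns have different names.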


Addition: We could simplify the code by using the WDI package by Vincent Arel-Bundock, as Diego points out in his comment below.
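
A rough sketch of what that could look like follows. It is not part of the original demo: the wrapper name getWDIData is hypothetical, and I am assuming that the WDI function accepts a named indicator vector for renaming columns and that extra=TRUE adds a region column in which aggregates are labelled "Aggregates". The download only runs in an interactive session with the WDI package installed.

```r
## Indicator keys, named by the column labels we want in the result
indicators <- c(fertility.rate  = "SP.DYN.TFRT.IN",
                life.expectancy = "SP.DYN.LE00.IN",
                population      = "SP.POP.TOTL",
                GDP.per.capita  = "NY.GDP.PCAP.CD")

## Hypothetical wrapper around WDI::WDI: one call fetches all
## indicators, already merged by country and year.
getWDIData <- function(indicators, start=1960, end=2010){
  wdiData <- WDI::WDI(country="all", indicator=indicators,
                      start=start, end=end, extra=TRUE)
  ## Assumption: aggregates carry the region label "Aggregates"
  subset(wdiData, !region %in% "Aggregates")
}

## Only attempt the download interactively and if WDI is available
if(interactive() && requireNamespace("WDI", quietly=TRUE)){
  wdiData <- getWDIData(indicators)
  str(wdiData)
}
```

The appeal of this route is that the URL construction, JSON parsing and merging are all handled by the package, so only the filtering and plotting remain.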

Comments:

Unknown said...

Nice post. You can also use the WDI package to access the World Bank's World Development Indicators

BOD said...

How long does this script take to run on your sys?

mages said...

The entire script, including the data download, took the following on a 1.6 GHz Intel Core i5:
   user  system elapsed
 22.468   0.656 125.799

Mittinty said...

Liked the plots and plotting. Good data visualization.

duccio said...

Great post!
I need a suggestion on how to apply this to JSON data like the following:
http://apistat.istat.it/?q=getdatajson&dataset=DCIS_POPSTRBIL&dim=1,0,0,0&lang=1&tr=&te=
Could you please help me?
I would appreciate it very much.
Thank you

Kratoklastes said...

Superb post, mage... the animation (stepping through the years) was awesome, and not just because it conformed to my prejudices by showing that as life expectancy (and GDP per capita) grows over time, fertility falls as a natural (and welcome) concomitant.
