Sankey diagrams with googleVis

Sankey diagrams are great for visualising flows from one set of data values to another. Although they are named after the Irish Captain Matthew Henry Phineas Riall Sankey, who used this type of diagram in 1898 to show the energy efficiency of a steam engine, the best-known Sankey diagram is probably Charles Minard's Map of Napoleon's Russian Campaign of 1812, which he actually produced in 1869.

Thomas Rahlf: Datendesign mit R

The above example from Thomas Rahlf's book Datendesign mit R shows that Minard's plot can be reproduced with base graphics in R. Aaron Berdanier posted the SankeyR function in 2010, and January Weiner published the riverplot package on CRAN, which allows users to create static Sankey charts as well.

Interactive Sankey diagrams can be generated with rCharts and now also with googleVis (version >= 0.5.0). For my first example I use UK visitor data from VisitBritain.org. The following diagram visualises the flow of visitors in 2012: where they came from and which parts of the UK they visited. This example already illustrates the key concept. I need a data frame with three columns that describe the flow of data from a source to a target and the strength or weight of each connection.
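
The three-column layout can be sketched as follows. The visitor numbers and the column names origin, visit and weight are made up for illustration and are not the VisitBritain.org figures; only the call to gvisSankey with its from, to and weight arguments reflects the googleVis interface.

```r
# Hypothetical flows: these numbers are invented for illustration and are
# NOT the VisitBritain.org data used for the chart in this post.
visits <- data.frame(
  origin = c("France", "France", "Germany", "Germany", "USA", "USA"),
  visit  = c("London", "Scotland", "London", "Wales", "London", "Scotland"),
  weight = c(5, 2, 4, 1, 6, 3),
  stringsAsFactors = FALSE
)

# gvisSankey() is told which columns hold the source, the target and the weight.
# The call is guarded so the data preparation runs even without googleVis.
if (requireNamespace("googleVis", quietly = TRUE)) {
  sk <- googleVis::gvisSankey(visits, from = "origin", to = "visit",
                              weight = "weight")
  # plot(sk) would open the interactive chart in the default browser
}
```

Each row is one link in the diagram, so a source that feeds several targets simply appears in several rows.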




My next example uses a graph data set that I visualise in the same way again, but here I start to play around with the various parameters of the Google API.
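
As a rough sketch of how such parameters can be passed, the Google chart options go into the options list, with nested settings written as a JavaScript object in a string. The edge list below is a hypothetical stand-in for the graph data set, and the colours and sizes are arbitrary choices, not the ones used in the chart above.

```r
# Hypothetical edge list standing in for the graph data set
edges <- data.frame(
  from   = c("A", "A", "B", "B", "C"),
  to     = c("X", "Y", "X", "Z", "Z"),
  weight = c(3, 1, 2, 4, 2),
  stringsAsFactors = FALSE
)

# Nested Google Charts options are passed as a JavaScript object string;
# the colours and sizes here are arbitrary illustrative values.
sankey_opts <- list(
  width  = 600,
  height = 400,
  sankey = paste0(
    "{link: {color: {fill: '#d799ae'}}, ",
    "node: {width: 4, color: {fill: '#a61d4c'}, ",
    "label: {fontSize: 12, color: '#871b47'}}}"
  )
)

if (requireNamespace("googleVis", quietly = TRUE)) {
  sk <- googleVis::gvisSankey(edges, from = "from", to = "to",
                              weight = "weight", options = sankey_opts)
  # plot(sk) would display the styled chart in the browser
}
```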




As stated by Google, the Sankey chart may be undergoing substantial revisions in future Google Charts releases.

For more information and installation instructions see the googleVis project site and Google documentation.

Session Info

R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets  methods  
[7] base     

other attached packages:
[1] googleVis_0.5.0-4 igraph_0.7.0     

loaded via a namespace (and not attached):
[1] RJSONIO_1.0-3 tools_3.0.3 


Reminder: Abstract submission for the 2014 'R in Insurance' conference will close this Friday

Don't forget, this is the final week you can submit an abstract for the second R in Insurance conference.
For more details see http://www.rininsurance.com and perhaps for inspiration review last year's programme.


Timeline charts with googleVis

Last year at the Google I/O conference Mitchell Foley presented new developments of the Google Chart Tools API, and one of the new features he mentioned was timeline charts (about 6 min into the talk).



Timeline charts are a great way of visualising different dates/events over time and are now also supported by googleVis from version 0.5.0 onwards (currently only available from GitHub). Here is an example, showing classroom allocation in the afternoon. The exact times and durations are given when you hover over the bars.



I can swap around the bar and row labels to show the rooms by languages:
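
The two classroom charts can be sketched along these lines. The rooms, languages and times are illustrative values of my own choosing; gvisTimeline takes the names of the row label, bar label, start and end columns.

```r
# Illustrative classroom allocation; rooms, languages and times are assumptions
rooms <- data.frame(
  Room     = c("Room 1", "Room 2", "Room 3"),
  Language = c("English", "German", "French"),
  start    = as.POSIXct(c("2014-03-14 14:00", "2014-03-14 15:00",
                          "2014-03-14 14:30"), tz = "UTC"),
  end      = as.POSIXct(c("2014-03-14 15:00", "2014-03-14 16:00",
                          "2014-03-14 15:30"), tz = "UTC"),
  stringsAsFactors = FALSE
)

if (requireNamespace("googleVis", quietly = TRUE)) {
  # One row per language, with the room printed on each bar
  by_language <- googleVis::gvisTimeline(rooms, rowlabel = "Language",
                                         barlabel = "Room",
                                         start = "start", end = "end")
  # Swapping the two labels gives one row per room instead
  by_room <- googleVis::gvisTimeline(rooms, rowlabel = "Room",
                                     barlabel = "Language",
                                     start = "start", end = "end")
  # plot(by_language); plot(by_room)
}
```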



Here is another example, inspired by Jason Bryer's timeline package, showing the US presidents and UK prime ministers during World War II. For gvisTimeline I have to remove the line breaks in Jason's data.
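
Removing the line breaks might look like the sketch below. The data frame is my own stand-in with hypothetical column names, not the data set shipped with Jason's package, and the dates are the terms of office rather than just the war years.

```r
# Stand-in data with line breaks in the names; the column names are
# hypothetical and this is not the data set from the timeline package.
ww2 <- data.frame(
  Person = c("Franklin D.\nRoosevelt", "Harry S.\nTruman",
             "Neville\nChamberlain", "Winston\nChurchill", "Clement\nAttlee"),
  Group  = c("US President", "US President",
             "UK Prime Minister", "UK Prime Minister", "UK Prime Minister"),
  StartDate = as.Date(c("1933-03-04", "1945-04-12",
                        "1937-05-28", "1940-05-10", "1945-07-26")),
  EndDate   = as.Date(c("1945-04-12", "1953-01-20",
                        "1940-05-10", "1945-07-26", "1951-10-26")),
  stringsAsFactors = FALSE
)

# Replace each line break with a space before handing the labels to the chart
ww2$Person <- gsub("\n", " ", ww2$Person, fixed = TRUE)

if (requireNamespace("googleVis", quietly = TRUE)) {
  tl <- googleVis::gvisTimeline(ww2, rowlabel = "Group", barlabel = "Person",
                                start = "StartDate", end = "EndDate")
  # plot(tl)
}
```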



And finally a more complex example from the Google Chart Tools API documentation showing the terms of the first US presidents with various options set to change the colours and fonts:
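
A sketch of what such a call could look like is given below. The option names (timeline, backgroundColor, colors) are Google chart options, but the particular colours, fonts and sizes are arbitrary choices of mine and not necessarily those of the original example.

```r
# Terms of the first three US presidents
pres <- data.frame(
  Position = rep("President", 3),
  Name     = c("Washington", "Adams", "Jefferson"),
  start    = as.Date(c("1789-04-30", "1797-03-04", "1801-03-04")),
  end      = as.Date(c("1797-03-04", "1801-03-04", "1809-03-04")),
  stringsAsFactors = FALSE
)

# Colours and fonts are illustrative; nested options are JavaScript strings
tl_opts <- list(
  timeline = paste0(
    "{rowLabelStyle: {fontName: 'Helvetica', fontSize: 14, color: '#603913'}, ",
    "barLabelStyle: {fontName: 'Garamond', fontSize: 12}}"
  ),
  backgroundColor = "#ffd",
  colors = "['#cbb69d', '#603913', '#c69c6e']",
  width = 600, height = 200
)

if (requireNamespace("googleVis", quietly = TRUE)) {
  tl <- googleVis::gvisTimeline(pres, rowlabel = "Position", barlabel = "Name",
                                start = "start", end = "end",
                                options = tl_opts)
  # plot(tl)
}
```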


googleVis code development moved to GitHub

After nearly 4 years of developing googleVis on Google Code with SVN we decided to move to GitHub. The main reason was that Google stopped the facility of hosting pre-CRAN builds of the package for user testing. The devtools package on the other hand makes it really easy to install packages from source hosted on GitHub. Additionally, we hope that GitHub will make collaboration with others more effective. Thus, bookmark http://github.com/mages/googleVis.

Screen shot of some of the new features in googleVis 0.5.0-1.

There are some exciting new features in the development version 0.5.0-1 of googleVis, reflecting the enhanced Google Chart Tools API:

New Features

  • New functions gvisSankey, gvisAnnotationChart, gvisHistogram, gvisCalendar and gvisTimeline to support the new Google charts of the same names (without 'gvis').
  • New demo Trendlines showing how trend-lines can be added to Scatter-, Bar-, Column-, and Line Charts.
  • New demo Roles showing how different column roles can be used in core charts to highlight data.
  • New vignettes written in R Markdown showcasing googleVis examples and how the package works with knitr.

Changes

  • The help files of gvis charts no longer show all their options; instead, a link to the online Google API documentation is given.
  • All googleVis output will be displayed in your default browser. In previous versions of googleVis output could also be displayed in the preview pane of RStudio. This feature is no longer available with the current version of RStudio, but is likely to be introduced again with the release of RStudio version 0.99 or higher.

I will post about the new features and changes in the coming weeks. Please feel free to test the development version already. Visit our GitHub project page for installation instructions and further details.

For the impatient (you will require R >= 3.0.2):
install.packages(c("devtools","RJSONIO", "knitr", "shiny", "httpuv"))
library(devtools)
install_github("mages/googleVis")


Review: Kölner R Meeting 26 February 2014

Last week's Cologne R user group meeting was all about R and databases. We had three talks, ranging from a generic overview of how to connect R to databases, to a specific example with kdb+, and perhaps a glimpse of the future with ArangoDB, a NoSQL database.

Connecting R with databases

Diego de Castillo's talk focused on the use of relational databases, such as PostgreSQL, SQLite and Oracle. For all these databases dedicated R drivers exist on CRAN that can be used in a generic way via the DBI package. This allows for a consistent approach to connect, query and return data to R. A popular alternative on Windows is to connect via the ODBC (Open Database Connectivity) API with RODBC, or via JDBC with RJDBC.
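
The connect, query and return pattern can be illustrated with a small sketch. It is not taken from Diego's slides; it assumes the DBI and RSQLite packages are installed and uses an in-memory SQLite database with the built-in mtcars data, so no database server is needed.

```r
library(DBI)

# Connect: the driver object is the only database-specific part
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Load some data so there is something to query
dbWriteTable(con, "cars", mtcars)

# Query: the result comes back as an ordinary data frame
res <- dbGetQuery(con,
  "SELECT cyl, COUNT(*) AS n FROM cars GROUP BY cyl ORDER BY cyl")

# Always close the connection when done
dbDisconnect(con)

print(res)
```

Swapping the driver in dbConnect for that of another database should leave the remaining calls unchanged, which is the point of the DBI layer.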


R and kdb+

Kim Kuen Tang gave an overview of kdb+, a proprietary database that appears to be popular for time series data. kdb+ comes with its own expressive query language, q. Kim demonstrated how he could analyse large amounts of stock market data stored in a kdb+ database using R and q, all via Sublime Text.

ArangoDB

Michael Hackstein and Claudius Weinberger introduced us to ArangoDB, a NoSQL (Not only SQL) database. ArangoDB is an open source document database. This means that data is stored as documents, which are similar to JavaScript objects, in so-called "collections". Their slides nicely presented the different concepts outside the traditional relational databases, such as key-value stores, document stores and graph data. Claudius mentioned that they had received several requests from users who wanted to connect R to ArangoDB. Although a native driver does not exist for R yet, ArangoDB can be accessed from R using the HTTP API via the packages bitops, RCurl and RJSONIO.
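
A minimal sketch of such an HTTP call is shown below. The helper names arango_url and arango_get are hypothetical, and the sketch assumes a local ArangoDB server on its default port 8529 with authentication switched off; it was not shown at the meeting.

```r
# Hypothetical helpers; assumes a local server on the default port 8529
# without authentication.
arango_url <- function(path, base = "http://localhost:8529") {
  paste0(base, path)
}

arango_get <- function(path, ...) {
  # RCurl fetches the raw JSON response, RJSONIO turns it into an R list
  RJSONIO::fromJSON(RCurl::getURL(arango_url(path, ...)))
}

# With a server running, this would return the server name and version:
# arango_get("/_api/version")
```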


Next Kölner R meeting

The next meeting is scheduled for 23 May 2014. This will be our 10th meeting, clearly something we need to celebrate!

Please get in touch if you would like to present and share your experience, or indeed if you have a request for a topic you would like to hear more about. For more details see also our Meetup page.

Thanks again to Bernd Weiß for hosting the event and Revolution Analytics for their sponsorship.
