Thursday, 26 April 2012

Installing R packages without admin rights on MS Windows

Photo: Markus Gesmann
Is there a life outside the office?

It is not unusual to lack admin rights in an IT-controlled office environment. But then again, the limitations set by the IT department can spark off some creativity, and I have to admit that I enjoy this kind of troubleshooting.

The other day I ended up in front of a Windows PC with R installed, but with a locked-down "C:\Program Files" folder. That meant that R couldn't install any packages into the default directory "C:\Program Files\R\R-X.Y.Z\library" (replace R-X.Y.Z with the version number of R installed).

Never mind, there is an option for that: the lib argument of the install.packages function. However, I then have to pass the same folder to library as well, via its lib.loc argument. Fair enough, yet it is more convenient to set the directory somewhere globally.
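A minimal sketch of this per-call approach, assuming a temporary folder as a stand-in for the personal library and the cloud.r-project.org CRAN mirror; the helper install_and_load is hypothetical and not part of R. It shows that the same folder has to be passed twice: as lib to install.packages and as lib.loc to library.

```r
# Stand-in for a personal library such as C:/Users/MyNAME/R;
# a temporary folder keeps the sketch runnable on any machine
my.lib <- file.path(tempdir(), "MyRLib")
dir.create(my.lib, showWarnings = FALSE, recursive = TRUE)

# Hypothetical helper: install a package into 'lib' and attach it from there.
# install.packages() takes the target folder as 'lib',
# library() expects the same folder as 'lib.loc'.
install_and_load <- function(pkg, lib = my.lib,
                             repos = "https://cloud.r-project.org") {
  install.packages(pkg, lib = lib, repos = repos)
  library(pkg, lib.loc = lib, character.only = TRUE)
}
```

Calling install_and_load("zoo") would then install zoo into my.lib and attach it from there (this needs internet access).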

First of all I decided that I wanted to install my R packages into C:\Users\MyNAME\R, a folder to which I had read/write access (replace MyNAME, or the whole path with what works for you). The R command .libPaths(c("C:\\Users\\MyNAME\\R", .libPaths())) will make this directory the default directory for any additional packages I may want to install, and as it is put at the front of my search path, library will find them first as well.
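A sketch of the .libPaths call, again with a temporary folder standing in for C:\Users\MyNAME\R so that it runs anywhere. Note that .libPaths only keeps folders that actually exist, so the folder is created first.

```r
# Stand-in for C:\Users\MyNAME\R: a temporary folder, so the sketch runs anywhere
my.lib <- file.path(tempdir(), "MyRLib")
# .libPaths() silently drops folders that do not exist, so create it first
dir.create(my.lib, showWarnings = FALSE, recursive = TRUE)

# Put the personal folder in front of the existing library paths
.libPaths(c(my.lib, .libPaths()))

# The first entry is the default target of install.packages()
# and the first place where library() and require() look
.libPaths()
```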

The next step is to enable R to execute the above command at start-up. For that I created the R file C:\Users\MyNAME\R\Rconfigure_default.R with the following content:

.libPaths(c("C:\\Users\\MyNAME\\R", .libPaths()))

Finally, I added a new shortcut to Rgui.exe on my desktop, with the target set to:

"C:\Program Files\R\R-X.Y.Z\bin\i386\Rgui.exe" R_PROFILE_USER="C:\Users\MyNAME\R\Rconfigure_default.R"

Job done. R will happily install any new packages locally and find them as well when I use library or require. For more information see also the R FAQs.

Friday, 20 April 2012

From the Guardian's data blog: Visualising risk

The Guardian published a nice summary and link collection from an interdisciplinary visualisation workshop, hosted by Microsoft and dedicated to visualising probability and risk. Check it out here.

OECD better life index
The links I found most interesting were those to the pages of Gregor Aisch and Moritz Stefaner. You may have come across their work in the past, as Moritz worked on the OECD better life index and Gregor contributed to the Where does my money go site.

Saturday, 14 April 2012

Sweeping through data in R

How do you apply one particular row of your data to all other rows?

Today I came across a data set which showed the revenue split by product and location. The data was formatted to show only the split by product for each location and the overall split by location, similar to the example in the table below.

Revenue by product and continent
Product   Africa   America   Asia   Australia   Europe
A            40%       30%    50%         40%      40%
B            20%       40%    20%         30%      40%
C            40%       30%    30%         30%      20%
Total        10%       40%    20%         10%      20%

I wanted to understand the revenue split by product and location. Hence, I have to multiply each product's share within a column by that continent's share of the total. In other words, I would like to take the total line and sweep it through my data. Of course there is a function in base R for that: it is called sweep. To my surprise, I can't remember ever having used sweep before. The help page for sweep states that it used to be based on apply, so maybe that's how I would have approached such tasks in the past.

Anyhow, the sweep function requires an array or matrix as an input and not a data frame. Thus let's store the above table in a matrix.

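# Row and column labels of the table
Product <- c("A", "B", "C", "Total")
Continent <- c("Africa", "America", "Asia", "Australia", "Europe")
# matrix() fills column by column, so every four values form one continent
values <- c(0.4, 0.2, 0.4, 0.1, 0.3, 0.4, 0.3, 0.4, 0.5, 0.2, 
            0.3, 0.2, 0.4, 0.3, 0.3, 0.1, 0.4, 0.4, 0.2, 0.2)

# Named dimnames label the two dimensions "Product" and "Continent"
M <- matrix(values, ncol=5, dimnames=list(Product=Product, Continent=Continent))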
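If the table arrives as a data frame instead, for example after reading a CSV file, as.matrix turns it into a numeric matrix that sweep accepts, as long as all columns are numeric. A sketch with a hypothetical data frame df typed in by hand:

```r
# Hypothetical data frame holding the same table, one column per continent
df <- data.frame(Africa    = c(0.4, 0.2, 0.4, 0.1),
                 America   = c(0.3, 0.4, 0.3, 0.4),
                 Asia      = c(0.5, 0.2, 0.3, 0.2),
                 Australia = c(0.4, 0.3, 0.3, 0.1),
                 Europe    = c(0.4, 0.4, 0.2, 0.2),
                 row.names = c("A", "B", "C", "Total"))

# All columns are numeric, so as.matrix() returns a numeric matrix
# that keeps the row and column names of the data frame
M.df <- as.matrix(df)
is.matrix(M.df)
```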

Now I can sweep through my data. The arguments for sweep are the data set itself (in my case the first three rows of my matrix), the margin (here 2, as I want to apply the calculation along the second dimension, i.e. the columns), the summary statistic to be swept out (in my case the totals in row 4) and the function to be applied (in my scenario a simple multiplication, "*"):

swept.M <- sweep(M[1:3,], 2, M[4,], "*")

The output is what I desired and can be plotted nicely as a bar plot.

> swept.M
       Continent
Product Africa America Asia Australia Europe
      A   0.04    0.12 0.10      0.04   0.08
      B   0.02    0.16 0.04      0.03   0.08
      C   0.04    0.12 0.06      0.03   0.04

barplot(swept.M*100, legend=rownames(swept.M),
        main="Revenue by product and continent",
        ylab="Revenue split %")
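To see what sweep does in this case, the same result can be computed by hand. The sketch below rebuilds the matrix so that it runs on its own and compares sweep with two hand-made alternatives: transposing, so that R's recycling lines up each continent total with its column, and multiplying by a diagonal matrix of the totals. It ends with a plausibility check that each column adds back up to its continent total.

```r
# Rebuild the example so that this block runs on its own
values <- c(0.4, 0.2, 0.4, 0.1, 0.3, 0.4, 0.3, 0.4, 0.5, 0.2,
            0.3, 0.2, 0.4, 0.3, 0.3, 0.1, 0.4, 0.4, 0.2, 0.2)
M <- matrix(values, ncol = 5,
            dimnames = list(Product = c("A", "B", "C", "Total"),
                            Continent = c("Africa", "America", "Asia",
                                          "Australia", "Europe")))
swept.M <- sweep(M[1:3, ], 2, M[4, ], "*")

# By hand 1: R recycles a vector down the columns, so transpose first,
# multiply each continent's row by its total, then transpose back
by.hand <- t(t(M[1:3, ]) * M[4, ])

# By hand 2: post-multiplying by a diagonal matrix scales each column
by.diag <- M[1:3, ] %*% diag(M[4, ])

all.equal(swept.M, by.hand)
all.equal(unname(swept.M), unname(by.diag))

# Plausibility check: each column adds back up to the continent total,
# and all cells together account for the whole revenue
all.equal(colSums(swept.M), M[4, ])
sum(swept.M)
```

Both manual versions work, but sweep states the intent (which margin, which statistic, which operation) in a single call.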

One more example

Another classic use of the sweep function is, of course, the case where you have revenue figures and would like to calculate the revenue split by product for each location:

Revenue <- matrix(1:15, ncol=5)
sweep(Revenue, 2, colSums(Revenue), "/")

This is actually the same as prop.table(Revenue, 2), which is short for:

sweep(x, margin, margin.table(x, margin), "/") 
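A quick check that both routes give the same column-wise shares for the small Revenue matrix:

```r
Revenue <- matrix(1:15, ncol = 5)

# Column-wise shares via sweep() ...
shares.sweep <- sweep(Revenue, 2, colSums(Revenue), "/")
# ... and via prop.table() with margin 2
shares.prop <- prop.table(Revenue, 2)

all.equal(shares.sweep, shares.prop)
# Every column of shares now sums to 1
colSums(shares.prop)
```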

Reading the help file for margin.table shows that this function is the same as apply(x, margin, sum), and colSums(x) is just a faster version of apply(x, 2, sum).
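The equivalences can be checked directly. The timing comparison at the end uses an arbitrary random matrix, and the actual speed-up will vary with machine and data size.

```r
Revenue <- matrix(1:15, ncol = 5)

# Three ways to get the column totals
by.colSums <- colSums(Revenue)
by.apply   <- apply(Revenue, 2, sum)
by.margin  <- margin.table(Revenue, 2)

# Same numbers; margin.table() keeps a dim attribute, hence as.vector()
all.equal(by.colSums, by.apply)
all.equal(by.colSums, as.vector(by.margin))

# colSums() runs in compiled code, while apply() calls sum() once per
# column from R; the difference only shows on larger data
big <- matrix(runif(1e6), ncol = 100)
system.time(for (i in 1:10) colSums(big))
system.time(for (i in 1:10) apply(big, 2, sum))
```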

Wednesday, 4 April 2012

Review: Kölner R Meeting 30 March 2012

The first Kölner R user meeting was great fun. About 20 useRs turned up to exchange their ideas, questions and experience with R. Three talks, on R & Excel, ggplot2 & XeLaTeX, and dynamical systems with R & simecol, kicked off the evening, with Kölsch (beer) loosening our tongues further.

Thankfully a lot of people had brought along their laptops, as unfortunately we lacked a cable to connect any of the computers to the installed projector. Never mind: we huddled around the notebooks and switched slides at the speaker's signal.

Photos: Günter Faes

Similar to LondonR, it was a very informal event, perhaps slightly forced by me, as I called everyone by his/her first name, which could be considered rude in Germany. But what I had noticed in London, and what was also true in Cologne, was that people from very diverse backgrounds and of all ages would meet to discuss matters around R, often without working in the same field. So why worry about hierarchies?

Most attendees were not R experts but users in the purest sense, trying to solve real-life problems, and I suppose that is what makes those meetings so special. R users are often not programmers by trade but amateurs with a keen interest in extracting stories and pictures from their data, and that is why the discussions are often so engaging. Talking to people using R in social science, psychology, biology, pharma, energy, telcos, finance, insurance or indeed statistics opens your mind and eyes. You realise that you are not alone; other people are weird as well. They have similar problems and challenges, but may use a different domain language and look at problems from a different angle. And this can be incredibly refreshing!

Anyhow, we agreed to meet again in about three months' time. The pub was a great venue to socialise, yet a bit noisy for the talks. Hopefully we can use a room at the nearby university for the presentations next time; promises have already been made. We shall see. Günter was kind enough to set up a mailing list, which you can sign up to here. I will continue to use this blog to provide updates on the Cologne R user group and will set up a public calendar as well.

Talks

Many thanks to the speakers, who dared to give the first talks and had to improvise on the spot without a projector. Please drop me a line if you would like to speak at one of the next events.