Notes from the Kölner R meeting, 26 June 2015

Last Friday the Cologne R user group came together for the 14th time. For the first time we met at Startplatz, a start-up incubator venue. The venue was excellent: not only did they provide us with a much larger room, but also with table football and drinks. Many thanks to Kirill for organising all of this!

Photo: Günter Faes
We had two excellent advanced talks. Both were very informative and well presented.

Data Science at the Command Line

Kirill Pomogajko showed us how he uses various command line tools to pre-process log-files for further analysis with R.
Photo: Günter Faes
Imagine you have several servers that generate large data sets with no standard delimiters, like the example below.

At first glance the columns appear to be separated by blanks, but the second column (Military) has strings such as Air Force that themselves contain a blank. Furthermore, one column has missing data (Month) and another uses speech marks (Car). Thus, the data is messy and difficult to read into R.

To solve the problem Kirill developed a Makefile that uses tools such as scp, sed and awk to download and clean the server files.

Kirill's tutorial files are available via GitHub.
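
Kirill's own solution relies on a Makefile with scp, sed and awk. For readers who want to experiment without leaving R, the same cleaning idea can be sketched roughly as follows. This is only an illustration, not Kirill's code: the log lines are made up to mimic the problems described above, and the sketch assumes that the multi-word values (here only Air Force) are known in advance.

```r
# Made-up log lines (hypothetical data): blank-separated columns,
# a value containing a blank, a missing Month and a quoted Car entry.
raw <- c(
  'ID Military Month Car',
  '1 Army Jan "Ford Focus"',
  '2 Air Force  "VW Golf"',
  '3 Navy Mar BMW'
)

# Step 1 (what sed would do): protect the blank inside known multi-word values
clean <- gsub("Air Force", "Air_Force", raw, fixed = TRUE)

# Step 2: read with a single blank as separator, so that two consecutive
# blanks are read as an empty (missing) field; quotes keep "Ford Focus" together
dat <- read.table(text = clean, header = TRUE, sep = " ", quote = '"',
                  na.strings = "", stringsAsFactors = FALSE)

# Step 3: restore the original spelling
dat$Military <- gsub("_", " ", dat$Military, fixed = TRUE)
dat
```

For log files too large to hold in memory, streaming tools such as sed and awk have the advantage that they process the file line by line before R ever sees it.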

An Introduction to RStan and the Stan Modelling Language


Paul Viefers gave a great introduction to Stan and RStan, with a focus on explaining the differences to other MCMC packages such as JAGS.

Photo: Günter Faes

Stan is a probabilistic programming language for Bayesian inference. One of the major challenges in Bayesian analysis is that often there is no analytical solution for the posterior distribution. Hence, the posterior distribution is approximated via simulation, for example with Gibbs sampling in JAGS. Stan, on the other hand, uses Hamiltonian Monte Carlo (HMC), an algorithm that is more subtle in proposing jumps: it exploits more of the structure of the posterior by translating the sampling problem into the framework of Hamiltonian mechanics.
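
To make the idea behind HMC a little more concrete, here is a minimal base R sketch for a one-dimensional standard normal target. It is not how Stan is implemented (Stan uses an adaptive variant, the No-U-Turn Sampler, and computes gradients automatically). The function names U, grad_U and hmc_step, the step size eps and the number of leapfrog steps L are choices made purely for illustration.

```r
set.seed(1)

U      <- function(q) 0.5 * q^2   # negative log density of N(0, 1), up to a constant
grad_U <- function(q) q           # gradient of U

# One HMC transition: simulate Hamiltonian dynamics with the leapfrog
# integrator, then accept or reject the end point of the trajectory.
hmc_step <- function(q, eps = 0.1, L = 20) {
  p <- rnorm(1)                                  # draw a fresh momentum
  q_new <- q
  p_new <- p - 0.5 * eps * grad_U(q_new)         # half step for the momentum
  for (i in seq_len(L)) {
    q_new <- q_new + eps * p_new                 # full step for the position
    if (i < L) p_new <- p_new - eps * grad_U(q_new)  # full step for the momentum
  }
  p_new <- p_new - 0.5 * eps * grad_U(q_new)     # final half step for the momentum
  H_old <- U(q) + 0.5 * p^2                      # total energy at the start
  H_new <- U(q_new) + 0.5 * p_new^2              # total energy at the end
  if (log(runif(1)) < H_old - H_new) q_new else q  # Metropolis correction
}

n <- 2000
draws <- numeric(n)
q <- 0
for (i in seq_len(n)) {
  q <- hmc_step(q)
  draws[i] <- q
}
c(mean = mean(draws), sd = sd(draws))
```

Because the proposal follows the gradient of the log density over many small steps, it can travel far from the current point while still being accepted with high probability. That is the sense in which the jumps are proposed more subtly than in a random-walk sampler.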

Paul ended his talk by walking us through the various building blocks of a Stan script, using a hierarchical logistic regression example.

You can access Paul's slides on RPubs.

Drinks and Networking

No Cologne R user group meeting is complete without Kölsch and networking. Afterwards some of us ended up in a fancy burger place.

Next Kölner R meeting

The next meeting will be scheduled in September. Details will be published on our Meetup site. Thanks again to Revolution Analytics for their sponsorship.


Next Kölner R User Meeting: Friday, 26 June 2015

The next Cologne R user group meeting is scheduled for this Friday, 26 June 2015, and we have an exciting agenda with two talks followed by networking drinks.

  • Data Science at the Command Line (Kirill Pomogajko)
  • An Introduction to RStan and the Stan Modelling Language (Paul Viefers)
Please note: Our venue has changed! We have outgrown the seminar room at the Institute of Sociology and are moving to Startplatz, a start-up incubator venue: Im Mediapark 5, 50670 Köln

Drinks and Networking

The event will be followed by drinks (Kölsch!) and networking opportunities.

For further details visit our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from past meetings are available here.

The organisers, Bernd Weiß and Markus Gesmann, gratefully acknowledge the sponsorship of Revolution Analytics, who support the Cologne R user group as part of their Matrix programme.


How to place titles in lattice plots

I like the Economist theme in the latticeExtra package. It produces nice looking charts that mimic the design of the weekly newspaper, such as in this example:


For some time I wondered how I could put the title of my lattice plots into the top left corner as well (by default titles are centred). Reviewing the code of the theEconomist.theme function by Felix Andrews reveals the trick. It is the setting of par.main.text:

library(lattice)
my.settings <- list(
  par.main.text = list(font = 2, # make it bold
                       just = "left", 
                       x = grid::unit(5, "mm")))

xyplot(sin(1:100) ~ cos(1:100), 
       par.settings=my.settings,
       main="Hello World", 
       type="l")


Furthermore, I can use the same approach to place a sub-title in the bottom left corner of my chart, e.g. to describe the source of my data:

my.settings <- list(
  par.main.text = list(font = 2, # make it bold
                       just = "left", 
                       x = grid::unit(5, "mm")),
  par.sub.text = list(font = 1, 
                      just = "left", 
                      x = grid::unit(5, "mm"))
  )

xyplot(sin(1:100) ~ cos(1:100), 
       par.settings=my.settings,
       main="Hello World", 
       sub="Source: Nobody knows",
       type="l")
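
If the same title placement is needed in several charts, the settings can be wrapped in a small helper. The function below is a hypothetical convenience wrapper, not part of lattice, and the default offset of 5 mm is simply the value used above.

```r
library(lattice)

# Hypothetical helper (not part of lattice): build par.settings that
# left-align the main title and the sub-title at a given offset in mm.
left_aligned_titles <- function(offset_mm = 5) {
  pos <- grid::unit(offset_mm, "mm")
  list(
    par.main.text = list(font = 2, just = "left", x = pos),  # bold title
    par.sub.text  = list(font = 1, just = "left", x = pos)   # plain sub-title
  )
}

p <- xyplot(sin(1:100) ~ cos(1:100),
            par.settings = left_aligned_titles(10),
            main = "Hello World",
            sub = "Source: Nobody knows",
            type = "l")
print(p)  # lattice objects need to be printed to be drawn inside scripts
```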


For more information see also the lattice help pages or the lattice book by Deepayan Sarkar: Lattice: Multivariate Data Visualization with R.

Session Info

R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base     

other attached packages:
[1] lattice_0.20-31

loaded via a namespace (and not attached):
[1] tools_3.2.0 grid_3.2.0


Using system and web fonts in R plots

The forthcoming R Journal has an interesting article on the showtext package by Yixuan Qiu. The package allows me to use system and web fonts directly in R plots, reminding me a little of the approach taken by XeLaTeX. But "unlike other methods to embed fonts into graphics, showtext converts text into raster images or polygons, and then adds them to the plot canvas. This method produces platform-independent image files that do not rely on the fonts that create them." [1]

Here is an example with fonts from my local system:


library(showtext)
png("System-Fonts.png", width=550, height=350)
par(mfrow=c(2,2))
plot(1 ~ 1, main="Lucida Bright", family = "Lucida Bright")
plot(1 ~ 1, main="Courier", family = "Courier")
plot(1 ~ 1, main="Helvetica Neue Light", family = "Helvetica Neue Light") 
plot(1 ~ 1, main="Lucida Handwriting Italic", family = "Lucida Handwriting Italic")
dev.off()

Additionally, showtext allows me to use fonts hosted online, e.g. Google web fonts:

font.add.google("Alegreya Sans", "aleg")
font.add.google("Permanent Marker", "marker")
font.add.google("Gruppo", "gruppo")
font.add.google("Lobster", "lobster")
png("Google-Fonts.png", width=550, height=350)
showtext.begin()
par(mfrow=c(2,2))
plot(1 ~ 1, main="Alegreya Sans", family = "aleg")
plot(1 ~ 1, main="Permanent Marker", family = "marker")
plot(1 ~ 1, main="Gruppo", family = "gruppo") 
plot(1 ~ 1, main="Lobster", family = "lobster") 
showtext.end()
dev.off()

For more information read the article and/or visit the project site.

References

[1] Yixuan Qiu. showtext: Using System Fonts in R Graphics. The R Journal, 7(1), 2015.

Session Info

R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base     

other attached packages:
[1] showtext_0.4-2 sysfonts_0.5  

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.6 showtextdb_1.0 jsonlite_0.9.16 bitops_1.0-6


Back from R/Finance in Chicago

I had a great time at the R/Finance conference in Chicago last Friday/Saturday. Some brief takeaways for me were:

From Emanuel Derman's talk: It is important to distinguish between theories and models. Theories live in an abstract world and, for a given set of axioms, they can be proven right. Models, however, live in the real world, are built on simplifying assumptions and are only useful until experiments or data prove them wrong.

'Pornography is hard to define, but I know it when I see it.' Matt Dowle from h2o had the laughs on his side when he started his talk with this Justice Potter Stewart quote to illustrate the value of his data.table package to its users.

Bryan W. Lewis showed why inverting a matrix is tricky, particularly when it contains entries close to zero, and what you can do about it.

Marius Hofert gave a stimulating talk on simsalapar, a package for parallel simulations, which I need to study in more detail.

Following a brief conversation with Dirk on drat I finally got the punch line of the package, but not so much the joke on drat as a fairly mild expression of anger or annoyance. I had never heard the expression in the UK. Perhaps drat is better explained as Dirk's R Archive Template?

The audience seemed to appreciate my talk on Communicating Risk. My Whale Chart for visualising profitability appeared to resonate with a few.

Furthermore, I learned that the weather in Chicago is even more unstable than in London. After an amazing conference dinner at the Trump Tower, where we spent most of the time outside admiring the sunset, we experienced a very cold and rainy Saturday. But then again, there is always time for a jazz club and a drink. Talking about drinks, thanks to Q Ethan McCallum I had a true American breakfast experience, including bottomless coffee.

Yet, the last word should go to ShabbyChef, who took a photo of a slide during Louis Marascio's keynote and tweeted:


Amen.
