Top 10 tips to get started with R

  • Be motivated. R has a steep learning curve, so find a problem you can't solve otherwise, e.g. plotting multivariate data or running a statistical analysis for which an R function already exists.
  • Download and install R. Get to know the R console: learn how to install additional packages, how to access the command history, how to use auto-completion and how to open the help system. Review the R Installation and Administration manual and check out the free books section on CRAN.
  • Get familiar with the R help files. They can appear cryptic at first, but they follow a fixed structure. Read and re-read a couple of help files. Look out for the Arguments (input) and Value (output) sections, execute the examples and run the demos, e.g. demo(graphics). Subscribe to R-help and read the questions and answers, check out Stack Overflow and follow blogs. Search with Rseek.org.
  • Learn how to get your data into R. The easiest way is usually via a CSV file (comma-separated values), using read.csv. Look into XLConnect if you have to deal with spreadsheet files. Move on to writing queries against databases, e.g. using RODBC. Skim through the R Data Import/Export manual.
  • Try to understand the different data types in R and how to modify them. What is the difference between a matrix and a data frame? What is a factor? What is a list? Think about the use cases for each. Review the manual An Introduction to R.
  • Do charts! Lots of charts. They are rewarding and keep you motivated. Be inspired by the R Graph Gallery. Check out the following packages: lattice, plotrix, ggplot2, Deducer, googleVis.
  • Learn how to modify and reshape data in R and how to apply functions to subsets, using by, apply, lapply, ave, reshape, sweep, with, within, etc. Set aside a weekend to get to grips with these functions.
  • Write your R code in files instead of typing it all into the R console. Use an integrated development environment (IDE), e.g. ESS (Emacs Speaks Statistics), RStudio or StatET (Eclipse).
  • Understand the concept of functions. Write a function that returns "Hello World". Modify it so that it has an input argument NAME and prints "Hello NAME". Review the code of existing R functions and copy from existing code.
  • Document your code! Start by explaining what you want to achieve, and only code that much; then write down the next step in plain English and code again. How will you know that your code does what you want it to do? Testing can help. Think about your code style and how you will version your files.
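
The console and help-system tips above can be tried with a few commands. This is a minimal sketch: the interactive commands (installing a package, browsing the history, opening help pages) are commented out so that the script also runs unattended, and ggplot2 is only an example package name.

```r
# Interactive-only commands are commented out so that this script can
# also run unattended, e.g. with Rscript.
# install.packages("ggplot2")  # download a package from CRAN (needs internet)
# library(ggplot2)             # attach the package to the current session
# history()                    # browse the commands typed so far
# ?read.csv                    # open a help page, same as help("read.csv")
# ??"linear model"             # search all installed help pages

# The "Usage" and "Arguments" sections of a help page list each argument
# with its default; args() and formals() show the same programmatically.
args(read.csv)
formals(read.csv)$header       # TRUE: read.csv expects a header row by default

# Run the code from the "Examples" section of a help page.
example("mean")

# The first few packages installed in your library.
head(rownames(installed.packages()))
```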
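
A minimal sketch of importing a CSV file with read.csv. The file and its values (region, year, sales) are made up and written to a temporary file first so that the example is self-contained; with your own data you would pass the path to your file instead. XLConnect and RODBC are not shown here.

```r
# Write a small CSV file to a temporary location so the example is
# self-contained; the values are made up for illustration.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("region,year,sales",
             "North,2012,120.5",
             "South,2012,98.0",
             "North,2013,134.2"), csv_file)

# read.csv() assumes a header row and comma separators by default.
# stringsAsFactors = FALSE keeps text columns as character vectors
# (the default changed from TRUE to FALSE in R 4.0.0).
sales <- read.csv(csv_file, stringsAsFactors = FALSE)

str(sales)        # column names and types after import
summary(sales)    # quick sanity check of ranges and missing values

# Writing data back out works the same way.
write.csv(sales, tempfile(fileext = ".csv"), row.names = FALSE)
```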
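
To explore the questions about data types, it helps to create one object of each kind and inspect it. The object names and values below are arbitrary, chosen only for illustration.

```r
# A matrix holds a single type (here integers) in rows and columns.
m <- matrix(1:6, nrow = 2)

# A data frame is a list of equal-length columns that may differ in type.
df <- data.frame(id    = 1:3,
                 group = c("a", "b", "a"),
                 score = c(2.5, 3.1, 4.0),
                 stringsAsFactors = FALSE)

# A factor stores categorical data as integer codes plus a set of levels.
f <- factor(c("low", "high", "low"), levels = c("low", "high"))

# A list can hold anything, including other lists and data frames.
l <- list(numbers = 1:3, label = "x", data = df)

typeof(m)            # "integer"
sapply(df, class)    # one class per column
levels(f)            # "low" "high"
as.integer(f)        # 1 2 1, the underlying codes
str(l)               # nested structure at a glance

# Modifying: take a row of the matrix, add a column to the data frame,
# and convert the factor back to its labels.
m[1, ]
df$passed <- df$score > 3
as.character(f)
```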
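
A first pair of charts, sketched with base graphics and lattice because both come with a standard R installation (ggplot2, plotrix, Deducer and googleVis need to be installed from CRAN first). The built-in iris data set stands in for your own multivariate data, and, as an assumption for running without a screen, the output goes to a temporary PDF file.

```r
library(lattice)   # recommended package, installed with R by default

# Draw into a temporary PDF file so the script also runs without a screen;
# leave out pdf() and dev.off() to see the plots in a window instead.
out <- tempfile(fileext = ".pdf")
pdf(out)

# Base graphics: scatterplot matrix of the built-in iris data,
# coloured by species.
pairs(iris[, 1:4], col = as.integer(iris$Species),
      main = "Edgar Anderson's Iris Data")

# lattice: one panel per species via the conditioning operator |.
# Inside functions, or in files run with source(), lattice plots
# must be print()-ed explicitly.
print(xyplot(Sepal.Length ~ Petal.Length | Species, data = iris))

invisible(dev.off())
```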
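
Most of the functions named in the reshaping tip can be tried on the built-in mtcars data set; reshape() is left out here to keep the sketch short. Grouping by number of cylinders and the derived columns are illustrative choices, not a recommended analysis.

```r
car_data <- mtcars   # built-in data: fuel consumption and design of 32 cars

# tapply / by: apply a function to subsets defined by a grouping variable.
mpg_by_cyl <- tapply(car_data$mpg, car_data$cyl, mean)  # mean mpg per cylinder count
mpg_by_cyl
by(car_data[, c("mpg", "hp")], car_data$cyl, colMeans)

# apply: over the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix.
apply(as.matrix(car_data[, c("mpg", "hp", "wt")]), 2, max)

# lapply / sapply: over the elements of a list, here the groups from split().
sapply(split(car_data$mpg, car_data$cyl), median)

# ave: the group mean repeated for every row, convenient for a new column.
car_data$mpg_cyl_mean <- ave(car_data$mpg, car_data$cyl)

# sweep: subtract each column's mean from that column (centring).
x <- as.matrix(car_data[, c("mpg", "hp")])
centred <- sweep(x, 2, colMeans(x))

# with / within: refer to columns without repeating the data frame's name.
with(car_data, cor(mpg, wt))
car_data <- within(car_data, kpl <- mpg * 0.425)  # miles per US gallon to km per litre
```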
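
The "Hello World" exercise might look like this. Using paste() to build the string and cat() to print it is one way of doing it among several; the function names are hypothetical.

```r
# Version 1: no arguments; the value of the last expression is returned.
hello <- function() {
  "Hello World"
}

# Version 2: one input argument, here called name.
hello_name <- function(name) {
  paste("Hello", name)          # paste() separates its inputs with a space
}

hello()                 # [1] "Hello World"
hello_name("NAME")      # [1] "Hello NAME"

# To print as a side effect instead of returning a value, use cat(),
# which writes the text without the [1] and the quotes.
say_hello <- function(name) {
  cat("Hello ", name, "\n", sep = "")
}
say_hello("R")

# Typing a function's name without parentheses shows its source code.
hello_name
sd
```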
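
The document-then-code approach, with a simple test, sketched on a hypothetical temperature-conversion function: the plain-English comment comes first, then the code, then checks with stopifnot() against values whose answers are known.

```r
# Goal: convert temperatures from Fahrenheit to Celsius.
# Step 1 (plain English): subtract 32, then multiply by 5/9.
fahrenheit_to_celsius <- function(f) {
  (f - 32) * 5 / 9
}

# Step 2: check that the code does what we want on known values.
# stopifnot() stops with an error if any condition is not TRUE.
stopifnot(
  fahrenheit_to_celsius(32)  == 0,
  fahrenheit_to_celsius(212) == 100,
  # all.equal() compares with a tolerance, which suits floating point
  isTRUE(all.equal(fahrenheit_to_celsius(c(-40, 98.6)), c(-40, 37)))
)
```

For larger projects, packages such as testthat provide a fuller testing framework, and a version control system such as Git keeps track of changes to your files.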

Bonus tip

11 comments:

PirateGrunt said...

Great post! After more than a year of intense effort, I often still feel like a novice. It's good to be reminded of the fundamentals.

aeon said...

Nice post.

I wonder who first came up with "R has a steep learning curve." I should run a search on R-help, maybe R-devel.
I also wonder what this is supposed to mean. Here is my take, backed by my own experience and, to keep it peer-reviewed, that of my lab.

Actually, I don't even think it should be described as a curve. It starts as a curve, then asymptotically approaches a certain level, and then abruptly becomes another curve, which levels off again. For example, you start with for loops and learn a lot. Then you sometimes use apply. Then you might often go for plyr's functions. Each step has a steep learning curve with a lot of new stuff to grasp, and then levels off quite rapidly.

Or you start with some built-in functions and plotting. Then you start tweaking graphical parameters and modifying function code. Then you start writing your own functions.

I have to admit I had relapses. And there are levels I have not mastered yet: S3 and S4 classes, and namespaces, are still somewhat mysterious to me. Some stuff does foo:bar!1!!, and I go: SNAFU, tweaktweaktweak. (The R Inferno is very helpful, but too dense to be a bedtime read while still learning by doing.)

aeon said...

Oh, drat: forgot to close the HTML tag. Sorry.

Tal Galili said...

Nice list :)

A tiny bit of self-promotion (for future lists): a small package I wrote for updating R on Windows:
http://www.r-statistics.com/2013/04/updating-r-on-windows-through-a-menu-bar-installr-0-9-released-on-cran/

Yours,
Tal

Markus Gesmann said...

I share your view about the learning curve and I am grateful for the posts on R-help, R-devel or R-bloggers, which helped me to get to the next level. Often small snippets of R code stimulated my curiosity and gave me the motivation to re-read the help pages and experiment again.

M. Edward (Ed) Borasky said...

Bonus tip number two: RStudio rules!

M. Edward (Ed) Borasky said...

If R is your first language or you don't have a background in scientific and statistical computing, the learning curve is *extremely* steep. Most of the newer languages, especially Python and Ruby, are a *lot* easier for a beginner than R.

Stephanie M. Davis said...

Good suggestions all!

Stephanie M. Davis said...

I love RStudio; it helps me navigate much more easily than is otherwise possible.

Stephanie M. Davis said...

I'm a reasonably intelligent person but with no programming experience. I can usually pick things up quickly as I go. R is a different animal. I washed out of the Data Analysis course at Coursera. However, I did learn how to use RStudio and intend to retrace my steps until I gain mastery. Thanks for the tips!

RSTata said...

Unfortunately, you are so right. I am proficient in Stata and SPSS syntax, and I have tried R a couple of times. Although I'd love to use open-source statistical analysis software, R's learning curve has always brought me (quite quickly) back to the other programmes.

For example, in Stata I barely use the mouse-clicky interface, but the Stata syntax itself is much easier than the R syntax. I have always dreamed of someone programming a Stata-syntax command line for R...
