Tuesday, 2 September 2014

Zoom, zoom, googleVis

The Google Charts API is quite powerful, and via googleVis you can access it from R. Here is an example that demonstrates how you can zoom into your chart.


In the example below I set the maximum zoom level to 5% of the chart. Click and drag with the left mouse button to select an area to zoom into; right-click to zoom out again. The functionality is also available in other core charts, such as line, column and bar charts. For more configuration options of the explorer settings, visit the Google documentation.

R code
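A minimal sketch of such a chart, assuming googleVis's `gvisLineChart()` and the Google Charts `explorer` option passed as a JSON string. The random-walk data, the seed and the object name `zoomChart` are illustrative assumptions, not taken from the original post.

```r
library(googleVis)

# Illustrative data: a random walk of 1,000 points, long enough to be worth zooming into
set.seed(2014)
dat <- data.frame(x = 1:1000, y = cumsum(rnorm(1000)))

# The 'explorer' option is handed to Google Charts as a JSON string:
#  - dragToZoom: select an area with the left mouse button to zoom in
#  - rightClickToReset: right-click to return to the original view
#  - maxZoomIn: 0.05 limits zooming to 5% of the original view
zoomChart <- gvisLineChart(dat, xvar = "x", yvar = "y",
  options = list(
    explorer = "{actions: ['dragToZoom', 'rightClickToReset'], maxZoomIn: 0.05}",
    legend = "none", width = 600, height = 400))

# Open the chart in the default browser when running interactively
if (interactive()) plot(zoomChart)
```

A numeric x variable gives a continuous axis, which is what the explorer needs to zoom smoothly.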

Tuesday, 26 August 2014

ChainLadder 0.1.8 released

Over the weekend we released version 0.1.8 of the ChainLadder package for claims reserving on CRAN.

What is claims reserving?

The insurance industry, unlike other industries, does not sell products as such but promises. An insurance policy is a promise by the insurer to the policyholder to pay for future claims in exchange for a premium received upfront.

As a result, insurers don't know the cost of their service upfront, but rely on historical data analysis and judgement to predict a sustainable price for their offering. In General Insurance (or Non-Life Insurance, e.g. motor, property and casualty insurance) most policies run for a period of 12 months. However, the claims payment process can take years or even decades. Therefore insurers often do not even know the delivery date of their product. The money set aside for those future claims payments is called reserves.


Over the years several methods and models have been developed to estimate both the level and variability of reserves for insurance claims, see [1] or [2] for an overview.

In practice, many actuaries use the Mack chain-ladder and bootstrap chain-ladder models, along with stress testing/scenario analysis and expert judgement, to estimate ranges of reasonable outcomes; see the surveys of UK actuaries in 2002 [3] and across the Lloyd's market in 2012 [4].
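The idea behind the basic (deterministic) chain-ladder method fits into a few lines of base R: estimate age-to-age factors from the observed part of a cumulative claims triangle, roll each origin year forward to ultimate, and take the difference to the latest observed value as the reserve. The triangle below is hypothetical, with made-up numbers for illustration only; on real data you would rather use the package's `chainladder()` or `MackChainLadder()` functions, for example on the `RAA` triangle that ships with ChainLadder. Mack's model additionally estimates standard errors around these point estimates.

```r
# Hypothetical cumulative paid claims triangle (illustrative numbers only):
# rows are origin years, columns development years, NA marks the unknown future
tri <- matrix(c(100, 150, 175,
                110, 168,  NA,
                120,  NA,  NA),
              nrow = 3, byrow = TRUE,
              dimnames = list(origin = 2011:2013, dev = 1:3))
n <- ncol(tri)

# Volume-weighted age-to-age (link) factors, using only origin years
# observed in both development periods j and j + 1
f <- sapply(1:(n - 1), function(j) {
  obs <- !is.na(tri[, j + 1])
  sum(tri[obs, j + 1]) / sum(tri[obs, j])
})

# Project each origin year to ultimate by applying the factors in turn
full <- tri
for (j in 1:(n - 1)) {
  todo <- is.na(full[, j + 1])
  full[todo, j + 1] <- full[todo, j] * f[j]
}

# Reserve = projected ultimate minus latest observed (diagonal) value
latest <- apply(tri, 1, function(x) tail(x[!is.na(x)], 1))
reserve <- full[, n] - latest
print(round(f, 4))
print(reserve)
```

For this toy triangle the factors are 318/210 and 175/150, giving reserves of 0, 28 and 92 for the three origin years.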

The ChainLadder package provides various statistical methods and models which are typically used for the estimation of outstanding claims reserves in general insurance. You can get a very brief overview of the package and reserving from my R in Finance lightning talk:



The package vignette [5] gives more details about the various models and methods implemented.

More context and theory are given in the chapter Claims reserving and IBNR of [6], including the log-linear model of [7] and [8], which I discussed earlier on my blog.

Claims reserving is an active field of research, as can be seen from the programme of the R in Insurance conference.

News

Version 0.1.8 fixes:
  • BootChainLadder produced warnings for triangles that had static developments when the argument process.distr was set to "od.pois"
  • as.triangle.data.frame didn't work for a data.frame with fewer than three rows
  • Arguments xlab and ylab were not passed through in plot.triangle when lattice=TRUE
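For reference, the three functions touched by these fixes are called along the following lines. This is a sketch, assuming the current ChainLadder API and the `RAA` example triangle shipped with the package; the two-row data.frame is made up for illustration.

```r
library(ChainLadder)

# as.triangle() on a long-format data.frame with only two rows
# (the case fixed in 0.1.8); the numbers are illustrative only
df <- data.frame(origin = c(2013, 2013), dev = c(1, 2), value = c(100, 150))
small <- as.triangle(df, origin = "origin", dev = "dev", value = "value")

# Bootstrap chain-ladder on the RAA triangle, using the
# over-dispersed Poisson process distribution
set.seed(1)
B <- BootChainLadder(RAA, R = 999, process.distr = "od.pois")
summary(B)

# Lattice plot of the triangle with custom axis labels
if (interactive()) {
  plot(RAA, lattice = TRUE, xlab = "Development period", ylab = "Cumulative claims")
}
```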

Tuesday, 19 August 2014

googleVis 0.5.5 released

Earlier this week we released googleVis 0.5.5 on CRAN. The package provides an interface between R and Google Charts, allowing you to create interactive web charts from R. This is mainly a maintenance release, updating documentation and fixing minor issues.

Screen shot of some of the Google Charts

New to googleVis? Review the examples of all googleVis charts on CRAN.

Perhaps the best known example of the Google Chart API is the motion chart, popularised by Hans Rosling in his 2006 TED talk.

Tuesday, 12 August 2014

GrapheR: A GUI for base graphics in R

How did I miss the GrapheR package?

The author, Maxime Hervé, published an article about the package [1] in the same issue of the R Journal as we did on googleVis. Yet it took a package update notification on CRANberries, three years later, for me to look into GrapheR in more detail! And what a wonderful gem GrapheR is.

The package provides a graphical user interface for creating base charts in R. It is ideal for beginners in R, as the user interface is very clear and the corresponding R code is written to a text file alongside, allowing users to recreate the charts directly in the console.

Adding and changing legends? Messing around with the plotting window settings? It is much easier and quicker with this GUI than reading the help file and trying to understand the various parameters.

Here is a little example using the iris data set.
library(GrapheR)
data(iris)      # example data set to plot
run.GrapheR()   # opens the GrapheR graphical user interface
This will bring up a window that helps me to create the chart and tweak the various parameters.


Once I am happy with my configuration I hit DRAW and R will create the chart for me.


Finally, I find the underlying R code in a file created by GrapheR. For more details, also read the package vignette, which is available in English, French and German!

Tuesday, 5 August 2014

Thanks to R Markdown: Perhaps Word is an option after all?

In many cases Word is still the preferred file format for collaboration in the office. Yet it is often a challenge to work with, not so much because of the software as because of how it is used and abused. Thanks to R Markdown, it is no longer painful to include mathematical notation and R output in Word.


I have been using R Markdown for a while now and have grown very fond of it. Although I am quite happy with PDF and HTML output for basic reports, and with switching to Sweave/LaTeX for more complex documents, I was pleasantly surprised to learn that the new version of RStudio can also produce MS Word files directly from R Markdown, thanks to the power of pandoc. Perhaps Word is an option after all?
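A minimal sketch of producing a Word file from R Markdown from the R console, assuming the rmarkdown package and a pandoc installation; the file name `report.Rmd` and its content are illustrative. RStudio runs an equivalent step for you when you knit to Word.

```r
library(rmarkdown)

# Hypothetical minimal R Markdown source: the YAML header asks for Word
# output, followed by inline LaTeX math and an R code chunk
rmd <- c(
  "---",
  "title: \"Word output from R Markdown\"",
  "output: word_document",
  "---",
  "",
  "Bayes' theorem: $P(H|D) = P(D|H) P(H) / P(D)$.",
  "",
  "```{r}",
  "summary(cars)",
  "```"
)
src <- file.path(tempdir(), "report.Rmd")
writeLines(rmd, src)

# render() knits the R code and hands the Markdown to pandoc,
# which writes report.docx next to the source file
if (pandoc_available()) {
  out <- render(src, output_format = "word_document", quiet = TRUE)
}
```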

Tuesday, 29 July 2014

Hit and run. Think Bayes!

At the R in Insurance conference Arthur Charpentier gave a great keynote talk on Bayesian modelling in R. Bayes' theorem on conditional probabilities is strikingly simple, yet incredibly thought-provoking. Here is an example from Daniel Kahneman to test your intuition. But first I have to start with Bayes' theorem.

Bayes' theorem

Bayes' theorem states that given two events \(D\) and \(H\), the probability of \(D\) and \(H\) happening at the same time is the same as the probability of \(D\) occurring, given \(H\), weighted by the probability that \(H\) occurs; or the other way round. As a formula it can be written as:
\[
P(H \cap D) = P(H|D) \, P(D) = P(D|H) \, P(H)
\]
Or if I rearrange it:
\[
P(H|D) = \dfrac{P(D|H) \, P(H)}{P(D)}
\]
Imagine \(H\) is short for hypothesis and \(D\) is short for data, or evidence. Then Bayes' theorem states that the probability of a hypothesis given the data equals the likelihood of observing the data given the hypothesis, weighted by the prior belief in the hypothesis and normalised by the probability of observing the data regardless of the hypothesis.
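For a discrete set of competing hypotheses this translates directly into code: multiply prior and likelihood for each hypothesis, then normalise so that the posteriors sum to one. That normalisation is exactly the division by \(P(D)\), with \(P(D)\) summed over all hypotheses. A minimal sketch; the function name `posterior` and the numbers in the example are illustrative.

```r
# Posterior over a discrete set of hypotheses via Bayes' theorem.
# prior: P(H_i); likelihood: P(D | H_i). The denominator P(D) is
# sum_i P(D | H_i) P(H_i), i.e. the law of total probability.
posterior <- function(prior, likelihood) {
  stopifnot(length(prior) == length(likelihood),
            isTRUE(all.equal(sum(prior), 1)))
  joint <- prior * likelihood   # P(H_i and D)
  joint / sum(joint)            # divide by P(D)
}

# Illustrative example: two equally likely hypotheses, with the data
# three times as likely under the first one
posterior(prior = c(H1 = 0.5, H2 = 0.5), likelihood = c(0.9, 0.3))
```

Here the posteriors are 0.75 for `H1` and 0.25 for `H2`: the data shift belief towards the hypothesis under which they are more likely.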

The tricky bit in real life is often to figure out what the hypothesis and data are.

Hit and run accident

This example is taken from Daniel Kahneman's book Thinking, fast and slow [1].
A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. 85% of the cabs in the city are Green and 15% are Blue. A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colours 80% of the time and failed 20% of the time.

What is the probability that the cab involved in the accident was Blue rather than Green knowing that this witness identified it as Blue?

What is the data here, and what is the hypothesis? Intuitively, you may think that the proportion of Blue and Green cabs is the data at hand and the witness's accusation that a Blue cab was involved in the accident is the hypothesis. However, after some thought I found the following assignment much more helpful, as then \(P(H|D)\) matches the question above:

\(H =\) Accident caused by Blue cab. \(D =\) Witness said the cab was Blue.

With this, it is straightforward to get the probabilities \(P(H)=15\%\) and \(P(D|H)=80\%\). But what is \(P(D)\)? Well, when would the witness say that the cab was Blue? Either when the cab was Blue and the witness is right, or when the cab was actually Green and the witness is incorrect. Thus, following the law of total probability:
\[
\begin{aligned}
P(D) & = P(D|H) \, P(H) + P(D|\bar{H}) \, P(\bar{H})\\
& = 0.8 \cdot 0.15 + 0.2 \cdot 0.85 = 0.29
\end{aligned}
\]
Therefore I get \(P(H|D) = 0.8 \cdot 0.15 / 0.29 = 0.12/0.29 \approx 41\%\). Thus, even if the witness states that the cab involved in the accident was Blue, the probability of this being true is only \(41\%\).
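The arithmetic can be checked in R, both directly and by simulating a large number of hypothetical accidents in which 15% of cabs are Blue and the witness is right 80% of the time. A sketch; the variable names, seed and simulation size are illustrative.

```r
# Exact answer: P(H|D) = P(D|H) P(H) / P(D)
p_blue <- 0.15                  # P(H): share of Blue cabs
p_said_blue_if_blue  <- 0.8     # P(D|H): witness correct
p_said_blue_if_green <- 0.2     # P(D|not H): witness wrong
p_said_blue <- p_said_blue_if_blue * p_blue +
  p_said_blue_if_green * (1 - p_blue)              # P(D) = 0.29
exact <- p_said_blue_if_blue * p_blue / p_said_blue  # 0.12 / 0.29

# Monte Carlo check: simulate accidents and witness statements
set.seed(42)
n <- 1e5
cab <- ifelse(runif(n) < p_blue, "Blue", "Green")
correct <- runif(n) < 0.8
said <- ifelse(correct, cab, ifelse(cab == "Blue", "Green", "Blue"))
# Among cases where the witness said "Blue", how often was the cab Blue?
simulated <- mean(cab[said == "Blue"] == "Blue")

round(c(exact = exact, simulated = simulated), 3)
```

The simulated share agrees with the exact value of about 0.414 up to Monte Carlo noise.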

An alternative way to think about this problem is via a Bayesian network. The colour of the cab influences the statement of the witness. In R I can specify such a network using the gRain package [2], which I discussed in an earlier post. Here I provide the distribution of the cabs and the conditional distribution of the witness's statement as input. After I compile the network, I can again read off the probability that a Blue cab was involved, given that the witness said so.

R code
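A sketch of such a network with gRain, assuming the package's `cptable()`, `compileCPT()`, `grain()`, `setEvidence()` and `querygrain()` functions (argument names can differ slightly between gRain versions); the object names are illustrative.

```r
library(gRain)

hypothesis <- c("Blue", "Green")

# P(cab): 15% Blue, 85% Green (values are normalised automatically)
cab <- cptable(~cab, values = c(15, 85), levels = hypothesis)

# P(witness | cab), listed with witness varying fastest:
# cab = Blue  -> witness says Blue 80%, Green 20%
# cab = Green -> witness says Blue 20%, Green 80%
witness <- cptable(~witness | cab, values = c(80, 20, 20, 80),
                   levels = hypothesis)

# Compile the network and enter the evidence that the witness said "Blue"
net <- grain(compileCPT(list(cab, witness)))
net_blue <- setEvidence(net, evidence = list(witness = "Blue"))

# Posterior distribution of the cab colour given the witness statement
querygrain(net_blue, nodes = "cab")
```

The network returns the same answer as the formula: roughly 41% Blue and 59% Green.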

Tuesday, 22 July 2014

Notes from the 2nd R in Insurance Conference

The 2nd R in Insurance conference took place last Monday, 14 July, at Cass Business School London.

This one-day conference focused once more on applications in insurance and actuarial science that use R. Topics covered included reserving, pricing, loss modelling, the use of R in a production environment and more.

In the first plenary session, Montserrat Guillen (Riskcenter, University of Barcelona) and Leo Guelman (Royal Bank of Canada, RBC Insurance) spoke about the rise of uplift models. These predictive models are used for improved targeting of policyholders by marketing campaigns, through the use of experimental data. The presenters illustrated the use of their uplift package (available on CRAN), which they have developed for such applications.


Thereafter, the programme consisted of a combination of contributed presentations and lightning talks, as well as a panel discussion on R at the interface of practitioner/academic interaction. The panel, drawn from academia and practice, discussed efforts to bridge cultural and communication divides through the use of R, as well as the challenges of developing collaborative business models that respond to market needs and to the incentives of academic researchers.


In the closing plenary, Arthur Charpentier (Professor of Actuarial Science at UQAM, Canada) gave a non-Bayesian's account of Bayesian modelling in R. While many are sympathetic to the Bayesian paradigm, it is easy access to computational tools that makes its wider application a realistic prospect. The presenter demonstrated how Bayesian methods can be used to offer alternative analyses of standard actuarial problems.


The audience of the conference included both practitioners (70%) and academics (30%) who are active or interested in the applications of R in Insurance. It was a truly international event with speakers and delegates from many different countries, including the USA, Canada, Belgium, the Netherlands, Switzerland, Germany, Ireland, Argentina, France, Spain and of course the UK. The coffee breaks and conference dinner at Ironmongers' Hall offered great networking opportunities.


All conference presentations are available on request.

Finally, we are grateful to our sponsors Mango Solutions, CYBAEA, PwC and RStudio. This conference would not have been possible without their generous support.

R in Insurance 2015


We are delighted to announce next year's event already. Following two years in London at Cass Business School, the conference will travel across the Channel to Amsterdam on 29 June 2015.


We are looking forward to seeing you there. Further details will be published on www.rininsurance.com.