mages' blog

Notes from the 5th R in Insurance Conference

The fifth R in Insurance conference took place at Ecole Nationale de la Statistique et de l’Administration Economique, Paris on 8 June 2017.


This one-day conference focused once more on the wide range of applications of R in insurance, actuarial science and beyond. The conference programme covered topics including reserving, pricing, loss modelling, the use of R in a production environment and also new statistical methods such as big data analysis.


In the first plenary session, Julie Seguela (Covea) spoke about the textual analysis of expert reports to increase knowledge of technological risks. She applied text mining techniques and some helpful visualization packages to expert reports detailing the circumstances, causes and consequences of accidents.

The plenary talk was followed by two morning sessions. The first session focused on big data analytics, emphasizing new uses in the insurance industry. The second session consisted of a series of lightning talks about R packages and modelling with R. The afternoon was split into general and life insurance sessions, including talks on reserving and vine copulas.


In the closing plenary talk, Katrien Antonio (Professor of Actuarial Science at KU Leuven, Belgium) presented recent developments and challenges in non-life reserving. Using R, she illustrated her talk with a large European dataset of liability claims (from private individuals) with monthly exposures.

All conference presentations are available on the conference website.

Finally, we are grateful to our sponsors RStudio, Verisk Insurance Solutions, Barnett Waddingham, Mirai Solutions and Milliman. This conference would not have been possible without their generous support.

Next year we will be back at Cass Business School, London, on 16 July 2018.

R in Insurance 2017 Programme online

The programme for the 2017 R in Insurance conference in Paris has been published. Talks will discuss new ideas and research with applications in life and general insurance, ranging from network analysis, reserving and pricing to catastrophe modelling. The day will be followed by a conference dinner at the Musée d'Orsay. Registration is open until 22 May.


R in Insurance 2017

The fifth conference on R in Insurance will be held on 8 June 2017 at ENSAE. ENSAE is the Paris Graduate School for Economics, Statistics and Finance.

The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in Insurance.

This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered are:

  • the use of R in a production environment
  • life insurance
  • non-life insurance
All topics will be discussed within the context of using R as a primary tool for insurance risk management, analysis and modelling.

Programmes of previous editions are available online: 2013, 2014, 2015 and 2016. To learn more about past events, please visit the associated web page.

Attendance at the whole conference is the equivalent of 6.5 hours of CPD for members of the UK Actuarial Profession.

Registrations

Registration on the conference website is open and will be finalised in 2017, when participants pay the registration fee. The registration fees are:

  • Professional: 250 euros with the conference dinner (150 euros without dinner)
  • Academic: 100 euros with the conference dinner (20 euros without dinner).
The gala dinner will take place at the Musée d'Orsay.

Keynote speakers

We are pleased to announce that our keynote speakers are:

Venue

The conference will take place at ENSAE, 3 Avenue Pierre Larousse, 92240 Malakoff, close to Paris.

Committees

Conference committee:

You can reach the conference committee via rininsurance17@sciencesconf.org.

The scientific committee consists of:

Sponsors

The organisers gratefully acknowledge the following sponsors

Institutional sponsors:

Notes from the Kölner R meeting, 14 October 2016

Last Friday the Cologne R user group came together for two talks and a quiz at Eye/o, the company behind Adblock Plus, in Köln-Ehrenfeld. Eye/o were a great host, offering nibbles and drinks to warm up the event and pizza at the end.

Cologne R user meeting at Eye/o
The first talk was given by Jiddu Alexander, a physicist turned freelance data scientist. Jiddu gave an introduction to the tidyverse. He presented the concept of tidy data, and how the tidyverse bundle can be used to manage multiple models. Furthermore, he explained the concept of learning curves for model selection. Jiddu's slides are available from his web site.
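The idea behind a learning curve can be sketched in a few lines of base R: fit the same model on growing subsets of the training data and track the error on both the training subset and a held-out validation set. The simulated data, the linear model and the helper name learning_curve below are illustrative assumptions of mine and are not taken from Jiddu's slides.

```r
# Learning curve sketch: training vs. validation error as the training set grows.
set.seed(42)
n <- 500
d <- data.frame(x = runif(n, 0, 10))
d$y <- 2 + 0.5 * d$x + rnorm(n, sd = 1)   # simulated data (assumption)

train <- d[1:400, ]
valid <- d[401:500, ]

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Hypothetical helper: refit the model for each training size
learning_curve <- function(formula, train, valid, sizes) {
  response <- all.vars(formula)[1]
  res <- lapply(sizes, function(m) {
    sub <- train[seq_len(m), ]
    fit <- lm(formula, data = sub)
    data.frame(size       = m,
               train_rmse = rmse(sub[[response]], predict(fit, sub)),
               valid_rmse = rmse(valid[[response]], predict(fit, valid)))
  })
  do.call(rbind, res)
}

lc <- learning_curve(y ~ x, train, valid, sizes = seq(10, 400, by = 30))
print(lc)
```

Plotting the two error columns against size gives the learning curve; if the two errors converge early, adding more data is unlikely to help and a different model may be the better investment.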

Jiddu Alexander explaining learning curves

Next up was Nils Glück to share his experience on performance profiling. R code often grows from a small idea for a specific task to a longer and longer script as more and more ideas and use cases are added. Occasionally, we end up with a long and poorly documented script that 'does the job' but has become slow. Finding the bottlenecks and addressing them is a good short-term remedy. Nils showed us how the Rprof function of the utils package can be used to understand the performance profile of R code. Furthermore, the microbenchmark package, with a function of the same name, can then be used to test new approaches for a code block.
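As a rough illustration of that workflow (not Nils's own code), the sketch below profiles a deliberately slow function with Rprof and then compares it with a vectorised alternative. The functions slow_squares and fast_squares are made-up examples, and the microbenchmark step only runs if that package happens to be installed.

```r
# Two implementations of the same task: the squares of 1..n
slow_squares <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x <- c(x, i^2)  # grows the vector on every iteration
  x
}
fast_squares <- function(n) seq_len(n)^2  # vectorised alternative

# 1. Profile the slow version with Rprof() from the utils package
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
res_slow <- slow_squares(3e4)
Rprof(NULL)  # stop profiling

# summaryRprof() can fail if the run was too short to collect any samples
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))

# 2. Time both candidates against each other, if microbenchmark is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    slow = slow_squares(2000),
    fast = fast_squares(2000),
    times = 20
  ))
}
```

The profile points at the repeated calls to c() as the place where time is spent, and the benchmark then quantifies how much the rewrite gains.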

Nils Glück quoting others who are not bothered about performance

To bridge the time until the pizzas arrived, our host Kirill had prepared a little R quiz: Could we guess the output of simple R statements? Well, it is more difficult than you might think. Kirill had a great selection of quirky one-liners, which he had collected over time and borrowed from the fabulous R Inferno book by Pat Burns.
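To give a flavour of this kind of question, here are a few well-known R quirks. They are my own picks for illustration and not Kirill's actual quiz questions.

```r
0.1 + 0.2 == 0.3                   # FALSE: floating point representation
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: compare with a tolerance instead
length(1:0)                        # 2: 1:0 counts down to c(1, 0)
length(seq_len(0))                 # 0: the safe way to loop over nothing
identical(1L, 1)                   # FALSE: integer versus double
c(1, "a", TRUE)                    # "1" "a" "TRUE": everything is coerced to character
```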

Next Kölner R meeting

The next meeting will be scheduled in about three months' time. Details will be published on our Meetup site. Thanks again to Eye/o for their support.

Please get in touch if you would like to present at the next meeting.

Next Kölner R User Meeting: Friday 14 October

Koeln R
The 19th Cologne R user group meeting is scheduled for this Friday, 14 October 2016. We have three talks, followed by networking drinks.

  • Introduction to the tidyverse tools - Jiddu Alexander
  • Performance profiling and improvement in R - Nils Glück
  • Batch processing of R-Scripts with Excel - Klaus Jacobi
Venue: Eyeo GmbH, Lichtstraße 25, 50825 Köln

For further details visit our KölnRUG Meetup site.

Notes from past meetings are available here.

Notes from 4th Bayesian Mixer Meetup

Last Tuesday we got together for the 4th Bayesian Mixer Meetup. Product Madness kindly hosted us at their offices in Euston Square. About 50 Bayesians came along, the biggest turnout thus far, including developers of PyMC3 (Peadar Coyle) and Stan (Michael Betancourt).

The agenda had two feature talks by Dominic Steinitz and Volodymyr Kazantsev and a lightning talk by Jon Sedar.

Dominic Steinitz: Hamiltonian and Sequential MC samplers to model ecosystems
Dominic shared with us his experience of using Hamiltonian and Sequential Monte Carlo samplers to model ecosystems.

Volodymyr Kazantsev: Bayesian Model Averaging
Finding the 'best' model was Volodymyr's challenge. He tried various R packages (BMA, BMS and BAS) for Bayesian model averaging, with various degrees of success.
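For readers new to the idea, Bayesian model averaging can be approximated in base R by weighting every candidate model with exp(-BIC/2). The sketch below does not use BMA, BMS or BAS and is not Volodymyr's analysis; the simulated data, the equal prior weight on every model and the BIC approximation are all assumptions made for illustration.

```r
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 1 * d$x2 + rnorm(n)  # x3 is pure noise

predictors <- c("x1", "x2", "x3")
# Enumerate all 2^3 subsets of predictors (TRUE = included)
incl <- expand.grid(rep(list(c(FALSE, TRUE)), length(predictors)))
names(incl) <- predictors

# Fit one linear model per subset
fits <- lapply(seq_len(nrow(incl)), function(i) {
  vars <- predictors[unlist(incl[i, ])]
  rhs <- if (length(vars)) paste(vars, collapse = " + ") else "1"
  lm(as.formula(paste("y ~", rhs)), data = d)
})

bic <- sapply(fits, BIC)
# Approximate posterior model probabilities, equal prior weight per model
w <- exp(-0.5 * (bic - min(bic)))
w <- w / sum(w)

# Posterior inclusion probability of each predictor
pip <- colSums(as.matrix(incl) * w)
print(round(pip, 3))
```

Instead of picking a single 'best' model, predictions and coefficients can then be averaged across models using the weights w.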

Jon Sedar: Easier Plate Notation in Python using Daft
Finally, Jon gave a brief overview of Daft, a nifty Python package for drawing graphical models in plate notation.

Next meeting

The next Bayesian Mixer Meetup meeting is already scheduled for 21 October. We will be back at Cass Business School, with two talks:

  • Darren Wilkinson: Hierarchical Bayesian Modelling of Growth Curves inc Stochastic Processes
  • Peadar Coyle: Advanced PyMC3

Fitting a distribution in Stan from scratch

Last week the French National Institute of Health and Medical Research (Inserm) organised with the Stan Group a training programme on Bayesian Inference with Stan for Pharmacometrics in Paris.

Daniel Lee and Michael Betancourt, who ran the course over three days, are not only members of Stan's development team, but also excellent teachers. Both were supported by Eric Novik, who gave an Introduction to Stan at the Paris Dataiku User Group last week as well.

Eric Kramer (Dataiku), Daniel Lee, Eric Novik & Michael Betancourt (Stan Group)

I have been playing around with Stan on and off for some time, but as Eric pointed out to me, Stan is not that kind of girl (boy?). Indeed, spending three days working with Stan has revitalised my relationship with it. Getting down to the basics has been really helpful, and I shall remember that Stan is not drawing samples from a distribution. Instead, it is calculating the joint density function (in log space) and evaluating the probability density function (in log space).
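To make that point concrete, the quantity being evaluated can be written down as a plain R function: the log-likelihood plus the log-priors for a given set of parameter values. This is only a sketch of the idea in R, not what Stan runs internally; the function name log_joint and the priors (Normal(0, 10) on the mean, Cauchy(0, 5) on the standard deviation) are assumptions chosen for illustration.

```r
set.seed(123)
y <- rnorm(100, mean = 4, sd = 2)

# Log joint density of data and parameters, up to an additive constant
log_joint <- function(mu, sigma, y) {
  if (sigma <= 0) return(-Inf)                           # sigma must be positive
  sum(dnorm(y, mean = mu, sd = sigma, log = TRUE)) +     # log-likelihood
    dnorm(mu, mean = 0, sd = 10, log = TRUE) +           # assumed prior on mu
    dcauchy(sigma, location = 0, scale = 5, log = TRUE)  # assumed prior on sigma
}

log_joint(4, 2, y)  # near the values used to simulate the data
log_joint(0, 1, y)  # far away: a much lower log density
```

A sampler only ever needs to evaluate such a function (and its gradient) at proposed parameter values; no random numbers are drawn from the Normal distribution itself.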

Thus, here is a little example of fitting a set of random numbers in R to a Normal distribution with Stan. Yet, instead of using the built-in functions for the Normal distribution, I define the log probability function by hand, which I will use in the model block as well, and even generate a random sample, starting with a uniform distribution. However, I do use pre-defined distributions for the priors.
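The two hand-written ingredients can be mirrored in plain R to check the arithmetic before moving it into a Stan functions block. This is a sketch rather than the Stan script itself: my_normal_lpdf and my_normal_rng are hypothetical names, and the sampler uses inverse transform sampling, i.e. it pushes uniform draws through the Normal quantile function.

```r
# Normal log density written out by hand
my_normal_lpdf <- function(y, mu, sigma) {
  -log(sigma) - 0.5 * log(2 * pi) - 0.5 * ((y - mu) / sigma)^2
}

# Inverse transform sampling: uniform draws mapped through the quantile function
my_normal_rng <- function(n, mu, sigma) {
  u <- runif(n)
  mu + sigma * qnorm(u)
}

# The hand-written density agrees with R's built-in dnorm
y <- c(-1, 0, 2.5, 4, 10)
all.equal(my_normal_lpdf(y, 4, 2), dnorm(y, 4, 2, log = TRUE))

# The hand-written sampler reproduces the requested mean and standard deviation
set.seed(1)
x <- my_normal_rng(1e5, mu = 4, sigma = 2)
c(mean = mean(x), sd = sd(x))
```

For a distribution that is not built in, the same pattern applies as long as its log density and quantile function can be written down.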

Why do I want to do this? This will be a template for the day when I have to use a distribution that is not predefined in Stan; the actuar package, for example, has some interesting candidates.

Testing

I start off by generating fake data, a sample of 100 random numbers drawn from a Normal distribution with a mean of 4 and a standard deviation of 2. Note, the sample mean of the 100 figures is 4.2 and not 4.
Histogram of 100 random numbers drawn from N(4,2).
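Generating such a sample takes one line of R. The seed below is arbitrary and not the one behind the figures quoted in this post, so the sample mean will differ from 4.2. With a standard deviation of 2 and 100 observations the standard error of the mean is 2/sqrt(100) = 0.2, so a sample mean of 4.2 is just one standard error away from 4.

```r
set.seed(2016)  # arbitrary seed (assumption), results will not match the post
y <- rnorm(100, mean = 4, sd = 2)

c(mean = mean(y), sd = sd(y))
se_mean <- 2 / sqrt(length(y))  # standard error of the sample mean: 0.2
se_mean

# Draw the histogram only in an interactive session
if (interactive()) hist(y, main = "100 random numbers drawn from N(4, 2)")
```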
I then use the Stan script to fit the data, i.e. to find the parameters \(\mu\) and \(\sigma\), assuming that the data were generated from a Normal (Gaussian) distribution.

Traceplot of 4 chains, including warm-up phase
Histograms of posterior parameter and predictive samples
Comparison of the empirical distributions
The 95% credible intervals of the posterior parameter distributions include the true values of both \(\mu\) and \(\sigma\). The distribution of the posterior predictive check (y_ppc) is wider, as it takes into account the uncertainty of the parameters. The interquartile range and mean of my initial fake data and the sample of the posterior predictive distribution look very similar. That's good: my model generates data that look like the original data.

Bayesian Mixer Meetup

Btw, tonight we have the 4th Bayesian Mixer Meetup in London.

Session Info

R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12 (Sierra)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base     

other attached packages:
[1] MASS_7.3-45 rstan_2.12.1 StanHeaders_2.12.0 ggplot2_2.1.0     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7      codetools_0.2-14 digest_0.6.10    grid_3.3.1      
 [5] plyr_1.8.4       gtable_0.2.0     stats4_3.3.1     scales_0.4.0    
 [9] labeling_0.3     tools_3.3.1      munsell_0.4.3    inline_0.3.14   
[13] colorspace_1.2-6 gridExtra_2.2.1