2016

R in Insurance 2017

13 Dec 2016 08:30 Conference , ENSAE , Paris , R , R in Insurance No comments

The fifth conference on R in Insurance will be held on 8 June 2017 at ENSAE. ENSAE is the Paris Graduate School for Economics, Statistics and Finance.

The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in Insurance.

This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered are:

the use of R in a production environment
life insurance
non-life insurance

All topics will be discussed within the context of using R as a primary tool for insurance risk management, analysis and modelling.

Programs of previous editions are available online: 2013, 2014, 2015 and 2016. To learn more about past events, please visit the associated web page.

Attendance of the whole conference is the equivalent of 6.5 hours of CPD for members of the UK Actuarial Profession.

Registrations

Registration on the conference website is open and will be finalised in 2017, when participants pay the registration fee. The registration fees will be:

Professional: 250 euros with the conference dinner (150 euros without dinner)
Academic: 100 euros with the conference dinner (20 euros without dinner)

The gala dinner will take place at the Musée d'Orsay.

Keynote speakers

We are pleased to announce that our keynote speakers are:

Katrien Antonio, KU Leuven
Julie Seguela, Covéa.

Venue

The conference will take place at ENSAE, 3 Avenue Pierre Larousse, 92240 Malakoff, close to Paris.

Committees

Conference committee:

You can reach the conference committee via rininsurance17@sciencesconf.org.

The scientific committee consists of:

Notes from the Kölner R meeting, 14 October 2016

21 Oct 2016 08:46 Kölner R Users , KölnR , R , tidyverse No comments

Last Friday the Cologne R user group came together for two talks and a quiz at Eye/o, the company behind Adblock Plus, in Köln-Ehrenfeld. Eye/o were a great host, offering nibbles and drinks to warm up the event and pizza at the end.

Cologne R user meeting at Eye/o

The first talk was given by Jiddu Alexander, a physicist turned freelance data scientist. Jiddu gave an introduction to the tidyverse. He presented the concept of tidy data, and how the tidyverse bundle can be used to manage multiple models. Furthermore, he explained the concept of learning curves for model selection. Jiddu's slides are available from his web site.

Jiddu Alexander explaining learning curves

Next up was Nils Glück, who shared his experience of performance profiling. R code often grows from a small idea for a specific task into a longer and longer script as more ideas and use cases are added. Occasionally, we end up with a long and poorly documented script that 'does the job' but has become slow. Finding the bottlenecks and addressing them is a good short-term remedy. Nils showed us how the Rprof function of the utils package can be used to understand the performance profile of R code. Furthermore, the microbenchmark package, with a function of the same name, can then be used to test new approaches for a code block.
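The workflow described above can be sketched roughly as follows. This is not Nils' code: the functions slow_cumsum and fast_cumsum are hypothetical examples made up for illustration, and the microbenchmark step only runs if that package happens to be installed.

```r
# Hypothetical slow function: grows a vector inside a loop
slow_cumsum <- function(x) {
  out <- c()
  for (i in seq_along(x)) out <- c(out, sum(x[1:i]))
  out
}

# Candidate replacement using the vectorised base function
fast_cumsum <- function(x) cumsum(x)

x <- as.numeric(1:5000)

# Step 1: profile the slow version with Rprof from the utils package
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
res_slow <- slow_cumsum(x)
Rprof(NULL)

# summaryRprof fails if no samples were collected, hence the tryCatch
prof <- tryCatch(summaryRprof(prof_file)$by.self, error = function(e) NULL)
print(head(prof))

# Step 2: check that the new approach gives the same answer
stopifnot(isTRUE(all.equal(res_slow, fast_cumsum(x))))

# Step 3: compare timings, if microbenchmark is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(slow_cumsum(x), fast_cumsum(x), times = 10))
}
```

The profile should point at the repeated c() and sum() calls inside the loop, which is the kind of hint that suggests trying a vectorised alternative.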

Nils Glück quoting others who are not bothered about performance

To bridge the time until the pizzas arrived, our host Kirill had prepared a little R quiz: Could we guess the output of simple R statements? Well, it is more difficult than you might think. Kirill had a great selection of quirky one-liners, which he had collected over time and borrowed from the fabulous R Inferno book by Pat Burns.

Next Kölner R meeting

The next meeting will be scheduled in about three months' time. Details will be published on our Meetup site. Thanks again to Eye/o for their support.

Please get in touch if you would like to present at the next meeting.

Next Kölner R User Meeting: Friday 14 October

11 Oct 2016 08:00 Kölner R Users , KölnR , News , R No comments

The 19th Cologne R user group meeting is scheduled for this Friday, 14 October 2016. We have three talks, followed by networking drinks.

Introduction to the tidyverse tools - Jiddu Alexander
Performance profiling and improvement in R - Nils Glück
Batch processing of R-Scripts with Excel - Klaus Jacobi

Venue: Eyeo GmbH, Lichtstraße 25, 50825 Köln

For further details visit our KölnRUG Meetup site.

Notes from past meetings are available here.

Notes from 4th Bayesian Mixer Meetup

4 Oct 2016 07:40 Bayesian , Bayesian Mixer , Daft , Dynamical Systems , Model Averaging , python , R , Stan No comments

Last Tuesday we got together for the 4th Bayesian Mixer Meetup. Product Madness kindly hosted us at their offices in Euston Square. About 50 Bayesians came along, the biggest turnout thus far, including developers of PyMC3 (Peadar Coyle) and Stan (Michael Betancourt).

The agenda had two feature talks by Dominic Steinitz and Volodymyr Kazantsev and a lightning talk by Jon Sedar.

Dominic Steinitz: Hamiltonian and Sequential MC samplers to model ecosystems

Dominic shared with us his experience of using Hamiltonian and Sequential Monte Carlo samplers to model ecosystems.

Volodymyr Kazantsev: Bayesian Model Averaging

Finding the 'best' model was Volodymyr's challenge. He tried various R packages (BMA, BMS and BAS) for Bayesian model averaging, with various degrees of success.

Jon Sedar: Easier Plate Notation in Python using Daft

Finally, Jon gave a brief overview of Daft, a nifty Python package for drawing graphical models in plate notation.

Next meeting

The next Bayesian Mixer Meetup meeting is already scheduled for 21 October. We will be back at Cass Business School, with two talks:

Darren Wilkinson: Hierarchical Bayesian Modelling of Growth Curves inc Stochastic Processes
Peadar Coyle: Advanced PyMC3

Fitting a distribution in Stan from scratch

27 Sept 2016 07:45 Actuarial , Bayesian , fit distribution , R , Stan No comments

Last week the French National Institute of Health and Medical Research (Inserm) organised with the Stan Group a training programme on Bayesian Inference with Stan for Pharmacometrics in Paris.

Daniel Lee and Michael Betancourt, who ran the course over three days, are not only members of Stan's development team, but also excellent teachers. Both were supported by Eric Novik, who gave an Introduction to Stan at the Paris Dataiku User Group last week as well.

Eric Kramer (Dataiku), Daniel Lee, Eric Novik & Michael Betancourt (Stan Group)

I have been playing around with Stan on and off for some time, but as Eric pointed out to me, Stan is not that kind of girl(boy?). Indeed, having spent three days working with Stan has revitalised my relationship. Getting down to the basics has been really helpful and I shall remember, Stan is not drawing samples from a distribution. Instead, it is calculating the joint distribution function (in log space), and evaluating the probability distribution function (in log space).

Thus, here is a little example of fitting a set of random numbers in R to a Normal distribution with Stan. Yet, instead of using the built-in functions for the Normal distribution, I define the log probability function by hand, which I will use in the model block as well, and even generate a random sample, starting with a uniform distribution. However, I do use pre-defined distributions for the priors.

Why do I want to do this? This will be a template for the day when I have to use a distribution that is not predefined in Stan; the actuar package, for example, has some interesting candidates.
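The two hand-written pieces mentioned above, a log density and a random number generator that starts from uniform draws, can be sketched in plain R. This is only an illustration and not the Stan script itself: the function names my_normal_lpdf and my_normal_rng are hypothetical, and the Box-Muller transform is just one possible way of turning uniform draws into Normal ones.

```r
# Hand-written log density of a Normal(mu, sigma) distribution
# (hypothetical name, not a Stan or R built-in)
my_normal_lpdf <- function(y, mu, sigma) {
  -log(sigma) - 0.5 * log(2 * pi) - 0.5 * ((y - mu) / sigma)^2
}

# Box-Muller transform: two uniform draws give one standard Normal draw,
# which is then shifted and scaled
my_normal_rng <- function(n, mu, sigma) {
  u1 <- runif(n)
  u2 <- runif(n)
  mu + sigma * sqrt(-2 * log(u1)) * cos(2 * pi * u2)
}

set.seed(1)
y <- my_normal_rng(100, mu = 4, sigma = 2)

# Joint log density of the whole sample for given parameter values;
# this is the kind of quantity that is evaluated in log space
log_lik <- function(mu, sigma) sum(my_normal_lpdf(y, mu, sigma))

# Sanity check against R's built-in Normal density
stopifnot(isTRUE(all.equal(log_lik(4, 2), sum(dnorm(y, 4, 2, log = TRUE)))))
```

Swapping in a different density only requires changing the two functions, which is the point of treating this as a template.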

Testing

I start off by generating fake data, a sample of 100 random numbers drawn from a Normal distribution with a mean of 4 and a standard deviation of 2. Note, the sample mean of the 100 figures is 4.2 and not 4.

Histogram of 100 random numbers drawn from N(4,2).

I then use the Stan script to fit the data, i.e. to find the parameters \(\mu\) and \(\sigma\), assuming that the data was generated by a Gaussian process.

Traceplot of 4 chains, including warm-up phase

Histograms of posterior parameter and predictive samples

Comparison of the empirical distributions

The 95% credible intervals of the posterior parameter distributions include the true values of both \(\mu\) and \(\sigma\). The distribution of the posterior predictive check (y_ppc) is wider, taking into account the uncertainty of the parameters. The interquartile range and mean of my initial fake data and the sample of the posterior predictive distribution look very similar. That's good: my model generates data that looks like the original data.

Bayesian Mixer Meetup

Btw, tonight we have the 4th Bayesian Mixer Meetup in London.

Session Info

R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12 (Sierra)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base     

other attached packages:
[1] MASS_7.3-45 rstan_2.12.1 StanHeaders_2.12.0 ggplot2_2.1.0     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7      codetools_0.2-14 digest_0.6.10    grid_3.3.1      
 [5] plyr_1.8.4       gtable_0.2.0     stats4_3.3.1     scales_0.4.0    
 [9] labeling_0.3     tools_3.3.1      munsell_0.4.3    inline_0.3.14   
[13] colorspace_1.2-6 gridExtra_2.2.1

googleVis 0.6.1 on CRAN

6 Sept 2016 07:15 googleVis , jsonlite , News , R , RJSONIO No comments

We released googleVis version 0.6.1 on CRAN last week. The update fixes issues with setting certain options, following the switch from RJSONIO to jsonlite.

Screen shot of some of the Google Charts

New to googleVis? The package provides an interface between R and the Google Charts Tools, allowing you to create interactive web charts from R without uploading your data to Google. The charts are displayed by default via the R internal help browser.

To learn more, see the examples of googleVis charts on CRAN and read the introduction vignette.

Notes from the 4th R in Insurance Conference

27 Jul 2016 09:35 Cass Business School , Conference , R , R in Insurance No comments

The 4th R in Insurance conference took place at Cass Business School London on 11 July 2016. This one-day conference focused once more on the wide range of applications of R in insurance, actuarial science and beyond. The conference programme covered topics including reserving, pricing, loss modelling, the use of R in a production environment and much more.

The audience of the conference included both practitioners (c.80%) and academics (c.20%) who are active or interested in the applications of R in Insurance. It was a truly international event with speakers and delegates from Europe, Asia and the Americas. The coffee breaks and conference dinner offered great networking opportunities.

Mario Wüthrich, ETH Zürich

In the first plenary session Mario Wüthrich (RiskLab ETH Zurich) spoke about the (new) challenges in actuarial science. While the fundamentals of analysing data have not changed over the years, the data and technology available have, and with that new challenges have emerged. Yet, as Mario pointed out, insurance is still often concerned with analysing 'little' data, as losses occur rarely. Furthermore, the bigger data sets, often generated by sensors, require careful calibration, monitoring and cleansing. Those new challenges provide opportunities for new research (if data is being made available) and the industry. The R community can provide links between the two. Mario would like to see more and better documentation of R packages, more insurance examples and better handling of big data.

Thereafter, the programme consisted of a combination of contributed presentations and lightning talks, as well as a panel discussion on how analytics is transforming the insurance business. Adrian Cuc (Verisk), Simon Brickman (Beazley), Roland Schmid (Mirai Solutions) and Markus Gesmann (Vario Partners) discussed the efforts made in bridging between data vendors, consultants and insurers, as well as the challenges of developing collaborative business models that respond to market needs.

Dan Murphy, Trinostics

In the closing plenary, Dan Murphy (Trinostics, San Francisco) gave an insight into his experience as an actuary of how to provide persuasive advice to senior management. He uses the three C's: context, confidence and clarity. Context is about articulating the problem in a language senior management can understand. Why does the management need to worry about the problem? If you have a solution, then you have to deliver it with conviction, because, most importantly, it has to be actionable. Clarity of your actionable insight ensures that those actions can be delegated to the relevant team or employee by the management without you in the room.

The slides of the conference are available on request.

Scientific committee and sponsors

The members of the scientific committee were: Katrien Antonio (KU Leuven, UvA), Christophe Dutang (Université du Maine), Markus Gesmann (Vario Partners), Giorgio Spedicato (UnipolSai) and Andreas Tsanakas (Cass Business School).

Finally, we are grateful to our sponsors Verisk, Mirai Solutions, Applied AI, RStudio, CYBAEA and Oasis, without whom the event wouldn't be possible.

R in Insurance 2017

We are delighted to announce next year’s event already. The conference will travel across the Channel to ENSAE, Paris, 8 June 2017. Further details will be published on www.rininsurance.com.

Notes from the Kölner R meeting, 9 July 2016

13 Jul 2016 15:31 Kölner R Users , KölnR , Microsoft , News , R , Spark No comments

Last Thursday the Cologne R user group came together again. This time, our two speakers arrived from Bavaria, to talk about Spark and R Server.

Introduction to Apache Spark

Download slides

Dubravko Dulic gave an introduction to Apache Spark and why Spark might be of interest to data scientists using R. Spark is designed for cluster computing, i.e. to distribute jobs across several computers. Not all tasks in R can be split easily across several nodes in a cluster, but if you use functions like by in R, then it is most likely doable. The by function in R splits a data set into several subsets, applies a specific function to each subgroup and collects the results in the end. In the world of Hadoop, this is called MapReduce. Spark has an advanced DAG (directed acyclic graph) execution engine that supports cyclic data flow and in-memory computing. Additionally, Spark has a direct API for R, which makes it relatively easy to write applications with Spark.
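The split-apply-combine pattern behind by can be illustrated with a small example on a single machine. This is a minimal sketch using the built-in iris data set, chosen here purely for illustration; it does not involve Spark itself.

```r
# by(): split iris by species, apply a function to each subset, collect results
res_by <- by(iris, iris$Species, function(d) mean(d$Sepal.Length))
print(res_by)

# The same three steps written out explicitly
pieces   <- split(iris$Sepal.Length, iris$Species)   # split ('map' input)
partial  <- lapply(pieces, mean)                     # apply to each piece
combined <- unlist(partial)                          # combine ('reduce')
print(combined)

# Both routes give the same group means
stopifnot(isTRUE(all.equal(as.numeric(res_by), as.numeric(combined))))
```

Because each piece is processed independently in the middle step, that step is the part that could in principle be handed to different nodes of a cluster.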

Microsoft R Server

Download slides

Since the acquisition of Revolution Analytics in 2015, Microsoft has been busy integrating R into its product offerings. Stefan Cronjaeger gave an overview of how R can be integrated into a production environment. Microsoft R Server aims to solve the problem of doing 'big data' analytics with R, allowing both in-memory and disk-based data analysis. Additional new tools are ScaleR for big data and parallelized analytics, ConnectR to connect to various other data sources, and DistributedR for grid computing. Finally, Stefan showed us how Visual Studio can be used as an R development environment, similar to RStudio.

Next Kölner R meeting

The next meeting will be scheduled in about three months' time. Details will be published on our Meetup site. Thanks again to Microsoft for their support.

Please get in touch if you would like to present at the next meeting.

Notes from 3rd and 3.5th Bayesian Mixer Meetup

5 Jul 2016 07:52 Bayesian , PyMC3 , R , Stan No comments

Two Bayesian Mixer meet-ups in a row. Can it get any better?

Our third 'regular' meeting took place at Cass Business School on 24 June. Big thanks to Pietro and Andreas, who supported us from Cass. The next day, Jon Sedar of Applied AI managed to arrange a special summer PyMC3 event.

3rd Bayesian Mixer meet-up

First up was Luis Usier, who talked about cross-validation. Luis is a former student of Andrew Gelman, so, of course, his talk touched on Stan and the 'loo' (leave one out) package in R. Luis started with a simple artificial example that aimed to predict the probability of a goalkeeper saving a shot on target. Adding a hierarchical structure to the model and treating the variance as a random variable resulted in a pathological posterior distribution, which makes sampling next to impossible. Instead, fitting different models, with different fixed parameters, allows the user to compare the models via cross-validation using the 'loo' function. Clever! I need to learn more about this. Luis' slides are available here and the underlying source code on GitHub.
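To give a flavour of comparing models by leave-one-out cross-validation, here is a minimal sketch. Note the assumptions: it uses ordinary linear models on the built-in mtcars data and refits each model once per observation, rather than Stan models and the approximation provided by the 'loo' package that Luis talked about. The helper loo_rmse is a hypothetical name.

```r
# Exact leave-one-out cross-validation for a linear model:
# drop one row, refit, predict the dropped row, repeat for every row
loo_rmse <- function(formula, data) {
  response <- all.vars(formula)[1]
  errs <- vapply(seq_len(nrow(data)), function(i) {
    fit <- lm(formula, data = data[-i, ])
    data[[response]][i] - predict(fit, newdata = data[i, , drop = FALSE])
  }, numeric(1))
  sqrt(mean(errs^2))
}

# Two candidate models for fuel consumption
rmse_1 <- loo_rmse(mpg ~ wt, mtcars)
rmse_2 <- loo_rmse(mpg ~ wt + hp, mtcars)

# The model with the smaller out-of-sample error would be preferred
print(c(model_1 = rmse_1, model_2 = rmse_2))
```

The idea carries over to the Bayesian setting: each candidate model is scored on how well it predicts data it has not seen, and the scores are compared.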

Luis Usier talking about cross-validation in R and Stan

We were lucky to have Robert Cowell talking to us, in what was his final week at Cass. Robert has been very much at the forefront of Bayesian development over the last 30 years. He is one of the co-authors of Probabilistic Networks and Expert Systems. Robert gave an insightful talk on probabilistic models for analysing mixed DNA traces. For illustration, he used a crime case, where a man was killed in a pub, and where blood traces were used to support identifying the murderer - turning statistics into a thriller.

Following those two stimulating talks, we had a few networking drinks at the Artillery Arms. But not too many, as the next day continued with another Bayesian event.

3.5th Meetup: PyMC3 summer special

We had a rare opportunity to gather together a few of the core contributors of the PyMC3 package for a talks & hack session. PyMC3 is a leading framework for probabilistic programming entirely based in Python with a 'theano' backend, with support for the NUTS sampler, Variational Inference and lots of useful functionality - an alternative to Stan.

We had two core contributors with us: Chris Fonnesbeck (usually in Nashville, USA) and Thomas Wiecki (online from Düsseldorf, Germany), plus other package contributors.

Chris Fonnesbeck talking about PyMC3

On Saturday morning Chris gave an overview of PyMC3, followed by a detailed talk by Thomas on Bayesian Deep Learning. The afternoon was spent hacking away together on different problems. I was new to PyMC3, so I went through the tutorial on Probabilistic Programming using PyMC3, which Chris had given at a workshop in Oslo.

Many thanks to all who helped to make these events such a success and especially to Chris, Thomas, Luis, Robert, Andreas, Pietro and Jon.

If you have ideas for a future event, then please get in touch and visit our Meetup page.

Early bird registration for R in Insurance closes 30 May

24 May 2016 07:03 Cass Business School , Conference , Programme , R , R in Insurance No comments

Hurry! The early bird registration offer for the 4th R in Insurance conference, 11 July 2016, at Cass Business School closes 30 May.

This one-day conference will focus once more on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered include reserving, pricing, loss modelling, the use of R in a production environment, and more.

We have a fantastic programme with international speakers and conference dinner at Ironmongers Hall. Keynotes will be given by Mario Wüthrich and Dan Murphy.

The organisers gratefully acknowledge the sponsorship of Verisk, Mirai Solutions, Applied AI, RStudio, CYBAEA and Oasis, without whom the event wouldn't be possible.

R in Insurance 2016 Programme

3 May 2016 07:17 Cass Business School , Conference , Insurance , R , R in Insurance No comments

We are delighted to announce that the programme for the 4th R in Insurance conference at Cass Business School in London, 11 July 2016, has been finalised.

Register by the end of May to get the early bird booking fee.

The organisers gratefully acknowledge the sponsorship of Verisk, Mirai Solutions, Applied AI, RStudio, CYBAEA and Oasis, without whom the event wouldn't be possible.

New R package to access World Bank data

26 Apr 2016 08:00 googleVis , R , wbstats , World Bank No comments

Staying on top of new CRAN packages is quite a challenge nowadays. However, thanks to Dirk's CRANberries service I occasionally spot a new gem, such as wbstats, which appeared on CRAN last week.

Similarly to the WDI package, wbstats offers an interface to the World Bank database.

With the functions of wbstats the World Bank data can be searched and data for several indicators requested. Unlike WDI, the data is returned in a 'long' table with one column for all values and a separate column for the indicators. Additionally, the function wb allows me to specify how many most recent values (mrv) I am interested in.
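To illustrate what such a 'long' table looks like and how it can be turned into one column per indicator, here is a minimal sketch. The data frame below is a toy stand-in with made-up numbers and hypothetical column names; it is not output from wbstats and no World Bank data is downloaded.

```r
# Toy 'long' table: one row per country, year and indicator (values made up)
long <- data.frame(
  country   = rep(c("A", "B"), each = 4),
  date      = rep(c(2013, 2014), times = 4),
  indicator = rep(rep(c("fertility", "life_expectancy"), each = 2), times = 2),
  value     = c(2.1, 2.0, 70, 71, 4.5, 4.4, 60, 61)
)
print(long)

# Reshape to 'wide': one row per country and year, one column per indicator
wide <- reshape(long, v.names = "value", idvar = c("country", "date"),
                timevar = "indicator", direction = "wide")
print(wide)
```

A wide layout like this, with one row per country and year, is typically what a chart that maps several indicators to different axes expects.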

Thus, to recreate the famous Gapminder chart by Hans Rosling, showing the correlation between fertility, i.e. number of children per woman, and life expectancy over time by country and region, I can write (note, a Flash player is required):

If you'd like to learn more about how to create interactive charts with googleVis, then check out the free tutorial on DataCamp.

Session Info

R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] googleVis_0.5.10 data.table_1.9.6 wbstats_0.1     

loaded via a namespace (and not attached):
[1] httr_1.1.0        R6_2.1.2          rsconnect_0.4.2.1
[4] tools_3.2.4       curl_0.9.7        RJSONIO_1.3-0    
[7] jsonlite_0.9.19   chron_2.3-47

Notes from 2nd Bayesian Mixer Meetup

19 Apr 2016 21:41 Bayesian , Cass Business School , Networks , News , R , Stan No comments

Last Friday the 2nd Bayesian Mixer Meetup (@BayesianMixer) took place at Cass Business School, thanks to Pietro Millossovich and Andreas Tsanakas, who helped to organise the event.

Bayesian Mixer at Cass

First up was Davide De March talking about the challenges in biochemistry experimentation, which are often characterised by complex and emerging relations among components.

The scarcity of prior knowledge about the binding of complex molecules leaves a fertile field for probabilistic graphical models. In particular, Bayesian networks can help the investigator to define a conditional dependence/independence structure from which a joint multivariate probability distribution is determined. Hence, the use of Bayesian networks can lead to a more efficient way of designing experiments.

Davide De March: Bayesian Networks to design optimal experiments

The second act of the night was Mick Cooney, presenting ideas of using growth curves to estimate the ultimate amounts paid in insurance by some cohort of policies.

The talk showed a model for these curves, discussed the implementation in Stan and how posterior predictive checks can be used to assess the output of the model.

Mick Cooney: Bayesian Modelling for Loss Curves in Insurance

Thanks again to everyone who helped to make the event a success, particularly our speakers and Jon Sedar of Applied AI.

We are planning to run another event in mid-June. Please get in touch via our Meetup site with ideas and talk proposals.

R in Insurance: Abstract submission closes end of March

18 Mar 2016 07:36 Cass Business School , Conference , R , R in Insurance No comments

Hurry! The abstract submission deadline for the 4th R in Insurance conference in London, 11 July 2016 is approaching soon.

You have until the 28th of March to submit a one-page abstract for consideration. Both academic and practitioner proposals related to R are encouraged. Please email your abstract of no more than 300 words (in text or pdf format) to rinsuranceconference@gmail.com.

Invited talks will be given by:

Mario V. Wüthrich, RiskLab, Department of Mathematics, ETH Zurich.
Dan Murphy, President, Trinostics LLC.

Details about the registration and abstract submission are given on the dedicated R in Insurance page at Cass Business School, London.

Attendance of the whole conference is the equivalent of 6.5 hours of CPD for members of the Actuarial Profession.

For more information about the past events visit www.rininsurance.com.

Notes from the Kölner R meeting, 26 February 2016

1 Mar 2016 07:34 brms , Koelner R User , Kölner R Users , R , RKward No comments

Last Friday the Cologne R user group came together for the 17th time. This time, we were in for a special treat, with two talks by psychologists!

But, there was nothing to fear, we were in safe hands, and for the first time, we met at the new Microsoft office in Cologne.

Lecture room at Microsoft, Cologne

First up was Meik Michalke from the University of Düsseldorf presenting the RKWard project. RKWard is a graphical user interface and integrated development environment for statistical analysis with R. RKWard is a fully featured and extendable environment for R, available on all platforms. Furthermore, as Meik demonstrated, it is very straightforward to build new plugins for RKWard. These plugins can extend the user interface, which is great if you build tools for people who are less familiar with R, but perhaps more with SPSS. Meik is one of the developers of RKWard and he uses it to run analyses, develop packages and teach statistics.

Download slides

Next up was Paul-Christian Bürkner from the University of Münster, presenting an overview of his brms package. The name is short for Bayesian regression models with Stan. Although the package is still less than one year old, it is already quite mature, allowing the user to specify regression models in the usual R formula syntax. brms takes those formula calls, writes out the Stan code, compiles and runs the model, and it also provides methods to plot and predict brms models. Hence, it is a great way to get started with Stan and to build more complex Bayesian models.

Download slides

Following the talks, there was still plenty of time for questions and networking. Microsoft provided us with a great venue and enough drinks to keep us going until finally our stomachs asked for food and, dare I say it, Kölsch. As a result some of us ended up in Rheinau, a nice gastropub around the corner.

Next Kölner R User Meeting: Friday, 26 February 2016

23 Feb 2016 07:40 Kölner R Users , KölnR , News , R No comments

The 17th Cologne R user group meeting is scheduled for this Friday, 26 February 2016. We have two talks, followed by networking drinks.

Introduction to Bayesian Regression Models using Stan with the brms package - Paul-Christian Bürkner (Uni Münster)
RKWard: A Graphical User Interface and Integrated Development Environment for Statistical Analysis with R - Meik Michalke (Uni Düsseldorf)

Venue: Microsoft Deutschland, Holzmarkt 2a, 50676 Köln

For further details visit our KölnRUG Meetup site. Unfortunately, this event is already fully booked, but please sign up if you would like to come along to future events.

Notes from past meetings are available here.

Bayesian Mixer on Meetup

16 Feb 2016 22:45 Bayesian , News , R , Stan No comments

We had our first successful Bayesian Mixer Meetup last Friday night at the Artillery Arms!

We expected about 15 - 20 people to turn up when we booked the function room overlooking Bunhill Cemetery and Bayes' grave. Now, looking at the photos taken during the evening, it seems that our prior belief was pretty good.

The event started with a talk from my side about some very basic Bayesian models, which I used a while back to get my head around the concepts in an insurance context. My talk "Experience vs Data" was based on presentations I had given last year at LondonR and the Warsaw R user group.

Jon Sedar followed with a fascinating talk about outlier detection using PyMC3.

Suppose, you have a bunch of data points, most of them centred, but with some further away. How do you decide if they are outliers, or not?

This question sounds very relevant to me in the insurance context as well. I have heard stories of underwriters telling me that certain years or events (meaning costly losses) were freaks, and should be disregarded, or in other words, without those losses the underwriter would have made a huge profit. I am not sure I buy those arguments, as they undermine the fundamental business proposition of insurance: to pay when policyholders experience 'freak' events. I am getting on my soap box, which I shouldn't.

We had a good night, very good discussions and some drinks. As a result, Jon and I are committed to organising another event.

Jon has already set up a Meetup page, so please register online and get in touch with ideas, venues, talks, etc.

Using SVG graphics in blog posts

9 Feb 2016 07:30 github , graphics , R , SVG No comments

My traditional workflow for embedding R graphics into a blog post has been via PNG files that I upload online. However, when I created a 'simple' graphic with only basic curves and triangles for a recent post, I noticed that the PNG output didn't look as crisp as I expected it to be. So, eventually I used an SVG (scalable vector graphic) instead.

Creating an SVG file with R couldn't be easier; e.g. use the svg() function in the same way as png(). Next, make the file available online and embed it into your page. There are many ways to do this; in the example here I placed the file into a public GitHub repository.

To embed the figure into my page I could use either the traditional <img> tag, or perhaps better the <object> tag. Paul Murrell provides further details on his blog.

With <object> my code looks like this:

<object data="https://rawgithub.com/mages/diesunddas/master/Blog/transitionPlot.svg" type="image/svg+xml" width="400"> </object>

There is a little trick required to display a graphic file hosted on GitHub.

By default, when I look for the raw URL, GitHub will provide an address starting with https://raw.githubusercontent.com/..., which needs to be replaced with https://rawgithub.com/....
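The replacement described above can also be scripted. A minimal sketch, assuming the raw address has the same path as the embedded one (the raw_url value below is inferred from the embed address shown earlier, not copied from GitHub):

```r
# Assumed raw address as GitHub would provide it
raw_url <- "https://raw.githubusercontent.com/mages/diesunddas/master/Blog/transitionPlot.svg"

# Swap the host part; the rest of the path stays unchanged
embed_url <- sub("^https://raw\\.githubusercontent\\.com/",
                 "https://rawgithub.com/", raw_url)
print(embed_url)
```

The resulting address is the one used in the data attribute of the <object> tag.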

Ok, let's look at the output. As a nice example plot I use a transitionPlot by Max Gordon, something I wanted to do for a long time.

SVG output

PNG output

Conclusions

The SVG output is nice and crisp! Zoom in and the quality will not change. The PNG graphic on the other hand appears a little blurry on my screen and even the colours look washed out. Of course, the PNG output could be improved by fiddling with the parameters. But, after all it is a raster graphic.

Yet, I don't think that SVG is always a good answer. The file size of an SVG file can grow quite quickly, if there are many points to be plotted. As an example check the difference in file size for two identical plots with 10,000 points.

x <- rnorm(10000)
png()
plot(x)
dev.off()
file.size("Rplot001.png")/1000
# [1] 118.071
svg()
plot(x)
dev.off()
file.size("Rplot001.svg")/1000
# [1] 3099.181

That's 3.1 MB vs 118 kB, a factor of 26! Even compressed to a .svgz file, the SVG file is still 317 kB.

Update 10 Feb 2016

Or, is SVG the answer? Kenton pointed me towards the svglite package.

library(svglite)
# Same plot, written with the svglite device instead of svg()
svglite(file = "Rplot001.svg")
plot(x)
dev.off()
file.size("Rplot001.svg")/1000  # size in kB
# [1] 973.619
# Helper: gzip a text file and return the path of the compressed copy
gz <- function(in_path, out_path = tempfile()) {
  out <- gzfile(out_path, "w")
  writeLines(readLines(in_path), out)
  close(out)
  invisible(out_path)
}
file.size(gz("Rplot001.svg", "Rplot001.svgz")) / 1000  # size in kB
# [1] 74.11

R code

Session Info

R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
[1] RColorBrewer_1.1-2 Gmisc_1.3          htmlTable_1.5     
[4] Rcpp_0.12.3       

loaded via a namespace (and not attached):
 [1] Formula_1.2-1       knitr_1.12.3       
 [3] cluster_2.0.3       magrittr_1.5       
 [5] splines_3.2.3       munsell_0.4.2      
 [7] colorspace_1.2-6    lattice_0.20-33    
 [9] stringr_1.0.0       plyr_1.8.3         
[11] tools_3.2.3         nnet_7.3-12        
[13] gtable_0.1.2        latticeExtra_0.6-26
[15] htmltools_0.3       digest_0.6.9       
[17] forestplot_1.4      survival_2.38-3    
[19] abind_1.4-3         gridExtra_2.0.0    
[21] ggplot2_2.0.0       acepack_1.3-3.3    
[23] rsconnect_0.3.79    rpart_4.1-10       
[25] rmarkdown_0.9.2     stringi_1.0-1      
[27] scales_0.3.0        Hmisc_3.17-1       
[29] XML_3.98-1.3        foreign_0.8-66


First Bayesian Mixer Meeting in London

2 Feb 2016 07:39 Bayesian , R No comments

There is a nice pub between Bunhill Fields and the Royal Statistical Society in London: The Artillery Arms. Clearly, the perfect place to bring people together to talk about Bayesian Statistics. Well, that’s what Jon Sedar (@jonsedar, applied.ai) and I thought.

Source: http://www.artillery-arms.co.uk/

Hence, we’d like to organise a Bayesian Mixer Meetup on Friday, 12 February, 19:00. We booked the upstairs function room at the Artillery Arms and if you look outside the window, you can see Thomas Bayes’ grave.

We intend the group to be small (announcing only on the stan user group, pymc-devs gitter, and here for now) and geared to open discussion of Bayesian inference, tools, techniques and theory. Neither of us is a great expert; we're really just users of the tools, but we'd love to welcome academic discussion as well as real-world examples.

Jon is more the Python/PyMC guy, while I come from the R/RStan corner. We will prepare two talks to kick this off. Jon will talk about GLM Robust Regression with Outlier Detection using PyMC3, while I will talk about Experience vs Data with some stories from insurance and actuarial science, sprinkled with RStan examples.

If you would like to join us, please get in touch via the form below, so that we can keep tabs on numbers, and if all goes well we shall set up a Meetup site.


Flowing triangles

26 Jan 2016 07:36 Bridget Riley , grid , R , SVG No comments

I have admired the work of the artist Bridget Riley for a long time. She is now in her eighties but, it seems, still very creative and productive. Some of her recent work combines simple triangles in fascinating compositions. The longer I look at them, the more patterns I recognise.

Yet the actual painting can be explained easily, in the sense that a specification document would suffice to reproduce the pattern precisely. However, seeing the real print, as I had the chance to do at the London Art Fair last week, and seeing a reproduction on the screen are not comparable.

Having said that, I could not resist programming a figure that resembles the artwork labelled Bagatelle 2. Well, at least I can say that I learned more about grid [1], grid.path [2] and gridSVG [3] in R.

R Code
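
The following is not the code behind the figure, only a minimal sketch of the kind of building block involved: a lattice of triangles drawn as one compound path with grid. The lattice size, orientation rule, colour and file name are all assumptions made for illustration, and the base svg() device stands in for gridSVG.

```r
library(grid)

# Illustrative lattice: 4 rows by 6 columns of cells, one triangle per cell
rows <- 4
cols <- 6
cells <- expand.grid(col = seq_len(cols), row = seq_len(rows))
n <- nrow(cells)

# Lower-left corner and size of each cell, in npc units (0 to 1)
w <- 1 / cols
h <- 1 / rows
x0 <- (cells$col - 1) * w
y0 <- (cells$row - 1) * h

# Alternate the position of the apex from cell to cell
flip <- (cells$col + cells$row) %% 2 == 0

# Three vertices per triangle, stored triangle by triangle
tx <- as.vector(rbind(x0, x0 + w, ifelse(flip, x0, x0 + w)))
ty <- as.vector(rbind(y0, y0, y0 + h))
id <- rep(seq_len(n), each = 3)

# One path grob holding all triangles as sub-paths
tri <- pathGrob(tx, ty, id = id, gp = gpar(fill = "grey20", col = NA))

# Draw to an SVG file (hypothetical file name)
out_file <- file.path(tempdir(), "triangles.svg")
svg(out_file, width = 6, height = 4)
grid.newpage()
grid.draw(tri)
dev.off()
```

Because all triangles share one path, they also share one fill colour; drawing each triangle with its own colour would call for polygonGrob() with a vector of fills instead.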

References

[1] P. Murrell. R Graphics, Second Edition. CRC Press. 2011
[2] P. Murrell. What's in a Name? The R Journal, 4(2):5–12, December 2012.
[3] P. Murrell and S. Potter. gridSVG: Export grid graphics as SVG. R package 1.5-0. 2015

Session Info

R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils  datasets 
[7] methods   base     

other attached packages:
[1] gridSVG_1.5-0    data.table_1.9.6

loaded via a namespace (and not attached):
[1] tools_3.2.3   RJSONIO_1.3-0 chron_2.3-47  XML_3.98-1.3


Formatting table output in R

19 Jan 2016 07:48 formattable , R No comments

Formatting data for output in a table can be a bit of a pain in R. The package formattable by Kun Ren and Kenton Russell provides some intuitive functions to quickly create good-looking tables for the R console or HTML. The package home page nicely demonstrates the functions with illustrative examples.

There are a few points I really like:

the functions accounting, currency, percent transform numbers into better human readable output
cells can be highlighted by adding color information
contextual icons can be added, e.g. from Glyphicons
output can be displayed in RStudio's viewer pane
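
The points above can be sketched roughly as follows. The data are made up and the column names are hypothetical; the sketch assumes the formattable package is installed and uses only its accounting(), percent(), color_tile(), formatter() and style() functions.

```r
library(formattable)

# Made-up figures, for illustration only
df <- data.frame(
  product  = c("A", "B", "C"),
  policies = c(1200, 850, 430),
  premium  = accounting(c(125000, 98000.5, 40250)),  # thousands separators
  growth   = percent(c(0.12, -0.034, 0.071))         # shown as percentages
)

# Colour the text red or green depending on the sign of the value
sign_formatter <- formatter(
  "span",
  style = x ~ style(color = ifelse(x < 0, "red", "darkgreen"))
)

# Attach cell formatters: a colour gradient for policies, sign colours for growth
tbl <- formattable(df, list(
  policies = color_tile("white", "lightblue"),
  growth   = sign_formatter
))
```

Printing tbl in an interactive RStudio session should display the formatted table in the viewer pane.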

The CRAN Task View: Reproducible Research lists other packages as well that help to create tables for web output, such as compareGroups, DT, htmlTable, HTMLUtils, hwriter, Kmisc, knitr, lazyWeave, SortableHTMLTables, texreg and ztable. Yet, if I am not mistaken, most of these packages focus more on generating complex tables with multi-column rows, footnotes, math notation, etc., than on the points I mentioned above.

Finally, here is a little formattable example from my side:

Session Info

R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] formattable_0.1.5

loaded via a namespace (and not attached):
 [1] shiny_0.12.2.9006 htmlwidgets_0.5.1 R6_2.1.1         
 [4] rsconnect_0.3.79  markdown_0.7.7    htmltools_0.3    
 [7] tools_3.2.3       yaml_2.1.13       Rcpp_0.12.2      
[10] highr_0.5.1       knitr_1.12        jsonlite_0.9.19  
[13] digest_0.6.9      xtable_1.8-0      httpuv_1.3.3     
[16] mime_0.4


R in Insurance: Registration and abstract submission opened

12 Jan 2016 08:00 Cass Business School , Conference , R , R in Insurance No comments

Following the successful 3rd R in Insurance conference in Amsterdam last year, we return to London this year.

The registration for the 4th conference on R in Insurance on Monday 11 July 2016 at Cass Business School has opened.

This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation.

The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in insurance.

Invited talks will be given by:

Mario V. Wüthrich, RiskLab, Department of Mathematics, ETH Zurich.
Dan Murphy, President, Trinostics LLC.

Details about the registration and abstract submission are given on the dedicated R in Insurance page at Cass Business School, London.

The submission deadline for abstracts is 28 March 2016. Please email your abstract of no more than 300 words to: rinsuranceconference@gmail.com.

Attendance of the whole conference is the equivalent of 6.5 hours of CPD for members of the Actuarial Profession.

For more information about past events, visit www.rininsurance.com.