Reserving based on log-incremental payments in R, part III
I will use the same triangle of incremental claims data as introduced in my previous post. The final model there had three parameters for the origin periods and two for the development periods. As Christofides illustrates from section L onwards, the model can be reduced further by using an inflation index to bring all claims payments to current value and a claims volume adjustment (or weight) for each origin period to normalise the triangle.
In his example Christofides uses claims volume adjustments for the origin years and an earnings (inflation) index for the payment calendar years. The claims volume adjustments aim to normalise the triangle to similar exposures across origin periods, while the earnings index, which largely measures wages and other forms of compensation, serves as a first proxy for claims inflation. Note that the earnings index shows significant year-on-year changes of 5% to 9%. Barnett and Zehnwirth [2] would probably recommend adding further parameters for the calendar year effects to the model.
# Page D5.36
ClaimsVolume <- data.frame(origin=0:6,
volume.index=c(1.43, 1.45, 1.52, 1.35, 1.29, 1.47, 1.91))
# Page D5.36
EarningIndex <- data.frame(cal=0:6,
earning.index=c(1.55, 1.41, 1.3, 1.23, 1.13, 1.05, 1))
# Year on year changes
round((1-EarningIndex$earning.index[-1]/EarningIndex$earning.index[-7]),2)
# [1] 0.09 0.08 0.05 0.08 0.07 0.05
dat <- merge(merge(dat, ClaimsVolume), EarningIndex)
# Normalise data for volume and earnings
dat$logvalue.ind.inf <- with(dat, log(value/volume.index*earning.index))
with(dat, interaction.plot(dev, origin, logvalue.ind.inf))
points(1+dat$dev, dat$logvalue.ind.inf, pch=16, cex=0.8)
Indeed, the interaction plot shows the various origin years now to be much more closely grouped. Only the single point of the last origin period stands out now.
Christofides tests several models with different numbers of origin levels, but I am happy with the minimal model using only one parameter for the origin period, namely the intercept:
# Page D5.39
summary(Fit4 <- lm(logvalue.ind.inf ~ d + s, data=na.omit(dat)))
#
# Call:
# lm(formula = logvalue.ind.inf ~ d + s, data = na.omit(dat))
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.24591 -0.05066 0.01044 0.05202 0.26070
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 8.50073 0.05271 161.278 < 2e-16 ***
# d -0.28598 0.06901 -4.144 0.000342 ***
# s -0.48889 0.01725 -28.337 < 2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.1179 on 25 degrees of freedom
# Multiple R-squared: 0.9795, Adjusted R-squared: 0.9779
# F-statistic: 597.2 on 2 and 25 DF, p-value: < 2.2e-16
All coefficients are significant and I am left with a model of only three parameters. The residual plots suggest that my model is reasonable; only the QQ-plot shows that the distribution of the residuals is a little skewed.
resPlot(Fit4, dat)
op <- par(mfrow=c(2,2),oma = c(0, 0, 3, 0))
plot(Fit4)
par(op)
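A simple numerical complement to the visual check above (my own addition, and only indicative given the small sample) is a Shapiro-Wilk test on the standardised residuals:
# Illustrative check of the slight skewness seen in the QQ-plot
shapiro.test(rstandard(Fit4))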
I am happy with the model. To forecast the future claims payments I prepare a data frame with the predictors for those years.
# Tail of 6 more years over the observed period
tail.years <- 6
# Create a data frame for the future periods
fdat <- data.frame(
origin=rep(0:(m-1), n+tail.years), # m origin periods, as defined in the previous post (here m=7)
dev=rep(0:(n+tail.years-1), each=m) # n observed development periods (here n=7) plus the tail
)
fdat <- within(fdat,{
d <- ifelse(dev < 1, 1, 0)
s <- ifelse(dev < 1, 0, dev)
cal <- origin + dev
a6 <- ifelse(origin == 6, 1, 0)
})
# New data
ND <- subset(fdat, cal>6)
ND <- merge(ND, ClaimsVolume)
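As a quick illustrative check (my own addition), the future cells should cover the lower triangle plus the six tail development periods for each origin year:
# Count of future cells by origin and development period
xtabs(~ origin + dev, data=ND)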
Next I update my prediction function from last week with new parameters for claims inflation and indexation. The volume index and claims inflation parameters are used to scale the output back to the original units and to inflate the future payments at a constant rate. Of course the indexation is a model in itself with its own uncertainty, which can be considered part of the model error. Note that I scale the data back with my volume index, but not with the earnings/inflation index.
log.incr.predict <- function(
model, # lm output
newdata, # same argument as in predict
claims.inflation=0, # Assumed inflation (scalar)
volume.index=NULL, # name of the v.i. column in newdata
origin.var="origin", # name of the origin column in newdata
dev.var="dev", # name of the dev column in newdata
cal.var="cal" # name of the cal. period col. in newdata
){
origin <- newdata[[origin.var]]
dev <- newdata[[dev.var]]
cal <- newdata[[cal.var]]
if(is.null(volume.index)){
index <- 1
}else{
index <- newdata[[volume.index]]
}
Pred <- predict(model, newdata=newdata, se.fit=TRUE)
Y <- Pred$fit
VarY <- Pred$se.fit^2 + Pred$residual.scale^2
P <- exp(Y + VarY/2)
P <- P*index*(1 + claims.inflation)^(cal - min(cal) + 1)
VarP <- P^2*(exp(VarY)-1)
seP <- sqrt(VarP)
## Recreate formula to derive the model.frame and future design matrix
model.formula <- as.formula(paste("~", as.character(formula(model)[3])))
## See also the formula.tools package
mframe <- model.frame(model.formula, data=newdata)
fdm <- model.matrix(model.formula, data=newdata)
varcovar <- fdm %*% vcov(model) %*% t(fdm)
Total.SE <- sqrt( t(P) %*% (exp(varcovar)-1) %*% P )
Total.Reserve <- sum(P)
# Prepare output
Incr <- data.frame(origin, dev, Y, VarY, P, seP, CV=seP/P)
out <- list(Forecast=Incr[order(newdata[[origin.var]]),],
Totals=data.frame(Total.Reserve,
Total.SE=Total.SE,
CV=Total.SE/Total.Reserve))
return(out)
}
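The back-transformation in the function relies on the log-normal moment formulas: if Y ~ N(mu, sigma^2), then E[exp(Y)] = exp(mu + sigma^2/2) and Var[exp(Y)] = exp(2*mu + sigma^2)*(exp(sigma^2) - 1). A quick simulation (my own addition, purely illustrative) confirms these relationships:
# Simulation check of the log-normal moment formulas used above
set.seed(1)
mu <- 8; sigma <- 0.12
x <- exp(rnorm(1e6, mu, sigma))
c(simulated=mean(x), analytic=exp(mu + sigma^2/2))
c(simulated=var(x), analytic=exp(2*mu + sigma^2)*(exp(sigma^2) - 1))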
With my new prediction function it is easy to test different scenarios of claims inflation and their potential impact on the overall reserve requirements (see the short sensitivity sketch after the totals below). Following the paper I will assume claims inflation of 7.5%. This gives me the following future payments triangle and reserves.
FM4 <- log.incr.predict(Fit4, ND,
claims.inflation=0.075,
volume.index="volume.index")
# Page D5.41
round(xtabs(P ~ origin + dev, data=FM4$Forecast),0)
# dev
# origin 1 2 3 4 5 6 7 8 9 10 11 12
# 0 0 0 0 0 0 0 249 165 109 72 47 31
# 1 0 0 0 0 0 412 272 179 118 78 52 34
# 2 0 0 0 0 703 464 306 202 134 88 58 39
# 3 0 0 0 1018 671 443 292 193 127 84 56 37
# 4 0 0 1585 1045 690 455 300 198 131 87 57 38
# 5 0 2945 1942 1280 845 557 368 243 160 106 70 46
# 6 6241 4114 2712 1788 1180 778 514 339 224 148 98 65
round(xtabs(seP ~ origin + dev, data=FM4$Forecast),0)
# dev
#origin 1 2 3 4 5 6 7 8 9 10 11 12
# 0 0 0 0 0 0 0 36 25 18 13 9 6
# 1 0 0 0 0 0 55 39 27 19 14 10 7
# 2 0 0 0 0 90 62 44 31 22 16 11 8
# 3 0 0 0 125 86 59 42 29 21 15 11 7
# 4 0 0 192 129 88 61 43 30 21 15 11 8
# 5 0 358 235 158 108 75 52 37 26 19 13 9
# 6 777 500 329 220 151 105 73 52 37 26 19 13
FM4$Totals
# Total.Reserve Total.SE CV
# 38083.25 1724.987 0.04529515
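As a small illustration of scenario testing (my own addition, with arbitrarily chosen rates), the total reserve can be recalculated for a range of inflation assumptions:
# Sensitivity of the total reserve to the assumed claims inflation rate
sapply(c(0.05, 0.075, 0.1), function(infl)
log.incr.predict(Fit4, ND, claims.inflation=infl,
volume.index="volume.index")$Totals$Total.Reserve)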
Compared to the results of the previous week the overall reserves increased by £5,000, while the overall standard error reduced thanks to the smaller number of parameters. Christofides explains that the big increase in the overall reserves is largely driven by the most recent origin year, for which I have only one data point. From the residual plot I can see that the standardised residual for this point is about -1. This is not statistically significant, but I notice that the highest value of the original triangle in development period 0 has become the lowest after the volume adjustments; see also the interaction plot at the top.
By putting the parameter for the last origin period (a6) back into the model I get an output which is more in line with the result of last week.
# Page D5.42
log.incr.predict(lm(logvalue.ind.inf ~ a6 + d + s, dat), ND,
claims.inflation=0.075,
volume.index="volume.index")$Totals
# Total.Reserve Total.SE CV
# 35901.59 2609.29 0.07267895
Conclusions
Reserving is always a mixture of art and science, a combination of sound data analysis and expert judgement. A statistical data analysis can help to understand how much expert judgement is required. As Christofides points out in his closing remarks of section L, it would be desirable to embed reserving into a Bayesian framework. Wayne Zhang has done some great research in this area. Yet simple linear models are powerful tools to investigate the data. The model presented here can in particular help to investigate trends in the calendar/payment year direction. Those trend changes can highlight movements in claims inflation, or indeed changes in the claims settlement process. Neither of those factors should be ignored.
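As a minimal sketch of such an investigation (my own addition, not from the paper), a linear calendar-year trend could be added to the fitted model and its significance inspected:
# Add a linear calendar-year trend to the model from above
summary(update(Fit4, . ~ . + cal))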
The assumption that claims follow a log-normal distribution feels intuitively reasonable to me. Yet the occasional negative incremental claim needs to be considered carefully. It is certainly a prompt to check whether the assumption of log-normally distributed incremental claims is reasonable. Packages like car (Companion to Applied Regression) offer lots of diagnostic tools. Any pointers to how I could use those tools effectively will be much appreciated.
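A possible starting point with car might look like the following (a sketch only; I have not explored these diagnostics in any depth here):
library(car) # Companion to Applied Regression
qqPlot(Fit4) # QQ-plot of studentised residuals with a confidence envelope
ncvTest(Fit4) # score test for non-constant error variance
influencePlot(Fit4) # hat values, studentised residuals and Cook's distance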
Barnett and Zehnwirth present a further idea to test these models: reduce the data used for the fitting exercise and check how stable the coefficients and predictions of the model remain; see section 3.3 and table 3.3 in [2]. A minimal sketch of this idea follows.
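Such a stability check could, for example (my own sketch, not the authors' exact procedure), compare the coefficients after removing the most recent calendar year from the fit:
# Refit the model without the latest calendar year and compare coefficients
Fit4.reduced <- update(Fit4, data=subset(na.omit(dat), cal < max(cal)))
rbind(full=coef(Fit4), reduced=coef(Fit4.reduced))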
As usual the R code of this post is available as a gist on GitHub. The code on GitHub contains more details than presented here; it also includes examples from the Barnett and Zehnwirth paper mentioned above. Is there a demand to include the functions presented here in the ChainLadder package? Please get in touch.
References
[1] Stavros Christofides. Regression models based on log-incremental payments. Claims Reserving Manual. Volume 2 D5. September 1997
[2] Glen Barnett and Ben Zehnwirth. Best estimates for reserves. Proceedings of the CAS, LXXXVII(167), November 2000.
Session Info
R version 2.15.2 Patched (2013-01-01 r61512)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.15.2
Comments
Hi Marcus,
Thank you for re-visiting my paper and for sharing your implementation in R.
You will be pleased to hear that back in 1990, when the paper was first published, the main motivation was also to share the mechanics of these calculations with other reserving practitioners to encourage the understanding, further testing and the development of stochastic reserving. We would appear to share common motives albeit a generation apart!
The trigger for my work was Ben Zehnwirth who was at that time selling his ICRFS black box and was reluctant to share its internal secrets (calculations). The paper owes much to the assistance and encouragement of Dr Renshaw and Prof Verrall of CASS. The Barnett & Zehnwirth paper was published in 2000, which was about ten years after the original publication of the Institute Claims Reserving Manual (CRM), Vol 2.
The accompanying spreadsheet was made available in the 1997 edition of the CRM after I transferred the original SuperCalc model to MS Excel, which was by then the standard spreadsheet office application. This morning I transferred the Excel model to my android mobile (2011 model) and it runs perfectly well on the free Kingsoft Office app. Recalculation takes just under five seconds, considerably faster than the couple of minutes taken by office PC's at the time of development. How times have changed..
The underlying assumptions behind the stochastic reserving models of the last twenty five years continue to produce results that often defy belief or bear any relation to the real world of reserving and reserving variability. Yet, such results are often used as a basis for estimating or determining variability of insurance outcomes for capital purposes and Solvency II.
In this context I hope that your blog and good work encourage you and others to look afresh at the underlying assumptions to see whether a new, different, fresh approach can help produce more realistic results. There is significantly more information on claims now readily available to the analyst as well as the computer systems and processing power to encourage thinking outside of the constraints of simplistic data triangles that still appear to dominate stochastic reserving methodology.
Kind regards
Stavros Christofides
stavros@christofides.net
Dear Stavros,
Many thanks for your comments. I share your thoughts; if our mobile phones are more powerful today than our office PCs 15 years ago, then we clearly should be able to make more use of more granular data and sophisticated models and algorithms.
Markus