London Olympics 100m men's sprint results

The 100m men's sprint final of the 2012 London Olympics is over, and Usain Bolt won the gold medal again with a winning time of 9.63s. Time to compare the result with my forecast of 9.68s, posted on 22 July.


My simple log-linear model predicted a winning time of 9.68s with a prediction interval from 9.39s to 9.97s. Well, that is of course a big interval of more than half a second, or ±3%. Yet, the winning time was only 0.05s away from my prediction. That is less than 1% difference. Not bad for such a simple model.
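The arithmetic behind these figures can be checked with a few lines of R. This is only a minimal sketch: the numbers are the ones quoted above, and the variable names are illustrative and not part of the original analysis.

```r
# Figures quoted in the text above
forecast <- 9.68   # predicted winning time (s)
lower    <- 9.39   # lower end of the prediction interval (s)
upper    <- 9.97   # upper end of the prediction interval (s)
actual   <- 9.63   # Usain Bolt's winning time (s)

# Absolute and relative forecast error
abs_error <- round(forecast - actual, 2)
rel_error <- abs_error / actual * 100          # in percent

# Width of the prediction interval and its half-width relative to the forecast
interval_width <- round(upper - lower, 2)
half_width_pct <- (upper - lower) / 2 / forecast * 100

# Did the actual result fall inside the interval?
inside <- actual >= lower && actual <= upper

print(c(abs_error = abs_error, rel_error_pct = rel_error,
        interval_width = interval_width, half_width_pct = half_width_pct))
print(inside)
```

The relative error comes out well below 1%, while the half-width of the interval is about 3% of the forecast, which matches the statements above.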

Comments on my earlier post suggested looking into other parameters as well, such as track, weather and wind, or the times of the other medal winners. Others suggested focusing on the participants' recent performances rather than on historical times over the last 100 years.

Interestingly enough, the Economist published an article on the same subject (Faster, higher, no longer) in its current print edition (following my example :-?). The article also uses historical data, going back to 1912, and considers wind and altitude as critical parameters.

The Olympic games are not over yet, and there are more opportunities to forecast results. Rob J. Hyndman lists further interesting examples of Olympic models and predictions on his blog, and my colleague Matt Malin presents ideas to model the 100m butterfly men's swimming final on his site.

R code used in this post

## 100m men's sprint historical winning times
## Sourced from:
## http://www.databaseolympics.com/sport/sportevent.htm?enum=110&sp=ATH

golddata <- read.table(sep=",", header=TRUE, 
text="Year, Event, Athlete, Medal, Country, Result
1896, 100m Men,        Tom Burke,  GOLD,     USA,  12.00
1900, 100m Men,     Frank Jarvis,  GOLD,     USA,  11.00
1904, 100m Men,      Archie Hahn,  GOLD,     USA,  11.00
1906, 100m Men,      Archie Hahn,  GOLD,     USA,  11.20
1908, 100m Men,    Reggie Walker,  GOLD,     SAF,  10.80
1912, 100m Men,      Ralph Craig,  GOLD,     USA,  10.80
1920, 100m Men,  Charles Paddock,  GOLD,     USA,  10.80
1924, 100m Men,  Harold Abrahams,  GOLD,     GBR,  10.60
1928, 100m Men,   Percy Williams,  GOLD,     CAN,  10.80
1932, 100m Men,      Eddie Tolan,  GOLD,     USA,  10.30
1936, 100m Men,      Jesse Owens,  GOLD,     USA,  10.30
1948, 100m Men, Harrison Dillard,  GOLD,     USA,  10.30
1952, 100m Men,   Lindy Remigino,  GOLD,     USA,  10.40
1956, 100m Men,     Bobby Morrow,  GOLD,     USA,  10.50
1960, 100m Men,       Armin Hary,  GOLD,     GER,  10.20
1964, 100m Men,        Bob Hayes,  GOLD,     USA,  10.00
1968, 100m Men,        Jim Hines,  GOLD,     USA,   9.95
1972, 100m Men,    Valery Borzov,  GOLD,     URS,  10.14
1976, 100m Men,  Hasely Crawford,  GOLD,     TRI,  10.06
1980, 100m Men,      Allan Wells,  GOLD,     GBR,  10.25
1984, 100m Men,       Carl Lewis,  GOLD,     USA,   9.99
1988, 100m Men,       Carl Lewis,  GOLD,     USA,   9.92
1992, 100m Men, Linford Christie,  GOLD,     GBR,   9.96
1996, 100m Men,   Donovan Bailey,  GOLD,     CAN,   9.84
2000, 100m Men,   Maurice Greene,  GOLD,     USA,   9.87
2004, 100m Men,    Justin Gatlin,  GOLD,     USA,   9.85
2008, 100m Men,       Usain Bolt,  GOLD,     JAM,   9.69
")

myData <- subset(golddata, Year>=1900 & Year<2012)
log.linear <- lm(log(Result)~Year, data=myData)  
years <- seq(1896,2012, 4)

predictions <- exp(predict(log.linear, 
                           newdata=data.frame(Year=years), 
                           level=0.95,
                           interval="prediction"))

predictions <- data.frame(predictions)

plot(Result ~ Year,  data=golddata,
     xlim=c(1896,2012),
     ylim=c(9.5,12),
     xlab="Year", main="Olympic 100 metre sprint",
     ylab="Winning time for the 100m men's final (s)")  
lines(years, predictions$fit, col="red")
lines(years, predictions$lwr, col="black", lty=2)
lines(years, predictions$upr, col="black", lty=2)
London.Prediction <- predictions$fit[length(years)] 
points(2012, London.Prediction, pch=19, col="red")

## 2012 London Olympics 100m men's gold winning time
winning.time <- 9.63
points(2012, winning.time, pch=21, cex=1, bg="gold")
legend(x=1960, y=12, title="London Olympics",
       legend=c(paste("Prediction: ", round(London.Prediction, 2)),
        paste("2012 Result:", winning.time)), 
       col=c("red", "black"), pch=c(19, 21),
       box.col="white", pt.bg=c("red", "gold"))

8 comments :

Ben Moran said...

Nicely done!

I've had issues when extracting midpoint predictions from a log-linear model though. The upper and lower confidence bounds are right, but because you have modelled the log time, it looks like you have fitted the geometric mean rather than the arithmetic mean, so the prediction will be biased too low.

I think you should add in 0.5*summary(log.linear)$sigma**2 to the midpoint prediction before you exponentiate, to get an unbiased estimate of the mean. See e.g. http://shazam.econ.ubc.ca/intro/olslog.htm. It makes little difference in this case because the variance is small, but it can be very important in general.

Tal Galili said...

Hi Ben,
I suspect that in this case, what he gives is a prediction for the Median, but indeed not for the mean.

Cheers,
Tal
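
The point raised by Ben and Tal can be illustrated with a short sketch. It assumes normally distributed errors on the log scale, in which case exp(fit) estimates the median of the winning time and exp(fit + 0.5 * sigma^2) estimates its mean. The data frame gold and the names median_pred and mean_pred are illustrative; the winning times are copied from the table in the post (1900 to 2008).

```r
# Winning times copied from the post's table (1900-2008)
gold <- data.frame(
  Year = c(1900, 1904, 1906, 1908, 1912, 1920, 1924, 1928, 1932, 1936,
           1948, 1952, 1956, 1960, 1964, 1968, 1972, 1976, 1980, 1984,
           1988, 1992, 1996, 2000, 2004, 2008),
  Result = c(11.00, 11.00, 11.20, 10.80, 10.80, 10.80, 10.60, 10.80,
             10.30, 10.30, 10.30, 10.40, 10.50, 10.20, 10.00, 9.95,
             10.14, 10.06, 10.25, 9.99, 9.92, 9.96, 9.84, 9.87, 9.85, 9.69)
)

# Same log-linear model as in the post
fit <- lm(log(Result) ~ Year, data = gold)

# Prediction for 2012 on the log scale
log_pred <- unname(predict(fit, newdata = data.frame(Year = 2012)))

# Residual variance on the log scale
sigma2 <- summary(fit)$sigma^2

# Back-transforming directly gives a median-type prediction;
# adding half the residual variance first targets the mean
# (assuming log-normal errors)
median_pred <- exp(log_pred)
mean_pred   <- exp(log_pred + 0.5 * sigma2)

print(c(median = median_pred, mean = mean_pred,
        difference = mean_pred - median_pred))
```

Because the residual variance is small here, the two back-transformed predictions differ only marginally, which is consistent with Ben's remark.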

Cisco said...

Good job. This forecast was on my mind the whole time during the 100m event. .05s is pretty close in my book. Impressive.

Roma Hicks said...

Very cool, but the variance from 1980 on has me interested.  It is so much smaller than in the previous years.  Perhaps more electronic timing?  Any ideas?

Sports said...

check which nation has scored more gold medals 
London Olympics Medal Table 2012

Olympics said...

check which nation has scored more gold medals 
London Olympics Medal Table 2012

anna Sonata said...

Nice article, thanks for the information.
Anna @ http://griyamobilkita.wordpress.com

Fr. said...

When adapting your code for a class demo, I realized that integrating the 1896 data point into the log-linear model produces an almost-perfect prediction. The code is there: http://f.briatte.org/teaching/ida/080_models.html
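
A minimal sketch of the comparison described in this comment, fitting the same log-linear model once without and once with the 1896 result. This is not the code from the linked page; the names gold, fit_without and fit_with are illustrative, and the data are copied from the table in the post.

```r
# Winning times copied from the post's table, including 1896
gold <- data.frame(
  Year = c(1896, 1900, 1904, 1906, 1908, 1912, 1920, 1924, 1928, 1932,
           1936, 1948, 1952, 1956, 1960, 1964, 1968, 1972, 1976, 1980,
           1984, 1988, 1992, 1996, 2000, 2004, 2008),
  Result = c(12.00, 11.00, 11.00, 11.20, 10.80, 10.80, 10.80, 10.60,
             10.80, 10.30, 10.30, 10.30, 10.40, 10.50, 10.20, 10.00,
             9.95, 10.14, 10.06, 10.25, 9.99, 9.92, 9.96, 9.84, 9.87,
             9.85, 9.69)
)

# Model as in the post: 1896 excluded
fit_without <- lm(log(Result) ~ Year, data = subset(gold, Year >= 1900))

# Variant from the comment: 1896 included
fit_with <- lm(log(Result) ~ Year, data = gold)

# Back-transformed predictions for London 2012
london <- data.frame(Year = 2012)
pred_without <- unname(exp(predict(fit_without, newdata = london)))
pred_with    <- unname(exp(predict(fit_with, newdata = london)))

# Compare both predictions with the actual winning time of 9.63s
print(round(c(without_1896 = pred_without,
              with_1896 = pred_with,
              actual = 9.63), 3))
```

The slow 1896 time sits at the far left of the series, so including it steepens the fitted downward trend and lowers the 2012 prediction.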
