London Olympics and a prediction for the 100m final
It is less than a week until the 2012 Olympic Games start in London. No surprise, therefore, that the papers are all over it, including a lot of data and statistics around the games.
The Economist investigated the potential financial impact on sponsors (some benefits), taxpayers (no benefits) and the athletes (if they are lucky) in its recent issue and video.
The Guardian has a whole series around the Olympics, including the data of all Summer Olympic Medallists since 1896.
100m men's final
The biggest event of the Olympics will be one of the shortest: the men's 100 metres final. It will be all over in less than 10 seconds. In 1968 Jim Hines became the first gold medal winner to run under ten seconds, and since 1984 all gold medal winners have run faster than 10 seconds. The historical run times of the past Olympics going back to 1896 are available from databaseolympics.com.
Looking at the data, it appears that a simple log-linear model will give a reasonable forecast for the 2012 Olympics result (ignoring the 1896 time). Of course, such a model doesn't make sense forever, as it would suggest that future run times will keep shrinking towards zero. Hence, some kind of logistic model might be a better approach, but I have no idea what would be a sensible floor for it. Others have used ideas from extreme value theory to investigate the 100m sprint, see the paper by Einmahl and Smeets, which would suggest a floor greater than 9 seconds.
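To make the difference between the two curve shapes concrete, here is a minimal sketch. All parameter values are made up for illustration and are not fitted to the Olympic data; the function names are hypothetical. The logistic form used is the common four-parameter one with a lower and an upper limit, which as far as I know matches what L.4() in drc fits.

```r
# Illustrative only: every parameter value below is invented, not estimated.

# Log-linear model: log(time) = a + b * year, i.e. time = exp(a + b * year)
log_linear_time <- function(year, a, b) exp(a + b * year)

# Four-parameter logistic: falls from `upper` towards `lower` as year grows
logistic_time <- function(year, lower, upper, rate, mid) {
  lower + (upper - lower) / (1 + exp(rate * (year - mid)))
}

years <- c(1900, 2012, 2500, 5000)

# Hypothetical coefficients: 11 s in 1900, shrinking by about 0.1% per year
b <- -0.001
a <- log(11) - b * 1900
ll <- log_linear_time(years, a, b)

# Hypothetical logistic curve with a floor of 9.4 s
lg <- logistic_time(years, lower = 9.4, upper = 11.5, rate = 0.03, mid = 1950)

print(data.frame(Year = years, LogLinear = round(ll, 2), Logistic = round(lg, 2)))
```

The log-linear column keeps falling and eventually drops to implausible values, while the logistic column levels off at the assumed floor. The hard part, as noted above, is choosing or estimating that floor.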
Historical winning times for the 100m men's final. Red line: log-linear regression, black line: logistic regression.
My simple log-linear model forecasts a winning time of 9.68 seconds, which is 1/100 of a second faster than Usain Bolt's winning time in Beijing in 2008, but still 1/10 of a second slower than his 2009 World Record (9.58s) in Berlin.
Never mind, I shall stick to my forecast. The 100m final will be held on 5 August 2012. Now even I am getting excited about the Olympics, even if only for less than 10 seconds.
R code
Here is the R code used in this post:
library(XML)  # readHTMLTable()
library(drc)  # drm() and L.4()
# Scrape the table of 100m medallists
url <- "http://www.databaseolympics.com/sport/sportevent.htm?enum=110&sp=ATH"
data <- readHTMLTable(readLines(url), which=2, header=TRUE)
# Keep the gold medal winners and convert the factor columns to numbers
golddata <- subset(data, Medal %in% "GOLD")
golddata$Year <- as.numeric(as.character(golddata$Year))
golddata$Result <- as.numeric(as.character(golddata$Result))
tail(golddata,10)
# Fit both models, ignoring the 1896 time
logistic <- drm(Result~Year, data=subset(golddata, Year>=1900), fct = L.4())
log.linear <- lm(log(Result)~Year, data=subset(golddata, Year>=1900))
# Back-transform the log-linear predictions to seconds
years <- seq(1896,2012, 4)
predictions <- exp(predict(log.linear, newdata=data.frame(Year=years)))
# Plot the logistic fit, then add the data, the log-linear fit and the 2012 forecast
plot(logistic, xlim=c(1896,2012),
     ylim=c(9.5,12),
     xlab="Year", main="Olympic 100 metre",
     ylab="Winning time for the 100m men's final (s)")
points(golddata$Year, golddata$Result)
lines(years, predictions, col="red")
points(2012, predictions[length(years)], pch=19, col="red")
text(2012, 9.55, round(predictions[length(years)],2))
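The code above gives a point forecast only. As a rough sketch of how the uncertainty around it could be shown, the snippet below fits the same kind of log-linear model and asks predict() for a prediction interval, which is then back-transformed with exp(). To keep it self-contained, it uses only the 1968 to 2008 winning times, typed in by hand; treat those numbers as an assumption and check them against the source table. Because the series is shorter, the result will not match the forecast in the post.

```r
# Winning times (s) for 1968-2008, entered by hand as an assumption;
# verify against the source table before relying on them.
gold <- data.frame(
  Year   = seq(1968, 2008, by = 4),
  Result = c(9.95, 10.14, 10.06, 10.25, 9.99, 9.92, 9.96, 9.84, 9.87, 9.85, 9.69)
)

# Same model form as in the post, but fitted to this shorter series
fit <- lm(log(Result) ~ Year, data = gold)

# 95% prediction interval on the log scale, back-transformed to seconds
pred_log <- predict(fit, newdata = data.frame(Year = 2012),
                    interval = "prediction", level = 0.95)
pred <- exp(pred_log)
print(round(pred, 2))
```

The columns fit, lwr and upr hold the forecast and the interval limits. Back-transforming the limits with exp() keeps their coverage, although exp() of the fitted log value is closer to a median than a mean forecast.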
13 comments :
Speeds (say in m/s) look pretty nearly linear since 1900; from the look of plotting that (speed vs year), I'd be inclined to say that bettering the 2008 time (which speed was faster than the trend) looks pretty close to a 50-50 bet, the small positive but not significant correlation at lag 1 in the residuals partly offsetting the fact that the fit for 2012 is sitting just below the 2008 speed. Converting the speed back (ignoring the small bias term in a Taylor expansion of the inverse), I'd predict about 9.69 s, same as 2008 - but to be honest the standard error of the prediction interval is pretty big.
Interesting trend but the final time will very much depend on the weather, especially the wind.
Check london olympics information like schedule, medals, sport events results here
London Olympics 2012 Results
Interesting, but you have more data to use. As Qqvvdb points out there are a number of variables that can be used, including weather (wind and temp), competitors and their data, type of track, etc. Furthermore, many of the likely finalists can run/have run 9.7s. It is not too difficult to identify the likely finalists and examine their head-to-head races in the last year or two. Thanks for starting this off!
Great post. But shouldn't the red line (for the log-linear model) be curved?
Nice!! Here is an alternative prediction
http://www.redaelli.org/matteo-blog/2012/07/23/london-olympics-prediction-for-the-100m-final-with-r-and-strategico/
The red line is 'more visibly' curved if you let the prediction run for long enough.
plot(exp(predict(log.linear, newdata=data.frame(Year=1:5000))), type="l")
I see no reason why one would use Years as X... this seems much more like a series where you are trying to predict the next term, regardless of the timeline.
What if you were to model the predicted time blending (or somehow otherwise factoring in) all gold, silver, and bronze times? Be interesting, for one thing, to simply see "win/place/show" range/variation whisker graphics across the timeline. Seems like, just looking at the data, the top 3 differ by mostly .2 seconds (Gold to Bronze) from one Olympiad to another. Might not add anything of value, but, dunno. 3x the "n".
Great post, thanks for sharing!
You inspired us pretty much with the idea and also with the detailed R code you provided, so we came up with some similar predictions on swimming times, please find the attached images or the forthcoming ones on FB: https://www.facebook.com/pages/Rapporter/335445206543916
Unfortunately our models do not include the FINA regulations regarding LZR Racer Suit yet, but that's coming too - with shared sources on GitHub soon :)
Very interesting. We'll all be sure to hold you to your prediction :-). I'm going to make a much simpler prediction of 9.48 seconds seeing as Usain Bolt ran a 9.58 in 2009 and a 9.69 in 2008 (as you mentioned in your post).
Very nice prediction, given that the winning time was 9.63 secs!
Look at Strategico, it successfully hits the goal 9.30
http://www.redaelli.org/matteo-blog/2012/07/23/london-olympics-prediction-for-the-100m-final-with-r-and-strategico/