I can't resist to write about this here as well. David's first post is about k-means cluster analysis. One of the popular algorithms for k-means is Lloyd's algorithm. So, on that note I will use a picture of the Lloyd's of London building to play around with David's code, despite the fact that the two Lloyds have nothing to do with each other. Lloyd's provides pictures of its building copyright free on its web site. However, I will use a reduced file size version hosted on wikimedia.
The jpeg package by Simon Urbanek  allows me to load a jpeg-file into R. The R object of the images is an array, which has the structure of three layered matrices, representing the value of the colours red, green and blue for each x and y coordinate. I convert the array into a data frame, as this is an accepted structure by k-means and plot the data.
library(jpeg) url <-"http://upload.wikimedia.org/wikipedia/commons/6/6a/6414A_1_copy.jpg" fn <- paste(tempfile(), "jpeg", sep=".") download.file(url, fn, mode = ifelse(Sys.info()['sysname'] == "Windows", 'wb', 'w')) readImage <- readJPEG(fn) dm <- dim(readImage) rgbImage <- data.frame( x=rep(1:dm, each=dm), y=rep(dm:1, dm), r.value=as.vector(readImage[,,1]), g.value=as.vector(readImage[,,2]), b.value=as.vector(readImage[,,3])) plot(y ~ x, data=rgbImage, main="Lloyd's building", col = rgb(rgbImage[c("r.value", "g.value", "b.value")]), asp = 1, pch = ".")
Running a k-means analysis on the three colour columns in my data frame allows me to reduce the picture to k colours. The output gives me for each x and y coordinate the colour cluster it belongs to. Thus, I plot my picture again, but replace the original colours with the cluster colours.
kColors <- 5 kMeans <- kmeans(rgbImage[, c("r.value", "g.value", "b.value")], centers = kColors) clusterColour <- rgb(kMeans$centers[kMeans$cluster, ]) plot(y ~ x, data=rgbImage, main="Lloyd's building", col = clusterColour, asp = 1, pch = ".", axes=FALSE, ylab="", xlab="k-means cluster analysis of 5 colours")
Running the k-means algorithm not only over the colours but over the whole data frame, including the x and y coordinate will find localised clusters, or so called Voronoi cells. This gives quite an artistic picture.
nRegions <- 2000 # Number of Voronoi cells. # Be patient, this takes time. voronoiMeans <- kmeans(rgbImage, centers = nRegions, iter.max = 50) voronoiColor <- rgb(voronoiMeans$centers[voronoiMeans$cluster, 3:5]) plot(y ~ x, data=rgbImage, col = voronoiColor, asp = 1, pch = ".", main="Lloyd's building", axes=FALSE, ylab="", xlab="2000 local clusters")
Instead of plotting all x and y coordinates I can also plot only the cluster centres. In the following chart I generated 500 cluster and plotted their centres as little rectangles, which provides an interesting picture in itself.
nRegions <- 500 voronoiMeans <- kmeans(rgbImage, centers = nRegions, iter.max = 50) voronoiColor <- voronoiColor <- rgb(voronoiMeans$centers[,3:5]) plot(y ~ x, data=voronoiMeans$centers, col = voronoiColor, main="Lloyd's building", asp = 1, pch = 15, cex=1.5, axes=FALSE, ylab="", xlab="500 local clusters")
You can access the R-code of this post also as gist on Github. Many thanks again to David Sparks for his inspirations on is.R().
 David Sparks. Dominant color palettes with-k-means, is.R(). November 27, 2012
 David Sparks. Images as Voronoi tessellation, is.R(). November 28, 2012
 Simon Urbanek (2012). jpeg: Read and write JPEG images R package version 0.1.2.
sessionInfo() R Under development (unstable) (2012-10-19 r60974) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale:  en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages:  stats graphics grDevices utils datasets methods base other attached packages:  jpeg_0.1.2