How to use optim in R

12 Mar 2013 07:49 optim , R , Tutorials 9 comments

A friend of mine asked me the other day how she could use the function optim in R to fit data. Of course there are built-in functions for fitting data in R and I wrote about this earlier. However, she wanted to understand how to do this from scratch using optim.

The function optim provides algorithms for general-purpose optimisations and the documentation is perfectly reasonable, but I remember that it took me a little while to get my head around how to pass data and parameters to optim. Thus, here are two simple examples.

I start with a linear regression by minimising the residual sum of square and discuss how to carry out a maximum likelihood estimation in the second example.

Minimise residual sum of squares

I start with an x-y data set, which I believe has a linear relationship and therefore I'd like to fit y against x by minimising the residual sum of squares.

dat=data.frame(x=c(1,2,3,4,5,6), 
               y=c(1,3,5,6,8,12))

Next, I create a function that calculates the residual sum of square of my data against a linear model with two parameter. Think of y = par[1] + par[2] * x.

min.RSS <- function(data, par) {
              with(data, sum((par[1] + par[2] * x - y)^2))
             }

Optim minimises a function by varying its parameters. The first argument of optim are the parameters I'd like to vary, par in this case; the second argument is the function to be minimised, min.RSS. The tricky bit is to understand how to apply optim to your data. The solution is the ... argument in optim, which allows me to pass other arguments through to min.RSS, here my data. Therefore I can use the following statement:

result <- optim(par = c(0, 1), min.RSS, data = dat)
# I find the optimised parameters in result$par
# the minimised RSS is stored in result$value
result
## $par
## [1] -1.267  2.029
## 
## $value
## [1] 2.819
## 
## $counts
## function gradient 
##       89       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

Let me plot the result:

plot(y ~ x, data = dat, main="Least square regression")
abline(a = result$par[1], b = result$par[2], col = "red")

Great, this looks reasonable. How does it compare against the built in linear regression in R?

lm(y ~ x, data = dat)
## 
## Call:
## lm(formula = y ~ x, data = dat)
## 
## Coefficients:
## (Intercept)            x  
##       -1.27         2.03

Spot on! I get the same answer.

Maximum likelihood

In my second example I look at count data and I would like to fit a Poisson distribution to this data.

Here is my data:

obs = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 42, 43)
freq = c(1392, 1711, 914, 468, 306, 192, 96, 56, 35, 17, 15, 6, 2, 2, 1, 1)
x <- rep(obs, freq)
plot(table(x), main="Count data")

To fit a Poisson distribution to x I don't minimise the residual sum of squares, instead I maximise the likelihood for the chosen parameter lambda.

The likelihood function is given by:

lklh.poisson <- function(x, lambda) lambda^x/factorial(x) * exp(-lambda)

and the sum of the log-liklihood function is:

log.lklh.poisson <- function(x, lambda){ 
                     -sum(x * log(lambda) - log(factorial(x)) - lambda) 
                     }

By default optim searches for parameters, which minimise the function fn. In order to find a maximium, all I have to do is change the sign of the function, hence -sum(...).

optim(par = 2, log.lklh.poisson, x = x)
## Warning: one-diml optimization by Nelder-Mead is unreliable: use "Brent"
## or optimize() directly
## $par
## [1] 2.704
## 
## $value
## [1] 9966
## 
## $counts
## function gradient 
##       24       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

Ok, the warning message tells me that I shoud use another optimisation algorithm, as I have a one dimensional problem - a single parameter. Thus, I follow the advise and get:

optim(par = 2, log.poisson, x = x, method = "Brent", lower = 2, upper = 3)
## $par
## [1] 2.704
## 
## $value
## [1] 9966
## 
## $counts
## function gradient 
##       NA       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

It's actually the same result. Let's compare the result to fitdistr, which uses maximum liklihood as well.

library(MASS)
fitdistr(x, "Poisson")
##    lambda 
##   2.70368 
##  (0.02277)

No surprise here, it gives back the mean, which is the maximum likelihood parameter.

mean(x)
## [1] 2.704

For more information on optimisation and mathematical programming with R see the CRAN Task View on this subject.

Session Info

sessionInfo()
## R version 2.15.2 (2012-10-26)
## Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] MASS_7.3-22

9 comments :

ayush said...: Hi,

I tried the same thing as yours. But I am getting different answers from optim and lm(). Please look into it.

Following is the data.
Int Ext y

1 89 21 2625
2 93 24 2700
3 91 21 3100
4 122 23 3150
5 115 27 3175
6 100 18 3100
7 98 19 2700
8 105 16 2475
9 112 23 3625
10 109 28 3525
11 130 20 3225
12 104 25 3450
13 104 20 2425
14 111 26 3025
15 97 28 3625
16 115 29 2750
17 113 25 3150
18 88 23 2600
19 108 19 2525
20 101 16 2650

Following is the code.

min.function <- function(data, par)
{
with(data, sum((par[1]+par[2]*Int+par[3]*Ext - y)^2))
}
result <- optim(par =c(0,1,2), min.function, data=dat)
result

$par

[1] -52.53001 16.02561 59.25839

$value

[1] 2069061

$counts

function gradient

192 NA

$convergence

[1] 0

$message

NULL

From built in lm package

lm(formula = y ~ Int+Ext)

Coefficients:

(Intercept) Int Ext

993.92 8.22 49.71

Please help.
Why am I getting different answers from both ?; 22 May 2013 at 10:19
Markus Gesmann said...: Change you initial parameters from c(0,1,2) to c(100, 1, 2) and see if this has an impact.; 22 May 2013 at 23:14
Ayush Raj Singh said...: Hi Markus,

It worked. Awesome. But could you please explain why it worked ? I just followed your code. I really do not understand how par =c(0,1,2) or par = c(100,1,2) are related to par[1], par[2] and par[3] ?

Thank you for the help.

ayush; 23 May 2013 at 04:38
Markus Gesmann said...: The parameters par[1], par[2], par[3] are initialised via the vector par=c(0,1,2). I believe, you should have used a different optimisation method for your problem, as optim(par=c(0,1,2), min.function, data=dat, method="BFGS") would give you the answer you are looking for. I don't know which fitting method 'lm' is using in the background, but I am sure the answer is in the documentation or online.; 23 May 2013 at 18:58
ayush said...: Ok. But, how does initialization of parameters affects the answer. Acc. to me answer should come out to be the same from both initial values (100 or 0).

Also, what other optimization method would you suggest for my problem.

Thank you.; 24 May 2013 at 09:47
Markus Gesmann said...: Well, optim is looking for minima and what do you do if your function has serval minima and your start close a local one? I think you have to take a text book and look into this in more detail and read the documentation.; 24 May 2013 at 18:55
Ashish Verma said...: Hi Markus,

I am grad student and I have written a custom function for piece wise continuous fit with constraints on boundaries. I have used optim() and I am getting decent results but in order to improve these results I am trying to program in the gradient as well, but when I put the gradient in optim returns the exact same parameters which I give it to initialize. If you have any suggestions please share. Also, I faced same problem with optimx() and lab.optim(). Thanks in advance.; 28 June 2013 at 19:57
Markus Gesmann said...: I suggest you post your question with a small reproducible example to R-help: http://www.r-project.org/mail.html.; 28 June 2013 at 21:54
Arjun Kavallur said...: Hello Markus
The blog you have posted has been very informative and helpful!I have been trying to calculate the maximum likelihood estimate of an ARMA process using optim.
In the process I have to recursively find the error terms(white noise) using a for loop which involves the use of parameters.I want to optimize the sum of the square of the error terms and hence estimate the parameters. I believe that both the processes cannot be done under the same function(fn). If I calculate the for loop under a different function the error message of "parameters" are missing crops up.
Moreover I have found that if we optimized a function using two different parameters an error message of "second parameter" missing with no default comes up.
Could you please advice me on these two issues.
Thanks; 31 January 2014 at 09:35

How to use optim in R

Minimise residual sum of squares

Maximum likelihood

Session Info

9 comments :

Post a Comment

Popular Posts

Blog Archive

My Blog List

ContactForm