
possible solution to lmfit-pml issue? #277

Open
arunpersaud opened this issue Jun 23, 2021 · 4 comments


@arunpersaud

This is in regard to the fitting problem described in #251, but since it doesn't relate to minuit, a new issue is probably the best place to post this.

Also, just as a note: I don't really understand the details of this issue, so I'm not sure if the comment here actually applies or if it's just a random coincidence that makes things look as if they work ;)

I noticed that when I install numdifftools (an optional dependency for lmfit), I do get error estimates for lmfit-pml after changing

calc_covar=True

around line 660 in core/fitting.py.

The errors are too big though (I get 1000 +- 1100).

To get something reasonable, I had to change

scale_covar=False,

and also add a factor of 2 to

diff = (model - scipy.special.xlogy(data, model))*2

here.
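
For illustration, here is a standalone sketch of those three ingredients used together with lmfit directly. This is not the becquerel code; the constant 'mu' model is made up purely to show the mechanics, and numdifftools must be installed for the covariance step:

import numpy as np
import scipy.special
import lmfit

rng = np.random.default_rng(1)
data = rng.poisson(20.0, size=200).astype(float)

def neg2loglike(params):
    # twice the (abbreviated) Poisson negative log likelihood, as in the change above
    model = np.full_like(data, params['mu'].value)
    return 2 * np.sum(model - scipy.special.xlogy(data, model))

params = lmfit.Parameters()
params.add('mu', value=10.0, min=1e-6)
result = lmfit.minimize(neg2loglike, params, method='nelder',
                        calc_covar=True, scale_covar=False)
print(result.params['mu'].value, result.params['mu'].stderr)
# expect roughly 20 +- sqrt(20 / 200) ~ 0.32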

I'm especially not sure about the factor of 2, but I think for Gaussians there is a factor of two between the likelihood and chi-squared... so perhaps there is something similar here?

This gives:

A = 1001.4240078859714 +- 31.671526225310576

This is for the example from GitHub issue #251 and seems to match the expected value quite well!

Perhaps it is worthwhile to make numdifftools an explicit dependency for becquerel. If the other changes make sense to you, perhaps one could use this to fix lmfit-pml.

@cosama
Contributor

cosama commented Jun 23, 2021

I wrote that code a while back and don't remember all the details. I think lmfit-pml produced weird uncertainties when numdifftools was installed, so I decided to disable any uncertainty calculation altogether; that is probably why the calc_covar parameter is False. If it is True, the covariance matrix is calculated. To do that, lmfit uses numerical differentiation (which is why the numdifftools package is needed) to calculate the Fisher information (see https://en.wikipedia.org/wiki/Fisher_information) and then uses the Cramér-Rao bound (see the same Wikipedia article further down) to estimate the covariance matrix. The square roots of the diagonal elements of the covariance matrix are the uncertainties of the respective parameters.
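
A minimal standalone sketch of that procedure, assuming a one-parameter constant model purely for illustration and using numdifftools directly:

import numpy as np
import numdifftools as nd
import scipy.special

rng = np.random.default_rng(0)
data = rng.poisson(50.0, size=100).astype(float)

def neg_log_like(theta):
    # abbreviated Poisson negative log likelihood, summed over bins
    mu = theta[0]
    return np.sum(mu - scipy.special.xlogy(data, mu))

theta_hat = np.array([data.mean()])            # MLE of the constant model
fisher = nd.Hessian(neg_log_like)(theta_hat)   # observed Fisher information
covar = np.linalg.inv(fisher)                  # Cramer-Rao: covariance ~ inverse Fisher information
print(theta_hat[0], np.sqrt(covar[0, 0]))      # roughly 50 +- sqrt(50 / 100)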

The expression in

diff = model - scipy.special.xlogy(data, model)

(essentially -log[f(x, theta)] from the Wikipedia article) is an abbreviated form though: a proper negative log likelihood has an additional sum_i log(k_i!) term (see https://statlect.com/fundamentals-of-statistics/Poisson-distribution-maximum-likelihood). I am not sure whether that affects the Fisher information, though; I think it shouldn't, as that term does not depend on the parameters and disappears during differentiation.
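
For reference, the full Poisson negative log likelihood summed over bins is

    -log L(theta) = sum_i [ model_i(theta) - data_i * log(model_i(theta)) + log(data_i!) ]

and only the first two terms depend on theta, which is what the code above keeps.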

Your observation is interesting, because I remember that lmfit actually has the assumption of Gaussian statistics baked into the method that calculates the uncertainties; that is another reason, I think, why I disabled it. However, what you found suggests that by adding a factor of two we recover, as you say, something that is chi^2 distributed (I think that would make sense; Mark might comment on this), and thus produces the correct answer (or close to it) for a large number of counts in each bin, since Poisson statistics approach Gaussian statistics in that limit.

I think we can enable this method with the suggestions you posted above and maybe add a warning about to what extent the respective uncertainties can be trusted.

As a general note, Jayson integrated iminuit into the fitting routine, which seems to produce more reliable fits (better convergence), and the iminuit-pml backend should give you the correct answer in the Poisson sense. I would be happy to hear if you can confirm this.

@markbandstra
Member

I'm especially not sure about the factor of 2, but I think for Gaussians there is a factor of two between the likelihood and chi-squared... so perhaps there is something similar here?

That would be my guess. It looks like lmfit usually works with residuals and minimizes the sum of their squares, just like scipy.optimize.leastsq. The chi-squared statistic is exactly this sum, and in Gaussian statistics the chi-squared statistic is 2 times the negative log likelihood, up to an additive constant. Likewise, model - scipy.special.xlogy(data, model) is the Poisson negative log likelihood, up to an additive constant, so including the factor of 2 makes sense if we are to "hack" lmfit.
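
In symbols: chi^2 = sum_i (data_i - model_i)^2 / sigma_i^2 = -2 log L_Gaussian + const, while sum_i [model_i - data_i * log(model_i)] = -log L_Poisson + const, so multiplying the Poisson expression by 2 puts it on the same scale as the chi-squared that lmfit's uncertainty machinery assumes.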

It would probably be easiest to resolve this issue if someone can put together a minimal working example (MWE) that demonstrates it, like fitting a constant model to a few simulated data points.

@arunpersaud
Author

I used the code in issue #251, which can serve as an MWE. Something like:

import numpy as np
import becquerel as bq

np.random.seed(2021)
samples = np.random.normal(size=1000)
counts, edges = np.histogram(samples, bins=100)

fitter = bq.Fitter(
    model=['gauss'],
    x=edges[:-1],
    y=counts,
    y_unc=np.sqrt(counts)
)
fitter.fit(backend='lmfit-pml')

print(fitter.param_val('gauss_amp') / np.diff(edges)[0])  # fitted amplitude divided by the bin width
print(fitter.param_unc('gauss_amp') / np.diff(edges)[0])  # and its uncertainty

@markbandstra
Member

Thanks @arunpersaud

@cosama, as we discussed at the meeting today, there is an easy fix: include the factor of 2 in the reduce_fcn here.

Another fix could be to use the deviance residuals. Not sure if this will work properly, but it would involve modifying this line

diff = model - scipy.special.xlogy(data, model)

to

diff = np.sign(data - model) * np.sqrt(2 * (scipy.special.xlogy(data, data / model) - data + model))

where the factor of 2 inside the square root accounts for the difference between the negative log likelihood and chi-squared. The reference for the deviance residual is McCullagh & Nelder, Generalized Linear Models, 2nd Ed., 1989:

[screenshot of the deviance residual definition from McCullagh & Nelder]
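
For what it's worth, a quick numerical check (standalone, not becquerel code) that the sum of squared deviance residuals equals twice the difference in Poisson negative log likelihood between the fitted model and the saturated model (i.e. the deviance):

import numpy as np
import scipy.special

rng = np.random.default_rng(0)
data = rng.poisson(10.0, size=50).astype(float)
model = np.full_like(data, 10.0)          # some fitted model, constant here for simplicity

dev_resid = np.sign(data - model) * np.sqrt(
    2 * (scipy.special.xlogy(data, data / model) - data + model)
)

def nll(mu):
    # Poisson negative log likelihood, up to the data-only log(k!) term
    return np.sum(mu - scipy.special.xlogy(data, mu))

deviance = 2 * (nll(model) - nll(data))   # saturated model: model == data
assert np.allclose(np.sum(dev_resid**2), deviance)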
