Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

About dof_residual #509

Closed
gragusa opened this issue Nov 16, 2022 · 3 comments · Fixed by #560
Closed

About dof_residual #509

gragusa opened this issue Nov 16, 2022 · 3 comments · Fixed by #560
Milestone

Comments

@gragusa
Copy link

gragusa commented Nov 16, 2022

I noticed something odd about dof_residual. In few words, when the family does not have a dispersion_parameter, dof_residual is bumped by 1which is inconsistent with other software (R, Stata, etc.) and with statistics more generally 😄.

using DataFrames
using StableRNGs
using RCall

rng = StableRNG(123);

x1 = rand(rng, 25);
x2 = ifelse.(randn(rng, 25) .> 0, 1, 0);
y = ifelse.(0.004 .- 0.01 .* x1 .+ 1.5 .* x2 .+ randn(rng, 25) .> 0, 1, 0);
df = DataFrame(y=y, x1=x1, x2=x2);

## Julia
l = glm(@formula(y~x1+x2), df, Binomial());

## R
@rput df;
R"rl <- glm(y~x1+x2, df, family='binomial')";
@rget rl;

println("Julia dof_residual:", dof_residual(l))
println("R dof_residual: ", rl[:df_residual])

which gives

Julia dof_residual: 23
R dof_residual: 22

The families without a dispersion_parameter are Bernoulli, Binomial, Poisson:

dispersion_parameter(D) = true
dispersion_parameter(::Union{Bernoulli, Binomial, Poisson}) = false

Fortunately, dof_residuals is not used to scale the vcov and so this bug does not have important ramifications. I also noted that we are not testing dof_residual for these families.

The fix is relatively simple: change

dof_residual(obj::LinPredModel) = nobs(obj) - dof(obj) + 1

to

dof_residual(obj::LinPredModel) = nobs(obj) - dof(obj) + dispersion_parameter(obj.rr.d)
@nalimilan
Copy link
Member

Good catch. Actually this definition seems to have been there forever, though the implementation has been changed recently by #265. We should probably add a docstring to explain more precisely what this method returns as it's not necessarily obvious for users how the dispersion parameter must be handled (R doesn't say anything).

Cc: @andreasnoack @Nosferican

@alfa33333
Copy link

What is dof returning? I had a simple linear regression model which in principle, should have only one intercept and one predictor, but dof returns 3., although dof_residuals returns the right value.

From the discussion, it seems to add a third parameter, the dispersion, but this is not clear from the documentation, especially for linear regression models.

@andreasnoack
Copy link
Member

See also #560

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants