You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
glm.nb(formula = daysabs ~ math + prog, data = dat, init.theta = 1.032713156,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.1547 -1.0192 -0.3694 0.2285 2.5273
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.615265 0.197460 13.245 < 2e-16
math -0.005993 0.002505 -2.392 0.0167
progAcademic -0.440760 0.182610 -2.414 0.0158
progVocational -1.278651 0.200720 -6.370 1.89e-10
Python code:
##########################################################
#
# GLM with negative binomial distribution
#
# This is an example of GLM with negative binomial distribution.
# In this example, we will be using an example from R community below
# https://stats.idre.ucla.edu/r/dae/negative-binomial-regression/
#
# Here, we would like to predict the days absense of high school juniors
# at two schools from there type of program they are enrolled, and their math
# score.
##########################################################
# Import relevance libraries
import pandas as pd
from pyglmnet import GLM
import matplotlib.pyplot as plt
##########################################################
# Read and preprocess data
df = pd.read_stata("https://stats.idre.ucla.edu/stat/stata/dae/nb_data.dta")
print(df.head())
# histogram of type of program they are enrolled
df.hist(column='daysabs', by=['prog'])
plt.show()
# print mean and standard deviation for each program enrolled
print(df.groupby('prog').agg({'daysabs': ['mean', 'std']}))
##########################################################
# Feature
X = df.drop('daysabs', axis=1)
y = df['daysabs'].values
# design matrix
program_df = pd.get_dummies(df.prog)
Xdsgn = pd.concat((df['math'], program_df.drop(1.0, axis=1)), axis=1).values
##########################################################
# Predict using GLM
# theta = 0.968
theta = 1.033
# theta = 5.
distr = 'neg-binomial'
# distr = 'softplus'
glm_neg_bino = GLM(distr=distr,
alpha=0.0, reg_lambda=0.0, max_iter=1000, solver='cdfast',
score_metric='pseudo_R2', theta=theta, learning_rate=10)
glm_neg_bino.fit(Xdsgn, y)
yhat = glm_neg_bino.predict(Xdsgn)
print(glm_neg_bino.beta0_, glm_neg_bino.beta_)
print(glm_neg_bino.score(Xdsgn, yhat))
The text was updated successfully, but these errors were encountered:
Dataset: https://stats.idre.ucla.edu/r/dae/negative-binomial-regression/
R code:
Output:
Python code:
The text was updated successfully, but these errors were encountered: