-
Notifications
You must be signed in to change notification settings - Fork 17
/
intro_r_session5_slides.Rmd
199 lines (150 loc) · 5.32 KB
/
intro_r_session5_slides.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
---
title: 'Intro to R Workshop: Session 5'
output: slidy_presentation
smaller: true
date: "April 13, 2017"
subtitle: UCI Data Science Initiative
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
## Session 5 Agenda
+ More Plotting (Line Charts)
+ Logistic Regression
## Line Charts
```{r echo=TRUE}
cars = c(2,3,4,3,5)
plot(cars, type = "o")
```
## Line Charts - Change Some Attributes
```{r echo=TRUE}
plot(cars, type="o", col="red", ylim=c(0.5,5),
xlab="Months", ylab="Sales in Millions",
main="Autos Sales")
```
## Line Charts - Add Another Line
```{r echo=TRUE}
plot(cars, type="o", col="red", ylim=c(0.5,5),
xlab="Months", ylab="Sales in Millions",
main="Autos Sales")
suvs = c(1,4,3,3,4)
lines(suvs, type="o", col="blue", lty=2, pch=6)
```
## Line Charts - Add a Horizontal Line
```{r echo=TRUE}
plot(cars, type="o", col="red", ylim=c(0.5,5),
xlab="Months", ylab="Sales in Millions",
main="Autos Sales")
lines(suvs, type="o", col="blue", lty=2, pch=6)
abline(h=c(3,4), lty=2, col="darkgrey") # h for horizontal
```
## Line Charts - Add Legend
```{r echo=TRUE}
plot(cars, type="o", col="red", ylim=c(0.5,5),
xlab="Months", ylab="Sales in Millions",
main="Autos Sales")
lines(suvs, type="o", col="blue", lty=2, pch=6)
abline(h=c(3,4), lty=2, col="darkgrey") # h for horizontal
legend(4.2, 1.2, c("cars","suvs"), col=c("red","blue"), lty=1:2, pch=c(1,6), bty="n")
```
## Logistic Regression - Data
+ `glm()` is used to fit generalized linear models (logistic regression in this example).
+ Logistic regression is used to model the probability of a binary outcome.
+ For instance, say we want to predict female labor force participation using the `Mroz` data.
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
library(car)
data(Mroz) # load Mroz data
str(Mroz)
```
## Logistic Regression - Data Description
+ lfp: female labor-force participation (factor with levels no/yes)
+ k5: number of children 5 years old or younger
+ k618: number of children 6 to 18 years old
+ age: in years
+ wc: wife's college attendance (factor with levels no/yes)
+ hc: husband's college attendance (factor with levels no/yes)
+ lwg: log expected wage rate
+ for women in the labor force, the actual wage rate
+ for women not in the labor force, an imputed value
+ inc: family income exclusive of wife's income
## Logistic Regression - Exploratory Data Analysis (EDA)
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
summary(Mroz)
```
## Logistic Regression - EDA
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
barplot(table(Mroz$lfp), col = "blue",
main = "Count of Females by Labor Force Participation",
xlab = "Labor Force Participation Status", ylab = "Number of Females")
```
## Logistic Regression - EDA
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
counts <- table(Mroz$lfp, Mroz$k5)
barplot(counts, col = c("blue","red"),
main = "Labor Force Participation by # of Children < 5",
xlab = "# of Children < 5", ylab = "Count",
legend = rownames(counts))
```
## Logistic Regression - EDA
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
counts <- table(Mroz$lfp, Mroz$k5)
barplot(counts, col = c("blue","red"),
main = "Labor Force Participation by # of Children < 5",
xlab = "# of Children < 5", ylab = "Count",
legend = rownames(counts), beside = T)
```
## Logistic Regression - EDA
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
plot(Mroz$age, Mroz$lwg, col = ifelse(Mroz$lfp == "yes", "red", "blue"),
main = "Age vs. Log Wage by LFP", xlab = "Age (Years)", ylab = "Wage")
legend(x = 50, y = -1.5, legend = c("in LF", "Out of LF"),
fill = c("red", "blue"), bty = "n")
```
## Logistic Regression - Model Fit
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
fitLogistic <- glm(lfp ~ k5 + age, family=binomial(logit), data=Mroz)
fitLogistic
```
## Logistic Regression - Model Fit
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
summary(fitLogistic)
```
## Logistic Regression - Model Fit
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
names(fitLogistic)
```
## Logistic Regression - Exponentiated CIs
+ 95% CI for exp(coefficients) (profile likelihood method)
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
exp(confint(fitLogistic, level=0.95))
```
+ 95% CI for exp(coefficients) (Wald confidence interval)
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
exp(confint.default(fitLogistic, level=0.95))
```
## Logistic Regression - Predictions
To get probabilities, be sure to use type = "response"
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
head(Mroz,1)
XB <- as.vector(head(predict(fitLogistic),1))
XB
p <- as.vector(head(predict(fitLogistic, type = "response"),1))
p
exp(XB)/(1 + exp(XB)) # inverse logit function
```
## Logistic Regression - Updates
+ Update model by adding 'inc' and 'lwg'
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
fitLogistic2 = update(fitLogistic, . ~ . + inc + lwg, data=Mroz)
```
+ After update
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
fitLogistic2
```
## Model Comparison
+ Use change of deviance of fitted model
```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
anova(fitLogistic, fitLogistic2, test='LRT')
```
## End of Session 5
Break