-
Notifications
You must be signed in to change notification settings - Fork 3
/
Guide.Rmd
180 lines (138 loc) · 5.03 KB
/
Guide.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
---
title: '**mixtureReg**: A Quick Start'
output:
pdf_document: default
html_notebook: default
html_document: default
---
In this tutorial, we are going to show how to use **mixtureReg** to model data with two possible regimes.
# Data
The data used for demostration purpose here is the CO2 data set from the **mixtools** package.
```{r}
library(mixtools)
data("CO2data")
head(CO2data)
```
# A simple example
The motivation of mixture of regressions is that there can be two different regimes in the data so we want to fit two lines through the data.
We can easily achieve this by putting two regression formula into a list and feed it into the **mixtureReg** function.
In this case, the message shows that the model converges in 32 iterations.
```{r}
library(mixtureReg)
mx1 <- mixtureReg(
regData = CO2data,
formulaList = list(formula(CO2 ~ GNP),
formula(CO2 ~ GNP)),
mixingProb = "Constant"
)
```
#### The fit
We provide a plot method (S3 method) to visualize the predictions from the model.
The circles below are the original data points and the red lines are predictions from our model.
```{r}
plot(mx1, which = 1)
```
#### The weights
Other than predictions, The mixture of regressions also produces weight estimates for each data point which indicate the posterior probabilities of membership to the regression lines.
We provide another plot method to visualize these weights.
```{r}
plot(mx1, which = 2)
```
#### The iterations
(Mainly for debugging purposes) We also provide a *monitor* component for modelers to learn more about what are happening in iterations.
```{r}
head(mx1$monitor)
```
# Flexible modeling
A nice feature of this package is that we can flexibly specify the formula as we would in **lm**.
For example, we can restrict one regression line to be horizontal with no slope coefficient.
This example also helps to demonstrate the usage of "yName" and "xName" arguments in the plot method.
When not specified, the plot method will search the variables in the first formula, which will not work in this case.
```{r}
mx2 <- mixtureReg(
regData = CO2data,
formulaList = list(formula(CO2 ~ 1),
formula(CO2 ~ GNP)),
mixingProb = "Constant"
)
plot(mx2, yName = "CO2", xName = "GNP", which = 1)
```
We can also specify 2nd order polynomial lines.
```{r}
mx3 <- mixtureReg(
regData = CO2data,
formulaList = list(formula(CO2 ~ GNP + I(GNP^2)),
formula(CO2 ~ GNP + I(GNP^2))),
mixingProb = "Constant"
)
plot(mx3, which = 1)
```
We can also fit three (or more) regressions at the same time.
```{r}
mx4 <- mixtureReg(
regData = CO2data,
formulaList = list(formula(CO2 ~ GNP),
formula(CO2 ~ GNP),
formula(CO2 ~ GNP)),
mixingProb = "Constant"
)
plot(mx4, which = 1)
```
# Predictor dependent mixing probabilities
Benaglia et al. (2009), proposed this modelling strategy through kernal smoothing.
In the **mixtools** package this functionalities is offered in **mixtools::regmixEM.loc()**.
In this package we followed the same spirit but implement the smoothing based on R's own **stats::loess()**
To enable this flexibility, simply specify it in the function call.
```{r}
mx5 <- mixtureReg(
regData = CO2data,
formulaList = list(formula(CO2 ~ GNP),
formula(CO2 ~ GNP)),
mixingProb = "loess"
)
plot(mx5, which = 1)
plot(mx5, which = 2)
```
# Extra plotting power
Using **plot.mixtureRegList()** function, we can compare two or more models.
For example, let's gather a few models and overlay them on top of each other.
```{r}
plot.mixtureRegList(mixtureRegList = list(mx1, mx3, mx4, mx5),
xName = "GNP", yName = "CO2")
```
Another example.
Here we fit two models using two subsets of the data.
```{r}
plot.mixtureRegList(
mixtureRegList = list(
"mx6" = mixtureReg(
regData = CO2data[1:14, ],
formulaList = list(formula(CO2 ~ GNP),
formula(CO2 ~ GNP)),
mixingProb = "Constant"
),
"mx7" = mixtureReg(
regData = CO2data[15:28, ],
formulaList = list(formula(CO2 ~ GNP),
formula(CO2 ~ GNP)),
mixingProb = "Constant"
)
),
xName = "GNP", yName = "CO2")
```
# Comparison with **mixtools**
The main shortcoming of **mixtools** is that it doesn't provide easy to use options to restrict model coefficients like we do in model 2.
The following is an example from Tatiana Benaglia, Didier Chauveau, David R. Hunter, Derek Young (2009).
This example produces similar results with our model 1.
```{r}
compare1 <- mixtools::regmixEM(
CO2data$CO2, CO2data$GNP,
lambda = c(1/4, 3/4),
beta = matrix(c(2, 0, 0, 1), 2, 2),
sigma = c(1,1)
)
plot(compare1, whichplots = 2)
```
# References
de Veaux RD (1989). "Mixtures of Linear Regressions." Computational Statistics and Data Analysis, 8, 227-245.
Tatiana Benaglia, Didier Chauveau, David R. Hunter, Derek Young (2009). mixtools: An R Package for Analyzing Finite Mixture Models. Journal of Statistical Software, 32(6), 1-29. URL http://www.jstatsoft.org/v32/i06/.