Guide.Rmd

---
title: '**mixtureReg**: A Quick Start'
output:
  pdf_document: default
  html_notebook: default
  html_document: default
---

In this tutorial, we are going to show how to use **mixtureReg** to model data with two possible regimes.

# Data
The data used for demostration purpose here is the CO2 data set from the **mixtools** package.
```{r}
library(mixtools)
data("CO2data")
head(CO2data)
```


# A simple example
The motivation of mixture of regressions is that there can be two different regimes in the data so we want to fit two lines through the data.

We can easily achieve this by putting two regression formula into a list and feed it into the **mixtureReg** function.

In this case, the message shows that the model converges in 32 iterations.

```{r}
library(mixtureReg)

mx1 <- mixtureReg(
  regData = CO2data,
  formulaList = list(formula(CO2 ~ GNP),
                     formula(CO2 ~ GNP)),
  mixingProb = "Constant"
)
```

#### The fit
We provide a plot method (S3 method) to visualize the predictions from the model.
The circles below are the original data points and the red lines are predictions from our model.
```{r}
plot(mx1, which = 1)
```

#### The weights
Other than predictions, The mixture of regressions also produces weight estimates for each data point which indicate the posterior probabilities of membership to the regression lines.
We provide another plot method to visualize these weights.

```{r}
plot(mx1, which = 2)
```

#### The iterations
(Mainly for debugging purposes) We also provide a *monitor* component for modelers to learn more about what are happening in iterations.

```{r}
head(mx1$monitor)
```


# Flexible modeling
A nice feature of this package is that we can flexibly specify the formula as we would in **lm**.

For example, we can restrict one regression line to be horizontal with no slope coefficient.

This example also helps to demonstrate the usage of "yName" and "xName" arguments in the plot method.
When not specified, the plot method will search the variables in the first formula, which will not work in this case.

```{r}
mx2 <- mixtureReg(
  regData = CO2data,
  formulaList = list(formula(CO2 ~ 1),
                     formula(CO2 ~ GNP)),
  mixingProb = "Constant"
  )

plot(mx2, yName = "CO2", xName = "GNP", which = 1)
```

We can also specify 2nd order polynomial lines.
```{r}
mx3 <- mixtureReg(
  regData = CO2data,
  formulaList = list(formula(CO2 ~ GNP + I(GNP^2)),
                     formula(CO2 ~ GNP + I(GNP^2))),
  mixingProb = "Constant"
  )

plot(mx3, which = 1)
```

We can also fit three (or more) regressions at the same time.
```{r}
mx4 <- mixtureReg(
  regData = CO2data,
  formulaList = list(formula(CO2 ~ GNP),
                     formula(CO2 ~ GNP),
                     formula(CO2 ~ GNP)),
  mixingProb = "Constant"
  )

plot(mx4, which = 1)
```


# Predictor dependent mixing probabilities
Benaglia et al. (2009), proposed this modelling strategy through kernal smoothing.
In the **mixtools** package this functionalities is offered in **mixtools::regmixEM.loc()**.

In this package we followed the same spirit but implement the smoothing based on R's own **stats::loess()**

To enable this flexibility, simply specify it in the function call.

```{r}
mx5 <- mixtureReg(
  regData = CO2data,
  formulaList = list(formula(CO2 ~ GNP),
                     formula(CO2 ~ GNP)),
  mixingProb = "loess"
  )

plot(mx5, which = 1)
plot(mx5, which = 2)
```

# Extra plotting power
Using **plot.mixtureRegList()** function, we can compare two or more models.

For example, let's gather a few models and overlay them on top of each other.

```{r}
plot.mixtureRegList(mixtureRegList = list(mx1, mx3, mx4, mx5),
                    xName = "GNP", yName = "CO2")
```

Another example.
Here we fit two models using two subsets of the data.

```{r}
plot.mixtureRegList(
  mixtureRegList = list(
    "mx6" = mixtureReg(
      regData = CO2data[1:14, ],
      formulaList = list(formula(CO2 ~ GNP),
                         formula(CO2 ~ GNP)),
      mixingProb = "Constant"
    ),
    "mx7" = mixtureReg(
      regData = CO2data[15:28, ],
      formulaList = list(formula(CO2 ~ GNP),
                         formula(CO2 ~ GNP)),
      mixingProb = "Constant"
    )
  ),
  xName = "GNP", yName = "CO2")
```


# Comparison with **mixtools**
The main shortcoming of **mixtools** is that it doesn't provide easy to use options to restrict model coefficients like we do in model 2.

The following is an example from Tatiana Benaglia, Didier Chauveau, David R. Hunter, Derek Young (2009).
This example produces similar results with our model 1.

```{r}
compare1 <- mixtools::regmixEM(
  CO2data$CO2, CO2data$GNP,
  lambda = c(1/4, 3/4),
  beta = matrix(c(2, 0, 0, 1), 2, 2),
  sigma = c(1,1)
  )

plot(compare1, whichplots = 2)
```

# References

de Veaux RD (1989). "Mixtures of Linear Regressions." Computational Statistics and Data Analysis, 8, 227-245.

Tatiana Benaglia, Didier Chauveau, David R. Hunter, Derek Young (2009). mixtools: An R Package for Analyzing Finite Mixture Models. Journal of Statistical Software, 32(6), 1-29. URL http://www.jstatsoft.org/v32/i06/.