torch_arima

ARIMA time series implementation in PyTorch, with optional support for Bayesian inference using the Pyro probabilistic programming library. The following model types are supported:

| Model Type | Location | Description |
| --- | --- | --- |
| ARIMA | ARIMA.ARIMA | torch.nn.Module with ARIMA polynomial coefficients as parameters, a forward method that converts observations to innovations, and a predict method that converts innovations to observations. |
| VARIMA | ARIMA.VARIMA | Same as ARIMA.ARIMA with support for vector innovations and vector observations. |
| Bayesian ARIMA | ARIMA.BayesianARIMA | pyro.nn.PyroModule wrapper around ARIMA.ARIMA with support for priors on all polynomial coefficients and innovations distribution parameters. |
| Bayesian VARIMA | ARIMA.BayesianVARIMA | Same as ARIMA.BayesianARIMA with support for vector innovations and vector observations. |

Installation

For a local (editable) installation, which lets you modify the package source code in place without reinstalling, run

pip install -e .

Tests

Package tests can be executed by running

python -m ARIMA

Additional tests can be executed by running pytest within the directory where the package is installed.

Tutorials

Currently there is only one tutorial:

Unfortunately, most of the basic functionality is covered only by the examples found here and described below, which is less convenient than walking through a tutorial.

Examples

All the examples can be run at once by executing

python -m ARIMA.examples

This will create additional comparisons between the median predictions and confidence intervals of the MLE and Bayesian estimators.

ARIMA Examples

Maximum Likelihood Estimator

Utilizes torch optimizers in order to find the maximum likelihood estimator. Run by executing

python -m ARIMA.examples.mle

The below graphs will be created.
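As an illustration of the idea of finding a maximum likelihood estimator with a torch optimizer, here is a minimal sketch for a toy AR(1) model. It does not use this package's API: the functions `simulate_ar1` and `fit_ar1`, the choice of Adam, and all numeric settings are assumptions made for the example.

```python
import torch


def simulate_ar1(n, phi, sigma, seed=0):
    """Simulate x[i] = phi * x[i-1] + eps[i] with Gaussian innovations (toy data)."""
    torch.manual_seed(seed)
    eps = sigma * torch.randn(n)
    x = torch.zeros(n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + eps[i]
    return x


def fit_ar1(x, steps=500, lr=0.05):
    """Maximize the conditional Gaussian likelihood of an AR(1) model with Adam."""
    phi = torch.zeros((), requires_grad=True)
    log_sigma = torch.zeros((), requires_grad=True)  # optimize log(sigma) so sigma stays positive
    optimizer = torch.optim.Adam([phi, log_sigma], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Observations -> innovations; the Jacobian of this map has unit
        # determinant, so the innovations' density is the data likelihood.
        innovations = x[1:] - phi * x[:-1]
        dist = torch.distributions.Normal(0.0, log_sigma.exp())
        loss = -dist.log_prob(innovations).mean()
        loss.backward()
        optimizer.step()
    return phi.item(), log_sigma.exp().item()


x = simulate_ar1(n=2000, phi=0.6, sigma=0.5)
phi_hat, sigma_hat = fit_ar1(x)
print(f"phi_hat={phi_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```

The estimates should land close to the simulated values of 0.6 and 0.5, up to sampling noise.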

Bayesian Estimator

Utilizes Pyro, which is built on torch, to estimate the Bayesian posterior. Run by executing

python -m ARIMA.examples.bayesian

The Bayesian ARIMA model is described by the directed graph below.

Additionally, the below graphs of the model outputs will be created.

The Bayesian estimator can also estimate missing samples that occur at arbitrary times.

The accuracy of the predicted distribution of the missing samples can be measured using the energy score scoring rule (see equation 22 in Gneiting2007jasa.pdf). In expectation, the energy score is minimized if and only if the predicted distribution is equal to the true distribution. For a fixed prediction, the energy score equals the Euclidean distance between the multivariate prediction and the multivariate observation. This makes the energy score a natural extension of the Euclidean distance for fixed predictions to a scoring rule for non-fixed predicted distributions. The energy score can also be used as the score in a K-Fold cross validation scheme, where different folds have different missing samples.
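The energy score can be estimated from samples of the predicted distribution. The sketch below uses only the standard library and is not the package's implementation: the function name `energy_score` is hypothetical, and it assumes the lower-is-better form with exponent 1, that is, the mean distance from samples to the observation minus half the mean distance between pairs of samples.

```python
import math
import random


def energy_score(samples, observation):
    """Sample-based energy score estimate (lower is better)."""
    m = len(samples)
    # Mean Euclidean distance between predicted samples and the observation.
    to_observation = sum(math.dist(s, observation) for s in samples) / m
    # Mean Euclidean distance between all ordered pairs of predicted samples.
    between_samples = sum(math.dist(a, b) for a in samples for b in samples) / (m * m)
    return to_observation - 0.5 * between_samples


# A fixed prediction reduces to the Euclidean distance: sqrt(3^2 + 4^2) = 5.
fixed_score = energy_score([[1.0, 2.0]] * 10, [4.0, 6.0])
print(fixed_score)  # 5.0

# A predictive distribution centered on the observation scores lower than a shifted one.
random.seed(0)
observation = [0.0, 0.0]
centered = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
shifted = [[random.gauss(3, 1), random.gauss(3, 1)] for _ in range(200)]
score_centered = energy_score(centered, observation)
score_shifted = energy_score(shifted, observation)
print(score_centered < score_shifted)
```

In a K-Fold scheme, one such score would be computed per fold on that fold's held-out samples and then averaged across folds.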

Comparison Between the Maximum Likelihood and Bayesian Estimators

It can be seen that the two estimators have different median predictions. As less observed data is available, the MLE becomes more confident in its predictions, whereas the Bayesian estimator becomes less confident in its predictions, especially for short-term predictions.

Bayesian VARIMA Example

The example can be run by executing

python -m ARIMA.examples.mortality

The Bayesian VARIMA model is described by the directed graph below.

The below graph shows predicted weekly death counts for males and females. The model captures annual periodic changes in mortality and correlations between female and male death counts.

Viewed as a yearly moving sum, the effect of COVID-19 on death counts can be seen more clearly, since the annual periodic changes in mortality are averaged out.
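The averaging effect of a yearly moving sum can be checked on synthetic data. The sketch below is illustrative only: the helper `moving_sum`, the 52-week window, and the sinusoidal weekly series are assumptions, not the example's actual data or code.

```python
import math


def moving_sum(values, window):
    """Sum of each run of `window` consecutive values, one output per full window."""
    if len(values) < window:
        return []
    total = sum(values[:window])
    sums = [total]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        total += values[i] - values[i - window]
        sums.append(total)
    return sums


# Synthetic weekly counts: constant level plus a purely annual (52-week) cycle.
weekly = [100.0 + 10.0 * math.sin(2 * math.pi * k / 52) for k in range(156)]
yearly = moving_sum(weekly, window=52)

# The seasonal swing spans about 20 units in the weekly series and vanishes in the yearly sum.
print(max(weekly) - min(weekly), max(yearly) - min(yearly))
```

Because every window covers one full period, the seasonal component cancels, so any remaining change in the yearly sum reflects a shift in the underlying level.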

The effect of COVID-19 on short-term death count predictions can be visualized by comparing the predictions of a model that did not observe death counts during the COVID-19 pandemic (a.k.a. Pre COVID model) to those of a model that observed the most up-to-date data available (a.k.a. Post COVID model).

The importance of using a VARIMA model, rather than a model composed of two independent ARIMA models (a.k.a. Multiple ARIMA model), can be seen in the graph below: the confidence interval of the VARIMA model is much larger (as it should be) than that of the Multiple ARIMA model, because the VARIMA model correctly captures the correlation between the death counts of females and males.

Design

An ARIMA(p,d,q) time series is defined by the equation (courtesy of Wikipedia)

with $X_i$ being the observations, $\epsilon_i$ being the innovations, and $L$ being the lag operator.
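For reference, the standard lag-operator form of an ARIMA(p,d,q) model can be written in this notation as follows. The coefficient names $\phi_k$ (autoregressive) and $\theta_k$ (moving average) are assumed here rather than taken from the package.

$$\left(1 - \sum_{k=1}^{p} \phi_k L^k\right) (1 - L)^d X_i = \left(1 + \sum_{k=1}^{q} \theta_k L^k\right) \epsilon_i$$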

The determinant of the Jacobian of the transformation from innovations to observations is equal to one since

$$\begin{align} \frac{\partial X_i}{\partial \epsilon_i} &= 1 \text{ for all } i \\\ \frac{\partial X_i}{\partial \epsilon_j} &= 0 \text{ for all } j > i \end{align}$$

These two conditions make the Jacobian a lower triangular matrix with ones on its diagonal, so its determinant is one. This means that the ARIMA transformation can be viewed as a change of random variable from innovations to observations, in which the joint probability density of the innovations is equal to the joint probability density of the observations. This is how the core of the ARIMA module is implemented in Transform.py.
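The unit-determinant property can be verified numerically on a small case. The following is a minimal sketch for an ARMA(1,1) model with zero initial conditions, using only the standard library. It is not the code in Transform.py: the function names merely mirror the forward (observations to innovations) and predict (innovations to observations) roles described earlier, and the coefficients are arbitrary.

```python
def predict(innovations, phi, theta):
    """Innovations -> observations: x[i] = phi*x[i-1] + eps[i] + theta*eps[i-1]."""
    observations = []
    prev_x, prev_eps = 0.0, 0.0
    for eps in innovations:
        x = phi * prev_x + eps + theta * prev_eps
        observations.append(x)
        prev_x, prev_eps = x, eps
    return observations


def forward(observations, phi, theta):
    """Observations -> innovations: the inverse of `predict`."""
    innovations = []
    prev_x, prev_eps = 0.0, 0.0
    for x in observations:
        eps = x - phi * prev_x - theta * prev_eps
        innovations.append(eps)
        prev_x, prev_eps = x, eps
    return innovations


def jacobian(n, phi, theta):
    """Matrix J[i][j] = d x_i / d eps_j; the map is linear, so unit impulses give its columns."""
    columns = [
        predict([1.0 if k == j else 0.0 for k in range(n)], phi, theta)
        for j in range(n)
    ]
    return [[columns[j][i] for j in range(n)] for i in range(n)]


phi, theta = 0.7, -0.3
J = jacobian(4, phi, theta)
for row in J:
    print(row)  # ones on the diagonal, zeros above it

eps = [0.5, -1.0, 0.25, 2.0]
recovered = forward(predict(eps, phi, theta), phi, theta)
```

Since the determinant of a triangular matrix is the product of its diagonal entries, the printed matrix has determinant one, and the round trip through `predict` and `forward` recovers the original innovations.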