This project shows how to find anomalies in financial time series data, specifically the stock prices of Apple (AAPL), using an LSTM Autoencoder. Stock price anomalies may signal major market events such as crashes, surges in volatility, or other unusual activity. The model identifies these anomalies from its reconstruction error, which is high for patterns that deviate from historical trends.
- LSTM (Long Short-Term Memory): A type of Recurrent Neural Network (RNN) ideal for time-series data.
- Autoencoder: A neural network used for unsupervised learning of data representations through compression and reconstruction.
- Anomaly Detection: Identifying data points that differ significantly from the expected behavior in a time-series.
- data/: Folder containing the processed data files.
- notebooks/: Jupyter Notebooks for the whole project.
- model/: Saved model (lstm_autoencoder_model.h5) for anomaly detection.
- images/: Data visualizations generated from the model.
- README.md: Project documentation.
- Python Version: 3.11.10
- Required libraries:
  - numpy
  - pandas
  - tensorflow
  - yfinance
  - scikit-learn
  - matplotlib
  - talib
To install the required libraries, run:
pip install -r requirements.txt

The stock price data for Apple (AAPL) was collected from Yahoo Finance using the yfinance library. The dataset includes the following features:
- Open: The opening price of the stock.
- High: The highest price of the stock.
- Low: The lowest price of the stock.
- Close: The closing price of the stock.
- Adj Close: The adjusted closing price of the stock.
- Volume: The total trading volume.
Additionally, several technical indicators were calculated using the TA-Lib library:
- MACD: Moving Average Convergence Divergence
- RSI: Relative Strength Index
- SMA_20: 20-period Simple Moving Average
- EMA_20: 20-period Exponential Moving Average
- ADX: Average Directional Index
These indicators give the model additional features to train on.
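As a rough illustration of what two of these indicators compute, the sketch below derives SMA_20 and EMA_20 with pandas instead of TA-Lib. This is a swapped-in approach chosen for readability, not the project's own code: the helper name `add_moving_averages` and the synthetic prices are hypothetical, and TA-Lib may seed its EMA differently, so early EMA values can differ from this version.

```python
import numpy as np
import pandas as pd


def add_moving_averages(df: pd.DataFrame, period: int = 20) -> pd.DataFrame:
    """Return a copy of df with SMA and EMA columns computed from 'Close'."""
    out = df.copy()
    # Simple moving average: plain mean over the last `period` rows.
    out[f"SMA_{period}"] = out["Close"].rolling(window=period).mean()
    # Exponential moving average: recent rows weigh more (pandas recursion).
    out[f"EMA_{period}"] = out["Close"].ewm(span=period, adjust=False).mean()
    return out


# Synthetic random-walk prices stand in for the real AAPL data (assumption).
rng = np.random.default_rng(0)
prices = pd.DataFrame({"Close": 150 + rng.normal(0, 1, 60).cumsum()})
features = add_moving_averages(prices)
```

The first 19 rows of SMA_20 are NaN because a full 20-row window is not yet available; rows like these would need to be dropped before scaling.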
The following preprocessing steps were applied to the data:
- Scaling: The data was scaled using MinMaxScaler from sklearn to ensure all features are in the range [0, 1].
- Sequence Creation: Time series data was converted into sequences of length 30 to use them as inputs.
- Train-Test Split: The data was split into training and testing sets using train_test_split.
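The preprocessing steps above can be sketched as follows. This is a minimal example under stated assumptions rather than the notebook's actual code: the helper `create_sequences`, the random placeholder data with 11 feature columns, the 20% test size, and `shuffle=False` are all illustrative choices that the text above does not specify.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

SEQ_LEN = 30  # sequence length stated above


def create_sequences(values: np.ndarray, seq_len: int = SEQ_LEN) -> np.ndarray:
    """Stack overlapping windows into shape (n_windows, seq_len, n_features)."""
    n_windows = len(values) - seq_len + 1
    return np.stack([values[i:i + seq_len] for i in range(n_windows)])


# Placeholder data: 200 rows, 11 feature columns (assumption, not real prices).
rng = np.random.default_rng(42)
raw = rng.normal(size=(200, 11))

# Scale every feature column into [0, 1].
scaler = MinMaxScaler()
scaled = scaler.fit_transform(raw)

# Convert the scaled rows into 30-step sequences.
sequences = create_sequences(scaled)

# shuffle=False keeps chronological order; test_size is an assumed value.
X_train, X_test = train_test_split(sequences, test_size=0.2, shuffle=False)
```

The order here simply mirrors the steps listed above. Fitting the scaler on the training portion only would avoid leaking test-set value ranges into training.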
An LSTM Autoencoder architecture was used to reconstruct the input time series data and detect anomalies. The model consists of:
- Encoder: LSTM layers to compress the input sequences into a latent space representation.
- Decoder: LSTM layers to reconstruct the original sequences from the latent space.
- Reconstruction Loss: The reconstruction error (difference between original and reconstructed data) is used to identify anomalies.
- LSTM units: 128 and 64 units for both the encoder and decoder layers.
- Batch Size: 64
- Epochs: 50
- Activation function: ReLU for the encoder and decoder layers.
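One way the architecture and hyperparameters above could be expressed in Keras is sketched below. The layer sizes and ReLU activation come from the list above; everything else is an assumption for illustration, including the function name `build_lstm_autoencoder`, the feature count of 11, the 128-to-64 then 64-to-128 layer ordering, the `RepeatVector` bridge, the `TimeDistributed(Dense)` output layer, the Adam optimizer, and the MSE loss.

```python
from tensorflow.keras import layers, models


def build_lstm_autoencoder(timesteps: int = 30, n_features: int = 11):
    """Build an LSTM autoencoder that reconstructs its own input sequences."""
    model = models.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        # Encoder: compress each sequence into a 64-dimensional vector.
        layers.LSTM(128, activation="relu", return_sequences=True),
        layers.LSTM(64, activation="relu", return_sequences=False),
        # Repeat the latent vector once per timestep for the decoder.
        layers.RepeatVector(timesteps),
        # Decoder: expand back to a full sequence.
        layers.LSTM(64, activation="relu", return_sequences=True),
        layers.LSTM(128, activation="relu", return_sequences=True),
        # Map each timestep back to the original feature dimension.
        layers.TimeDistributed(layers.Dense(n_features)),
    ])
    # Optimizer and loss are assumed; the text above does not name them.
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_lstm_autoencoder()
```

Training would then pass the same array as input and target, for example `model.fit(X_train, X_train, epochs=50, batch_size=64)`, where `X_train` is a hypothetical array of training sequences.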
Anomalies are detected based on the reconstruction error. A threshold is defined to classify points with higher reconstruction errors as anomalies. The threshold was set by evaluating the reconstruction error distribution on the test set.
- Reconstruction Error: The model computes the reconstruction error for each data point.
- Anomaly Threshold: A threshold is set based on the distribution of reconstruction errors.
- Flag Anomalies: Points with reconstruction errors exceeding the threshold are flagged as anomalies.
- Reconstruction Error Plot: Visualizes the reconstruction error for each data point in the test set.
- Anomaly Plot: Shows detected anomalies along with normal data points.
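The detection steps above (compute the error, set a threshold, flag points) can be sketched with NumPy. The text does not state the exact error metric or threshold rule, so this example assumes a per-sequence mean absolute error and a threshold of mean plus three standard deviations; the function names and the synthetic arrays are hypothetical.

```python
import numpy as np


def reconstruction_errors(original: np.ndarray, reconstructed: np.ndarray) -> np.ndarray:
    """Mean absolute error per sequence, averaged over timesteps and features."""
    return np.mean(np.abs(original - reconstructed), axis=(1, 2))


def flag_anomalies(errors: np.ndarray, n_std: float = 3.0):
    """Flag errors above mean + n_std * std (an assumed threshold rule)."""
    threshold = errors.mean() + n_std * errors.std()
    return errors > threshold, threshold


# Synthetic stand-ins for test sequences and model reconstructions.
rng = np.random.default_rng(0)
X_test = rng.random((100, 30, 11))
X_pred = X_test + rng.normal(0, 0.01, X_test.shape)
X_pred[7] += 0.5  # simulate one badly reconstructed sequence

errors = reconstruction_errors(X_test, X_pred)
is_anomaly, threshold = flag_anomalies(errors)
```

A percentile of the error distribution (for example the 95th or 99th) is a common alternative to the mean-plus-standard-deviation rule and is less sensitive to a few extreme errors.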
In the test set, 213 anomalies were detected. These may indicate unusual market behavior, significant price shifts, or elevated volatility.
Example output visualizations:
- Training and Validation Loss for Dataset:

- Reconstruction Error for Test Data:

- Example Anomalies Detected:

This project demonstrates how an LSTM Autoencoder can be effectively used for anomaly detection in financial time series data. The model successfully identifies potential anomalies in Apple stock prices, which can be useful for detecting market events like crashes or abnormal price movements.
While the model's performance could be evaluated more rigorously against ground-truth labels where they are available, the unsupervised nature of the approach makes it valuable for real-world financial data analysis, where labeled anomalies are often scarce.
- Hyperparameter Tuning: Experiment with different architectures, LSTM units, batch sizes, and epochs to optimize the model.
- Out-of-Sample Testing: Test the model on data from other companies or market segments to evaluate generalization.
- Advanced Anomaly Detection: Implement other techniques, such as Isolation Forests or autoencoder variants, for anomaly detection.
- Clone the repository:
  git clone https://github.com/pratycodes/stock_sentry.git
- Install dependencies:
  pip install -r requirements.txt
- Run the notebook or script:
  - For Jupyter Notebook:
    jupyter notebook
  - Or run the Python script for model training and anomaly detection.
- Visualize results and interpret anomalies in the output graphs.
This project is licensed under the MIT License - see the LICENSE file for details.