Anomaly Detection in Stock Prices using LSTM Autoencoder

Project Overview

This project shows how to detect anomalies in financial time series data, specifically the stock prices of Apple (AAPL), using an LSTM Autoencoder. Anomalies in stock prices may signal major market events such as crashes, volatility spikes, or other unusual activity. The model identifies these anomalies through reconstruction error: sequences that the model reconstructs poorly deviate from the patterns it learned from historical data.

Key Concepts:

LSTM (Long Short-Term Memory): A type of Recurrent Neural Network (RNN) well suited to time-series data because it can capture long-range dependencies.
Autoencoder: A neural network used for unsupervised learning of data representations through compression and reconstruction.
Anomaly Detection: Identifying data points that differ significantly from the expected behavior in a time series.

Project Structure

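data/ : Folder containing the processed data files.
notebooks/ : Jupyter notebooks covering the full workflow.
model/ : Saved model (lstm_autoencoder_model.h5) for anomaly detection.
images/ : Visualizations generated from the model's results.
main.py : Python script for model training and anomaly detection.
requirements.txt : List of required Python libraries.
LICENSE : MIT License text.
README.md : Project documentation.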

Requirements

Python Version: 3.11.10
Required libraries:
- numpy
- pandas
- tensorflow
- yfinance
- scikit-learn
- matplotlib
- talib

To install the required libraries, run:

```
pip install -r requirements.txt
```

Data Collection

The stock price data for Apple (AAPL) was collected from Yahoo Finance using the yfinance library. The dataset includes the following features:

Open: The opening price of the stock.
High: The highest price of the stock during the trading period.
Low: The lowest price of the stock during the trading period.
Close: The closing price of the stock.
Adj Close: The closing price adjusted for stock splits and dividends.
Volume: The total trading volume.

Additionally, several technical indicators were calculated using the TA-Lib library:

MACD: Moving Average Convergence Divergence
RSI: Relative Strength Index
SMA_20: 20-period Simple Moving Average
EMA_20: 20-period Exponential Moving Average
ADX: Average Directional Index

These indicators give the model additional features to learn from.
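For reference, the two moving averages in the list can be computed by hand. The following is a minimal NumPy sketch, not the project's code: the project uses TA-Lib (for example talib.SMA(close, timeperiod=20) and talib.EMA(close, timeperiod=20)), and the helper names sma and ema are hypothetical. The EMA here is seeded with the simple average of the first 20 values and then applies the smoothing factor 2 / (period + 1); this is assumed to match TA-Lib's default behaviour, but small differences in seeding conventions are possible.

```python
import numpy as np


def sma(values, period=20):
    """Simple moving average; the first period - 1 entries are NaN."""
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    if len(values) >= period:
        # Sum of every full window, divided by the window length.
        window_sums = np.convolve(values, np.ones(period), mode="valid")
        out[period - 1:] = window_sums / period
    return out


def ema(values, period=20):
    """Exponential moving average seeded with the SMA of the first window (assumed convention)."""
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    if len(values) < period:
        return out
    alpha = 2.0 / (period + 1)
    out[period - 1] = values[:period].mean()
    for i in range(period, len(values)):
        # Each new value pulls the average toward the latest price by a factor alpha.
        out[i] = alpha * values[i] + (1 - alpha) * out[i - 1]
    return out
```

On a closing-price series of 1, 2, ..., 30, both functions return 10.5 at index 19, and the EMA rises to 11.5 at index 20.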

Data Preprocessing

The following preprocessing steps were applied to the data:

Scaling: The data was scaled using MinMaxScaler from scikit-learn so that all features lie in the range [0, 1].
Sequence Creation: The time series was converted into sequences of 30 consecutive time steps, which serve as the model's inputs.
Train-Test Split: The data was split into training and testing sets using scikit-learn's train_test_split.
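The three preprocessing steps can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the project's code: min_max_scale reproduces MinMaxScaler's default [0, 1] transform for a 2-D array with one column per feature, make_sequences assumes overlapping windows with a stride of one time step, and chronological_split is a hypothetical helper that keeps the test set at the end of the series, similar to calling train_test_split with shuffle=False. The 20% test fraction is also an assumption.

```python
import math

import numpy as np


def min_max_scale(data):
    """Scale each column of a 2-D array to [0, 1], like MinMaxScaler's default."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Constant columns would divide by zero; treat their range as 1 so they map to 0.
    col_range[col_range == 0] = 1.0
    return (data - col_min) / col_range


def make_sequences(data, seq_len=30):
    """Turn (time_steps, features) into windows of shape (n_windows, seq_len, features)."""
    data = np.asarray(data, dtype=float)
    n_windows = len(data) - seq_len + 1
    if n_windows <= 0:
        return np.empty((0, seq_len, data.shape[1]))
    # Assumed stride of 1: window i covers rows i .. i + seq_len - 1.
    return np.stack([data[i:i + seq_len] for i in range(n_windows)])


def chronological_split(sequences, test_fraction=0.2):
    """Hypothetical helper: the last test_fraction of windows becomes the test set."""
    n_test = math.ceil(test_fraction * len(sequences))
    split_at = len(sequences) - n_test
    return sequences[:split_at], sequences[split_at:]
```

Keeping the split chronological is a design choice worth considering: with stride-1 windows, neighbouring sequences share 29 of their 30 rows, so a shuffled split places nearly identical windows in both sets.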

Model Architecture

An LSTM Autoencoder architecture was used to reconstruct the input time series data and detect anomalies. The model consists of:

Encoder: LSTM layers to compress the input sequences into a latent space representation.
Decoder: LSTM layers to reconstruct the original sequences from the latent space.
Reconstruction Loss: The reconstruction error (difference between original and reconstructed data) is used to identify anomalies.

Model Hyperparameters:

LSTM units: Two stacked LSTM layers with 128 and 64 units in both the encoder and the decoder.
Batch Size: 64
Epochs: 50
Activation function: ReLU for the encoder and decoder layers.
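A minimal Keras sketch of an architecture matching the description above follows. Several details are assumptions rather than facts from the project: the decoder mirrors the encoder (64 then 128 units), a RepeatVector layer feeds the latent vector to the decoder at every time step, a TimeDistributed Dense layer produces the output, training uses the Adam optimizer with mean squared error loss and a validation_split of 0.1, and n_features=11 assumes one column per feature listed under Data Collection (six price and volume columns plus five indicators). The function names build_lstm_autoencoder, train_autoencoder, and reconstruct are hypothetical.

```python
import numpy as np
from tensorflow import keras


def build_lstm_autoencoder(seq_len=30, n_features=11):
    """Hypothetical LSTM autoencoder using the unit counts and activation listed above."""
    model = keras.Sequential([
        keras.Input(shape=(seq_len, n_features)),
        # Encoder: compress each sequence into a single 64-dimensional latent vector.
        keras.layers.LSTM(128, activation="relu", return_sequences=True),
        keras.layers.LSTM(64, activation="relu", return_sequences=False),
        # Repeat the latent vector once per time step so the decoder can unroll it.
        keras.layers.RepeatVector(seq_len),
        # Decoder: assumed to mirror the encoder.
        keras.layers.LSTM(64, activation="relu", return_sequences=True),
        keras.layers.LSTM(128, activation="relu", return_sequences=True),
        # Map every time step back to the original feature space.
        keras.layers.TimeDistributed(keras.layers.Dense(n_features)),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


def train_autoencoder(model, x_train, epochs=50, batch_size=64):
    # The input is also the target: the model learns to reconstruct its own input.
    return model.fit(x_train, x_train, epochs=epochs, batch_size=batch_size,
                     validation_split=0.1, verbose=0)


def reconstruct(model, x):
    """Return the model's reconstruction of a batch of sequences."""
    return model.predict(np.asarray(x, dtype="float32"), verbose=0)
```

The returned History object exposes loss and val_loss per epoch, which is the data behind a training and validation loss plot.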

Anomaly Detection

Anomalies are detected based on the reconstruction error. Points whose reconstruction error exceeds a threshold are classified as anomalies; the threshold was chosen by examining the distribution of reconstruction errors on the test set.

Steps to Detect Anomalies:

Reconstruction Error: The model computes the reconstruction error for each data point.
Anomaly Threshold: A threshold is set based on the distribution of reconstruction errors.
Flag Anomalies: Points with reconstruction errors exceeding the threshold are flagged as anomalies.
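The three steps can be sketched with NumPy. The per-sequence mean squared error and the 95th-percentile threshold are assumptions: the project states only that the threshold was based on the distribution of reconstruction errors, and a rule such as the mean plus k standard deviations would be an equally reasonable choice. The names reconstruction_errors and flag_anomalies are hypothetical.

```python
import numpy as np


def reconstruction_errors(x, x_hat):
    """Mean squared error per sequence, averaged over time steps and features."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.mean((x - x_hat) ** 2, axis=(1, 2))


def flag_anomalies(errors, percentile=95.0):
    """Flag sequences whose error exceeds the given percentile of all errors."""
    errors = np.asarray(errors, dtype=float)
    threshold = np.percentile(errors, percentile)
    return errors > threshold, threshold
```

Usage: errors = reconstruction_errors(x_test, model.predict(x_test)), then flags, threshold = flag_anomalies(errors); flags.sum() gives the number of flagged sequences.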

Results

Visualization:

Reconstruction Error Plot: Visualizes the reconstruction error for each data point in the test set.
Anomaly Plot: Shows detected anomalies along with normal data points.
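A matplotlib sketch of the reconstruction error plot, assuming an array of per-sequence errors and a scalar threshold like those described under Anomaly Detection. The function name, labels, and styling are illustrative assumptions, not the project's plotting code.

```python
import matplotlib

matplotlib.use("Agg")  # Non-interactive backend so the sketch also runs headless.
import matplotlib.pyplot as plt
import numpy as np


def plot_reconstruction_error(errors, threshold):
    """Plot per-sequence errors, the threshold line, and markers on flagged sequences."""
    errors = np.asarray(errors, dtype=float)
    anomaly_idx = np.flatnonzero(errors > threshold)
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(errors, label="Reconstruction error")
    ax.axhline(threshold, color="red", linestyle="--", label="Threshold")
    ax.scatter(anomaly_idx, errors[anomaly_idx], color="red", zorder=3, label="Anomaly")
    ax.set_xlabel("Test sequence index")
    ax.set_ylabel("Reconstruction error (MSE)")
    ax.legend()
    return fig, ax
```

Calling fig.savefig(...) on the returned figure writes the plot to disk.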

In the test set, 213 anomalies were detected; these may correspond to unusual market behavior, significant price shifts, or periods of high volatility.

Example output visualizations:

Figure: Training and validation loss
Figure: Reconstruction error on the test data
Figure: Example detected anomalies

Conclusion

This project demonstrates how an LSTM Autoencoder can be used for unsupervised anomaly detection in financial time series data. The model flags potential anomalies in Apple stock prices, which can help surface market events such as crashes or abnormal price movements.

Where ground-truth labels are available, they could be used to evaluate the model's performance more rigorously. Even without them, the unsupervised nature of the approach makes it valuable for real-world financial data analysis, where labeled anomalies are often scarce.

Future Improvements

Hyperparameter Tuning: Experiment with different architectures, numbers of LSTM units, batch sizes, and epoch counts to optimize the model.
Out-of-Sample Testing: Test the model on data from other companies or market segments to evaluate how well it generalizes.
Advanced Anomaly Detection: Implement and compare other techniques, such as Isolation Forests or autoencoder variants (e.g., variational autoencoders).

How to Run the Code

Clone the repository:

```
git clone https://github.com/pratycodes/stock_sentry.git
```

Install dependencies:
```
pip install -r requirements.txt
```
Run the notebook or script:
- For Jupyter Notebook:
```
jupyter notebook
```
- Or run the Python script (main.py) for model training and anomaly detection.
Visualize the results and interpret the anomalies in the output graphs.

License

This project is licensed under the MIT License - see the LICENSE file for details.
