fmlpy

Welcome to Financial Machine Learning in Python!

This package applies machine learning methods to financial data, which generally has a very low signal-to-noise ratio (SNR) and is therefore hard to feed to ML algorithms directly. More detailed explanations of the methods implemented in this package can be found in Advances in Financial Machine Learning. An R version of this package is also available as fmlr.

What is this package doing?

There are three main obstacles people may encounter when they try to apply machine learning to financial data:

Financial data are usually very large. The memory needed to store the limit order book of a single stock is usually at TB scale, so training is extremely slow if the algorithm has many parameters to fit;
The signal-to-noise ratio in financial data is very low. Because the market is so noisy, it is hard to detect, or even define, what the signal is, which can easily lead to overfitting;
Financial data are highly correlated, which violates the independence assumption of most machine learning models. Since financial data are mostly time series, cross-validation is also difficult: if "future data" are used to predict "past data", the measured accuracy does not reflect the real performance of the model.

To deal with the three problems above, we use the following scheme before we apply any traditional machine learning algorithm to financial data:

Sample the data into information bars. The goal of this step is to reduce the size of the data and keep only the data that carry information.
Use the meta-labelling method to label the bars. The goal of this step is to build a feature matrix so that traditional machine learning algorithms can be applied.
Split the data using purged cross-validation. The goal of this step is to avoid the information leakage that occurs during cross-validation when the model is trained on "future data" and tested on "past data".
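
The purging idea in the last step can be sketched in a few lines of pandas. This is only an illustration under stated assumptions, not the API of this package: the function name purged_kfold_indices and the label_end series (mapping each observation's start time to the time its label is known) are hypothetical, and the sketch covers purging only, without any embargo period after the test window.

```python
import numpy as np
import pandas as pd


def purged_kfold_indices(label_end, n_splits=3):
    """Yield (train_idx, test_idx) position arrays with purging.

    label_end (hypothetical input): Series indexed by each observation's
    start time; the values are the times at which the labels are known.
    """
    starts = label_end.index
    positions = np.arange(len(label_end))
    # Contiguous, unshuffled folds keep the time ordering intact.
    for test_idx in np.array_split(positions, n_splits):
        test_start = starts[test_idx[0]]
        test_end = label_end.iloc[test_idx].max()
        # An observation overlaps the test window when its own interval
        # [start, label end] intersects [test_start, test_end].
        overlaps = (starts <= test_end) & (label_end >= test_start).to_numpy()
        # Drop every overlapping observation from the training set.
        yield positions[~overlaps], test_idx


# Nine daily observations; each label is resolved two days after it starts.
times = pd.date_range("2020-01-01", periods=9, freq="D")
label_end = pd.Series(times + pd.Timedelta(days=2), index=times)

splits = list(purged_kfold_indices(label_end, n_splits=3))
for train_idx, test_idx in splits:
    print("train:", train_idx.tolist(), "test:", test_idx.tolist())
```

With these toy labels the middle fold keeps only the first and last observations for training, because every other observation has a label interval that touches the test window.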

Framework

This package includes the four modules listed below.

preprocessing
Used to preprocess raw price series. This includes generating all kinds of structured bars, meta-labelling, generating fractionally differentiated series, etc.
model
Used to train machine learning models. Mainly deals with cross-validation and the sequential bootstrap method.
backtest
Used to backtest quantitative investment strategies.
* This module will not be included in the first version
tests
Used to test the correctness of the code during development and provide examples to users after the package is deployed.
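
To give a flavour of the structured bars mentioned under preprocessing, here is a minimal sketch of volume bars written with plain pandas. It does not use this package's functions: the name volume_bars, the column names price and volume, and the rule that a tick belongs to the bar in which it starts are all assumptions made for illustration.

```python
import pandas as pd


def volume_bars(ticks, bar_volume):
    """Aggregate ticks into OHLC bars of roughly `bar_volume` units each.

    ticks (hypothetical input): DataFrame with 'price' and 'volume'
    columns, already sorted in time order.
    """
    # Volume traded before each tick decides which bar the tick falls in.
    volume_before = ticks["volume"].cumsum() - ticks["volume"]
    bar_id = (volume_before // bar_volume).rename("bar_id")

    grouped = ticks.groupby(bar_id)
    bars = grouped["price"].agg(["first", "max", "min", "last"])
    bars.columns = ["open", "high", "low", "close"]
    bars["volume"] = grouped["volume"].sum()
    return bars


# Toy tick data, invented purely for this example.
ticks = pd.DataFrame(
    {
        "price": [100.0, 101.0, 99.0, 102.0, 103.0],
        "volume": [4, 6, 5, 5, 10],
    }
)
bars = volume_bars(ticks, bar_volume=10)
print(bars)
```

Five ticks collapse into three bars here, each carrying 10 units of volume, which is the size reduction the sampling step aims for.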

Dependencies

pandas 0.24.1
numpy 1.16.1

Installation

Use

pip install fmlpy

to install

Examples

See this pipeline for how to use most of the functions in this package.
