GSoC_2020_project_time_series
This project is about designing and implementing a new API for machine learning with time series.
The main goal of the project is to come up with a design and to write a reference implementation covering a few selected algorithms.
Time series are everywhere in real-world applications and there has been an increase in interest in time series toolboxes recently (see e.g. sktime, tslearn, tsml).
But there are still very few principled time-series libraries out there, so you would be working on something that could be very useful for a large number of people :)
Difficulty: medium.
You need to:
- Know what time series are
- Know some basic algorithms that work on time series
- Have a basic understanding of how major ML libraries deal with time series (e.g. pandas, sktime)
- Know a thing or two about object-oriented API design
- Understand how the (new) Shogun API works (see meta examples)
The very first step is to understand that we first and foremost want to develop an API for time series problems. Rather than starting to implement algorithms right away, it is crucial to first go through a couple of iterations on the following questions:
- How do other ML libs deal with time series? We need to document and compare, then pick the best approaches.
- What are the use-cases we want to serve with this project? Write down pseudo code for time series applications that users might have. Finding a suitable problem requires some creativity or a web search. Then imagine you already built the framework -- what would the user interface look like? It would be best to write down such use-cases for multiple algorithm classes (classification, prediction, smoothing, etc).
- What base classes have to be added for data representation, transformations, and algorithms? A class diagram would be useful here.
- What are the most basic time series algorithms that you would like to add? (Kalman filters for sure, what else?)
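To make the questions about base classes and user-facing pseudo code more concrete, here is a minimal Python sketch of one possible layering: a data container, a transformer interface, an estimator interface, and a pipeline that chains them. All class names (`TimeSeries`, `Transformer`, `Estimator`, `Differencer`, `NearestMeanClassifier`, `Pipeline`) are hypothetical and invented for illustration; none of them are part of the Shogun API, and the actual design is exactly what the project is meant to work out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TimeSeries:
    """Hypothetical container: one univariate, equally spaced series."""
    values: Sequence[float]


class Transformer(ABC):
    """Hypothetical interface: maps one series to another series."""

    @abstractmethod
    def transform(self, series: TimeSeries) -> TimeSeries: ...


class Estimator(ABC):
    """Hypothetical interface for algorithms trained on a panel of series."""

    @abstractmethod
    def fit(self, panel: List[TimeSeries], labels: List[int]) -> "Estimator": ...

    @abstractmethod
    def predict(self, panel: List[TimeSeries]) -> List[int]: ...


class Differencer(Transformer):
    """First-order differencing: x[t] - x[t-1]."""

    def transform(self, series: TimeSeries) -> TimeSeries:
        v = series.values
        return TimeSeries([b - a for a, b in zip(v, v[1:])])


def _mean(series: TimeSeries) -> float:
    return sum(series.values) / len(series.values)


class NearestMeanClassifier(Estimator):
    """Toy classifier: picks the class whose average series mean is closest."""

    def fit(self, panel, labels):
        grouped = {}
        for series, label in zip(panel, labels):
            grouped.setdefault(label, []).append(_mean(series))
        self.means_ = {y: sum(m) / len(m) for y, m in grouped.items()}
        return self

    def predict(self, panel):
        return [
            min(self.means_, key=lambda y: abs(self.means_[y] - _mean(s)))
            for s in panel
        ]


class Pipeline(Estimator):
    """Applies the transformers in order, then delegates to the estimator."""

    def __init__(self, transformers: List[Transformer], estimator: Estimator):
        self.transformers = transformers
        self.estimator = estimator

    def _apply(self, panel):
        for t in self.transformers:
            panel = [t.transform(s) for s in panel]
        return panel

    def fit(self, panel, labels):
        self.estimator.fit(self._apply(panel), labels)
        return self

    def predict(self, panel):
        return self.estimator.predict(self._apply(panel))


# Use-case: tell rising series (label 1) from falling series (label 0).
train = [TimeSeries([0, 1, 2, 3]), TimeSeries([5, 6, 7, 8]),
         TimeSeries([3, 2, 1, 0]), TimeSeries([9, 8, 7, 6])]
model = Pipeline([Differencer()], NearestMeanClassifier()).fit(train, [1, 1, 0, 0])
predictions = model.predict([TimeSeries([10, 11, 12]), TimeSeries([4, 3, 2])])
```

The point of the sketch is that a pipeline is itself an estimator, so transformations and learning algorithms compose behind one interface. Whether Shogun should model it this way is an open design question.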
On a more abstract level, the task is to design and implement an API for defining time series related supervised learning tasks and for specifying the learning strategies that can solve a given task, e.g. prediction algorithms, possibly with prior data transformations.
More concretely, this would involve the following tasks:
- Extend existing Shogun data containers (Features) to handle time series/panel data.
- Implement time series/panel data specific transformers that work on the new data container (e.g. Fourier transform, auto-correlation).
- Implement time series classification algorithms (for a good overview of algorithms, see e.g. this paper; for Python implementations, see e.g. sktime; for Java implementations, see e.g. this code repository).
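As an illustration of what a panel-data transformer could compute, here is a minimal sketch of an auto-correlation transform in plain Python. The function names `autocorrelation` and `transform_panel` are hypothetical, and the panel is assumed to be a simple list of equally long univariate series; a Shogun implementation would work on the new Features container instead.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample auto-correlation of x for lags 0..max_lag (biased estimator)."""
    n = len(x)
    if n == 0 or not 0 <= max_lag < n:
        raise ValueError("need a non-empty series and 0 <= max_lag < len(x)")
    mean = sum(x) / n
    centred = [v - mean for v in x]
    denom = sum(c * c for c in centred)
    if denom == 0:
        raise ValueError("auto-correlation is undefined for a constant series")
    return [
        sum(centred[t] * centred[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def transform_panel(panel: Sequence[Sequence[float]], max_lag: int) -> List[List[float]]:
    """Apply the transform to every series: one feature row per series."""
    return [autocorrelation(series, max_lag) for series in panel]
```

Each series is mapped to a fixed-length feature vector, so the output can be passed to any standard tabular learning algorithm.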
A good starting point that combines all of the above tasks may be the time series forest, a generalisation of random forest. Each tree of the ensemble extracts features from random intervals of the time series before fitting a decision tree on the extracted features. Each tree is therefore no longer a single estimator but a pipeline that chains prior transformations with a final estimator.
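The transformation step of a single tree can be sketched as follows. This is a minimal Python illustration, not a reference implementation: the helper names (`sample_intervals`, `slope`, `interval_features`) are hypothetical, and the choice of mean, standard deviation and slope as per-interval summaries is an assumption based on what is commonly used for time series forest. The decision tree that would be fitted on the resulting feature rows is omitted.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def sample_intervals(series_length: int, n_intervals: int,
                     min_length: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Draw random half-open intervals [start, end) inside the series."""
    if not 2 <= min_length <= series_length:
        raise ValueError("need 2 <= min_length <= series_length")
    intervals = []
    for _ in range(n_intervals):
        length = rng.randint(min_length, series_length)  # both ends inclusive
        start = rng.randint(0, series_length - length)
        intervals.append((start, start + length))
    return intervals


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of the values against time steps 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def interval_features(series: Sequence[float],
                      intervals: Sequence[Tuple[int, int]]) -> List[float]:
    """Concatenate (mean, std, slope) for every interval into one feature row."""
    features = []
    for start, end in intervals:
        window = series[start:end]
        features.extend([statistics.fmean(window),
                         statistics.pstdev(window),
                         slope(window)])
    return features
```

In a full ensemble, each tree would draw its own intervals once at fit time, reuse them at predict time, and train an ordinary decision tree on the rows returned by `interval_features`.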
Finally, this also involves more applied tasks:
- Benchmark and compare implemented algorithms in terms of speed and predictive performance on the collection of UEA/UCR datasets against existing Java and Python implementations.
- Apply and benchmark implemented algorithms on your favourite time series dataset.
- To better understand the predictions of the time series forest algorithm, implement/extend the computation of simple feature importances to feature importance curves over time (as described in the paper).
Besides time series classification, there are also other interesting time series/panel data related learning tasks, including:
- Forecasting (simple algorithms are e.g. ARIMA models or the Kalman filter),
- Sequential supervised learning (see this paper for a good introduction and overview).
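For a sense of scale, the simplest Kalman filter fits in a few lines. The sketch below is a scalar filter for a local level model (a random-walk state observed with noise), written in plain Python. The function name and parameters (`process_var`, `obs_var`, `init_mean`, `init_var`) are assumptions made for this illustration; a general implementation would handle vector states and arbitrary transition and observation matrices.

```python
from typing import List, Sequence, Tuple


def kalman_filter_local_level(observations: Sequence[float],
                              process_var: float,
                              obs_var: float,
                              init_mean: float = 0.0,
                              init_var: float = 1e6) -> Tuple[List[float], List[float]]:
    """Filter a scalar local level model.

    State:        level[t] = level[t-1] + noise with variance process_var
    Observation:  y[t]     = level[t]   + noise with variance obs_var
    Returns the filtered means and variances of the level at every step.
    """
    mean, var = init_mean, init_var
    means, variances = [], []
    for y in observations:
        # Predict: the random walk keeps the mean and inflates the variance.
        var = var + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = var / (var + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        means.append(mean)
        variances.append(var)
    return means, variances
```

Under this model the one-step-ahead forecast is simply the last filtered mean, with forecast variance equal to the last filtered variance plus `process_var` plus `obs_var`.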
You like designing nice APIs for ML problems? You enjoy implementing fundamental algorithms and showcasing how they are used? You are not afraid of writing some C++? You don't mind writing documentation? Then this project is for you!