GSoC_2018_project_modelselection
Following up on one of our very first GSoC (2011) projects, this project intends to clean up, unify, extend, and scale up Shogun's modelselection and hyper-parameter tuning framework. It is a cool mixture of modernizing existing code, using multi-threaded (and potentially distributed) concepts, and playing with black-box optimization frameworks.
Difficulty: medium to advanced. It depends on ambitions, but we are flexible regarding the student's abilities.
You need to know about
- Modelselection basics (x-validation, search-algorithms, implementation)
- Shogun's modelselection framework
- Shogun's parameter framework
- C++
- Optimisation frameworks like MOE or CMA-ES
- Knowledge of other libraries' approaches (sklearn, MLPack)
Every learning algorithm (CMachine subclass) should work with x-validation ... fast!
This is completely independent of any hyper-parameter tuning.
- All model classes should be systematically tested with x-validation, see issue. This is similar to the trained model tests.
- Identify models that perform only read-only operations on the features (this will be all models later, depending on the progress of the features-detox project).
- Enable multi-core x-validation using OpenMP, via cloning of the underlying learning machine, but with shared features (memory efficiency!).
- Carefully test the chosen models for race-conditions, memory errors, etc.
- Add algorithms on a one-by-one basis.
- Generalise the code of the "trained model serialization" tests into "trained model" tests, where multiple things can be checked for the trained models (serialization and x-validation for now).
- Make sure model-selection has a progress bar, can be stopped and resumed, etc. See also the black-box project.
We recently changed our internal parameter framework ... or rather: we are in the process of changing it. The new framework is cleaner, neater, and as such easier to handle.
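To make the "clone the machine, share the features" idea concrete, here is a minimal sketch. It does not use Shogun's actual classes: `Features`, `MeanMachine` and `cross_validate` are hypothetical stand-ins, and the toy learner just predicts the mean training label. The point is only the threading pattern: the features are passed as a const reference and never copied, while every fold trains its own private clone.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for Shogun's features and CMachine; not the real API.
struct Features
{
    std::vector<double> y; // toy dataset: labels only
};

// Toy learner: predicts the mean label of its training subset.
class MeanMachine
{
public:
    std::unique_ptr<MeanMachine> clone() const
    {
        return std::unique_ptr<MeanMachine>(new MeanMachine(*this));
    }

    // Reads the shared features only; all mutable state lives in the clone.
    void train(const Features& f, const std::vector<std::size_t>& idx)
    {
        double sum = 0.0;
        for (std::size_t i : idx)
            sum += f.y[i];
        m_mean = idx.empty() ? 0.0 : sum / idx.size();
    }

    double apply(const Features&, std::size_t) const { return m_mean; }

private:
    double m_mean = 0.0;
};

// k-fold x-validation; returns the mean squared error of each fold.
std::vector<double> cross_validate(
    const MeanMachine& prototype, const Features& f, int num_folds)
{
    const std::size_t n = f.y.size();
    std::vector<double> fold_error(num_folds, 0.0);

    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int k = 0; k < num_folds; ++k)
    {
        // Sample i belongs to the test set of fold (i mod num_folds).
        std::vector<std::size_t> train_idx, test_idx;
        for (std::size_t i = 0; i < n; ++i)
        {
            if (i % num_folds == static_cast<std::size_t>(k))
                test_idx.push_back(i);
            else
                train_idx.push_back(i);
        }

        auto machine = prototype.clone(); // one private clone per fold
        machine->train(f, train_idx);

        double err = 0.0;
        for (std::size_t i : test_idx)
        {
            const double d = machine->apply(f, i) - f.y[i];
            err += d * d;
        }
        // Each fold writes only its own slot, so no locking is needed.
        fold_error[k] = test_idx.empty() ? 0.0 : err / test_idx.size();
    }
    return fold_error;
}
```

This only stays race-free as long as `train` and `apply` really are read-only on the features, which is exactly what the list above asks us to verify model by model.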
Before we start tuning parameters in an automatic way, we need to remove the traces of the old framework. This is a messy task that requires diving deeply into the Shogun core, but don't worry -- we will help you :)
- Remove the m_modelselection_parameters field from CSGObject
- This will break many things (most of all the current model-selection framework). Fix the problems (good initial task)
- Remove the TParameter construct and eventually the class Parameter
We want to build a better way to specify free parameters to learn, which overlaps with the user experience project. The current way is to build parameter trees whose structure matches the learning machine (see e.g. here). We would like to shop around other libraries for ideas on specifying this.
Sergey, could you put some API ideas here?
Some steps:
- Review and compare other libraries' approaches
- Collect the most common use cases (random search, grid-search, gradient search (e.g. in our Gaussian Process framework))
- Come up with a set of clean API examples / user stories for those cases
- Draft code how to implement this API. This will include ways to annotate the spaces that parameters live in, as well as whether gradients are available.
- Implement and test systematically
- Make sure it works nicely in all target languages.
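As a starting point for discussion, here is one possible shape such an API could take. This is a sketch under our own assumptions, not an agreed design and not existing Shogun code: `SearchSpace`, `ParamRange`, `Scale` and `Assignment` are all hypothetical names. It shows a flat (non-tree) specification where each parameter carries an annotation for the space it lives in (linear or log scale) and for whether gradients are available, and where grid search and random search are derived from the same description.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// All names here are hypothetical; this is not Shogun's API.
enum class Scale { Linear, Log };

struct ParamRange
{
    std::string name;
    double lower;
    double upper;
    Scale scale;
    bool has_gradient; // whether gradient-based search may use this parameter
};

using Assignment = std::map<std::string, double>;

class SearchSpace
{
public:
    SearchSpace& add(const std::string& name, double lower, double upper,
                     Scale scale = Scale::Linear, bool has_gradient = false)
    {
        m_ranges.push_back({name, lower, upper, scale, has_gradient});
        return *this;
    }

    // Full Cartesian grid with `steps` points per parameter (steps >= 2).
    std::vector<Assignment> grid(std::size_t steps) const
    {
        std::vector<Assignment> result(1);
        for (const ParamRange& r : m_ranges)
        {
            std::vector<Assignment> next;
            for (const Assignment& partial : result)
            {
                for (std::size_t s = 0; s < steps; ++s)
                {
                    Assignment a = partial;
                    a[r.name] = value_at(r, double(s) / double(steps - 1));
                    next.push_back(a);
                }
            }
            result.swap(next);
        }
        return result;
    }

    // One random draw per parameter (uniform in log-space for Scale::Log).
    template <class Rng>
    Assignment sample(Rng& rng) const
    {
        std::uniform_real_distribution<double> unit(0.0, 1.0);
        Assignment a;
        for (const ParamRange& r : m_ranges)
            a[r.name] = value_at(r, unit(rng));
        return a;
    }

    // Names of the parameters a gradient-based search could optimise.
    std::vector<std::string> differentiable() const
    {
        std::vector<std::string> names;
        for (const ParamRange& r : m_ranges)
            if (r.has_gradient)
                names.push_back(r.name);
        return names;
    }

private:
    // Maps t in [0, 1] onto the range, respecting the scale annotation.
    static double value_at(const ParamRange& r, double t)
    {
        if (r.scale == Scale::Log)
        {
            const double lo = std::log(r.lower);
            const double hi = std::log(r.upper);
            return std::exp(lo + t * (hi - lo));
        }
        return r.lower + t * (r.upper - r.lower);
    }

    std::vector<ParamRange> m_ranges;
};
```

A user story for an SVM with a Gaussian kernel might then read `space.add("C", 1e-2, 1e2, Scale::Log).add("width", 0.5, 2.5, Scale::Linear, true)`, followed by either `space.grid(3)` or repeated calls to `space.sample(rng)`. Whether string names are the right way to address parameters across all target languages is one of the open questions.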
Bayesian optimisation and stochastic optimisation are powerful frameworks for blackbox optimisation. We aim to integrate bindings for both during the project. There are plenty of external libraries that implement the algorithms for us, so this task is mostly about designing interfaces that tell Shogun to cross-validate the algorithm on the next set of parameters and report its performance. We aim for both MOE and CMA-ES.
There is hardly any algorithm without free parameters. Currently Shogun only has brute force search to tune them automatically. While this works for SVMs, it is hopeless for anything with more than 2 parameters. Certainly, a clean and easy way to quickly tune parameters would massively boost Shogun's usability. The project spans a huge range of topics within and outside of Shogun, including framework internals as well as cutting edge algorithms for optimisation. Super interesting even for ourselves. Be ready to learn a lot.
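One common way to structure such an interface is an "ask/tell" loop: the optimiser proposes the next parameters, Shogun evaluates them (e.g. by x-validation) and reports the score back. The sketch below illustrates that loop only. `BlackBoxOptimizer`, `RandomSearch`, `TuningResult` and `tune` are hypothetical names, and plain random search stands in for MOE or CMA-ES, whose real bindings would sit behind the same two methods.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <random>
#include <utility>
#include <vector>

// Hypothetical interface; MOE or CMA-ES bindings would be adapted to it.
class BlackBoxOptimizer
{
public:
    virtual ~BlackBoxOptimizer() = default;
    // Propose the next parameter vector to evaluate.
    virtual std::vector<double> ask() = 0;
    // Report the score obtained for a previously proposed vector.
    virtual void tell(const std::vector<double>& params, double score) = 0;
};

// Stand-in optimiser: uniform random search inside box bounds.
class RandomSearch : public BlackBoxOptimizer
{
public:
    RandomSearch(std::vector<double> lower, std::vector<double> upper, unsigned seed)
        : m_lower(std::move(lower)), m_upper(std::move(upper)), m_rng(seed)
    {
    }

    std::vector<double> ask() override
    {
        std::vector<double> p(m_lower.size());
        for (std::size_t i = 0; i < p.size(); ++i)
        {
            std::uniform_real_distribution<double> dist(m_lower[i], m_upper[i]);
            p[i] = dist(m_rng);
        }
        return p;
    }

    // Random search ignores feedback; a Bayesian optimiser would update its model here.
    void tell(const std::vector<double>&, double) override {}

private:
    std::vector<double> m_lower;
    std::vector<double> m_upper;
    std::mt19937 m_rng;
};

struct TuningResult
{
    std::vector<double> best_params;
    double best_score;
};

// Runs the ask/tell loop for `budget` evaluations; lower scores are better
// (think of the objective as a cross-validated error).
TuningResult tune(
    BlackBoxOptimizer& opt,
    const std::function<double(const std::vector<double>&)>& objective,
    int budget)
{
    TuningResult r{{}, std::numeric_limits<double>::infinity()};
    for (int i = 0; i < budget; ++i)
    {
        std::vector<double> p = opt.ask();
        const double score = objective(p);
        opt.tell(p, score);
        if (score < r.best_score)
        {
            r.best_score = score;
            r.best_params = p;
        }
    }
    return r;
}
```

Keeping the objective as an opaque callable means the optimiser never needs to know about machines, features or x-validation, which should make it easier to swap external libraries in and out.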
- Shogun's modelselection classes
- Parameter trees
- MOE are a plus
- CMA-ES
- entrance task on testing xvalidation