fomo - Figuring out Machine Learning Ourselves

Fomo is an ongoing "machine learning bootcamp", based on the idea that ML is gaining traction and honestly we are collectively not super aware of the strengths and weaknesses of how this can be applied in a biological/evolutionary context. In this light, we have established of a collective monthly ML hackathon. We will meet regularly (once a month), sit down together and hash through different ML paradigms, think about how these can be applied to the kinds of data we are generating and the questions we have, and collectively work through tutorials, and maybe even do some preliminary exploratory analysis. For example, one week we might evaluate boosted regression. What are the core ideas of what this is doing? How do you achieve this in R and python? Explore how this can be applied to genetic/morphological/abundance/phylogenetic data.

The truly excellent and useful Python Data Science Handbook.
An excellent 12-part ML Course: Introduction to Machine Learning for Coders, from the makers of fastai. "Unlike many educational materials in the field, our approach is “code first” rather than “math first”."
Practical Machine Learning Tutorial with Python Introduction: An extensive ML online course based on sklearn & TensorFlow. Very, very long.
Machine Learning Online course (Stanford), with exercises on github

Pre-processing/Feature Engineering

Unsupervised methods

A very excellent, visual walk through of how t-SNE works.

Gradient boosting & Random forest

TL;DR Gradient boosting is "better", but random forest is easier to tune, and maybe faster. Additionally, gradient boosting may have trouble when the training data is noisy.

Gradient Boosting

XGBoost has nice documentation, plenty of examples and tutorials and R & python interfaces. It seems this is a very strong ML package, but could be more difficult to use than other options.
[Another explanation of GBM (including a nice visual representation))[https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/]. Also goes into extensive depth explaining all the parameters.

Deep learning

Zhou et al 2018 A primer on deep learning in genomics

Deep learning frameworks

fastai: Python wrapper around pytorch focused on making construction and training of NN fast and easy. Good documentation and examples (focused on vision, text classficiation, and tabular datasets).
keras, a high level python deep learning library. Tons of excellent examples in the repo.

Time series

Time Series Prediction Using LSTM Deep Neural Networks - This person actually implements the LSTM neural network from scratch! Cool for learning about the nuts and bolts.

Other ML-like applications (with special attention to biodiversity analysis)

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

fomo - Figuring out Machine Learning Ourselves

Pre-processing/Feature Engineering

Unsupervised methods

Gradient boosting & Random forest

Gradient Boosting

Deep learning

Deep learning frameworks

Time series

Other ML-like applications (with special attention to biodiversity analysis)

About

Releases

Packages

isaacovercast/fomo

Folders and files

Latest commit

History

Repository files navigation

fomo - Figuring out Machine Learning Ourselves

Pre-processing/Feature Engineering

Unsupervised methods

Gradient boosting & Random forest

Gradient Boosting

Deep learning

Deep learning frameworks

Time series

Other ML-like applications (with special attention to biodiversity analysis)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages