- Wikipedia has a list of machine learning text datasets, tabulated with useful information such as dataset size
- Datahub has lots of datasets, though not all of them are machine learning focused
- Microsoft Research has a collection of datasets (look under the ‘Dataset directory’ tab)
- SOTA NLP:
- A small list of well-known standard datasets for common NLP tasks:
- An alphabetical list of free or public domain text datasets:
- Datasets for machine translation:
- Syntactic corpora for many languages:
- A script to search arXiv papers for a keyword, and extract important information such as performance metrics on a task:
- StanfordNLP, a Python library providing tokenization, tagging, parsing, and other capabilities.
- Software from the Stanford NLP Group
- NLTK, a lightweight Natural Language Toolkit package in Python.
- spaCy, another Python package that can do preprocessing, but also includes neural models (e.g. Language Models)
- Machine Learning
- NeurIPS
- Neural Information Processing Systems (formerly abbreviated NIPS). NeurIPS has grown rapidly over the past few years as interest in AI has surged. It focuses on neural networks, but not exclusively.
- ICML
- International Conference on Machine Learning. Has a general machine learning focus.
- ICLR
- International Conference on Learning Representations. ICLR was arguably the first conference focused on deep learning. It’s called “learning representations” because the motivation behind deep learning is to automatically learn higher-level features, or representations, that summarize data in useful ways. Deep learning describes the structure of our current best solution to the problem of learning these representations.
- AISTATS
- Computer Vision
- Natural Language Processing
- Data
- Artificial Intelligence
- Maths for Machine Learning
- Applied Data Science with Python Specialization
- TensorFlow Developer Professional Certificate
- Google Data Analytics Professional Certificate
- CS224W: Machine Learning with Graphs
- CS230: Deep Learning Specialization
- CS231n: Convolutional Neural Networks for Visual Recognition