Machine Learning Research

1. Project Topic

1.1. Exsiting research

Benmark:

https://paperswithcode.com/sota
ACL anthology for NLP papers:

http://www.aclweb.org/anthology/
Online proceedings of major ML conferences:
- NeurIPS
- ICML, ICLR, CVPR, EMNLP, NAACL
Online preprint servers:

http://arxiv.org/
Top paper menioned on Twitter:

http://www.arxiv-sanity.com/
Others:

1.2. Datasets and Tasks

Huggingface Datasets:

https://huggingface.co/datasets
Kaggle has many datasets, though some of them are too small for Deep Learning:

https://www.kaggle.com/datasets
SOTA NLP:

https://paperswithcode.com/sota

https://nlpprogress.com/

https://gluebenchmark.com/leaderboard

https://www.conll.org/previous-tasks
A small list of well-known standard datasets for common NLP tasks: https://machinelearningmastery.com/datasets-natural-language-processing/
An alphabetical list of free or public domain text datasets:

https://github.com/niderhoff/nlp-datasets
Wikipedia has a list of machine learning text datasets, tabulated with useful information such as dataset size: https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research#Text_data
Datahub has lots of datasets, though not all of it is Machine Learning focused:

https://datahub.io/collections
Microsoft Research has a collection of datasets (look under the ‘Dataset directory’ tab): https://www.microsoft.com/en-us/research/academic-program/data-science-microsoft-research/?from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fprojects%2Fdata-science-initiative%2F%20datasets.aspx#!dataset-directory
A script to search arXiv papers for a keyword, and extract important information such as performance metrics on a task:

https://huyenchip.com/2018/10/04/sotawhat.html
Datasets for machine translation:

http://statmt.org
Syntactic corpora for many languages:

https://universaldependencies.org

2. Project Advice

Processing Data

StanfordNLP: a Python library providing tokenization, tagging, parsing, and other capabilities:

https://stanfordnlp.github.io/stanfordnlp/
Other software from the Stanford NLP group: http://nlp.stanford.edu/software/index.shtml
NLTK, a lightweight Natural Language Toolkit package in Python: http://nltk.org/
spaCy, another Python package that can do preprocessing, but also includes neural models (e.g. Language Models):

https://spacy.io/

3. Top Tiers ML&AI Conferences

Site
- ML
  - NeurIPS
  - ICML
  - ICLR
- CV
  - CVPR
  - ICCV
  - ECCV
- NLP
  - ACL
  - NAACL
  - EMNLP
- Data
  - KDD
  - CIKM
  - ICDM
  - SDM
  - PAKDD
  - PKDD/ECML
  - RECSYS
  - SIGIR
  - WWW
  - WSDM
- AI
  - AAAI
  - AISTATS
  - ICANN
  - IJCAI
  - UAI
NeurIPS: Neural Information Processing Systems (formerly abbreviated NIPS). NeurIPS has gotten huge over the past few years as AI has become so important. Has a focus on neural networks, but not exclusively.

https://nips.cc
ICML: International Conference on Machine Learning. Has a general machine learning focus.

https://icml.cc
ICLR: International Conference on Learning Representations. ICLR was really the first conference focused on deep learning. It’s called “learning representations” because the motivation behind deep learning is to automatically learn higher-level features, or representations, that summarize data in useful ways. Deep Learning describes the structure of our current best solution to the problem of learning these representations.

https://iclr.cc
AAAI: Association for the Advancement of Artificial Intelligence. AAAI is a little more applications focused, and a little less theoretical than some of the other AI conferences.

http://www.aaai.org
CVPR: Computer Vision and Pattern Recognition.

https://www.thecvf.com
ICCV: International Conference on Computer Vision.

https://www.thecvf.com

4. Reference

Practical Tips for Final Projects Notes

List of great ML/AI conferences

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Machine Learning Research

1. Project Topic

1.1. Exsiting research

1.2. Datasets and Tasks

2. Project Advice

Processing Data

3. Top Tiers ML&AI Conferences

4. Reference

About

Releases

Packages

License

serendipity-cmyk/MLR

Folders and files

Latest commit

History

Repository files navigation

Machine Learning Research

1. Project Topic

1.1. Exsiting research

1.2. Datasets and Tasks

2. Project Advice

Processing Data

3. Top Tiers ML&AI Conferences

4. Reference

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages