Yet another toxic comment classification
- Python 3.7 or higher
- GNU Make
- CUDA 10.2 or higher
Clone the repo to your local machine:
git clone https://github.com/halecakir/toxic-comment-classification
Build the Python virtual environment:
make venv/bin/activate
Fetch pretrained word vectors from multiple sources (GloVe, Google News, fastText):
make fetch_all
Train the model on the Jigsaw data, passing one of the fetched word vector files:
make train ARGS=WORD_VECTOR # WORD_VECTOR ∈ {"google.bin", "fasttext.bin", "glove.txt"}
Test the model:
make test
Remove all model artifacts:
make clean
- Try an attention mechanism
- Try transformer-based models
- Try hybrid (word-level + character-level) vectors for words that have no pretrained vector
- Try gradient clipping to counter exploding gradients
- Add hyperparameter optimization
- Add sanity tests
- Documentation!
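
For the gradient clipping item above, here is a minimal, framework-free sketch of clipping by global L2 norm. It is an illustration of the technique, not code from this repo: the function name `clip_by_global_norm` and the sample gradients are hypothetical. If the training code turns out to use PyTorch (an assumption, this README does not say), `torch.nn.utils.clip_grad_norm_` does the same job in place.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradient vectors so that their joint L2 norm is at most max_norm.

    grads is a list of lists of floats, one inner list per parameter tensor.
    Returns (clipped_grads, norm_before_clipping).
    Hypothetical helper for illustration only.
    """
    # Joint L2 norm over every gradient component of every parameter.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))

    # Already within the limit (or all zeros): return an unchanged copy.
    if total_norm <= max_norm:
        return [list(vec) for vec in grads], total_norm

    # Shrink all components by the same factor so the direction is preserved.
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm


# Made-up gradients whose joint norm is sqrt(9 + 16 + 144) = 13.
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

Scaling every component by one shared factor keeps the update direction intact and only limits the step size, which is why the global norm variant is usually preferred over clipping each value independently.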