Toxic Comment Classification

Yet another toxic comment classifier.

Installation

Prerequisites

- Python 3.7 or higher
- GNU Make
- CUDA 10.2 or higher
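A quick way to confirm these prerequisites before building is sketched below. It is not part of the repository: the helper names `python_version_ok` and `missing_tools` are hypothetical, and CUDA is only detected indirectly by looking for the `nvcc` compiler on `PATH`. That is an assumption, since a CUDA runtime can be installed without `nvcc`.

```python
import shutil
import sys

# Minimum interpreter version stated in the prerequisites.
MIN_PYTHON = (3, 7)


def python_version_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_tools(tools=("make", "nvcc")):
    """Return the names of command-line tools not found on PATH.

    Hypothetical check: `make` stands in for GNU Make and `nvcc` is used
    only as a rough proxy for a CUDA toolkit installation.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python OK:", python_version_ok())
    print("Missing tools:", missing_tools() or "none")
```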

Cloning

Clone the repo to your local machine:

git clone https://github.com/halecakir/toxic-comment-classification

Installing

Build the Python virtual environment:

make venv/bin/activate

Fetching Data

Fetch pretrained word vectors from multiple sources (GloVe, Google News, fastText):

make fetch_all
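The `glove.txt` vectors are in GloVe's plain-text layout: each line holds a word followed by its space-separated float components. The sketch below parses that layout into a dictionary. It is an illustration only: `load_glove_text` is a hypothetical helper, it assumes tokens contain no spaces, and it does not handle the binary `.bin` files, which need a different loader.

```python
import io
from typing import Dict, Iterable, List


def load_glove_text(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text vectors: one word per line followed by its components.

    Rows whose vector length differs from the first parsed row are skipped,
    which guards against truncated or malformed lines.
    """
    vectors: Dict[str, List[float]] = {}
    dim = None
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        if dim is None:
            dim = len(values)  # infer dimensionality from the first row
        if len(values) != dim:
            continue
        vectors[word] = values
    return vectors


if __name__ == "__main__":
    sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
    vecs = load_glove_text(sample)
    print(vecs["cat"])  # [0.4, 0.5, 0.6]
```

Passing an open file handle (opened with `encoding="utf-8"`) works the same way, because the function only iterates over lines.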

Training

Train the model on the Jigsaw data:

make train ARGS=WORD_VECTOR  # WORD_VECTOR ∈ {"google.bin", "fasttext.bin", "glove.txt"}
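Assuming the Jigsaw data is the Kaggle Toxic Comment Classification Challenge set, each comment carries six independent binary labels. A common objective for that multi-label setup is one sigmoid per label with binary cross-entropy averaged over the labels. The plain-Python sketch below shows that loss; whether this repository's model uses exactly this objective is an assumption, and `multilabel_bce` is a hypothetical name.

```python
import math
from typing import Sequence

# Label columns of the Kaggle Jigsaw Toxic Comment Classification Challenge
# (assumed to be the "Jigsaw data" used for training).
LABELS = ("toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate")


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def multilabel_bce(logits: Sequence[float], targets: Sequence[int], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy over independent labels (one sigmoid per label)."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for logit, target in zip(logits, targets):
        p = min(max(sigmoid(logit), eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(target * math.log(p) + (1 - target) * math.log(1.0 - p))
    return total / len(logits)


if __name__ == "__main__":
    # A comment labelled only "toxic" and "insult".
    targets = [1, 0, 0, 0, 1, 0]
    # All-zero logits give p = 0.5 for every label, so the loss is ln 2.
    print(round(multilabel_bce([0.0] * len(LABELS), targets), 4))  # 0.6931
```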

Testing

Test the model:

make test
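The Kaggle Jigsaw competition scored submissions by mean column-wise ROC AUC: the ROC AUC of each label, averaged over all labels. This README does not say which metric `make test` reports, so the sketch below is illustrative only. It computes AUC through the rank-based (Mann-Whitney U) formulation, giving tied scores their average rank; the function names are hypothetical.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    n = len(labels)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC is undefined when only one class is present")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def mean_columnwise_auc(label_columns: List[Sequence[int]],
                        score_columns: List[Sequence[float]]) -> float:
    """Average the per-label ROC AUC across all label columns."""
    aucs = [roc_auc(y, s) for y, s in zip(label_columns, score_columns)]
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A label column that contains only one class has no defined AUC, so the sketch raises an error rather than silently returning a value.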

Cleaning

Remove all model artifacts:

make clean

Todos

Try an attention mechanism
Try Transformer-based architectures
Try incorporating hybrid (word-level + character-level) word vectors for words that have no pretrained vectors
Try gradient clipping to handle exploding gradients
Add hyperparameter optimization
Add sanity tests
Documentation!
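For the gradient-clipping item, one common approach is clipping by global L2 norm: if the combined norm of all gradients exceeds a threshold, every gradient is scaled by the same factor, which preserves their direction. The deep learning framework used here is not stated in this README, so this plain-Python sketch only illustrates the arithmetic; frameworks usually ship an equivalent (for example PyTorch's `torch.nn.utils.clip_grad_norm_`). `clip_by_global_norm` is a hypothetical name.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Scale all gradient vectors so their combined L2 norm is at most max_norm."""
    if max_norm <= 0:
        raise ValueError("max_norm must be positive")
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if global_norm <= max_norm:
        return [list(vec) for vec in grads]  # already within bounds; return copies
    scale = max_norm / global_norm
    return [[g * scale for g in vec] for vec in grads]


if __name__ == "__main__":
    # Two parameter gradients flattened to lists; global norm is sqrt(9 + 16) = 5.
    clipped = clip_by_global_norm([[3.0], [4.0]], max_norm=2.5)
    print(clipped)  # [[1.5], [2.0]]
```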
