News Category Classification with BERT

Identify the type of news based on headlines and short descriptions

Dataset

This dataset contains around 200k news headlines from 2012 to 2018, obtained from HuffPost. A model trained on it could be used to tag untracked news articles or to identify the type of language used in different news articles. Source: Kaggle
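
As a rough illustration of how this data might be prepared, the sketch below assumes the Kaggle release is a JSON Lines file whose records carry `category`, `headline`, and `short_description` fields. The helper names and the file path in the closing comment are hypothetical and are not taken from this repository.

```python
import json

def parse_record(line):
    """Turn one JSON line into a (text, label) pair.

    Assumes the fields `headline`, `short_description`, and `category`.
    """
    record = json.loads(line)
    headline = record.get("headline", "").strip()
    description = record.get("short_description", "").strip()
    # Join headline and description into a single input text.
    text = " ".join(part for part in (headline, description) if part)
    return text, record["category"]

def load_examples(lines):
    """Parse an iterable of JSON lines, skipping blank lines."""
    return [parse_record(line) for line in lines if line.strip()]

def build_label_index(examples):
    """Map each category name to an integer id (sorted for determinism)."""
    labels = sorted({label for _, label in examples})
    return {label: i for i, label in enumerate(labels)}

# Hypothetical usage; the path is an assumption, not the repository's layout:
# with open("data/News_Category_Dataset.json", encoding="utf-8") as f:
#     examples = load_examples(f)
#     label_index = build_label_index(examples)
```

Concatenating the headline and the short description is one simple way to use both fields; the notebooks may combine them differently.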

Implementations

BERT (Fine-Tuning)
Bi-GRU + CONV
LSTM + Attention
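
For the BERT fine-tuning variant, each text has to be turned into fixed-length id and mask sequences before it reaches the model. The sketch below shows that layout under stated assumptions: the special-token ids are those of the bert-base-uncased vocabulary ([PAD]=0, [CLS]=101, [SEP]=102), while the function `encode_for_bert` and the default `max_len` are illustrative and not taken from this repository.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # ids in the bert-base-uncased vocabulary

def encode_for_bert(token_ids, max_len=64):
    """Build (input_ids, attention_mask) for single-sentence classification.

    token_ids: word-piece ids already produced by a tokenizer.
    max_len: fixed sequence length (an assumed value, tune as needed).
    """
    # Reserve two positions for [CLS] and [SEP], truncating if needed.
    body = list(token_ids)[: max_len - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Pad to the fixed length; padded positions get mask 0.
    padding = max_len - len(input_ids)
    input_ids += [PAD_ID] * padding
    attention_mask += [0] * padding
    return input_ids, attention_mask
```

With pytorch-pretrained-bert, the `token_ids` argument would typically come from `BertTokenizer.from_pretrained('bert-base-uncased')` followed by `tokenize` and `convert_tokens_to_ids`.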

Try it on Colab Notebook

TL;DR

glove.840B.300d (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download) was used as the embedding layer for the Bi-GRU and LSTM models.
The pre-trained bert-base-uncased model (12-layer, 768-hidden, 12-heads, 110M parameters) was used for fine-tuning.
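
To make the GloVe embedding step concrete, the sketch below builds an embedding matrix from GloVe-format text lines (a token followed by its vector components). It is an illustration under assumptions: `parse_glove_line`, `build_embedding_matrix`, and the `word_index` mapping (word to row number, starting at 1 and contiguous) are hypothetical names, not the repository's code. Lines are split from the right because some tokens in the 840B file are reported to contain spaces.

```python
import random

def parse_glove_line(line, dim):
    """Split one GloVe text line into (token, vector).

    Splitting from the right keeps tokens that contain spaces intact.
    """
    parts = line.rstrip("\n").rsplit(" ", dim)
    token, values = parts[0], parts[1:]
    return token, [float(v) for v in values]

def build_embedding_matrix(glove_lines, word_index, dim, seed=0):
    """Return a (len(word_index) + 1) x dim matrix as a list of rows.

    Row 0 is reserved for padding (all zeros). Words that never appear
    in the GloVe lines keep a small random initialisation.
    """
    rng = random.Random(seed)
    matrix = [[0.0] * dim]
    matrix += [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in word_index]
    for line in glove_lines:
        token, vector = parse_glove_line(line, dim)
        row = word_index.get(token)
        if row is not None and len(vector) == dim:
            matrix[row] = vector
    return matrix
```

For the real file, `dim` would be 300 and `glove_lines` an open file handle. In PyTorch, the resulting matrix could then be handed to `torch.nn.Embedding.from_pretrained(torch.tensor(matrix))`.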

Results

BERT - test_accuracy: 0.72, test_loss: 0.0015671474330127238
Bidirectional GRU + Conv - test_accuracy: 0.6545
LSTM with Attention - test_accuracy: 0.67144

Requirements

Python 3.6
PyTorch 0.4.1/1.0.0 - For building the model architectures
pytorch-pretrained-bert - https://github.com/huggingface/pytorch-pretrained-BERT
