This project focuses on classifying news articles using a customized model based on the Transformer architecture. The architecture is designed to process and combine information from both the title and description of news articles, enabling high accuracy in classification.
→ AG News
The dataset used in this project is the AG News dataset, a widely used benchmark dataset for text classification. The dataset contains four classes representing different news topics:
● World
● Sports
● Business
● Sci/Tech
Each entry in the dataset includes both a title and a description of the news article. For this project, I selected the first 15,000 rows from the dataset to train the model.
The text data (both title and description) was preprocessed to ensure consistency and to facilitate efficient training. The preprocessing steps included tokenization, padding, and embedding, along with the incorporation of positional encodings to preserve the order of words.
The architecture is built around the Transformer encoder, with a key feature being the parallel processing of the title and description using separate encoders.
Dual Encoder Structure: The model employs two parallel encoder blocks:
Embedding and Positional Encoding: Both Title and Description inputs undergo tokenization, padding, embedding, and positional encoding before being fed into their respective encoders.
● Multi-Head Self-Attention layers to capture relationships between tokens.
● Feed-Forward Neural Networks (FFN) with Add & Norm layers to enhance the learned representations.
● Concatenation and Global Pooling: The outputs from both encoders are concatenated and passed through a global max pooling layer, followed by a fully connected layer.
The model works by independently extracting patterns from both the Title and Description of a news article. The self-attention mechanism within each encoder block enables the model to understand contextual information by focusing on relevant parts of the input. By processing both the title and description simultaneously, the model gains a richer representation of the content, allowing for more accurate classification.
Once the outputs from both encoders are obtained, they are pooled, concatenated, and passed through a dense layer to predict the news category.
● Embedding & Positional Encoding: Both inputs are converted into fixed-size embeddings, with positional encodings added.
The AG News dataset, while popular, presents challenges due to its brevity in the title and diversity in the description. The dataset’s complexity lies in the nuanced distinctions between the four classes, especially when only a small portion of the data is used, as done in this project (15,000 samples).
By leveraging parallel encoders, the model effectively mitigates these challenges by combining the broader context from the description with the focused keywords from the title.
This project demonstrates a novel approach to text classification using parallel Transformer encoders. By processing both the title and description of a news article separately and then combining their outputs, the model achieves impressive performance, attaining an accuracy of 99.6% on the test set. The combination of attention mechanisms and feed-forward layers allows for a deep understanding of the input data, making this architecture particularly effective for tasks that require nuanced contextual comprehension.