Ever since the first printed newspaper, every news story that makes it onto a page has been assigned to a specific section. Although almost everything about newspapers has changed over time, from the ink to the paper, this categorization of news has carried over through generations and into digital editions. Newspaper articles are not limited to a few subjects; they cover a wide range of interests, from politics to sports to movies. For a long time this sectioning was done manually, but technology can now do it with little effort. In this hackathon, Data Science and Machine Learning enthusiasts like you will use Natural Language Processing to predict, from the story text, which genre or category a piece of news falls into.
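For orientation only, here is a minimal sketch of one classic text-classification baseline for this kind of task: multinomial Naive Bayes over bag-of-words counts, written in plain Python. This README does not describe the models used in the notebooks, so this is not the author's method. The tokenizer, the `NaiveBayesClassifier` class, and the toy stories are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase the text and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, stories, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # token counts per class
        for story, label in zip(stories, labels):
            self.word_counts[label].update(tokenize(story))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict_one(self, story):
        # Ignore tokens never seen in training; they carry no class evidence.
        tokens = [t for t in tokenize(story) if t in self.vocab]
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / self.n_docs)
            denom = self.total_words[label] + v
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, stories):
        return [self.predict_one(s) for s in stories]


# Hypothetical toy data; labels 0-3 stand for Politics, Technology,
# Entertainment and Business.
train_stories = [
    "The minister won the election vote",
    "New smartphone chip software update",
    "The film star released a new movie",
    "Shares and stock market profits rose",
]
train_labels = [0, 1, 2, 3]

clf = NaiveBayesClassifier().fit(train_stories, train_labels)
print(clf.predict(["election results for the minister"]))  # [0]
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together over a long story.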
- Size of training set: 7,628 records
- Size of test set: 2,748 records
- STORY: A part of the main content of the article to be published as a piece of news.
- SECTION: The genre/category the STORY falls into.
Each story falls into one of four distinct sections, labelled as follows:
- Politics: 0
- Technology: 1
- Entertainment: 2
- Business: 3
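The label mapping above can be kept in one place as a pair of dictionaries, so integer predictions can be turned into readable section names and back. This is a small illustrative sketch; the names `SECTION_LABELS`, `SECTION_IDS`, `decode`, and `encode` are hypothetical and not taken from the notebooks.

```python
# Hypothetical mapping of the integer SECTION codes to section names.
SECTION_LABELS = {
    0: "Politics",
    1: "Technology",
    2: "Entertainment",
    3: "Business",
}
# Reverse lookup: section name -> integer code.
SECTION_IDS = {name: idx for idx, name in SECTION_LABELS.items()}


def decode(predictions):
    """Turn integer predictions into readable section names."""
    return [SECTION_LABELS[p] for p in predictions]


def encode(names):
    """Turn section names back into the integer codes of the SECTION column."""
    return [SECTION_IDS[n] for n in names]


print(decode([0, 3, 1]))  # ['Politics', 'Business', 'Technology']
```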
The final score is calculated from the confusion matrix as the number of correct predictions divided by the total number of predictions, i.e. the accuracy.
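Assuming the score is plain accuracy, it can be read off a confusion matrix: correct predictions sit on the diagonal, and the total is the sum of all cells. A minimal sketch in plain Python follows; the function names and the sample labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes=4):
    # Rows are true sections, columns are predicted sections.
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy_from_confusion(matrix):
    # Correct predictions lie on the diagonal of the matrix.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels: one Business story is misread as Entertainment.
y_true = [0, 1, 2, 3, 3, 2]
y_pred = [0, 1, 2, 3, 2, 2]
cm = confusion_matrix(y_true, y_pred)
print(accuracy_from_confusion(cm))  # 0.8333333333333334
```

If the score is computed over the full test set of 2,748 records, each misclassified story lowers it by 1/2,748, or about 0.00036.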
Rank: 2
Score: 0.99163027660
File | Score |
---|---|
predict-the-news-category_v5.ipynb | 0.99053857 |
predict-the-news-category_v9.ipynb | 0.99017467 |
predict-the-news-category_v8.ipynb | 0.98944687 |
final-ensemble.ipynb | 0.99163028 |
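The table does not say how final-ensemble.ipynb combines the individual notebooks. One common option is a hard majority vote over each model's predicted labels, sketched below with hypothetical prediction lists; the actual notebook may instead average class probabilities or weight the models differently.

```python
from collections import Counter


def majority_vote(*prediction_lists):
    """Combine per-model label predictions by hard majority vote.

    Ties are broken in favour of the earliest model passed in.
    """
    combined = []
    for votes in zip(*prediction_lists):
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first model's vote among those with the top count.
        winner = next(v for v in votes if counts[v] == top)
        combined.append(winner)
    return combined


# Hypothetical predictions from three models on five stories.
preds_v5 = [0, 1, 2, 3, 1]
preds_v9 = [0, 1, 2, 2, 3]
preds_v8 = [0, 2, 2, 3, 0]
print(majority_vote(preds_v5, preds_v9, preds_v8))  # [0, 1, 2, 3, 1]
```

Passing the strongest single model first (v5 in the table) means it decides three-way ties, as in the last story above.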