There are many use-cases we can explore in NLP, so let's start with text classification on the BBC News dataset, which contains articles in 5 categories.
Here we focus on two classification algorithms: SVM and Naive Bayes.
For data preprocessing, the following major steps are undertaken (a code sketch follows the list):
- Removing blank rows, if any.
- Converting all text to lower case.
- Performing word tokenization.
- Removing stopwords.
- Removing non-alphabetic tokens.
- Word lemmatization.
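A minimal sketch of these steps using pandas and NLTK. The file name `bbc-text.csv` and the column name `text` are assumptions; adjust them to match your copy of the dataset.

```python
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

# Hypothetical file name; use the path of your BBC News CSV.
df = pd.read_csv('bbc-text.csv')
df = df.dropna(subset=['text'])          # remove blank rows

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess(text):
    # Lower-case the text and split it into word tokens.
    tokens = word_tokenize(text.lower())
    # Keep alphabetic tokens that are not stopwords, then lemmatize them.
    tokens = [lemmatizer.lemmatize(t) for t in tokens
              if t.isalpha() and t not in stop_words]
    return ' '.join(tokens)

df['text_final'] = df['text'].apply(preprocess)
```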
Let's split the dataset into training and test sets with a 70%-30% ratio.
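For example, with scikit-learn (the `text_final` and `category` column names carry over from the sketch above and are assumptions):

```python
from sklearn.model_selection import train_test_split

# 70% training / 30% test; random_state fixed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    df['text_final'], df['category'], test_size=0.3, random_state=42)
```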
Next, we encode the categorical labels as numeric values so the models can work with them.
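A sketch of this step with scikit-learn's `LabelEncoder`:

```python
from sklearn.preprocessing import LabelEncoder

# Map the five category names to the integers 0-4.
encoder = LabelEncoder()
y_train_enc = encoder.fit_transform(y_train)
y_test_enc = encoder.transform(y_test)
```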
We then turn the collection of text documents into numerical feature vectors that capture the frequency of each word in a given document. The approach considered here is TF-IDF (Term Frequency - Inverse Document Frequency): the vectorizer builds a vocabulary of words learned from the dataset, assigns each word a unique integer index, and weights a word's frequency in a document by how rare that word is across the whole corpus.
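A sketch with scikit-learn's `TfidfVectorizer` (the `max_features` limit is an assumption):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Learn the vocabulary and IDF weights on the training text only,
# then transform both splits into sparse TF-IDF matrices.
tfidf = TfidfVectorizer(max_features=5000)
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
```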
SVM & Naive Bayes
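A minimal sketch of training and evaluating both classifiers on the TF-IDF features; the SVM hyperparameters shown here are assumptions, not necessarily those used for the reported scores.

```python
from sklearn import svm
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Linear-kernel SVM (C and kernel are assumed values).
svm_clf = svm.SVC(C=1.0, kernel='linear')
svm_clf.fit(X_train_tfidf, y_train_enc)
svm_pred = svm_clf.predict(X_test_tfidf)
print('SVM accuracy:', accuracy_score(y_test_enc, svm_pred) * 100)

# Multinomial Naive Bayes, a common choice for TF-IDF text features.
nb_clf = MultinomialNB()
nb_clf.fit(X_train_tfidf, y_train_enc)
nb_pred = nb_clf.predict(X_test_tfidf)
print('Naive Bayes accuracy:', accuracy_score(y_test_enc, nb_pred) * 100)
```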
Accuracy on the test set: SVM: 98.35%, Naive Bayes: 97.31%.