Fake news challenge using ML/DL
If you are running the code for the first time, run the following commands in your Python console:
-> import nltk
-> nltk.download()
- Install the contractions library, which is used in the pre-processing step:
-> pip install contractions==0.0.18
- All other libraries come bundled with Anaconda.
- If you want to experiment with the pre-processing and feature extraction results, delete the contents of the preprocessed_data and final_features folders.
- Run main.py to execute the whole project.
- data_import.py - This file imports the data from the dataset. We use the competition_test files as the test data.
- train_validation_split.py - The training data imported from the CSV files is split into 80% training and 20% validation sets, so that results can be checked before evaluating on the test data.
- preprocess.py - We apply several pre-processing techniques to the text data, such as tokenization, stop-word removal, and stemming/lemmatization.
- feature_extraction.py - We mainly use three techniques to extract features from the text: 1) Sentence Weighting, 2) N-grams (2-grams), 3) TF-IDF Vectorizing.
- models.py - We mainly use four ML algorithms to predict the class labels: 1) Random Forest Classifier, 2) Decision Tree Classifier, 3) Logistic Regression, 4) Naive Bayes.
metrics.py - We use several performance metrics to evaluate the predicted labels: Competition Score, Accuracy, Precision, Recall, and F1-score.
- score.py - This is the official scorer Python file, which must be used to evaluate the results.
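As a rough illustration of what the Competition Score measures, here is a minimal sketch that assumes the weighting of the public FNC-1 scorer: 0.25 for getting the related/unrelated decision right, plus 0.75 for the exact stance when the pair is related. The function names competition_score and max_score are hypothetical, and score.py in the repository remains the authoritative implementation.

```python
# Stances that count as "related" (assumed to match the FNC-1 label set).
RELATED = {"agree", "disagree", "discuss"}


def competition_score(gold, predicted):
    """Weighted score under the assumed FNC-1 scheme (hypothetical helper)."""
    score = 0.0
    for g, p in zip(gold, predicted):
        if g == p:
            score += 0.25          # exact label match
            if g != "unrelated":
                score += 0.50      # extra credit for the exact related stance
        if g in RELATED and p in RELATED:
            score += 0.25          # related/unrelated decision was correct
    return score


def max_score(gold):
    """Best achievable score: 1.0 per related pair, 0.25 per unrelated pair."""
    return sum(1.0 if g in RELATED else 0.25 for g in gold)


gold = ["agree", "unrelated", "discuss", "disagree"]
pred = ["agree", "unrelated", "agree", "unrelated"]

score = competition_score(gold, pred)
best = max_score(gold)
print(f"{score:.2f} / {best:.2f}")  # 1.50 / 3.25
```

Under this scheme a model that only separates related from unrelated pairs earns partial credit, which is why the Competition Score can differ noticeably from plain accuracy.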
- Percentage of correctness for the different stances across the 5 algorithms we compare:
- The accuracy score chart for the dataset is:
For detailed information about the repository, please see the PDF at report/report.pdf.
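The pre-processing and feature-extraction steps described for preprocess.py and feature_extraction.py can be sketched in plain Python. This is only a stand-in under stated assumptions, not the repository's code: the stop-word list is a tiny hypothetical one (the project uses NLTK), stemming/lemmatization and sentence weighting are left out, and TF-IDF is computed with the simple tf * log(N / df) variant. The helper names preprocess, bigrams, and tfidf are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and"}


def preprocess(text):
    """Lowercase, tokenize on letters/digits, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bigrams(tokens):
    """Return adjacent token pairs (2-grams) joined by a space."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def tfidf(docs):
    """Compute tf * log(N / df) for each term of each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


headline = preprocess("The police confirm the report")
body = preprocess("Police deny the report is true")

print(headline)           # ['police', 'confirm', 'report']
print(bigrams(headline))  # ['police confirm', 'confirm report']

weights = tfidf([headline, body])
```

Terms that occur in both texts (here "police" and "report") get a weight of zero under this variant, while terms unique to one text, such as "confirm", receive a positive weight.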