Classify news articles as real or fake with this code. It uses TF-IDF features together with four scikit-learn models: LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, and GradientBoostingClassifier.
git clone https://github.com/MitanshuBaranwal/Fake-News-Prediction.git
cd Fake-News-Prediction
Below, you'll find a step-by-step guide on how to use and understand this script.
- Dependencies
- Load Dataset
- Label Data
- Remove Non-Important Columns
- Shuffle Data
- Download Stopwords
- Text Preprocessing
- Create X and Y
- Train-Test Split
- Text Vectorization
- Model Training and Evaluation
- Choose Final Model
- Predict
This section imports the necessary Python libraries and modules, including Pandas, NLTK, joblib, and scikit-learn. These libraries are used for data manipulation, text preprocessing, and machine learning.
Here, two datasets are loaded: "True.csv" and "Fake.csv." These datasets contain news articles labelled as either real or fake.
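A minimal sketch of the loading step, assuming both files are plain CSVs in the working directory. The function name `load_news` and its parameters are illustrative and are not taken from the script.

```python
import pandas as pd


def load_news(true_path="True.csv", fake_path="Fake.csv"):
    """Read the real and fake news files into two separate DataFrames.

    The default paths mirror the file names mentioned in the text;
    the function itself is a hypothetical helper.
    """
    true_df = pd.read_csv(true_path)
    fake_df = pd.read_csv(fake_path)
    return true_df, fake_df
```

Calling `load_news()` with no arguments reads the two files by their default names; any path or file-like object accepted by `pd.read_csv` also works.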
The "True" and "Fake" datasets are labelled with 1 for real news and 0 for fake news, respectively.
Both datasets are combined into a single DataFrame for further processing.
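Labelling and combining might look roughly like the sketch below. The two small DataFrames stand in for the real files, and the column name `label` is an assumption; the script may use a different name.

```python
import pandas as pd

# Tiny stand-ins for the contents of True.csv and Fake.csv (illustrative data).
true_df = pd.DataFrame({"title": ["Real headline A", "Real headline B"]})
fake_df = pd.DataFrame({"title": ["Fake headline C"]})

# 1 marks real news, 0 marks fake news, as described above.
true_df["label"] = 1
fake_df["label"] = 0

# Stack the two frames and rebuild a clean 0..n-1 index.
news_df = pd.concat([true_df, fake_df], ignore_index=True)
print(news_df)
```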
Unnecessary columns such as "text," "subject," and "date" are removed. Only the "title" column is used for the classification.
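As a rough illustration, the columns can be dropped with `DataFrame.drop`. The sample row and the `label` column name are assumptions made for the sketch.

```python
import pandas as pd

# Illustrative frame with the columns named in the text plus an assumed "label".
news_df = pd.DataFrame({
    "title": ["Some headline"],
    "text": ["Full article body"],
    "subject": ["politics"],
    "date": ["January 1, 2018"],
    "label": [1],
})

# Keep only what the classifier needs: the title and its label.
news_df = news_df.drop(columns=["text", "subject", "date"])
print(news_df.columns.tolist())
```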
The data is shuffled to ensure randomness and prevent any inherent order bias.
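One common way to shuffle a DataFrame is to sample all of its rows without replacement. This is a sketch under that assumption; the `random_state` value is arbitrary and only makes the shuffle repeatable.

```python
import pandas as pd

# Illustrative data: real rows first, fake rows last, as after concatenation.
news_df = pd.DataFrame({
    "title": ["real 1", "real 2", "real 3", "fake 1", "fake 2", "fake 3"],
    "label": [1, 1, 1, 0, 0, 0],
})

# frac=1 samples every row once, which amounts to a shuffle;
# reset_index discards the old ordering from the index.
shuffled_df = news_df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled_df)
```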
Common English stopwords are downloaded. These stopwords will be removed during text preprocessing.
Text data is preprocessed by applying stemming, removing non-alphabetical characters, converting to lowercase, and removing stopwords.
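The preprocessing steps could be sketched as follows. The function name `preprocess_title`, the order of the steps, and the small `demo_stop_words` set are assumptions made so the example runs without downloading anything. In the real pipeline the stopword set would presumably come from `nltk.corpus.stopwords.words("english")`.

```python
import re

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def preprocess_title(title, stop_words):
    """Clean one headline (hypothetical helper).

    Steps: replace non-letters with spaces, lowercase, drop stopwords,
    then stem each remaining token.
    """
    letters_only = re.sub(r"[^a-zA-Z]", " ", title)
    tokens = letters_only.lower().split()
    stems = [stemmer.stem(tok) for tok in tokens if tok not in stop_words]
    return " ".join(stems)


# Small illustrative stopword set; not the full NLTK English list.
demo_stop_words = {"the", "for", "his", "out"}
cleaned = preprocess_title("Trump called out the remarks!", demo_stop_words)
print(cleaned)
```

Passing the stopword set as an argument keeps the function easy to test and avoids rebuilding the set for every headline.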
The feature (X) and target (Y) variables are defined. X contains the preprocessed "title" column, and Y contains the corresponding labels.
The data is split into training and testing sets for model evaluation. An 80-20 split is used with stratification.
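A sketch of defining X and Y and making the stratified 80-20 split with scikit-learn's `train_test_split`. The ten toy headlines, the `label` column name, and the `random_state` value are illustrative assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative, already-preprocessed data: five real and five fake titles.
news_df = pd.DataFrame({
    "title": [f"real headlin {i}" for i in range(5)]
    + [f"fake headlin {i}" for i in range(5)],
    "label": [1] * 5 + [0] * 5,
})

X = news_df["title"]  # features: the preprocessed titles
Y = news_df["label"]  # target: 1 = real, 0 = fake

# test_size=0.2 gives the 80-20 split; stratify=Y keeps the class ratio
# the same in the training and testing sets.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=42
)
print(len(X_train), len(X_test))
```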
The "title" text data is converted into numerical values using TF-IDF vectorization.
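A minimal TF-IDF sketch using `TfidfVectorizer`. It assumes the vectorizer is fitted on the training titles only and then reused on the test titles, which is the usual practice but is not confirmed by the description above. The sample titles are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative preprocessed titles.
train_titles = [
    "senat pass budget bill",
    "presid sign trade deal",
    "alien endors candid",
    "celebr reveal secret cure",
]
test_titles = ["senat reveal trade deal"]

vectorizer = TfidfVectorizer()

# Learn the vocabulary and IDF weights from the training titles only.
X_train_vec = vectorizer.fit_transform(train_titles)

# Reuse the learned vocabulary for unseen titles; unknown words are ignored.
X_test_vec = vectorizer.transform(test_titles)

print(X_train_vec.shape, X_test_vec.shape)
```

Both matrices are sparse and share the same number of columns, one per vocabulary term learned from the training titles.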
Several machine learning models, including Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting, are trained and evaluated for their performance in classifying news articles.
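Training and comparing the four models might look like the sketch below. The toy titles, the use of accuracy as the metric, and the `random_state` values are assumptions; the script may report other metrics. Scores on such a tiny dataset say nothing about real performance.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Illustrative data only (1 = real, 0 = fake).
train_titles = [
    "senat pass budget bill", "presid sign trade deal",
    "court uphold elect rule", "minist announc tax plan",
    "alien endors candid", "celebr reveal secret cure",
    "shock video prove hoax", "you will not believ this trick",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_titles = ["senat announc trade plan", "shock secret cure reveal"]
test_labels = [1, 0]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_titles)
X_test = vectorizer.transform(test_titles)

models = {
    "LogisticRegression": LogisticRegression(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
    "GradientBoostingClassifier": GradientBoostingClassifier(random_state=42),
}

# Fit each model on the same features and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    scores[name] = accuracy_score(test_labels, model.predict(X_test))
    print(f"{name}: {scores[name]:.2f}")
```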
The Random Forest classifier is chosen as the final model based on its performance.
A function predict_news_type is defined to take text input and predict whether the news is real or fake using the final model.
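The function could be sketched as below. Only the name `predict_news_type` and the two output strings come from this guide. The toy training data and the module-level `vectorizer` and `final_model` names are assumptions, and the sketch skips the text preprocessing that the real function presumably applies before vectorizing.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative training data (1 = real, 0 = fake); the real model is trained
# on the full dataset.
train_titles = [
    "senate passes budget bill", "president signs trade deal",
    "court upholds election rule", "minister announces tax plan",
    "aliens endorse candidate", "celebrity reveals secret cure",
    "shocking video proves hoax", "you will not believe this trick",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

vectorizer = TfidfVectorizer()
final_model = RandomForestClassifier(random_state=42)
final_model.fit(vectorizer.fit_transform(train_titles), train_labels)


def predict_news_type(text):
    """Return a readable verdict for one headline."""
    features = vectorizer.transform([text])  # same vocabulary as training
    label = final_model.predict(features)[0]
    return "The news is Real" if label == 1 else "The news is Fake"


print(predict_news_type("senate passes trade bill"))
```

Because this toy model sees only eight headlines, its verdicts will not necessarily match the outputs of the fully trained model.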
print(predict_news_type("Pope Francis Just Called Out Donald Trump for his remarks"))
Output: 'The news is Real'
print(predict_news_type("Gaza receives largest aid shipment since Israel-Hamas war began"))
Output: 'The news is Fake'
Feel free to use this script for news article classification and make predictions. You can customize it further to suit your needs.
For any enquiries please contact me at :