This repository analyzes and explores the well-known European credit card fraud dataset, available at https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud. The dataset is highly imbalanced: genuine transactions far outnumber fraudulent ones, with only 492 of the 284,807 transactions (about 0.172%) labelled as fraud. This resembles real life, where fraud is rare, yet detecting and preventing it matters because its consequences are severe.
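To check the imbalance yourself, here is a minimal sketch using only the Python standard library. It assumes the Kaggle file has been downloaded as `creditcard.csv` and that its `Class` column uses 1 for fraud and 0 for genuine transactions; the function names `class_counts` and `fraud_share` are illustrative and not taken from this repository.

```python
import csv
from collections import Counter


def class_counts(path):
    """Count genuine (Class 0) and fraudulent (Class 1) rows in the CSV file."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Assumed label encoding: 1 = fraud, 0 = genuine.
            # The csv module strips surrounding quotes, so '"0"' becomes "0".
            counts[int(row["Class"])] += 1
    return counts


def fraud_share(counts):
    """Fraction of all transactions that are labelled as fraud."""
    total = counts[0] + counts[1]
    return counts[1] / total
```

On the full dataset, `fraud_share(class_counts("creditcard.csv"))` should come out at roughly 0.0017, matching the 492 fraud cases out of 284,807.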
Several models have been tested on both the imbalanced and the balanced datasets to compare their performance: logistic regression; the ensemble methods random forest and gradient boosting; a feedforward neural network for binary classification with ReLU activations; and a Support Vector Machine (SVM) with a linear kernel.
To balance the dataset, three techniques have been used: undersampling, oversampling, and SMOTE (Synthetic Minority Over-sampling Technique).
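The three balancing techniques can be illustrated with a minimal pure-Python sketch. It is not the code used in this repository: it assumes labels 1 = fraud (minority) and 0 = genuine (majority), rows given as lists of floats, and the function names are hypothetical. Random undersampling discards majority rows, random oversampling duplicates minority rows, and SMOTE creates new minority rows by interpolating between a minority row and one of its nearest minority neighbours.

```python
import math
import random


def random_undersample(X, y, seed=0):
    """Keep every fraud row plus an equal-sized random subset of genuine rows."""
    rng = random.Random(seed)
    fraud = [i for i, label in enumerate(y) if label == 1]
    genuine = [i for i, label in enumerate(y) if label == 0]
    keep = fraud + rng.sample(genuine, len(fraud))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen fraud rows until both classes have the same size."""
    rng = random.Random(seed)
    fraud = [i for i, label in enumerate(y) if label == 1]
    genuine = [i for i, label in enumerate(y) if label == 0]
    extra = [rng.choice(fraud) for _ in range(len(genuine) - len(fraud))]
    keep = genuine + fraud + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows (needs at least two minority rows)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(X_minority)
        # The k nearest other minority rows, by Euclidean distance.
        neighbours = sorted(
            (p for p in X_minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position on the segment x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

In practice, the imbalanced-learn package provides ready-made `RandomUnderSampler`, `RandomOverSampler` and `SMOTE` classes for the same three techniques.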
Each modelling technique has been studied in depth on both the imbalanced and the balanced data.
First, undersampling was performed.
Next, oversampling was applied and a random forest classifier was trained on the oversampled data.
Finally, a sequential neural network model was selected and evaluated on the test set.
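The three steps above are only outlined in this README. A common precaution when combining resampling with a final test, shown here as general practice and not as a confirmed detail of this repository, is to split off the test set first and resample only the training part, so the test metrics reflect the real class ratio. The sketch below uses toy labels and hypothetical helper names, with 1 = fraud and 0 = genuine.

```python
import random


def stratified_split(y, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly the same fraud share in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def oversample_indices(idx, y, seed=0):
    """Duplicate random fraud indices from idx until both classes are balanced."""
    rng = random.Random(seed)
    fraud = [i for i in idx if y[i] == 1]
    genuine = [i for i in idx if y[i] == 0]
    extra = [rng.choice(fraud) for _ in range(len(genuine) - len(fraud))]
    return genuine + fraud + extra


y = [0] * 95 + [1] * 5  # toy labels: 5% fraud
train_idx, test_idx = stratified_split(y)
# Resample the training part only; the test part keeps the original class ratio.
balanced_train = oversample_indices(train_idx, y)
```

Resampling before the split would let duplicated (or, with SMOTE, interpolated) fraud rows leak into the test set and inflate the reported scores.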
Model performance is evaluated mainly with precision, recall, and F1-score rather than accuracy. On this dataset, accuracy is high even for a useless model because genuine transactions vastly outnumber fraudulent ones.
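To see why accuracy misleads here, consider a hypothetical "model" that labels every transaction as genuine, applied to the dataset's class counts (284,315 genuine, 492 fraud). The sketch computes the metrics from the confusion matrix with fraud (1) as the positive class; defining precision, recall and F1 as 0 when their denominator is zero is a convention chosen for this sketch.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with fraud (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the fraud class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no fraud predicted: 0 by convention
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Predict "genuine" for every transaction.
y_true = [0] * 284315 + [1] * 492
y_pred = [0] * len(y_true)
print(scores(y_true, y_pred))
```

Accuracy comes out at about 0.998, while recall and F1 are 0 because not a single fraud case is caught. In scikit-learn, `sklearn.metrics.classification_report` reports the same per-class precision, recall and F1-score.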
Feel free to contribute to this project.