Missing values in a dataset should be removed or estimated during the preprocessing stage of data mining, before the data are used for classification, association rule mining, or clustering. In the paper this project is based on, the authors propose a hybrid approach that combines fuzzy c-means clustering with support vector regression and a genetic algorithm. The method optimizes the fuzzy clustering parameters (the cluster size and the weighting factor) and then estimates the missing values. The authors report that the hybrid method gives sufficient and sensible imputation performance when compared with fuzzy c-means genetic algorithm imputation, support vector regression genetic algorithm imputation, and zero imputation. This project is an implementation of that method.
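To make the fuzzy c-means part of the description more concrete, here is a minimal sketch of the estimation step only, written with numpy. It is an illustration under stated assumptions, not this project's code: the function names (`fuzzy_c_means`, `memberships`, `fcm_impute`) are hypothetical, the parameters `c` and `m` are fixed by hand and stand in for the cluster size and weighting factor that the method tunes with a genetic algorithm, and the support vector regression step is left out entirely. The clustering helper could likely be swapped for `skfuzzy.cluster.cmeans` from scikit-fuzzy.

```python
import numpy as np


def memberships(X, centers, m):
    """Fuzzy membership of each row of X in each cluster (rows sum to 1)."""
    # Pairwise Euclidean distances, shape (n_samples, n_clusters).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero for exact matches
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_c_means(X, c, m, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means on complete rows; returns the cluster centers."""
    rng = np.random.RandomState(seed)
    U = rng.rand(X.shape[0], c)
    U /= U.sum(axis=1, keepdims=True)  # random initial memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers


def fcm_impute(X, c=3, m=2.0):
    """Fill NaN entries with membership-weighted cluster centers.

    Assumption: c and m are chosen by hand here; the method described
    above optimizes them instead.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete = ~missing.any(axis=1)
    centers = fuzzy_c_means(X[complete], c, m)
    filled = X.copy()
    for i in np.where(~complete)[0]:
        obs = ~missing[i]
        # Memberships are computed from the observed features only.
        u = memberships(X[i, obs][None, :], centers[:, obs], m)[0]
        filled[i, ~obs] = u @ centers[:, ~obs]
    return filled
```

Calling `fcm_impute(data, c=2, m=2.0)` on an array that contains `np.nan` entries returns a copy in which only the missing entries have been replaced. The sketch assumes that at least `c` rows are fully observed.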
- Python 3.7.0
- Python packages:
  - numpy
  - pandas
  - scikit-learn
  - scikit-fuzzy
First, check whether Python 3 is already installed:
python3 --version
If Python 3 is not installed on your computer, install it with the commands below:
sudo apt-get update
sudo apt-get install python3
Install the required Python packages with pip:
sudo pip3 install numpy pandas scikit-learn scikit-fuzzy
If pip is not installed yet, install it first by running the commands below in your terminal:
sudo apt-get update
sudo apt install python3-pip
It is also a good idea to upgrade pip to the latest version:
pip3 install --upgrade pip
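After installation, a short Python check can confirm that the packages are importable. The sketch below is one possible way to do it, using only the standard library; the helper name `missing_packages` and the `REQUIRED_MODULES` list are illustrative and not part of this project. Note that scikit-learn and scikit-fuzzy are imported as `sklearn` and `skfuzzy`, which differ from their pip names.

```python
import importlib.util

# Import names for the packages listed in the prerequisites.
# scikit-learn and scikit-fuzzy are imported as sklearn and skfuzzy.
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "skfuzzy"]


def missing_packages(module_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

Saving this as a script and running it with `python3` reports which of the required modules, if any, still need to be installed.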