Impute-missing-data-with-XGBoost

When a significant amount of data in highly important features is missing, what can we do? Impute the missing data with the mean or median? In this Jupyter notebook, I demonstrate embedding an XGBoost model in the data transformer to do the imputation.

In this dataset, a lot of the "cost" data is missing, but "cost" is quite important for predicting "price".

If we impute the missing "cost" with its mean or median, there will be a spike in the imputed dataset, because every missing entry receives the same value. In contrast, imputing the missing "cost" with an XGBoost regressor, which is embedded in the data transformer and predicts "cost" from the other features, is very effective.

Files in this repository:
Impute missing data with XGBoost.ipynb
README.md
data.csv


hanfei1986/Impute-missing-data-with-XGBoost
