DINI: Data Imputation using Neural Inversion for Edge Applications

DINI is a tool for imputing tabular multi-input/multi-output data whose features are continuous, categorical, or a mix of both. DINI takes data with missing values and imputes it iteratively while training a surrogate model that can be reused for downstream tasks. This enables machine learning on corrupted or incomplete data through state-of-the-art imputation. It works with any dataset and any PyTorch model.
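The core idea of imputing missing entries while training a surrogate can be illustrated without PyTorch. The sketch below is hypothetical and is not DINI's implementation: it replaces the neural network with a linear surrogate y ≈ w·x + b, assumes only input cells are missing, and takes simultaneous gradient steps on the surrogate weights and on the missing cells, the latter by differentiating the loss with respect to the inputs ("inversion"). The function name impute_by_inversion and all of its parameters are illustrative.

```python
import random


def impute_by_inversion(X, y, missing, steps=20000, lr_w=0.05, lr_x=0.5, seed=0):
    """Jointly fit a linear surrogate y ~ w.x + b and the missing input cells.

    X: list of rows; cells at the (row, col) positions in `missing` are placeholders.
    y: list of targets, assumed fully observed in this sketch.
    Returns the imputed copy of X and the fitted surrogate (w, b).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    X = [list(row) for row in X]  # never modify the caller's data
    # Start every missing cell at the mean of the observed values in its column.
    for j in range(d):
        observed = [X[i][j] for i in range(n) if (i, j) not in missing]
        col_mean = sum(observed) / len(observed)
        for i in range(n):
            if (i, j) in missing:
                X[i][j] = col_mean
    w = [rng.uniform(-0.1, 0.1) for _ in range(d)]
    b = 0.0
    for _ in range(steps):
        resid = [sum(wj * xj for wj, xj in zip(w, row)) + b - t for row, t in zip(X, y)]
        # Gradients of the mean squared error L = (1/n) * sum(resid**2).
        grad_w = [2 / n * sum(r * row[j] for r, row in zip(resid, X)) for j in range(d)]
        grad_b = 2 / n * sum(resid)
        # Inversion: dL/dx_ij = (2/n) * resid_i * w_j, applied to missing cells only.
        grad_x = {(i, j): 2 / n * resid[i] * w[j] for (i, j) in missing}
        # Simultaneous step on the surrogate parameters and the imputed inputs.
        w = [wj - lr_w * g for wj, g in zip(w, grad_w)]
        b -= lr_w * grad_b
        for (i, j), g in grad_x.items():
            X[i][j] -= lr_x * g
    return X, w, b
```

On toy data generated by y = 2·x1 + x2 with one hidden x1 value, the complete rows pin down the surrogate, so the hidden cell is driven toward the value consistent with its target. DINI itself trains a PyTorch model and supports categorical features and multiple outputs, which this sketch omits.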

Environment setup

Clone this repository

git clone --recurse-submodules https://github.com/jha-lab/dini.git
cd dini

Set up the Python environment

The Python environment setup is based on conda. The script below creates a new environment named dini, or updates an existing one, on the macOS-arm64 platform:

source setup/env_step.sh

For any other platform, you can use the environment files. For pip installation:

pip install --requirement setup/requirements.txt

For conda installation:

conda env create --file setup/environment.yaml
conda activate dini

Replicating results

To generate corrupted data:

python3 corrupt.py --dataset <dataset> --strategy <strategy>

where <dataset> is one of breast, diabetes, diamonds, energy, flights, or yacht, and <strategy> is one of MCAR, MAR, MNAR, MSAR, or MPAR.
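As a hypothetical illustration of the simplest strategy, MCAR (missing completely at random) hides each cell independently with a fixed probability, so whether a value is missing depends on neither observed nor unobserved values. The sketch below is not corrupt.py's code; the names mcar_mask and fraction, and the use of None for missing cells, are assumptions.

```python
import random


def mcar_mask(data, fraction, seed=0):
    """Hide each cell independently with probability `fraction` (MCAR).

    Returns the corrupted copy of `data` (None marks a missing cell) and the boolean mask.
    """
    rng = random.Random(seed)
    mask = [[rng.random() < fraction for _ in row] for row in data]
    corrupted = [
        [None if m else v for v, m in zip(row, mask_row)]
        for row, mask_row in zip(data, mask)
    ]
    return corrupted, mask
```

MAR and MNAR differ only in how the per-cell probability is chosen: it depends on other observed values (MAR) or on the hidden value itself (MNAR).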

To run the DINI model:

python3 dini.py --model <model> --dataset <dataset> --retrain

where <model> is one of FCN, FCN2, LSTM2, or TXF2; the paper uses FCN2. To model uncertainties with an MC dropout layer, add the --model_unc flag. You can also set the fraction to start imputing on with --impute_fraction <fraction>, where <fraction> is a number between 0 and 1 (see Table 3 in the paper).
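When scripting several runs, it can help to build the dini.py command line in one place and validate the options listed above before launching. The helper below is hypothetical (not part of the repository) and uses only the flags this README documents; it always passes --retrain, matching the command shown.

```python
import sys

# Models listed in this README.
MODELS = {"FCN", "FCN2", "LSTM2", "TXF2"}


def dini_command(model, dataset, model_unc=False, impute_fraction=None):
    """Build a dini.py command line from the options documented above."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODELS)}")
    cmd = [sys.executable, "dini.py", "--model", model, "--dataset", dataset, "--retrain"]
    if model_unc:
        cmd.append("--model_unc")  # MC dropout uncertainty modeling
    if impute_fraction is not None:
        if not 0 <= impute_fraction <= 1:
            raise ValueError("impute_fraction must lie between 0 and 1")
        cmd += ["--impute_fraction", str(impute_fraction)]
    return cmd
```

Passing the result to subprocess.run(..., check=True) from the repository root would then launch the run and raise if dini.py exits with an error.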

To run imputation using all baselines, including DINI:

python3 impute.py --dataset <dataset> --strategy <strategy>
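To reproduce every dataset/strategy combination, the two commands above can be chained in a loop. The sketch below is hypothetical and assumes that impute.py consumes the corrupted data written by corrupt.py for the same dataset and strategy; by default it only prints the commands.

```python
import itertools
import subprocess
import sys

# Values listed in this README for corrupt.py and impute.py.
DATASETS = ["breast", "diabetes", "diamonds", "energy", "flights", "yacht"]
STRATEGIES = ["MCAR", "MAR", "MNAR", "MSAR", "MPAR"]


def sweep_commands(datasets=DATASETS, strategies=STRATEGIES):
    """For every (dataset, strategy) pair: corrupt the data, then run all imputers on it."""
    cmds = []
    for dataset, strategy in itertools.product(datasets, strategies):
        args = ["--dataset", dataset, "--strategy", strategy]
        cmds.append([sys.executable, "corrupt.py", *args])
        cmds.append([sys.executable, "impute.py", *args])
    return cmds


def run_sweep(dry_run=True):
    """Print the commands (default) or execute them in order from the repository root."""
    for cmd in sweep_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on the first failure
```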

To run surrogate modeling on imputed data for the three case studies:

python3 model.py --dataset <case_dataset> --strategy <strategy>

where <case_dataset> is one of gas, swat, or covid_cxr. Note that the SWaT dataset is not public and must be downloaded into the data/swat/ directory. To do so, request access to the dataset using this link, then save SWaT_Dataset_Attack_v0.csv to the data/swat/ directory.
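Because the SWaT file has to be downloaded by hand, a quick pre-flight check can catch a missing file before a model.py run. The helpers below are hypothetical (not part of the repository) and only test whether SWaT_Dataset_Attack_v0.csv exists under data/swat/ relative to a given repository root.

```python
from pathlib import Path


def swat_csv_path(root="."):
    """Location where this README asks you to save the SWaT attack CSV."""
    return Path(root) / "data" / "swat" / "SWaT_Dataset_Attack_v0.csv"


def swat_ready(root="."):
    """True if the SWaT CSV is present under <root>/data/swat/."""
    return swat_csv_path(root).is_file()
```

A wrapper script could call swat_ready() and abort with a message before launching model.py with --dataset swat.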

Hacking DINI

To run any PyTorch model, modify src/models.py; see the example models FCN, FCN2, LSTM2, and TXF2 in that file. To use any dataset, convert it to a data.csv file placed in the data/<dataset> directory, then add the following lines to the process function in corrupt.py:

elif dataset == '<dataset>':
	def split(df):
		# Inputs: all columns except the last <out_col>; outputs: the last <out_col> columns
		return df.iloc[:, :-<out_col>].values, df.iloc[:, -<out_col>:].values

where <dataset> is the name of the dataset, and <out_col> is the number of output columns in the chosen dataset.
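The split function above assumes the output columns come last in data.csv. A hypothetical, stdlib-only way to prepare such a file, plus a list-based analogue of split for checking the column layout, might look like this. The helper names, the optional header row, and the default data/ root are assumptions; in particular, whether corrupt.py expects a header row should be checked against the existing folders in data/.

```python
import csv
from pathlib import Path


def combine(inputs, outputs):
    """Concatenate each input row with its output row so outputs are the last columns."""
    return [list(x) + list(y) for x, y in zip(inputs, outputs)]


def write_data_csv(name, rows, header=None, root="data"):
    """Write <root>/<name>/data.csv; writing a header row at all is an assumption."""
    path = Path(root) / name / "data.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return path


def split_rows(rows, out_col):
    """List-based analogue of split(): inputs are all but the last out_col columns."""
    return [r[:-out_col] for r in rows], [r[-out_col:] for r in rows]
```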

Developer

Shikhar Tuli. For any questions, comments or suggestions, please reach me at [email protected].

Cite this work

Cite our work using the following BibTeX entry:

@article{tuli2022sr,
      title={{DINI}: Data Imputation using Neural Inversion for Edge Applications}, 
      author={Tuli, Shikhar and Jha, Niraj K.},
      journal={Scientific Reports},
      volume={12},
      pages={20210},
      year={2022},
      publisher={Nature Publishing Group}
}

License

See the LICENSE file for more details.
