This is the repository for the LLM-jp membership inference attack (MIA) project.
Membership inference attack is a type of privacy attack that aims to determine whether a specific data sample was used to train a machine learning model.
In this project, we implement representative membership inference attacks from the current literature.
In general, each method computes a feature value (a score) for every input sample. Each MIA method then has its own hypothesis about how this score separates trained from un-trained samples; for example, the loss-based method assumes that trained samples have a smaller loss than un-trained samples. The feature value is then used to predict the membership label of each sample.
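As a minimal illustration of this idea (not the implementation in this repository), a loss-based score can be computed with Hugging Face Transformers and compared against a threshold; the model name and threshold below are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM compatible with AutoModelForCausalLM works.
model_name = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def loss_score(text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

# Loss-based hypothesis: trained (member) samples tend to have a smaller loss.
threshold = 3.0  # illustrative; in practice chosen from the score distributions
sample = "The quick brown fox jumps over the lazy dog."
predicted_member = loss_score(sample) < threshold
print(predicted_member)
```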
We implemented the following MIA methods:
Gray-box methods:
- Loss-based MIA
- Gradient-based MIA
- Perplexity-based MIA
- Reference-based MIA
- Min-k%
- Max-k% ++
- EDA-PAC
- Recall
- Neighbourhood-based MIA
- DC-PDD
  - Requires an external token probability file, which is not provided in this repository due to its size. You need to prepare this file yourself (see the sketch after this method list for one rough way to build it).
Black-box methods:
- SaMIA
- CDD
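The DC-PDD method needs token probabilities estimated from a large reference corpus. The exact file format expected by the code is defined in `mia_methods.py`; as a rough sketch only (the tokenizer, corpus, and output file name below are placeholders), such a file could be built like this:

```python
import pickle
from collections import Counter

from datasets import load_dataset
from transformers import AutoTokenizer

# Placeholders: pick a tokenizer matching the target model and a large corpus.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
corpus = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

# Count token occurrences over the reference corpus.
counts = Counter()
for record in corpus:
    counts.update(tokenizer(record["text"])["input_ids"])

# Convert counts to probabilities and save; adjust the format to whatever
# mia_methods.py actually expects.
total = sum(counts.values())
token_probs = {token_id: c / total for token_id, c in counts.items()}
with open("token_probs.pkl", "wb") as f:
    pickle.dump(token_probs, f)
```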
Please refer to the test.py file for how to run the code.
The code is structured as follows:
- `mia_dataset.py`: the dataset class for the MIA attack.
- `mia_model.py`: the attacked (target) model class for the MIA attack.
- `mia_methods.py`: the MIA attack methods.
- `test.py`: an example script for running the MIA attack.
- `utils.py`: utility functions and evaluation code for the MIA attack.
You may refer to `test.py` for a simple usage example.
In general, the code is used as follows (a usage sketch covering these steps follows the list):
- Load the dataset for the MIA attack. This can be a dataset that is already processed in the current code (WikiMIA), or a dataset that you have prepared yourself. An MIA dataset holds two lists of samples, one for member samples and one for non-member samples, which you can access via `dataset.member` and `dataset.non_member`.
- Create the target model for the MIA attack. We have already prepared GPT-NeoX as the initial model in `test.py`. You can use this model as the target model, or use your own. `mia_model.py` loads the target model with `AutoModelForCausalLM`, so you can use any model that is compatible with this class. If your model is not supported by this class, you may need to modify `mia_model.py` to adapt it.
- Load an MIA method from `mia_methods.py`. Some methods have hyperparameters; refer to this file to check them, as they may influence the performance of the attack.
- Run the MIA with `mia_model.collect_outputs(dataset.member, mia_method)` and `mia_model.collect_outputs(dataset.non_member, mia_method)`. This returns the feature value calculated for every sample in the member and non-member sets, which you can then use to predict each sample's membership label.
- Use the evaluation code in `utils.py` to evaluate the performance of the attack. `utils.evaluate_mia` returns the distribution distance between member and non-member samples and the ROC-AUC score of the MIA attack.
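Putting the steps above together, a usage sketch might look like the following. The class names `MIADataset`, `MIAModel`, and `LossMIA`, their constructor arguments, and the exact signature of `utils.evaluate_mia` are assumptions for illustration; check `test.py` for the actual names used in this repository.

```python
from mia_dataset import MIADataset   # hypothetical class name; see test.py
from mia_model import MIAModel       # hypothetical class name; see test.py
from mia_methods import LossMIA      # hypothetical class name; see test.py
import utils

# 1. Load a dataset with member / non-member splits (e.g., WikiMIA).
dataset = MIADataset("WikiMIA")

# 2. Wrap the target model (anything compatible with AutoModelForCausalLM).
mia_model = MIAModel("EleutherAI/gpt-neox-20b")

# 3. Pick an MIA method (check its hyperparameters in mia_methods.py).
mia_method = LossMIA()

# 4. Collect feature values for both splits.
member_scores = mia_model.collect_outputs(dataset.member, mia_method)
non_member_scores = mia_model.collect_outputs(dataset.non_member, mia_method)

# 5. Evaluate: distribution distance and ROC-AUC between the two splits.
results = utils.evaluate_mia(member_scores, non_member_scores)
print(results)
```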
Planned future work:
- Add more MIA methods.
- Add more datasets for the MIA attack.
- Add more evaluation metrics for the MIA attack.
Contributions are welcome! Please fork the repository and submit a pull request for any improvements or bug fixes.
If you would like an MIA method that is not yet implemented, please feel free to open an issue.
A positive result from an MIA method does not mean the text was definitely used to train the model.
The result should only be used as a reference, not as evidence supporting a conclusion such as "my novel was used to train this LLM".
Current state-of-the-art LLMs are usually trained on closed-source data, so the ground truth of the training data is unavailable.
Thus, MIA provides an inference, not a proof.