This is the repository for the LLM-jp membership inference attack (MIA) project.
Membership inference attack is a type of privacy attack that aims to determine whether a specific data sample was used to train a machine learning model.
In this project, we implement representative membership inference attacks from the current literature.
In general, each method computes a feature value (a score) for every input sample. Each MIA method then has its own hypothesis about how this score separates trained from un-trained samples; for example, the loss-based method assumes that trained samples have a smaller loss than un-trained samples. The feature value is then used to predict the membership label of each sample.
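As a minimal illustration of this idea (not the implementation in this repository), a loss-based score can be computed with Hugging Face Transformers and compared against a threshold; the model name and threshold below are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM compatible with AutoModelForCausalLM works.
model_name = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def loss_score(text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

# Loss-based hypothesis: trained (member) samples tend to have a smaller loss.
threshold = 3.0  # illustrative; in practice chosen from the score distributions
sample = "The quick brown fox jumps over the lazy dog."
predicted_member = loss_score(sample) < threshold
print(predicted_member)
```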
We implemented the following MIA methods:
Gray-box methods:
- Loss-based MIA
- Gradient-based MIA
- Perplexity-based MIA
- Reference-based MIA
- Min-k%
- Max-k% ++
- EDA-PAC
- Recall
- Neighbourhood-based MIA
- DC-PDD
  - Requires an external token probability file, which is not provided in this repository due to its size. You need to prepare this file yourself (see the sketch after this method list for one rough way to build it).
Black-box methods:
- SaMIA
- CDD
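The DC-PDD method needs token probabilities estimated from a large reference corpus. The exact file format expected by the code is defined in `mia_methods.py`; as a rough sketch only (the tokenizer, corpus, and output file name below are placeholders), such a file could be built like this:

```python
import pickle
from collections import Counter

from datasets import load_dataset
from transformers import AutoTokenizer

# Placeholders: pick a tokenizer matching the target model and a large corpus.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
corpus = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

# Count token occurrences over the reference corpus.
counts = Counter()
for record in corpus:
    counts.update(tokenizer(record["text"])["input_ids"])

# Convert counts to probabilities and save; adjust the format to whatever
# mia_methods.py actually expects.
total = sum(counts.values())
token_probs = {token_id: c / total for token_id, c in counts.items()}
with open("token_probs.pkl", "wb") as f:
    pickle.dump(token_probs, f)
```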
Please refer to the test.py file for how to run the code.
The code is structured as follows:
- `mia_dataset.py`: the dataset class for the MIA attack.
- `mia_model.py`: the attacked (target) model class for the MIA attack.
- `mia_methods.py`: the MIA attack methods.
- `test.py`: an example script for running the MIA attack.
- `utils.py`: utility functions and evaluation code for the MIA attack.
You may refer to `test.py` for a simple usage example.
In general, the code is used as follows (a usage sketch covering these steps follows the list):
- Load the dataset for the MIA attack. This can be a dataset that is already processed in the current code (WikiMIA), or a dataset that you have prepared yourself. An MIA dataset holds two lists of samples, one for member samples and one for non-member samples, which you can access via `dataset.member` and `dataset.non_member`.
- Create the target model for the MIA attack. We have already prepared GPT-NeoX as the initial model in `test.py`. You can use this model as the target model, or use your own. `mia_model.py` loads the target model with `AutoModelForCausalLM`, so you can use any model that is compatible with this class. If your model is not supported by this class, you may need to modify `mia_model.py` to adapt it.
- Load an MIA method from `mia_methods.py`. Some methods have hyperparameters; refer to this file to check them, as they may influence the performance of the attack.
- Run the MIA with `mia_model.collect_outputs(dataset.member, mia_method)` and `mia_model.collect_outputs(dataset.non_member, mia_method)`. This returns the feature value calculated for every sample in the member and non-member sets, which you can then use to predict each sample's membership label.
- Use the evaluation code in `utils.py` to evaluate the performance of the attack. `utils.evaluate_mia` returns the distribution distance between member and non-member samples and the ROC-AUC score of the MIA attack.
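Putting the steps above together, a usage sketch might look like the following. The class names `MIADataset`, `MIAModel`, and `LossMIA`, their constructor arguments, and the exact signature of `utils.evaluate_mia` are assumptions for illustration; check `test.py` for the actual names used in this repository.

```python
from mia_dataset import MIADataset   # hypothetical class name; see test.py
from mia_model import MIAModel       # hypothetical class name; see test.py
from mia_methods import LossMIA      # hypothetical class name; see test.py
import utils

# 1. Load a dataset with member / non-member splits (e.g., WikiMIA).
dataset = MIADataset("WikiMIA")

# 2. Wrap the target model (anything compatible with AutoModelForCausalLM).
mia_model = MIAModel("EleutherAI/gpt-neox-20b")

# 3. Pick an MIA method (check its hyperparameters in mia_methods.py).
mia_method = LossMIA()

# 4. Collect feature values for both splits.
member_scores = mia_model.collect_outputs(dataset.member, mia_method)
non_member_scores = mia_model.collect_outputs(dataset.non_member, mia_method)

# 5. Evaluate: distribution distance and ROC-AUC between the two splits.
results = utils.evaluate_mia(member_scores, non_member_scores)
print(results)
```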
Planned future work:
- Add more MIA methods.
- Add more datasets for the MIA attack.
- Add more evaluation metrics for the MIA attack.
Contributions are welcome! Please fork the repository and submit a pull request for any improvements or bug fixes.
If you would like an MIA method that is not yet implemented, please feel free to open an issue.
A positive result from an MIA method does not mean the text was definitely used to train the model.
The result should only be used as a reference, not as evidence supporting a conclusion such as "my novel was used to train this LLM".
Current state-of-the-art LLMs are usually trained on closed-source data, so the ground truth of the training data is unavailable.
Thus, MIA provides an inference, not a proof.