
Kekkodf/DP-COMET


DP-COMET
DP-COMET: A Differential Privacy Context Obfuscation MEchanism for Textual Data

DP-COMET repository of Short-Paper accepted at CIKM'25. ID Paper: 5615


⭐ 🏆 The paper has won the Best Short Paper Award at the CIKM'25 Conference. You can read the full paper here. 🏆 ⭐

📂 Directory Structure

├── config
│   ├── environment.yml
│   ├── requirements.txt
├── data
│   ├── IR
│      ├── effectiveness
│      ├── obfuscatedQueries
│      ├── privacy
│   ├── sentimentAnalysis
│      ├── effectiveness
│      ├── obfuscatedQueries
│      ├── privacy
├── img
│   ├── repo
├── logs # Directory for storing logs dynamically generated by the script
├── src
│   ├── __init__.py
│   ├── dp-comet.py
│   ├── utils
│      ├── __init__.py
│      ├── mylogger.py
├── test
├── README.md
├── LICENSE
├── .gitignore

The main.py script will generate a sample test folder, apply the DP-COMET mechanism, perform the experiments and store the results in it. The results logs will be saved in the logs/ folder.

🌍 Setup Environment

  1. Clone the repository

  2. Create the environment from the environment.yml file in the ./config/ folder. The following command creates a conda environment:

conda env create -f config/environment.yml

Then verify the environment:

conda env list

  3. Activate the environment:

conda activate dp-comet

Now you can run the code in the repository with all the dependencies installed. Once you are done, you can deactivate the environment:

conda deactivate

In the ./config/requirements.txt file, you can find the list of packages used in the project.

📖 General Pseudocode

The DP-COMET mechanism obfuscates textual data under differential privacy while taking the context of each text into account. The pseudocode below outlines the main steps of the mechanism:

DP-COMET Pseudocode
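
The pseudocode figure is not reproduced in this text. For rough orientation only, the sketch below shows a generic embedding-perturbation step of the kind used by word-level metric-DP mechanisms: add calibrated noise to a vector, then map the noisy vector back to the nearest vocabulary entry. It is not the DP-COMET algorithm, which additionally accounts for context. The toy vocabulary, the function names `sample_noise` and `obfuscate_word`, and the noise density proportional to exp(-epsilon * ||z||) are all assumptions made for illustration.

```python
import math
import random

# Toy 2-dimensional "embeddings" (hypothetical); real mechanisms use pretrained vectors.
TOY_VOCAB = {
    "car": (1.0, 0.0),
    "truck": (0.9, 0.2),
    "apple": (0.0, 1.0),
    "pear": (0.1, 0.9),
}


def sample_noise(dim, epsilon, rng):
    """Draw z in R^dim with density proportional to exp(-epsilon * ||z||)."""
    # Uniform direction: normalise a standard Gaussian vector.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Magnitude: Gamma distribution with shape=dim and scale=1/epsilon.
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]


def obfuscate_word(word, epsilon, rng, vocab=TOY_VOCAB):
    """Perturb the word's vector and return the nearest vocabulary word."""
    vector = vocab[word]
    noise = sample_noise(len(vector), epsilon, rng)
    noisy = [v + n for v, n in zip(vector, noise)]
    return min(vocab, key=lambda w: math.dist(vocab[w], noisy))


rng = random.Random(0)
print([obfuscate_word("car", epsilon=4.0, rng=rng) for _ in range(5)])
```

A smaller `epsilon` produces larger noise, so the returned word differs from the input more often; a very large `epsilon` almost always returns the input word unchanged.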

🧪 Test DP-COMET

Warning

Due to the large size of the embeddings, the config folder does not contain the embeddings file used by COMET. However, upon acceptance of the paper, the embeddings will be provided in the de-anonymized repository. You can still test the COMET mechanisms by running the main.py script to generate the obfuscated queries, but the run will take longer than the times reported in the paper, because the embeddings are generated on the fly.

To test the contextual obfuscation mechanism, you can use the main.py script and run the following command:

python3 main.py

An example with the obfuscated queries for the Information Retrieval task is shown below:

python3 main.py --collection dl20 --iterations 10 --mechanism Mhl --epsilons 4 16
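
The parser inside main.py is not shown in this README. As a minimal sketch of how the four flags in the command above could be declared with `argparse`, consider the following; the function name `build_parser`, the help strings, and every default value are hypothetical and may differ from the actual script.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the example command."""
    parser = argparse.ArgumentParser(description="Run an obfuscation experiment.")
    # Defaults below are placeholders, not the project's real defaults.
    parser.add_argument("--collection", default="dl20", help="query collection name")
    parser.add_argument("--iterations", type=int, default=1,
                        help="number of obfuscations per query")
    parser.add_argument("--mechanism", default="Mhl", help="mechanism identifier")
    parser.add_argument("--epsilons", type=float, nargs="+", default=[4.0],
                        help="one or more privacy budgets")
    return parser


# Parse the same arguments as the example command, without touching sys.argv.
args = build_parser().parse_args(
    ["--collection", "dl20", "--iterations", "10",
     "--mechanism", "Mhl", "--epsilons", "4", "16"]
)
print(args.collection, args.iterations, args.mechanism, args.epsilons)
# prints: dl20 10 Mhl [4.0, 16.0]
```

Using `nargs="+"` lets a single `--epsilons` flag accept several values, which matches the `--epsilons 4 16` form used above.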

demo

📏 Effectiveness & Privacy

The results can be found in the data/ folder, where you will find the obfuscated queries and the results of the experiments. The results are organized in subfolders for each task, such as IR (Information Retrieval) and sentimentAnalysis, with further subfolders for effectiveness, obfuscated queries, and privacy.
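
The layout described above can be enumerated with a short script. This is only a sketch built from the directory names in the tree at the top of this README; the helper name `result_dirs` is hypothetical, and nothing is assumed about the file names inside each folder.

```python
from pathlib import Path

# Task and result-type folder names taken from the directory structure above.
TASKS = ("IR", "sentimentAnalysis")
KINDS = ("effectiveness", "obfuscatedQueries", "privacy")


def result_dirs(root="data"):
    """Map (task, kind) to the directory expected to hold those results."""
    return {(task, kind): Path(root) / task / kind
            for task in TASKS for kind in KINDS}


for (task, kind), path in result_dirs().items():
    # exists() is False when the script runs outside the repository root.
    print(f"{task:>17} {kind:<17} {path} exists={path.exists()}")
```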

🔐 Full Privacy analysis results

Because of the limited space in the paper, we provide here the full privacy analysis results for the DP-COMET mechanism on the IR task.

Privacy Analysis

⌛ Efficiency

The DP-COMET mechanism is designed to be efficient in terms of computational resources. The tests were performed on a machine with the following specifications:

Processor: 13th Gen Intel i9-13900H (20) @ 5.200GHz
GPU: Intel Raptor Lake-P [Iris Xe Graphics]
RAM: 16 GB
Storage: 1 TB SSD
Operating System: Ubuntu 24.04.2 LTS x86_64

The btop screenshot below shows the CPU and RAM usage during the execution of the main.py script:

btop

While obfuscating the queries, DP-COMET prints its progress, showing the number of queries processed so far and the total number of queries to obfuscate. When the obfuscation finishes, the script prints the total time taken.

In the example above, producing 10 obfuscations for each of the 43 queries at 2 epsilon values took approximately 5.82 s.
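
To put that figure in per-query terms, the arithmetic can be worked out as below. The snippet uses only the numbers reported in this section; the variable names are illustrative, and the result is an average that ignores any fixed start-up cost.

```python
iterations = 10      # obfuscations per query (--iterations 10)
queries = 43         # queries in the example run
epsilons = 2         # privacy budgets (--epsilons 4 16)
elapsed_s = 5.82     # reported wall-clock time in seconds

# Total obfuscated queries produced, and the average time spent on each one.
total = iterations * queries * epsilons
per_query_ms = 1000 * elapsed_s / total
print(f"{total} obfuscated queries, about {per_query_ms:.1f} ms each")
# prints: 860 obfuscated queries, about 6.8 ms each
```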

🆘 Support

The corresponding author is Francesco L. De Faveri. Please visit his homepage for more information.

📜 License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.


De Faveri, Faggioli, and Ferro (@IIIA Hub)  ·  GitHub @kdf_7
