DP-COMET: A Differential Privacy Context Obfuscation MEchanism for Textual Data

Repository for the DP-COMET short paper accepted at CIKM'25 (paper ID: 5615).

⭐ 🏆 The paper has won the Best Short Paper Award at the CIKM'25 Conference. You can read the full paper here. 🏆 ⭐

📂 Directory Structure

├── config
│   ├── environment.yml
│   ├── requirements.txt
├── data
│   ├── IR
│      ├── effectiveness
│      ├── obfuscatedQueries
│      ├── privacy
│   ├── sentimentAnalysis
│      ├── effectiveness
│      ├── obfuscatedQueries
│      ├── privacy
├── img
│   ├── repo
├── logs # Directory for storing logs dynamically generated by the script
├── src
│   ├── __init__.py
│   ├── dp-comet.py
│   ├── utils
│      ├── __init__.py
│      ├── mylogger.py
├── test
├── README.md
├── LICENSE
├── .gitignore

The main.py script generates a sample test folder, applies the DP-COMET mechanism, runs the experiments, and stores the results in that folder. The result logs are saved in the logs/ folder.

🌍 Setup Environment

Clone the repository
Generate the environment using the environment.yml file in the ./config/ folder. You can use the following command to create a conda environment:

conda env create -f config/environment.yml

Then verify the environment:

conda env list

Activate the environment:

conda activate dp-comet

Now you can run the code in the repository with all the dependencies installed. Once you are done, you can deactivate the environment:

conda deactivate

In the ./config/requirements.txt file, you can find the list of packages used in the project.

📖 General Pseudocode

The DP-COMET mechanism is designed to obfuscate textual data while preserving privacy considering the context of the texts. The pseudocode below outlines the main steps of the DP-COMET mechanism:
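The sketch that follows is not that pseudocode and is not taken from the repository. It only illustrates, under stated assumptions, the generic perturb-then-nearest-neighbour pattern commonly used in metric differential privacy for text: add noise with density proportional to exp(-epsilon * ||z||) to an embedding, then return the closest known item. The function names, the toy vocabulary, and the two-dimensional vectors are all hypothetical, and DP-COMET's context-level steps may differ.

```python
import math
import random


def sample_metric_dp_noise(dim, epsilon, rng):
    """Draw a noise vector with density proportional to exp(-epsilon * ||z||)."""
    # Direction: a normalised Gaussian vector is uniform on the unit sphere.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Magnitude: Gamma distribution with shape `dim` and scale 1 / epsilon.
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]


def obfuscate(vector, vocabulary, epsilon, rng):
    """Perturb `vector` and return the vocabulary key nearest to the noisy point."""
    noise = sample_metric_dp_noise(len(vector), epsilon, rng)
    noisy = [v + n for v, n in zip(vector, noise)]
    return min(vocabulary, key=lambda key: math.dist(noisy, vocabulary[key]))


# Hypothetical toy embeddings, for illustration only.
toy_vocabulary = {
    "weather": [0.0, 0.0],
    "forecast": [0.5, 0.1],
    "banana": [5.0, 5.0],
}

rng = random.Random(0)
for epsilon in (4, 16):
    print(epsilon, obfuscate(toy_vocabulary["weather"], toy_vocabulary, epsilon, rng))
```

Smaller epsilon values produce larger noise on average (the expected noise norm is dim / epsilon), so the returned item drifts further from the original.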

🧪 Test DP-COMET

Warning

Due to the large size of the embeddings, the config folder does not contain the embeddings file used by COMET. However, upon acceptance of the paper, the embeddings will be provided in the de-anonymized repository. If you want to test the COMET mechanisms, you can still use the main.py script to generate the obfuscated queries, but the run time will be longer than the one reported in the paper, because the embeddings will be generated on the fly.

To test the contextual obfuscation mechanism, you can use the main.py script and run the following command:

python3 main.py

For example, to generate obfuscated queries for the Information Retrieval task:

python3 main.py --collection dl20 --iterations 10 --mechanism Mhl --epsilons 4 16
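To make the flags in that command easier to read, here is a minimal sketch of an argparse parser that accepts the same four options. It is a hypothetical mirror written for illustration, not the parser in main.py: the types, help strings, and the reading of --iterations as the number of obfuscations per query are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser accepting the flags shown in the example command."""
    parser = argparse.ArgumentParser(description="Illustrative mirror of the example flags")
    parser.add_argument("--collection", type=str, help="collection name, e.g. dl20")
    parser.add_argument("--iterations", type=int, help="assumed: obfuscations per query")
    parser.add_argument("--mechanism", type=str, help="mechanism name, e.g. Mhl")
    parser.add_argument("--epsilons", type=float, nargs="+", help="one or more epsilon values")
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args(
    ["--collection", "dl20", "--iterations", "10", "--mechanism", "Mhl", "--epsilons", "4", "16"]
)
print(args.collection, args.iterations, args.mechanism, args.epsilons)
```

Using nargs="+" is what lets several epsilon values follow a single --epsilons flag, as in the command above.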

📏 Effectiveness & Privacy

The results can be found in the data/ folder, where you will find the obfuscated queries and the results of the experiments. The results are organized in subfolders for each task, such as IR (Information Retrieval) and sentimentAnalysis, with further subfolders for effectiveness, obfuscated queries, and privacy.
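As a quick way to check that layout after cloning, the following sketch builds the six data/<task>/<kind> paths named in the directory structure and reports which ones exist. The helper name result_dirs is hypothetical, and the script assumes it is run from the repository root.

```python
from pathlib import Path

# Task and result-kind folder names as listed in the directory structure.
TASKS = ("IR", "sentimentAnalysis")
KINDS = ("effectiveness", "obfuscatedQueries", "privacy")


def result_dirs(root="data"):
    """Return every <root>/<task>/<kind> path, in task-major order."""
    return [Path(root) / task / kind for task in TASKS for kind in KINDS]


for directory in result_dirs():
    status = "found" if directory.is_dir() else "missing"
    print(f"{directory}: {status}")
```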

🔐 Full Privacy analysis results

Because of the limited space in the paper, the full privacy analysis results for the DP-COMET mechanism on the IR task are provided here.

⌛ Efficiency

The DP-COMET mechanism is designed to be efficient in terms of computational resources. The tests were performed on a machine with the following specifications:

Processor: 13th Gen Intel i9-13900H (20) @ 5.200GHz
GPU: Intel Raptor Lake-P [Iris Xe Graphics]
RAM: 16 GB
Storage: 1 TB SSD
Operating System: Ubuntu 24.04.2 LTS x86_64

The btop screenshot below shows the CPU and RAM usage during the execution of the main.py script:

While obfuscating the queries, DP-COMET prints its progress, showing the number of queries processed out of the total number of queries to be obfuscated. When the obfuscation finishes, the script prints the total time taken.

In the example above, producing 10 obfuscations for each of 43 queries at 2 epsilon values took approximately 5.82 s.
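For a rough per-query figure, the arithmetic can be worked out as follows. This assumes the 5.82 s covers every obfuscation for both epsilon values and that the cost is spread evenly, neither of which is stated explicitly.

```python
iterations = 10       # obfuscations per query
queries = 43          # queries in the example run
epsilons = 2          # number of epsilon values
total_seconds = 5.82  # reported total time

# Total number of obfuscated queries produced in the run.
total_obfuscations = iterations * queries * epsilons

# Average time per obfuscated query, in milliseconds.
ms_per_obfuscation = 1000 * total_seconds / total_obfuscations

print(total_obfuscations)            # 860
print(round(ms_per_obfuscation, 2))  # 6.77
```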

🆘 Support

The corresponding author is Francesco L. De Faveri. Please visit his homepage for more information.

📜 License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

De Faveri, Faggioli, and Ferro (@IIIA Hub) · GitHub @kdf_7
