DP-COMET: repository of the short paper accepted at CIKM'25 (paper ID: 5615).
⭐ 🏆 The paper has won the Best Short Paper Award at the CIKM'25 Conference. You can read the full paper here. 🏆 ⭐
```
├── config
│   ├── environment.yml
│   ├── requirements.txt
├── data
│   ├── IR
│   │   ├── effectiveness
│   │   ├── obfuscatedQueries
│   │   ├── privacy
│   ├── sentimentAnalysis
│   │   ├── effectiveness
│   │   ├── obfuscatedQueries
│   │   ├── privacy
├── img
│   ├── repo
├── logs # Directory for storing logs dynamically generated by the script
├── src
│   ├── __init__.py
│   ├── dp-comet.py
│   ├── utils
│   │   ├── __init__.py
│   │   ├── mylogger.py
├── test
├── README.md
├── LICENSE
├── .gitignore
```

The main.py script generates a sample test folder, applies the DP-COMET mechanism, runs the experiments and stores the results in that folder. The result logs are saved in the logs/ folder.
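The repository keeps its logging helper in src/utils/mylogger.py, whose contents are not shown here. As a rough sketch of how per-run log files could be created dynamically under logs/, the hypothetical helper below uses only Python's standard logging module. The function name get_run_logger and the timestamped file-naming scheme are assumptions for illustration, not the repository's API.

```python
import logging
from datetime import datetime
from pathlib import Path


def get_run_logger(name="dp-comet", log_dir="logs"):
    """Return a logger writing to a new timestamped file inside log_dir.

    Hypothetical helper for illustration; not the repository's mylogger.py.
    """
    path = Path(log_dir)
    path.mkdir(parents=True, exist_ok=True)  # create logs/ on first use
    log_file = path / f"{name}_{datetime.now():%Y%m%d_%H%M%S}.log"

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach a file handler only once, so repeated calls do not duplicate lines.
    if not logger.handlers:
        handler = logging.FileHandler(log_file, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(message)s"))
        logger.addHandler(handler)
    return logger
```

A script would call log = get_run_logger() once at start-up and then log.info(...) at each step; the file is opened immediately, so each run leaves its own log file behind.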
- Clone the repository.
- Generate the environment using the environment.yml file in the ./config/ folder. You can use the following command to create a conda environment:

```bash
conda env create -f config/environment.yml
```

Then verify the environment:

```bash
conda env list
```

- Activate the environment:

```bash
conda activate dp-comet
```

Now you can run the code in the repository with all the dependencies installed. Once you are done, you can deactivate the environment:

```bash
conda deactivate
```

In the ./config/requirements.txt file, you can find the list of packages used in the project.
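As an optional sanity check beyond conda env list, the Python sketch below compares the entries of config/requirements.txt with the packages installed in the active environment. It is not part of the repository: it assumes the file uses plain name or name==version lines, reports any other specifier as unchecked, relies only on the standard-library importlib.metadata module, and compares version strings literally (no normalization).

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def check_requirements(req_file="config/requirements.txt"):
    """Map each requirement to 'ok', 'missing', 'mismatch (...)' or 'unchecked'."""
    report = {}
    for raw in Path(req_file).read_text(encoding="utf-8").splitlines():
        # Drop comments and environment markers, then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        name, _, wanted = line.partition("==")
        name, wanted = name.strip(), wanted.strip() or None
        if any(ch in name for ch in "<>=!~[@ "):
            report[line] = "unchecked"  # specifiers this sketch does not parse
            continue
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if wanted is None or installed == wanted:
            report[name] = "ok"
        else:
            report[name] = f"mismatch (installed {installed}, wanted {wanted})"
    return report
```

With the dp-comet environment activated, running print(check_requirements()) from the repository root lists any package that is missing or pinned to a different version.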
The DP-COMET mechanism is designed to obfuscate textual data while preserving privacy, taking the context of each text into account. The pseudocode below outlines the main steps of the DP-COMET mechanism:
Warning
Due to their large size, the embeddings file used by COMET is not included in the config folder. Upon acceptance of the paper, the embeddings will be provided in the de-anonymized repository. If you want to test the COMET mechanisms, you can still use the main.py script to generate the obfuscated queries, but the execution will take longer than reported in the paper because the embeddings are generated on the fly.
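For readers unfamiliar with embedding-based text obfuscation, the Python sketch below shows the generic word-level metric-DP recipe that mechanisms in this family build on: perturb a token's embedding with noise whose density is proportional to exp(-ε‖z‖), then map the noisy vector back to the nearest vocabulary word. It is not the DP-COMET algorithm (in particular, it ignores context entirely), and the function names, the toy embedding dictionary and the out-of-vocabulary handling are assumptions made for illustration.

```python
import math
import random


def sample_metric_dp_noise(dim, epsilon, rng):
    """Sample z in R^dim with density proportional to exp(-epsilon * ||z||_2)."""
    # Uniform direction on the unit sphere: normalise a standard Gaussian vector.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Radial part: Gamma(shape=dim, scale=1/epsilon) yields the exp(-epsilon * r) density.
    magnitude = rng.gammavariate(dim, 1.0 / epsilon)
    return [magnitude * x / norm for x in direction]


def obfuscate(tokens, embeddings, epsilon, rng=None):
    """Replace each token by the vocabulary word nearest to its noisy embedding.

    embeddings: hypothetical dict mapping each vocabulary word to a list of
    floats of equal length (static embeddings, no context).
    Out-of-vocabulary tokens are returned unchanged in this sketch.
    """
    rng = rng or random.Random()
    vocab = list(embeddings)
    dim = len(embeddings[vocab[0]])
    result = []
    for tok in tokens:
        if tok not in embeddings:
            result.append(tok)
            continue
        noise = sample_metric_dp_noise(dim, epsilon, rng)
        noisy = [e + n for e, n in zip(embeddings[tok], noise)]
        # Nearest neighbour in Euclidean distance (math.dist needs Python 3.8+).
        result.append(min(vocab, key=lambda w: math.dist(embeddings[w], noisy)))
    return result
```

The expected noise magnitude is dim/ε, so smaller epsilon values produce larger perturbations and more frequent word substitutions, while a very large epsilon maps every in-vocabulary token back to itself.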
To test the contextual obfuscation mechanism, run the main.py script with the following command:

```bash
python3 main.py
```

An example that obfuscates the queries for the Information Retrieval task is shown below:

```bash
python3 main.py --collection dl20 --iterations 10 --mechanism Mhl --epsilons 4 16
```

The results are stored in the data/ folder, which contains the obfuscated queries and the results of the experiments. They are organized in one subfolder per task, IR (Information Retrieval) and sentimentAnalysis, each with further subfolders for effectiveness, obfuscatedQueries, and privacy.
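To make the command-line options of the example concrete, the sketch below builds an argparse parser accepting the same four flags and maps a task and result kind to its folder under data/. It is a hypothetical reconstruction: the real main.py may define these options differently, and the types, help strings and default values (copied from the example command) are assumptions.

```python
import argparse
from pathlib import Path


def build_parser():
    """Hypothetical parser mirroring the flags used in the example command."""
    parser = argparse.ArgumentParser(description="Run DP-COMET obfuscation experiments (sketch).")
    parser.add_argument("--collection", default="dl20", help="query collection, e.g. dl20")
    parser.add_argument("--iterations", type=int, default=10, help="obfuscations generated per query")
    parser.add_argument("--mechanism", default="Mhl", help="obfuscation mechanism identifier")
    # nargs="+" lets a single flag receive several privacy budgets, e.g. --epsilons 4 16.
    parser.add_argument("--epsilons", type=float, nargs="+", default=[4.0, 16.0], help="privacy budgets")
    return parser


def result_dir(task, kind, root="data"):
    """Map a task (IR, sentimentAnalysis) and a result kind to its folder under data/."""
    allowed = {"effectiveness", "obfuscatedQueries", "privacy"}
    if kind not in allowed:
        raise ValueError(f"unknown result kind: {kind}")
    return Path(root) / task / kind
```

For instance, build_parser().parse_args() on the example command yields epsilons == [4.0, 16.0], and result_dir("IR", "obfuscatedQueries") points to data/IR/obfuscatedQueries.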
Due to the limited space in the paper, we provide here the full privacy analysis results of the DP-COMET mechanism on the IR task.
The DP-COMET mechanism is designed to be efficient in terms of computational resources. The tests were performed on a machine with the following specifications:
- Processor: 13th Gen Intel i9-13900H (20) @ 5.200GHz
- GPU: Intel Raptor Lake-P [Iris Xe Graphics]
- RAM: 16 GB
- Storage: 1 TB SSD
- Operating System: Ubuntu 24.04.2 LTS x86_64

The btop screenshot below shows the CPU and RAM usage during the execution of the main.py script:
While obfuscating the queries, DP-COMET prints its progress, showing the number of queries processed and the total number of queries to be obfuscated. At the end of the obfuscation, the script prints the total time taken to obfuscate the queries.
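The progress-and-timing behaviour described above can be sketched as a simple loop. This is a hypothetical illustration, not the script's code: obfuscate_all and the obfuscate_fn callback are assumed names, and here the total counts every (epsilon, iteration, query) combination.

```python
import time


def obfuscate_all(queries, obfuscate_fn, iterations, epsilons):
    """Run obfuscate_fn over every (epsilon, iteration, query) triple, printing progress.

    Returns the obfuscated queries keyed by (epsilon, iteration, query index)
    and the elapsed wall-clock time in seconds.
    """
    total = len(queries) * iterations * len(epsilons)
    done = 0
    results = {}
    start = time.perf_counter()
    for eps in epsilons:
        for it in range(iterations):
            for qid, query in enumerate(queries):
                results[(eps, it, qid)] = obfuscate_fn(query, eps)
                done += 1
                # "\r" rewrites the same console line so the counter updates in place.
                print(f"\rObfuscated {done}/{total} queries", end="", flush=True)
    elapsed = time.perf_counter() - start
    print(f"\nTotal obfuscation time: {elapsed:.2f}s")
    return results, elapsed
```

Using time.perf_counter rather than time.time gives a monotonic, high-resolution clock, which is the usual choice for measuring elapsed durations.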
In the example above, producing 10 obfuscations of each of the 43 queries for 2 epsilon values took approximately 5.82 s.
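Assuming each of the 10 iterations obfuscates all 43 queries once per epsilon value, the reported 5.82 s corresponds roughly to the following per-query cost and throughput:

```latex
$$
N = 10 \times 43 \times 2 = 860, \qquad
\frac{5.82\ \text{s}}{860} \approx 6.8\ \text{ms per obfuscated query}, \qquad
\frac{860}{5.82\ \text{s}} \approx 148\ \text{queries/s}.
$$
```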
The corresponding author is Francesco L. De Faveri. Please visit his homepage for more information.
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.




