Phase 1: Experimenting with an Initial Attack Using the BigCodeBench Dataset
This phase aims to simulate a retrieval poisoning and up-ranking attack against LLM-based code retrieval or RAG (Retrieval-Augmented Generation) systems. The pipeline leverages the BigCodeBench dataset and combines embedding-based similarity scoring, up-ranking optimization, and content-based injection methods.
1. Dataset Preparation
Source:
- Collect the BigCodeBench dataset from Hugging Face and convert it into .csv format.
- The dataset contains:
- Queries (Q): Natural language programming tasks or prompts.
- Code Snippets (D): Ground-truth code solutions.
Splitting:
- Separate dataset into:
- Queries Pool
- Code Pool
- From the queries pool:
- Split into Uprank Queries (80%) and Test Queries (20%).
- The Code Pool will serve as the document base (D₁…Dₖ).
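A minimal sketch of this preparation step, assuming the `bigcode/bigcodebench` dataset id and the `instruct_prompt` / `canonical_solution` columns as the query and code fields (these identifiers and the split name are assumptions; adjust them to the actual dataset card):

```python
# Hedged sketch: load BigCodeBench, export to CSV, and split query/code pools.
# Dataset id, split name, and column names are assumptions.
from datasets import load_dataset
from sklearn.model_selection import train_test_split

ds = load_dataset("bigcode/bigcodebench", split="train")   # assumed id/split
df = ds.to_pandas()
df.to_csv("bigcodebench.csv", index=False)

queries_pool = df["instruct_prompt"].tolist()        # assumed query column (Q)
code_pool = df["canonical_solution"].tolist()        # assumed code column (D)

# 80% of the queries drive the up-ranking optimization, 20% are held out for testing.
uprank_queries, test_queries = train_test_split(
    queries_pool, test_size=0.2, random_state=42
)
```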
2. Code Document Preparation
- Each document (D₁, D₂, ..., Dₖ) corresponds to a code snippet.
- Merge approximately 50 code snippets per text file to simulate realistic code documents.
- These merged files will be used as the base corpus for embedding and similarity computation.
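A short sketch of the merge, assuming the `code_pool` list from the previous step; the chunk size of 50 comes from the plan, while the file naming is purely illustrative:

```python
# Hedged sketch: merge roughly 50 code snippets per text file to build the
# document corpus (D1...Dk). Output layout is illustrative.
from pathlib import Path

def build_corpus(code_pool, out_dir="corpus", snippets_per_doc=50):
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    docs = []
    for i in range(0, len(code_pool), snippets_per_doc):
        doc_text = "\n\n".join(code_pool[i:i + snippets_per_doc])
        path = out / f"doc_{i // snippets_per_doc:04d}.txt"
        path.write_text(doc_text, encoding="utf-8")
        docs.append(doc_text)
    return docs

docs = build_corpus(code_pool)
```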
3. Embedding Generation
Embedding Models:
- Use two models:
- BGE-M3
- mGTE
- For each query and document:
- Generate embedding vectors using both models.
- The final embedding vector = average of the two embeddings.
Purpose:
- Create a unified vector space for both queries and documents to compute semantic similarity.
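A hedged sketch of the dual-model embedding using sentence-transformers. The two model identifiers are assumptions, and element-wise averaging presumes both encoders emit vectors of the same dimension; if they differ, a projection or concatenation would be needed instead:

```python
# Hedged sketch: embed texts with two encoders and average the L2-normalized
# vectors. Model ids are assumptions; averaging assumes equal output dimensions.
from sentence_transformers import SentenceTransformer

bge = SentenceTransformer("BAAI/bge-m3")                          # assumed id
gte = SentenceTransformer("Alibaba-NLP/gte-multilingual-base",    # assumed id
                          trust_remote_code=True)

def embed(texts):
    a = bge.encode(texts, normalize_embeddings=True)
    b = gte.encode(texts, normalize_embeddings=True)
    return (a + b) / 2.0          # unified vector per text

query_vecs = embed(uprank_queries)
doc_vecs = embed(docs)
```

Normalizing each model's output before averaging keeps either encoder from dominating the combined vector.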
4. Query-Document Matching
Step 1 — Similarity Calculation:
- Compute cosine similarity between each query and every document.
- For each query Qᵢ, select the Top-5 most similar documents (D₁…D₅).
Step 2 — Target Document Set:
- Merge all Top-5 documents across all queries to form the Targeted Document Set (TDS).
Step 3 — Proxy Query Mapping:
- For each document in TDS:
- Identify all queries for which the document appears in the Top-5.
- These are considered the document’s upranking proxy queries.
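A minimal sketch of all three matching steps on the averaged vectors from the previous section (variable names carry over from the earlier sketches):

```python
# Hedged sketch: cosine similarity, Top-5 retrieval, TDS, and proxy-query mapping.
# Cosine similarity is recomputed explicitly because averaged vectors are only
# approximately unit-length.
import numpy as np
from collections import defaultdict

def cosine_matrix(q, d):
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    return q @ d.T                                    # (num_queries, num_docs)

sims = cosine_matrix(query_vecs, doc_vecs)
top5 = np.argsort(-sims, axis=1)[:, :5]               # Top-5 doc indices per query

targeted_docs = set(top5.flatten().tolist())          # Targeted Document Set (TDS)

proxy_queries = defaultdict(list)                     # doc index -> proxy query indices
for q_idx, doc_idxs in enumerate(top5):
    for d_idx in doc_idxs:
        proxy_queries[int(d_idx)].append(q_idx)
```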
5. HotFlip Optimization
Objective:
To up-rank the targeted documents during retrieval.
Algorithm:
- Apply the HotFlip method as described in:
HotFlip: White-Box Adversarial Examples for Text Classification
https://arxiv.org/pdf/1712.06751
Steps:
- For each targeted document and its proxy queries, perform token-level optimization using HotFlip.
- Iteratively modify text tokens (code comments, descriptions) to maximize cosine similarity with corresponding queries.
- Evaluate using Precision@k after each optimization round to ensure ranking improvement.
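A heavily hedged sketch of a single HotFlip-style substitution step. The retriever checkpoint, CLS pooling, and the restriction of edits to comment/docstring token positions are all assumptions; the full attack would loop this step over every targeted document and each of its proxy queries, keep only swaps that raise the measured similarity, and re-check Precision@k after each round:

```python
# Hedged sketch of one first-order, gradient-guided HotFlip substitution.
# Checkpoint id and CLS pooling are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "BAAI/bge-m3"                       # assumed retriever checkpoint
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
enc = AutoModel.from_pretrained(MODEL_NAME)
enc.eval()

def hotflip_step(doc_ids, attention_mask, query_vec, editable_positions):
    """Return (position, new_token_id) with the best first-order similarity gain.

    doc_ids:            LongTensor (L,) with the document's token ids.
    attention_mask:     LongTensor (1, L).
    query_vec:          FloatTensor (H,) for the target proxy query.
    editable_positions: token indices inside comments/docstrings allowed to change.
    """
    emb_matrix = enc.get_input_embeddings().weight             # (V, H)
    doc_embeds = emb_matrix[doc_ids].detach().clone().requires_grad_(True)

    out = enc(inputs_embeds=doc_embeds.unsqueeze(0), attention_mask=attention_mask)
    doc_vec = out.last_hidden_state[:, 0]                      # CLS pooling (assumed)
    sim = torch.nn.functional.cosine_similarity(doc_vec, query_vec.unsqueeze(0))
    sim.sum().backward()
    grad = doc_embeds.grad                                      # (L, H)

    best_pos, best_tok, best_gain = None, None, 0.0
    with torch.no_grad():
        for pos in editable_positions:
            # First-order estimate of the similarity change for every swap at `pos`.
            gain = (emb_matrix - doc_embeds[pos]) @ grad[pos]   # (V,)
            tok_id = int(torch.argmax(gain))
            if gain[tok_id].item() > best_gain:
                best_pos, best_tok, best_gain = pos, tok_id, gain[tok_id].item()
    return best_pos, best_tok
```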
6. Inducing Generator (Web Cloaking Integration)
Objective:
Generate adversarial textual content to induce LLMs to follow malicious links.
Reference:
Footprint-Based Web Cloaking Induction for LLMs
https://arxiv.org/html/2509.00124v1
Process:
- For each document in the targeted set:
- Generate a contextual paragraph designed to entice LLMs or retrieval systems to click or reference external malicious links.
- The content mimics an overview, context, or description of the code.
- Insert this generated paragraph:
- At the beginning of the document (as an introduction).
- Within the code itself as comments or docstrings.
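A small sketch of the insertion itself, assuming the enticing paragraph has already been produced by an LLM (`lure_paragraph` is a placeholder for that output):

```python
# Hedged sketch: splice an LLM-generated inducing paragraph into a code document,
# both as a plain-text introduction and as comment lines inside the code.
def inject_inducing_content(doc_text: str, lure_paragraph: str) -> str:
    # 1) Prepend the paragraph as an introduction to the document.
    intro = lure_paragraph + "\n\n"
    # 2) Repeat it as comment lines so it survives code-only parsing.
    commented = "\n".join("# " + line for line in lure_paragraph.splitlines())
    return intro + commented + "\n" + doc_text
```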
7. Document Synthesis and Injection
Using a LaTeX Generator:
- Employ LLMs and PDF APIs to produce polished PDF files from the modified code using LaTeX templates.
- Combine all injected documents into a Poisoned RAG Database.
Injection Ratio:
- Inject poisoned documents at a ratio of 0.01% to 3% of the original clean corpus.
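A minimal sketch of the mixing step; interpreting the ratio as a percentage of the clean corpus size is an assumption:

```python
# Hedged sketch: mix poisoned documents into the clean corpus at a small ratio.
# The ratio is given in percent and measured against the clean corpus size.
import random

def build_poisoned_corpus(clean_docs, poisoned_docs, ratio_percent=0.01):
    n_inject = max(1, int(len(clean_docs) * ratio_percent / 100))
    injected = random.sample(poisoned_docs, min(n_inject, len(poisoned_docs)))
    corpus = clean_docs + injected
    random.shuffle(corpus)
    return corpus
```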
8. Evaluation
Objectives:
- Upranking Evaluation:
- Recalculate similarity between poisoned documents and queries.
- Measure Precision@k to determine the improvement in ranking positions.
- ASR (Attack Success Rate):
- Evaluate whether induced content successfully triggers LLMs to follow or quote malicious links.
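Minimal sketches of both metrics. Precision@k here measures how much of each query's top-k is occupied by poisoned documents, and the ASR helper simply counts responses that quote a planted URL; the response format and the notion of a planted-URL list are assumptions:

```python
# Hedged sketch: Precision@k over poisoned documents, plus a simple ASR count.
import numpy as np

def precision_at_k(sims, poisoned_idx, k=5):
    """Fraction of top-k slots occupied by poisoned documents, averaged over queries."""
    poisoned = set(poisoned_idx)
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = [len(poisoned.intersection(row.tolist())) / k for row in topk]
    return float(np.mean(hits))

def attack_success_rate(llm_responses, planted_urls):
    """Share of LLM responses that quote or follow at least one planted URL."""
    hits = sum(any(url in resp for url in planted_urls) for resp in llm_responses)
    return hits / max(1, len(llm_responses))
```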
9. Summary of Phase 1 Workflow
- Collect and preprocess BigCodeBench dataset.
- Create merged code documents.
- Generate dual-model embeddings using BGE-M3 and mGTE.
- Match queries and compute cosine similarity.
- Identify targeted documents and their proxy queries.
- Apply HotFlip optimization to up-rank these documents.
- Generate inducing content using web cloaking methodology.
- Convert to LaTeX/PDF and inject a small percentage into the clean corpus.
- Measure Precision@k and ASR to evaluate attack success.
10. References
- HotFlip: White-Box Adversarial Examples for Text Classification
  https://arxiv.org/pdf/1712.06751
- Web Cloaking and Inducing Footprint Techniques for LLMs
  https://arxiv.org/html/2509.00124v1
- BigCodeBench Dataset (Hugging Face)
  https://tinyurl.com/t4u2jrrp
Notes
- The embeddings are averaged to reduce model-specific biases and stabilize similarity space.
- HotFlip is adapted from NLP to handle structured code and documentation content.
- The poisoned corpus simulates realistic RAG environments where a small number of malicious entries can manipulate retrieval outcomes.
