
Experiment: embedding, dual-chain retrieval, HotFlip upranking, and inducing sequence generator #4

@hieunguyen-cyber

Description


Phase 1 — Diagram-only Summary

Extracted directly from the experiment diagram.


1. Experiment Embedding

  • Dataset: BigCodeBench
  • Models: BGE-M3 + mGTE
  • Fusion: Average embeddings from both models
  • Output: Query embeddings and Document embeddings
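The fusion step above can be sketched with NumPy. This is a minimal, hypothetical illustration: the arrays stand in for encoder outputs (in the real pipeline these would come from BGE-M3 and mGTE), and it assumes both models produce vectors of the same dimension, e.g. via a projection handled upstream.

```python
import numpy as np

def fuse_embeddings(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Average two models' embeddings after L2-normalising each.

    emb_a, emb_b: (n_texts, dim) outputs from the two encoders
    (stand-ins here for BGE-M3 and mGTE).
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    fused = (a + b) / 2.0
    # Re-normalise so downstream cosine similarity is well-behaved.
    return fused / np.linalg.norm(fused, axis=1, keepdims=True)

# Toy stand-ins for encoder outputs (real code would run the two models).
rng = np.random.default_rng(0)
queries_a = rng.normal(size=(3, 8))
queries_b = rng.normal(size=(3, 8))
query_emb = fuse_embeddings(queries_a, queries_b)
```

The same function would be applied to document texts to produce the document embeddings.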

2. Dual-chain Retrieval (2-way)

  • Compute cosine similarity between all query–document pairs
  • For each query → select Top-5 documents
  • Merge all Top-5 results → form Targeted Document Set (TDS)
  • Build proxy query mapping: for each document in TDS, list queries where it appears in Top-5
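The retrieval chain above can be sketched as a single similarity matrix plus a Top-k merge. All names here are hypothetical; real corpora would use the fused embeddings from step 1.

```python
import numpy as np

def build_tds(query_emb, doc_emb, k=5):
    """Top-k retrieval per query, merged into a Targeted Document Set (TDS),
    plus the proxy-query mapping: doc_id -> [query ids that retrieved it]."""
    # Cosine similarity: normalise rows, then one matrix product.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sims = q @ d.T                            # (n_queries, n_docs)
    topk = np.argsort(-sims, axis=1)[:, :k]   # Top-k doc ids per query
    proxy = {}
    for q_id, docs in enumerate(topk):
        for d_id in docs:
            proxy.setdefault(int(d_id), []).append(q_id)
    tds = sorted(proxy)                       # merged Top-k results
    return tds, proxy

rng = np.random.default_rng(1)
tds, proxy = build_tds(rng.normal(size=(4, 16)), rng.normal(size=(20, 16)), k=5)
```

Each document in the TDS thus carries the list of queries it was retrieved for; those serve as its proxy queries in the HotFlip step.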

3. HotFlip Upranking

  • Apply HotFlip optimization on targeted documents
  • Perform token-level edits guided by similarity gradients
  • Goal: Uprank documents in retrieval results (increase cosine similarity with proxy queries)
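A much-simplified, first-order sketch of one HotFlip step, assuming mean-pooled token embeddings and a single proxy-query vector (the actual attack would iterate this over positions, queries, and steps). The closed-form gradient of cosine similarity replaces backprop here; all names are hypothetical.

```python
import numpy as np

def hotflip_step(doc_tokens, query_vec, vocab_emb):
    """One HotFlip step: score every (position, vocab token) swap by the
    gradient of cos-similarity w.r.t. the mean-pooled doc embedding, then
    apply the best swap only if it truly raises the similarity."""
    E = vocab_emb[doc_tokens]                 # (len, dim) current token embs
    d = E.mean(axis=0)

    def cos(v):
        return float(v @ query_vec /
                     (np.linalg.norm(v) * np.linalg.norm(query_vec)))

    base = cos(d)
    # Closed-form gradient of cos(q, d) w.r.t. d. Mean pooling adds a
    # constant positive factor 1/len, which does not change the argmax.
    qn, dn = np.linalg.norm(query_vec), np.linalg.norm(d)
    grad = query_vec / (qn * dn) - (query_vec @ d) * d / (qn * dn ** 3)
    # First-order score of replacing the token at position i with vocab v:
    # (vocab_emb[v] - E[i]) . grad, computed for all (i, v) at once.
    scores = (vocab_emb @ grad)[None, :] - (E @ grad)[:, None]
    pos, tok = np.unravel_index(np.argmax(scores), scores.shape)
    cand = doc_tokens.copy()
    cand[pos] = tok
    new = cos(vocab_emb[cand].mean(axis=0))
    # Keep the edit only if the exact similarity actually improved.
    return (cand, new) if new > base else (doc_tokens, base)

rng = np.random.default_rng(2)
vocab = rng.normal(size=(50, 12))
tokens = rng.integers(0, 50, size=10)
q = rng.normal(size=12)
flipped, sim = hotflip_step(tokens, q, vocab)
```

In the actual attack the gradient would come from the retrieval encoder, and edits would be constrained to keep the poisoned document plausible.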

4. Inducing Sequence Generator

  • Generate inducing paragraphs/comments/docstrings with embedded URLs
  • Reference method: web cloaking induction for LLMs
  • Use LaTeX generator to produce polished PDF files
  • Inject poisoned documents into the clean RAG database at a rate of 0.01%–3% of the corpus
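The injection step can be sketched as follows; this is a minimal, hypothetical helper (the generated LaTeX/PDF documents would be the `poisoned_docs` here) that solves for the number of poisoned documents needed so they make up roughly the target fraction of the final corpus.

```python
import random

def inject_poison(clean_docs, poisoned_docs, rate, seed=0):
    """Insert poisoned documents so they form ~`rate` of the final corpus.

    `rate` is a fraction; the plan sweeps 0.0001 to 0.03 (0.01%-3%).
    n_poison solves n_poison / (n_clean + n_poison) = rate.
    """
    n_clean = len(clean_docs)
    n_poison = max(1, round(rate * n_clean / (1.0 - rate)))
    n_poison = min(n_poison, len(poisoned_docs))
    corpus = clean_docs + poisoned_docs[:n_poison]
    # Shuffle so poisoned docs are not clustered at the end of the corpus.
    random.Random(seed).shuffle(corpus)
    return corpus

corpus = inject_poison([f"clean-{i}" for i in range(1000)],
                       [f"poison-{i}" for i in range(100)], rate=0.03)
```

With 1,000 clean documents and a 3% target, this inserts 31 poisoned documents (31/1031 ≈ 3%).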

Evaluation

  • Compute Precision@k (before vs after attack)
  • Measure ASR (Attack Success Rate) — rate of agent follow/quote on malicious links
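Both metrics are straightforward to compute; the sketch below uses hypothetical inputs (retrieved-document lists, agent output strings, and malicious URLs would come from the actual runs).

```python
def precision_at_k(retrieved, relevant, k=5):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def attack_success_rate(agent_outputs, malicious_urls):
    """Share of agent responses that follow/quote at least one malicious link."""
    hits = sum(1 for out in agent_outputs
               if any(url in out for url in malicious_urls))
    return hits / len(agent_outputs)

# Toy example: 2 of the top-5 docs are relevant; 2 of 3 responses quote the link.
p = precision_at_k(["d1", "d9", "d3", "d7", "d2"], {"d1", "d2", "d5"}, k=5)
asr = attack_success_rate(
    ["see https://evil.example/x", "benign answer", "visit https://evil.example/x"],
    ["https://evil.example/x"],
)
```

Precision@k would be computed on the same query set before and after the attack to measure retrieval degradation, while ASR is measured on the end-to-end agent.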

Metadata

Labels: enhancement (New feature or request)
