img-spoofer

Generate multiple image variants that pass perceptual hash checks while looking identical to humans. Built for marketing teams who need to post similar content without triggering duplicate detection.

What it does

Takes your images and creates N variants of each. Every variant:

  • Looks the same to human eyes (marketing-quality output)
  • Has a different PDQ hash (Facebook's perceptual hashing algorithm)
  • Gets saved to organized subfolders with full metadata

PDQ hashes are 256 bits long and are compared by Hamming distance (the number of differing bits). Hashes with a distance ≤31 are considered "similar" by most platforms. This tool generates variants with distances of 80-130+ from the original, so they register as different images.
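The distance in question is a Hamming distance between two 256-bit PDQ hashes. As a minimal sketch of that comparison, assuming hashes are given as 64-character hex strings (the function names below are illustrative and not taken from this repository):

```python
def pdq_hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two 256-bit hashes in hex form."""
    if len(hash_a) != 64 or len(hash_b) != 64:
        raise ValueError("expected 64 hex characters (256 bits) per hash")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_similar(hash_a: str, hash_b: str, threshold: int = 31) -> bool:
    """Apply the <=31 similarity cutoff described above (hypothetical helper)."""
    return pdq_hamming_distance(hash_a, hash_b) <= threshold
```

The `check` and `compare` commands described under Usage report these values for real images, so this helper is only useful for reasoning about the numbers.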

Quick Start

Choose your OS for full setup:

If you already have Python 3.10+ and git installed:

git clone https://github.com/jaffatherealest/img-spoofer.git
cd img-spoofer
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

Drop images in input/, then:

python main.py spoof --input-dir input --output-dir output --variants 10

Usage

Basic Commands

# Generate 10 variants per image (default)
python main.py spoof --input-dir input --variants 10

# Dry run - see what would be generated without saving files
python main.py spoof --input-dir input --variants 10 --dry-run

# Target size for Threads/Instagram (1080x1350)
python main.py spoof --input-dir input --variants 12 --target-size 1080x1350

# Higher PDQ distance threshold (stricter uniqueness)
python main.py spoof --input-dir input --min-distance 50

# Scan a folder for duplicate/similar images
python main.py scan ~/Pictures/content

# Check a single image's PDQ hash
python main.py check path/to/image.jpg

# Compare two images
python main.py compare original.jpg variant.jpg

CLI Options

Option             Description                           Default
--input-dir, -i    Folder with your original images      input/
--output-dir, -o   Where variants get saved              output/
--variants, -n     How many variants per image           10
--min-distance     Minimum PDQ distance from original    32
--dry-run          Preview mode, no files written        off
--target-size      Resize output (e.g., 1080x1350)       original size
--quality, -q      JPEG quality 1-100                    95
--verbose, -v      Show detailed progress                off
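The `--target-size` value is a `WIDTHxHEIGHT` string such as `1080x1350`. A minimal sketch of how such a value could be validated and split, assuming nothing about how `main.py` parses it internally (`parse_target_size` is a hypothetical helper):

```python
def parse_target_size(value: str) -> tuple[int, int]:
    """Turn a 'WIDTHxHEIGHT' string into a (width, height) tuple of ints."""
    width_text, separator, height_text = value.lower().partition("x")
    if not separator:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    # int() raises ValueError on empty or non-numeric parts.
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width, height
```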

Scan Command

Scan a folder to find duplicate or similar images by comparing all images against each other.

# Basic scan - shows only similar pairs (distance <= 31)
python main.py scan /path/to/folder

# Show all comparisons
python main.py scan /path/to/folder --all

# Custom similarity threshold
python main.py scan /path/to/folder --threshold 50

# Save results to JSON
python main.py scan /path/to/folder --output results.json

Option            Description                         Default
--threshold, -t   Distance threshold for "similar"    31
--output, -o      Save full results to JSON file      none
--all, -a         Show all pairs, not just similar    off
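Comparing every image against every other one is a pairwise pass over the hashes. The following is a sketch of that idea only, assuming the hashes have already been computed and are available as a filename-to-hex-string mapping; it is not the repository's actual implementation, and `find_similar_pairs` is a hypothetical name:

```python
from itertools import combinations


def find_similar_pairs(hashes: dict[str, str], threshold: int = 31):
    """Return (name_a, name_b, distance) for pairs at or below the threshold."""
    pairs = []
    # combinations() yields each unordered pair exactly once.
    for (name_a, hash_a), (name_b, hash_b) in combinations(sorted(hashes.items()), 2):
        distance = bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")
        if distance <= threshold:
            pairs.append((name_a, name_b, distance))
    # Closest pairs first.
    return sorted(pairs, key=lambda pair: pair[2])
```

Raising the threshold to 256 keeps every pair, which mirrors what `--all` is described as showing. The number of comparisons grows quadratically with the number of images, so large folders take noticeably longer.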

Output Structure

output/
├── beach_photo/
│   ├── beach_photo_v01.jpg
│   ├── beach_photo_v02.jpg
│   └── ...
├── product_shot/
│   ├── product_shot_v01.jpg
│   └── ...
└── manifest.json

The manifest.json contains PDQ hashes, distances, and which augmentation recipe was used for each variant. Useful for tracking and debugging.
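The folder layout above follows a simple naming pattern: one subfolder per source image, with files named `<name>_vNN.jpg`. A small sketch of that pattern, inferred from the tree shown here rather than from the code (`variant_path` is a hypothetical helper, and the behaviour beyond 99 variants is not documented):

```python
from pathlib import Path


def variant_path(output_dir: str, source_image: str, index: int) -> Path:
    """Build output/<stem>/<stem>_vNN.jpg for the given variant number."""
    stem = Path(source_image).stem  # "input/beach_photo.png" -> "beach_photo"
    # Output is always JPEG, so the extension is fixed regardless of the input.
    return Path(output_dir) / stem / f"{stem}_v{index:02d}.jpg"
```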

Configuration

Most settings work via CLI flags. For repeated workflows, edit config.yaml:

pdq:
  min_distance_from_original: 32
  min_distance_between_variants: 20

output:
  quality: 95
  target_size: null  # or "1080x1350"

generation:
  variants_per_image: 10
  max_attempts_per_variant: 100

Examples

Weekly content batch for Threads:

python main.py spoof -i ~/Downloads/weekly_photos -o ~/Content/variants -n 12 --target-size 1080x1350

Test run on a few images:

python main.py spoof -i input -n 3 --dry-run --verbose

Verify a variant is different enough:

python main.py compare input/original.jpg output/original/original_v01.jpg
# Output: Hamming Distance: 98 - DIFFERENT (distance > 31)

Find duplicates in your content library:

python main.py scan ~/Pictures/marketing_assets --output duplicates.json
# Shows all similar pairs and saves full report

Supported Formats

Input: JPG, JPEG, PNG, WebP, HEIC

Output: JPEG (high quality, optimized)
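To check ahead of time which files in a folder would be picked up, the extension list from the Troubleshooting section can be applied directly. This is a sketch under two assumptions that the documentation does not confirm: matching is case-insensitive, and only the top level of the folder is scanned. `find_supported_images` is a hypothetical helper.

```python
from pathlib import Path

# Extensions the tool is documented to scan for.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".heic"}


def find_supported_images(folder: str) -> list[Path]:
    """List files directly inside `folder` that have a supported extension."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```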

Troubleshooting

"No supported images found"
Check your input folder has images with supported extensions. The tool scans for .jpg, .jpeg, .png, .webp, .heic.

Variants look too different
The augmentation recipes are tuned for subtlety, but you can reduce intensity by editing augmentations.py. The micro_transform and quality_shift recipes make the smallest visual changes.

Low PDQ quality warnings
Some images (very simple, solid colors, tiny) produce low-quality PDQ hashes. These get skipped automatically. Use images with actual content/texture.

High attempt counts
If it's taking many attempts per variant, your images might be very uniform. This is normal for solid backgrounds or simple graphics.

How It Works

See Technical Documentation for details on PDQ hashing and the augmentation pipeline.

License

MIT
