# Document Duplicate Detection & Merging #234

## Description

Implement duplicate detection for documents based on filename similarity, content similarity, and metadata matching, with options to merge, keep both, or delete duplicates.

---

## Requirements

- [ ] Detect similar document names (fuzzy matching)
- [ ] Detect similar content (text similarity score; a plain file hash only catches identical files)
- [ ] Detect exact duplicates (file hash)
- [ ] Show duplicate warnings on upload
- [ ] Provide merge or keep options
- [ ] Version comparison view
- [ ] Batch duplicate detection
- [ ] Smart merge with conflict resolution

---

## Detection Methods

### 1. Exact Duplicates (File Hash)
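- Calculate a SHA-256 hash of the file bytes (prefer it over MD5, which is vulnerable to deliberately crafted collisions)
- Compare with existing document hashes
- Matching hash = exact (byte-identical) duplicate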
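A minimal Python sketch of the exact-duplicate check. It assumes stored documents already have their digests recorded in a dict mapping hex digest to document id (a hypothetical storage layout), and all function names are illustrative. The file is streamed in chunks so large uploads are never read into memory at once, and the same helper groups a list of files for the batch scan requirement.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large uploads are never fully in memory


def file_sha256(path: Path) -> str:
    """Hex SHA-256 digest of the file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicate(new_file: Path, known_hashes: dict[str, str]) -> str | None:
    """Id of a stored document with identical bytes, or None.

    `known_hashes` (hypothetical) maps hex digest -> document id.
    """
    return known_hashes.get(file_sha256(new_file))


def group_exact_duplicates(paths: list[Path]) -> list[list[Path]]:
    """Batch scan: return groups of two or more byte-identical files."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in paths:
        by_hash[file_sha256(path)].append(path)
    return [group for group in by_hash.values() if len(group) > 1]
```

Storing the digest with each document at upload time keeps the upload check to a single dictionary (or indexed column) lookup instead of rehashing the whole library.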

### 2. Filename Similarity (Fuzzy Matching)
- Levenshtein distance < 3
- Same base name with version suffix
- Example: "Resume_v1.pdf" vs "Resume_v2.pdf"
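One way to combine both filename rules, sketched in Python. Assumptions not in the spec: names are compared case-insensitively without their extension, and `VERSION_SUFFIX` is a hypothetical pattern for trailing markers such as `_v2`, ` (1)` or `_2025` that should be tuned to the naming habits actually seen in uploads.

```python
from __future__ import annotations

import re
from pathlib import PurePath

# Hypothetical pattern: trailing "_v2" / "-v10", " (1)", or a 4-digit year like "_2025".
VERSION_SUFFIX = re.compile(r"(?:[ _-]?v\d+|[ _-]?\(\d+\)|[ _-]\d{4})$")


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert/delete/substitute, cost 1 each), two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def base_name(filename: str) -> str:
    """Lower-cased stem with any trailing version marker stripped."""
    stem = PurePath(filename).stem.lower()
    return VERSION_SUFFIX.sub("", stem)


def names_look_similar(a: str, b: str, threshold: int = 3) -> bool:
    """Same base name, or stems whose Levenshtein distance is below `threshold`."""
    if base_name(a) == base_name(b):
        return True
    stem_a, stem_b = PurePath(a).stem.lower(), PurePath(b).stem.lower()
    return levenshtein(stem_a, stem_b) < threshold
```

With this pattern, `Senior_Dev_Resume_2025.pdf` and `Senior_Dev_Resume.pdf` reduce to the same base name, so the upload in the UI mock-up would be flagged by filename as well as by content.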

### 3. Content Similarity
- Extract text from both documents
- Calculate similarity score (0-100%)
- Threshold: >85% = likely duplicate
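The spec leaves the scoring method open. As one possible choice (an assumption, not a decided algorithm), this sketch scores two already-extracted texts by the Jaccard similarity of their overlapping three-word shingles, scaled to 0-100 so the 85% threshold applies directly. Text extraction from PDF or DOCX is out of scope here, and the function names are illustrative.

```python
from __future__ import annotations

import re

WORD = re.compile(r"\w+")


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences, lower-cased with punctuation dropped."""
    words = WORD.findall(text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity_percent(text_a: str, text_b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, scaled to 0-100."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 100.0  # two empty documents are trivially identical
    return 100.0 * len(a & b) / len(a | b)


def is_likely_duplicate(text_a: str, text_b: str, threshold: float = 85.0) -> bool:
    """Apply the ">85% = likely duplicate" rule."""
    return similarity_percent(text_a, text_b) > threshold
```

For example, changing only the last word of a 20-word text alters one of its 18 shingles, giving 17/19 ≈ 89.5%, just above the threshold; an edit in the middle touches three shingles, so short texts are more sensitive than long ones. For large libraries, MinHash can estimate the same Jaccard score without comparing full shingle sets pairwise.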

---

## UI Design

### Duplicate Warning on Upload
```
┌────────────────────────────────────────┐
│ ⚠️  Potential Duplicate Detected        │
├────────────────────────────────────────┤
│ Senior_Dev_Resume_2025.pdf             │
│                                        │
│ Similar to:                            │
│ 📄 Senior_Dev_Resume.pdf               │
│    Uploaded: Oct 15, 2025              │
│    87% content match                   │
│                                        │
│ What would you like to do?             │
│                                        │
│ ◉ Keep both as separate versions       │
│ ○ Replace old with new                 │
│ ○ Keep old, discard new                │
│ ○ View comparison first                │
│                                        │
│ [Cancel]               [Proceed]       │
└────────────────────────────────────────┘
```
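A sketch of how the four dialog choices could map onto storage operations. `DuplicateAction`, `Document`, its `versions` list, and the in-memory `store` dict are hypothetical stand-ins for the real model and persistence layer. The key behaviour is that "View comparison first" persists nothing and defers the decision.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class DuplicateAction(Enum):
    KEEP_BOTH = "keep_both"      # "Keep both as separate versions" (the default choice)
    REPLACE_OLD = "replace_old"  # "Replace old with new"
    KEEP_OLD = "keep_old"        # "Keep old, discard new"
    COMPARE_FIRST = "compare"    # "View comparison first"


@dataclass
class Document:  # hypothetical stand-in for the real document model
    doc_id: str
    filename: str
    versions: list[str] = field(default_factory=list)  # ids of linked later versions


def resolve_duplicate(action: DuplicateAction, existing: Document,
                      incoming: Document, store: dict[str, Document]) -> str | None:
    """Apply the user's choice; return the id of the document to open, or None if undecided."""
    if action is DuplicateAction.KEEP_BOTH:
        store[incoming.doc_id] = incoming
        existing.versions.append(incoming.doc_id)  # link the two as versions
        return incoming.doc_id
    if action is DuplicateAction.REPLACE_OLD:
        del store[existing.doc_id]
        store[incoming.doc_id] = incoming
        return incoming.doc_id
    if action is DuplicateAction.KEEP_OLD:
        return existing.doc_id  # the upload is discarded simply by never storing it
    # COMPARE_FIRST: persist nothing; the caller opens the side-by-side view and asks again
    return None
```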

---

## Acceptance Criteria

- [ ] Detects exact duplicate files
- [ ] Detects similar filenames
- [ ] Detects similar content
- [ ] Shows duplicate warnings
- [ ] Can compare documents side-by-side
- [ ] Can merge duplicates
- [ ] Can keep both versions
- [ ] Batch duplicate scan works
