feat: persist RAPTOR layer metadata on summary chunks #13286

Open
yuch85 wants to merge 1 commit into infiniflow:main from yuch85:feat/raptor-layer-metadata
yuch85 commented Mar 1, 2026

Summary

RAPTOR's recursive clustering builds a layers list tracking (start_idx, end_idx) boundaries per level, but currently discards this information — only the flat chunks list is returned. This makes it impossible to distinguish leaf-level summaries from top-level ones.

This PR:

  • Returns a (chunks, layers) tuple from raptor.py's __call__
  • Annotates each RAPTOR summary chunk with raptor_layer_int (1 = first summary level, 2 = summary-of-summaries, etc.)
  • Adds raptor_layer_int to infinity_mapping.json (Elasticsearch handles it via the existing *_int dynamic template)
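
As a rough illustration of the annotation step, the per-level boundaries could be expanded into one layer number per chunk. This is a sketch, not the code in rag/svr/task_executor.py: the helper name `layer_for_each_chunk` is hypothetical, and it assumes that `layers[0]` spans the original leaf chunks, that each later entry spans one summary level, and that the `(start_idx, end_idx)` ranges are half-open and contiguous.

```python
from typing import List, Tuple


def layer_for_each_chunk(layers: List[Tuple[int, int]]) -> List[int]:
    """Expand per-level (start_idx, end_idx) boundaries into a per-chunk layer list.

    Assumptions (not confirmed by the PR text):
    - layers[0] covers the original leaf chunks, which get layer 0
    - layers[k] covers the k-th summary level, which gets layer k
    - ranges are half-open and contiguous
    """
    if not layers:
        return []
    total = layers[-1][1]  # end index of the last level = total chunk count
    chunk_layer = [0] * total
    for level, (start, end) in enumerate(layers):
        for idx in range(start, end):
            chunk_layer[idx] = level
    return chunk_layer


# 4 leaf chunks, 2 first-level summaries, 1 summary-of-summaries
example_layers = [(0, 4), (4, 6), (6, 7)]
print(layer_for_each_chunk(example_layers))  # [0, 0, 0, 0, 1, 1, 2]
```

Under these assumptions, the value at each summary chunk's index is what would be stored as raptor_layer_int.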

Why this matters

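Downstream features need to know which RAPTOR layer a summary belongs to, for example to retrieve only the highest-level document summary for entity extraction or search result snippets.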

Changes

| File | Change | LOC |
| --- | --- | --- |
| rag/raptor.py | Return (chunks, layers) tuple | ~3 |
| rag/svr/task_executor.py | Build chunk_layer mapping, set raptor_layer_int | ~12 |
| conf/infinity_mapping.json | Add raptor_layer_int integer field | ~1 |

Backward compatibility

  • Additive only — no existing fields or behavior changed
  • Existing RAPTOR chunks continue to work (they'll have raptor_layer_int = 0 by default)
  • New RAPTOR chunks get layer metadata automatically
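
A consumer that wants only the most abstract summaries might tolerate older chunks along these lines. This is a sketch over plain dictionaries, not RAGFlow's retrieval API: the helper `top_level_summaries` and the `id` key are hypothetical, and a missing raptor_layer_int is treated the same as the default of 0.

```python
from typing import Any, Dict, List


def top_level_summaries(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the chunks that carry the highest raptor_layer_int.

    Chunks without the field, or with the default 0, are ignored, so
    chunks indexed before this change never count as top-level summaries.
    """
    layered = [c for c in chunks if c.get("raptor_layer_int", 0) > 0]
    if not layered:
        return []
    top = max(c["raptor_layer_int"] for c in layered)
    return [c for c in layered if c["raptor_layer_int"] == top]


example_chunks = [
    {"id": "leaf-1"},                              # no layer field at all
    {"id": "old-summary", "raptor_layer_int": 0},  # indexed before this change
    {"id": "s1", "raptor_layer_int": 1},
    {"id": "s2", "raptor_layer_int": 2},
]
print([c["id"] for c in top_level_summaries(example_chunks)])  # ['s2']
```

Because 0 is reserved for "no layer information", such a filter returns an empty list for datasets that were parsed before the field existed.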

Test plan

  • Parse a document with RAPTOR enabled, verify raptor_layer_int is set on indexed chunks
  • Verify raptor_layer_int values increase with abstraction level (layer 1 < layer 2 < ...)
  • Verify existing RAPTOR deletion (delete by raptor_kwd) still works
  • Verify Infinity backend accepts the new field
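
The first two checks could be scripted roughly as follows. This is a sketch over already-fetched chunk dictionaries: the helper `distinct_summary_layers` is hypothetical, and treating any chunk with a non-empty raptor_kwd as a RAPTOR summary is an assumption about how summaries are marked.

```python
from typing import Any, Dict, List


def distinct_summary_layers(chunks: List[Dict[str, Any]]) -> List[int]:
    """Return the sorted distinct raptor_layer_int values on summary chunks.

    Raises AssertionError if no summary chunk is found or if the values are
    not exactly 1, 2, ..., N (assumption: a chunk with a non-empty raptor_kwd
    is a RAPTOR summary).
    """
    summaries = [c for c in chunks if c.get("raptor_kwd")]
    if not summaries:
        raise AssertionError("no RAPTOR summary chunks found")
    found = sorted({c.get("raptor_layer_int", 0) for c in summaries})
    expected = list(range(1, len(found) + 1))
    if found != expected:
        raise AssertionError(f"expected layers {expected}, found {found}")
    return found


indexed = [
    {"content": "leaf text"},
    {"raptor_kwd": "raptor", "raptor_layer_int": 1},
    {"raptor_kwd": "raptor", "raptor_layer_int": 1},
    {"raptor_kwd": "raptor", "raptor_layer_int": 2},
]
print(distinct_summary_layers(indexed))  # [1, 2]
```

A summary chunk that still carries the default 0 makes the check fail, which is the expected outcome when freshly parsed documents are missing the new metadata.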

Fixes #7488
Related: #4104, #11191, #10951

🤖 Generated with Claude Code

RAPTOR's recursive clustering builds a `layers` list tracking
`(start_idx, end_idx)` boundaries per level, but currently discards
this information — only the flat `chunks` list is returned.

This makes it impossible to distinguish leaf-level summaries from
top-level ones, which downstream features need (e.g. retrieving
only the highest-level document summary for entity extraction or
search result snippets).

Changes:
- `rag/raptor.py`: Return `(chunks, layers)` tuple from `__call__`
- `rag/svr/task_executor.py`: Compute `raptor_layer_int` per summary
  chunk using the layer boundaries. Layer 1 = first summary level,
  layer 2 = summary-of-summaries, etc.
- `conf/infinity_mapping.json`: Add `raptor_layer_int` integer field
  (Elasticsearch handles this via existing `*_int` dynamic template)

Fixes infiniflow#7488
Related: infiniflow#4104, infiniflow#11191, infiniflow#10951
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. ♾️infinity Pull requests that‘s involved with infinity(DB) labels Mar 1, 2026
Development

Successfully merging this pull request may close these issues.

[Bug]: Persist layers hierarchy in RAPTOR
