Conversation

@CascadingRadium
Member

@CascadingRadium CascadingRadium commented Aug 21, 2025

  • Zapx now handles nested documents by building and storing an edge list that records document relationships.
  • Exposed new APIs:
    • Ancestors, which lets the caller fetch the ancestry chain of the requested document.
    • CountRoot, which returns the number of root documents in the segment (excluding nested documents).
    • AddNestedDocuments, which updates the deleted bitmap from the segment snapshot to include nested documents as well.
  • The total number of documents in the segment now includes nested documents.
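
To make the edge-list idea concrete, here is a minimal sketch of how an ancestry chain and a root count could be derived from child-to-parent edges. The `edgeList` type and the `ancestors` and `countRoot` helpers are hypothetical names for illustration only; they are not the PR's actual signatures, and the real `Ancestors` may order or populate the chain differently (for example, by including the document itself).

```go
package main

import "fmt"

// edgeList maps a nested document's number to its parent's number.
// Root documents have no entry. This layout is an assumption for illustration.
type edgeList map[uint64]uint64

// ancestors walks the child -> parent edges from doc up to its root and
// returns the chain, nearest parent first. It assumes the edges are acyclic.
func ancestors(edges edgeList, doc uint64) []uint64 {
	var chain []uint64
	for {
		parent, ok := edges[doc]
		if !ok {
			return chain
		}
		chain = append(chain, parent)
		doc = parent
	}
}

// countRoot returns how many of the numDocs documents are roots, that is,
// documents without a parent edge. It assumes every key in edges is a valid
// document number below numDocs.
func countRoot(edges edgeList, numDocs uint64) uint64 {
	return numDocs - uint64(len(edges))
}

func main() {
	// Docs 0 and 3 are roots; doc 1 is nested under 0; doc 2 is nested under 1.
	edges := edgeList{1: 0, 2: 1}
	fmt.Println(ancestors(edges, 2)) // [1 0]
	fmt.Println(countRoot(edges, 4)) // 2
}
```

Because every nested document contributes exactly one edge, the root count falls out of the total document count without a scan, which is consistent with the total including nested documents.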

Requires:

Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for nested fields within the Zapx indexing API, enabling indexing and querying of hierarchical data structures. The implementation tracks nested document relationships through parent-child edges and adds a cache for ancestry and descendant lookups.

Key changes include:

  • Added nested document indexing with parent-child relationship tracking through edge lists
  • Implemented nested index cache system for efficient ancestry and descendant lookups
  • Modified document counting to distinguish between root documents and nested sub-documents
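
As a rough illustration of the descendant side of such a cache, the sketch below inverts a child-to-parent edge map into a parent-to-children index and uses it to extend a set of deleted documents with all of their nested descendants. Everything here is an assumption for illustration: `buildChildren` and `withNestedDocs` are hypothetical helpers, and a plain slice stands in for the deleted bitmap used by the segment.

```go
package main

import (
	"fmt"
	"sort"
)

// buildChildren inverts a child -> parent edge map into parent -> children.
func buildChildren(parentOf map[uint64]uint64) map[uint64][]uint64 {
	children := make(map[uint64][]uint64, len(parentOf))
	for child, parent := range parentOf {
		children[parent] = append(children[parent], child)
	}
	return children
}

// withNestedDocs returns the deleted set extended with every descendant of
// each deleted document, in ascending order.
func withNestedDocs(deleted []uint64, children map[uint64][]uint64) []uint64 {
	seen := make(map[uint64]bool)
	stack := append([]uint64(nil), deleted...)
	for len(stack) > 0 {
		doc := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[doc] {
			continue
		}
		seen[doc] = true
		stack = append(stack, children[doc]...)
	}
	out := make([]uint64, 0, len(seen))
	for doc := range seen {
		out = append(out, doc)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Doc 1 is nested under 0, doc 2 under 1, and doc 4 under 3.
	parentOf := map[uint64]uint64{1: 0, 2: 1, 4: 3}
	children := buildChildren(parentOf)
	fmt.Println(withNestedDocs([]uint64{0}, children)) // [0 1 2]
}
```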

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| segment.go | Core segment functionality with nested index cache integration and document relationship methods |
| nested_cache.go | New caching system for nested document relationships with ancestry/descendant tracking |
| new.go | Document flattening logic and edge list serialization for nested structures |
| merge.go | Merge operations extended to handle nested document edge lists and sub-document drops |
| read.go | Added method to calculate edge list offset in segment data |
| dict.go | Dictionary operations updated to include sub-documents in exclusion handling |
| faiss_vector_posting.go | Vector index operations updated to handle sub-document exclusions |
| build.go | Segment base initialization updated to include nested index cache |
| section_inverted_text_index.go | Reset functionality updated to clear field addresses map |
| section_synonym_index.go | Reset functionality updated to clear thesaurus addresses map |

Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.


Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.


Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.


Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

merge.go:1

  • The word 'descendent' should be spelled 'descendant' to match the correct spelling used elsewhere in the codebase.
//  Copyright (c) 2017 Couchbase, Inc.

Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.


@abhinavdangeti
Member

abhinavdangeti commented Nov 3, 2025

Per plan, let's target this to the unstable-v17 branch.

@CascadingRadium CascadingRadium changed the base branch from master to v17Stub November 13, 2025 09:35
@CascadingRadium CascadingRadium changed the title MB-27666: Nested Fields [v17] MB-27666: Nested Fields Nov 13, 2025
Base automatically changed from merge to unstable-v17 December 13, 2025 18:26
@CascadingRadium CascadingRadium moved this from Todo to Done in Hierarchy Search Dec 28, 2025
@abhinavdangeti
Member

Let's update/rebase this PR and the go.mod entries for bleve_index_api and scorch_segment_api after we take in #356.

@abhinavdangeti
Member

Time for rebase, also upgrade to [email protected] and [email protected].

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 20 comments.


}

// initialize any of the caches if needed
err = rv.nstIndexCache.initialize(rv.numDocs, rv.getEdgeListOffset(), rv.mem)
Member

What's the typical footprint of this? I'm checking whether it would be better to defer this initialization to the first time the nstIndexCache is accessed, as we do with the other caches.
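
For reference, deferring initialization to first access could look roughly like the sketch below, which uses `sync.Once`. This is only an assumption about one way to do it: `lazyNestedCache`, its `load` callback, and `parentOf` are hypothetical names, and the sketch does not claim to match how the other caches in the codebase are initialized.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyNestedCache is a hypothetical stand-in for nstIndexCache. It defers
// loading the edge list until the first lookup.
type lazyNestedCache struct {
	once  sync.Once
	load  func() (map[uint64]uint64, error) // reads the edge list on demand
	edges map[uint64]uint64
	err   error
}

// parentOf loads the edge list on first use, then answers from memory.
func (c *lazyNestedCache) parentOf(doc uint64) (uint64, bool, error) {
	c.once.Do(func() { c.edges, c.err = c.load() })
	if c.err != nil {
		return 0, false, c.err
	}
	parent, ok := c.edges[doc]
	return parent, ok, nil
}

func main() {
	loads := 0
	cache := &lazyNestedCache{load: func() (map[uint64]uint64, error) {
		loads++
		return map[uint64]uint64{1: 0}, nil
	}}
	fmt.Println(loads) // 0: nothing is loaded when the cache is constructed

	parent, ok, err := cache.parentOf(1)
	fmt.Println(parent, ok, err, loads) // 0 true <nil> 1

	_, _, _ = cache.parentOf(1)
	fmt.Println(loads) // 1: later lookups reuse the loaded edges
}
```

The trade-off is that a load error surfaces on the first lookup rather than when the segment is opened, so callers of the lookup path have to handle it.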

// such that parents always appear before their children. A reusable edgeList
// can be provided to avoid allocations across multiple calls.
func flattenNestedDocuments(docs []index.Document, edgeList map[uint64]uint64) (
[]index.Document, map[uint64]uint64) {
Member

Could we exit early here when there are no nested documents, before the allocations below?
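
A sketch of what such an early exit might look like follows. It uses a hypothetical `nestedDoc` type with an explicit `Children` slice rather than the real `index.Document` interface, so the way nested documents are detected here is an assumption; `flattenDocs` is likewise an illustrative name, not the PR's function.

```go
package main

import "fmt"

// nestedDoc is a hypothetical document type used only for this sketch.
type nestedDoc struct {
	ID       string
	Children []*nestedDoc
}

// flattenDocs returns the documents in pre-order, so parents always appear
// before their children, plus a child -> parent edge map keyed by position
// in the flattened slice. A reusable edges map may be passed in.
func flattenDocs(docs []*nestedDoc, edges map[uint64]uint64) ([]*nestedDoc, map[uint64]uint64) {
	// Drop stale entries from a reused map; ranging over a nil map is a no-op.
	for k := range edges {
		delete(edges, k)
	}

	// Early exit: with no nested documents the input is already flat, so no
	// new slice or map needs to be allocated.
	hasNested := false
	for _, d := range docs {
		if len(d.Children) > 0 {
			hasNested = true
			break
		}
	}
	if !hasNested {
		return docs, edges
	}

	if edges == nil {
		edges = make(map[uint64]uint64)
	}
	flat := make([]*nestedDoc, 0, len(docs))
	var visit func(d *nestedDoc, parent uint64, hasParent bool)
	visit = func(d *nestedDoc, parent uint64, hasParent bool) {
		num := uint64(len(flat))
		flat = append(flat, d)
		if hasParent {
			edges[num] = parent
		}
		for _, c := range d.Children {
			visit(c, num, true)
		}
	}
	for _, d := range docs {
		visit(d, 0, false)
	}
	return flat, edges
}

func main() {
	c := &nestedDoc{ID: "c"}
	b := &nestedDoc{ID: "b", Children: []*nestedDoc{c}}
	a := &nestedDoc{ID: "a", Children: []*nestedDoc{b}}
	d := &nestedDoc{ID: "d"}

	flat, edges := flattenDocs([]*nestedDoc{a, d}, nil)
	fmt.Println(len(flat), edges) // 4 map[1:0 2:1]
}
```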

5 participants