@lvca lvca commented Nov 25, 2025

What does this PR do?

A brief description of the change being made with this pull request.

Motivation

What inspired you to submit this pull request?

Related issues

A list of issues either fixed, containing architectural discussions, otherwise relevant
for this Pull Request.

Additional Notes

Anything else we should know when reviewing?

Checklist

  • I have run the build using mvn clean package command
  • My unit tests cover both failure and success scenarios

@lvca lvca self-assigned this Nov 25, 2025
@gemini-code-assist

Summary of Changes

Hello @lvca, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a substantial refactoring of the index subsystem, primarily to enhance the management and replication of index metadata, with a strong focus on LSM Vector Indexes. By introducing a dedicated IndexMetadata structure, the system ensures consistent handling of index properties across different components. The changes also involve significant internal improvements to the LSM Vector Index, optimizing its data storage, compaction processes, and integration with the transaction manager. These updates aim to make index operations more robust and scalable, especially in replicated environments.

Highlights

  • Index Metadata Refactoring: Introduced a new IndexMetadata object and its specialized LSMVectorIndexMetadata subclass to centralize index configuration, streamlining creation, persistence, and replication of index parameters.
  • LSM Vector Index Enhancements: The LSMVectorIndex has been significantly refactored to utilize the new metadata structure, replace LSMVectorIndexComponent with LSMVectorIndexMutable, and improve its compaction and page management logic, including marking the in-memory graph index as dirty when pages are updated.
  • Float Type Support in Binary Operations: Added putFloat and getFloat methods to Binary.java and BinaryStructure.java, along with readFloat in BasePage.java, providing foundational support for efficient handling of floating-point numbers, crucial for vector indexes.
  • Improved Index Builder Pattern: New builder classes (TypeLSMVectorIndexBuilder, BucketLSMVectorIndexBuilder) have been introduced, and existing IndexBuilder classes updated, to provide a more structured and extensible way to define and create various index types, especially for vector indexes.
  • Consolidated Temporary File Handling: The TEMP_EXT constant and removeTempSuffix() method for managing temporary files during index operations (like compaction/splitting) have been moved from LSMTreeIndexAbstract to the more general PaginatedComponent.
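
As a rough illustration of the float support mentioned in the highlights, the sketch below stores and reads 4-byte floats at absolute offsets using the JDK's `java.nio.ByteBuffer`. The class `FloatBufferSketch` and its methods are hypothetical stand-ins; this is not the code added to `Binary.java` or `BasePage.java`, only one way such accessors might behave.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a binary buffer; NOT ArcadeDB's Binary class.
final class FloatBufferSketch {
  private final ByteBuffer buffer;

  FloatBufferSketch(final int capacity) {
    this.buffer = ByteBuffer.allocate(capacity);
  }

  // Writes a 4-byte IEEE 754 float at the given absolute byte offset.
  void putFloat(final int offset, final float value) {
    buffer.putFloat(offset, value);
  }

  // Reads a 4-byte float from the given absolute byte offset.
  float getFloat(final int offset) {
    return buffer.getFloat(offset);
  }

  // Stores a vector as consecutive floats, one every Float.BYTES bytes.
  static FloatBufferSketch ofVector(final float[] vector) {
    final FloatBufferSketch sketch = new FloatBufferSketch(vector.length * Float.BYTES);
    for (int i = 0; i < vector.length; i++)
      sketch.putFloat(i * Float.BYTES, vector[i]);
    return sketch;
  }

  // Reads back the first `dimensions` floats as a vector.
  float[] toVector(final int dimensions) {
    final float[] vector = new float[dimensions];
    for (int i = 0; i < dimensions; i++)
      vector[i] = getFloat(i * Float.BYTES);
    return vector;
  }
}
```

Storing floats directly avoids converting each vector component to and from a wider or textual representation, which is presumably why the summary calls it foundational for vector indexes.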

mergify bot commented Nov 25, 2025

🧪 CI Insights

Here's what we observed from your CI run for e663a5a.

🟢 All jobs passed!

But CI Insights is watching 👀

@lvca lvca requested a review from Copilot November 25, 2025 16:20
@lvca lvca added the enhancement New feature or request label Nov 25, 2025
@lvca lvca added this to the 25.11.1 milestone Nov 25, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-executed refactoring of the index management system, particularly for vector indexes. The introduction of the IndexMetadata class hierarchy and the redesigned IndexBuilder pattern greatly improve the code's structure, type safety, and maintainability. The changes to LSMVectorIndex make it more robust, especially in its compaction and page management logic. I've found a couple of minor issues to address, but overall, this is a high-quality contribution.

lvca and others added 2 commits November 25, 2025 11:24
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR refactors the index metadata handling architecture by introducing dedicated IndexMetadata and LSMVectorIndexMetadata classes to centralize metadata management. The refactoring replaces the previous LSMVectorIndexBuilder class with TypeLSMVectorIndexBuilder and BucketLSMVectorIndexBuilder, and updates all index implementations to use the new metadata pattern. Additionally, new replication tests are added for LSM vector indexes.

Key Changes:

  • Introduced IndexMetadata and LSMVectorIndexMetadata classes for centralized metadata management
  • Refactored TypeIndexBuilder and BucketIndexBuilder to use the builder pattern with proper type transitions for vector indexes
  • Replaced LSMVectorIndexBuilder with TypeLSMVectorIndexBuilder and BucketLSMVectorIndexBuilder
  • Updated IndexInternal interface with new getMetadata() and setMetadata() methods
  • Added comprehensive replication tests for vector index creation and compaction
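
The "builder pattern with proper type transitions" described above can be sketched roughly as follows: a general builder hands off to a more specific builder once the caller asks for a vector index, so vector-only options are available only where they apply. All class and method names here (`IndexBuilderSketch`, `asVector`, `withDimensions`, and so on) are hypothetical and do not reflect the actual `TypeIndexBuilder` or `TypeLSMVectorIndexBuilder` signatures; the fields mirror only what the file summary mentions (type name, property names, bucket ID, dimensions, similarity).

```java
// Hypothetical names throughout; a sketch of the pattern, not ArcadeDB's API.
class IndexMetadataSketch {
  final String typeName;
  final String[] propertyNames;
  final int bucketId;

  IndexMetadataSketch(final String typeName, final String[] propertyNames, final int bucketId) {
    this.typeName = typeName;
    this.propertyNames = propertyNames;
    this.bucketId = bucketId;
  }
}

// Vector-specific metadata extends the base metadata with extra fields.
final class VectorIndexMetadataSketch extends IndexMetadataSketch {
  final int dimensions;
  final String similarity;

  VectorIndexMetadataSketch(final String typeName, final String[] propertyNames, final int bucketId,
      final int dimensions, final String similarity) {
    super(typeName, propertyNames, bucketId);
    this.dimensions = dimensions;
    this.similarity = similarity;
  }
}

class IndexBuilderSketch {
  protected final String typeName;
  protected final String[] propertyNames;

  IndexBuilderSketch(final String typeName, final String... propertyNames) {
    this.typeName = typeName;
    this.propertyNames = propertyNames;
  }

  // The type transition: asking for a vector index returns a more specific builder.
  VectorIndexBuilderSketch asVector() {
    return new VectorIndexBuilderSketch(typeName, propertyNames);
  }

  IndexMetadataSketch build() {
    return new IndexMetadataSketch(typeName, propertyNames, -1); // -1 = no bucket (assumption)
  }
}

final class VectorIndexBuilderSketch extends IndexBuilderSketch {
  private int dimensions;
  private String similarity = "COSINE"; // assumed default

  VectorIndexBuilderSketch(final String typeName, final String... propertyNames) {
    super(typeName, propertyNames);
  }

  VectorIndexBuilderSketch withDimensions(final int dimensions) {
    this.dimensions = dimensions;
    return this;
  }

  VectorIndexBuilderSketch withSimilarity(final String similarity) {
    this.similarity = similarity;
    return this;
  }

  @Override
  VectorIndexMetadataSketch build() {
    if (dimensions <= 0)
      throw new IllegalStateException("dimensions must be positive");
    return new VectorIndexMetadataSketch(typeName, propertyNames, -1, dimensions, similarity);
  }
}
```

A call chain under these assumptions would read `new IndexBuilderSketch("Doc", "embedding").asVector().withDimensions(128).build()`, and the compiler rejects `withDimensions` on the general builder.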

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 13 comments.

Summary per file

| File | Description |
|---|---|
| engine/src/main/java/com/arcadedb/schema/IndexMetadata.java | New base class for index metadata containing type name, property names, and bucket ID |
| engine/src/main/java/com/arcadedb/schema/LSMVectorIndexMetadata.java | New class extending IndexMetadata with vector-specific fields (dimensions, similarity, etc.) |
| engine/src/main/java/com/arcadedb/schema/TypeLSMVectorIndexBuilder.java | New builder for type-level LSM vector indexes replacing old LSMVectorIndexBuilder |
| engine/src/main/java/com/arcadedb/schema/BucketLSMVectorIndexBuilder.java | New builder for bucket-level LSM vector indexes |
| engine/src/main/java/com/arcadedb/schema/TypeIndexBuilder.java | Updated to use IndexMetadata and return appropriate builder types |
| engine/src/main/java/com/arcadedb/index/vector/LSMVectorIndex.java | Refactored to use LSMVectorIndexMetadata instead of individual fields |
| engine/src/main/java/com/arcadedb/index/IndexInternal.java | Updated interface with new metadata methods |
| server/src/test/java/com/arcadedb/server/ha/IndexCompactionReplicationIT.java | Added new tests for vector index replication and compaction |
| engine/src/main/java/com/arcadedb/engine/PaginatedComponent.java | Moved removeTempSuffix method to base class |

Comment on lines +6 to +20
/*
* Copyright © 2021-present Arcade Data Ltd ([email protected])
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

Copilot AI Nov 25, 2025

Copyright header should be placed at the beginning of the file, before the package statement. The standard Java convention is: copyright header → package → imports → class definition.

Contributor Author

@copilot open a new pull request to apply changes based on this feedback

Comment on lines +38 to +39
if (metadata.has("similarity"))
  this.similarityFunction = VectorSimilarityFunction.valueOf(metadata.getString("similarity"));

Copilot AI Nov 25, 2025

The fromJSON method doesn't handle the case where metadata.getString("similarity") returns an invalid enum value. The valueOf() call will throw IllegalArgumentException if the string doesn't match any VectorSimilarityFunction enum constant. Consider adding proper error handling or validation.

Suggested change
- if (metadata.has("similarity"))
-   this.similarityFunction = VectorSimilarityFunction.valueOf(metadata.getString("similarity"));
+ if (metadata.has("similarity")) {
+   String similarityStr = metadata.getString("similarity");
+   try {
+     this.similarityFunction = VectorSimilarityFunction.valueOf(similarityStr);
+   } catch (IllegalArgumentException e) {
+     // Invalid enum value, fallback to default and log warning
+     System.err.println("Warning: Invalid similarity function '" + similarityStr + "'. Using default: " + VectorSimilarityFunction.COSINE);
+     this.similarityFunction = VectorSimilarityFunction.COSINE;
+   }
+ }
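
The same fallback idea can be tried in isolation with the minimal sketch below. It uses a hypothetical `SimilaritySketch` enum in place of the real `VectorSimilarityFunction`, and `java.util.logging` in place of `System.err`; the helper name `parseOrDefault`, the null handling, and the case normalization are all assumptions that go beyond the suggested change.

```java
import java.util.Locale;
import java.util.logging.Logger;

// Hypothetical enum standing in for the real VectorSimilarityFunction.
enum SimilaritySketch { COSINE, EUCLIDEAN, DOT_PRODUCT }

final class SimilarityParser {
  private static final Logger LOG = Logger.getLogger(SimilarityParser.class.getName());

  private SimilarityParser() {
  }

  // Returns the matching constant, or the fallback when the input is null or unknown.
  static SimilaritySketch parseOrDefault(final String raw, final SimilaritySketch fallback) {
    if (raw == null)
      return fallback;
    try {
      // Enum.valueOf throws IllegalArgumentException for names that match no constant.
      // Trimming and upper-casing are assumptions, not part of the reviewed code.
      return SimilaritySketch.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    } catch (final IllegalArgumentException e) {
      LOG.warning("Invalid similarity function '" + raw + "', using default " + fallback);
      return fallback;
    }
  }
}
```

Whether silently falling back is preferable to failing fast on corrupt metadata is a design choice the review comment leaves open; a stricter variant would rethrow with a clearer message instead.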

Comment on lines +7 to +20
/*
* Copyright © 2021-present Arcade Data Ltd ([email protected])
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.

Copilot AI Nov 25, 2025

Copyright header should be placed at the beginning of the file, before the package statement. The standard Java convention is: copyright header → package → imports → class definition.

Contributor Author

@copilot open a new pull request to apply changes based on this feedback

Copilot AI commented Nov 25, 2025

@lvca I've opened a new pull request, #2829, to work on those changes. Once the pull request is ready, I'll request review from you.

codacy-production bot commented Nov 25, 2025

Coverage summary from Codacy

| Coverage variation | Diff coverage |
|---|---|
| +1.01% | 69.88% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor commit (a49d1e0) | 74639 | 46264 | 61.98% |
| Head commit (409968b) | 74651 (+12) | 47023 (+759) | 62.99% (+1.01%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#2828) | 415 | 290 | 69.88% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
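
The two formulas can be checked against the figures reported in this comment with a small sketch. The class `CoverageMath` and its `percent` helper are hypothetical, and rounding to two decimals is an assumption about how the report presents its numbers.

```java
// Hypothetical helper; reproduces the reported percentages from the raw line counts.
final class CoverageMath {
  private CoverageMath() {
  }

  // Covered lines as a percentage of coverable lines, rounded to two decimals.
  static double percent(final long covered, final long coverable) {
    if (coverable <= 0)
      throw new IllegalArgumentException("coverable must be positive");
    return Math.round(covered * 10000.0 / coverable) / 100.0;
  }

  public static void main(final String[] args) {
    final double ancestor = percent(46264, 74639); // common ancestor commit
    final double head = percent(47023, 74651);     // head commit
    final double diff = percent(290, 415);         // lines added or modified by the PR

    System.out.printf("ancestor=%.2f%% head=%.2f%% variation=%+.2f%% diff=%.2f%%%n",
        ancestor, head, head - ancestor, diff);
  }
}
```

With the counts above, the head and ancestor percentages differ by 1.01 points, and 290 of 415 lines gives the 69.88% diff coverage shown in the table.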

Copilot AI commented Nov 25, 2025

@lvca I've opened a new pull request, #2830, to work on those changes. Once the pull request is ready, I'll request review from you.

@lvca lvca merged commit 1a2273b into main Nov 25, 2025
15 of 19 checks passed