Skip to content

Bug: Duplicate neighbor slots produced during incremental indexing(update, remove) in HNSW reverse links #698

@yoonseok-kim

Description

@yoonseok-kim

Describe the bug

There is a bug in the reverse-link formation logic that can produce duplicate neighbor entries in HNSW graphs.

In form_reverse_links_, when the neighbor list size is below connectivity_max, the new slot is appended without checking whether it already exists:

// If `new_slot` is already present in the neighboring connections of `close_slot`
// then no need to modify any connections or run the heuristics.
if (close_header.size() < connectivity_max) {
close_header.push_back(new_slot);
continue;
}

In incremental update scenarios(frequent update + remove), the same slot may be inserted multiple times. (despite the comment stating that nothing should be done if the slot already exists.)

For reference, It is likely that this issue does not occur when the number of connections of a neighbor node exceeds M (or 2M). In such cases, the refinement(heuristic) process(refine_) is expected to naturally eliminate duplicate slots.

Fix

Adding a uniqueness check in this branch prevents duplicate neighbor entries.

if (close_header.size() < connectivity_max) {
    if (std::all_of(close_header.begin(), close_header.end(),
                    [new_slot](compressed_slot_t slot) {
                        return slot != new_slot;
                    })) {
        close_header.push_back(new_slot);
    }
    continue;
}

I have submitted a bugfix PR(#699) for this issue. I would appreciate it if you could review it.

Steps to reproduce

Perform repeated update(with different vector values) and remove operations on the same slot in index_dense_gt. This would leads to a situation where the same slot can be inserted multiple times during reverse-link formation.
(In my case, I reproduced the issue by implementing deletion manually on top of index_gt, but it is expected to occur in index_dense_gt as well.)

Expected behavior

No duplicate slots should exist in the neighbor list.

USearch version

v2.23.0

Operating System

Red Hat Enterprise Linux release 8.6

Hardware architecture

x86

Which interface are you using?

C++ implementation

Contact Details

[email protected]

Are you open to being tagged as a contributor?

  • I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions