Unexpected Behavior in InfoNCE Loss Implementation for Qwen Embedding Training #6969

@Alireza3242

Description

Describe the bug

I'm training a Qwen embedding model using the InfoNCE loss function and have encountered what seems to be a bug or unintended behavior in its implementation.

Let’s define the batch components: queries as q[0], q[1], ..., q[B-1], positive documents as p[0], p[1], ..., p[B-1], and negative documents as n[0][0:k], ..., n[B-1][0:k]. So, each query has k associated negative documents.
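For reference, here is a minimal sketch (my own, with assumed shapes and an illustrative temperature of 0.05) of the standard per-query InfoNCE computation under this notation, where each query is scored only against its own positive and its own k hard negatives:

```python
import torch
import torch.nn.functional as F

# Assumed shapes, for illustration only:
# q: (B, D) query embeddings, p: (B, D) positives, n: (B, k, D) hard negatives
B, k, D = 4, 3, 8
q = F.normalize(torch.randn(B, D), dim=-1)
p = F.normalize(torch.randn(B, D), dim=-1)
n = F.normalize(torch.randn(B, k, D), dim=-1)

# Per-query logits: [sim(q[i], p[i]), sim(q[i], n[i][0]), ..., sim(q[i], n[i][k-1])]
pos_logit = (q * p).sum(-1, keepdim=True)           # (B, 1)
neg_logits = torch.einsum('bd,bkd->bk', q, n)       # (B, k)
logits = torch.cat([pos_logit, neg_logits], dim=1)  # (B, 1 + k)

labels = torch.zeros(B, dtype=torch.long)           # the positive is always column 0
loss = F.cross_entropy(logits / 0.05, labels)       # 0.05 is an example temperature
```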

Issue 1: INFONCE_USE_BATCH Flag Behaves Incorrectly
When I set INFONCE_USE_BATCH = True, the loss function also computes the similarity between q[i] and n[j][0:k] (for j != i), i.e., the negative documents belonging to other samples in the batch.
The expected behavior is that, with this flag enabled, q[i] should additionally be compared only to the positive documents of the other batch samples; the negatives belonging to other queries should not be involved. The current implementation, however, appears to incorporate both the positive and the negative documents of all batch samples, which looks like a bug.
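To make the expected behavior concrete, here is a minimal sketch (my own, not the library's code; it reuses the shapes from the snippet above) where the candidate set for q[i] is p[i], the in-batch positives p[j] for j != i, and only q[i]'s own hard negatives:

```python
# Expected candidates per query i: p[i] (the target), p[j] for j != i,
# and n[i][0:k]. Negatives n[j][0:k] from other samples do NOT appear.
in_batch_logits = q @ p.T                          # (B, B): q[i] vs. every positive
own_neg_logits = torch.einsum('bd,bkd->bk', q, n)  # (B, k): q[i] vs. its own negatives
logits = torch.cat([in_batch_logits, own_neg_logits], dim=1)  # (B, B + k)

labels = torch.arange(B)                           # diagonal entries are the positives
loss = F.cross_entropy(logits / 0.05, labels)
```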

Issue 2: INFONCE_MASK_FAKE_NEGATIVE Incorrectly Masks Hard Negatives
When I enable INFONCE_MASK_FAKE_NEGATIVE, the function applies its "fake negative" masking check even to my hard negatives.
This is problematic because:

  • I have explicitly curated these hard negatives. While they are challenging for the model, I am certain they are true negatives.

  • In contrast, I am not certain that documents drawn from other random samples in the batch are truly negative; the masking logic is intended for this latter, uncertain case.

The ideal (and expected) behavior is for the masking check to be applied only to documents from other batch samples, not to the pre-defined hard negatives. Including hard negatives in this mask weakens the training signal. A sketch of the expected masking follows.
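Here is a self-contained sketch of what I mean. The "fake negative" criterion (a margin of 0.1 below the positive score) is my own assumption for illustration, not necessarily the library's actual rule; the point is that the mask touches only the in-batch columns:

```python
import torch
import torch.nn.functional as F

B, k, D = 4, 3, 8
q = F.normalize(torch.randn(B, D), dim=-1)
p = F.normalize(torch.randn(B, D), dim=-1)
n = F.normalize(torch.randn(B, k, D), dim=-1)

in_batch_logits = q @ p.T                           # (B, B)
own_neg_logits = torch.einsum('bd,bkd->bk', q, n)   # (B, k)

# Mask "fake negatives" only among in-batch candidates: p[j] (j != i) is suspect
# if it scores about as high as the true positive. The 0.1 margin is illustrative.
pos_scores = in_batch_logits.diagonal().unsqueeze(1)  # (B, 1)
fake = in_batch_logits > pos_scores - 0.1
fake.fill_diagonal_(False)                            # never mask the true positive
in_batch_logits = in_batch_logits.masked_fill(fake, float('-inf'))

# Curated hard negatives are concatenated unmasked, so they always contribute.
logits = torch.cat([in_batch_logits, own_neg_logits], dim=1)
labels = torch.arange(B)
loss = F.cross_entropy(logits / 0.05, labels)
```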

Issue 3: The Loss Implementation Is Monolithic and Hard to Maintain
In addition to the functional bugs above, there is a significant issue with code maintainability and readability. The function implementing the InfoNCE loss is very long and complex, which makes it difficult to read, debug, and modify.
To improve clarity and reduce cognitive complexity (as measured by tools like SonarQube), this monolithic function should be refactored into smaller, well-named, focused helper functions, called from a single high-level compute_infonce_loss function.
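As one possible shape for such a refactor, here is a sketch; the function names, signatures, and default values are my suggestions, not the project's existing API:

```python
import torch
import torch.nn.functional as F

def build_similarity_logits(q, p, n, use_batch):
    """Assemble per-query candidate logits: the own positive, the own hard
    negatives, and (when use_batch is set) the positives of other samples."""
    own_negs = torch.einsum('bd,bkd->bk', q, n)
    if use_batch:
        return torch.cat([q @ p.T, own_negs], dim=1)  # (B, B + k)
    pos = (q * p).sum(-1, keepdim=True)
    return torch.cat([pos, own_negs], dim=1)          # (B, 1 + k)

def mask_fake_negatives(logits, labels, num_in_batch, margin=0.1):
    """Mask suspiciously similar in-batch candidates only; hard-negative
    columns (index >= num_in_batch) are left untouched."""
    pos = logits.gather(1, labels.unsqueeze(1))
    fake = logits > pos - margin
    fake[:, num_in_batch:] = False                    # never mask curated negatives
    fake.scatter_(1, labels.unsqueeze(1), False)      # never mask the true positive
    return logits.masked_fill(fake, float('-inf'))

def compute_infonce_loss(q, p, n, temperature=0.05, use_batch=True, mask_fake=True):
    logits = build_similarity_logits(q, p, n, use_batch)
    labels = (torch.arange(q.size(0), device=q.device) if use_batch
              else torch.zeros(q.size(0), dtype=torch.long, device=q.device))
    if mask_fake and use_batch:
        logits = mask_fake_negatives(logits, labels, num_in_batch=q.size(0))
    return F.cross_entropy(logits / temperature, labels)
```

Splitting the candidate construction and the masking into separate helpers would also make the two behaviors described in Issues 1 and 2 individually testable.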
