Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Design for Incorporating Reciprocal Rank Fusion into Neural Search #865

Open
Johnsonisaacn opened this issue Aug 13, 2024 · 8 comments
Assignees
Labels
Features Introduces a new unit of functionality that satisfies a requirement Roadmap:Vector Database/GenAI Project-wide roadmap label v2.19.0

Comments

@Johnsonisaacn
Copy link

Introduction

This document outlines the design for implementing Reciprocal Rank Fusion, RRF, within the neural search plugin. RRF, first outlined in a 2009 paper by Cormac, Clarke, and Büttcher, is a method for combining rankings from several subqueries, providing better search results than other systems. Risk-reward tradeoffs for different fusion techniques are analyzed in this paper. The general formula for RRF, where k = rank constant and query_j_rank is the ranking for a document when it is returned in a query method in hybrid query, is as follows:

rankScore(document_i) = sum((1/(k + query_1_rank), (1/(k + query_2_rank), ..., 
(1/(k + query_j_rank))

Background

The OpenSearch neural search plugin provides semantic search query ability to users by using ML models for embedded text. It supports hybrid queries, which combine text-based search with neural vector search. This design document describes implementation of the RRF technique to combine results from different parts of hybrid query to improve the search relevance.

One advantage of RRF is that it is simple and provides a standard way of processing results. It is scalable and doesn’t require tuning (other than adjusting rank constant). Its simplicity makes it intuitive for users. The algorithm has also been used successfully by several other organizations already. Moreover, incorporating RRF into hybrid query will facilitate customer migration to OpenSeach from platforms where they are already using RRF. Some customers have already requested it to be added as a technique. Based on these reasons, we believe that RRF will provide a benefit to users performing hybrid query.

Functional Requirements

  1. Reciprocal Rank Fusion (RRF) Implementation: Develop and integrate the RRF technique to use document ranks in individual subquery results to calculate scores, and sum those scores across all subqueries for each document to provide “rank score” to combine hybrid search results
  2. Hybrid Search Integration: Ensure the RRF technique can be seamlessly integrated into the existing hybrid search framework. Prevent interference with sorting(by different fields), concurrent segment search, and aggregations.
  3. Normalization Adjustment: Modify or enhance the current normalization process to utilize ranks of documents in subquery result lists, reducing potential biases from score-based normalization.
  4. Configuration Options: Allow users to control RRF algorithm parameter rank constant (rank constant must not be less than 1). Need to prevent invalid combinations of values in configuration (e.g. mixing current normalization options with RRF options)

Non Functional Requirements

  1. Smooth integration into existing neural search interface and clear instructions for use for customer
  2. Keep latency same within the range of the error

Document Scope

In this document we propose a solution for questions below:

  1. Incorporating RRF into hybrid query under a new processor
  2. Using 1 parameter, rank constant, with a default value of 60
  3. Only change to API is for search pipeline configuration

Out of Document Scope

  1. Providing ability to chain more that one processor together
  2. Adding weights as a parameter for combining document results from different subqueries

Solution Overview

Implement RRF within the neural search plugin by adding two classes, RRFProcessor and RRFProcessorFactory, and calling them from the NeuralSearch class. Rank constant could be set by user during query but have a default value. A value of 60 for rank constant is common and was used in the original paper describing the technique. It must be greater than or equal to 1. We can include guidance for customers showing how the rank scores shrink as the rank constant grows, and vice versa, to help inform their choice.

Solution HLD: Architectural and Component Design

Proposed Solution

  • Modify the hybrid query execution flow to capture the individual result sets from each subquery
  • Implement the RRF algorithm to calculate a combined score for each document based on its ranks across the different result sets.
  • Merge and sort the final result set based on the calculated RRF Rank Scores.
  • Return blended ranks of RRF in form of document scores.

Pros:

  • Two-way door in case implementation has issues, can easily be rolled back
  • Can customize to fit RRF requirements if existing architecture isn’t ideal

Cons:

  • Some duplicating of workflows to set up as new processor
  • Might cause confusion as RRF can be conceptualized as a normalization/combination technique but set up outside of NormalizationProcessor

Potential Issues

Risks / Known limitations and Future extensions

  • Performance impact due to additional computations for RRF scoring and result merging.
  • Compatibility with existing processors - need to identify potential areas of interference between processors. For normalization processor risks are low, the processor that is configured first will publish results, the next one should be ignored and has no effect.
  • Potential edge cases or corner cases that need to be handled gracefully.


New-RRF-Sequence-Diagram drawio

RRF_classes_8_6 drawio(2)

The following diagram shows the high-level approach for implementing RRF in hybrid query

RRF High Level

Solution LLD

For more information about hybrid search, see the following https://opensearch.org/blog/hybrid-search/
The following example uses the data from this tutorial: https://opensearch.org/docs/latest/search-plugins/neural-search-tutorial/
I’ve copied over the scores from the match query and neural query into the table below and used the resulting numbers to calculate the ranks and rank scores

RRF used in hybrid query setting up search pipeline:

PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "score-ranker-processor": {
        "combination": {
          "technique": "rrf",
          "parameters": {
            "rank_constant": 60
          }
        }
      }
    }
  ]
}

Using hybrid query currently, setting up search pipeline (no changes from current setup):

GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
  "_source": {
    "exclude": [
      "passage_embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text": {
              "query": "cowboy rodeo bronco"
            }
          }
        },
        {
          "neural": {
            "passage_embedding": {
              "query_text": "wild west",
              "model_id": "aVeif4oB5Vm0Tdw8zYO2",
              "k": 5
            }
          }
        }
      ]
    }
  }
}

The scores and ranks are based on the example data:

table1 Screenshot 2024-08-12 at 4 56 07 PM

Implementation Details

  • Query portion of the hybrid query can be reused without any changes
  • RRF algorithm parameters can be set in the pipeline processor configuration
    • Input: rank constant. Optional, we can use following default: 60 for rank constant (cited in the original paper and used successfully in existing implementations). The rank constant must be >= 1 to avoid divide by 0 error
  • First part of RRF algorithm is implemented by scoreRanker
    • Input: List of query result sets from each shard. Each shard result will have results of each individual subquery
    • Performs initial processing of documents to calculate rank score for each document in each subquery
    • Output: Sorted list of query result sets with calculated RRF rank scores for each subquery
  • Summation part of RRF algorithm implemented in same class by rankResults:
    • Input: Sorted list of query result sets with calculated RRF rank scores for each subquery
    • Summing up Rank Scores for documents across subqueries and combining lists of document rank scores from all subqueries and sorting to provide final sorted list of documents to send to fetch phase
    • Output: Sorted list of combined query result sets (sorted by combined rank score)

Benefits

  • Possible improved search relevance by combining results from multiple queries effectively.
  • Increased flexibility in hybrid query construction and result fusion.
  • Seamless integration with the existing neural search plugin and hybrid query infrastructure.
  • An additional “normalization and combination” method that customers may find helpful

Alternatives Considered

Alternative 1:

Implementing RRF as a NormalizationTechnique and CombinationTechnique (RRFNormalizationTechnique and RRFCombinationTechnique classes) that would be called by NormalizationProcessor the same way it currently works when configured with, for example, L2NormalizationTechnique and ArithmeticMeanScoreCombinationTechnique. The RRFNormalizationTechnique class would perform the subquery-level reciprocal rank score calculation part of the algorithm and would pass the rank scores of all documents in each subquery to the RRFCombinationTechnique class to combine each document’s rank scores, then merge and sort the results, returning a list of documents sorted by combined RRF rank scores

Pros:

  • Easy to copy structure of existing classes and add new classes into workflow
  • Easier adoption by users, setup similar to current setup

Cons:

  • Potential for creating problems that are hard to contain within just the RRF implementation
  • Would require intervention to prevent users from mixing and matching normalization and combination techniques
  • more complicated logic for parameter parsing/validation in the factory. RRF and existing processors don't share same parameters, will require branching in logic

Example pipeline configuration call

PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "rrf",
          "parameters": {
            "rank_constant": 60
          }
        },
        "combination": {
          "technique": "rrf",
        }
      }
    }
  ]
}

Alternative 2:

Implementing RRF as a processor, RRFProcessor at the same level as NormalizationProcessor, registered by a RRFProcessorFactory, and the RRFProcessor calling a RRFNormalizationTechnique and RRFCombinationTechnique as described above

Pros:

  • Similar to Alternative 1, easy to copy structure of existing classes and add new classes into workflow
  • Adding processor at NormalizationProcessor level would prevent issues with need for intervention to prevent users from mixing and matching normalization and combination techniques

Cons:

  • Potential for creating problems that are hard to contain within just the RRF implementation

Example search pipeline configuration call

PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "score-ranker-processor": {
        "normalization": {
          "technique": "rrf",
          "parameters": {
            "rank_constant": 60
          }
        },
        "combination": {
          "technique": "rrf",
        }
      }
    }
  ]
}

Potential Issues

Risks / Known limitations and Future extensions

  • Performance impact due to additional computations for RRF scoring and result merging.
  • Compatibility with existing processors - need to identify potential areas of interference between processors. For normalization processor risks are low, the processor that is configured first will publish results, the next one should be ignored and has no effect.
  • Potential edge cases or corner cases that need to be handled gracefully.

Backward Compatibility

Will be backward compatible, will ensure that omitting RRF-specific parameters in request configurations will not cause problems

Testability

I will be writing unit tests and integration tests for this implementation

Benchmarking

Benchmarking will involve testing across several datasets used in the initial normalization implementation testing (NFCorpus, Trec-Covid, ArguAna, FiQA, Scifact, DBPedia, Quora, Scidocs, CQADupStack, Amazon ESCI) and comparing the average nDCG@10 score against that of the L2 and Min-Max normalization techniques combined with the Arithmetic, Geometric, and Harmonic combination techniques, using BM25 nDCG@10 scores on these datasets as the baseline. We will also update nightlies benchmark runs to capture performance.

Feedback Required

Should we consider adding weights for combining rank scores from different subqueries? If so, how would weights be determined?
Are there concerns about or objections to incorporating Reciprocal Rank Fusion (RRF) as a new processor, instead of integrating it into the existing NormalizationProcessor? If there are foreseeable problems with this approach, what would a better alternative look like?

@martin-gaievski martin-gaievski added Features Introduces a new unit of functionality that satisfies a requirement v2.18.0 and removed untriaged labels Aug 13, 2024
@yuye-aws
Copy link
Member

yuye-aws commented Sep 5, 2024

Should we consider adding weights for combining rank scores from different subqueries? If so, how would weights be determined?

IMO, the customer should provide the weights for different subqueries. Just out of curiosity, how to adapt the RRF formula by user-determined weights?

@yuye-aws
Copy link
Member

yuye-aws commented Sep 5, 2024

  • Potential for creating problems that are hard to contain within just the RRF implementation

Intuitively, alternative 2 makes more sense to me. Can you elaborate more on this con?

@martin-gaievski
Copy link
Member

We can definitely add weights, it's good addition.
As for the Alternative 2 - that implies that rrf is split into normalization technique and combination technique, and that is exposed to the user in a same way as we do for other techniques today. But that will not be supported because parts of rrf cannot be combined with other techniques, I think that's the main issue with that option

@vamshin vamshin added the Roadmap:Vector Database/GenAI Project-wide roadmap label label Sep 30, 2024
@minalsha minalsha added v2.19.0 and removed v2.18.0 labels Oct 14, 2024
@yuye-aws
Copy link
Member

But that will not be supported because parts of rrf cannot be combined with other techniques, I think that's the main issue with that option

I understand that rrf can only be used with rrf. We can mandate our user to use them when validating parameters.

@yuye-aws
Copy link
Member

yuye-aws commented Oct 15, 2024

Hi @martin-gaievski ! After the PR is almost code ready, could you please showcase the impact of RRF processor on search latency. Just like what Zhichao and Cong have done in the two phase blog: https://opensearch.org/blog/Introducing-a-neural-sparse-two-phase-algorithm/

@martin-gaievski
Copy link
Member

Hi @martin-gaievski ! After the PR is almost code ready, could you please showcase the impact of RRF processor on search latency. Just like what Zhichao and Cong have done in the two phase blog: https://opensearch.org/blog/Introducing-a-neural-sparse-two-phase-algorithm/

We're planning to run the benchmark in regards to relevancy and performance after implementation reaches code-complete state in feature branch. My forecast is - it shouldn't be much different from normalization processor, we're using same approach but less complex computation logic.

@martin-gaievski
Copy link
Member

But that will not be supported because parts of rrf cannot be combined with other techniques, I think that's the main issue with that option

I understand that rrf can only be used with rrf. We can mandate our user to use them when validating parameters.

Fundamentally score normalization and rank based combination are different. What's the benefit of adding normalization technique parameter now? We can add normalization later if that's needed, it will be backward compatible change.

@yuye-aws
Copy link
Member

We're planning to run the benchmark in regards to relevancy and performance after implementation reaches code-complete state in feature branch. My forecast is - it shouldn't be much different from normalization processor, we're using same approach but less complex computation logic.

Agree that this is similar with the normalization processor. Please also take search latency into account.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Features Introduces a new unit of functionality that satisfies a requirement Roadmap:Vector Database/GenAI Project-wide roadmap label v2.19.0
Projects
Status: New
Status: 2.19.0
Development

No branches or pull requests

5 participants