[RFC] Design for Incorporating Reciprocal Rank Fusion into Neural Search #865

Johnsonisaacn · 2024-08-13T00:10:45Z

Introduction

This document outlines the design for implementing Reciprocal Rank Fusion, RRF, within the neural search plugin. RRF, first outlined in a 2009 paper by Cormac, Clarke, and Büttcher, is a method for combining rankings from several subqueries, providing better search results than other systems. Risk-reward tradeoffs for different fusion techniques are analyzed in this paper. The general formula for RRF, where k = rank constant and query_j_rank is the ranking for a document when it is returned in a query method in hybrid query, is as follows:

rankScore(document_i) = sum((1/(k + query_1_rank), (1/(k + query_2_rank), ..., 
(1/(k + query_j_rank))

Background

The OpenSearch neural search plugin provides semantic search query ability to users by using ML models for embedded text. It supports hybrid queries, which combine text-based search with neural vector search. This design document describes implementation of the RRF technique to combine results from different parts of hybrid query to improve the search relevance.

One advantage of RRF is that it is simple and provides a standard way of processing results. It is scalable and doesn’t require tuning (other than adjusting rank constant). Its simplicity makes it intuitive for users. The algorithm has also been used successfully by several other organizations already. Moreover, incorporating RRF into hybrid query will facilitate customer migration to OpenSeach from platforms where they are already using RRF. Some customers have already requested it to be added as a technique. Based on these reasons, we believe that RRF will provide a benefit to users performing hybrid query.

Functional Requirements

Reciprocal Rank Fusion (RRF) Implementation: Develop and integrate the RRF technique to use document ranks in individual subquery results to calculate scores, and sum those scores across all subqueries for each document to provide “rank score” to combine hybrid search results
Hybrid Search Integration: Ensure the RRF technique can be seamlessly integrated into the existing hybrid search framework. Prevent interference with sorting(by different fields), concurrent segment search, and aggregations.
Normalization Adjustment: Modify or enhance the current normalization process to utilize ranks of documents in subquery result lists, reducing potential biases from score-based normalization.
Configuration Options: Allow users to control RRF algorithm parameter rank constant (rank constant must not be less than 1). Need to prevent invalid combinations of values in configuration (e.g. mixing current normalization options with RRF options)

Non Functional Requirements

Smooth integration into existing neural search interface and clear instructions for use for customer
Keep latency same within the range of the error

Document Scope

In this document we propose a solution for questions below:

Incorporating RRF into hybrid query under a new processor
Using 1 parameter, rank constant, with a default value of 60
Only change to API is for search pipeline configuration

Out of Document Scope

Providing ability to chain more that one processor together
Adding weights as a parameter for combining document results from different subqueries

Solution Overview

Implement RRF within the neural search plugin by adding two classes, RRFProcessor and RRFProcessorFactory, and calling them from the NeuralSearch class. Rank constant could be set by user during query but have a default value. A value of 60 for rank constant is common and was used in the original paper describing the technique. It must be greater than or equal to 1. We can include guidance for customers showing how the rank scores shrink as the rank constant grows, and vice versa, to help inform their choice.

Solution HLD: Architectural and Component Design

Proposed Solution

Modify the hybrid query execution flow to capture the individual result sets from each subquery
Implement the RRF algorithm to calculate a combined score for each document based on its ranks across the different result sets.
Merge and sort the final result set based on the calculated RRF Rank Scores.
Return blended ranks of RRF in form of document scores.

Pros:

Two-way door in case implementation has issues, can easily be rolled back
Can customize to fit RRF requirements if existing architecture isn’t ideal

Cons:

Some duplicating of workflows to set up as new processor
Might cause confusion as RRF can be conceptualized as a normalization/combination technique but set up outside of NormalizationProcessor

Potential Issues

Risks / Known limitations and Future extensions

Performance impact due to additional computations for RRF scoring and result merging.
Compatibility with existing processors - need to identify potential areas of interference between processors. For normalization processor risks are low, the processor that is configured first will publish results, the next one should be ignored and has no effect.
Potential edge cases or corner cases that need to be handled gracefully.

The following diagram shows the high-level approach for implementing RRF in hybrid query

Solution LLD

For more information about hybrid search, see the following https://opensearch.org/blog/hybrid-search/
The following example uses the data from this tutorial: https://opensearch.org/docs/latest/search-plugins/neural-search-tutorial/
I’ve copied over the scores from the match query and neural query into the table below and used the resulting numbers to calculate the ranks and rank scores

RRF used in hybrid query setting up search pipeline:

PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "score-ranker-processor": {
        "combination": {
          "technique": "rrf",
          "parameters": {
            "rank_constant": 60
          }
        }
      }
    }
  ]
}

Using hybrid query currently, setting up search pipeline (no changes from current setup):

GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
  "_source": {
    "exclude": [
      "passage_embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text": {
              "query": "cowboy rodeo bronco"
            }
          }
        },
        {
          "neural": {
            "passage_embedding": {
              "query_text": "wild west",
              "model_id": "aVeif4oB5Vm0Tdw8zYO2",
              "k": 5
            }
          }
        }
      ]
    }
  }
}

The scores and ranks are based on the example data:

Implementation Details

Query portion of the hybrid query can be reused without any changes
RRF algorithm parameters can be set in the pipeline processor configuration
- Input: rank constant. Optional, we can use following default: 60 for rank constant (cited in the original paper and used successfully in existing implementations). The rank constant must be >= 1 to avoid divide by 0 error
First part of RRF algorithm is implemented by scoreRanker
- Input: List of query result sets from each shard. Each shard result will have results of each individual subquery
- Performs initial processing of documents to calculate rank score for each document in each subquery
- Output: Sorted list of query result sets with calculated RRF rank scores for each subquery
Summation part of RRF algorithm implemented in same class by rankResults:
- Input: Sorted list of query result sets with calculated RRF rank scores for each subquery
- Summing up Rank Scores for documents across subqueries and combining lists of document rank scores from all subqueries and sorting to provide final sorted list of documents to send to fetch phase
- Output: Sorted list of combined query result sets (sorted by combined rank score)

Benefits

Possible improved search relevance by combining results from multiple queries effectively.
Increased flexibility in hybrid query construction and result fusion.
Seamless integration with the existing neural search plugin and hybrid query infrastructure.
An additional “normalization and combination” method that customers may find helpful

Alternatives Considered

Alternative 1:

Implementing RRF as a NormalizationTechnique and CombinationTechnique (RRFNormalizationTechnique and RRFCombinationTechnique classes) that would be called by NormalizationProcessor the same way it currently works when configured with, for example, L2NormalizationTechnique and ArithmeticMeanScoreCombinationTechnique. The RRFNormalizationTechnique class would perform the subquery-level reciprocal rank score calculation part of the algorithm and would pass the rank scores of all documents in each subquery to the RRFCombinationTechnique class to combine each document’s rank scores, then merge and sort the results, returning a list of documents sorted by combined RRF rank scores

Pros:

Easy to copy structure of existing classes and add new classes into workflow
Easier adoption by users, setup similar to current setup

Cons:

Potential for creating problems that are hard to contain within just the RRF implementation
Would require intervention to prevent users from mixing and matching normalization and combination techniques
more complicated logic for parameter parsing/validation in the factory. RRF and existing processors don't share same parameters, will require branching in logic

Example pipeline configuration call

PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "rrf",
          "parameters": {
            "rank_constant": 60
          }
        },
        "combination": {
          "technique": "rrf",
        }
      }
    }
  ]
}

Alternative 2:

Implementing RRF as a processor, RRFProcessor at the same level as NormalizationProcessor, registered by a RRFProcessorFactory, and the RRFProcessor calling a RRFNormalizationTechnique and RRFCombinationTechnique as described above

Pros:

Similar to Alternative 1, easy to copy structure of existing classes and add new classes into workflow
Adding processor at NormalizationProcessor level would prevent issues with need for intervention to prevent users from mixing and matching normalization and combination techniques

Cons:

Potential for creating problems that are hard to contain within just the RRF implementation

Example search pipeline configuration call

PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "score-ranker-processor": {
        "normalization": {
          "technique": "rrf",
          "parameters": {
            "rank_constant": 60
          }
        },
        "combination": {
          "technique": "rrf",
        }
      }
    }
  ]
}

Potential Issues

Risks / Known limitations and Future extensions

Performance impact due to additional computations for RRF scoring and result merging.
Compatibility with existing processors - need to identify potential areas of interference between processors. For normalization processor risks are low, the processor that is configured first will publish results, the next one should be ignored and has no effect.
Potential edge cases or corner cases that need to be handled gracefully.

Backward Compatibility

Will be backward compatible, will ensure that omitting RRF-specific parameters in request configurations will not cause problems

Testability

I will be writing unit tests and integration tests for this implementation

Benchmarking

Benchmarking will involve testing across several datasets used in the initial normalization implementation testing (NFCorpus, Trec-Covid, ArguAna, FiQA, Scifact, DBPedia, Quora, Scidocs, CQADupStack, Amazon ESCI) and comparing the average nDCG@10 score against that of the L2 and Min-Max normalization techniques combined with the Arithmetic, Geometric, and Harmonic combination techniques, using BM25 nDCG@10 scores on these datasets as the baseline. We will also update nightlies benchmark runs to capture performance.

Feedback Required

Should we consider adding weights for combining rank scores from different subqueries? If so, how would weights be determined?
Are there concerns about or objections to incorporating Reciprocal Rank Fusion (RRF) as a new processor, instead of integrating it into the existing NormalizationProcessor? If there are foreseeable problems with this approach, what would a better alternative look like?

The text was updated successfully, but these errors were encountered:

yuye-aws · 2024-09-05T00:29:53Z

Should we consider adding weights for combining rank scores from different subqueries? If so, how would weights be determined?

IMO, the customer should provide the weights for different subqueries. Just out of curiosity, how to adapt the RRF formula by user-determined weights?

yuye-aws · 2024-09-05T00:34:33Z

Potential for creating problems that are hard to contain within just the RRF implementation

Intuitively, alternative 2 makes more sense to me. Can you elaborate more on this con?

martin-gaievski · 2024-09-13T00:01:37Z

We can definitely add weights, it's good addition.
As for the Alternative 2 - that implies that rrf is split into normalization technique and combination technique, and that is exposed to the user in a same way as we do for other techniques today. But that will not be supported because parts of rrf cannot be combined with other techniques, I think that's the main issue with that option

yuye-aws · 2024-10-15T09:44:52Z

But that will not be supported because parts of rrf cannot be combined with other techniques, I think that's the main issue with that option

I understand that rrf can only be used with rrf. We can mandate our user to use them when validating parameters.

yuye-aws · 2024-10-15T10:06:16Z

Hi @martin-gaievski ! After the PR is almost code ready, could you please showcase the impact of RRF processor on search latency. Just like what Zhichao and Cong have done in the two phase blog: https://opensearch.org/blog/Introducing-a-neural-sparse-two-phase-algorithm/

martin-gaievski · 2024-10-16T15:52:25Z

Hi @martin-gaievski ! After the PR is almost code ready, could you please showcase the impact of RRF processor on search latency. Just like what Zhichao and Cong have done in the two phase blog: https://opensearch.org/blog/Introducing-a-neural-sparse-two-phase-algorithm/

We're planning to run the benchmark in regards to relevancy and performance after implementation reaches code-complete state in feature branch. My forecast is - it shouldn't be much different from normalization processor, we're using same approach but less complex computation logic.

martin-gaievski · 2024-10-16T15:58:06Z

But that will not be supported because parts of rrf cannot be combined with other techniques, I think that's the main issue with that option

I understand that rrf can only be used with rrf. We can mandate our user to use them when validating parameters.

Fundamentally score normalization and rank based combination are different. What's the benefit of adding normalization technique parameter now? We can add normalization later if that's needed, it will be backward compatible change.

yuye-aws · 2024-10-18T06:47:49Z

We're planning to run the benchmark in regards to relevancy and performance after implementation reaches code-complete state in feature branch. My forecast is - it shouldn't be much different from normalization processor, we're using same approach but less complex computation logic.

Agree that this is similar with the normalization processor. Please also take search latency into account.

github-actions bot added the untriaged label Aug 13, 2024

martin-gaievski added Features Introduces a new unit of functionality that satisfies a requirement v2.18.0 and removed untriaged labels Aug 13, 2024

Johnsonisaacn mentioned this issue Aug 28, 2024

Reciprocal Rank Fusion (RRF) normalization technique in hybrid query #874

Merged

5 tasks

vamshin added the Roadmap:Vector Database/GenAI Project-wide roadmap label label Sep 30, 2024

minalsha added v2.19.0 and removed v2.18.0 labels Oct 14, 2024

minalsha assigned martin-gaievski Oct 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Design for Incorporating Reciprocal Rank Fusion into Neural Search #865

[RFC] Design for Incorporating Reciprocal Rank Fusion into Neural Search #865

Johnsonisaacn commented Aug 13, 2024

yuye-aws commented Sep 5, 2024

yuye-aws commented Sep 5, 2024

martin-gaievski commented Sep 13, 2024

yuye-aws commented Oct 15, 2024

yuye-aws commented Oct 15, 2024 •

edited

Loading

martin-gaievski commented Oct 16, 2024

martin-gaievski commented Oct 16, 2024

yuye-aws commented Oct 18, 2024

[RFC] Design for Incorporating Reciprocal Rank Fusion into Neural Search #865

[RFC] Design for Incorporating Reciprocal Rank Fusion into Neural Search #865

Comments

Johnsonisaacn commented Aug 13, 2024

Introduction

Background

Functional Requirements

Non Functional Requirements

Document Scope

Out of Document Scope

Solution Overview

Solution HLD: Architectural and Component Design

Proposed Solution

Potential Issues

Solution LLD

Implementation Details

Alternatives Considered

Alternative 1:

Alternative 2:

Potential Issues

Backward Compatibility

Testability

Benchmarking

Feedback Required

yuye-aws commented Sep 5, 2024

yuye-aws commented Sep 5, 2024

martin-gaievski commented Sep 13, 2024

yuye-aws commented Oct 15, 2024

yuye-aws commented Oct 15, 2024 • edited Loading

martin-gaievski commented Oct 16, 2024

martin-gaievski commented Oct 16, 2024

yuye-aws commented Oct 18, 2024

yuye-aws commented Oct 15, 2024 •

edited

Loading