[Feature] Pagination in hybrid query #963

Open
wants to merge 13 commits into
base: Pagination_in_hybridQuery

Conversation

Member

@vibrantvarun vibrantvarun commented Oct 23, 2024

Description

This PR enables support for pagination in hybrid query.
The highlights of this PR are:

  1. Introduces a new parameter, "pagination_depth", which sets the reference set of hybrid query search results to which pagination is applied.
  2. Handles the single-shard scenario, where the fetch phase can run before the normalization process.
  3. Handles the conditions on the from parameter in the normalization processor.
  4. Disables the scroll operation in hybrid query.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Varun Jain <[email protected]>
@vibrantvarun vibrantvarun changed the title Pagination in hybrid query [Feature] Pagination in hybrid query Oct 23, 2024
int paginationDepth;
HybridQuery hybridQuery;
Query query = searchContext.query();
if (query instanceof BooleanQuery) {
Member
When can this scenario happen? We should not allow hybrid if it's not the top-level clause.

Member Author
I have added a comment above the code explaining why it is written this way. In the case of nested fields and an alias filter, the hybrid query gets wrapped in a bool query.

Member
The simple case with nested fields is already handled in the QueryPhaseSearcher class. This logic doesn't belong here, unless it covers a special scenario that we missed in the extract query method I mentioned above.

Query query = searchContext.query();
if (query instanceof BooleanQuery) {
BooleanQuery booleanQuery = (BooleanQuery) query;
hybridQuery = (HybridQuery) booleanQuery.clauses().get(0).getQuery();
Member
This looks hacky. Can we avoid this logic?

Member Author
I have added a comment above the code explaining why it is written this way. In the case of nested fields and an alias filter, the hybrid query gets wrapped in a bool query.

Member
see my previous comment, this should be handled in here
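
For illustration, a minimal sketch of what a defensive unwrap could look like, in place of the unchecked cast on the first clause. All types here are simplified stand-ins (the real code works with Lucene's Query, BooleanQuery and BooleanClause), and HybridQueryExtractor is a hypothetical name, not the existing extract method in QueryPhaseSearcher.

```java
import java.util.List;
import java.util.Optional;

// Stand-in types for illustration only; they are NOT the Lucene or plugin classes.
interface Query {}

final class HybridQuery implements Query {
    final int paginationDepth;

    HybridQuery(int paginationDepth) {
        this.paginationDepth = paginationDepth;
    }
}

final class TermQuery implements Query {}

final class BooleanQuery implements Query {
    private final List<Query> clauses;

    BooleanQuery(List<Query> clauses) {
        this.clauses = clauses;
    }

    List<Query> clauses() {
        return clauses;
    }
}

// Hypothetical helper: returns the hybrid query when it is the top-level query
// or the first clause of a bool wrapper, and an empty Optional otherwise.
final class HybridQueryExtractor {
    private HybridQueryExtractor() {}

    static Optional<HybridQuery> extract(Query query) {
        if (query instanceof HybridQuery) {
            return Optional.of((HybridQuery) query);
        }
        if (query instanceof BooleanQuery) {
            List<Query> clauses = ((BooleanQuery) query).clauses();
            // Check both emptiness and type before casting, instead of casting blindly.
            if (!clauses.isEmpty() && clauses.get(0) instanceof HybridQuery) {
                return Optional.of((HybridQuery) clauses.get(0));
            }
        }
        return Optional.empty();
    }
}
```

Returning an Optional forces the caller to decide what to do when no hybrid query is found, rather than failing with a ClassCastException or IndexOutOfBoundsException.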

@@ -1,2 +1,2 @@
# This should match the owning team set up in https://github.com/orgs/opensearch-project/teams
* @heemin32 @navneet1v @VijayanB @vamshin @jmazanec15 @naveentatikonda @junqiu-lei @martin-gaievski @sean-zheng-amazon @model-collapse @zane-neo @ylwu-amzn @jngz-es @vibrantvarun @zhichao-aws @yuye-aws
* @heemin32 @navneet1v @VijayanB @vamshin @jmazanec15 @naveentatikonda @junqiu-lei @martin-gaievski @sean-zheng-amazon @model-collapse @zane-neo @vibrantvarun @zhichao-aws @yuye-aws @minalsha
Member
This looks like an artifact of an improper rebase. Can you please rebase on main properly?

@@ -48,16 +50,23 @@ public final class HybridQueryBuilder extends AbstractQueryBuilder<HybridQueryBu
public static final String NAME = "hybrid";

private static final ParseField QUERIES_FIELD = new ParseField("queries");
private static final ParseField DEPTH_FIELD = new ParseField("pagination_depth");
Member
Suggested change
- private static final ParseField DEPTH_FIELD = new ParseField("pagination_depth");
+ private static final ParseField PAGINATION_DEPTH_FIELD = new ParseField("pagination_depth");

public final class HybridQuery extends Query implements Iterable<Query> {

private final List<Query> subQueries;
private Integer paginationDepth;
Member
Why is this not a primitive int? Working with the wrapper class is error-prone, because unboxing a null value throws a NullPointerException.
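
A small sketch of the failure mode behind this comment. The class and method names are hypothetical and only the Java unboxing behavior is being shown, not the plugin's actual code.

```java
// Hypothetical demo class; it only illustrates implicit unboxing of an Integer.
final class PaginationDepthBoxingDemo {
    private PaginationDepthBoxingDemo() {}

    // The comparison unboxes depth implicitly, so a null argument
    // fails at runtime with a NullPointerException.
    static boolean exceedsLimit(Integer depth, int limit) {
        return depth > limit;
    }

    // With a primitive parameter, null cannot be passed at all,
    // so the same mistake is rejected at compile time.
    static boolean exceedsLimitPrimitive(int depth, int limit) {
        return depth > limit;
    }
}
```

Calling exceedsLimit(null, 1000) compiles but throws at runtime, whereas exceedsLimitPrimitive(null, 1000) does not compile.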

querySearchResult.topDocs(updatedTopDocsAndMaxScore, querySearchResult.sortValueFormats());
}

final int from = querySearchResults.get(0).from();
Member
do we need final?

querySearchResult.topDocs(updatedTopDocsAndMaxScore, querySearchResult.sortValueFormats());
}

final int from = querySearchResults.get(0).from();
if (from > 0 && from > totalScoreDocsCount) {
Member
The first check looks redundant. Can't we rely on from > totalScoreDocsCount alone?
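
A quick way to convince oneself of this: if totalScoreDocsCount is never negative (an assumption, not something shown in the diff), then from > totalScoreDocsCount already implies from > 0. The sketch below compares both forms exhaustively over a small range; the class and method names are hypothetical.

```java
// Hypothetical helper comparing the original and the simplified condition.
final class FromCheckEquivalence {
    private FromCheckEquivalence() {}

    static boolean originalCheck(int from, int totalScoreDocsCount) {
        return from > 0 && from > totalScoreDocsCount;
    }

    static boolean simplifiedCheck(int from, int totalScoreDocsCount) {
        return from > totalScoreDocsCount;
    }

    // Compares both checks for from in [-limit, limit] and totals in [0, limit].
    // Totals start at 0 because the equivalence relies on a non-negative count.
    static boolean agreeUpTo(int limit) {
        for (int from = -limit; from <= limit; from++) {
            for (int total = 0; total <= limit; total++) {
                if (originalCheck(from, total) != simplifiedCheck(from, total)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The two forms only diverge when the count is negative, for example from = 0 and a count of -1, which should not occur for a document count.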

@@ -637,6 +639,7 @@ public void testWrappedQueryWithFilter_whenIndexAliasHasFilterAndIndexWithNested

HybridQueryBuilder hybridQueryBuilder = new HybridQueryBuilder();
hybridQueryBuilder.add(QueryBuilders.existsQuery(TEST_TEXT_FIELD_NAME_1));
// hybridQueryBuilder.paginationDepth(10);
Member
is this intentional?

@SneakyThrows
public void testHybridQuery_whenFromIsSetInSearchRequest_thenFail() {
public void testPaginationDepth_whenSubqueriesCountIsGreaterThanFive_thenFail() {
Member
What are we testing here? The check for the max sub-query limit is already part of HybridQueryBuilderTests.

@@ -97,6 +109,9 @@ protected void doXContent(XContentBuilder builder, Params params) throws IOExcep
queryBuilder.toXContent(builder, params);
}
builder.endArray();
if (isClusterOnOrAfterMinReqVersionForPaginationInHybridQuery()) {
builder.field(DEPTH_FIELD.getPreferredName(), paginationDepth == null ? DEFAULT_PAGINATION_DEPTH : paginationDepth);
Member
When do we fall back to today's logic? From this statement we get either the user-provided value or the default, which is 10. I'm not exactly sure, but today's logic is equivalent to a pagination depth (PD) of 1, and definitely not 10.

if (paginationDepth != null
&& (paginationDepth < LOWER_BOUND_OF_PAGINATION_DEPTH || paginationDepth > UPPER_BOUND_OF_PAGINATION_DEPTH)) {
throw new IllegalArgumentException(
String.format(Locale.ROOT, "Pagination depth should lie in the range of 1-1000. Received: %s", paginationDepth)
Member
Please come up with a better error message. The range is hard-coded in the message, while the logic uses variables for the lower and upper bounds. We can also drop the user-provided value, as echoing it is potentially unsafe; the actual value should be obvious to the user from the search request.
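
One possible shape for such a message, as a sketch only. The constant names come from the diff; their values (1 and 1000) are assumed from the current message text, and PaginationDepthValidator and the message wording are hypothetical.

```java
import java.util.Locale;

// Hypothetical validator; bounds values are assumed from the existing message.
final class PaginationDepthValidator {
    static final int LOWER_BOUND_OF_PAGINATION_DEPTH = 1;
    static final int UPPER_BOUND_OF_PAGINATION_DEPTH = 1000;

    private PaginationDepthValidator() {}

    static void validate(Integer paginationDepth) {
        if (paginationDepth != null
            && (paginationDepth < LOWER_BOUND_OF_PAGINATION_DEPTH || paginationDepth > UPPER_BOUND_OF_PAGINATION_DEPTH)) {
            // The message is built from the same constants the check uses,
            // and it does not echo the user-provided value.
            throw new IllegalArgumentException(
                String.format(
                    Locale.ROOT,
                    "pagination_depth must be in the range [%d, %d]",
                    LOWER_BOUND_OF_PAGINATION_DEPTH,
                    UPPER_BOUND_OF_PAGINATION_DEPTH
                )
            );
        }
    }
}
```

Because the message is derived from the constants, changing a bound later cannot leave the text out of date.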

}

@SneakyThrows
public void testPaginationOnSingleShard_whenConcurrentSearchEnabled_thenSuccessful() {
Member
Looks like you've basically put all the pagination tests into the integ tests. Can these be moved to unit tests, so that we have only a minimal increase in new integ tests? If that's not possible, please consider moving them to a new class; this one has become way overloaded, at about 1000 lines.
