
sparse: refactor approx dimension max score ratio #1029

Merged (2 commits) on Jan 16, 2025

Conversation

sparknack (Contributor)

  1. Move the dimension max score ratio from build params to search params, and rename it from `wand_bm25_max_score_ratio` to `dim_max_score_ratio`.

  2. Remove the template param `bm25` and add a new `SparseMetricType`.

  3. Wrap some params of `Search()` into `InvertedIndexSearchParams` (a rough sketch of the resulting types follows this list).
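A minimal, hypothetical sketch of what the types named above might look like; `SparseMetricType`, `InvertedIndexSearchParams`, and `dim_max_score_ratio` are taken from this PR, while the specific enumerators, fields, and defaults are assumptions rather than knowhere's actual definitions:

```cpp
// Hypothetical sketch only; everything beyond the names mentioned in the PR
// description is assumed, not copied from knowhere.
enum class SparseMetricType {
    IP,    // plain inner product
    BM25,  // BM25 scoring, previously selected via the `bm25` template param
};

struct InvertedIndexSearchParams {
    float dim_max_score_ratio = 1.05f;  // moved here from the build params
    float drop_ratio_search = 0.0f;     // fraction of query values dropped at search time
    int refine_factor = 10;             // over-fetch factor before exact re-scoring
    SparseMetricType metric = SparseMetricType::IP;
};
```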


mergify bot commented Jan 15, 2025

@sparknack 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!

@sparknack sparknack force-pushed the sparse-approx branch 3 times, most recently from f5f5eb7 to bebde54 on January 15, 2025, 10:46
auto refine_factor = cfg.refine_factor.value_or(10);
// if no data was dropped during search, no refinement is needed.
if (drop_ratio_search == 0) {
    refine_factor = 1;
}
Collaborator


A `dim_max_score_ratio` < 1 also leads to information loss; we still want to do refinement in that case.

Contributor Author


For now, we keep the choice of no refinement. Here's why:

A refine_factor higher than 1 tends to cancel out the query-time savings that a lower dim_max_score_ratio brings, without an obvious corresponding gain in recall (a sketch of what the refinement step would do follows the table below).

Test results with MSMARCO (BM25). Recall is reported as a fraction:

| max score ratio | refine_factor | WAND recall | MaxScore recall | WAND query time (ms) | MaxScore query time (ms) |
|---|---|---|---|---|---|
| 1.1 | 1 | 0.996691 | 0.996576 | 4002 | 3861 |
| 1 | 1 | 0.996691 | 0.996576 | 3375 | 3456 |
| 0.9 | 1 | 0.988983 | 0.992708 | 2599 | 2908 |
| 0.8 | 1 | 0.942851 | 0.981490 | 1999 | 2417 |
| 0.7 | 1 | 0.797994 | 0.936948 | 1474 | 1912 |
| 0.6 | 1 | 0.568309 | 0.793997 | 1080 | 1409 |
| 0.5 | 1 | 0.369398 | 0.546805 | 827 | 879 |
| 0.9 | 2 | 0.993095 | 0.993897 | 4810 | 4691 |
| 0.8 | 2 | 0.970501 | 0.987607 | 4129 | 4178 |
| 0.7 | 2 | 0.880444 | 0.962579 | 3515 | 3609 |
| 0.6 | 2 | 0.709699 | 0.872235 | 3014 | 3018 |
| 0.5 | 2 | 0.516633 | 0.668095 | 2587 | 2296 |
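For readers unfamiliar with the term, "refinement" here is the usual over-fetch-then-rescore pattern: fetch k * refine_factor candidates with the approximate (pruned, scaled max-score) search, re-score them exactly, and keep the top k. Below is a minimal, self-contained sketch of that pattern; the `Candidate` type and the `exact_score` callback are hypothetical, and this is not knowhere's actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical candidate type, for illustration only.
struct Candidate {
    size_t id;
    float score;  // approximate score from the pruned/scaled search
};

// Re-score an over-fetched candidate list exactly and keep the top k.
// With refine_factor == 1 the input already has size k, so refinement is
// effectively a no-op.
std::vector<Candidate>
RefineTopK(std::vector<Candidate> candidates, size_t k,
           const std::function<float(size_t /*doc id*/)>& exact_score) {
    for (auto& c : candidates) {
        c.score = exact_score(c.id);  // replace approximate score with the exact one
    }
    const size_t keep = std::min(k, candidates.size());
    std::partial_sort(candidates.begin(), candidates.begin() + keep, candidates.end(),
                      [](const Candidate& a, const Candidate& b) { return a.score > b.score; });
    candidates.resize(keep);
    return candidates;
}
```

The table above is essentially measuring this tradeoff: at the same ratio, refine_factor = 2 recovers a few points of recall lost to dim_max_score_ratio < 1, but the extra exact scoring roughly doubles the query time.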

```diff
  */
- KNOWHERE_CONFIG_DECLARE_FIELD(wand_bm25_max_score_ratio)
+ KNOWHERE_CONFIG_DECLARE_FIELD(dim_max_score_ratio)
      .set_range(0.5, 1.3)
      .set_default(1.05)
      .description("ratio to upscale/downscale the max score of each dimension")
```
Collaborator


If set below 1, make sure to check whether you also want to set refine_factor to a lower value.
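One hedged reading of this advice, as a sketch rather than anything from the PR itself (the `PickRefineFactor` helper and the cap of 2 are made up for illustration): when dim_max_score_ratio is lowered below 1 to gain speed, also cap refine_factor so the exact re-scoring pass does not eat the time saved by the more aggressive pruning.

```cpp
#include <algorithm>

// Illustrative only, not knowhere's API: pair a downscaled dim_max_score_ratio
// with a smaller refine_factor, since the measurements above suggest that even
// refine_factor == 2 roughly doubles the query time.
int
PickRefineFactor(float dim_max_score_ratio, int requested_refine_factor) {
    if (dim_max_score_ratio < 1.0f) {
        return std::min(requested_refine_factor, 2);  // hypothetical cap
    }
    return requested_refine_factor;
}
```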


codecov bot commented Jan 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.12%. Comparing base (3c46f4c) to head (7f1aecb).
Report is 296 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff            @@
##           main    #1029       +/-   ##
=========================================
+ Coverage      0   73.12%   +73.12%     
=========================================
  Files         0       82       +82     
  Lines         0     7495     +7495     
=========================================
+ Hits          0     5481     +5481     
- Misses        0     2014     +2014     

see 82 files with indirect coverage changes

@sparknack
Contributor Author

issue: #1035

@sparknack
Contributor Author

sparknack commented Jan 16, 2025

/kind improvement

@sre-ci-robot
Collaborator

@sparknack: The label(s) kind/improvment cannot be applied, because the repository doesn't have them.

In response to this:

/kind improvment

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sparknack
Contributor Author

/kind improvement

@zhengbuqian
Collaborator

/lgtm
/approve

1. Move the dimension max score ratio from build params to search params,
and rename it from `wand_bm25_max_score_ratio` to `dim_max_score_ratio`.

2. Remove template param `bm25` and add a new `SparseMetricType`.

3. Wrap some params of `Search()` into `InvertedIndexApproxSearchParams`.

Signed-off-by: Shawn Wang <[email protected]>
@mergify mergify bot added ci-passed and removed ci-passed labels Jan 16, 2025
@foxspy (Collaborator) left a comment:

/lgtm

@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: foxspy, sparknack, zhengbuqian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit 7dc867d into zilliztech:main Jan 16, 2025
13 of 14 checks passed