The standard analyzer is designed for English-style tokenization by default, while jieba is a Chinese word segmenter.

You can use the run_analyzer() interface to test how an analyzer splits your text:

    analyzer_params = {"tokenizer": "standard","filter": ["lowercase"] }
    res = client.run_analyzer(
        texts=["采用官方教程提供的分词器"],
        analyzer_params=analyzer_params,
    )
    print("Standard tokenizer:", res)

    analyzer_params = {"tokenizer": "jieba"}
    res = client.run_analyzer(
        texts=["采用官方教程提供的分词器"],
        analyzer_params=analyzer_params,
    )
    print("Jieba tokenizer:", res)

Output:

Standard tokenizer: [['采用官方教程提供的分词器']]
Jieba tokenizer: [['采用', '官方', '教程', '提供', '的', '分词', '分词器']]

So with the standard analyzer the whole Chinese sentence is kept as a single token, and a keyword match can never succeed.
Only jieba segments Chinese sentences correctly.
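
If your collection stores Chinese text, you can apply the jieba analyzer directly on the VARCHAR field when defining the schema. Below is a minimal sketch, assuming pymilvus 2.5+ with analyzer support; the collection name, field names, vector dimension, and connection URI are placeholders:

    from pymilvus import MilvusClient, DataType

    client = MilvusClient(uri="http://localhost:19530")  # placeholder URI

    schema = MilvusClient.create_schema(auto_id=True)
    schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)

    # Enable the jieba analyzer on the text field so that text match /
    # full-text search segments Chinese words instead of whole sentences
    schema.add_field(
        field_name="text",
        datatype=DataType.VARCHAR,
        max_length=2000,
        enable_analyzer=True,
        enable_match=True,
        analyzer_params={"tokenizer": "jieba"},
    )

    # Milvus collections also require a vector field
    schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=768)

    client.create_collection(collection_name="demo_zh", schema=schema)

With a schema like this, a filter such as TEXT_MATCH(text, '分词器') should match documents containing that word, which is exactly the case that fails when the field uses the standard analyzer.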
