
[BUG] Hybrid query returned error "e_o_f_exception read past EOF" #964

Open
martin-gaievski opened this issue Oct 23, 2024 · 3 comments
Labels: bug (Something isn't working)

@martin-gaievski (Member)

What is the bug?

When running a hybrid query with a certain complex structure, the response contains failures for some shards:

{
    "took": 304,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 1,
        "skipped": 0,
        "failed": 1,
        "failures": [
            {
                "shard": 1,
                "index": "myindex",
                "node": "7LZNpHCCTUmCw5XDObDuHw",
                "reason": {
                    "type": "e_o_f_exception",
                    "reason": "read past EOF (pos=2500): MemorySegmentIndexInput(path=\"/home/ec2-user/opensearch/data/nodes/0/indices/gIcxj1vqRwGyyCGJ3njGOA/1/index/_1w.nvd\") [slice=randomaccess]"
                }
            }
        ]
    },

This may be similar to the issue reported for 2.13: #621
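A note for anyone triaging similar reports: partial failures like the one above are easy to miss, because the request still succeeds and returns hits. A small helper for surfacing them from the response body (the function name is mine, not from the plugin):

```python
def shard_failures(response: dict) -> list:
    """Pull (shard, error type, reason) tuples out of a search response
    whose _shards section reports partial failures."""
    return [
        (f["shard"], f["reason"]["type"], f["reason"]["reason"])
        for f in response.get("_shards", {}).get("failures", [])
    ]

# Minimal response shaped like the one in this report
sample = {
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 1,
        "failures": [
            {
                "shard": 1,
                "index": "myindex",
                "reason": {
                    "type": "e_o_f_exception",
                    "reason": "read past EOF (pos=2500)",
                },
            }
        ],
    }
}
print(shard_failures(sample))  # [(1, 'e_o_f_exception', 'read past EOF (pos=2500)')]
```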

How can one reproduce the bug?

For an index with text-based documents, run a hybrid query with the following structure:

          "queries": [
                {
                    "bool": {
                          "boost": 1,
                          "should": {
                                "dis_max": {
                                      "queries": [
                                          "bool": {
                                                    "boost": 1.1,
                                                    "filter": [],
                                                    "minimum_should_match": "1",
                                                    "should": {
                                                        "dis_max": {
                                                            "queries": [
                                                                {
                                                                    "multi_match": { }
                                                                },
                                                                {
                                                                    "multi_match": { }
                                                                },
                                                                {
                                                                    "match": { }
                                                                }
                                                            ],
                                                            "tie_breaker": 0
                                                        }
                                                    }
                                                }

Below is a fuller example of the response:

{
    "took": 304,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 1,
        "skipped": 0,
        "failed": 1,
        "failures": [
            {
                "shard": 1,
                "index": "myindex",
                "node": "7LZNpHCCTUmCw5XDObDuHw",
                "reason": {
                    "type": "e_o_f_exception",
                    "reason": "read past EOF (pos=2500): MemorySegmentIndexInput(path=\"/home/ec2-user/opensearch/data/nodes/0/indices/gIcxj1vqRwGyyCGJ3njGOA/1/index/_1w.nvd\") [slice=randomaccess]"
                }
            }
        ]
    },
    "hits": {
        "total": {
            "value": 6772,
            "relation": "eq"
        },
        "max_score": 0.19298081,
        "hits": [
            {
                "_index": "myindex",
                "_id": "5daa49b84d8bff9cc171c677",
                "_score": 0.19298081,
                "_source": {
                    "name": "Slaughter Of The Vampires"
                },
                "matched_queries": [
                    "channels/channel_match/match_in_categories",

What is your host/environment?

We tested on OpenSearch 2.15 and the latest main. The same query works correctly in 2.13.

Do you have any additional context?

It looks like the issue depends on the number of documents in the index. The same query works fine with ~20K documents but starts failing with 40K documents. My guess is that it's related to some optimization technique in one of the sub-queries.
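Since the failure seems to kick in somewhere between ~20K and 40K documents, one way to narrow the threshold is to reindex progressively larger subsets and rerun the query. A minimal sketch for generating the `_bulk` request bodies (the index and field names here are placeholders, not from this report):

```python
import json

def bulk_body(index: str, docs: list) -> str:
    """Build an NDJSON _bulk request body for the given documents."""
    lines = []
    for i, doc in enumerate(docs):
        # action line, then the document source line
        lines.append(json.dumps({"index": {"_index": index, "_id": str(i)}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# e.g. bracket the failure point by trying 20_000, 30_000, 40_000 docs
docs = [{"name": f"document {i}"} for i in range(3)]
body = bulk_body("myindex", docs)
```

The resulting string can be POSTed to `/_bulk` with a `Content-Type: application/x-ndjson` header.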

@martin-gaievski added the bug (Something isn't working) and untriaged labels on Oct 23, 2024
@martin-gaievski (Member, Author)

Tried one more set of steps that involves a custom analyzer for the Korean language, `nori` (https://mvnrepository.com/artifact/org.opensearch.plugin/analysis-nori), and a much simpler query:

GET /bedrock-rag-chunk-500/_search
{
    "_source": {
      "excludes": "body_chunk_embedding"
    },
    "search_pipeline": {
      "description": "Post processor for hybrid search",
      "phase_results_processors": [
        {
          "normalization-processor": {
            "normalization": {
              "technique": "min_max"
            },
            "combination": {
              "technique": "arithmetic_mean",
              "parameters": {
                "weights": [
                  0.3,
                  0.7
                ]
              }
            }
          }
        }
      ]
    },
    "query": {
        "hybrid": {
            "queries": [
                {
                    "multi_match": {
                        "query": "금융분야 클라우드컴퓨팅서비스 이용 가이드에 따르면, 중대한 변경이나 중요 계약사항 미이행 시 사후보고는 어떻게 해야 하며, 비중요업무의 경우 어떤 규정을 따라야 합니까?",
                        "fields": [
                            "body_chunk_default",
                            "summary",
                            "title"
                        ]
                    }
                },
                {
                    "neural": {
                        "body_chunk_embedding": {
                            "query_text": "금융분야 클라우드컴퓨팅서비스 이용 가이드에 따르면, 중대한 변경이나 중요 계약사항 미이행 시 사후보고는 어떻게 해야 하며, 비중요업무의 경우 어떤 규정을 따라야 합니까?",
                            "model_id": "ZSPJiJIBFB9Oq1h87Iid",
                            "k": 5
                        }
                    }
                }
            ]
        }
    }
}

Got the same type of response, and found the following stack trace in the log:

[2024-10-23T23:55:28,952][WARN ][r.suppressed             ] [a2999586b1baa510e1c3a4e65c95804b] path: __PATH__ params: {pretty=true, index=bedrock-rag-chunk-500}
Failed to execute phase [query], all shards failed; shardFailures {[etHmJm3BTbe1VC3k1CahCA][bedrock-rag-chunk-500][0]: RemoteTransportException[[4ffd32c623b597fabb2e803c1335c651][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; }{[etHmJm3BTbe1VC3k1CahCA][bedrock-rag-chunk-500][1]: RemoteTransportException[[4ffd32c623b597fabb2e803c1335c651][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; }{[etHmJm3BTbe1VC3k1CahCA][bedrock-rag-chunk-500][2]: RemoteTransportException[[4ffd32c623b597fabb2e803c1335c651][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; }{[etHmJm3BTbe1VC3k1CahCA][bedrock-rag-chunk-500][3]: RemoteTransportException[[4ffd32c623b597fabb2e803c1335c651][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; }{[etHmJm3BTbe1VC3k1CahCA][bedrock-rag-chunk-500][4]: RemoteTransportException[[4ffd32c623b597fabb2e803c1335c651][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; }
	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:780)
	at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:397)
	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:820)
	at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:558)
	at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:318)
	at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104)
	at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75)
	at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:764)
	at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1729)
	at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:436)
	at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1515)
	at org.opensearch.transport.NativeMessageHandler.lambda$handleException$5(NativeMessageHandler.java:454)
	at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:412)
	at org.opensearch.transport.NativeMessageHandler.handleException(NativeMessageHandler.java:452)
	at org.opensearch.transport.NativeMessageHandler.handlerResponseError(NativeMessageHandler.java:444)
	at org.opensearch.transport.NativeMessageHandler.handleMessage(NativeMessageHandler.java:172)
	at org.opensearch.transport.NativeMessageHandler.messageReceived(NativeMessageHandler.java:126)
	at org.opensearch.transport.InboundHandler.messageReceivedFromPipeline(InboundHandler.java:121)
	at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:113)
	at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:800)
	at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.forwardFragments(NativeInboundBytesHandler.java:157)
	at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.doHandleBytes(NativeInboundBytesHandler.java:94)
	at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:143)
	at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:119)
	at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475)
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at __PATH__(Thread.java:1583)
Caused by: OpenSearchException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]];
	at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:710)
	at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:395)
	... 51 more
Caused by: java.io.EOFException: read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]
	at org.apache.lucene.store.MemorySegmentIndexInput.handlePositionalIOOBE(MemorySegmentIndexInput.java:100)
	at org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl.readByte(MemorySegmentIndexInput.java:543)
	at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3.longValue(Lucene90NormsProducer.java:389)
	at org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47)
	at org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60)
	at org.apache.lucene.search.TermScorer.score(TermScorer.java:86)
	at org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:41)
	at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178)
	at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65)
	at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178)
	at org.opensearch.neuralsearch.query.HybridQueryScorer.hybridScores(HybridQueryScorer.java:193)
	at org.opensearch.neuralsearch.search.collector.HybridTopScoreDocCollector$1.collect(HybridTopScoreDocCollector.java:100)
	at org.apache.lucene.search.MultiCollector$MultiLeafCollector.collect(MultiCollector.java:226)
	at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:296)
	at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:236)
	at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71)
	at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
	at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:327)
	at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:283)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552)
	at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:361)
	at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:468)
	at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher$DefaultQueryPhaseSearcherWithEmptyQueryCollectorContext.searchWithCollector(HybridQueryPhaseSearcher.java:199)
	at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:438)
	at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:65)
	at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:284)
	at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:157)
	at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:589)
	at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:653)
	at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:622)
	at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:950)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.lang.Thread.run(Thread.java:1583)

A few things to notice:

  • if either sub-query (BM25 or neural) is used alone, or as the only part of the hybrid query, it works and there is no exception
  • the index has 235 documents, which is a relatively small number
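One more detail that may help whoever picks this up (my observation, not confirmed against the code): the failing position in the stack trace, pos=2147483647, is exactly Integer.MAX_VALUE, which is also the value Lucene uses for DocIdSetIterator.NO_MORE_DOCS. That pattern is more consistent with a norms lookup being issued for an already-exhausted iterator's doc ID than with an actually corrupted index file:

```python
# pos reported in the EOFException above
EOF_POS = 2147483647

# Java's Integer.MAX_VALUE (2^31 - 1), which Lucene reuses as
# DocIdSetIterator.NO_MORE_DOCS to mark an exhausted iterator
INTEGER_MAX_VALUE = 2**31 - 1

print(EOF_POS == INTEGER_MAX_VALUE)  # True
```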

@vibrantvarun (Member)

@martin-gaievski Can you share the index configuration you used for reproducing the bug?

@martin-gaievski (Member, Author)

@vibrantvarun yes, I'm using the following config for the index and ingest pipeline:

PUT /bedrock-rag-chunk-500
{
        "settings": {
            "default_pipeline": "bedrock-v2-ingest-pipeline",
            "index": {
                "knn": true
            },
            "analysis": {
                "tokenizer": {
                    "nori_sample_dict": {
                        "type": "nori_tokenizer",
                        "decompound_mode": "mixed",
                        "user_dictionary_rules": [
                          "c++", 
                          "java", 
                          "python"
                          ]
                    }
                },
                "filter": {
                    "autocomplete_filter": {
                        "type": "edge_ngram",
                        "min_gram": 1,
                        "max_gram": 20
                    },
                    "pos_f": {
                      "type": "nori_part_of_speech",
                      "stoptags": [
                        "SP"
                        ,"SF"
                        ,"VCP"
                        ,"NP"
                      ]
                    }
                },
                "analyzer": {
                    "autocomplete": {
                        "type": "custom",
                        "tokenizer": "nori_sample_dict",
                        "filter": [
                            "lowercase",
                            "stop",
                            "pos_f",
                            "autocomplete_filter"
                        ],
                        "char_filter": [
                          "html_strip"
                        ]
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "title" :{
                    "type": "text",
                    "analyzer": "autocomplete"
                },
                "summary" :{
                    "type": "text",
                    "analyzer": "autocomplete"
                },
                "auditor" :{
                    "type": "keyword",
                    "null_value": "deactivate"
                },
                "body_chunk_default": {
                    "type": "text",
                    "analyzer": "autocomplete"
                    
                },
                "body_chunk_embedding":{
                    "type": "knn_vector",
                    "dimension": 1536,
                    "method": {
                        "engine": "faiss",
                        "space_type": "l2",
                        "name": "hnsw"
                    }
                }
            }
        }
}

Ingest pipeline:

PUT /_ingest/pipeline/bedrock-v2-ingest-pipeline
{
  "description": "Bedrock Titan v2 Embedding pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model_id>",
        "field_map": {
          "body_chunk_default": "body_chunk_embedding"
        }
      }
    }
  ]
}
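For context, the field_map above tells the text_embedding processor to embed each document's body_chunk_default and store the resulting vector in body_chunk_embedding at ingest time. A rough sketch of that mapping logic (the embed function and the 4-dim vector are stand-ins, not the real Bedrock call):

```python
def apply_field_map(doc: dict, field_map: dict, embed) -> dict:
    """Mimic the text_embedding processor: for each source -> target pair,
    embed the source field's text and store the vector under the target field."""
    out = dict(doc)  # leave the original document untouched
    for source, target in field_map.items():
        if source in doc:
            out[target] = embed(doc[source])
    return out

fake_embed = lambda text: [0.0] * 4  # stand-in for the 1536-dim Bedrock model
doc = {"title": "guide", "body_chunk_default": "some chunk of text"}
enriched = apply_field_map(
    doc, {"body_chunk_default": "body_chunk_embedding"}, fake_embed
)
# enriched keeps the original text fields and gains a body_chunk_embedding vector
```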

I'm using the Bedrock text embeddings model with 1536 dimensions, but I suspect that's not a critical piece; you should see the issue with any other model.
