Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test TestSearchTagValuesV2 #4578

Open
ndk opened this issue Jan 18, 2025 · 0 comments
Open

Flaky test TestSearchTagValuesV2 #4578

ndk opened this issue Jan 18, 2025 · 0 comments

Comments

@ndk
Copy link
Contributor

ndk commented Jan 18, 2025

TestSearchTagValuesV2/second_batch_with_incomplete_query_-_name sometimes fails, and it appears this may be tied to the timing of block compaction.
Example of a failing run: https://github.com/grafana/tempo/actions/runs/12831739931/job/35782895220

15:18:40 tempo: level=info ts=2025-01-17T15:18:40.666072638Z caller=handler.go:134 tenant=single-tenant method=GET traceID= url="/api/v2/search/tag/span.x/values?q=%7B+name%3D%22baz%22+%26%26+span.x+%3D+%7D" duration=860.231µs response_size=113 status=200
    api_test.go:594: 
        	Error Trace:	/home/runner/work/tempo/tempo/integration/e2e/api_test.go:594
        	            				/home/runner/work/tempo/tempo/integration/e2e/api_test.go:409
        	Error:      	Not equal: 
        	            	expected: []e2e.TagValue{e2e.TagValue{Type:"string", Value:"qux"}}
        	            	actual  : []e2e.TagValue{e2e.TagValue{Type:"string", Value:"bar"}, e2e.TagValue{Type:"string", Value:"qux"}}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,6 @@
        	            	-([]e2e.TagValue) (len=1) {
        	            	+([]e2e.TagValue) (len=2) {
        	            	+ (e2e.TagValue) {
        	            	+  Type: (string) (len=6) "string",
        	            	+  Value: (string) (len=3) "bar"
        	            	+ },

Observations
https://github.com/grafana/tempo/blob/main/modules/ingester/instance_search.go#L462

  • Successful runs have len(instance.completeBlocks)=0 before the test:
searchBlock("headBlock")
searchBlock("completingBlocks")
  • Failed runs have len(instance.completeBlocks)=1 before the test:
searchBlock("headBlock")
searchBlockWithCache("completeBlocks")

It seems that if block compaction triggers after the test traces are added but before querying tags, the test fails. When searchBlockWithCache runs, it appears to add entries from the block without filtering, which might be causing the unexpected extra result ("bar").

Example Logs

Successful run
// Testing traces are already ingested
// The first test
instance.SearchTagValuesV2 ->>>> req.Query=""
instance.SearchTagValuesV2 ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=0
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="completingBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
// The second test
instance.SearchTagValuesV2 ->>>> req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=0
instance.SearchTagValuesV2 - performSearch ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="completingBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
//
instance.CompleteBlock ->>>> append completeBlocks, before len(i.completeBlocks) = 0
instance.CompleteBlock ->>>> completeBlocks is updated, now len(i.completeBlocks) = 1
Failed run
// Testing traces are already ingested
instance.CompleteBlock ->>>> append completeBlocks, before len(i.completeBlocks) = 0
instance.CompleteBlock ->>>> completeBlocks is updated, now len(i.completeBlocks) = 1
// The first test
instance.SearchTagValuesV2 ->>>> req.Query=""
instance.SearchTagValuesV2 ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=1
instance.SearchTagValuesV2 - searchBlockWithCache ->>>> spanName="completeBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch ->>>> query="{}", req.Query=""
instance.SearchTagValuesV2 - performSearch.IsEmptyQuery ->>>> query="{}", req.Query=""
LocalBlock.SearchTagValuesV2 ->>>> "x"
// The second test
instance.SearchTagValuesV2 ->>>> req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"
instance.SearchTagValuesV2 ->>>> len(i.completeBlocks)=1
instance.SearchTagValuesV2 - searchBlock ->>>> spanName="headBlock"
instance.SearchTagValuesV2 - searchBlockWithCache ->>>> spanName="completeBlocks"
instance.SearchTagValuesV2 - performSearch ->>>> query="{name=\"baz\"}", req.Query="{ name=\"baz\" && span.x = }"

Conclusion
Based on these observations, it looks like the test may fail when the data is queried from a block in the completeBlocks list - possibly because the block compaction step happens too soon and searchBlockWithCache does not filter entries correctly. I might be missing some implementation details, but wanted to report this in case it indicates a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant