Conformance test queries with negative offset fail due to latency #94
We have faced the same issue when running the test for Sysdig. It shows up for queries with negative offsets of -5m and -10m. As you said, we are querying 3m into the future for the first query (offset -5m) and 8m into the future for the second query (offset -10m). I also believe this test should focus on evaluating query compliance and should be decoupled from ingestion speed in this case, and from the ingestion process as a whole. I see three approaches to mitigating this:
I believe the first one is the least intrusive approach (here is the PR for it), but I would like to get opinions on the other proposals as well.
Thanks! Yeah, I think your PR is the right approach as a short-term workaround. Long-term, it would make sense to rethink how we create, ingest, and query the test data set in general. We'll still want some amount of randomness in the data set to avoid overfitting or missing weird corner cases (or to accommodate folks like Sysdig who have TSDB sample alignment limitations), but it would be good to get away from the requirement to let ingestion run in real time for a couple of hours before the tests can run. Prometheus has had its own remote write receiver handler for a while now, so that could be used to push all the data in big batches.
Hi
PromQL conformance tests with negative offsets fail because of latency issues.
Ingestion latency can differ between native Prometheus storage and a vendor's storage (due to persistence, geo-redundancy, etc.). Queries with a negative offset fetch data at future timestamps, so whether that data is available yet changes the query results. As a result, the compliance tool is not testing the correctness of the queries here but the latency of the storage backends.
Suppose a query with a negative offset of -5m is made at 10:32:00.000.
According to the compliance tool, the query timestamps are:
end_time: time.Now() - 2m => 10:32:00.000 - 2m => 10:30:00.000
start_time: end_time - 10m => 10:30:00.000 - 10m => 10:20:00.000
The query needs values up to 10:35:00 (end_time + 5m, because of the -5m offset).
At 10:32:00.000, due to latency, the value for timestamp 10:31:55.000 may not yet be available in the vendor storage, whereas it may already be available in native Prometheus (or vice versa).
In that case native Prometheus returns 101, whereas the vendor implementation returns 100 (the previously available value).
I think we can overcome this by setting `end_time` under `query_time_parameters` to a fixed timestamp at least 10 minutes earlier than the current time.
But I'd like to hear other thoughts on this issue and how to overcome it.