rythm: fix ingestion slack time range #4459

Open
wants to merge 26 commits into base: main

Conversation

javiermolinar
Contributor

What this PR does:
It fixes the ingestion slack by calculating it according to the partition's last commit.

Using a fixed time set at WAL creation won't work as we want, for several reasons. Every partition can have a different offset, and the WAL is created before reading from Kafka. Also, we have no way to know this start time in the case of a WAL replay.

The adjustment of the ingestion slack should be done outside the WAL, since we could have different strategies. Another possibility would be to store a function that calculates it, but I have opted for this approach, which extends the current interface without modifying it.

mapno and others added 16 commits November 5, 2024 14:28
* Add unit test for block-builder

* fmt

* Update tests

* cmon
* chore: remove gofakeit dependency (grafana#4274)

* Further reduce Labes() calls in the metrics registry (grafana#4283)

* Respect passed headers in read path requests (grafana#4287)

* Ingester: Validate completed blocks (grafana#4256)

* Add validate method to block

Signed-off-by: Joe Elliott <[email protected]>

* Add Validate usage in the ingester

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* add test and fix replay

Signed-off-by: Joe Elliott <[email protected]>

* increment metric

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* Add `invalid_utf8` to reasons spans could be rejected (grafana#4293)

* Add `invalid_utf8` to reasons spans could be rejected

* Update changelog

* Update docs

* Ensure test covers invalid UTF-8 and not slack time

* add signals for duplicate rf1 data (grafana#4296)

Signed-off-by: Joe Elliott <[email protected]>

* Bump anchore/sbom-action from 0.17.5 to 0.17.7 (grafana#4307)

Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.5 to 0.17.7.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](anchore/sbom-action@v0.17.5...v0.17.7)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs: Update readme with explore traces info (grafana#4263)

* docs: Update readme with explore traces info


Co-authored-by: Kim Nylander <[email protected]>

* chore: remove spanlogger (grafana#4312)

* chore: remove spanlogger

* Query-Frontend: Add middleware to drop headers (grafana#4298)

* header strip ware

Signed-off-by: Joe Elliott <[email protected]>

* comment

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* remove header strip wear from metrics summary

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* Increase length of time compactions have to fail (grafana#4315)

* increase length of time compactions have to fail

Signed-off-by: Joe Elliott <[email protected]>

* gen

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* docs: mark serverless as deprecated (grafana#4017)

* docs: mark serverless as deprecated

* Changelog + readme

* docs: Remove duplicated examples (grafana#4295)

This removes duplicates examples from the Configure TraceQL
metrics page.

Signed-off-by: Alex Bikfalvi <[email protected]>

* tempo-cli: support dropping multiple traces in a single operation (grafana#4266)

* tempo-cli: support dropping multiple traces in a single operation

* update final log message

---------

Co-authored-by: Suraj Nath <[email protected]>

* [DOC] Add clarification for metrics summary and traceQL metrics (grafana#4316)

* Add clarification for metrics summary and traceQL metrics

* Apply suggestions from code review

Co-authored-by: Jennifer Villa <[email protected]>

* Update docs/sources/tempo/api_docs/metrics-summary.md

---------

Co-authored-by: Jennifer Villa <[email protected]>

* TraceQL metrics time range fixes (grafana#4325)

* Disconnect job time range filtering from step, so that results in split backend/recent range is accurate

* changelog

* Fix to assert metrics query range before alignment because alignment may increase it, which is not the responsibility of the caller to account for (grafana#4331)

* Add doc about configuring TLS with Helm (grafana#4328)

* Add doc about configuring TLS with Helm

* Add memberlist and readinessProbe to example

* Include server config for listening on TLS

* Add note about scraping

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Markus Toivonen <[email protected]>

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Kim Nylander <[email protected]>

* Add memcached config for TLS

---------

Co-authored-by: Markus Toivonen <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>

* [DOC] Add TLS info to Helm chart doc (grafana#4334)

---------

Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Alex Bikfalvi <[email protected]>
Co-authored-by: Javier Molina Reyes <[email protected]>
Co-authored-by: Zach Leslie <[email protected]>
Co-authored-by: Joe Elliott <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ryan Perry <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>
Co-authored-by: Suraj Nath <[email protected]>
Co-authored-by: Alex Bikfalvi <[email protected]>
Co-authored-by: Andrey Karpov <[email protected]>
Co-authored-by: Jennifer Villa <[email protected]>
Co-authored-by: Martin Disibio <[email protected]>
Co-authored-by: Markus Toivonen <[email protected]>
* Validate distributor config. Finish encoder/decoder tests

* Repair tests

* Make SingleBinary work out of the box by defaulting to partition 0

* Fix first time startup where blockbuilder fails before ingester can create topic

* Fix initial startup cycle time and delay
* Add more tests to the block-builder

* stuff

* Add comments
* Metrics generator read from kafka first pass

* review feedback
* chore: remove gofakeit dependency (grafana#4274)

* Further reduce Labes() calls in the metrics registry (grafana#4283)

* Respect passed headers in read path requests (grafana#4287)

* Ingester: Validate completed blocks (grafana#4256)

* Add validate method to block

Signed-off-by: Joe Elliott <[email protected]>

* Add Validate usage in the ingester

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* add test and fix replay

Signed-off-by: Joe Elliott <[email protected]>

* increment metric

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* Add `invalid_utf8` to reasons spans could be rejected (grafana#4293)

* Add `invalid_utf8` to reasons spans could be rejected

* Update changelog

* Update docs

* Ensure test covers invalid UTF-8 and not slack time

* add signals for duplicate rf1 data (grafana#4296)

Signed-off-by: Joe Elliott <[email protected]>

* Bump anchore/sbom-action from 0.17.5 to 0.17.7 (grafana#4307)

Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.5 to 0.17.7.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](anchore/sbom-action@v0.17.5...v0.17.7)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs: Update readme with explore traces info (grafana#4263)

* docs: Update readme with explore traces info


Co-authored-by: Kim Nylander <[email protected]>

* chore: remove spanlogger (grafana#4312)

* chore: remove spanlogger

* Query-Frontend: Add middleware to drop headers (grafana#4298)

* header strip ware

Signed-off-by: Joe Elliott <[email protected]>

* comment

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* remove header strip wear from metrics summary

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* Increase length of time compactions have to fail (grafana#4315)

* increase length of time compactions have to fail

Signed-off-by: Joe Elliott <[email protected]>

* gen

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* docs: mark serverless as deprecated (grafana#4017)

* docs: mark serverless as deprecated

* Changelog + readme

* docs: Remove duplicated examples (grafana#4295)

This removes duplicates examples from the Configure TraceQL
metrics page.

Signed-off-by: Alex Bikfalvi <[email protected]>

* tempo-cli: support dropping multiple traces in a single operation (grafana#4266)

* tempo-cli: support dropping multiple traces in a single operation

* update final log message

---------

Co-authored-by: Suraj Nath <[email protected]>

* [DOC] Add clarification for metrics summary and traceQL metrics (grafana#4316)

* Add clarification for metrics summary and traceQL metrics

* Apply suggestions from code review

Co-authored-by: Jennifer Villa <[email protected]>

* Update docs/sources/tempo/api_docs/metrics-summary.md

---------

Co-authored-by: Jennifer Villa <[email protected]>

* TraceQL metrics time range fixes (grafana#4325)

* Disconnect job time range filtering from step, so that results in split backend/recent range is accurate

* changelog

* Fix to assert metrics query range before alignment because alignment may increase it, which is not the responsibility of the caller to account for (grafana#4331)

* Add doc about configuring TLS with Helm (grafana#4328)

* Add doc about configuring TLS with Helm

* Add memberlist and readinessProbe to example

* Include server config for listening on TLS

* Add note about scraping

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Markus Toivonen <[email protected]>

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/configuration/network/tls.md

Co-authored-by: Kim Nylander <[email protected]>

* Add memcached config for TLS

---------

Co-authored-by: Markus Toivonen <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>

* [DOC] Add TLS info to Helm chart doc (grafana#4334)

* fix deprecation warning by switching to DoBatchWithOptions (grafana#4343)

Signed-off-by: Daniel Strobusch <[email protected]>

* bump dskit to v0.0.0-20241115082728-f2a7eb3aa0e9 to leverage benefits for context causes for DoBatch calls. (grafana#4341)

See grafana/dskit#576

Signed-off-by: Daniel Strobusch <[email protected]>

* Bump github.com/minio/minio-go/v7 from 7.0.70 to 7.0.80 (grafana#4282)

* Bump github.com/minio/minio-go/v7 from 7.0.70 to 7.0.80

Bumps [github.com/minio/minio-go/v7](https://github.com/minio/minio-go) from 7.0.70 to 7.0.80.
- [Release notes](https://github.com/minio/minio-go/releases)
- [Commits](minio/minio-go@v7.0.70...v7.0.80)

---
updated-dependencies:
- dependency-name: github.com/minio/minio-go/v7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update serverless vendor

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Zach Leslie <[email protected]>

* update default config values to better align with production workloads (grafana#4340)

* update default config values to better align with production workloads

* Update CHANGELOG.md and config docs

* Ingester memory improvements by adjusting prealloc (grafana#4344)

* remove trace ids

Signed-off-by: Joe Elliott <[email protected]>

* linear buckets

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* tuney tune

Signed-off-by: Joe Elliott <[email protected]>

* metric misses and increase pool size

Signed-off-by: Joe Elliott <[email protected]>

* lint

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>

* Bump github.com/Azure/azure-sdk-for-go/sdk/azcore from 1.13.0 to 1.16.0 (grafana#4302)

* Bump github.com/Azure/azure-sdk-for-go/sdk/azcore from 1.13.0 to 1.16.0

Bumps [github.com/Azure/azure-sdk-for-go/sdk/azcore](https://github.com/Azure/azure-sdk-for-go) from 1.13.0 to 1.16.0.
- [Release notes](https://github.com/Azure/azure-sdk-for-go/releases)
- [Changelog](https://github.com/Azure/azure-sdk-for-go/blob/main/documentation/release.md)
- [Commits](Azure/azure-sdk-for-go@sdk/azcore/v1.13.0...sdk/azcore/v1.16.0)

---
updated-dependencies:
- dependency-name: github.com/Azure/azure-sdk-for-go/sdk/azcore
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update serverless vendor

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Zach Leslie <[email protected]>

* Use Prometheus fast regexp (grafana#4329)

* basic integration

Signed-off-by: Joe Elliott <[email protected]>

* patch tests for new meaning

Signed-off-by: Joe Elliott <[email protected]>

* patch up more tests

Signed-off-by: Joe Elliott <[email protected]>

* add basic tests

Signed-off-by: Joe Elliott <[email protected]>

* changelog + docs

Signed-off-by: Joe Elliott <[email protected]>

* remove benches

Signed-off-by: Joe Elliott <[email protected]>

* Cleaned up + tests

Signed-off-by: Joe Elliott <[email protected]>

* comment

Signed-off-by: Joe Elliott <[email protected]>

* lint

Signed-off-by: Joe Elliott <[email protected]>

* Update docs/sources/tempo/traceql/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* comment

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>

* Fix broken link in service-graphs docs (grafana#4351)

* Fix minor typo in TraceQL docs (grafana#4356)

* Bump default memcached version (grafana#4363)

* Exemplar fixes (grafana#4366)

* Fix exemplars based on duration to convert to seconds, fix various other issues

* changelog

* fix: initialize histogram buckets to 0 to avoid them being downsampled (grafana#4368)

* initialized histogram buckets to 0 to avoid them being downsampled

* Ingester/Generator Live trace cleanup (grafana#4365)

* moved trace sizes somewhere shareable

Signed-off-by: Joe Elliott <[email protected]>

* use tracesizes in ingester

Signed-off-by: Joe Elliott <[email protected]>

* make tests work

Signed-off-by: Joe Elliott <[email protected]>

* trace bytes in generator

Signed-off-by: Joe Elliott <[email protected]>

* remove traceCount

Signed-off-by: Joe Elliott <[email protected]>

* live trace shenanigans

Signed-off-by: Joe Elliott <[email protected]>

* changelog

Signed-off-by: Joe Elliott <[email protected]>

* Update modules/generator/processor/localblocks/livetraces.go

Co-authored-by: Mario <[email protected]>

* Update modules/ingester/instance.go

Co-authored-by: Mario <[email protected]>

* Test cleanup. Add sz test, restore commented out and fix e2e

Signed-off-by: Joe Elliott <[email protected]>

* remove todo comment

Signed-off-by: Joe Elliott <[email protected]>

---------

Signed-off-by: Joe Elliott <[email protected]>
Co-authored-by: Mario <[email protected]>

* Bump anchore/sbom-action from 0.17.7 to 0.17.8 (grafana#4371)

Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.7 to 0.17.8.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](anchore/sbom-action@v0.17.7...v0.17.8)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update for IDs change

* Only run blockbuilder if ingest enabled

---------

Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Alex Bikfalvi <[email protected]>
Signed-off-by: Daniel Strobusch <[email protected]>
Co-authored-by: Javier Molina Reyes <[email protected]>
Co-authored-by: Zach Leslie <[email protected]>
Co-authored-by: Joe Elliott <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ryan Perry <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>
Co-authored-by: Suraj Nath <[email protected]>
Co-authored-by: Alex Bikfalvi <[email protected]>
Co-authored-by: Andrey Karpov <[email protected]>
Co-authored-by: Jennifer Villa <[email protected]>
Co-authored-by: Martin Disibio <[email protected]>
Co-authored-by: Markus Toivonen <[email protected]>
Co-authored-by: Daniel Strobusch <[email protected]>
Co-authored-by: Carles Garcia <[email protected]>
* Use mapping for assigning partitions

* Use mapping for assigning partitions in the generator too

* Add support for SASL auth to kafka clients
* Extract block-builder into its own module

* Update /operations and examples

* No ephemeral storage

* No rolling strategy either

* fmt and compile

* Address review comment
…a#4410)

* Correctly pass start/end times

* Different code, same result
* Multiple fixes to cycle consumption

* fmt

* happy now?

* ups
…ue data for reads (grafana#4411)

* wip: separate non-flushing local blocks processor to store new queue data for reads

* Make real config for non-flushing local blocks processor, optional, validate wal config and use defaults if needed

* Fix defaulting of second WAL config
modules/blockbuilder/partition_writer.go (outdated, resolved review thread)
modules/blockbuilder/tenant_store.go (outdated, resolved review thread)
@joe-elliott
Member

I have not dug into the details of this PR, but I want to share some history about ingestion slack. When we first rolled Tempo out internally, it would mark a block's start and end time using the min start time and max end time of all spans in the block. We quickly found that every block had a start time of 0 and an end time 100 years in the future due to the data we were consuming. So I created the ingestion slack to prevent this.

Ingestion slack is gross to calculate and I have since just wished I had added 5 minutes to the beginning and end of every block instead of trying to watch all spans and only doing it conditionally.

You all are welcome to tackle this problem however you want, and I'd be glad to discuss it sync if you'd like to, but I just wanted to give some history.

@javiermolinar javiermolinar requested a review from mapno December 19, 2024 14:01
Member

@mapno mapno left a comment

I like the new behaviour, but I'm unsure about changing how Append and AppendTrace work like that, especially when it only changes for one Parquet version. There is no warning to the callers of the functions.

tempodb/encoding/vparquet4/wal_block.go (resolved review thread)
modules/blockbuilder/blockbuilder.go (outdated, resolved review thread)
Member

@mapno mapno left a comment

The current changes correctly address the problem, but I still find it strange that Append and AppendTrace have different behaviors. We should either change the methods' signatures or make the difference very clear with comments or similar.

@mapno
Member

mapno commented Jan 13, 2025

BTW, this PR should now be pointed at main. main-rhythm is not active anymore.

@javiermolinar javiermolinar changed the base branch from main-rhythm to main January 22, 2025 15:25
4 participants