feat: show how many rows/bytes/time RW lags behind the upstream during streaming data ingestion #19229

lmatz opened this issue Nov 1, 2024 · 0 comments

Is your feature request related to a problem? Please describe.

Two examples:

  1. For CDC, we can obtain the size of the WAL that RW has not yet consumed, or the time difference between when a log entry was generated upstream and when RW consumed it: https://github.com/risingwavelabs/risingwave/blob/main/grafana/risingwave-dev-dashboard.dashboard.py#L1182
  2. For Kafka, we can get the high watermark of each partition and subtract the offset of the last consumed message from it to get the number of unconsumed messages: https://github.com/risingwavelabs/risingwave/blob/main/grafana/risingwave-dev-dashboard.dashboard.py#L946
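Once the upstream position and the consumed position are known, the arithmetic for both examples is small. The following is a minimal sketch that assumes those positions have already been fetched; all function names are hypothetical and not part of RisingWave. For Postgres, the upstream position could come from `pg_current_wal_lsn()` and the consumed position from the replication slot's `confirmed_flush_lsn`; the sketch only reproduces the LSN arithmetic that `pg_wal_lsn_diff` performs server-side. For Kafka, it assumes the high watermark is the offset the *next* message will receive, so one is added to the last consumed offset.

```python
from datetime import datetime, timezone


def parse_pg_lsn(lsn: str) -> int:
    """Convert a Postgres LSN string such as '16/B374D848' to a byte position.

    The part before the slash holds the upper 32 bits and the part after it
    holds the lower 32 bits, both in hexadecimal.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def cdc_wal_bytes_lag(current_lsn: str, consumed_lsn: str) -> int:
    """Bytes of WAL written upstream but not yet consumed (clamped at 0)."""
    return max(0, parse_pg_lsn(current_lsn) - parse_pg_lsn(consumed_lsn))


def cdc_time_lag_seconds(event_ts: datetime, consumed_ts: datetime) -> float:
    """Seconds between a change being logged upstream and being consumed."""
    return max(0.0, (consumed_ts - event_ts).total_seconds())


def kafka_partition_lag(high_watermark: int, last_consumed_offset: int) -> int:
    """Unconsumed messages in one partition.

    Assumption: the high watermark is the offset the next message will get,
    so the consumer's position is last_consumed_offset + 1. Pass -1 (or the
    low watermark minus 1) when nothing has been consumed yet.
    """
    return max(0, high_watermark - (last_consumed_offset + 1))


def kafka_total_lag(partitions: dict[int, tuple[int, int]]) -> int:
    """Sum per-partition lag; values are (high_watermark, last_consumed_offset)."""
    return sum(kafka_partition_lag(hw, off) for hw, off in partitions.values())
```

For example, `cdc_wal_bytes_lag("16/B374D848", "16/B3740000")` is `0xD848`, i.e. 55368 bytes, and `kafka_total_lag({0: (100, 89), 1: (50, 49)})` is 10.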

Ideally, we would find a solution for each of the following streaming connectors:

  • Postgres CDC
  • MySQL CDC
  • MongoDB CDC
  • SQL Server CDC
  • Kinesis
  • File systems such as S3/GCS/Azure Blob
  • NATS JetStream
  • MQTT
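File systems such as S3/GCS/Azure Blob have no log offset to subtract, but a byte lag could still be derived if the source tracks how far it has read into each listed file. A minimal sketch, assuming a hypothetical `ListedFile` listing result and a per-file read-offset map (neither is an existing RisingWave structure):

```python
from dataclasses import dataclass


@dataclass
class ListedFile:
    """A file discovered by listing a bucket prefix (hypothetical type)."""
    path: str
    size_bytes: int


def file_source_bytes_lag(listed: list[ListedFile], read_offsets: dict[str, int]) -> int:
    """Bytes listed in object storage that have not been read yet.

    read_offsets maps a path to how many bytes of it have been consumed;
    files missing from the map are treated as entirely unread.
    """
    lag = 0
    for f in listed:
        consumed = min(read_offsets.get(f.path, 0), f.size_bytes)
        lag += f.size_bytes - consumed
    return lag


def file_source_files_lag(listed: list[ListedFile], read_offsets: dict[str, int]) -> int:
    """Number of listed files that have not been fully consumed."""
    return sum(1 for f in listed if read_offsets.get(f.path, 0) < f.size_bytes)
```

With files of 100, 200, and 50 bytes, where the first is fully read and 50 bytes of the second are read, the byte lag is 200 and two files are still pending.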

Some sources may only be able to express lag in a single unit: the number of rows, the number of bytes, or a time difference.
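Because a connector may support only one of these units, a lag report could carry each unit as optional rather than forcing every source into the same one. A hypothetical sketch (the `SourceLag` type is illustrative, not an existing RisingWave API):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceLag:
    """Lag of one source; a unit is None when the connector cannot report it."""
    rows: Optional[int] = None
    size_bytes: Optional[int] = None
    seconds: Optional[float] = None

    def describe(self) -> str:
        """Render only the units this source actually reports."""
        parts = []
        if self.rows is not None:
            parts.append(f"{self.rows} rows")
        if self.size_bytes is not None:
            parts.append(f"{self.size_bytes} bytes")
        if self.seconds is not None:
            parts.append(f"{self.seconds:.1f} s")
        return ", ".join(parts) if parts else "unknown"
```

A Kafka source might report only `SourceLag(rows=10)`, while a Postgres CDC source might report `SourceLag(size_bytes=55368, seconds=5.0)`; a dashboard can then show whichever units are present.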

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

github-actions bot added this to the release-2.2 milestone on Nov 1, 2024