Garbage collect checkpoints in the SQS file source #5278

rdettai opened this issue Jul 30, 2024 · 0 comments
rdettai commented Jul 30, 2024

Is your feature request related to a problem? Please describe.
Follow up to #5148 where the cleanup of the checkpoints in the shard API was left aside.

Describe the solution you'd like
Add two parameters to the FileSourceSqs config object to configure how long and how many checkpoints are kept:

  • deduplication_window_duration_sec (default 3600)
  • deduplication_window_max_messages (default 100k)
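As a rough illustration, the two settings and their proposed defaults could be carried by a small struct like the one below. This is only a sketch: the struct name `DeduplicationWindow` is hypothetical, and the real `FileSourceSqs` config may lay these fields out differently.

```rust
/// Hypothetical holder for the two proposed settings; the actual
/// `FileSourceSqs` config object may embed these fields directly.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DeduplicationWindow {
    /// How long a checkpoint is kept, in seconds.
    pub deduplication_window_duration_sec: u32,
    /// Maximum number of checkpoints kept.
    pub deduplication_window_max_messages: u32,
}

impl Default for DeduplicationWindow {
    fn default() -> Self {
        Self {
            // Defaults taken from the proposal: 3600 seconds and 100k messages.
            deduplication_window_duration_sec: 3600,
            deduplication_window_max_messages: 100_000,
        }
    }
}
```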

Handling the shard lifecycle:

  • In the shard table, add a new column:
    • update_timestamp: sqlx::types::time::PrimitiveDateTime
  • During the split commit transaction, update this timestamp
  • Update the shard API to make the cleanup possible:
    • update the DeleteShardsRequest to support pruning based on the timestamp / number of messages
message DeleteShardsRequest {
  quickwit.common.IndexUid index_uid = 1;
  string source_id = 2;
  repeated quickwit.ingest.ShardId shard_ids = 3;
  // If false, only shards at EOF positions will be deleted.
  bool force = 4;
  + // The maximum age of shards to keep.
  + optional uint32 max_age = 5;
  + // The maximum number of shards to keep. Older shards will be deleted first.
  + optional uint32 max_count = 6;
}
  • The sources call the DeleteShards API, but the control plane is in charge of debouncing identical requests
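One possible reading of the `max_age` / `max_count` semantics is sketched below: a shard is pruned if its `update_timestamp` is older than `max_age`, or if it falls outside the `max_count` most recently updated shards. The `ShardRow` type and `select_shards_to_prune` function are hypothetical, `max_age` is assumed to be in seconds (the proposal does not state a unit), and the `force` / EOF check is deliberately left out.

```rust
/// Hypothetical, simplified view of a row in the shard table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ShardRow {
    pub shard_id: String,
    /// Seconds since the Unix epoch; stands in for `update_timestamp`.
    pub update_timestamp_secs: u64,
}

/// Returns the IDs of the shards to delete, most recently updated first.
/// Assumption: `max_age_secs` is expressed in seconds.
pub fn select_shards_to_prune(
    shards: &[ShardRow],
    now_secs: u64,
    max_age_secs: Option<u32>,
    max_count: Option<u32>,
) -> Vec<String> {
    // Sort newest first, so that a shard's rank tells how many newer shards exist.
    let mut sorted: Vec<&ShardRow> = shards.iter().collect();
    sorted.sort_by(|a, b| b.update_timestamp_secs.cmp(&a.update_timestamp_secs));

    sorted
        .into_iter()
        .enumerate()
        .filter(|(rank, shard)| {
            // Older than the maximum age, if one was requested.
            let too_old = max_age_secs.map_or(false, |max_age| {
                now_secs.saturating_sub(shard.update_timestamp_secs) > u64::from(max_age)
            });
            // Beyond the number of shards to keep, if one was requested.
            let over_count = max_count.map_or(false, |max| *rank >= max as usize);
            too_old || over_count
        })
        .map(|(_, shard)| shard.shard_id.clone())
        .collect()
}
```

With both options unset, nothing is selected, which matches the fields being `optional` in the request.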

Describe alternatives you've considered
For the shard API update, we could instead:

  • list all the shards and filter on the client side (source)
  • change ListShardsRequest to support filtering the older shards.

@rdettai added the enhancement label Jul 30, 2024
@fulmicoton self-assigned this Aug 23, 2024
Status: In progress