add incremental compaction support#18996

Open
cecemei wants to merge 30 commits into apache:master from cecemei:compact2

@cecemei cecemei commented Feb 8, 2026

Description

This PR builds on #18968 to add incremental compaction support, allowing compaction to operate on only uncompacted segments within an interval while upgrading already-compacted segments to maintain consistency.

Key Changes

  1. Incremental Compaction Mode

Extended CompactionMode enum with INCREMENTAL_COMPACTION:

  • FULL_COMPACTION: Compacts all segments (existing behavior)
  • INCREMENTAL_COMPACTION: Compacts only uncompacted segments, upgrades compacted segments
  • NOT_APPLICABLE: Skip compaction
  2. SegmentUpgradeAction

New task action that upgrades segment metadata without rewriting data:

  • Updates partition numbers and shard specs of already-compacted segments
  • Ensures compatibility between old and newly compacted segments
  3. ShardSpec Compatibility

Extended ShardSpec interface with mutation methods to support incremental compaction:

  • withPartitionNum(int): Update partition number
  • withCorePartitions(int): Update core partition count
  • isNumChunkSupported(): Check numbered partition chunk support
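
As a rough illustration of the with-style methods listed above, here is a minimal sketch. It assumes the methods return a modified copy rather than mutating in place, which the PR description does not confirm. SimpleNumberedSpec is a hypothetical stand-in, not one of Druid's ShardSpec implementations, and isNumChunkSupported() is left out because its exact semantics are not described here.

```java
public class ShardSpecSketch {
    // Hypothetical stand-in for a numbered shard spec; NOT Druid's actual class.
    static final class SimpleNumberedSpec {
        private final int partitionNum;
        private final int corePartitions;

        SimpleNumberedSpec(int partitionNum, int corePartitions) {
            this.partitionNum = partitionNum;
            this.corePartitions = corePartitions;
        }

        // Assumption: returns a copy with a new partition number.
        SimpleNumberedSpec withPartitionNum(int newPartitionNum) {
            return new SimpleNumberedSpec(newPartitionNum, corePartitions);
        }

        // Assumption: returns a copy with a new core partition count.
        SimpleNumberedSpec withCorePartitions(int newCorePartitions) {
            return new SimpleNumberedSpec(partitionNum, newCorePartitions);
        }

        int getPartitionNum() {
            return partitionNum;
        }

        int getCorePartitions() {
            return corePartitions;
        }
    }
}
```

Under this assumption, an already-compacted segment's spec could be re-numbered with spec.withPartitionNum(3).withCorePartitions(5) while the original object stays unchanged.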
  4. CompactionIntervalSpec Enhancements

Added optional uncompactedSegments field:

  • When null: Compact all segments in interval (full compaction)
  • When non-null: Compact only specified segments (incremental compaction)
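
The null versus non-null convention above can be sketched as follows. IntervalSpec and segmentsToCompact are hypothetical names for illustration, and segments are represented by plain string IDs; the real CompactionIntervalSpec may differ.

```java
import java.util.List;

public class IntervalSpecSketch {
    // Hypothetical, simplified stand-in for CompactionIntervalSpec.
    static final class IntervalSpec {
        final String interval;
        // null means "full compaction"; non-null lists the segments to compact.
        final List<String> uncompactedSegments;

        IntervalSpec(String interval, List<String> uncompactedSegments) {
            this.interval = interval;
            this.uncompactedSegments = uncompactedSegments;
        }

        boolean isIncremental() {
            return uncompactedSegments != null;
        }
    }

    // Picks the segments to rewrite: everything in the interval when the
    // field is null, otherwise only the listed uncompacted segments.
    static List<String> segmentsToCompact(IntervalSpec spec, List<String> allSegmentsInInterval) {
        return spec.isIncremental() ? spec.uncompactedSegments : allSegmentsInInterval;
    }
}
```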
  5. Policy-Level Control

Added incrementalCompactionUncompactedRatioThreshold to MostFragmentedIntervalFirstPolicy:

  • When uncompactedBytes / compactedBytes < threshold, use incremental compaction
  • When >= threshold, use full compaction
  • Default: 0.0 (always full compaction, backward compatible)
  • Example: 0.5 uses incremental when uncompacted data < 50% of compacted data
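
The threshold rule above can be expressed as a small decision function. This is a sketch of the rule as described, not the code in MostFragmentedIntervalFirstPolicy; chooseMode is a hypothetical helper, and falling back to full compaction when there are no compacted bytes is an assumption made to avoid dividing by zero.

```java
public class CompactionModeSketch {
    enum CompactionMode { FULL_COMPACTION, INCREMENTAL_COMPACTION, NOT_APPLICABLE }

    // Hypothetical helper mirroring the rule described in the PR text.
    static CompactionMode chooseMode(long uncompactedBytes, long compactedBytes, double threshold) {
        if (compactedBytes <= 0) {
            // Assumption: with nothing compacted yet, the ratio is undefined,
            // so fall back to full compaction.
            return CompactionMode.FULL_COMPACTION;
        }
        double ratio = (double) uncompactedBytes / compactedBytes;
        // ratio < threshold selects incremental; ratio >= threshold selects full.
        // With the default threshold of 0.0 the ratio is never below it,
        // so full compaction is always chosen.
        return ratio < threshold
            ? CompactionMode.INCREMENTAL_COMPACTION
            : CompactionMode.FULL_COMPACTION;
    }
}
```

For example, with a threshold of 0.5, 40 uncompacted bytes against 100 compacted bytes gives a ratio of 0.4 and selects incremental compaction, while 50 against 100 gives exactly 0.5 and selects full compaction.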
  6. Builder Pattern for UserCompactionTaskQueryTuningConfig

Replaced verbose 19-parameter constructors with builder pattern for better readability.
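
To show the general shape of this change, here is a minimal builder sketch. TuningConfig is a hypothetical three-field stand-in for UserCompactionTaskQueryTuningConfig; the field names are illustrative and the real class's builder API may differ.

```java
public class TuningConfigBuilderSketch {
    // Hypothetical, trimmed-down config; the real class has many more fields.
    static final class TuningConfig {
        private final Integer maxRowsInMemory;
        private final Long maxTotalRows;
        private final Integer maxNumConcurrentSubTasks;

        private TuningConfig(Builder b) {
            this.maxRowsInMemory = b.maxRowsInMemory;
            this.maxTotalRows = b.maxTotalRows;
            this.maxNumConcurrentSubTasks = b.maxNumConcurrentSubTasks;
        }

        static Builder builder() {
            return new Builder();
        }

        Integer getMaxRowsInMemory() { return maxRowsInMemory; }
        Long getMaxTotalRows() { return maxTotalRows; }
        Integer getMaxNumConcurrentSubTasks() { return maxNumConcurrentSubTasks; }
    }

    static final class Builder {
        private Integer maxRowsInMemory;
        private Long maxTotalRows;
        private Integer maxNumConcurrentSubTasks;

        Builder maxRowsInMemory(Integer v) { this.maxRowsInMemory = v; return this; }
        Builder maxTotalRows(Long v) { this.maxTotalRows = v; return this; }
        Builder maxNumConcurrentSubTasks(Integer v) { this.maxNumConcurrentSubTasks = v; return this; }

        // Unset fields stay null, replacing long runs of positional nulls.
        TuningConfig build() { return new TuningConfig(this); }
    }
}
```

A caller sets only what it needs, for example TuningConfig.builder().maxRowsInMemory(1000).build(), instead of passing a null for every unused constructor argument.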

Release Note

Added an incremental compaction mode that compacts only newly ingested segments while preserving already-compacted data. Set incrementalCompactionUncompactedRatioThreshold in MostFragmentedIntervalFirstPolicy to control when incremental compaction is used, based on the ratio of uncompacted to compacted data. Only supported by compaction supervisors running on the Overlord with the MSQ engine.

Key changed/added classes

  • CompactionMode - Added INCREMENTAL_COMPACTION
  • SegmentUpgradeAction - New task action for metadata upgrades
  • ShardSpec - Added mutation methods
  • CompactionIntervalSpec - Added uncompactedSegments field
  • MostFragmentedIntervalFirstPolicy - Added threshold configuration
  • UserCompactionTaskQueryTuningConfig - Added builder pattern

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label Feb 8, 2026
public void onCompactionStatusComputed(
public void onSkippedCandidate(
CompactionCandidate candidateSegments,
DataSourceCompactionConfig config

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'config' is never used.
Comment on lines +666 to +676
final DataSegment segment = new DataSegment(
"foo",
Intervals.of("2023-01-0" + i + "/2023-01-0" + (i + 1)),
"2023-01-0" + i,
ImmutableMap.of("path", "a-" + i),
ImmutableList.of("dim1"),
ImmutableList.of("m1"),
new DimensionRangeShardSpec(List.of("dim1"), null, null, i - 1, 8),
9,
100
);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking DataSegment.DataSegment should be avoided because it has been deprecated.
@cecemei cecemei changed the title from "Incremental compaction" to "add incremental compaction support" Feb 9, 2026
@cecemei cecemei marked this pull request as ready for review February 9, 2026 03:20
@gianm gianm mentioned this pull request Feb 12, 2026
9 tasks
gianm commented Feb 12, 2026

Wanted to point out that #19016 is aiming at similar goals but is taking a different approach. The big difference is that #19016 only works with non-MSQ compaction and this one only works with MSQ compaction. Longer comment on the other PR here: #19016 (comment)

@FrankChen021
@gianm is there a plan for MSQ-based compaction to replace the old one? Otherwise we have two different mechanisms, which not only confuses users but also increases the effort to maintain the code base.
