Skip to content

Comments

refactor: Extract sort-merge join filter logic into separate module#19614

Merged
viirya merged 1 commit intoapache:mainfrom
viirya:refactor-smj-filter-module
Feb 21, 2026
Merged

refactor: Extract sort-merge join filter logic into separate module#19614
viirya merged 1 commit intoapache:mainfrom
viirya:refactor-smj-filter-module

Conversation

@viirya
Copy link
Member

@viirya viirya commented Jan 2, 2026

Refactored the sort-merge join implementation to improve code organization by extracting all filter-related logic into a dedicated filter.rs module.

Changes:

  • Created new filter.rs module (~576 lines) containing:

    • Filter metadata tracking (FilterMetadata struct)
    • Deferred filtering decision logic (needs_deferred_filtering)
    • Filter mask correction for different join types (get_corrected_filter_mask)
    • Filter application with null-joined row handling (filter_record_batch_by_join_type)
    • Helper functions for filter column extraction and batch filtering
  • Updated stream.rs:

    • Removed ~450 lines of filter-specific code
    • Now delegates to filter module functions
    • Simplified main join logic to focus on stream processing
  • Updated tests.rs:

    • Updated imports to use new filter module
    • Changed test code to use FilterMetadata struct
    • All 47 sort-merge join tests passing

🤖 Generated with Claude Code

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jan 2, 2026
@viirya viirya force-pushed the refactor-smj-filter-module branch 2 times, most recently from 4d6f0dd to 348f56c Compare January 2, 2026 22:17
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @viirya I missed this PR, decomposing that huge piece makes a lot of sense

@viirya
Copy link
Member Author

viirya commented Feb 20, 2026

Thank you @comphead

Refactored the sort-merge join implementation to improve code organization
by extracting all filter-related logic into a dedicated filter.rs module.

Changes:
- Created new filter.rs module (~576 lines) containing:
  - Filter metadata tracking (FilterMetadata struct)
  - Deferred filtering decision logic (needs_deferred_filtering)
  - Filter mask correction for different join types (get_corrected_filter_mask)
  - Filter application with null-joined row handling (filter_record_batch_by_join_type)
  - Helper functions for filter column extraction and batch filtering

- Updated stream.rs:
  - Removed ~450 lines of filter-specific code
  - Now delegates to filter module functions
  - Simplified main join logic to focus on stream processing

- Updated tests.rs:
  - Updated imports to use new filter module
  - Changed test code to use FilterMetadata struct
  - All 47 sort-merge join tests passing

The refactoring maintains all existing functionality with no behavior changes.
Null-joined batch creation for outer joins with different column counts is
handled correctly by:
- Properly extracting and replacing columns based on join type and batch organization
- Using RecordBatchOptions to bypass strict nullable field validation in outer joins

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@viirya viirya force-pushed the refactor-smj-filter-module branch from 348f56c to 16c36e4 Compare February 20, 2026 23:30
@viirya viirya added this pull request to the merge queue Feb 21, 2026
Merged via the queue into apache:main with commit 1736fd2 Feb 21, 2026
32 checks passed
@viirya viirya deleted the refactor-smj-filter-module branch February 21, 2026 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants