perf: Pre-resolve type dispatch in sort-merge join comparators by andygrove · Pull Request #20452 · apache/datafusion

andygrove · 2026-02-20T15:58:36Z

Summary

Replace per-row runtime DataType matching in is_join_arrays_equal() and compare_join_arrays() with a JoinComparator struct that resolves typed comparison function pointers once during SortMergeJoinStream construction
Eliminates the overhead of matching on 20+ DataType variants for every row comparison in the merge loop
The JoinComparator provides two methods:
- compare() — for merge-loop ordering decisions (streamed vs buffered advance)
- is_equal() — for buffered batch key-group expansion

Benchmark Results

Best of 3 iterations across 20 SMJ benchmark queries (cargo run --release -p datafusion-benchmarks --bin dfbench -- smj):

Query	Description	Baseline (ms)	Optimized (ms)	Change
Q1	INNER 100K×100K, 1:1	3.16	2.22	-29.9%
Q2	INNER 100K×1M, 1:10	10.41	10.02	-3.8%
Q3	INNER 1M×1M, 1:100	53.06	55.55	+4.7%
Q4	INNER 100K×1M, 1:10, 1% filter	3.33	3.20	-4.2%
Q5	INNER 1M×1M, 1:100, 10% filter	11.41	11.85	+3.9%
Q6	LEFT 100K×1M, 1:10	10.10	9.97	-1.3%
Q7	LEFT 100K×1M, 1:10, 50% filter	11.75	11.91	+1.4%
Q8	FULL 100K×100K, 1:10	2.53	2.52	-0.4%
Q9	FULL 100K×1M, 1:10, 10% filter	11.32	11.02	-2.7%
Q10	LEFT SEMI 100K×1M, 1:10	4.42	4.35	-1.6%
Q11	LEFT SEMI 100K×1M, 1:10, 1% filter	3.97	3.99	+0.5%
Q12	LEFT SEMI 100K×1M, 1:10, 50% filter	59.28	59.15	-0.2%
Q13	LEFT SEMI 100K×1M, 1:10, 90% filter	4.67	4.50	-3.6%
Q14	LEFT ANTI 100K×1M, 1:10	4.40	4.34	-1.6%
Q15	LEFT ANTI 100K×1M, 1:10, partial	4.42	4.36	-1.4%
Q16	LEFT ANTI 100K×100K, 1:1, stress	2.14	2.21	+3.3%
Q17	INNER 100K×5M, 1:50, 5% filter	8.86	7.75	-12.5%
Q18	LEFT SEMI 100K×5M, 1:50, 2% filter	8.07	8.03	-0.5%
Q19	LEFT ANTI 100K×5M, 1:50, partial	19.52	18.88	-3.3%
Q20	INNER 1M×10M, 1:100 + GROUP BY	533.16	559.54	+4.9%

The biggest wins are on comparison-dominated workloads (Q1: 1:1 join, Q17: filtered 1:50 join). High-cardinality joins (Q3, Q5, Q20) where output construction dominates show no significant change.

Test plan

All 48 sort_merge_join unit tests pass
cargo fmt clean
cargo clippy clean (zero warnings)
Benchmark comparison shows no regressions beyond noise

🤖 Generated with Claude Code

mbutrovich · 2026-02-20T17:21:34Z

Thanks for working on SMJ performance @andygrove! It is somewhat overlooked compared to hash join :)

One suggestion: could JoinComparator be built on top of arrow_ord::ord::make_comparator rather than reimplementing type dispatch?

// already in DataFusion's dependency tree (arrow-ord)                                                                                                                                                                                                                                
pub type DynComparator = Box<dyn Fn(usize, usize) -> Ordering + Send + Sync>;
make_comparator(left: &dyn Array, right: &dyn Array, opts: SortOptions) -> Result<DynComparator>

It already does everything this PR implements, and more:

Resolves type dispatch once at construction time
Bakes SortOptions into the closure, so the hot path is just cmp(i, j)
If neither array has nulls, returns a closure with no null checks at all — the current code and proposed JoinComparator both check nulls on every row
Covers more types than the handwritten match: List, Struct, Map, RunEndEncoded, Dictionary, ByteView with optimized prefix comparison

For compare() it's a direct replacement. For is_equal() (null-equals-null semantics), you'd wrap it:

let cmp = make_comparator(left_arr, right_arr, opts)?;
let is_eq = move |i: usize, j: usize| -> bool {
    match (left_arr.is_null(i), right_arr.is_null(j)) {
        (true, true) => true,
        (true, false) | (false, true) => false,
        _ => cmp(i, j).is_eq(),
    }
};

Side note: compare_join_arrays in joins/utils.rs has the same handwritten match and is also used inpiecewise_merge_join. We could clean that up at the same time, or defer to a later PR.

Replace per-row runtime DataType matching in is_join_arrays_equal() and compare_join_arrays() with a JoinComparator struct that resolves typed comparison function pointers once during SortMergeJoinStream construction. This eliminates the overhead of matching on 20+ DataType variants for every row comparison in the merge loop. The JoinComparator provides: - compare(): for merge-loop ordering decisions - is_equal(): for buffered batch key-group expansion Benchmark results (best of 3 iterations, 20 queries): - Q1 (INNER 100Kx100K 1:1): -29.9% - Q17 (INNER 100Kx5M 1:50 filtered): -12.5% - Most other queries: 1-4% improvement - No significant regressions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace hand-rolled JoinComparator type dispatch (~200 lines of macros and per-type match arms) with Arrow's built-in make_comparator. This handles more types (List, Struct, Map, RunEndEncoded, etc.) and optimizes away null checks when columns have no nulls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add ComparatorCache that tracks array identity via Arc data pointers. Comparators are only rebuilt when the underlying join key arrays change (i.e., when a new batch arrives), not on every compare_streamed_buffered call. This eliminates per-row comparator construction overhead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

andygrove · 2026-02-20T20:12:32Z

results are pretty mixed now. I will close this for now and re-open if I manage to get consistent improvements. Thanks for the review @mbutrovich!

github-actions bot added the physical-plan Changes to the physical-plan crate label Feb 20, 2026

andygrove force-pushed the perf/smj-preresolved-comparator branch 2 times, most recently from 3a51775 to bf9d82c Compare February 20, 2026 16:58

github-actions bot added the core Core DataFusion crate label Feb 20, 2026

andygrove force-pushed the perf/smj-preresolved-comparator branch from bf9d82c to df2bf1c Compare February 20, 2026 17:36

andygrove and others added 2 commits February 20, 2026 11:01

andygrove closed this Feb 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

perf: Pre-resolve type dispatch in sort-merge join comparators#20452

perf: Pre-resolve type dispatch in sort-merge join comparators#20452
andygrove wants to merge 3 commits intoapache:mainfrom
andygrove:perf/smj-preresolved-comparator

andygrove commented Feb 20, 2026

Uh oh!

mbutrovich commented Feb 20, 2026 •

edited

Loading

Uh oh!

andygrove commented Feb 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Conversation

andygrove commented Feb 20, 2026

Summary

Benchmark Results

Test plan

Uh oh!

mbutrovich commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andygrove commented Feb 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mbutrovich commented Feb 20, 2026 •

edited

Loading