Skip to content

Conversation

@Kontinuation
Copy link
Member

This is part of #436 . We defined interface for spatial partitioners and implement several spatial partitioners for different purposes:

  1. KDB: The main partitioner for partitioning the build side
  2. R-tree: partitioning dataset by a given set of partitioning grids
  3. Similar to R-tree, but does not use spatial indexes. This is faster when the number of partitioning grid is small.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements spatial partitioners for efficient spatial join operations. It introduces a pluggable partitioning interface with three implementations: KDB-tree (for data-driven partitioning), R-tree (for indexed lookup of predefined grids), and Flat (for linear scanning when grid counts are small).

Key Changes

  • Defined a SpatialPartitioner trait with support for Regular, Multi, and None partition assignments
  • Implemented KDB-tree partitioner that recursively splits space based on data distribution
  • Implemented R-tree and Flat partitioners for assigning data to predefined partition boundaries

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
rust/sedona-spatial-join/src/partitioning.rs Core trait and enum definitions for spatial partitioning
rust/sedona-spatial-join/src/partitioning/kdb.rs KDB-tree implementation with recursive space partitioning
rust/sedona-spatial-join/src/partitioning/rtree.rs R-tree indexed partitioner for fast boundary lookups
rust/sedona-spatial-join/src/partitioning/flat.rs Linear scan partitioner for small partition counts
rust/sedona-spatial-join/src/partitioning/util.rs Shared utilities for coordinate conversion and geometry operations
rust/sedona-spatial-join/src/lib.rs Module registration for partitioning subsystem
rust/sedona-spatial-join/bench/partitioning/*.rs Benchmark suite for partitioner performance comparison
rust/sedona-spatial-join/Cargo.toml Benchmark target configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All just questions here. Thank you!


impl FlatPartitioner {
/// Create a new flat partitioner from explicit partition boundaries.
pub fn new(boundaries: Vec<BoundingBox>) -> Result<Self> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we passing around a sufficient number of these that it would be faster/use less memory to have a BoundingBox2d?

Comment on lines +20 to +22
//! This module provides a minimal partitioner that shares the same
//! intersection semantics as [`crate::partitioning::rtree::RTreePartitioner`]
//! but avoids the RTree indexing overhead. It stores partition boundaries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you happen to know how many partition boundaries it takes to make the RTree worth it, it might be nice to add it to this comment

Comment on lines +132 to +135
match partitioner.partition(&bbox).unwrap() {
SpatialPartition::Regular(id) => assert_eq!(id, 0),
_ => panic!("expected Regular partition"),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this won't work? (Here and below)

Suggested change
match partitioner.partition(&bbox).unwrap() {
SpatialPartition::Regular(id) => assert_eq!(id, 0),
_ => panic!("expected Regular partition"),
}
assert_eq!(partitioner.partition(&bbox).unwrap(), SpatialPartition::Regular(0));

let rect = bbox_to_geo_rect(bbox)?;

let mut best_leaf_id: Option<u32> = None;
let mut max_area = 0.0_f32;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above you use let mut max_area = -1.0_f32 and just compare area > max_area. Is there a reason it's different here?

Comment on lines +283 to +284
/// Split this node along the specified axis.
/// Returns true if the split was successful, false otherwise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth briefly explaining here why this split would fail (it seems like it is mostly if you have 100% identical rects)

Comment on lines +389 to +401
for item in self.items.drain(..) {
let belongs_to_left = if split_x {
item.min().x <= split_coord
} else {
item.min().y <= split_coord
};

if belongs_to_left {
left_child.insert_rect(item);
} else {
right_child.insert_rect(item);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is better or not:

Suggested change
for item in self.items.drain(..) {
let belongs_to_left = if split_x {
item.min().x <= split_coord
} else {
item.min().y <= split_coord
};
if belongs_to_left {
left_child.insert_rect(item);
} else {
right_child.insert_rect(item);
}
}
if split_x {
for item in self.items.drain(..) {
if item.min().x <= split_coord {
left_child.insert_rect(item);
} else {
right_child.insert_rect(item);
}
}
} else {
for item in self.items.drain(..) {
if item.min().y <= split_coord {
left_child.insert_rect(item);
} else {
right_child.insert_rect(item);
}
}

use super::*;

#[test]
fn test_kdb_insert_and_leaf_assignment() -> Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason these tests return Result instead of using unwrap()? (I tend to like the failures by unwrap() but DataFusion seems to like returning results)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants