-
Notifications
You must be signed in to change notification settings - Fork 36
feat: Add spatial partitioners for partitioning spatially joined data by partitioning grids #443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
52bf81f to
fe59861
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements spatial partitioners for efficient spatial join operations. It introduces a pluggable partitioning interface with three implementations: KDB-tree (for data-driven partitioning), R-tree (for indexed lookup of predefined grids), and Flat (for linear scanning when grid counts are small).
Key Changes
- Defined a
SpatialPartitionertrait with support forRegular,Multi, andNonepartition assignments - Implemented KDB-tree partitioner that recursively splits space based on data distribution
- Implemented R-tree and Flat partitioners for assigning data to predefined partition boundaries
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| rust/sedona-spatial-join/src/partitioning.rs | Core trait and enum definitions for spatial partitioning |
| rust/sedona-spatial-join/src/partitioning/kdb.rs | KDB-tree implementation with recursive space partitioning |
| rust/sedona-spatial-join/src/partitioning/rtree.rs | R-tree indexed partitioner for fast boundary lookups |
| rust/sedona-spatial-join/src/partitioning/flat.rs | Linear scan partitioner for small partition counts |
| rust/sedona-spatial-join/src/partitioning/util.rs | Shared utilities for coordinate conversion and geometry operations |
| rust/sedona-spatial-join/src/lib.rs | Module registration for partitioning subsystem |
| rust/sedona-spatial-join/bench/partitioning/*.rs | Benchmark suite for partitioner performance comparison |
| rust/sedona-spatial-join/Cargo.toml | Benchmark target configuration |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fe59861 to
859dd9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
859dd9d to
27a6c66
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All just questions here. Thank you!
|
|
||
| impl FlatPartitioner { | ||
| /// Create a new flat partitioner from explicit partition boundaries. | ||
| pub fn new(boundaries: Vec<BoundingBox>) -> Result<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we passing around a sufficient number of these that it would be faster/use less memory to have a BoundingBox2d?
| //! This module provides a minimal partitioner that shares the same | ||
| //! intersection semantics as [`crate::partitioning::rtree::RTreePartitioner`] | ||
| //! but avoids the RTree indexing overhead. It stores partition boundaries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you happen to know how many partition boundaries it takes to make the RTree worth it, it might be nice to add it to this comment
| match partitioner.partition(&bbox).unwrap() { | ||
| SpatialPartition::Regular(id) => assert_eq!(id, 0), | ||
| _ => panic!("expected Regular partition"), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this won't work? (Here and below)
| match partitioner.partition(&bbox).unwrap() { | |
| SpatialPartition::Regular(id) => assert_eq!(id, 0), | |
| _ => panic!("expected Regular partition"), | |
| } | |
| assert_eq!(partitioner.partition(&bbox).unwrap(), SpatialPartition::Regular(0)); |
| let rect = bbox_to_geo_rect(bbox)?; | ||
|
|
||
| let mut best_leaf_id: Option<u32> = None; | ||
| let mut max_area = 0.0_f32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above you use let mut max_area = -1.0_f32 and just compare area > max_area. Is there a reason it's different here?
| /// Split this node along the specified axis. | ||
| /// Returns true if the split was successful, false otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth briefly explaining here why this split would fail (it seems like it is mostly if you have 100% identical rects)
| for item in self.items.drain(..) { | ||
| let belongs_to_left = if split_x { | ||
| item.min().x <= split_coord | ||
| } else { | ||
| item.min().y <= split_coord | ||
| }; | ||
|
|
||
| if belongs_to_left { | ||
| left_child.insert_rect(item); | ||
| } else { | ||
| right_child.insert_rect(item); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is better or not:
| for item in self.items.drain(..) { | |
| let belongs_to_left = if split_x { | |
| item.min().x <= split_coord | |
| } else { | |
| item.min().y <= split_coord | |
| }; | |
| if belongs_to_left { | |
| left_child.insert_rect(item); | |
| } else { | |
| right_child.insert_rect(item); | |
| } | |
| } | |
| if split_x { | |
| for item in self.items.drain(..) { | |
| if item.min().x <= split_coord { | |
| left_child.insert_rect(item); | |
| } else { | |
| right_child.insert_rect(item); | |
| } | |
| } | |
| } else { | |
| for item in self.items.drain(..) { | |
| if item.min().y <= split_coord { | |
| left_child.insert_rect(item); | |
| } else { | |
| right_child.insert_rect(item); | |
| } | |
| } |
| use super::*; | ||
|
|
||
| #[test] | ||
| fn test_kdb_insert_and_leaf_assignment() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these tests return Result instead of using unwrap()? (I tend to like the failures by unwrap() but DataFusion seems to like returning results)
This is part of #436 . We defined interface for spatial partitioners and implement several spatial partitioners for different purposes: