[geo] add geospatial statistics and bbox types for parquet #8225
parquet/src/geospatial/statistics.rs
Outdated
/// # Returns
///
/// A new `BoundingBox` instance with the specified coordinates.
pub fn new(xmin: f64, ymin: f64, xmax: f64, ymax: f64, zmin: Option<f64>, zmax: Option<f64>, mmin: Option<f64>, mmax: Option<f64>) -> Self {
Instead of having this very verbose constructor with 8 arguments, I'd suggest having `new()` take only the required 4 values for the 2D box. Then have a `with_zrange` that takes in non-null `zmin` and `zmax`, and a `with_mrange` that takes in non-null `mmin` and `mmax`. This also ensures that it's impossible to define a null `zmin` with a non-null `zmax`.
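A minimal sketch of what this suggestion could look like. The field layout, the accessor names `z_range` and `m_range`, and the derives are assumptions made for illustration; only `new`, `with_zrange`, and `with_mrange` come from the comment above.

```rust
/// Hypothetical builder-style bounding box (layout is an assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct BoundingBox {
    x_range: (f64, f64),
    y_range: (f64, f64),
    // Each optional dimension is stored as a pair, so a min without a
    // matching max cannot be represented.
    z_range: Option<(f64, f64)>,
    m_range: Option<(f64, f64)>,
}

impl BoundingBox {
    /// Creates a 2D box; argument order mirrors the constructor under review.
    pub fn new(xmin: f64, ymin: f64, xmax: f64, ymax: f64) -> Self {
        Self {
            x_range: (xmin, xmax),
            y_range: (ymin, ymax),
            z_range: None,
            m_range: None,
        }
    }

    /// Adds a Z range; both bounds are required.
    pub fn with_zrange(mut self, zmin: f64, zmax: f64) -> Self {
        self.z_range = Some((zmin, zmax));
        self
    }

    /// Adds an M range; both bounds are required.
    pub fn with_mrange(mut self, mmin: f64, mmax: f64) -> Self {
        self.m_range = Some((mmin, mmax));
        self
    }

    /// Hypothetical accessor for the optional Z range.
    pub fn z_range(&self) -> Option<(f64, f64)> {
        self.z_range
    }

    /// Hypothetical accessor for the optional M range.
    pub fn m_range(&self) -> Option<(f64, f64)> {
        self.m_range
    }
}
```

A caller would then chain only the dimensions it has, for example `BoundingBox::new(0.0, 1.0, 10.0, 11.0).with_zrange(-5.0, 5.0)`.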
parquet/src/geospatial/statistics.rs
Outdated
/// - 4: MultiPoint
/// - 5: MultiLineString
/// - 6: MultiPolygon
/// - 7: GeometryCollection
I'd suggest also linking to https://github.com/apache/parquet-format/blob/ae39061f28d7c508a97af58a3c0a567352c8ea41/Geospatial.md#geospatial-types for the full allowed list of type ids.
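For orientation, the type ids follow the WKB integer code convention: the base codes 1 through 7 cover Point through GeometryCollection, and 1000, 2000, or 3000 is added for the XYZ, XYM, and XYZM variants. The helper below is a hypothetical sketch of decoding such a code; `describe_type` is not part of the PR.

```rust
/// Hypothetical helper: splits a WKB-style geometry type code into
/// (geometry name, dimension label). Returns None for unknown codes.
fn describe_type(code: i32) -> Option<(&'static str, &'static str)> {
    // The thousands digit encodes the coordinate dimensions.
    let dims = match code / 1000 {
        0 => "XY",
        1 => "XYZ",
        2 => "XYM",
        3 => "XYZM",
        _ => return None,
    };
    // The remainder is the base geometry type.
    let name = match code % 1000 {
        1 => "Point",
        2 => "LineString",
        3 => "Polygon",
        4 => "MultiPoint",
        5 => "MultiLineString",
        6 => "MultiPolygon",
        7 => "GeometryCollection",
        _ => return None,
    };
    Some((name, dims))
}
```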
parquet/src/geospatial/statistics.rs
Outdated
// ----------------------------------------------------------------------
// Bounding Box

/// Represents a 2D/3D bounding box with optional M-coordinate support.
Maybe best to just copy the full upstream docstring https://github.com/apache/parquet-format/blob/ae39061f28d7c508a97af58a3c0a567352c8ea41/Geospatial.md#bounding-box
parquet/src/geospatial/statistics.rs
Outdated
/// # Returns
///
/// A `GeospatialStatistics` instance with no bounding box or type information.
pub fn new_empty() -> Self {
Maybe cleaner to remove this and just implement `Default`?
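A sketch of the `Default` approach, assuming the struct keeps the two optional fields shown elsewhere in this review. `BoundingBox` is reduced to a placeholder here, and the accessors are hypothetical.

```rust
/// Placeholder standing in for the real bounding box type.
#[derive(Debug, Clone, PartialEq)]
pub struct BoundingBox {
    x_range: (f64, f64),
    y_range: (f64, f64),
}

/// Deriving `Default` sets both optional fields to `None`, which is
/// the same state `new_empty()` was meant to produce.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct GeospatialStatistics {
    bbox: Option<BoundingBox>,
    geospatial_types: Option<Vec<i32>>,
}

impl GeospatialStatistics {
    /// Hypothetical accessor for the optional bounding box.
    pub fn bbox(&self) -> Option<&BoundingBox> {
        self.bbox.as_ref()
    }

    /// Hypothetical accessor for the optional geometry type codes.
    pub fn geospatial_types(&self) -> Option<&Vec<i32>> {
        self.geospatial_types.as_ref()
    }
}
```

Callers would then write `GeospatialStatistics::default()` instead of `GeospatialStatistics::new_empty()`.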
@alamb may have opinions on whether we should create a new feature flag for this?
Thanks @kylebarron for the feedback - made some changes
parquet/src/geospatial/statistics.rs
Outdated
if bbox.zmin.is_some() && bbox.zmax.is_some() {
    new_bbox = new_bbox.with_zrange(bbox.zmin.unwrap().into(), bbox.zmax.unwrap().into());
} else if bbox.zmin.is_some() != bbox.zmax.is_some() {
    return Err(ParquetError::General(format!("Z-coordinate values mismatch: {:?} and {:?}", bbox.zmin, bbox.zmax)));
}
Up to you, but it may be slightly easier to read if you `match (bbox.zmin, bbox.zmax)` instead of having these two `if` clauses.
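Roughly what the `match` version could look like, as a standalone sketch. The free function `z_range` and the `String` error are stand-ins chosen for illustration; the PR itself uses `ParquetError::General` and updates `new_bbox` in place.

```rust
/// Hypothetical helper: validates an optional Z range with a single match.
/// A `String` error stands in for `ParquetError::General` here.
fn z_range(zmin: Option<f64>, zmax: Option<f64>) -> Result<Option<(f64, f64)>, String> {
    match (zmin, zmax) {
        // Both bounds present: a usable range.
        (Some(min), Some(max)) => Ok(Some((min, max))),
        // Neither bound present: no Z dimension.
        (None, None) => Ok(None),
        // Exactly one bound present: reject, as the original `else if` does.
        (min, max) => Err(format!(
            "Z-coordinate values mismatch: {:?} and {:?}",
            min, max
        )),
    }
}
```

The three arms cover every combination, so the compiler checks exhaustiveness and the `unwrap()` calls disappear.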
bbox: Option<BoundingBox>,
/// Optional list of geospatial geometry type identifiers
/// as specified in https://github.com/apache/parquet-format/blob/ae39061f28d7c508a97af58a3c0a567352c8ea41/Geospatial.md#geospatial-types
geospatial_types: Option<Vec<i32>>,
Should these be `u32` instead of `i32`?
https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary
You could make it `u16` if you wanted.
Kept it as `i32` to stay in sync with the Thrift definition and the C++ implementation, but `u16` makes more sense given the possible values.
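If the narrower type were adopted later, the Thrift-side `i32` values could be checked at the boundary. The sketch below is an assumption about how that might look; `to_type_codes` is a hypothetical name, and a `String` error stands in for the crate's error type.

```rust
use std::convert::TryFrom;

/// Hypothetical conversion from Thrift-style `i32` codes to `u16`,
/// rejecting negative or oversized values instead of truncating them.
fn to_type_codes(raw: &[i32]) -> Result<Vec<u16>, String> {
    raw.iter()
        .map(|&code| {
            u16::try_from(code)
                .map_err(|_| format!("geometry type code out of range: {}", code))
        })
        .collect()
}
```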
Which issue does this PR close?
GeospatialStatistics.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Not as much as I would like at the moment. This is a draft to get early feedback. I will add more thorough tests in subsequent changes.
Are there any user-facing changes?
If there are user-facing changes then we may require documentation to be updated before approving the PR.
If there are any breaking changes to public APIs, please call them out.