@kaushiksrini kaushiksrini commented Aug 26, 2025

Which issue does this PR close?

Rationale for this change

  • This is part of a draft to support geospatial types (geometry and geography) in Parquet. This has been

What changes are included in this PR?

  • Structs for supporting geospatial statistics information (bbox and geospatial types) derived from thrift classes.
  • Would appreciate feedback on structure and where certain parts should go.

Are these changes tested?

Not as much as I would like at the moment. This is a draft to get early feedback. I will add more thorough tests in subsequent changes.

Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

@kaushiksrini kaushiksrini marked this pull request as draft August 26, 2025 01:58
@github-actions github-actions bot added the parquet Changes to the parquet crate label Aug 26, 2025
/// # Returns
///
/// A new `BoundingBox` instance with the specified coordinates.
pub fn new(xmin: f64, ymin: f64, xmax: f64, ymax: f64, zmin: Option<f64>, zmax: Option<f64>, mmin: Option<f64>, mmax: Option<f64>) -> Self {
Contributor

Instead of having this very verbose constructor with 8 arguments, I'd suggest having new() take only the required 4 values for the 2D box. Then have a with_zrange that takes in non-null zmin and zmax and have a with_mrange that takes in non-null mmin and mmax. This also ensures that it's impossible to define a null zmin with non-null zmax.
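To make the suggestion concrete, here is a minimal sketch of that builder-style API. The field layout (`z_range`, `m_range` as tuples) and the accessor names are assumptions for illustration, not the code in this PR; only `new`, `with_zrange`, and `with_mrange` come from the comment above.

```rust
/// Hypothetical sketch of the suggested API. The field layout is an
/// assumption; the PR's struct may store these values differently.
#[derive(Debug, Clone, PartialEq)]
pub struct BoundingBox {
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
    // Each optional dimension is stored as one pair, so a lone min or
    // lone max cannot be represented.
    z_range: Option<(f64, f64)>,
    m_range: Option<(f64, f64)>,
}

impl BoundingBox {
    /// Creates a 2D box from the four required values.
    pub fn new(xmin: f64, ymin: f64, xmax: f64, ymax: f64) -> Self {
        Self {
            xmin,
            ymin,
            xmax,
            ymax,
            z_range: None,
            m_range: None,
        }
    }

    /// Adds a Z range; both bounds are required.
    pub fn with_zrange(mut self, zmin: f64, zmax: f64) -> Self {
        self.z_range = Some((zmin, zmax));
        self
    }

    /// Adds an M range; both bounds are required.
    pub fn with_mrange(mut self, mmin: f64, mmax: f64) -> Self {
        self.m_range = Some((mmin, mmax));
        self
    }

    /// Hypothetical accessor for the Z range, if one was set.
    pub fn z_range(&self) -> Option<(f64, f64)> {
        self.z_range
    }

    /// Hypothetical accessor for the M range, if one was set.
    pub fn m_range(&self) -> Option<(f64, f64)> {
        self.m_range
    }
}
```

A caller would then write `BoundingBox::new(0.0, 0.0, 10.0, 5.0).with_zrange(-1.0, 1.0)`, and a half-specified Z or M range is unrepresentable by construction.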

/// - 4: MultiPoint
/// - 5: MultiLineString
/// - 6: MultiPolygon
/// - 7: GeometryCollection
Contributor

// ----------------------------------------------------------------------
// Bounding Box

/// Represents a 2D/3D bounding box with optional M-coordinate support.
Contributor

/// # Returns
///
/// A `GeospatialStatistics` instance with no bounding box or type information.
pub fn new_empty() -> Self {
Contributor

Maybe cleaner to remove this and just implement Default?
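A rough sketch of what that could look like, assuming the struct keeps the two optional fields shown elsewhere in this diff. The `BoundingBox` here is a simplified stand-in and the accessor names are hypothetical.

```rust
/// Simplified stand-in for the PR's bounding box type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct BoundingBox {
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
}

/// Deriving `Default` sets every `Option` field to `None`, which matches
/// what `new_empty()` is documented to return.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct GeospatialStatistics {
    bbox: Option<BoundingBox>,
    geospatial_types: Option<Vec<i32>>,
}

impl GeospatialStatistics {
    /// Hypothetical accessor for the optional bounding box.
    pub fn bbox(&self) -> Option<&BoundingBox> {
        self.bbox.as_ref()
    }

    /// Hypothetical accessor for the optional list of type codes.
    pub fn geospatial_types(&self) -> Option<&[i32]> {
        self.geospatial_types.as_deref()
    }
}
```

`BoundingBox` itself does not need to implement `Default` for this to compile, because `Option<T>` is `Default` for any `T`.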

@kylebarron
Contributor

@alamb may have opinions on whether we should create a new feature flag for this?

@kaushiksrini
Author

Thanks @kylebarron for the feedback - made some changes

Comment on lines 207 to 211
if bbox.zmin.is_some() && bbox.zmax.is_some() {
    new_bbox = new_bbox.with_zrange(bbox.zmin.unwrap().into(), bbox.zmax.unwrap().into());
} else if bbox.zmin.is_some() != bbox.zmax.is_some() {
    return Err(ParquetError::General(format!("Z-coordinate values mismatch: {:?} and {:?}", bbox.zmin, bbox.zmax)));
}
Contributor

Up to you but maybe slightly easier to read if you match (bbox.zmin, bbox.zmax) instead of having these two if clauses.
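For illustration, the `match` version might look roughly like this. It is a standalone sketch: `paired_range` is a hypothetical helper, plain `f64` stands in for the Thrift-generated value type, and a `String` error stands in for `ParquetError::General`.

```rust
/// Hypothetical helper: accepts a min/max pair only when both values are
/// present or both are absent. `String` stands in for the crate's error type.
pub fn paired_range(
    min: Option<f64>,
    max: Option<f64>,
) -> Result<Option<(f64, f64)>, String> {
    match (min, max) {
        // Both bounds present: a usable range.
        (Some(lo), Some(hi)) => Ok(Some((lo, hi))),
        // Neither bound present: this dimension is simply not set.
        (None, None) => Ok(None),
        // Exactly one bound present: reject as inconsistent.
        (lo, hi) => Err(format!(
            "Z-coordinate values mismatch: {:?} and {:?}",
            lo, hi
        )),
    }
}
```

Besides reading more directly, this form removes the `unwrap()` calls, and the compiler checks that all four combinations are handled.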

bbox: Option<BoundingBox>,
/// Optional list of geospatial geometry type identifiers
/// as specified in https://github.com/apache/parquet-format/blob/ae39061f28d7c508a97af58a3c0a567352c8ea41/Geospatial.md#geospatial-types
geospatial_types: Option<Vec<i32>>,

Contributor

You could make it u16 if you wanted.

Author

Kept it as i32 to stay in sync with the Thrift definition and the C++ implementation, but u16 makes more sense given the possible values.
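If the public type were ever narrowed to u16, one option would be to convert at the Thrift boundary and keep i32 on the wire. A minimal sketch under that assumption follows; `to_u16_codes` is a hypothetical helper, not part of this PR, and `String` stands in for the crate's error type.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: converts type codes read from Thrift (i32) to u16,
/// failing on any value that does not fit instead of truncating it.
pub fn to_u16_codes(raw: &[i32]) -> Result<Vec<u16>, String> {
    raw.iter()
        .map(|&code| {
            u16::try_from(code)
                .map_err(|_| format!("geospatial type code out of range: {}", code))
        })
        .collect()
}
```

Collecting into a `Result` stops at the first code that does not fit, so negative or oversized values surface as an error rather than being silently wrapped.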

@kaushiksrini kaushiksrini marked this pull request as ready for review August 27, 2025 02:34
@kaushiksrini kaushiksrini changed the title from "[draft] add geospatial statistics and bbox types for parquet" to "[geo] add geospatial statistics and bbox types for parquet" Aug 27, 2025
@kaushiksrini kaushiksrini marked this pull request as draft August 27, 2025 02:34