dain commented Jan 8, 2026

Description

Migrate the geospatial plugin from ESRI geometry-api-java to JTS (Java Topology Suite) as the core geometry library. JTS is more widely used, better maintained, and provides the foundation for upcoming Iceberg geometry type support.

Key changes:

  • Use JTS Geometry as the native stack type instead of serialized bytes
  • Replace custom geometry serialization with standard EWKB format
  • Convert all geometry functions (ST_*, Bing tiles, aggregations, spatial joins) to JTS
  • Simplify Hadoop ESRI JSON reader and PostgreSQL connector geometry handling

Additional context and related issues

JTS is the de facto standard geometry library in the Java ecosystem: it is used by GeoTools and Apache Sedona, and PostGIS builds on GEOS, the C++ port of JTS. This change aligns Trino's geospatial implementation with the broader ecosystem and enables future improvements like Iceberg geometry support.

Behavioral differences:

  • ST_Union: Point-on-line union no longer inserts vertices at intersection points
  • ST_Union: Empty inputs return empty geometry collection instead of null
  • ST_Buffer: Uses 8 quadrant segments (the PostGIS/GEOS default), so output coordinates differ slightly
  • WKT parsing is stricter and rejects invalid syntax that ESRI silently accepted (e.g., an empty multipolygon must be written MULTIPOLYGON EMPTY, not MULTIPOLYGON(EMPTY))
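
To illustrate why the quadrant segment count affects ST_Buffer output, here is a minimal stdlib-only sketch of approximating a point buffer with a fixed number of segments per quarter circle. `BufferSketch` and `circleRing` are hypothetical names for illustration; this is not the JTS buffer algorithm, and the vertex order and starting point JTS produces may differ.

```java
final class BufferSketch {
    /**
     * Approximates a circle around (cx, cy) as a closed ring.
     * Each quarter circle is split into quadrantSegments straight segments,
     * so the ring has 4 * quadrantSegments distinct vertices plus the closing vertex.
     */
    static double[][] circleRing(double cx, double cy, double radius, int quadrantSegments) {
        int segments = 4 * quadrantSegments;
        double[][] ring = new double[segments + 1][];
        for (int i = 0; i < segments; i++) {
            double angle = 2 * Math.PI * i / segments;
            ring[i] = new double[] {cx + radius * Math.cos(angle), cy + radius * Math.sin(angle)};
        }
        // Repeat the first vertex so the ring is closed
        ring[segments] = ring[0].clone();
        return ring;
    }
}
```

With 8 quadrant segments the ring has 33 coordinates (32 segments plus the closing vertex). Any other segment count places the vertices at different angles, which is enough to change the output coordinates.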

Release notes

(x) Release notes are required, with the following suggested text:

## Geospatial
  * Replace ESRI geometry library with JTS for improved ecosystem compatibility. ({issue}`issuenumber`)
  * WKT parsing is now stricter per OGC standards and rejects previously accepted invalid syntax. ({issue}`issuenumber`)
  * `ST_Union` edge case changes: empty inputs return empty geometry collection instead of null, and point-on-line unions no longer insert vertices at intersection points. ({issue}`issuenumber`)

Fix test data that was accepted by ESRI but is rejected by JTS, which
strictly enforces the OGC Simple Features Specification:

- Close polygon rings (first point must equal last point)
- Fix single-point LINESTRING to have two points (minimum required)
- Fix MULTILINESTRING EMPTY syntax (remove extra parentheses)
- Replace invalid MULTIPOLYGON with overlapping polygons using ST_Union
- Replace degenerate polygons in GEOMETRYCOLLECTION with valid geometries
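
The first two rules in the list above can be sketched without any geometry library. The class and method names here (`WktFixups`, `isClosed`, `closeRing`, `isValidLineString`) are hypothetical and only illustrate the checks; they are not part of JTS or of this PR.

```java
import java.util.Arrays;

final class WktFixups {
    /** Returns true when the ring's first and last points are identical. */
    static boolean isClosed(double[][] ring) {
        return ring.length >= 2 && Arrays.equals(ring[0], ring[ring.length - 1]);
    }

    /** Returns the ring unchanged if already closed, otherwise a copy with the first point appended. */
    static double[][] closeRing(double[][] ring) {
        if (ring.length == 0 || isClosed(ring)) {
            return ring;
        }
        double[][] closed = Arrays.copyOf(ring, ring.length + 1);
        closed[ring.length] = ring[0].clone();
        return closed;
    }

    /** A LINESTRING is either empty or has at least two points. */
    static boolean isValidLineString(double[][] points) {
        return points.length == 0 || points.length >= 2;
    }
}
```

For example, the open ring `(0 0, 1 0, 1 1, 0 1)` becomes `(0 0, 1 0, 1 1, 0 1, 0 0)` after `closeRing`, and a one-point LINESTRING fails `isValidLineString`.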

cla-bot bot added the cla-signed label Jan 8, 2026
github-actions bot added the postgresql (PostgreSQL connector) label Jan 8, 2026
dain force-pushed the user/dain/geo-jts branch from b4e8c12 to aad9700, January 9, 2026 01:13
github-actions bot added the hive (Hive connector) label Jan 9, 2026
dain force-pushed the user/dain/geo-jts branch 2 times, most recently from 143d53e to f81439b, January 9, 2026 07:41
dain added 7 commits January 9, 2026 00:56
Add test assertion helpers that use ST_Equals for geometric comparison
instead of WKT string comparison. This makes tests insensitive to
vertex ordering and ring starting position differences between
geometry libraries.

Also includes minor WKT compliance fixes that must be done during this
change: close polygon rings and fix invalid MULTIPOLYGON using ST_Union.
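
To show why string comparison is brittle here, the following stdlib-only sketch checks whether two closed rings describe the same loop regardless of starting vertex and direction. It is a deliberately simplified stand-in: ST_Equals performs full topological equality, which covers more cases than this. `RingEquivalence` and `sameRing` are hypothetical names.

```java
final class RingEquivalence {
    /**
     * Compares two closed rings (first point repeated as last point).
     * Returns true if b is a rotation of a, in either direction.
     */
    static boolean sameRing(double[][] a, double[][] b) {
        int n = a.length - 1; // number of distinct vertices, ignoring the closing vertex
        if (n < 1 || n != b.length - 1) {
            return false;
        }
        for (int shift = 0; shift < n; shift++) {
            if (matches(a, b, n, shift, 1) || matches(a, b, n, shift, -1)) {
                return true;
            }
        }
        return false;
    }

    /** Checks a[i] == b[shift + step * i] for every distinct vertex, wrapping around the ring. */
    private static boolean matches(double[][] a, double[][] b, int n, int shift, int step) {
        for (int i = 0; i < n; i++) {
            int j = Math.floorMod(shift + step * i, n);
            if (a[i][0] != b[j][0] || a[i][1] != b[j][1]) {
                return false;
            }
        }
        return true;
    }
}
```

The rings `(0 0, 1 0, 1 1, 0 1, 0 0)` and `(1 1, 1 0, 0 0, 0 1, 1 1)` have different WKT strings but compare equal under `sameRing`.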

Migrate simple geometry functions to use JTS library.

Test updates for behavior differences:
- ST_Boundary returns LINESTRING instead of MULTILINESTRING for simple polygons
- ST_Buffer with infinity returns POLYGON EMPTY instead of MULTIPOLYGON EMPTY
- Minor floating-point precision differences in some calculations

Migrate ST_NumPoints and related accessor functions to JTS.

Test updates for behavior differences:
- ST_NumPoints now counts closing vertices in polygons per OGC standard
- Ring vertex ordering may differ cosmetically (same geometry)

Add JTS-compatible overloads for geometry utility methods to support
incremental migration from ESRI to JTS. The ESRI versions remain for
existing callers until they are converted.

dain force-pushed the user/dain/geo-jts branch 2 times, most recently from 7a7b73e to 9d4f56e, January 9, 2026 17:54
dain added 6 commits January 9, 2026 12:11
Rewrite stUnion to use JTS UnaryUnionOp instead of ESRI cursors.

Behavior differences:
- Point-on-line union does not insert vertices
- Empty inputs return empty geometry collection instead of null

- Migrate spatial join operator to JTS for intersection and
  containment tests
- Switch GeoFunctions envelope operations to use JTS Envelope
  (deserializeEnvelope, ST_XMin/XMax/YMin/YMax, ST_IsEmpty)

Use Extended Well-Known Binary (EWKB) format for geometry serialization.
EWKB is the standard used by PostGIS and retains the SRID (Spatial
Reference System Identifier) for coordinate system information.
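
For readers unfamiliar with the format, here is a minimal stdlib-only sketch of how a 2D point with an SRID is laid out in EWKB. It hand-encodes the bytes for illustration and is not the serializer used in this PR; `EwkbPointSketch`, `encodePoint`, and `toHex` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EwkbPointSketch {
    private static final int WKB_POINT = 1;               // WKB geometry type code for Point
    private static final int EWKB_SRID_FLAG = 0x20000000; // EWKB flag: an SRID follows the type word

    /** Encodes a 2D point as little-endian EWKB: byte order, type with SRID flag, SRID, x, y. */
    static byte[] encodePoint(double x, double y, int srid) {
        ByteBuffer buffer = ByteBuffer.allocate(1 + 4 + 4 + 8 + 8).order(ByteOrder.LITTLE_ENDIAN);
        buffer.put((byte) 1); // 1 = little-endian (NDR)
        buffer.putInt(WKB_POINT | EWKB_SRID_FLAG);
        buffer.putInt(srid);
        buffer.putDouble(x);
        buffer.putDouble(y);
        return buffer.array();
    }

    /** Renders bytes as uppercase hex, the form PostGIS prints for geometry values. */
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }
}
```

`EwkbPointSketch.toHex(EwkbPointSketch.encodePoint(1, 2, 4326))` yields `0101000020E6100000000000000000F03F0000000000000040`, where `E6100000` is SRID 4326 in little-endian order. Plain WKB omits the flag and the SRID word, which is why it cannot carry coordinate system information.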

Note: TestEsriTable's expected values file was converted from Trino's
old internal binary format to WKT. This change cannot be separated
into an earlier commit because the old format's deserializer was
deleted in the EWKB commit, and circular Maven dependencies prevent
adding geospatial as a test dependency to trino-hive.

Change the internal representation of geometry values to use JTS
Geometry objects directly, avoiding unnecessary serialization cycles
between function calls.

dain force-pushed the user/dain/geo-jts branch from 9d4f56e to 0de9eeb, January 9, 2026 20:18