Support for GeoParquet #79
-
Hi team, great work on this, its a really promising solution to improving the performance of some of our big queries. I'm trying to figure out the best workflow to incorporate Spatial when regular parquet/delta/etc files don't support geo types? The area i'm having difficulty is marrying the two. We've got 200gb+ of data is in PostgreSQL using PostGIS and getting it into duckdb means writing the data out of PostgreSQL into a format that can then be read back in to memory. Our experiments so far have shown we can write a 15gb 45M row table to standard partitioned parquet with an 8x memory saving (low cardinality data). That suddenly makes in-memory processing much more achievable, but getting the data into duckdb has been the difficulty. Could be that I've overlooked something, but at the moment I'm struggling with getting the geo data with the geom types preserved into duckdb under the conditions in which ducdb shines, where the data is small enough to read into memory. I'd love to hear your thoughts and maybe get some suggestions on getting our existing setup into a more workable state. |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 3 replies
-
Hi! Since you mentioned postgis, you could also try to use the duckdb postgres scanner to scan postgres tables directly, using the ST_GeomAsWKB function on the postgis side and then ST_GeomFromWKB on the duckdb side to convert the geometry columns, skipping the parquet step entirely. |
Beta Was this translation helpful? Give feedback.
Hi!
Improving the geo support in our parquet reader/writer is something were looking into, but for now you should be able to work around it. Since geoparquet stores geometry as "WKB" in binary columns with some special metadata (which we dont read yet), you can just convert the blob data to duckdb geometry using the ST_GeomFromWKB(BLOB) function.
Since you mentioned postgis, you could also try to use the duckdb postgres scanner to scan postgres tables directly, using the ST_GeomAsWKB function on the postgis side and then ST_GeomFromWKB on the duckdb side to convert the geometry columns, skipping the parquet step entirely.