Hi! I was curious what the most performant way to write GeoParquet from DuckDB is. I've been surprised by how fast it is to read (Geo)Parquet, while writing it back out is quite a bit slower. For example, I've staged
which is amazing! If I immediately export it back to Parquet, I get:
If I add
I'm very new to Parquet and DuckDB, so I'm not sure what to expect for write performance!
Hello @youngpm, that export time is really slow for a file that size. Try removing the … option. Parquet is the default export type, and I think it will use Snappy compression, which is fine for testing.
Hey @mtravis, I gave that a go:
So a 3x improvement, not bad! It's still about 100x slower than reading into DuckDB, though. Interestingly, the progress bar reaches 99% in the first few seconds but then bogs down (this was true of my original post too); I'm not sure what that's telling me. @Maxxen, do you think this is worth filing an issue for? I'm just not sure whether it's a bug or expected.
@Maxxen An update! It looks like setting
compared to using the default
I don't think this is a bug. Parquet is a heavily read-optimized format, so I think it's expected that writes are much slower, depending on the column types and compression codecs used. The GeoParquet code path only adds a small amount of extra processing to compute the geo-specific metadata and statistics required by the GeoParquet specification, so I'd only be inclined to investigate write speed if writing GeoParquet turns out to be significantly slower than writing normal Parquet.