This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Optimization: Where to parse Parquet to Arrow #23

Open
lanlou1554 opened this issue Apr 3, 2024 · 0 comments

Comments

@lanlou1554
Collaborator

Currently we parse Parquet on the client side, so on every request, even a cache hit, the client has to re-parse the Parquet file.
How about parsing on the server side instead? That would let the server cache the Arrow result, and it would save one disk I/O, since the client would no longer need to write the file to disk.
But it is a trade-off: the server needs more cache space, and more data has to be transmitted over the network.
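
To make the trade-off concrete, here is a rough toy sketch, not our actual code. It assumes zlib-compressed bytes as a stand-in for a Parquet file and the decompressed bytes as a stand-in for the Arrow result; the class names `RawCacheServer` and `DecodedCacheServer` and the `fetch` callback are hypothetical and only for illustration.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Stats:
    decodes: int = 0      # how many times the file was decoded
    bytes_sent: int = 0   # bytes that crossed the network to the client


class RawCacheServer:
    """Current design (assumed): cache the encoded file, client decodes."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}
        self.stats = Stats()

    def read(self, key: str) -> bytes:
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
        encoded = self._cache[key]
        self.stats.bytes_sent += len(encoded)
        # The client decodes on every read, even on a cache hit.
        self.stats.decodes += 1
        return zlib.decompress(encoded)

    def cached_bytes(self) -> int:
        return sum(len(v) for v in self._cache.values())


class DecodedCacheServer:
    """Proposed design (assumed): decode on the server, cache the result."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}
        self.stats = Stats()

    def read(self, key: str) -> bytes:
        if key not in self._cache:
            # Decode only once, on a cache miss.
            self.stats.decodes += 1
            self._cache[key] = zlib.decompress(self._fetch(key))
        decoded = self._cache[key]
        self.stats.bytes_sent += len(decoded)
        return decoded

    def cached_bytes(self) -> int:
        return sum(len(v) for v in self._cache.values())


# Hypothetical object store holding one highly compressible "file".
store = {"a": zlib.compress(b"x" * 10_000)}
raw = RawCacheServer(store.__getitem__)
decoded = DecodedCacheServer(store.__getitem__)
for _ in range(3):
    assert raw.read("a") == decoded.read("a")
```

After three reads of the same key, the first design has decoded three times and the second only once, but the second holds more bytes in its cache and has sent more bytes to the client. How large that gap is for real Parquet and Arrow data depends on the encoding and compression of the files, so it would need to be measured.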

@lanlou1554 lanlou1554 changed the title Parse Parquet to Arrow Optimization: Where to parse Parquet to Arrow Apr 3, 2024