
Evaluate whether we should use Ibis for performing table operations #3

Closed
3 of 4 tasks
daniel-thom opened this issue Oct 3, 2024 · 2 comments

Comments

@daniel-thom
Collaborator

daniel-thom commented Oct 3, 2024

Our current plan is to use sqlalchemy to support multiple SQL engines, either by writing SQL strings for operations or by using its expression API. We would have to write custom SQL strings for non-standard operations such as reading Parquet and CSV files, pivoting and unpivoting, and time zone conversion.

Ibis may solve some of these problems.

  • read_csv
  • read_parquet
  • pivot / unpivot
  • time zone conversion
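For context on the custom SQL strings mentioned above, here is a minimal sketch of what the file-reading case might look like without Ibis. `read_file_sql` and `_READERS` are hypothetical names, and only DuckDB and Spark SQL are covered. The templates use DuckDB's `read_parquet`/`read_csv` table functions and Spark SQL's ``parquet.`path` `` direct-file query syntax. Real use would also need format options (such as CSV header handling) and stricter path validation.

```python
from pathlib import PurePath

# Hypothetical lookup table: (dialect, file suffix) -> SQL template.
_READERS = {
    # DuckDB exposes table functions that read files directly.
    ("duckdb", ".parquet"): "SELECT * FROM read_parquet('{path}')",
    ("duckdb", ".csv"): "SELECT * FROM read_csv('{path}')",
    # Spark SQL can query a file in place with the format.`path` syntax.
    ("spark", ".parquet"): "SELECT * FROM parquet.`{path}`",
    ("spark", ".csv"): "SELECT * FROM csv.`{path}`",
}


def read_file_sql(dialect: str, path: str) -> str:
    """Return a SELECT statement that reads `path` on the given backend."""
    suffix = PurePath(path).suffix.lower()
    try:
        template = _READERS[(dialect, suffix)]
    except KeyError:
        raise ValueError(f"no reader for {suffix!r} files on {dialect!r}") from None
    # Refuse characters that would break out of the quoted path in the templates.
    if "'" in path or "`" in path:
        raise ValueError(f"unsupported character in path: {path!r}")
    return template.format(path=path)


print(read_file_sql("duckdb", "data/load.parquet"))  # SELECT * FROM read_parquet('data/load.parquet')
print(read_file_sql("spark", "data/load.csv"))  # SELECT * FROM csv.`data/load.csv`
```

Each new engine or file format becomes one more dictionary entry, which is manageable as long as the set of non-standard operations stays small.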
@lixiliu
Collaborator

lixiliu commented Oct 8, 2024

As discussed,

Pros: Ibis can easily switch between backends, which makes it easy to scale up from DuckDB to Spark as needed. It also lets us use a dataframe API for data transformations (pivot/unpivot) and SQL for aggregations.

Cons: We would need to learn a new API, and the API is not attractive or intuitive. Ibis does not support time zone conversion.

Our non-standard operations and data transformations are limited to a few types, which we can support without Ibis. The cons outweigh the pros.
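As a rough illustration that a transformation like unpivot can be handled without Ibis, this sketch builds it in portable SQL (one SELECT per value column, combined with UNION ALL) and runs it on Python's built-in `sqlite3` as a stand-in engine. `unpivot_sql` and the `load` table are hypothetical. Engines with native syntax, such as DuckDB's `UNPIVOT` statement, could use that instead.

```python
import sqlite3


def unpivot_sql(table, id_cols, value_cols, name_col="variable", value_col="value"):
    """Build a portable unpivot: one SELECT per value column, joined by UNION ALL.

    Identifiers are interpolated directly, so they must come from trusted code,
    not from user input.
    """
    ids = ", ".join(id_cols)
    selects = [
        f"SELECT {ids}, '{col}' AS {name_col}, {col} AS {value_col} FROM {table}"
        for col in value_cols
    ]
    return "\nUNION ALL\n".join(selects)


# sqlite3 (standard library) stands in for the real engine in this sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE load (id INTEGER, h1 REAL, h2 REAL)")
con.executemany("INSERT INTO load VALUES (?, ?, ?)", [(1, 0.5, 0.7), (2, 1.0, 1.2)])

query = unpivot_sql("load", ["id"], ["h1", "h2"], name_col="hour")
rows = con.execute(query + " ORDER BY id, hour").fetchall()
print(rows)  # [(1, 'h1', 0.5), (1, 'h2', 0.7), (2, 'h1', 1.0), (2, 'h2', 1.2)]
```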

@daniel-thom
Collaborator Author

Unless I missed something, Ibis does not support a workflow where changes to a database can be rolled back on error. We need this functionality, which sqlalchemy provides.
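In sqlalchemy, the commit-or-rollback workflow is typically written as `with engine.begin() as conn: ...`, which commits when the block succeeds and rolls back if it raises. To stay dependency-free, this sketch shows the same behavior with a standard-library `sqlite3` connection used as a context manager; the `load` table is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE load (id INTEGER PRIMARY KEY, value REAL)")
con.execute("INSERT INTO load VALUES (1, 0.5)")
con.commit()

try:
    # As a context manager, a sqlite3 connection commits if the block succeeds
    # and rolls back if it raises. It does not close the connection.
    with con:
        con.execute("UPDATE load SET value = 99.0 WHERE id = 1")
        con.execute("INSERT INTO load VALUES (1, 2.0)")  # duplicate key -> IntegrityError
except sqlite3.IntegrityError:
    pass

# The UPDATE was rolled back along with the failed INSERT.
print(con.execute("SELECT value FROM load").fetchall())  # [(0.5,)]
```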
