This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Questions around Iceberg-rust #450
Comments
The catalog manages the bucket on its own. For example, to start a demo REST catalog, you can use: iceberg-rust/crates/catalog/rest/testdata/rest_catalog/docker-compose.yaml (lines 21 to 37 in 3a947fa)
Both pyiceberg and iceberg-rust implement the Iceberg specification. You can write an Iceberg table with iceberg-rust and then read it using pyiceberg.
iceberg-rust has not yet fully implemented support for DataFusion. Therefore, users will need to utilize the Rust APIs we provide to manipulate tables.
Iceberg-rust does not directly handle authentication or authorization. Typically, these functions are managed by catalog implementations. The community is moving to deprecate the token endpoint and adopt the OAuth2 specification (not decided yet).
There should be no difference in iceberg-rust as long as it adheres to the iceberg-rest catalog specifications.
OpenDAL serves as the core file IO for iceberg-rust and does not directly interact with the catalog. However, it's possible to implement an Iceberg catalog based on OpenDAL.
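To make the first answer above more concrete: per the Iceberg spec, a catalog's core job is to track, for each table, the location of its current metadata file, and to swap that pointer atomically on commit. The data and metadata files themselves simply live under the table's location in the bucket. The following is a conceptual sketch only, using the Rust standard library. `ToyCatalog`, its methods, and the `s3://` paths used with it are hypothetical and are not part of the iceberg-rust API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a catalog: maps a table identifier to the
/// location of that table's current metadata file in object storage.
#[derive(Default)]
pub struct ToyCatalog {
    tables: HashMap<String, String>,
}

impl ToyCatalog {
    /// Register a new table by recording where its first metadata file lives.
    pub fn create_table(&mut self, ident: &str, metadata_location: &str) -> Result<(), String> {
        if self.tables.contains_key(ident) {
            return Err(format!("table {ident} already exists"));
        }
        self.tables
            .insert(ident.to_string(), metadata_location.to_string());
        Ok(())
    }

    /// Readers ask the catalog where the current metadata file is, then
    /// fetch that file (and the manifests and data files it references)
    /// directly from storage.
    pub fn current_metadata(&self, ident: &str) -> Option<&str> {
        self.tables.get(ident).map(String::as_str)
    }

    /// A commit swaps the pointer only if the writer started from the
    /// metadata file the catalog currently points at (compare-and-swap).
    pub fn commit(&mut self, ident: &str, expected: &str, new_location: &str) -> Result<(), String> {
        let current = self
            .tables
            .get_mut(ident)
            .ok_or_else(|| format!("table {ident} not found"))?;
        if current.as_str() != expected {
            return Err(format!("stale commit: catalog points at {current}"));
        }
        *current = new_location.to_string();
        Ok(())
    }
}
```

In this picture, a writer that committed against an outdated metadata file gets an error and has to retry on top of the latest one. A real catalog (REST, Hive Metastore, Glue) plays the same role, with durable storage and its own configuration for which bucket or warehouse path tables are created under.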
It depends on which catalog you are using. For the hms/glue catalogs, which could be classified as client-side catalogs, you need to set up a Hive Metastore or Glue server, and pass
Currently there is no relationship between these two libraries; they are just Iceberg implementations in different languages. iceberg-rust is a library, so you can use it in a server, but you need to write the server code yourself. Since pyiceberg and iceberg-rust both implement the Iceberg spec, you can in theory use iceberg-rust to write data into an Iceberg table and use pyiceberg to read it, and vice versa.
Currently iceberg-rust has not implemented writing to tables yet. The community has focused on read support in recent releases. @Xuanwo's answers on the other parts are great, and I don't have much to add.
Hello, I had some questions around Iceberg-rust regarding data interactions with S3, authn, and authz.
How does connecting an Iceberg catalog with a specific S3 bucket work? I understand the structure on S3 with dividing a table into parquet data files and avro metadata files, but I am not sure how the relationship between this file organization and a deployed catalog works, and how to configure that exactly.
Where does Pyiceberg fit into Iceberg-rust? Would it be possible to deploy Iceberg-rust on the server side, and interact with the rest catalog through Pyiceberg? I like python as a nice interface for data consumers to interact with a catalog, and for basic management of tables.
What are the table write options with Iceberg-rust? As of now, is writing only possible with a distributed engine like Spark or Trino? What would be the bottlenecks to duckdb, polars, or Ibis+backend writes? The vast majority of my datasets are currently less than 50 GB, and most workloads are a fraction of that. I would like to use Iceberg for its superior data management compared with plain files, but initially for use cases that can mostly be done on a single node and don't really need the power of distributed engines.
How do authentication and authorization work with the current Iceberg-rust? The access control system I described above works for AWS S3 and sharing files. Any pointers on where I could learn to integrate IAM permissions into a catalog and tables? It seems the creators of https://github.com/hansetag/iceberg-catalog are in the middle of implementing some of these exact features. I would love to contribute to these features and implement them for my use case. It seems to work by vending non-AWS credentials to consumers while the catalog uses AWS credentials to sign S3 requests on the users' behalf, but I am not sure. I am also not sure how this implementation compares with the open-sourced implementation released by Databricks.
Where exactly does OpenDAL fit into the Iceberg-rust catalog? Would OpenDAL help standardize accessing data from the catalog? The custom metadata feature (Tracking issues of user metadata support, opendal#4842) could also be useful for connecting tables to different authz commands.