Integrate hudi-rs in AWS SDK for Pandas #172

Open
kazdy opened this issue Oct 14, 2024 · 4 comments
Assignees
Labels
p1 python Related to Python codebase

Comments

@kazdy
Contributor

kazdy commented Oct 14, 2024

Description of the improvement

AWS supports Hudi in most of its data services, and many users rely on AWS SDK for Pandas (formerly AWS Data Wrangler) to handle their data.
Since hudi-rs provides Python bindings, we can support reading Hudi tables through the aforementioned SDK.

Expected behavior

Provide a method in AWS SDK for Pandas:
read_hudi(path, storage_options)
that allows users to read from Hudi tables.
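
A minimal sketch of what such a method might look like, assuming the hudi-rs Python bindings expose `HudiTable(base_uri, options)` and a `read_snapshot()` method that returns a list of Arrow record batches. The `table_factory` and `to_frame` parameters are hypothetical hooks added here for illustration only; they are not part of AWS SDK for Pandas or hudi-rs, and the final signature would be up to the SDK maintainers.

```python
from typing import Any, Callable, Dict, List, Optional


def _default_table_factory(path: str, options: Dict[str, str]) -> Any:
    # Imported lazily so this module loads even when hudi is not installed.
    # Assumption: the hudi-rs bindings expose HudiTable(base_uri, options).
    from hudi import HudiTable

    return HudiTable(path, options)


def _batches_to_pandas(batches: List[Any]) -> Any:
    # Imported lazily for the same reason as above.
    import pyarrow as pa

    # Note: pa.Table.from_batches needs a schema when the list is empty;
    # this sketch does not handle the empty-table case.
    return pa.Table.from_batches(batches).to_pandas()


def read_hudi(
    path: str,
    storage_options: Optional[Dict[str, str]] = None,
    *,
    table_factory: Callable[[str, Dict[str, str]], Any] = _default_table_factory,
    to_frame: Callable[[List[Any]], Any] = _batches_to_pandas,
) -> Any:
    """Read the latest snapshot of a Hudi table into a DataFrame (sketch)."""
    # Copy the options so the caller's dict is never mutated.
    options = dict(storage_options or {})
    table = table_factory(path, options)
    # Assumption: read_snapshot() returns a list of Arrow record batches.
    batches = table.read_snapshot()
    return to_frame(batches)
```

With the defaults, a call would look like `read_hudi("s3://bucket/path/to/table", {"aws_region": "us-west-2"})`, where the path and the option key are placeholders rather than values taken from this issue.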

Additional context

No response

@kazdy
Contributor Author

kazdy commented Oct 14, 2024

@xushiyan please assign me to this one

@kazdy kazdy changed the title from "Integrate hudi_rs in AWS SDK for Pandas" to "Integrate hudi-rs in AWS SDK for Pandas" Oct 14, 2024
@xushiyan xushiyan added this to the release-0.3.0 milestone Oct 14, 2024
@xushiyan
Member

@kazdy looks like this is more for AWS SDK for pandas than for hudi-rs. You would need to implement some integration logic on the SDK side, leveraging the hudi-rs Python API. We can consider adding suitable APIs to hudi-rs to support the integration. I'll keep this open to track the integration work.

@xushiyan xushiyan removed this from the release-0.3.0 milestone Nov 30, 2024
@xushiyan xushiyan added python Related to Python codebase p2 labels Nov 30, 2024
@xushiyan
Member

Related issue aws/aws-sdk-pandas#1470

@xushiyan
Member

An example usage: https://aws-sdk-pandas.readthedocs.io/en/stable/tutorials/039%20-%20Athena%20Iceberg.html#Query
Similarly, it could support reading Hudi tables with time travel.

@xushiyan xushiyan added p1 and removed p2 labels Nov 30, 2024
2 participants