
feat: Notebooks to support multiple hudi versions #18255

Open
rangareddy wants to merge 8 commits into apache:master from rangareddy:notebooks_to_support_multiple_hudi_versions

@rangareddy (Collaborator) commented Feb 26, 2026

Describe the issue this Pull Request addresses

The Hudi Docker environment is currently pinned to a single version of Hudi and Spark, which limits developers' ability to test cross-version compatibility. The notebook environment also has no out-of-the-box support for querying Hudi tables via Trino or Presto.

This PR addresses these gaps by:

* Refactoring the Dockerfile to support parameterized Hudi and Spark versions.
* Integrating Trino and Presto Python clients into the base image.
* Providing specialized notebooks to demonstrate Hudi integration with these query engines.
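
As a rough illustration of the first point, a version-pinned build could be assembled like the sketch below. The build-argument names `HUDI_VERSION` and `SPARK_VERSION`, the image tag format, and the version values are assumptions made for this example; the actual names are whatever the refactored Dockerfile declares.

```python
import shlex


def docker_build_command(hudi_version, spark_version, context="."):
    """Return the argv list for a version-pinned `docker build` call."""
    # HUDI_VERSION and SPARK_VERSION are assumed ARG names, not taken from the PR.
    tag = f"hudi-notebooks:hudi-{hudi_version}-spark-{spark_version}"
    return [
        "docker", "build",
        "--build-arg", f"HUDI_VERSION={hudi_version}",
        "--build-arg", f"SPARK_VERSION={spark_version}",
        "-t", tag,
        context,
    ]


# Example values only; substitute the versions you want to test.
cmd = docker_build_command("1.0.2", "3.5")
print(shlex.join(cmd))
```

Returning an argv list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.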

Summary and Changelog

This PR enhances the local development environment by making the Hudi version dynamic and expanding the analytics toolset available to users.

Impact

Users can now build and run Hudi notebooks for specific versions by passing build arguments. The inclusion of Trino/Presto clients allows for immediate end-to-end testing of the "Lakehouse" architecture (Write via Spark, Read via Trino/Presto) in a single container.
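
The read side of that flow might look roughly like the following sketch. It assumes the `trino` Python client is installed and that its DB-API `connect` function is used; the host, port, user, catalog, and schema defaults are hypothetical placeholders, not values from this PR. The query helper works with any DB-API 2.0 connection, so the demonstration at the bottom uses an in-memory `sqlite3` database as a stand-in and does not need a running Trino server.

```python
import sqlite3


def read_rows(conn, sql, limit=10):
    """Run a query on any DB-API 2.0 connection and return up to `limit` rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchmany(limit)
    finally:
        cur.close()


def connect_trino(host="trino", port=8080, user="hudi",
                  catalog="hudi", schema="default"):
    """Open a Trino connection. Every default here is a hypothetical value."""
    # Imported lazily so the rest of the sketch runs without the client installed.
    from trino.dbapi import connect
    return connect(host=host, port=port, user=user,
                   catalog=catalog, schema=schema)


# Stand-in demonstration: sqlite3 plays the role of the query engine.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE trips (id INTEGER, fare REAL)")
demo.executemany("INSERT INTO trips VALUES (?, ?)", [(1, 12.5), (2, 7.0)])
rows = read_rows(demo, "SELECT id, fare FROM trips ORDER BY id")
print(rows)
```

Inside the container, `read_rows(connect_trino(), "SELECT ...")` would follow the same path against a table written by Spark, assuming the catalog and schema names match the local setup.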

Risk Level

Low. These changes primarily affect the local development/Docker environment and do not alter the core Hudi library code or public Java APIs.

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions bot added the size:L (PR with lines of changes in (300, 1000]) label Feb 26, 2026
@danny0405 (Contributor) commented

Let's fix the compile errors:

Error: There were 2 source files that did not have Apache License [ERROR]
./hudi-notebooks/notebooks/06_hudi_trino_example.ipynb
./hudi-notebooks/notebooks/07_hudi_presto_example.ipynb
Error: Process completed with exit code 1.
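
One way to catch this locally before pushing is a small pre-check like the sketch below. It only approximates the failing license check: it assumes the header is expected inside a notebook cell and looks for the standard ASF marker string, whereas the real check applies its own matching rules.

```python
import json

# Opening words of the standard ASF license header.
ASF_MARKER = "Licensed to the Apache Software Foundation"


def notebook_has_license(notebook_json):
    """Return True if any cell in the notebook JSON text contains the ASF marker."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # Notebook files store `source` as either a string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if ASF_MARKER in text:
            return True
    return False


# Two tiny in-memory notebooks: one with a header cell, one without.
licensed = json.dumps({"cells": [{
    "cell_type": "markdown",
    "source": ["<!--\n",
               " Licensed to the Apache Software Foundation (ASF) under one\n",
               "-->"],
}]})
unlicensed = json.dumps({"cells": [{"cell_type": "code",
                                    "source": "print('hi')"}]})
print(notebook_has_license(licensed), notebook_has_license(unlicensed))
```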

@codecov-commenter commented

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.30%. Comparing base (967408c) to head (6bd6d81).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##             master   #18255    +/-   ##
==========================================
  Coverage     57.29%   57.30%            
- Complexity    18560    18579    +19     
==========================================
  Files          1945     1947     +2     
  Lines        106256   106396   +140     
  Branches      13131    13153    +22     
==========================================
+ Hits          60880    60967    +87     
- Misses        39654    39680    +26     
- Partials       5722     5749    +27     
Flag Coverage Δ
hadoop-mr-java-client 45.34% <ø> (-0.07%) ⬇️
spark-java-tests 47.44% <ø> (+0.02%) ⬆️
spark-scala-tests 45.53% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.
see 26 files with indirect coverage changes


@hudi-bot (Collaborator) commented

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@rangareddy (Collaborator, Author) commented

Hi @danny0405

All checks are passing now.
