Modernize Bundle Validation CI by Migrating to Testcontainers #17638
Replies: 4 comments
+1, the current bundle validation flow is rather heavy, and it makes it inconvenient for developers to locate problems when validation fails.
+1, I guess we need to write more code here to instantiate the engine execution tasks, but it should be easier to debug.
+1. Thanks for raising this @voonhous! An efficient and easy-to-use test infra will be very helpful, and bundle validation is also a critical piece for releases. There is also a bundle-related RFC #6902.
I've added an issue to track the progress of this task. If anyone wants to contribute or add to it, feel free to ping me or reply on the issue here: #17961
Overview
This discussion proposes a significant modernization of Apache Hudi's bundle validation infrastructure. Currently, our bundle validation process relies on a complex combination of Docker-based shell scripts (ci_run.sh and validate.sh) to verify the integrity and functionality of our release artifacts (Spark, Flink, Utilities, etc.).
This infrastructure is critical as it powers three major workflows:
- bot.yml: The active CI workflow running on PRs and commits. [MAIN]
- release_candidate_validation.yml: Validation for release candidates (currently disabled).
- maven_artifact_validation.yml: Post-release validation for Maven Central artifacts (currently disabled).
The current execution chain involves a GitHub Actions workflow triggering ci_run.sh, which sets up a Docker environment, mounts volumes, and then executes validate.sh inside the container. This script then sequentially runs a series of tests across various bundles.
Current CI Structure
Workflow Execution
While bot.yml is active for standard CI, the release candidate and Maven artifact validations are currently disabled manually in the YAML files.
Validation Process
Inside the Docker container, validate.sh performs the following validation steps sequentially:
Env Stack
Shell scripts (ci_run.sh, validate.sh) managing the lifecycle.
Current Challenges
After analyzing the packaging/bundle-validation/ directory, several pain points are evident:
Proposed Solution: Migrate to Testcontainers
What is Testcontainers?
Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
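Throwaway services are only useful to a test once they are actually ready to accept connections. Testcontainers handles this through its wait strategies; the sketch below is not Testcontainers code, only a plain-JDK illustration of the underlying idea: poll for readiness with a deadline instead of sleeping for a fixed time. The class name ReadinessProbe, the method waitForPort, and the timing values are hypothetical and chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.time.Duration;

/** Hypothetical helper: polls a TCP port until it accepts connections or a deadline passes. */
public class ReadinessProbe {

    /** Returns true as soon as host:port accepts a connection, false if the timeout elapses first. */
    public static boolean waitForPort(String host, int port, Duration timeout)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (deadline - System.nanoTime() > 0) {
            try (Socket socket = new Socket()) {
                // Short connect timeout so a single attempt cannot stall the loop.
                socket.connect(new InetSocketAddress(host, port), 200);
                return true;
            } catch (IOException notReadyYet) {
                // Service is not listening yet: back off briefly, then retry.
                Thread.sleep(50);
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a real service: a local server socket on a free port.
        try (ServerSocket service = new ServerSocket(0)) {
            boolean ready = waitForPort("127.0.0.1", service.getLocalPort(), Duration.ofSeconds(5));
            System.out.println("service ready: " + ready);
        }
    }
}
```

Compared with a fixed sleep, a probe like this returns as soon as the service is up and fails with a clear signal when it never comes up, which is the behavior a migration would get from the library's own wait strategies rather than from hand-written code.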
Benefits
WaitStrategies to ensure services like Hive Metastore are fully ready before starting tests, replacing arbitrary sleep commands found in current scripts.
Success Metrics / End goal
Developers should be able to run mvn test -pl packaging/hudi-bundle-validation or write tests just as they would write unit tests, and run them from their IDE without having to switch between terminals and run shell commands.
References
Testcontainers Documentation
JUnit 5 Parallel Execution
Apache Hudi Bundle Validation Readme
Next Steps
If the community is aligned with this proposal, we can start drafting out more concrete plans on how to navigate this migration.
Looking forward to your thoughts!