feat(kfp): restructure pipeline to allow mocking sections #17
Signed-off-by: Tomas Coufal <[email protected]>
@tumido out of curiosity, why not just fix the bug that prevents nested pipelines from passing artifacts? I ask because I plan to start looking at that issue and am wondering if you already found some insurmountable issue which makes the hack approach a better option?
@boarder7395 At this moment I'm glad I know KFP well enough from the user perspective. It would take me a while to build the skills to work on KFP itself. It's a codebase I've never touched. Besides that, I need to have the pipeline running on our current KFP deployment. I can't wait right now for a PR to get merged into KFP, then bubble through the KFP release process, then for the new KFP version to be adopted by RHOAI, then for RHOAI to be updated on our cluster. It's a lot faster to work around the issue. Especially when my job right now is to get the pipeline working and I can't justify weeks of studying how KFP works internally, how to set up a dev environment, how to write tests, etc... all of it. I know... in an ideal world it would be nice to always be able to resolve the root cause. 🤷
Besides, this is not much of a hack: instead of having a master pipeline and then multiple per-stage pipelines chained from the umbrella master pipeline, we just define a single pipeline that runs all the steps. The mocking would need to happen in either case: we want to be able to develop the stages independently, but each stage depends on data from the previous stage, and this data needs to be provided by something somehow. Fixing the issue above would change nothing here.
@tumido That makes sense to me, as the one looking down the barrel of doing exactly those steps now. I was familiar with KFP 1.0 but 2.0 has been something I've avoided. Just wanted to make sure there wasn't already a dead end at the end of the tunnel :)
Co-authored-by: Jude Niroshan <[email protected]>
LGTM 🚀
Then I'm gonna self-merge. Because I can. 😄 🙌
This PR restructures our approach to KFP pipelines. It also adds a CLI interface:
Since we can't pass data between nested pipelines (kubeflow/pipelines#10041), we need to use a flat, single pipeline.
Each stage's component code lives in its own folder as a Python package:
To provide a better dev experience, each stage can be mocked individually. This is done via component substitution: the mock preserves the component signature but replaces the body. This way developers can mock individual outputs (where it matters) without breaking the continuity of the pipeline.
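The substitution idea can be illustrated without KFP at all. The sketch below is standard-library Python only and is not the code from this PR: the names `sdg_generate`, `sdg_generate_mock`, `substitute` and `build_steps` are hypothetical, and plain functions stand in for KFP components. It only shows the contract described above: a mock is accepted when its signature is identical to the real step's signature, so downstream steps can keep consuming its output.

```python
import inspect
from typing import Callable, Dict, Set


def substitute(real: Callable, fake: Callable) -> Callable:
    """Return `fake` in place of `real`, but only if the signatures match."""
    if inspect.signature(real) != inspect.signature(fake):
        raise TypeError(
            f"{fake.__name__} does not preserve the signature of {real.__name__}"
        )
    return fake


def sdg_generate(taxonomy_path: str, num_samples: int) -> str:
    """Hypothetical real step: stands in for the expensive SDG stage."""
    raise RuntimeError("expensive step, deliberately not run in this sketch")


def sdg_generate_mock(taxonomy_path: str, num_samples: int) -> str:
    """Hypothetical mock: same signature, cheap canned body."""
    return f"mocked:{taxonomy_path}:{num_samples}"


def build_steps(mock: Set[str]) -> Dict[str, Callable]:
    """Pick the real or the mocked implementation for each stage name."""
    steps: Dict[str, Callable] = {"sdg": sdg_generate}
    if "sdg" in mock:
        steps["sdg"] = substitute(sdg_generate, sdg_generate_mock)
    return steps
```

Calling `build_steps({"sdg"})` yields a step table whose `sdg` entry is the mock, while `build_steps(set())` leaves the real step in place. A mock with a different parameter list is rejected up front instead of failing later inside the pipeline.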
In this example I provide mocked components for the SDG stage:
To provide the output artifact from SDG, I've used a trick for Python lightweight components (yes, it's a hack). This trick bypasses the "hermetic" nature of KFP Python lightweight components. I'm creating a new, empty shell Python package that provides data via setuptools.data-files. This package is installed on the fly into the component runtime. The faked component then ensures that the data gets copied to the output artifact.
In order to test the pipeline with mocked SDG, do:
main...tumido:kfp-to-cli?expand=1#diff-8c5a3a6ccb with: pipeline.yaml and replace each SDG step's content with a faked component. Upload the new pipeline.yaml to KFP and run.
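The data-package trick described above can be sketched roughly as follows. This is an illustration under several assumptions, not the code from this PR: the package name `sdg-mock-data`, the install directory `sdg_mock_data`, and the helpers `write_shell_package` and `copy_to_artifact` are all hypothetical, and the copy step assumes the data files end up under the interpreter prefix (`sys.prefix`) after installation. How the package reaches the component runtime is not shown in the description; the `packages_to_install` argument of `kfp.dsl.component` would be one plausible route.

```python
import shutil
import sys
from pathlib import Path
from typing import Dict

# Hypothetical metadata for a package that ships no Python code, only data.
PYPROJECT = """\
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "sdg-mock-data"
version = "0.0.1"

[tool.setuptools]
py-modules = []

[tool.setuptools.data-files]
"sdg_mock_data" = ["data/*"]
"""


def write_shell_package(root: Path, files: Dict[str, str]) -> Path:
    """Lay out the shell package: a pyproject.toml plus a data/ folder."""
    data_dir = root / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (data_dir / name).write_text(content)
    (root / "pyproject.toml").write_text(PYPROJECT)
    return root


def copy_to_artifact(output_path: str, prefix: str = sys.prefix) -> None:
    """Body of a faked component: copy installed data to the output artifact.

    Assumes the installer placed the data files under <prefix>/sdg_mock_data.
    """
    src = Path(prefix) / "sdg_mock_data"
    shutil.copytree(src, output_path, dirs_exist_ok=True)
```

In a faked component, `output_path` would be the path of the output artifact, so downstream steps see the canned files as if SDG had produced them.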