Why does kedro seem to avoid accessing run arguments or context in higher level functions? #4104

Closed
MarcelBeining opened this issue Aug 19, 2024 · 6 comments
Labels
Community Issue/PR opened by the open-source community Issue: Feature Request New feature or improvement to existing feature

Comments

@MarcelBeining

Description

We use kedro pipelines a lot for our AI projects, and we stumble over this problem so often that it is time to open an issue about it.
We regularly pass arguments like the desired environment as a run argument to kedro run.
We also need custom functionalities that we implement into settings.py (e.g. custom hooks) and pipeline_registry.py (e.g. custom pipeline combination). For these functionalities we sometimes need extra information, such as the environment we are running.

There is no simple and robust way to access run arguments in these functions!
Possible solutions that have been suggested and tested by us so far:

  1. Use sys.argv ourselves: that seems kind of error-prone if the env is handed over in some other way (e.g. KEDRO_ENV)
  2. Get the env info from the session object: that worked until get_current_session() was deprecated in 0.18, and it now seems completely impossible to access the session object in deeper functions.
  3. Put an "env" globals variable in each env and use it: That works for using it in catalog but in the config files mentioned above, we would need to reinitialize an OmegaConfigLoader, which requires... guess what: defining the env :-D
  4. Use a hook to intercept during after_context_created, save the env information from there in a global class/variable and use it: That seems very hacky and works only for pipeline_registry.py, not for settings.py as that is called before after_context_created
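For illustration, option 1 can be made a little less fragile by falling back to KEDRO_ENV when no flag is found on the command line. The following is only a minimal sketch of that workaround, not a Kedro API: `resolve_run_env` is a hypothetical helper, the default of "local" is an assumption about the project's default run environment, and the parser only covers the `--env=NAME`, `--env NAME` and `-e NAME` forms. It still cannot see an env passed programmatically (e.g. when a session is created from Python code), which is exactly the gap described in this issue.

```python
import os
import sys
from typing import Optional, Sequence


def resolve_run_env(argv: Optional[Sequence[str]] = None, default: str = "local") -> str:
    """Best-effort guess of the run environment (hypothetical helper).

    Precedence assumed here: command-line flag, then KEDRO_ENV, then `default`.
    """
    args = list(sys.argv[1:] if argv is None else argv)
    for i, arg in enumerate(args):
        # Form: --env=prod
        if arg.startswith("--env="):
            return arg.split("=", 1)[1]
        # Forms: --env prod / -e prod
        if arg in ("--env", "-e") and i + 1 < len(args):
            return args[i + 1]
    # No flag on the command line: fall back to the environment variable.
    return os.environ.get("KEDRO_ENV") or default
```

Calling `resolve_run_env()` without arguments inspects `sys.argv`; passing an explicit list makes the helper easy to unit test.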

The same problem arises, of course, when trying to access any parameter from parameters.yml in these higher-level files.

Context

This should be important for anyone who extends kedro pipeline functionality beyond its standard use.

Possible Implementation

Simply make it possible to import and access the kedro context or session object (at least in some frozen, read-only state) from anywhere!

@MarcelBeining MarcelBeining added the Issue: Feature Request New feature or improvement to existing feature label Aug 19, 2024
@MarcelBeining MarcelBeining changed the title <Title> Why does kedro seem to avoid accessing run arguments or context in higher level functions? Aug 19, 2024
@DimedS
Contributor

DimedS commented Aug 20, 2024

Hi @MarcelBeining. Thanks for raising this! I think it makes a lot of sense. Would you be interested in submitting a pull request for this? If not, the Kedro maintainers could consider adding it.

@MarcelBeining
Author

Hi @DimedS. I guess the kedro maintainers had their reasons for designing it the way it is now. So before I try different implementations by trial and error until one suits the design principles (which are unknown to me), I'd rather suggest that the kedro maintainers add it :-)

@noklam
Contributor

noklam commented Aug 27, 2024

Can you explain which arguments you need? Maybe I don't understand the question, but isn't this available in hooks?

https://docs.kedro.org/en/stable/api/kedro.framework.hooks.specs.PipelineSpecs.html#kedro.framework.hooks.specs.PipelineSpecs

@MarcelBeining
Author

MarcelBeining commented Sep 3, 2024

Sure, there are two use cases:

  1. We need parameters from the correct parameter.yml in settings.py to fill in configurable email details (sender, recipient etc.) for an EmailNotifier hook (https://gitlab.com/anacision/kedro-expectations#notification). But in settings.py it is currently not possible to get the correct kedro parameters, as it is not possible to find out which environment argument is used for the current run.
  2. Depending on the environment, some pipelines should not be available (i.e. built together in pipeline_registry.py) to avoid executing critical code in production. Here we would also need to know in pipeline_registry.py which env argument kedro is run with. This happens even before "before_pipeline_run", so there is no possibility of getting the env argument from there. And even if there were, it would be kind of hacky, as one would have to use a global variable and an extra hook that fills it.
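To make the second use case concrete, here is a rough sketch of what an env-gated registry could look like. Everything in it is hypothetical: `BLOCKED_PIPELINES`, `filter_pipelines_for_env` and the pipeline names are made up for illustration, and plain strings stand in for real Pipeline objects so the snippet runs on its own. It reads the env from KEDRO_ENV only, so it does not pick up an env given as a run argument, which is the missing piece this issue is about.

```python
import os
from typing import Any, Dict, Set

# Hypothetical mapping: pipeline names that must not be registered in a given env.
BLOCKED_PIPELINES: Dict[str, Set[str]] = {"prod": {"reset_database"}}


def filter_pipelines_for_env(pipelines: Dict[str, Any], env: str) -> Dict[str, Any]:
    """Return only the pipelines that are allowed in `env`."""
    blocked = BLOCKED_PIPELINES.get(env, set())
    return {name: pipe for name, pipe in pipelines.items() if name not in blocked}


def register_pipelines() -> Dict[str, Any]:
    # Stand-ins for real Pipeline objects; a real project would build them here.
    all_pipelines = {
        "__default__": "default pipeline",
        "data_processing": "data processing pipeline",
        "reset_database": "critical maintenance pipeline",
    }
    # Assumption: the env is only visible through KEDRO_ENV at this point.
    env = os.environ.get("KEDRO_ENV", "local")
    return filter_pipelines_for_env(all_pipelines, env)
```

With KEDRO_ENV set to "prod", the "reset_database" entry is left out of the registry; in any other env all three entries are returned.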

@noklam noklam added the Community Issue/PR opened by the open-source community label Sep 9, 2024
@merelcht
Member

> Sure, there are two use cases:
>
>   1. We need parameters from the correct parameter.yml in settings.py to fill in configurable email details (sender, recipient etc.) for an EmailNotifier hook (https://gitlab.com/anacision/kedro-expectations#notification). But in settings.py it is currently not possible to get the correct kedro parameters, as it is not possible to find out which environment argument is used for the current run.

I don't have a clear-cut answer on how to fix this, but what you're trying to do here does go against the flow of execution for Kedro. settings.py is used to instantiate all components needed for a functioning Kedro project before running it. It's not meant to contain knowledge about runtime variables. The architecture diagram might help illustrate how components are designed to interact with each other:
https://docs.kedro.org/en/stable/extend_kedro/architecture_overview.html

>   2. Depending on the environment, some pipelines should not be available (i.e. built together in pipeline_registry.py) to avoid executing critical code in production. Here we would also need to know in pipeline_registry.py which env argument kedro is run with. This happens even before "before_pipeline_run", so there is no possibility of getting the env argument from there. And even if there were, it would be kind of hacky, as one would have to use a global variable and an extra hook that fills it.

For this second case, can't you use namespaces to filter what pipelines should be executed? https://docs.kedro.org/en/stable/nodes_and_pipelines/namespaces.html

@merelcht
Member

merelcht commented Nov 1, 2024

Closing this due to inactivity. Feel free to re-open this to continue the conversation!

@merelcht merelcht closed this as not planned Won't fix, can't repro, duplicate, stale Nov 1, 2024
4 participants