
feat: add SuperComponent #174

Draft: wants to merge 21 commits into main


@tstadel (Member) commented Jan 22, 2025

Related Issues

SuperComponents

SuperComponents generally behave like any other component: they have init parameters and the usual from_dict() and to_dict() methods. The init parameters typically determine how the internal pipeline is constructed (e.g. which components it uses).

from haystack_experimental.super_components.converters.file_converter import AutoFileConverter
file_converter = AutoFileConverter()
file_converter.to_dict()
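To make the idea concrete without depending on the experimental package, here is a toy sketch in plain Python. `ToyFileConverter`, its `file_types` parameter and the list-of-steps "pipeline" are hypothetical stand-ins, not the haystack_experimental API. The sketch only assumes what the description says: to_dict() stores the init parameters, and from_dict() rebuilds the same internal pipeline from them.

```python
from typing import Any, Dict, List, Optional


class ToyFileConverter:
    """Hypothetical stand-in for a ready-made super component."""

    def __init__(self, file_types: Optional[List[str]] = None) -> None:
        self.file_types = list(file_types) if file_types is not None else ["pdf", "txt"]
        # The init params decide which steps the internal pipeline contains:
        # a router, one converter per file type, and a joiner.
        self.pipeline = (
            ["file_type_router"]
            + [f"{ft}_converter" for ft in self.file_types]
            + ["joiner"]
        )

    def to_dict(self) -> Dict[str, Any]:
        # Only the init params are serialized, not the pipeline built from them.
        return {"type": "ToyFileConverter", "init_parameters": {"file_types": self.file_types}}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyFileConverter":
        # Re-running __init__ with the same params rebuilds the same pipeline.
        return cls(**data["init_parameters"])


converter = ToyFileConverter(file_types=["pdf", "html"])
restored = ToyFileConverter.from_dict(converter.to_dict())
print(restored.pipeline)  # ['file_type_router', 'pdf_converter', 'html_converter', 'joiner']
```

In this toy, serializing the parameters rather than the pipeline keeps the dictionary small, at the cost of the pipeline structure not being visible in the serialized form.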

Expanding SuperComponents

What makes SuperComponents special is that they can be expanded by calling their to_super_component_dict() method. This converts the component into a generic SuperComponent that contains the pipeline the original component constructed. From there on, the pipeline can be changed in any way.

file_converter.to_super_component_dict()

Load it as a generic SuperComponent:

from haystack_experimental.core.super_component import SuperComponent

SuperComponent.from_dict(file_converter.to_super_component_dict())
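The expansion step can be sketched the same way. Every name below (`ToyFileConverter`, `ToySuperComponent`, the dictionary layout) is a hypothetical simplification, assuming only what the description states: to_super_component_dict() serializes the constructed pipeline itself rather than the init parameters, so once it is loaded as a generic super component, the pipeline can be edited freely.

```python
from typing import Any, Dict, List


class ToySuperComponent:
    """Hypothetical generic super component: it only wraps a pipeline."""

    def __init__(self, pipeline: List[str]) -> None:
        self.pipeline = list(pipeline)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToySuperComponent":
        return cls(**data["init_parameters"])


class ToyFileConverter:
    """Hypothetical ready-made super component built from init params."""

    def __init__(self, file_types: List[str]) -> None:
        self.pipeline = ["file_type_router"]
        self.pipeline += [f"{ft}_converter" for ft in file_types]
        self.pipeline.append("joiner")

    def to_super_component_dict(self) -> Dict[str, Any]:
        # Serialize the constructed pipeline (not the init params) and
        # label the result with the generic type.
        return {"type": "ToySuperComponent", "init_parameters": {"pipeline": list(self.pipeline)}}


expanded = ToySuperComponent.from_dict(ToyFileConverter(["pdf", "txt"]).to_super_component_dict())
# From here on the pipeline can be changed, e.g. by dropping the TXT branch.
expanded.pipeline.remove("txt_converter")
print(expanded.pipeline)  # ['file_type_router', 'pdf_converter', 'joiner']
```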

Proposed Changes:

How did you test it?

Notes for the reviewer

Check out super_components.ipynb to get an idea of how it works.

Checklist


@coveralls commented Jan 22, 2025

Pull Request Test Coverage Report for Build 12949947646

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+1.2%) to 79.889%

Totals
Change from base Build 12948231503: +1.2%
Covered Lines: 1577
Relevant Lines: 1974

💛 - Coveralls

@mathislucka (Member) commented Jan 24, 2025

@tstadel I've updated the PR and done some cleanup. Here is what I would propose:

Naming
We rename PipelineWrapper to SuperComponent.
SuperComponent deliberately diverges from our usual naming scheme (*er) to signal that this is a different type of component with special features.
For the same reason, I also moved all SuperComponent-related changes to core.

Inheritance
We need a slight tweak to how we create ready-made super components (e.g. AutoFileConverter).
I created a SuperComponentBase class that all custom super components must inherit from.
SuperComponent inherits from this base class too.
If custom super components inherited directly from SuperComponent, every one of them would have to define its own to_dict and from_dict methods. Custom super components without complex parameters to serialize should not need to, because the pipeline's default serialization can handle them.
If a custom super component inherited from SuperComponent without overriding the serialization methods, it would be serialized in a pipeline as a generic SuperComponent instead of its actual type, which is confusing.

Inheriting from SuperComponentBase avoids this scenario.
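The serialization pitfall can be reproduced with a toy model in plain Python. `default_to_dict`, `serialize` and the `Toy*`/`Converter*` classes below are simplified, hypothetical stand-ins rather than Haystack's actual serialization code; the sketch assumes only that a pipeline uses a component's own to_dict() when it has one and otherwise falls back to a default that records the component's real class.

```python
from typing import Any, Dict


def default_to_dict(obj: Any, **init_params: Any) -> Dict[str, Any]:
    # Simplified default serialization: the "type" comes from the actual class.
    cls = type(obj)
    return {"type": f"{cls.__module__}.{cls.__qualname__}", "init_parameters": init_params}


def serialize(obj: Any, **init_params: Any) -> Dict[str, Any]:
    # Prefer the component's own to_dict(), otherwise fall back to the default.
    if hasattr(obj, "to_dict"):
        return obj.to_dict()
    return default_to_dict(obj, **init_params)


class ToySuperComponentBase:
    """Hypothetical base class: defines no to_dict()."""


class ToySuperComponent(ToySuperComponentBase):
    """Hypothetical generic wrapper with its own to_dict()."""

    def to_dict(self) -> Dict[str, Any]:
        # Always reports the generic type, which is only right for this class.
        return {"type": "ToySuperComponent", "init_parameters": {}}


class ConverterFromGeneric(ToySuperComponent):
    """Custom component that inherits to_dict() without overriding it."""


class ConverterFromBase(ToySuperComponentBase):
    """Custom component that relies on default serialization."""


print(serialize(ConverterFromGeneric())["type"])  # ToySuperComponent (wrong type)
print(serialize(ConverterFromBase())["type"])     # '<module>.ConverterFromBase' (real type)
```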

Further Examples

I updated the example notebook and made the initialization of the AutoFileConverter less complex, so that users can understand it better. It's more verbose but easier to customize and adapt.

Here's another simple super component as an example:

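from typing import Any, Dict, Optional

from haystack import Pipeline, component
from haystack.components.generators.chat import (
    AzureOpenAIChatGenerator,
    HuggingFaceAPIChatGenerator,
    OpenAIChatGenerator,
)
from haystack.utils import Secret

from haystack_experimental.core.super_component import SuperComponentBase

# The generators name their model and credential parameters differently,
# so each provider maps (model_name, api_key, generation_kwargs) onto its
# own generator's init signature.
_GENERATOR_PROVIDERS = {
    "openai": lambda model_name, api_key, kwargs: OpenAIChatGenerator(
        api_key=api_key, model=model_name, generation_kwargs=kwargs
    ),
    "azure": lambda model_name, api_key, kwargs: AzureOpenAIChatGenerator(
        api_key=api_key, azure_deployment=model_name, generation_kwargs=kwargs
    ),
    "huggingface": lambda model_name, api_key, kwargs: HuggingFaceAPIChatGenerator(
        api_type="serverless_inference_api",
        api_params={"model": model_name},
        token=api_key,
        generation_kwargs=kwargs,
    ),
}

@component
class AutoGenerator(SuperComponentBase):
    def __init__(
        self,
        model: str = "openai:gpt-4o",
        api_key: Secret = Secret.from_env_var("LLM_API_KEY"),
        generation_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        if ":" not in model:
            raise ValueError("Model string must be in format 'provider:model_name'")

        # Split on the first colon only, so model names containing colons stay intact.
        provider, model_name = model.split(":", 1)
        if provider not in _GENERATOR_PROVIDERS:
            raise ValueError(
                f"Provider must be one of {list(_GENERATOR_PROVIDERS.keys())}"
            )

        generator = _GENERATOR_PROVIDERS[provider](model_name, api_key, generation_kwargs)

        # Wrap the chosen generator in a one-component pipeline.
        pp = Pipeline()
        pp.add_component("generator", generator)

        super().__init__(pipeline=pp)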
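The `provider:model_name` validation at the top of `__init__` can be pulled out and tested on its own, without API keys. The helper `parse_model_string` and the `SUPPORTED_PROVIDERS` tuple below are hypothetical (not part of Haystack); the helper performs the same checks and splits only on the first colon (maxsplit=1), so model names that themselves contain colons survive.

```python
from typing import Tuple

# Hypothetical provider list, matching the provider keys used in AutoGenerator.
SUPPORTED_PROVIDERS = ("openai", "azure", "huggingface")


def parse_model_string(model: str) -> Tuple[str, str]:
    """Split 'provider:model_name' and validate the provider."""
    if ":" not in model:
        raise ValueError("Model string must be in format 'provider:model_name'")
    # maxsplit=1 keeps any further colons inside the model name.
    provider, model_name = model.split(":", 1)
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Provider must be one of {list(SUPPORTED_PROVIDERS)}")
    return provider, model_name


print(parse_model_string("openai:gpt-4o"))  # ('openai', 'gpt-4o')
print(parse_model_string("openai:ft:my-model:org:abc123"))  # ('openai', 'ft:my-model:org:abc123')
```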

@mathislucka changed the title from "wip: supercomponents" to "feat: add SuperComponent" on Jan 24, 2025
@mathislucka (Member) commented

Benefits

The community has requested an abstraction layer around pipelines many times (e.g. deepset-ai/haystack#7638).

Super components encapsulate an entire pipeline, which makes it easier to organize code and keep track of very large pipelines.

Additionally, we got feedback that our components are very atomic (which is generally good) and can be hard for novice users to understand.

SuperComponentBase allows us to carefully introduce higher-level abstractions for common component patterns in Haystack, making it easier for novice users to get started.

At the same time, we are not hiding how these components are implemented. Any advanced Haystack user who understands how to build pipelines can look at the code for our super components and customize it if needed.

In deepset Studio, users can zoom in on any super component to understand how it works. With the to_super_component_dict method, Studio users can also convert any super component into a generic SuperComponent, which lets them customize the underlying pipeline.
