
Normalize image encoders interface to the text encoder interface #1606

Open

KennethEnevoldsen opened this issue Dec 17, 2024 · 1 comment

@KennethEnevoldsen (Contributor)
As discussed in this PR, we would like to normalize the interfaces to allow for:

model.encode(sentences)
model.encode(images)
model.encode(documents) # can be image and text
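
For concreteness, a mixed-modality documents input could be a sequence of (image, text) pairs, so the same encode call covers all three cases. A minimal sketch; the pairing convention here is an assumption, not a settled API:

from PIL import Image

sentences = ["a photo of a cat", "a photo of a dog"]
images = [Image.new("RGB", (224, 224), color) for color in ("red", "blue")]
documents = list(zip(images, sentences))  # [(Image.Image, str), ...]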
@isaac-chung (Collaborator) commented Jan 2, 2025

Adding to this comment, the encoder can then be something like:

from __future__ import annotations

from typing import Any, Sequence

import numpy as np
from PIL import Image
from torch.utils.data import DataLoader

from mteb.encoder_interface import PromptType  # assuming the existing mteb enum


class MultiModalEncoder:
    """Unified encoder interface for MIEB, MOEB, etc."""

    def __init__(self, device: str | None = None, **kwargs: Any):
        pass

    def encode(
        self,
        inputs: Sequence[str] | Sequence[Image.Image] | DataLoader | Sequence[Any],
        *,
        task_name: str,
        prompt_type: PromptType | None = None,
        **kwargs: Any,
    ) -> np.ndarray:
        """
        inputs: Handles uni-modal or multi-modal inputs. For example:
            - text only: Sequence[str]
            - image and text: Sequence[Any] -> Sequence[tuple[Image.Image, str]]
            - image dataloader and text: Sequence[Any] -> tuple[DataLoader, list[str]]
            - images and text: Sequence[Any] -> Sequence[tuple[list[Image.Image], str]]

        This can potentially wrap existing get_fused_embedding methods as well.
        """
        raise NotImplementedError

@KennethEnevoldsen @gowitheflow-1998 wdyt?
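
For illustration, here is a minimal sketch of a toy subclass of the MultiModalEncoder above. The random embeddings and naive averaging fusion are placeholders, not a proposed implementation; only the type-dispatch inside encode is the point:

class DummyMultiModalEncoder(MultiModalEncoder):
    """Toy encoder returning random vectors; demonstrates dispatch only."""

    def __init__(self, device: str | None = None, dim: int = 16, **kwargs: Any):
        self.dim = dim
        self.rng = np.random.default_rng(0)

    def encode(
        self,
        inputs: Sequence[Any],
        *,
        task_name: str,
        prompt_type: PromptType | None = None,
        **kwargs: Any,
    ) -> np.ndarray:
        vectors = []
        for item in inputs:
            if isinstance(item, str):  # text only
                vectors.append(self._embed_text(item))
            elif isinstance(item, Image.Image):  # image only
                vectors.append(self._embed_image(item))
            elif isinstance(item, tuple):  # (image(s), text) document
                images, text = item
                images = images if isinstance(images, list) else [images]
                image_vec = np.mean([self._embed_image(im) for im in images], axis=0)
                vectors.append((image_vec + self._embed_text(text)) / 2)  # naive fusion
            else:
                raise TypeError(f"Unsupported input type: {type(item)!r}")
        return np.stack(vectors)

    def _embed_text(self, text: str) -> np.ndarray:
        return self.rng.standard_normal(self.dim)

    def _embed_image(self, image: Image.Image) -> np.ndarray:
        return self.rng.standard_normal(self.dim)


# The same call then covers all three cases from the original proposal:
encoder = DummyMultiModalEncoder()
text_emb = encoder.encode(["a cat", "a dog"], task_name="demo")
img = Image.new("RGB", (32, 32))
doc_emb = encoder.encode([(img, "a cat photo")], task_name="demo")
print(text_emb.shape, doc_emb.shape)  # (2, 16) (1, 16)

Dispatching inside encode keeps the benchmark-facing call signature identical across text-only, image-only, and fused tasks, which is the point of the normalization.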
