Component for highlighting chunks based on the generated response/answer #21

NISH1001 · 2024-01-30T04:02:10Z

NISH1001
Jan 30, 2024
Maintainer

Currently, because of the generative nature of the search engines and RAG components in larch, there's no way to know which parts of the retrieved chunks the response was generated from.

I propose a new feature component, something like ChunkHighlighter, that takes a list of chunks and the generated response (all plain text/str) and lets downstream implementations provide the actual highlighting mechanism.
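To make the proposed contract concrete, here is a rough sketch of what the base interface might look like. Everything in it is an assumption for discussion: the `compute_highlights` and `__call__` names, the list-of-lists return shape, and the dataclass-based `TextSpan` (a stand-in so the snippet runs without extra dependencies). None of this exists in larch yet.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextSpan:
    """Character span inside a chunk; `end` is exclusive (assumed convention)."""
    start: int
    end: int
    text: Optional[str] = None


class ChunkHighlighter(ABC):
    """Hypothetical base class; not an existing larch API."""

    @abstractmethod
    def compute_highlights(self, chunks: List[str], text: str) -> List[List[TextSpan]]:
        """Return one list of spans per chunk, in the same order as `chunks`."""

    def __call__(self, chunks: List[str], text: str) -> List[List[TextSpan]]:
        # Calling the instance delegates to the subclass implementation.
        return self.compute_highlights(chunks=chunks, text=text)


class NoOpHighlighter(ChunkHighlighter):
    """Trivial implementation that highlights nothing; it only shows the output shape."""

    def compute_highlights(self, chunks: List[str], text: str) -> List[List[TextSpan]]:
        return [[] for _ in chunks]
```

Downstream code would subclass `ChunkHighlighter` and override only `compute_highlights`, so token-based, embedding-based, or other strategies could be swapped without changing callers.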

For example, here's a basic token-based highlighter we could have:

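from typing import List, Optional

from pydantic import BaseModel

from larch.highlighters import ChunkHighlighter  # proposed module, does not exist yet

class TextSpan(BaseModel):
    start: Optional[int] = None
    end: Optional[int] = None
    text: Optional[str] = None

class TokenBasedHighlighter(ChunkHighlighter):
    def compute_highlights(self, chunks: List[str], text: str) -> List[List[TextSpan]]:
        # Split the generated response into whitespace-delimited tokens.
        tokens = text.split()
        highlights = []
        for chunk in chunks:
            chunk_highlights = []
            for token in tokens:
                # str.find returns only the first substring match, so "is" also matches inside "this".
                start = chunk.find(token)
                if start != -1:
                    end = start + len(token)
                    chunk_highlights.append(TextSpan(start=start, end=end, text=chunk[start:end]))
            highlights.append(chunk_highlights)
        return highlights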


text = "this is a dummy response"

chunks = [
    "There are multiple ways to generate dummy things as response",
    "What a dummy chunk this is",
]

highlights = TokenBasedHighlighter()(chunks=chunks, text=text)

# Maybe we could even have a greedy approach that merges matches into the longest highlight when they are sequential in the text chunk.
highlights = TokenBasedHighlighter(greedy=True)(chunks=chunks, text=text)
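As a rough illustration of the greedy idea, the sketch below merges runs of consecutive chunk words that also occur in the response into a single span. It is only one possible reading of "greedy": the function name `greedy_highlights`, the case-insensitive whole-word matching via `\w+`, and the dataclass `TextSpan` are all assumptions made for this example, not part of larch.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class TextSpan:
    start: int  # inclusive character offset in the chunk
    end: int    # exclusive character offset in the chunk
    text: str


def greedy_highlights(chunks: List[str], text: str) -> List[List[TextSpan]]:
    # Lowercased response words, used for case-insensitive whole-word matching.
    vocab = {word.lower() for word in re.findall(r"\w+", text)}
    results: List[List[TextSpan]] = []
    for chunk in chunks:
        spans: List[TextSpan] = []
        prev_matched = False
        for match in re.finditer(r"\w+", chunk):
            matched = match.group().lower() in vocab
            if matched and prev_matched:
                # Previous word matched too: extend the current span over this word.
                last = spans[-1]
                last.end = match.end()
                last.text = chunk[last.start:last.end]
            elif matched:
                # Start a new span at this word.
                spans.append(TextSpan(match.start(), match.end(), match.group()))
            prev_matched = matched
        results.append(spans)
    return results


text = "this is a dummy response"
chunks = [
    "There are multiple ways to generate dummy things as response",
    "What a dummy chunk this is",
]
for chunk, spans in zip(chunks, greedy_highlights(chunks, text)):
    print(chunk, [span.text for span in spans])
```

Note that this version only checks that the words are adjacent in the chunk, not that they appear in the same order as in the response; an order-aware variant would need something closer to longest-common-subsequence matching.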

Replies: 0 comments

Select a reply

Component for highlighting chunks based on the generated response/answer #21

NISH1001 Jan 30, 2024 Maintainer

Replies: 0 comments

NISH1001
Jan 30, 2024
Maintainer