Currently, because of the generative nature of the search engines and RAG components in larch, there is no way to know which parts of the retrieved chunks a response was generated from.
I propose a new component, something like ChunkHighlighter, that takes a list of chunks and the generated response (all plain text/strs) and lets downstream implementations provide the actual highlighting mechanism.
For example, here's a basic token-based highlighter we could have:
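One possible shape for that interface, as a minimal sketch rather than an existing larch API: an abstract base class whose subclasses implement `compute_highlights`, with `__call__` delegating to it so a highlighter can be used like a function. The `Span` alias and the `ExactMatchHighlighter` subclass are hypothetical and only there to keep the sketch dependency-free and runnable.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

# Hypothetical span representation for this sketch: (start, end) character offsets.
Span = Tuple[int, int]


class ChunkHighlighter(ABC):
    """Hypothetical base class for the proposal; not an existing larch API."""

    @abstractmethod
    def compute_highlights(self, chunks: List[str], text: str) -> List[List[Span]]:
        """Return, for each chunk, the spans of that chunk that support `text`."""

    def __call__(self, chunks: List[str], text: str) -> List[List[Span]]:
        # Callable instances keep usage short: highlighter(chunks=..., text=...)
        return self.compute_highlights(chunks, text)


class ExactMatchHighlighter(ChunkHighlighter):
    """Toy subclass: highlights the whole response where it appears verbatim."""

    def compute_highlights(self, chunks: List[str], text: str) -> List[List[Span]]:
        result = []
        for chunk in chunks:
            start = chunk.find(text)  # first occurrence, or -1 if absent
            result.append([(start, start + len(text))] if start != -1 else [])
        return result
```

Downstream code would only need to subclass and override `compute_highlights`; the return value keeps one list of spans per input chunk, in the same order as `chunks`.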
from typing import List, Optional

from pydantic import BaseModel  # assuming BaseModel here is pydantic's

from larch.highlighters import ChunkHighlighter  # proposed component


class TextSpan(BaseModel):
    start: Optional[int] = None
    end: Optional[int] = None
    text: Optional[str] = None


class TokenBasedHighlighter(ChunkHighlighter):
    def compute_highlights(self, chunks: List[str], text: str) -> List[List[TextSpan]]:
        # Split the generated response into whitespace-delimited tokens.
        tokens = text.split()
        highlights = []
        for chunk in chunks:
            chunk_highlights = []
            for token in tokens:
                # str.find returns the first occurrence only, or -1 if absent.
                start = chunk.find(token)
                if start != -1:
                    chunk_highlights.append(TextSpan(start=start, end=start + len(token)))
            highlights.append(chunk_highlights)
        return highlights


text = "this is a dummy response"
chunks = [
    "There are multiple ways to generate dummy things as response",
    "What a dummy chunk this is",
]
highlights = TokenBasedHighlighter()(chunks=chunks, text=text)

# Maybe we could even have a greedy approach that takes the longest highlight
# when the matched tokens are sequential in the text chunk.
highlights = TokenBasedHighlighter(greedy=True)(chunks=chunks, text=text)
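The greedy idea could look roughly like the sketch below. This is one possible reading of it, not a settled design: consecutive response tokens that also appear consecutively in a chunk are merged into a single span, and the longest such run wins. The function name `greedy_highlights` is hypothetical, spans are plain `(start, end)` offsets to keep the sketch free of dependencies, and matching is on whole whitespace-delimited tokens (case- and punctuation-sensitive) instead of the substring search used in the basic example.

```python
import re
from typing import List, Tuple


def greedy_highlights(chunks: List[str], text: str) -> List[List[Tuple[int, int]]]:
    """For each chunk, return (start, end) offsets of the longest runs of
    consecutive response tokens that also occur consecutively in the chunk."""
    tokens = text.split()
    highlights = []
    for chunk in chunks:
        # Whitespace-delimited chunk tokens with their character offsets.
        chunk_tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", chunk)]
        spans = []
        i = 0
        while i < len(tokens):
            best_len, best_pos = 0, -1
            # Try every chunk position and keep the longest run starting at response token i.
            for pos in range(len(chunk_tokens)):
                n = 0
                while (
                    i + n < len(tokens)
                    and pos + n < len(chunk_tokens)
                    and chunk_tokens[pos + n][0] == tokens[i + n]
                ):
                    n += 1
                if n > best_len:
                    best_len, best_pos = n, pos
            if best_len == 0:
                i += 1  # response token i does not occur in this chunk
            else:
                first = chunk_tokens[best_pos]
                last = chunk_tokens[best_pos + best_len - 1]
                spans.append((first[1], last[2]))
                i += best_len  # resume after the matched run
        highlights.append(spans)
    return highlights


text = "this is a dummy response"
chunks = [
    "There are multiple ways to generate dummy things as response",
    "What a dummy chunk this is",
]
for chunk, spans in zip(chunks, greedy_highlights(chunks, text)):
    print([chunk[start:end] for start, end in spans])
# prints ['dummy', 'response'] and then ['this is', 'a dummy']
```

Spans come back in response-token order, not chunk order, so a renderer would likely want to sort them by `start` before highlighting.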