---
Docling is also available via LlamaIndex, so another avenue is to adopt LlamaIndex. What is not clear to me is whether doing so would require using it for vector store interactions as well, or what the associated tradeoffs would be.
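For reference, the LlamaIndex integration is a thin reader layer, something like the sketch below, assuming the `llama-index-readers-docling` package (names per its published examples; worth verifying against the current release):

```python
# Sketch: Docling via LlamaIndex's reader interface. Whether adopting this
# also pulls in LlamaIndex's vector store machinery is the open question above.
from llama_index.readers.docling import DoclingReader

reader = DoclingReader()
documents = reader.load_data("paper.pdf")  # LlamaIndex Document objects
```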
---
I agree with @jwm4 that providing a controllable choice of conversion tool / chunking method is a viable option. I have just created an issue on that (#1061) and am currently working on a PR; unfortunately, I only discovered this thread after opening the issue. One important question here is whether the choice should be made on the server side (via the .yaml configuration) or on the client side, i.e., by letting the user specify their preferences in the insert() method. Both have their pros and cons; a sketch of the client-side variant follows.
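For illustration, the client-side variant might look something like this. The `converter` and `chunking_strategy` parameters are hypothetical, made up here to show the shape of the API; only `documents`, `vector_db_id`, and `chunk_size_in_tokens` exist in today's `rag_tool.insert`:

```python
# Hypothetical sketch: per-call control over conversion and chunking.
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import Document

client = LlamaStackClient(base_url="http://localhost:8321")
documents = [
    Document(
        document_id="doc-1",
        content="https://example.com/paper.pdf",
        mime_type="application/pdf",
        metadata={},
    )
]

client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id="my-vector-db",
    chunk_size_in_tokens=512,
    converter="docling",               # hypothetical knob
    chunking_strategy="by_structure",  # hypothetical knob
)
```

The server-side alternative would put the same choice in the provider's .yaml configuration instead, so all clients of a given deployment share one behavior.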
---
I think it's worth considering generics when designing a mechanism for us to handle transforming arbitrary input data into context-ready documents. Docling is a great tool for that, and we should probably propose a solution that empowers users to define their data and transform it according to their requirements. Feast (@feast-dev) does this through an arbitrary UDF that can be executed at different times (e.g., during data ingestion). It would probably be useful to make thresholding, chunking, tokenization, and transformation all configurable components of the RAG experience, keeping the existing behavior as the default.
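As a sketch of the generics/UDF idea, mirroring the Feast pattern rather than any existing Llama Stack API (every name below is hypothetical):

```python
# Hypothetical sketch: the user registers an arbitrary function that turns
# raw input into context-ready chunks, and the stack runs it at ingestion time.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

# A transform is an arbitrary UDF from raw bytes + mime type to chunks.
TransformFn = Callable[[bytes, str], list[Chunk]]

_TRANSFORMS: dict[str, TransformFn] = {}

def register_ingestion_transform(name: str, fn: TransformFn) -> None:
    """Hypothetical registry hook; invoked by the stack during ingestion."""
    _TRANSFORMS[name] = fn

# Example: a user-defined transform with its own chunking rule.
def paragraph_transform(raw: bytes, mime_type: str) -> list[Chunk]:
    text = raw.decode("utf-8", errors="ignore")  # stand-in for real conversion
    return [Chunk(text=p) for p in text.split("\n\n") if p.strip()]

register_ingestion_transform("paragraphs", paragraph_transform)
```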
---
I am looking at the code in vector_store.py, and it looks like it is using pypdf to convert documents to text and then splitting that text into overlapping chunks of fixed length. Splitting text into fixed-length chunks tends to produce incoherent chunks: for example, you might get a chunk that starts in the middle of the last sentence of a section and then continues into the start of the following section. The overlap partially offsets this limitation: in the previous example, the sentence that is split at the start of the chunk is probably also available as a complete sentence in a different chunk, so it is at least possible for RAG to retrieve the entire sentence when generating an answer. However, overlapping chunks bring problems of their own. For example, the redundancy among overlapping chunks can produce a lot of duplicated content in the top search results, crowding out lower-ranked, distinct results that might actually contain the answers.
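To make the failure mode concrete, here is a simplified version of that pattern (an illustration, not a verbatim copy of vector_store.py):

```python
# Simplified illustration: pypdf extracts flat text, then a sliding window
# cuts it into fixed-length, overlapping chunks with no regard for sentence
# or section boundaries.
from pypdf import PdfReader

def fixed_length_chunks(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step)]

text = "".join(page.extract_text() or "" for page in PdfReader("paper.pdf").pages)
chunks = fixed_length_chunks(text)
# A boundary can land mid-sentence; the overlap means the split sentence
# usually also appears whole in a neighboring chunk, at the cost of
# duplicated content among the top search results.
```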
One way to deal with that would be to replace the pypdf-plus-fixed-length pipeline with Docling: convert each document into Docling's structured representation and chunk along the document structure (sections, paragraphs, tables) rather than at fixed character offsets.
This would have a variety of other benefits too, e.g., access to Docling's relatively sophisticated handling of tables, which can be very important for RAG applications where a lot of critical information is in tables.
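A rough sketch of what that pipeline could look like, using Docling's `DocumentConverter` and `HybridChunker` (names per Docling's published docs; exact APIs may shift between releases):

```python
# Sketch only: convert with Docling, then chunk along document structure.
from docling.document_converter import DocumentConverter
from docling.chunking import HybridChunker

doc = DocumentConverter().convert("paper.pdf").document
chunker = HybridChunker()
for chunk in chunker.chunk(dl_doc=doc):
    # Chunk boundaries follow sections and tables, so a table can stay
    # together in one chunk instead of being cut at an arbitrary offset.
    print(chunk.text)
```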
Alternatively, rather than replacing the existing logic, we could add Docling-based processing as an alternative option. If we make it an option, it could be specified in the `llama_stack_client.types.Document` constructor or in the call to `client.tool_runtime.rag_tool.insert`. I would only want it to be an option if we can provide relatively clear guidance to users about the pros and cons of each option and how to choose between them.

I would note that DocQA in llama-stack-apps is already using Docling outside of Llama Stack: they convert each document with Docling and then split the converted text into fixed-size chunks.
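A minimal sketch of that pattern, assuming Docling's `export_to_markdown()` (an illustration, not DocQA's actual code):

```python
# Docling handles conversion (including tables), but chunking stays naive:
# the markdown is split at fixed offsets, ignoring the recovered structure.
from docling.document_converter import DocumentConverter

markdown = DocumentConverter().convert("paper.pdf").document.export_to_markdown()
chunks = [markdown[i : i + 512] for i in range(0, len(markdown), 448)]  # 64-char overlap
```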
This seems like a decent way to get some of the benefits of Docling, e.g., the table processing. However, it misses out on the opportunity to use the document structure to do the chunking. That seems particularly important for complex structures like tables, where it is especially valuable to keep the whole table together in one chunk whenever possible.
Another open question: if we do add Docling support, should it be inline, remote (so you can scale Docling out independently), or configurable for either? If it is configurable, it seems like it would make sense to have a provider type for this purpose (document processing), which would be a much bigger change than an in-place rewrite of vector_store.py to use Docling instead of pypdf. However, it would also be more impactful and could open the door to a principled way of supporting a broader assortment of document processing technologies; a possible contract is sketched below.
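If we went the provider route, the contract might look something like this (purely hypothetical; nothing like it exists in the codebase today):

```python
# Hypothetical provider contract for a "document processing" API, with an
# inline and a remote implementation, in the spirit of how other Llama Stack
# provider types split along the same lines.
from typing import Protocol

class DocumentProcessor(Protocol):
    async def process(self, content: bytes, mime_type: str) -> list[str]:
        """Turn raw document bytes into context-ready text chunks."""
        ...

class InlineDoclingProcessor:
    """Runs Docling in-process; simple, but conversion competes for the
    server's CPU."""
    async def process(self, content: bytes, mime_type: str) -> list[str]:
        raise NotImplementedError  # would call Docling directly here

class RemoteDoclingProcessor:
    """Calls a separately deployed Docling service, so document processing
    can be scaled out independently of the stack."""
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    async def process(self, content: bytes, mime_type: str) -> list[str]:
        raise NotImplementedError  # would POST content to self.base_url
```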