Chatbot with PDF for Semantic Search over Documents (Build with Streamlit, LangChain, Pinecone/Chroma/Azure Cognitive Search)
This repository contains a code example for how to build an interactive chatbot for semantic search over documents. The chatbot allows users to ask natural language questions and get relevant answers from a collection of documents. The chatbot uses Streamlit for web and chatbot interface, LangChain, and leverages various types of vector databases, such as Pinecone, Chroma, and Azure Cognitive Search’s Vector Search, to perform efficient and accurate similarity search. The code is written in Python and can be easily modified to suit different use cases and data sources.
Please also check out my story in Medium (Streamlit and Vector Databases: A Guide to Creating Interactive Web Apps for Semantic Search over Documents) for more detail sharing.
- preprocess_pinecone.ipynb <-- Example of using Embedding Model from Azure OpenAI Service to embed the content from the document and save it into Pinecone vector database.
- preprocess_chroma.ipynb <-- Example of using Embedding Model from Azure OpenAI Service to embed the content from the document and save it into Chroma vector database.
- preprocess_acs.ipynb <-- Example of using Embedding Model from Azure OpenAI Service to embed the content from the document and save it into Azure Cognitive Search vector database.
- consume_pinecone.ipynb <-- Example of using LangChain question-answering module to perform similarity search from the Pinecone vector database and use the GPT-3.5 (text-davinci-003) to summarize the result.
- consume_chroma.ipynb <-- Example of using LangChain question-answering module to perform similarity search from the Chroma vector database and use the GPT-3.5 (text-davinci-003) to summarize the result.
- consume_acs.ipynb <-- Example of using LangChain question-answering module to perform similarity search from the Azure Cognitive Search vector database and use the GPT-3.5 (text-davinci-003) to summarize the result.
- app_pinecone.py <-- Example of using Streamlit, LangChain, and Pinecone vector database to build an interactive chatbot to facilitate the semantic search over documents. It uses the GPT-3.5-Turbo model from Azure OpenAI Service for result summarization and chat.
- app_chroma.py <-- Example of using Streamlit, LangChain, and Chroma vector database to build an interactive chatbot to facilitate the semantic search over documents. It uses the GPT-3.5-Turbo model from Azure OpenAI Service for result summarization and chat.
- app_acs.py <-- Example of using Streamlit, LangChain, and Azure Cognitive Search vector database to build an interactive chatbot to facilitate the semantic search over documents. It uses the GPT-3.5-Turbo model from Azure OpenAI Service for result summarization and chat.
To run this Streamlit web app
streamlit run app_pinecone.py
High-level architecture and flow of this Semantic Search over Documents demo
Enjoy!