A starter template for building a Retrieval-Augmented Generation (RAG) system using free, local models.
This project provides a foundational RAG system that runs completely locally without requiring API keys. It's designed as a learning template for students to understand RAG architecture and experiment with features, and there are many suggestions below for expanding on this template. Students are encouraged to use LLM tools to assist them in coding. Note: this project downloads Phi-2 (~5GB) to your hard drive; the model files may be deleted after you finish the project.
Course: ECON 5502 — Fall 2025
The rag_system.py script can be a bit slow. See rag_chatbot.py for a lower-latency solution that uses the Hugging Face API. The rag_chatbot_notebook.py script does not use a class and may be easier for students to build upon, since it follows the notebook style they are familiar with.
The rag_system.py script includes:
- PDF Processing: Extract text from local PDF documents using PyPDF
- Embeddings: Generate vector embeddings with SentenceTransformer (all-MiniLM-L6-v2)
- Vector Storage: Fast similarity search using FAISS
- Language Model: Text generation with Microsoft Phi-2 (2.7B parameters, CPU-optimized)
- Web Interface: Interactive chat UI built with Gradio
All components run locally on your machine - no API keys or cloud services required! The trade-off is that the chatbot can have high latency; reducing it is one of the areas for students to develop and iterate on as they learn to build quickly.
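To see how these pieces fit together, here is a minimal sketch of the retrieval half of the pipeline. This is an illustration, not the project code itself: it assumes a single PDF saved as documents/sample.pdf and uses the same libraries listed above (pypdf, sentence-transformers, FAISS).

```python
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
import faiss

# 1. Extract text from a PDF and split it into crude fixed-size chunks
reader = PdfReader("documents/sample.pdf")  # assumed sample file name
text = "\n".join(page.extract_text() or "" for page in reader.pages)
chunks = [text[i:i + 800] for i in range(0, len(text), 800)]

# 2. Embed the chunks with the same model the template uses
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(chunks).astype("float32")

# 3. Build an in-memory FAISS index and retrieve the top matches for a query
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

query = "What is omitted variable bias?"
query_vec = embedder.encode([query]).astype("float32")
_, ids = index.search(query_vec, 3)
context = "\n---\n".join(chunks[i] for i in ids[0])
print(context)  # this retrieved context is what gets passed to the language model
```

The generation step then prepends this retrieved context to the user's question before calling the language model.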
Run these commands in the terminal:

1. Create a virtual environment and activate it (make sure your kernel is using this virtual environment):

   python -m venv msqe-rag
   source msqe-rag/bin/activate

2. Install Dependencies:

   pip install -r requirements.txt

3. Add Your Documents:
   - Place your PDF files in the documents/ folder (an introductory econometrics textbook is included as a sample PDF)

4. Run the System:

   python rag_system.py

5. Use the Interface (skip this step and interact in the terminal if latency is too high):
   - The Gradio interface will launch in your browser
   - Ask questions about your PDF documents
   - View retrieved sources for each answer
msqe-rag/
├── documents/ # Put your PDF files here
│ └── your_files.pdf
├── rag_chatbot_notebook.py # Low latency solution with notebook for easy building (start here) [NO API]
├── rag_chatbot.py # Low latency solution with API (uses class)
├── rag_system.py # Main RAG implementation (runs fully locally)
├── requirements.txt # Project python dependencies
└── README.md
This template is intentionally kept simple to facilitate learning. Here are some potential features you could add:
- Hugging Face API Integration: Replace the local Phi-2 model with more powerful models via Hugging Face's Inference API (see the sketch after this item)
- Access cutting-edge models like Llama 3, Mistral, or GPT variants
- Benefit: Better answer quality and reasoning capabilities
- Trade-off: Requires API key and internet connection
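As a rough sketch of what this swap might look like (not the template's code): the snippet below uses huggingface_hub's InferenceClient, assumes an HF_TOKEN environment variable, and uses mistralai/Mistral-7B-Instruct-v0.3 purely as an example model name.

```python
import os
from huggingface_hub import InferenceClient

# Example model name; swap in any chat model available through the Inference API
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    token=os.environ["HF_TOKEN"],  # assumes you exported an API token
)

def generate_answer(question: str, context: str) -> str:
    """Send the retrieved context plus the question to a hosted model."""
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    response = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,
    )
    return response.choices[0].message.content
```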
- Automated Document Collection: Scrape PDFs directly from websites (see the sketch after this item)
- Implement web scraping with libraries like BeautifulSoup, Scrapy, or requests
- Automatically download and process PDFs from government sites, research repositories, etc.
- Benefit: Scalable data collection without manual downloads
- Use case: Continuously update knowledge base with new publications
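A minimal version of such a scraper, assuming a page whose links point directly at .pdf files (the URL below is only a placeholder), might look like this:

```python
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def download_pdfs(page_url: str, out_dir: str = "documents") -> None:
    """Find every link ending in .pdf on a page and save it into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a", href=True):
        if link["href"].lower().endswith(".pdf"):
            pdf_url = urljoin(page_url, link["href"])
            filename = os.path.join(out_dir, os.path.basename(pdf_url))
            with open(filename, "wb") as f:
                f.write(requests.get(pdf_url, timeout=60).content)

# Placeholder URL -- point this at a real publications or working-papers page
download_pdfs("https://example.org/working-papers")
```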
- Replace FAISS with Cloud Vector Databases: An important exercise in building scalable RAG systems (see the sketch after this item)
- Current limitation: FAISS stores vectors in memory, data lost when program ends
- Pinecone: Managed vector database service
- Persistent storage with automatic backups
- Handles billions of vectors with low latency
- Built-in metadata filtering and hybrid search
- Free tier available for learning
- Alternatives: Weaviate, ChromaDB, Qdrant, Milvus
- Benefits of migration:
- Data persistence across sessions
- Horizontal scaling for production workloads
- No need to rebuild index on restart
- Better performance with large document collections
- Learning outcomes: Understand production-ready vector search architecture
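Pinecone needs an account and its own client library, so a gentler first step toward persistence is ChromaDB from the alternatives above, which writes vectors to disk. A minimal sketch (the path, collection name, and chunks below are placeholders):

```python
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Vectors are written to ./chroma_db and survive restarts,
# unlike the in-memory FAISS index
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection(name="econ_docs")

chunks = ["Example chunk one...", "Example chunk two..."]  # placeholder chunks
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)

results = collection.query(
    query_embeddings=embedder.encode(["What is heteroskedasticity?"]).tolist(),
    n_results=3,
)
print(results["documents"][0])  # the retrieved chunks for the query
```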
- Side-by-Side Chatbot Comparison: Build a dual interface to evaluate RAG effectiveness (see the sketch after this item)
- Left panel: RAG-enabled chatbot (retrieval + generation)
- Right panel: Foundation model only (no retrieval context)
- Purpose: Directly compare how RAG improves responses
- Implementation approach:
- Modify Gradio interface to show two chat windows simultaneously
- Send same query to both systems
- Display responses side-by-side for comparison
- What to evaluate:
- Factual accuracy (does RAG provide correct info from documents?)
- Hallucination reduction (does foundation model make up facts?)
- Answer relevance and specificity
- Citation and source grounding
- Learning outcomes:
- Understand the value proposition of RAG systems
- Learn evaluation methodologies for LLM applications
- Develop critical thinking about model outputs
- Extension ideas:
- Add evaluation metrics (BLEU, ROUGE, or custom rubrics)
- Include human feedback buttons (thumbs up/down)
- Log comparison results for analysis
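One possible wiring for the dual interface is sketched below; rag_answer and base_answer are placeholders for your own RAG pipeline and a plain foundation-model call.

```python
import gradio as gr

def rag_answer(question: str) -> str:
    return "..."  # placeholder: retrieval + generation from your RAG system

def base_answer(question: str) -> str:
    return "..."  # placeholder: foundation model with no retrieved context

def compare(question: str):
    # Send the same query to both systems and show the answers side by side
    return rag_answer(question), base_answer(question)

with gr.Blocks() as demo:
    question = gr.Textbox(label="Ask a question")
    with gr.Row():
        rag_out = gr.Textbox(label="RAG-enabled answer")
        base_out = gr.Textbox(label="Foundation model only")
    question.submit(compare, inputs=question, outputs=[rag_out, base_out])

demo.launch()
```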
- Chunking Strategies: Implement smarter text splitting (see the sketch after this item)
- Semantic chunking based on paragraphs/sections
- Overlapping chunks to preserve context
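For instance, a simple overlapping splitter plus a crude paragraph-based splitter could look like this (the chunk sizes are arbitrary starting points):

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 150) -> list[str]:
    """Split text into overlapping chunks so context isn't cut at chunk edges.

    Note: overlap must be smaller than chunk_size, or the loop never advances.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap  # step back so adjacent chunks share some text
    return chunks

def chunk_by_paragraph(text: str) -> list[str]:
    """A crude 'semantic' split: treat blank-line-separated paragraphs as chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```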
- Batch Processing: Handle large document collections more efficiently
- Caching: Add response caching for frequently asked questions
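A first pass at caching could simply memoize exact-match questions; in the sketch below, answer_question is a placeholder for your full pipeline, and only identical query strings hit the cache.

```python
from functools import lru_cache

def answer_question(question: str) -> str:
    return "..."  # placeholder: your full retrieval + generation pipeline

@lru_cache(maxsize=256)
def cached_answer(question: str) -> str:
    # Repeated, identical question strings are served from the in-memory cache
    return answer_question(question)
```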
- Add support for multiple document formats (Word, HTML, plain text)
- Implement citation tracking to show exact page numbers
- Add query history and session persistence
- Create evaluation metrics for answer quality
- Build a REST API for programmatic access
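For the REST API idea, a small FastAPI app could be a starting point; rag_answer below is a placeholder for your pipeline, and you would serve it with uvicorn (e.g. uvicorn api:app).

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

def rag_answer(question: str) -> str:
    return "..."  # placeholder: retrieval + generation from your RAG system

@app.post("/ask")
def ask(query: Query) -> dict:
    # POST {"question": "..."} to /ask and get the generated answer back
    return {"answer": rag_answer(query.question)}
```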
- Understand the RAG pipeline: Retrieval → Augmentation → Generation
- Experiment with different models and parameters
- Learn about vector embeddings and similarity search
- Practice integrating multiple ML/NLP libraries
- Develop skills in system architecture and scalability
- The current implementation prioritizes simplicity and learning over performance
- All code is intentionally straightforward to encourage understanding and modification
- Start by running the system as-is, then gradually add features
- Document your changes and learnings as you enhance the system
Template designed for hands-on learning and experimentation with RAG systems.
Last updated: 11-01-2025