This repository contains the code, data, and analysis for our study [link later] on advanced Retrieval-Augmented Generation (RAG) techniques. It accompanies our scientific paper, which investigates how effectively various RAG techniques enhance the precision and contextual relevance of LLMs.
eval_questions/
: Contains a JSON file with 107 QA pairs used in the evaluation.
papers_for_questions/
: Holds a collection of AI-ArXiv papers that were utilized for creating the 107 QA pairs.
resources/
: Includes essential resources like the prompt template and configuration files. Note: Actual config files need API keys and other settings to be filled out.
main.py
: The main script where experiments are defined and executed.
res_analysis.ipynb
: A Jupyter notebook for in-depth analysis of the final experimental results.
utils.py
: Helper functions supporting various operations within the repository.
vector_db.py
: Scripts for setting up different vector databases, such as Classic VDB, Sentence-window, and Document Summary.
final_results.xlsx
: Spreadsheet containing the final results from our experiments, shared for transparency and scientific verification.
To replicate our experiments or to analyze our results, fill in the necessary API keys and other configuration values by creating a .env file (see .sample.env). The .env file is listed in .gitignore for security.
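Before running anything, it can help to confirm that the .env file is readable and that the expected keys are set. The following is a minimal sketch using only the standard library, not the loader this repository itself uses. The helper names load_env_file and missing_keys and the key name EXAMPLE_API_KEY are hypothetical; the real key names are the ones listed in .sample.env.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Parse simple KEY=VALUE lines from `path` into os.environ.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Returns a dict of the values that were parsed.
    """
    loaded = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and optional quotes around the value.
        value = value.strip().strip("'\"")
        if not key:
            continue
        loaded[key] = value
        # Existing environment variables win unless override=True.
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded


def missing_keys(required):
    """Return the names in `required` that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]


# Example usage (EXAMPLE_API_KEY is a placeholder, not a key this repo defines):
# load_env_file(".env")
# print(missing_keys(["EXAMPLE_API_KEY"]))
```

This parser only handles plain KEY=VALUE lines. If the python-dotenv package is available in your environment, its load_dotenv function covers more of the .env syntax.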
Set up the Python environment using venv, pyenv, or your favourite Python environment manager. Call the environment aragog or anything you like.
- venv with python3 -m venv aragog, then activate it using source aragog/bin/activate (Mac/Linux) or aragog\Scripts\activate (Windows).
- OR pyenv with pyenv virtualenv 3.12 aragog, then activate with pyenv local aragog.
Then run pip install -r requirements.txt to install all necessary dependencies.
The res_analysis.ipynb notebook provides a detailed examination of the experimental results stored in final_results.xlsx.
To set up the vector databases for the experiments, run the vector_db.py script. Then execute main.py to perform the experiments. Afterwards, use res_analysis.ipynb to analyze the results. Helper functions in utils.py are shared across the scripts.
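The run order described above (vector_db.py first, then main.py) can also be driven from a small wrapper. This is a sketch under assumptions rather than part of the repository: run_scripts is a hypothetical helper, and it assumes both scripts can be started without command-line arguments from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def run_scripts(scripts, repo_dir="."):
    """Run each script in order with the current Python interpreter.

    Stops at the first failure: a missing file raises FileNotFoundError and
    a non-zero exit code raises subprocess.CalledProcessError.
    Returns the list of scripts that completed.
    """
    repo_path = Path(repo_dir).resolve()
    completed = []
    for script in scripts:
        script_path = repo_path / script
        if not script_path.is_file():
            raise FileNotFoundError(f"Script not found: {script_path}")
        # sys.executable keeps the run inside the active virtual environment.
        subprocess.run([sys.executable, str(script_path)], check=True, cwd=repo_path)
        completed.append(script)
    return completed


# Example usage from the repository root:
# run_scripts(["vector_db.py", "main.py"])
```

Because check=True stops the sequence on the first error, main.py is not started if the vector database setup fails.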
Contributions are welcome. For any changes or enhancements, please open an issue first to discuss what you would like to change.
This project is open-source and available under the MIT License.