    Repositories list

    • rankers

      Public
      Modular LLM ranking library for Information Retrieval and RAG. Implements state-of-the-art Pairwise, Setwise, and Listwise ranking with structured generation and specialized models (RankZephyr, RankLlama). Features efficient sorting algorithms, sliding windows, and zero-shot capabilities.
      Python
      1500 Updated Dec 22, 2025
    • LLM-Blender: Ensembling framework that maximizes LLM performance via pairwise ranking. Employs PairRanker to rank candidates and GenFuser to merge outputs, generating superior responses by combining the diverse strengths of multiple open-source models.
      Python
      33600 Updated Dec 20, 2025
    • Training code for advanced RAG techniques - Adaptive-RAG, Corrective RAG, RQ-RAG, Self-RAG, Agentic RAG, and ReZero. Reproduces paper methodologies to fine-tune LLMs via SFT and GRPO for adaptive retrieval, corrective evaluation, query refinement, self-reflection, and agentic search behaviors.
      Python
      2500 Updated Dec 13, 2025
    • Advanced RAG Pipelines and Evaluation
      Python
      11000 Updated Dec 7, 2025
    • rrf

      Public
      Performance Evaluation of Rankers and RRF Techniques for Retrieval Pipelines: Employs Diversity, Lost-in-the-Middle, and Similarity rankers to reorder documents and maximize LLM context window performance. Implements Hybrid Retrieval with Reciprocal Rank Fusion (RRF) and rigorous BEIR evaluation (NDCG, MAP, Recall, Precision).
      Python
      1600 Updated Nov 23, 2025
    • dspy-opt

      Public
      Advanced RAG pipeline optimization framework using DSPy. Implements modular RAG pipelines with Query-Rewriting, Sub-Query Decomposition, and Hybrid Search via Weaviate. Automates prompt tuning and few-shot selection using MIPRO, COPRO, and BootstrapFewShot optimizers on datasets like FreshQA, HotpotQA, TriviaQA, Wikipedia and PubMedQA.
      Python
      1600 Updated Oct 31, 2025
    • biothink

      Public
      Self-Reflective Question Answering for Biomedical Reasoning
      Python
      1500 Updated Oct 14, 2025
    • Python
      0200 Updated Oct 7, 2025
    • Pipelines for Fine-Tuning LLMs using SFT and RLHF
      Python
      2600 Updated Oct 7, 2025
    • 0000 Updated Oct 2, 2025
    • grpo

      Public
      Group Relative Policy Optimization (GRPO) Implementations
      Python
      0400 Updated Sep 3, 2025
    • prp

      Public
      Pairwise Ranking Prompting (PRP): Zero-shot LLM reranking library implementing efficient pairwise strategies (Heapsort, Sliding Window, All-Pairs). Mitigates position bias via bidirectional comparison and ensures reliability with structured Pydantic validation. Built for Haystack pipelines.
      Python
      0300 Updated Jul 24, 2025
    • MedRAG

      Public
      Python
      41000 Updated Jul 16, 2025
    • scGPT

      Public
      Jupyter Notebook
      311000 Updated Jul 9, 2025
    • BioReason

      Public
      BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model
      Jupyter Notebook
      51000 Updated Jun 14, 2025
    • Dataloaders is a versatile library designed for processing and formatting datasets to support various Retrieval-Augmented Generation (RAG) pipelines, facilitating efficient evaluation and analysis.
      Python
      1400 Updated May 1, 2025
    • Effect of Optimizer Selection and Hyperparameter Tuning on Training Efficiency and LLM Performance
      Python
      1400 Updated Apr 16, 2025
    • vectordb

      Public
      Pipelines for Semantic Search, Metadata Filtering, Hybrid Search, Reranking, and Retrieval-Augmented Generation (RAG) on the TriviaQA, ARC, PopQA, FactScore, and Edgar datasets. These pipelines have been implemented using the Pinecone, Weaviate, Milvus, Qdrant and Chroma vector databases.
      Python
      1400 Updated Jan 31, 2025
    • .github

      Public
      0000 Updated Jan 2, 2025
    • A simple and well-styled PPO implementation. Based on my Medium series: https://medium.com/@eyyu/coding-ppo-from-scratch-with-pytorch-part-1-4-613dfc1b14c8.
      Python
      157000 Updated Oct 1, 2024
    • Reference implementation for DPO (Direct Preference Optimization)
      Python
      233000 Updated Aug 11, 2024
    • [ISMB '24] Self-BioRAG: Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models
      Python
      10000 Updated Apr 4, 2024
    • GenePT

      Public
      Jupyter Notebook
      45000 Updated Mar 18, 2024
    • Python
      0000 Updated Jan 30, 2023