JBAujogue/LLM-playground

Introduction

This project contains experiments on generative AI (GenAI).

Getting Started

WSL setup

This project was developed on Windows 11, while some components require Linux and therefore run inside a containerized environment backed by WSL.

Install WSL and create a Linux distribution following the official Microsoft documentation. See also the WSL basic commands.

Docker setup

  1. Install Docker Desktop or Podman on the Windows os.

    • Note: all previous versions of Docker Engine and CLI installed directly through Linux distributions must be uninstalled before installing Docker Desktop.
    • Activate the WSL integration in the Docker Desktop settings, following the Docker documentation.
  2. Install the NVIDIA Container Toolkit in your Linux distro:

    • Launch a Linux terminal by running the following command in a cmd (the --distribution flag is optional):
      wsl --distribution <distro-name>
    • Execute the install commands found in the NVIDIA documentation.
    • Allow Docker to use the NVIDIA Container Runtime by executing the commands found in the NVIDIA documentation.
  3. Additional tips:

    • Terminate a running Linux distribution:
      wsl -t <distro-name>     (terminate one distribution)
      wsl --shutdown           (terminate all distributions and the WSL kernel)

Database setup

  1. Get the latest Qdrant docker image by running (in CMD or bash):
    docker pull qdrant/qdrant

Python setup

This project uses Python 3.11 as its core interpreter and Poetry 1.6.1 as its dependency manager.

  1. Install Miniconda on Windows or on Linux.

  2. Create a new conda environment with

    conda env create -f environment.yml
  3. Activate the environment with

    conda activate llm-playground
  4. Move to the project directory, and install the project dependencies with

    poetry install
  5. Launch a jupyter server with

    jupyter notebook

How to use it

Run Qdrant vector database

We dedicate a docker container named qdrant-db to the vector database backend microservice; a minimal Python client sketch follows the commands below.

  • Create and run a new Qdrant vector database service:
    wsl -e ./scripts/services/qdrant/qdrant-db.sh
  • Run the existing Qdrant vector database service:
    docker start qdrant-db
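
Once the container is running, the database can be queried from Python. The sketch below is a minimal example, assuming the default Qdrant REST port 6333 and the qdrant-client package; the collection name and vectors are purely illustrative:

    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    # Connect to the local Qdrant service (default REST port is 6333).
    client = QdrantClient(url="http://localhost:6333")

    # Create a small demo collection with 4-dimensional vectors (illustrative only).
    client.recreate_collection(
        collection_name="demo",
        vectors_config=VectorParams(size=4, distance=Distance.COSINE),
    )

    # Insert a couple of points, then run a nearest-neighbour search.
    client.upsert(
        collection_name="demo",
        points=[
            PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"text": "hello"}),
            PointStruct(id=2, vector=[0.4, 0.3, 0.2, 0.1], payload={"text": "world"}),
        ],
    )
    hits = client.search(collection_name="demo", query_vector=[0.1, 0.2, 0.3, 0.4], limit=1)
    print(hits)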

Run Text Generation Inference LLM service

We dedicate a docker container named tgi-service to a TGI-based LLM backend microservice; a minimal Python query sketch follows the commands below.

  • Create and run a new TGI LLM service: launch Docker Desktop, open a cmd or shell, and run

    wsl -e ./scripts/services/tgi/tgi-service.sh
  • Run the existing TGI LLM service: launch Docker Desktop, open a cmd or shell, and run

    docker start tgi-service
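
Once the container is running, the served model can be queried over HTTP. The sketch below is a minimal example using the huggingface_hub InferenceClient; the port is an assumption (check how tgi-service.sh maps it):

    from huggingface_hub import InferenceClient

    # Point the client at the local TGI endpoint (assumed to be exposed on port 8080).
    client = InferenceClient(model="http://localhost:8080")

    # Plain text-generation call against whichever model the service was started with.
    answer = client.text_generation(
        "What is a vector database?",
        max_new_tokens=128,
        temperature=0.7,
    )
    print(answer)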

Learning plan

1. Inference
Framework | Documentation | Examples | Comment
Huggingface transformers | | |
ctransformers | Github | | CPU-only, Unmaintained

Service | Documentation | Examples | Comment
vLLM | Github, Inference speed blog post, 2309 | Official quickstart, Official list of examples, Run in WSL | Linux-only
TGI: Text Generation Inference | Github, HF page | Run with WSL & Docker, Run again with WSL & Docker, External usage, Use with OpenAI / langchain / llama-index client | Linux-only
Triton Inference Server | Github, pytriton Github | tensorRT with TIS | Linux-only
Llamafile | | |
ollama | Github | ollama for Mixtral |
OpenLLM | Github | |
DeepSparse | Github | | CPU-only, Linux-only

SDK | Documentation | Examples | Comment
LangChain | | |
Llama-index | Github, Documentation | Official list of notebooks |
EmbedChain | | |
Jan (product) | Github | |

Further readings: List of LLM frameworks, AWS GenAI tutorials, Open LLM Huggingface leaderboard, MTEB leaderboard, hamelsmu llama-inference, can-ai-code-results
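
As a baseline for the frameworks listed above, here is a minimal Huggingface transformers inference sketch; gpt2 is only a lightweight placeholder model:

    from transformers import pipeline

    # Load a small causal LM (placeholder) and generate a short completion.
    generator = pipeline("text-generation", model="gpt2")
    output = generator("A vector database is", max_new_tokens=32)
    print(output[0]["generated_text"])
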
*
2. Compression, Quantization, Pruning
Method | Documentation | Examples | Paper
SparseML | Github | |
BitsAndBytes | HF docs | HF docs | 2208
GPTQ | HF blog | Official repo notebooks | 2210
AWQ: Activation-aware Weight Quantization | HF docs | notebook | 2306
SqueezeLLM | | | 2306
EXL2 | Github | Blog post |
HQQ: Half-Quadratic Quantization | Github | HQQ for Mixtral |
EETQ | Github | |
ATOM | | | 2310
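
To make the quantization entries above concrete, the sketch below loads a model in 4-bit with BitsAndBytes through the transformers integration; the model name is a placeholder and a CUDA GPU is assumed:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "mistralai/Mistral-7B-v0.1"  # placeholder: any causal LM on the Hub

    # NF4 4-bit quantization with bfloat16 compute (requires a CUDA GPU).
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )
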
*
3. Evaluation
Method | Documentation | Examples | Github
LLM-autoeval | | | Github
Deepeval | | Integration in Huggingface Trainer | Github
*
4. Prompt Engineering
Method | Documentation | Examples | Paper
Chain of thoughts | | |
Tree of thoughts | | |
Graph of thoughts | | |
Prompt injection | | |
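
As a small illustration of the first entry, the snippet below builds a zero-shot chain-of-thought prompt; the wording is one common pattern, not something defined by this repository:

    question = "A shop sells pens at 3 for 2 euros. How much do 9 pens cost?"

    # Zero-shot chain-of-thought: append a cue that elicits intermediate reasoning steps.
    cot_prompt = f"Question: {question}\nLet's think step by step."
    print(cot_prompt)
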
*
5. Data Ingestion
Method | Documentation | Examples | Paper
Retrieval Aware Fine-tuning (RAFT) | Github | | 2403
Automatic Data Selection in Instruction Tuning | | | 2312
Fill-In-The-Middle (FIM) transformation | | | 2207
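
To illustrate the FIM entry, the sketch below applies a document-level fill-in-the-middle transformation: the text is split into prefix / middle / suffix and rearranged with sentinel markers so a causal LM can learn to infill; the sentinel strings are illustrative, real tokenizers define their own special tokens:

    import random

    def fim_transform(text: str, rng: random.Random) -> str:
        """Rearrange a document into prefix-suffix-middle order with sentinel markers."""
        # Pick two random split points defining the prefix / middle / suffix spans.
        i, j = sorted(rng.sample(range(len(text) + 1), 2))
        prefix, middle, suffix = text[:i], text[i:j], text[j:]
        # PSM ordering: the model sees prefix and suffix, then generates the middle.
        return f"<PRE>{prefix}<SUF>{suffix}<MID>{middle}"

    rng = random.Random(0)
    print(fim_transform("def add(a, b):\n    return a + b\n", rng))
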
*
6. Retrieval-Augmented Generation
Method | Documentation | Examples | Paper
RAG | llama-index blog, llama-index documentation | | 2312
self-RAG | | |
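
A minimal llama-index RAG sketch, assuming a recent llama-index release (llama_index.core namespace), a local ./data folder of documents, and an already configured LLM / embedding backend:

    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # Load local documents, embed them into an in-memory vector index,
    # then answer a question using retrieved chunks as context.
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()

    response = query_engine.query("What does this project do?")
    print(response)
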
*
7. Finetuning
Method | Documentation | Examples | Paper
PEFT: Parameter-Efficient FineTuning | | |
C-RLFT: Conditioned-Reinforcement Learning Fine-Tuning | | |
LoRA: Low Ranking Adaptation | | |
QLoRA: Quantized Low Ranking Adaptation | | |
DPO: Direct Preference Optimization | | | 2305
SPIN: Self-Play Finetuning | | | 2401
ASTRAIOS: Parameter-Efficient Instruction Tuning | | | 2401
LLAMA-pro: Progressive Learning of LLMs | | |
GaLore: Gradient Low Rank Projection | | | 2403
ORPO: Odds Ratio Preference Optimization | | | 2403
DNO: Direct Nash Optimization | | | 2404

Framework | Documentation | Examples | Comment
TRL | Github | Finetuning scripts |
Axolotl | Github | Finetuning script | Based on TRL, Multi-GPU with Accelerate
HF alignment handbook | Github | Finetuning scripts | Based on TRL, Multi-GPU with Accelerate
Adapters | Github | Train adapters around LLM or BERT models | Based on TRL, Multi-GPU with Accelerate
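
As a concrete instance of the PEFT / LoRA entries above, the sketch below wraps a causal LM with LoRA adapters using the peft library; the base model and hyperparameters are illustrative:

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    # Base model to adapt (placeholder name).
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # LoRA configuration: low-rank updates on the attention projection layer.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["c_attn"],  # gpt2-specific; differs for other architectures
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the adapter weights are trainable
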
*
8. Model aggregation
Method | Documentation | Examples | Paper
MoE: Mixture of Experts | | | 2209
Model merging | HF blog, Model merging bibliography | |
*
9. Agents
Method | Documentation | Examples | Paper

Architectures

TGI-langchain-architecture (figure): architecture based on TGI and langchain, as proposed in this blog post.
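
A minimal sketch of the client side of this architecture, assuming the tgi-service container from above is exposed on localhost:8080 and using LangChain's HuggingFaceTextGenInference wrapper (from langchain_community, which also requires the text_generation package):

    from langchain_community.llms import HuggingFaceTextGenInference

    # Wrap the local TGI endpoint as a LangChain LLM (URL and parameters are assumptions).
    llm = HuggingFaceTextGenInference(
        inference_server_url="http://localhost:8080/",
        max_new_tokens=128,
        temperature=0.7,
    )

    print(llm.invoke("Summarize what Retrieval-Augmented Generation is."))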

References

Asset references
Inference | Leverage external source | Multi-turn interaction | Reasoning & intermediate steps | Agents
✅ Building AI Chatbots with Mistral and Llama2 | 🔲 7 Frameworks for Serving LLMs | 🔲 A Cheat Sheet and Some Recipes For Building Advanced RAG | 🔲 Why Are Advanced RAG Methods Crucial for the Future of AI? | 🔲 Llama-lab
