- Menlo Park
- sanyambhutani.com
- @bhutanisanyam1
Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
Finetune Llama 3.3, Mistral, Phi-4, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
Scalable toolkit for efficient model alignment
LLM-powered multi-agent persona simulation for imagination enhancement and business insights.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Inference and training library for high-quality TTS models.
Fast and accurate automatic speech recognition (ASR) for edge devices
Accessible large language models via k-bit quantization for PyTorch.
Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
PyTorch native quantization and sparsity for training and inference
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Serve, optimize and scale PyTorch models in production
✨✨Latest Advances on Multimodal Large Language Models
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
Composable building blocks to build Llama Apps
Agentic components of the Llama Stack APIs
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Simple retrieval from LLMs at various context lengths to measure accuracy
Stable Diffusion web UI
Public code of Dr. Ivan Reznikov used in posts, articles, conferences
Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024)