- University of Science and Technology of China, Hefei, China
Stars
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you…
This package contains the original 2012 AlexNet code.
《开源大模型食用指南》 (A Practical Guide to Open-Source LLMs): tutorials, tailored for Chinese beginners, on quickly fine-tuning (full-parameter/LoRA) and deploying open-source LLMs and multimodal large models (MLLMs), both domestic and international, in a Linux environment.
DeepSeek-V3/R1 inference performance simulator
Analyze computation-communication overlap in V3/R1.
A high-throughput and memory-efficient inference and serving engine for LLMs
How to optimize common algorithms in CUDA.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
🤱🏻 Turn any webpage into a desktop app with Rust; easily build lightweight, cross-platform desktop apps.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
MoBA: Mixture of Block Attention for Long-Context LLMs
My learning notes and code for ML systems (ML SYS).
Disaggregated serving system for Large Language Models (LLMs).
Efficient and easy multi-instance LLM serving
yinfan98 / PaddleSpeech (forked from PaddlePaddle/PaddleSpeech)
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
A modular graph-based Retrieval-Augmented Generation (RAG) system
The globally designated official GitHub of 润学 ("Runology", the study of emigrating from China): compiles its purpose, principles, theory, and real-world examples of emigration; addresses the three big questions of why to leave, where to go, and how to leave; and aims to become the core religion and core belief of the new Chinese people.
SGLang is a fast serving framework for large language models and vision language models.
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
A low-latency & high-throughput serving engine for LLMs
A tool for bandwidth measurements on NVIDIA GPUs.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Dynamic Memory Management for Serving LLMs without PagedAttention
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).