Alibaba Cloud · Stars
DualPipe: a bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek-V3/R1 training.
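A standalone sketch of the computation-communication overlap that such schedules aim to maximize, not DualPipe's API: a device-to-host transfer stands in for inter-stage communication and runs on its own CUDA stream, concurrently with a matmul on the default stream. Requires a CUDA GPU; sizes and names are illustrative.

```python
import torch

assert torch.cuda.is_available(), "this sketch needs a GPU"

comm_stream = torch.cuda.Stream()
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
host_buf = torch.empty(4096, 4096, pin_memory=True)  # pinned memory enables async copies

torch.cuda.synchronize()
with torch.cuda.stream(comm_stream):
    # "Communication": asynchronous copy back to the host on the side stream.
    host_buf.copy_(a, non_blocking=True)
# "Computation": matmul issued on the default stream, overlapping the copy.
c = a @ b
torch.cuda.synchronize()  # join both streams before reading results
print(c.shape, host_buf.sum().item())
```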
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
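To make "fine-grained scaling" concrete, here is a NumPy sketch of per-block quantization: one scale per 128-value block instead of one per tensor, so a single outlier only degrades its own block. FP8 is crudely emulated by rounding within the e4m3 dynamic range (max 448, with uniform instead of floating-point spacing); the block size and rounding scheme are illustrative assumptions, not DeepGEMM's kernels.

```python
import numpy as np

FP8_MAX = 448.0  # e4m3 dynamic range
BLOCK = 128      # assumed block size for illustration

def quantize_blockwise(x):
    blocks = x.reshape(-1, BLOCK)
    # One scale per block: outliers stay local to their block.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / FP8_MAX
    scales = np.maximum(scales, 1e-12)  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / scales), -FP8_MAX, FP8_MAX)
    return q, scales

def dequantize_blockwise(q, scales, shape):
    return (q * scales).reshape(shape)

x = np.random.randn(4, 1024).astype(np.float32)
x[0, 0] = 1000.0  # a single outlier only hurts its own 128-value block
q, s = quantize_blockwise(x)
err = np.abs(dequantize_blockwise(q, s, x.shape) - x).max()
print(f"max reconstruction error with per-block scales: {err:.4f}")
```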
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
ChatLearn: a flexible and efficient training framework for large-scale alignment tasks
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
PyTorch distributed training acceleration framework
Llumnix: efficient and easy multi-instance LLM serving
TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.
FlashAttention: fast and memory-efficient exact attention
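The kernel-level details live in fused CUDA code, but the exactness comes from online softmax over tiles, which fits in a short NumPy sketch (tile size and shapes are illustrative): the full attention matrix is never materialized, yet the result matches naive softmax attention to machine precision.

```python
import numpy as np

def tiled_attention(q, k, v, tile=32):
    # Online-softmax attention: process K/V in tiles, keeping a running
    # row max and normalizer so earlier partial results can be rescaled.
    n, d = k.shape
    out = np.zeros_like(q)
    row_max = np.full((len(q), 1), -np.inf)
    row_sum = np.zeros((len(q), 1))
    for s in range(0, n, tile):
        scores = q @ k[s:s+tile].T / np.sqrt(d)      # only one tile of scores
        new_max = np.maximum(row_max, scores.max(axis=1, keepdims=True))
        scale = np.exp(row_max - new_max)            # rescale earlier partials
        p = np.exp(scores - new_max)
        row_sum = row_sum * scale + p.sum(axis=1, keepdims=True)
        out = out * scale + p @ v[s:s+tile]
        row_max = new_max
    return out / row_sum                             # normalize once at the end

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)) for _ in range(3))
scores = q @ k.T / np.sqrt(64)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
reference = (weights / weights.sum(axis=1, keepdims=True)) @ v
print(np.allclose(tiled_attention(q, k, v), reference))  # True: exact, not approximate
```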
Research and development for optimizing transformers
Development repository for the Triton language and compiler
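The canonical first kernel, essentially the vector-addition example from the Triton tutorials (requires a CUDA GPU; the block size is a tuning choice): each program instance handles one BLOCK_SIZE-long slice of the data.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which block am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                    # one program per block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
print(torch.allclose(add(x, y), x + y))               # True
```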
Flashlight: a C++ standalone library for machine learning
XLA: a machine learning compiler for GPUs, CPUs, and ML accelerators
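XLA is usually reached through a frontend rather than called directly; JAX is one such frontend, so a few lines are enough to exercise it (the function and shapes here are arbitrary). XLA fuses the elementwise ops and the reduction into optimized device code.

```python
import jax
import jax.numpy as jnp

@jax.jit
def fused(x):
    # jax.jit traces this function once and hands the graph to XLA to compile.
    return jnp.sum(jnp.tanh(x) ** 2)

x = jnp.arange(1024, dtype=jnp.float32)
print(fused(x))  # first call compiles via XLA; later calls reuse the binary
```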
TorchDynamo: a Python-level JIT compiler designed to make unmodified PyTorch programs faster.
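TorchDynamo now ships in PyTorch 2.x as the graph-capture layer behind torch.compile, so the user-facing usage is a single wrapper (the function below is a toy): Dynamo intercepts the Python bytecode of unmodified code, extracts graphs, and falls back to eager execution where capture fails.

```python
import torch

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

compiled = torch.compile(f)   # Dynamo captures the graph, a backend optimizes it
x = torch.randn(1000)
print(torch.allclose(compiled(x), f(x)))  # same results as eager mode
```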
EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
EasyRec: a framework for large-scale recommendation algorithms.
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
FastNN provides distributed training examples that use EPL.
graph-learn: an industrial graph neural network framework
GPU-scheduler-for-deep-learning
MLflow: an open source platform for the machine learning lifecycle
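A minimal tracking example against MLflow's standard API (the parameter and metric values are placeholders): runs land in ./mlruns by default and can be browsed afterwards with `mlflow ui`.

```python
import mlflow

# Log one run: a hyperparameter plus a loss curve, step by step.
with mlflow.start_run(run_name="demo"):
    mlflow.log_param("learning_rate", 0.01)
    for step, loss in enumerate([0.9, 0.5, 0.3]):
        mlflow.log_metric("loss", loss, step=step)
```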