AdaptiveGEMM

Paper: https://arxiv.org/abs/2508.16584

This repository is based on the original DeepGEMM, with two key features added:
  • Adaptation to Various Lengths of Group M: the grouped kernels support groups of differing m lengths, providing more flexibility for diverse use cases.
  • Support for Group K GEMM: grouped GEMM along the K dimension is also supported, expanding the functionality of the original DeepGEMM. NOTE: k in each group must be padded to a multiple of 128 (a padding sketch follows this list).
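
The group-K requirement means each group's k must be rounded up to the next multiple of 128 before the kernels are invoked. Below is a minimal sketch of that padding step, assuming a plain zero-pad of the trailing k dimension is acceptable; the helper name pad_group_k is hypothetical and not part of this repository's API.

import torch

def pad_group_k(x: torch.Tensor, alignment: int = 128) -> torch.Tensor:
    # Hypothetical helper: zero-pads the trailing (k) dimension of `x`
    # up to a multiple of `alignment`, as the group-K note requires.
    k = x.shape[-1]
    padded_k = (k + alignment - 1) // alignment * alignment
    if padded_k == k:
        return x
    return torch.nn.functional.pad(x, (0, padded_k - k))

# Example: a group with k = 300 is padded to k = 384 (3 * 128).
x = torch.randn(64, 300)
assert pad_group_k(x).shape == (64, 384)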

Quick start

Requirements

  • Hopper architecture GPUs, sm_90a must be supported
  • Python 3.8 or above
  • CUDA 12.3 or above
    • 12.8 or above is highly recommended for the best performance
  • PyTorch 2.1 or above
  • CUTLASS 3.6 or above (can be cloned via the Git submodule)
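
The requirements above can be sanity-checked from Python before building. A small sketch using standard PyTorch introspection; compute capability (9, 0) corresponds to Hopper, i.e. sm_90a.

import sys
import torch

assert sys.version_info >= (3, 8), "Python 3.8 or above is required"

# Compare only the (major, minor) part of the PyTorch version string.
torch_version = tuple(int(v) for v in torch.__version__.split("+")[0].split(".")[:2])
assert torch_version >= (2, 1), "PyTorch 2.1 or above is required"

# CUDA version PyTorch was built against; 12.3+ is required, 12.8+ recommended.
print("CUDA (PyTorch build):", torch.version.cuda)

assert torch.cuda.is_available(), "no CUDA device visible"
assert torch.cuda.get_device_capability() == (9, 0), "a Hopper (sm_90a) GPU is required"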

Development

# Submodule must be cloned
git clone --recursive git@github.com:InternLM/AdaptiveGemm.git

# Make symbolic links for third-party (CUTLASS and CuTe) include directories
python setup.py develop

# Test JIT compilation
python tests/test_jit.py

# Test all GEMM implementations (normal, contiguous-grouped and masked-grouped)
python tests/test_varlen_groupm.py

Installation

python setup.py install
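
Once installed, calling the kernels should follow the original DeepGEMM convention. The sketch below assumes the package keeps DeepGEMM's module name (deep_gemm) and its contiguous m-grouped entry point (m_grouped_gemm_fp8_fp8_bf16_nt_contiguous), with FP8 operands accompanied by float32 scaling factors; both names and the exact input layout are assumptions here, so treat tests/test_varlen_groupm.py as the authoritative reference.

import torch
import deep_gemm  # assumed module name, inherited from the original DeepGEMM

# Two groups with different m lengths (the variable group-M feature);
# note that neither 96 nor 160 is a multiple of the block size 128.
group_ms, k, n = [96, 160], 7168, 4096
m, num_groups = sum(group_ms), len(group_ms)

# FP8 operands with float32 scales in DeepGEMM's format (assumed unchanged):
# 1x128 blocks for the LHS activations, 128x128 blocks for the RHS weights.
lhs = (torch.randn(m, k, device="cuda").to(torch.float8_e4m3fn),
       torch.ones(m, k // 128, device="cuda"))
rhs = (torch.randn(num_groups, n, k, device="cuda").to(torch.float8_e4m3fn),
       torch.ones(num_groups, n // 128, k // 128, device="cuda"))
out = torch.empty(m, n, device="cuda", dtype=torch.bfloat16)

# m_indices assigns each row of the LHS to its group.
m_indices = torch.cat([torch.full((g,), i, dtype=torch.int32, device="cuda")
                       for i, g in enumerate(group_ms)])

deep_gemm.m_grouped_gemm_fp8_fp8_bf16_nt_contiguous(lhs, rhs, out, m_indices)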

Citation

If you find our work useful, please cite:

@misc{su2025tmaadaptivefp8groupedgemm,
  title={TMA-Adaptive FP8 Grouped GEMM: Eliminating Padding Requirements in Low-Precision Training and Inference on Hopper}, 
  author={Zhongling Su and Rong Fu and Weihan Cao and Jianfei Gao and Minxi Jin and Zhilin Pei and Hui Wang},
  year={2025},
  eprint={2508.16584},
  archivePrefix={arXiv},
  primaryClass={cs.AR},
  url={https://arxiv.org/abs/2508.16584}, 
}

Original DeepGEMM

For detailed information on the original DeepGEMM, please refer to https://github.com/deepseek-ai/DeepGEMM
