qingquansong

Qingquan Song qingquansong

LLM & RecSys & AutoML @ LinkedIn

50 followers · 20 following

Pinned

linkedin/Liger-Kernel Public

Efficient Triton Kernels for LLM Training

Python · 4.7k stars · 284 forks

NVIDIA/TensorRT-LLM Public

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ · 9.8k stars · 1.2k forks

keras-team/autokeras Public

AutoML library for deep learning

Python · 9.2k stars · 1.4k forks

vllm-project/vllm Public

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 42.3k stars · 6.4k forks

datamllab/automl-in-action-notebooks Public

Jupyter notebooks for the code samples of the book "Automated Machine Learning in Action"

Jupyter Notebook · 91 stars · 46 forks

pytorch/ao Public

PyTorch native quantization and sparsity for training and inference

Python · 1.9k stars · 232 forks

117 contributions in the last year

March 2025

Created 3 commits in 1 repository

sgl-project/sglang 3 commits

Created a pull request in sgl-project/sglang that received 5 comments

Mar 18

Add deepseek style fused moe group gate selection kernel

Motivation PR adapted and improved from #3191 Rewrite Macro. Extended to support all power of 2 # expert & # expert group, also all # topk_group & #…

+675 −3 lines changed • 5 comments

Opened 4 other pull requests in 1 repository

sgl-project/sglang 1 open 1 closed 2 merged

[Not ready for merge] Remove macro definition for ROCM for __shfl_xor_sync
This contribution was made on Mar 15
Add deepseek style fused moe group gate selection kernel
This contribution was made on Mar 15
Fix per token fp8 quant precision
This contribution was made on Mar 13
Add moe topk softmax templated from vllm
This contribution was made on Mar 11

Reviewed 6 pull requests in 2 repositories

sgl-project/sglang 4 pull requests

[quantization] fix channelwise conversion with scalar weight scale
This contribution was made on Mar 19
Add deepseek style fused moe group gate selection kernel
This contribution was made on Mar 18
Add deepseek style fused moe group gate selection kernel
This contribution was made on Mar 17
Add moe topk softmax templated from vllm
This contribution was made on Mar 15

linkedin/Liger-Kernel 2 pull requests

Update README.md
This contribution was made on Mar 18
Refactor chunked preference functions and distillation base class
This contribution was made on Mar 3

Created an issue in linkedin/Liger-Kernel that received 2 comments

Mar 21

Support DAPO Chunked loss

🚀 The feature, motivation and pitch ByteDance DAPO is the open-sourced SOTA RL algorithm that achieves 50 points on AIME 2024 based on the Qwen2.5-…

2 comments


