[3rdparty, document] Updated documentation for triton fused_moe kernel tuning for AMD Instinct GPUs #2191

Merged 9 commits on Nov 27, 2024
14 changes: 14 additions & 0 deletions 3rdparty/amd/tuning/TUNING.md
@@ -92,6 +92,20 @@ TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_COORDINATE_DESCENT_TUNING=1 TORCHINDU

# Inference with a large improvement on AMD GPUs
TORCHINDUCTOR_FREEZING=1 your_script.sh
```
## 4. Fused MoE kernel
To maximize MoE kernel efficiency, use the script below to find the best launch configuration.

### Key parameters:
- **--model**: the MoE model to tune for; it automatically determines d_model, model_intermediate_size, and num_layers
- **--tp-size**: tensor-parallel size of the full model run, so that the kernel dimensions sharded by TP are set correctly
- **--batch**: M dimension of the MoE kernel; for the prefill MoE kernel this is batch*input_len, for the decode MoE kernel this is batch
- **--dtype**: computation data type

```bash
# Tuning
python benchmark_moe_rocm.py --model grok1 --tp-size 8 --dtype float8 --batch "32"

```
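Because the --batch value differs between the prefill and decode cases as described above, the kernel is typically tuned once for each. The commands below are an illustrative sketch only, assuming a hypothetical serving setup with batch size 32 and an average input length of 256 tokens:

```bash
# Prefill MoE kernel (hypothetical sizes): M = batch * input_len = 32 * 256 = 8192
python benchmark_moe_rocm.py --model grok1 --tp-size 8 --dtype float8 --batch "8192"

# Decode MoE kernel: M = batch = 32
python benchmark_moe_rocm.py --model grok1 --tp-size 8 --dtype float8 --batch "32"
```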

## Reference