
[BUG] fix moe benchmark when bs*seq is small #3382

Conversation

yiakwy-xpu-ml-framework-team
Contributor

Motivation

In "benchmark_deepseekv3_moe_align_blocks.py" change block_size 3 and run calculate_diff(batch_size=1, seq_len=4). The test will fail.

This is because expert_ids on the CUDA path is not zero-initialized and contains junk data.
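For illustration, a minimal sketch of the fix, assuming the benchmark sizes the buffer the way the standard moe_align_block_size setup does (the shape arithmetic and variable names below are illustrative, not copied verbatim from the PR):

```python
import torch

# Illustrative sizes for the failing case: batch_size=1, seq_len=4, block_size=3.
num_tokens, topk, block_size, num_experts = 4, 8, 3, 256
max_num_tokens_padded = num_tokens * topk + num_experts * (block_size - 1)
max_num_m_blocks = (max_num_tokens_padded + block_size - 1) // block_size

# Before: torch.empty leaves stale bytes from prior CUDA allocations, so
# blocks the kernel never writes (common when bs * seq is small) hold junk.
expert_ids = torch.empty(max_num_m_blocks, dtype=torch.int32, device="cuda")

# After: zero-initializing makes the unwritten padding slots deterministic,
# so the CUDA and Triton outputs can be compared element-wise.
expert_ids = torch.zeros(max_num_m_blocks, dtype=torch.int32, device="cuda")
```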

Modifications

After the fix, the script runs smoothly for:

  • calculate_diff(batch_size=4, seq_len=1024)
  • calculate_diff(batch_size=1, seq_len=4)
[Screenshot of the successful run, 2025-02-08 06:29:59]
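To see why the uninitialized buffer only bites when bs * seq is small, here is a standalone sketch in pure PyTorch; fill_first_n is a hypothetical stand-in for a kernel that writes only part of its output buffer:

```python
import torch

def fill_first_n(buf: torch.Tensor, n: int) -> None:
    # Stand-in for a kernel that only writes the first n slots
    # (what happens when bs * seq_len is small relative to the buffer).
    buf[:n] = torch.arange(n, dtype=buf.dtype, device=buf.device)

a = torch.empty(32, dtype=torch.int32, device="cuda")  # buggy: stale bytes
b = torch.zeros(32, dtype=torch.int32, device="cuda")  # fixed: deterministic
fill_first_n(a, 4)
fill_first_n(b, 4)
# a[4:] holds junk from a previous allocation, so an element-wise comparison
# against a zero-initialized reference can fail nondeterministically.
print(torch.equal(a, b))  # may print False
```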

Checklist

  • Format your code according to the Code Formatting with Pre-Commit.
  • Add unit tests as outlined in the Running Unit Tests.
  • Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

@yiakwy-xpu-ml-framework-team yiakwy-xpu-ml-framework-team changed the title fix moe benchmark when bs*seq is small [BUG] fix moe benchmark when bs*seq is small Feb 7, 2025
@zhyncs zhyncs enabled auto-merge (squash) February 8, 2025 07:37
@zhyncs zhyncs disabled auto-merge February 8, 2025 07:37
@zhyncs zhyncs merged commit 64480df into sgl-project:main Feb 8, 2025
1 check passed
@BBuf BBuf mentioned this pull request Feb 8, 2025