Do we have to trace a real system in order to analyze a 10,000-GPU training system? #164

Open
basicmi opened this issue Oct 31, 2024 · 1 comment
Labels
question Further information is requested

basicmi commented Oct 31, 2024

According to the ASTRA-sim 2.0 paper, the simulator runs on Chakra traces in order to "decouple parallelization strategies from the ASTRAsim implementation". Does that mean we have to trace a real 10,000-GPU AI training system before we can simulate and analyze a system at that scale?

Thanks

basicmi added the "question" label Oct 31, 2024
191220042 commented

I have the same question.
