Replies: 2 comments
-
FlashAttention has had a deterministic flag since v2.4. For FA version >= 2.4,
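This is a minimal sketch of the version gate the reply alludes to. It assumes the `deterministic` keyword argument of FlashAttention's `flash_attn_func`, which requests a deterministic backward pass and is assumed here to be available from v2.4 onward; check the signature of your installed version. The helper names `parse_version` and `flash_attn_kwargs` are hypothetical. The helper only builds keyword arguments from a version string, so it runs without a GPU.

```python
import re
from typing import Dict, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.4.2' or '2.5.0.post1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def flash_attn_kwargs(version: str) -> Dict[str, bool]:
    """Hypothetical helper: extra kwargs for flash_attn_func.

    Assumes the `deterministic` flag exists from FlashAttention 2.4 onward.
    Older versions get no extra kwargs, so their backward pass stays
    non-deterministic.
    """
    if parse_version(version) >= (2, 4):
        return {"deterministic": True}
    return {}


# Usage (needs flash-attn and a GPU, so it is shown only as a comment):
#   import flash_attn
#   from flash_attn import flash_attn_func
#   out = flash_attn_func(q, k, v, causal=True,
#                         **flash_attn_kwargs(flash_attn.__version__))
```

The version check is kept separate from the attention call so the decision can be logged or tested without importing flash-attn.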
-
Is the NCCL algorithm deterministic?
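The algorithm matters because floating-point addition is not associative. Two all-reduce algorithms that combine the same per-rank values in a different order can return different bits. The sketch below uses Python floats and two simplified reduction orders: a sequential, ring-like chain and a pairwise, tree-like combination. It illustrates the effect only and does not reproduce NCCL's actual schedules. The per-rank values are made up so that the rounding becomes visible.

```python
from typing import List


def ring_like_sum(values: List[float]) -> float:
    """Accumulate values one rank after another, like a simplified ring."""
    total = values[0]
    for v in values[1:]:
        total += v
    return total


def tree_like_sum(values: List[float]) -> float:
    """Combine values pairwise, level by level, like a simplified binary tree."""
    level = list(values)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # carry an unpaired element up to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


# Hypothetical gradient contributions from four ranks.
grads = [1e16, 1.0, -1e16, 1.0]
print(ring_like_sum(grads))  # 1.0
print(tree_like_sum(grads))  # 0.0
```

Each order is repeatable on its own: the same function on the same inputs always gives the same result. Under this simplified view, pinning one algorithm on a fixed topology is what matters for reproducibility. No particular algorithm is "the deterministic one".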
-
Issue Description:
I read the information about reproducibility. It says that deterministic training requires using --deterministic-mode, setting NCCL_ALGO, setting NVTE_ALLOW_NONDETERMINISTIC_ALGO=0, and not using --use-flash-attn.

I tested Megatron on two nodes with eight A800 GPUs each (TP=2, PP=2), training for 50 iterations. I ran this configuration multiple times and checked whether the saved models were identical across runs by comparing the parameters one by one. Setting NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 alone was enough to get identical model parameters across runs, so in my tests it appears to be the only setting that matters for reproducibility. Without this environment variable, each run saved different model parameters.

Questions:
1. Do NCCL_ALGO and --use-flash-attn cause non-deterministic training results?
2. NCCL_ALGO defaults to None. In that case, how does NCCL choose the algorithm, and how can I find out which algorithm is being selected?

Environment Details:
NVTE_ALLOW_NONDETERMINISTIC_ALGO=0
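The check described in the issue, comparing saved parameters one by one across runs, can be sketched as follows. This is a minimal, framework-agnostic sketch. `flatten` and `diff_state_dicts` are hypothetical names. Nested dicts are flattened into dotted keys, and the equality test is pluggable, so tensors can be compared bitwise while plain values fall back to `==`. Loading the checkpoints is left out because the file layout depends on the Megatron checkpoint format in use.

```python
from typing import Any, Callable, Dict, List


def flatten(tree: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts such as {'model': {'w': ...}} into {'model.w': ...}."""
    flat: Dict[str, Any] = {}
    for key, value in tree.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def diff_state_dicts(
    a: Dict[str, Any],
    b: Dict[str, Any],
    equal: Callable[[Any, Any], bool] = lambda x, y: x == y,
) -> List[str]:
    """Return keys that exist in only one run or whose values differ."""
    flat_a, flat_b = flatten(a), flatten(b)
    mismatched = sorted(set(flat_a) ^ set(flat_b))  # keys present in only one run
    for key in sorted(set(flat_a) & set(flat_b)):
        if not equal(flat_a[key], flat_b[key]):
            mismatched.append(key)
    return mismatched


# Toy example: plain lists stand in for tensors.
run1 = {"model": {"layer0.weight": [0.1, 0.2], "layer0.bias": [0.0]}, "iteration": 50}
run2 = {"model": {"layer0.weight": [0.1, 0.3], "layer0.bias": [0.0]}, "iteration": 50}
print(diff_state_dicts(run1, run2))  # ['model.layer0.weight']
```

For real checkpoints, one option is to load each file with `torch.load(path, map_location="cpu", weights_only=False)` and pass `equal=lambda x, y: torch.equal(x, y) if isinstance(x, torch.Tensor) else x == y`. Non-parameter entries, such as saved arguments, may differ for reasons unrelated to training determinism and can be excluded before comparing.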
Thank you for your assistance.
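This sketch addresses the second question, how NCCL chooses when NCCL_ALGO is unset. NCCL picks an algorithm and protocol for each collective from an internal performance model. The model estimates each candidate's time from a latency term plus a size-over-bandwidth term for the given message size and topology, and NCCL takes the fastest candidate. The toy model below illustrates only that idea. Its latency and bandwidth numbers are invented, and NCCL's real model, tables, and candidate set differ and change with version and hardware. To see what NCCL actually does on a cluster, the usual starting point is its debug output (`NCCL_DEBUG=INFO`, optionally narrowed with `NCCL_DEBUG_SUBSYS`, e.g. `TUNING`). The exact log lines vary between NCCL versions.

```python
from typing import Dict, Tuple

# Invented (latency in microseconds, bandwidth in GB/s) per algorithm.
# These are NOT NCCL's real numbers; they only mimic the usual trade-off
# between a low-latency algorithm and a high-bandwidth one.
HYPOTHETICAL_MODEL: Dict[str, Tuple[float, float]] = {
    "Tree": (10.0, 50.0),
    "Ring": (40.0, 100.0),
}


def estimated_time_us(n_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """time = latency + size / bandwidth, using 1 GB/s == 1e3 bytes per microsecond."""
    return latency_us + n_bytes / (bandwidth_gbps * 1e3)


def pick_algorithm(
    n_bytes: int, model: Dict[str, Tuple[float, float]] = HYPOTHETICAL_MODEL
) -> str:
    """Return the algorithm with the smallest estimated time for this message size."""
    return min(model, key=lambda algo: estimated_time_us(n_bytes, *model[algo]))


print(pick_algorithm(64 * 1024))          # Tree
print(pick_algorithm(256 * 1024 * 1024))  # Ring
```

In this toy model the crossover sits at 3,000,000 bytes. The choice depends on message size, so repeated runs with the same model configuration and hardware tend to see the same choices. Pinning NCCL_ALGO removes that dependence altogether.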