Replies: 1 comment
Marking as stale. No activity in 60 days.
Your question
Megatron's RotaryEmbedding is an nn.Module.
If the entire model is cast to another dtype with to(torch.bfloat16), the dtype of inv_freq changes along with it.
However, keeping inv_freq in float32 for the subsequent sin/cos calculations seems like the safer choice.
Could this cast cause a precision issue that introduces avoidable numerical error?
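To make the concern concrete, here is a minimal sketch of how the effect could be measured. `ToyRotary` and `max_cos_error` are hypothetical names written for this illustration and are not Megatron's code; the sketch assumes, as the question describes, that `inv_freq` is a registered buffer and therefore follows `Module.to(dtype)`. The default `dim`, `base`, and `seq_len` values are also only illustrative.

```python
import torch


class ToyRotary(torch.nn.Module):
    """Hypothetical stand-in for a rotary embedding module (not Megatron's class)."""

    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        # Assumption: inv_freq is a registered buffer, so Module.to(dtype) casts it.
        self.register_buffer("inv_freq", inv_freq)


def max_cos_error(dim: int = 128, seq_len: int = 4096) -> float:
    """Largest |cos| difference caused only by rounding inv_freq to bfloat16."""
    ref = ToyRotary(dim)                      # inv_freq stays float32
    cast = ToyRotary(dim).to(torch.bfloat16)  # inv_freq is cast with the module
    assert cast.inv_freq.dtype == torch.bfloat16

    pos = torch.arange(seq_len, dtype=torch.float32)
    angles_ref = torch.outer(pos, ref.inv_freq)
    # Upcast again so the only difference is the rounding already applied to inv_freq.
    angles_cast = torch.outer(pos, cast.inv_freq.float())
    return (angles_ref.cos() - angles_cast.cos()).abs().max().item()


if __name__ == "__main__":
    print(f"max |cos| error from bfloat16 inv_freq: {max_cos_error():.4f}")
```

bfloat16 keeps only 8 significant bits, so each rounded frequency can be off by a fraction of a percent, and that error is multiplied by the position index before sin/cos is taken. One possible mitigation, under the same assumption, is to recompute `inv_freq` in float32 inside the forward pass, or to upcast it before forming the angles, rather than relying on the stored buffer.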