Hey Aladdin, thanks for your tutorials!
I've been implementing the Transformer architecture and learning about einsum. Comparing your implementation (with einsum) against one without einsum, I found differences in the final result. Here is the code for reproducibility:
The attention scores match perfectly, but the final attention output doesn't. With my inputs, here is the result:
It seems that the values aren't wrong, they are just transposed? I'm a newbie with einsum and couldn't figure it out. I hope someone can find the solution for this :)
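
A minimal sketch of how a "same values, just transposed" mismatch like this can arise. Everything here is an assumption for illustration and not the code from the tutorial: the tensor shapes, the variable names, the einsum string, and the use of NumPy (whose `einsum` uses the same subscript notation) in place of PyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)
N, heads, q_len, k_len, head_dim = 2, 4, 3, 3, 5  # hypothetical sizes

attention = rng.standard_normal((N, heads, q_len, k_len))  # (N, heads, q, k)
values = rng.standard_normal((N, k_len, heads, head_dim))  # (N, k, heads, d)

# einsum version: the output axes come out ordered (N, q, heads, d)
out_einsum = np.einsum("nhql,nlhd->nqhd", attention, values)

# matmul version: move heads in front of k in values, result is (N, heads, q, d)
out_matmul = attention @ values.transpose(0, 2, 1, 3)

# Reshaping out_matmul directly interleaves the heads and q axes incorrectly
wrong = out_matmul.reshape(N, q_len, heads * head_dim)
# Swapping heads and q first restores the layout the einsum version produces
right = out_matmul.transpose(0, 2, 1, 3).reshape(N, q_len, heads * head_dim)
expected = out_einsum.reshape(N, q_len, heads * head_dim)

print(np.allclose(right, expected))  # True
print(np.allclose(wrong, expected))  # False: same numbers, different positions
```

If the non-einsum version ends with a reshape and no axis swap before it, that would be one place to check, since every value is present but lands in a different position.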