Not sure if a bug or not... #37
jamestwebber started this conversation in General
Replies: 2 comments 3 replies
I opened #35 because I thought I spotted a bug in the weight standardization code, but @vballoli says it's fine, so I'm opening a discussion (which I never knew existed!) to figure it out.
The code computes the fan-in from `shape[0:]`. This was suspicious to me, as `shape[0:]` just makes a needless copy of `shape`, so `fan_in` comes out as the size of the entire tensor. The code in the deepmind repository reads `shape[:-1]`, which means `fan_in` is the product over all but the last dimension, which makes more sense to me. Maybe I am missing a `pytorch` vs `jax` implementation difference? What's the reason for the discrepancy?
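To make the discrepancy concrete, here is a small sketch; the shapes are hypothetical, chosen only to show how each slicing convention interacts with each framework's weight layout:

```python
import numpy as np

# Hypothetical 3x3 convolution mapping 32 channels to 64 channels.
oihw = (64, 32, 3, 3)  # PyTorch layout: (out, in, kH, kW)
hwio = (3, 3, 32, 64)  # JAX/Haiku layout: (kH, kW, in, out)

# shape[0:] is just the whole shape, so its product is the total
# element count of the weight tensor, not a fan-in.
print(np.prod(oihw[0:]))   # 18432

# The deepmind code's shape[:-1] drops the trailing output-channel
# axis of the HWIO layout, leaving the true fan-in.
print(np.prod(hwio[:-1]))  # 288 = 3 * 3 * 32

# On PyTorch's OIHW layout, the same fan-in comes from dropping the
# leading output-channel axis instead.
print(np.prod(oihw[1:]))   # 288
```

If that layout reading is right, neither `shape[0:]` nor a literal `shape[:-1]` gives the fan-in for a PyTorch weight, which is what the replies below get at.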
-
Thanks for raising this! I misinterpreted the issue yesterday, sorry about that. I've fixed the fan-in. Do let me know if these implementations match.
-
I don't think the weight shape is the same as in Jax. See https://github.com/rwightman/pytorch-image-models/blob/4ea593196414684d2074cbb81d762f3847738484/timm/models/layers/std_conv.py#L79.
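For reference, here is a minimal sketch of weight standardization over PyTorch's OIHW layout, in the spirit of the linked timm code and the deepmind formula; it is an illustrative reimplementation under those assumptions (the class name and epsilon are placeholders), not a copy of either codebase:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Illustrative weight-standardized conv for OIHW weights."""

    def forward(self, x):
        w = self.weight                # (out, in, kH, kW)
        fan_in = w[0].numel()          # in * kH * kW, i.e. prod(shape[1:])
        # Standardize each output channel's filter, then scale by the
        # fan-in, mirroring (w - mean) / sqrt(var * fan_in + eps).
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        w = (w - mean) / torch.sqrt(var * fan_in + 1e-4)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

Because the output-channel axis leads in OIHW but trails in HWIO, `w[0].numel()` (i.e. `prod(shape[1:])`) plays the role here that `prod(shape[:-1])` plays in the JAX code.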