When people actually write models in pytorch, they typically follow this structure: namely, you subclass nn.Module and define __init__ and forward. The transformer code uses this structure, but none of the pytorch code for the other models does. This makes things confusing at both ends:
- PytorchRNN is not how anyone would choose to write the model, so students who don't know pytorch will be confused, and students who do will be misled.
- The students are confused by the way the transformer works (for example, many had never seen an nn.Linear module before and were trying to treat it like it was a weight matrix; see the sketch after this list).
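For context, here is a minimal sketch of the idiomatic structure described above (subclass nn.Module, define __init__ and forward, treat nn.Linear as a callable module rather than a weight matrix). The class name and sizes are made up for illustration and are not code from this repo:

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):  # hypothetical example, not from this codebase
    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        # nn.Linear is a module that owns its own weight and bias parameters;
        # it is not a bare weight matrix.
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # Modules are applied by calling them, not via torch.matmul(x, self.hidden).
        h = torch.relu(self.hidden(x))
        return self.output(h)

model = TinyClassifier(input_dim=10, hidden_dim=32, num_classes=3)
logits = model(torch.randn(4, 10))  # shape: (4, 3)
```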
Things should clearly not be this way: we're teaching pytorch badly for the sole purpose of making the pytorch code look as much like numpy as possible. Why are we doing that? Why are we even having them implement these models in numpy? I think the original reason is that students were more familiar with numpy, but I doubt that's a good enough reason to keep two parallel implementations in different frameworks.
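To make the contrast concrete, a hypothetical sketch of the numpy-mimicking style being criticized might look like the following (parameters held as raw tensors and applied with explicit matmuls). This is illustrative only, not the actual PytorchRNN code:

```python
import torch

# Hypothetical numpy-flavored style: raw weight tensors and explicit
# matrix multiplication instead of nn.Module layers.
W_hidden = torch.randn(10, 32, requires_grad=True)
b_hidden = torch.zeros(32, requires_grad=True)
W_out = torch.randn(32, 3, requires_grad=True)
b_out = torch.zeros(3, requires_grad=True)

def forward(x):
    h = torch.relu(x @ W_hidden + b_hidden)
    return h @ W_out + b_out

logits = forward(torch.randn(4, 10))  # works, but hides how pytorch code is normally organized
```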