Fix GRU to match pytorch (#2701). #2704
Conversation
Codecov Report
Attention: Patch coverage is 88.46% (46 of the 52 added lines are hit; 6 are missed).
Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2704   +/-   ##
=======================================
  Coverage   83.20%   83.20%
=======================================
  Files         819      819
  Lines      106814   106866    +52
=======================================
+ Hits        88870    88916    +46
- Misses      17944    17950     +6

☔ View full report in Codecov by Sentry.
Thanks for bringing this up and tackling it yourself 🙏
Fun fact: the reset gate changes don't originate from pytorch 😄
The implementation we had is based on the latest v3 revisions (published at EMNLP) and applies the reset gate to the hidden state before the matrix multiplication.
The changes in your PR are based on the original v1 and apply the reset gate after the matrix multiplication; both forms are written out below.
Curiously, pytorch cites efficiency for its differing implementation (without much explanation). If you're interested, check out this awesome explanation behind the motivation to move the reset gate.
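To make the difference concrete, here are the two "new"-gate formulations in standard GRU notation ($\odot$ is the elementwise product); this is a reference sketch, not code from either library:

```latex
% v1 / PyTorch: reset gate applied after the matrix multiplication
n_t = \tanh\bigl(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn})\bigr)

% latest v3 revision (EMNLP): reset gate applied to the hidden state first
n_t = \tanh\bigl(W_{in} x_t + b_{in} + W_{hn}\,(r_t \odot h_{t-1}) + b_{hn}\bigr)
```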
Your implementation LGTM! But what do you think about supporting both via a config? And we could provide the references I just linked in the doc.
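For illustration, such a config switch could be as small as an enum selecting the formulation; the names here are hypothetical, not burn's actual API:

```rust
/// Hypothetical switch between the two GRU "new"-gate formulations
/// (illustrative sketch only; not burn's actual API).
pub enum ResetGateMode {
    /// v1 / PyTorch: n_t = tanh(W_in·x + b_in + r ⊙ (W_hn·h + b_hn))
    AfterMatmul,
    /// latest v3 revision: n_t = tanh(W_in·x + b_in + W_hn·(r ⊙ h) + b_hn)
    BeforeMatmul,
}
```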
Thank you 🙏
I just had to fix the clippy doc issue, and I covered both versions in the tests.
Pull Request Template
Checklist
The run-checks all script has been executed.
Related Issues/PRs
This addresses issue #2701.
Changes
- Update the GRU implementation of the "new" gate to match the PyTorch implementation. This can change numerical output in some cases.
- Add a GRU unit test with sequence length > 1.
- Fix the GRU input state dimensions and hidden state handling. This is an API change, since the dimensions of the optional hidden state input are corrected to the right sizes.

These changes affect numerical results and change the API slightly. I think just updating to the correct API dimensions is the best option, since the previous implementation was incorrect, not merely different from PyTorch.
Testing
These changes were tested with a small unit test. For this test, the expected values were computed manually from the GRU equations; a sketch of such a hand computation follows.
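A minimal sketch of how expected values can be computed by hand, using the PyTorch formulation with scalar weights (d_input = d_hidden = 1); all names and numbers here are illustrative assumptions, not taken from the PR:

```rust
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// One GRU step for scalar input/hidden state (PyTorch "new"-gate form):
///   r  = σ(w_ir·x + w_hr·h + b_r)
///   z  = σ(w_iz·x + w_hz·h + b_z)
///   n  = tanh(w_in·x + b_in + r · (w_hn·h + b_hn))
///   h' = (1 - z)·n + z·h
/// Note b_in and b_hn stay separate: b_hn sits inside the reset-gate product.
#[allow(clippy::too_many_arguments)]
fn gru_step(
    x: f32, h: f32,
    w_ir: f32, w_hr: f32, b_r: f32,
    w_iz: f32, w_hz: f32, b_z: f32,
    w_in: f32, w_hn: f32, b_in: f32, b_hn: f32,
) -> f32 {
    let r = sigmoid(w_ir * x + w_hr * h + b_r);
    let z = sigmoid(w_iz * x + w_hz * h + b_z);
    let n = (w_in * x + b_in + r * (w_hn * h + b_hn)).tanh();
    (1.0 - z) * n + z * h
}

fn main() {
    // Fold a length-3 sequence through the cell, starting from h = 0.
    let mut h = 0.0;
    for &x in &[0.1_f32, -0.2, 0.3] {
        h = gru_step(x, h, 0.5, 0.5, 0.0, 0.5, 0.5, 0.0, 0.5, 0.5, 0.0, 0.0);
    }
    println!("expected hidden state after 3 steps: {h}");
}
```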
I also tested these changes against PyTorch. The weights and biases from a PyTorch GRU were saved and then split apart per gate using a custom script. Input and output tensors were saved separately and loaded into a test Rust program. Everything was randomly initialized. With this PR, the results from burn and torch were almost identical (matching to about six decimal digits). I tried input sizes of 1, 2, and 8; hidden sizes of 1, 2, and 8; and sequence lengths of 1, 2, and 3.