fix: torch.checkpoint() incorrectly wraps single forward step in original codebase. #274

COLAZERO2 · 2025-08-06T12:29:11Z

Bug Fixes:
Fixes a bug that kept the loss high, because of unstable gradients, when training with gradient checkpointing enabled. With the fix, the acceptance rate increases as intended when gradient checkpointing is used to save memory.

Modification:
Refactors the draft model’s forward function by separating target-model hidden-state retrieval from the draft model’s layer flow. The entire set of training-time test predictions over the drafting length is now wrapped as one unit, replacing the torch.checkpoint() loops around single forward steps that previously produced a complicated computation graph and incorrect gradient flow.

Wrapping each single forward step separately caused the loss to remain high, because of unstable gradients, when training with gradient checkpointing enabled. With the fix, accuracy increases as intended when gradient checkpointing is enabled.
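As a rough illustration of the change described above, here is a minimal sketch of wrapping the whole unrolled draft loop in one checkpoint call instead of checkpointing each step inside the loop. It is not the code from this PR: `ToyDraft`, `step`, `unroll`, and the `length` and `use_checkpoint` parameters are hypothetical names, the single `nn.Linear` layer only stands in for the draft model, and the sketch assumes the PyTorch API `torch.utils.checkpoint.checkpoint` with `use_reentrant=False`.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ToyDraft(nn.Module):
    """Hypothetical stand-in for a draft model: one layer reused at every draft step."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.layer = nn.Linear(hidden, hidden)

    def step(self, h: torch.Tensor) -> torch.Tensor:
        # One draft step; each step consumes the previous step's output.
        return torch.tanh(self.layer(h))

    def unroll(self, h: torch.Tensor, length: int) -> torch.Tensor:
        # All draft steps over the drafting length, with no checkpoint calls inside.
        outs = []
        for _ in range(length):
            h = self.step(h)
            outs.append(h)
        return torch.stack(outs)

    def forward(self, h: torch.Tensor, length: int = 4, use_checkpoint: bool = True) -> torch.Tensor:
        if use_checkpoint and self.training:
            # A single checkpoint around the whole unrolled loop: intermediate
            # activations are recomputed once during the backward pass.
            return checkpoint(self.unroll, h, length, use_reentrant=False)
        return self.unroll(h, length)
```

One way to sanity-check a refactor like this is to compare parameter gradients with checkpointing on and off on a deterministic input; they should agree up to floating-point tolerance.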

jasonyong · 2025-10-27T02:44:45Z

It works.

fix: torch.checkpoint() incorrectly wraps single forward step

1acf8bd


KerwinKai mentioned this pull request Sep 3, 2025

Training loss does not decrease #286

Open
