Commit

Begin implementing a training edge case to pick up where the training left off if interrupted.
jshuadvd committed Jul 7, 2024
1 parent b7d2497 commit 2fe7e9e
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion train.py
@@ -267,7 +267,15 @@ def train(
 )

 # Save checkpoint
-accelerator.save_state(f"checkpoint_epoch_{epoch}.pt")
+accelerator.save_state(
+    {"epoch": epoch, "best_val_loss": best_val_loss},
+    f"checkpoint_epoch_{epoch}.pt",
+)
+
+# Save latest checkpoint
+accelerator.save_state(
+    {"epoch": epoch, "best_val_loss": best_val_loss}, "checkpoint_latest.pt"
+)

 # Early stopping
 if avg_val_loss < best_val_loss:

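A note on the change above: as far as I know, the first positional parameter of Hugging Face Accelerate's `Accelerator.save_state` is `output_dir`, so a dict passed in that position would be treated as the output location rather than as extra data to store. One way to keep `epoch` and `best_val_loss` for resuming is to write them to a small JSON file inside the checkpoint directory. The sketch below uses only the standard library; the helper names and the `training_meta.json` file name are hypothetical and not part of `train.py`.

```python
import json
import os
import tempfile
from pathlib import Path

META_NAME = "training_meta.json"  # hypothetical file name, not from train.py


def save_training_meta(checkpoint_dir, epoch, best_val_loss):
    """Write resume metadata into checkpoint_dir via a temp file and rename."""
    ckpt = Path(checkpoint_dir)
    ckpt.mkdir(parents=True, exist_ok=True)
    # Assumption: in train.py, accelerator.save_state(str(ckpt)) would be
    # called around here to write model/optimizer state to the same directory.
    payload = {"epoch": epoch, "best_val_loss": best_val_loss}
    fd, tmp_path = tempfile.mkstemp(dir=ckpt, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    # os.replace swaps the file in one step, so an interruption during the
    # write does not leave a half-written metadata file behind.
    os.replace(tmp_path, ckpt / META_NAME)


def load_training_meta(checkpoint_dir):
    """Return (start_epoch, best_val_loss); fresh-run defaults if no file."""
    meta_file = Path(checkpoint_dir) / META_NAME
    if not meta_file.exists():
        return 0, float("inf")
    meta = json.loads(meta_file.read_text())
    # Resume from the epoch after the last one that finished.
    return meta["epoch"] + 1, meta["best_val_loss"]
```

At the top of `train()`, something like `start_epoch, best_val_loss = load_training_meta("checkpoint_latest")` followed by a loop starting at `start_epoch` would then skip completed epochs; the directory name and loop shape here are assumptions about the surrounding code.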