Remove do_not_average_loss; undo Megatron loss averaging in RL code #1940
Summary
Instead of passing `do_not_average_loss=True` to Megatron's `forward_backward_func`, we now let Megatron apply its default loss averaging (`output_tensor *= cp_group_size; output_tensor /= num_microbatches`) and undo it in `forward_step_arbitrary_loss` by applying the inverse (`loss *= num_microbatches / cp_size`).

This removes our dependency on the upstream `do_not_average_loss` option in Megatron-LM (ref: PR 2951).
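For intuition, here is a minimal standalone sketch of the inverse scaling. The function name and signature are illustrative only, not the exact NeMo-RL wrapper:

```python
# Minimal sketch (toy code, not the actual NeMo-RL wrapper). Megatron's
# default averaging multiplies the returned loss by cp_group_size and
# divides by num_microbatches, so pre-scaling the loss by the inverse
# factor, num_microbatches / cp_size, makes the two operations cancel.
import torch

def undo_megatron_loss_averaging(
    loss: torch.Tensor, num_microbatches: int, cp_size: int
) -> torch.Tensor:
    return loss * num_microbatches / cp_size

loss = torch.tensor(2.5)
pre_scaled = undo_megatron_loss_averaging(loss, num_microbatches=4, cp_size=2)
# Simulate Megatron's subsequent averaging; the net effect is the identity.
after_megatron = pre_scaled * 2 / 4
assert torch.allclose(after_megatron, loss)
```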
Changes

- `nemo_rl/models/megatron/common.py`: Rename `cp_normalize` → `undo_megatron_loss_averaging`, add a `num_microbatches` param, replace the `_div_by_cp_size` wrapper with `_undo_megatron_loss_averaging`, which applies `loss * num_microbatches / cp_size`
- `nemo_rl/models/policy/workers/megatron_policy_worker.py`: Remove `do_not_average_loss=True` from the `forward_backward_func` call, pass `num_microbatches` to the `forward_step` partial
- `tests/unit/algorithms/test_sequence_packing_gradients.py`: Update the call to `forward_step_arbitrary_loss` to use the new parameter names
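A toy end-to-end check of the cancellation across microbatches (pure PyTorch, independent of Megatron; all names are illustrative):

```python
import torch

def fake_megatron_reduce(per_mb_losses, num_microbatches, cp_size):
    # Mimic Megatron's default: scale each microbatch loss by
    # cp_size / num_microbatches, then accumulate over microbatches.
    return sum(l * cp_size / num_microbatches for l in per_mb_losses)

num_microbatches, cp_size = 4, 2
raw = [torch.tensor(float(i + 1)) for i in range(num_microbatches)]  # 1..4

# The forward step pre-scales each loss by the inverse factor, so
# Megatron's averaging reduces back to the un-averaged total.
pre_scaled = [l * num_microbatches / cp_size for l in raw]
total = fake_megatron_reduce(pre_scaled, num_microbatches, cp_size)
assert torch.allclose(total, sum(raw))  # 1 + 2 + 3 + 4 = 10
```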