Commit 85ec396
Enhance TrainPipelineSparseDist logging to help differentiate data loading patterns in train pipeline (#3350)
Summary:
Pull Request resolved: #3350
Observed inconsistent data loading behavior in APS train_module_train_step. Three batches are expected to load on the first invocation of the train loop, but the trace sometimes shows only one batch load ([link](https://www.internalfb.com/intern/sbdive/?id=tree%2Fttfb%2Fttfb_ai_lab_APS_mtml_ctr_cmf_rc1_baseline_gpu-f788555024-fbd033a0-89b2-4b72-b540-346901657b25-treatment-1&bucket=sbdive)), even after shortening the trace interval from 500ms to 50ms. Added logs to differentiate the data loading patterns.
Perf impact: logs are added only when data loader is exhausted
Reviewed By: andywag
Differential Revision: D81418443
fbshipit-source-id: 98ccb5cb480bf31572e99b9796cf375ef676d125
1 parent 60f7f87 commit 85ec396
1 file changed, +17 -0 lines changed

| Original file lines | Diff lines | Diff line change |
|---|---|---|
| 446-451 | 446-456 | 5 lines added (diff lines 449-453) |
| 624-629 | 629-635 | 1 line added (diff line 632) |
| 637-642 | 643-649 | 1 line added (diff line 646) |
| 801-807 | 808-821 | 7 lines added (diff lines 811-813 and 815-818) |
| 820-825 | 834-842 | 3 lines added (diff lines 837-839) |
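The contents of the 17 added lines are not shown above. As a rough illustration of the pattern the summary describes (a pipeline that loads several batches up front, then logs only once the data loader runs dry), here is a minimal sketch. It is not the TorchRec code: the class `PrefetchPipelineSketch`, its `loads_per_call` counter, the depth of 3, and the log message are all hypothetical, chosen only to make the "3 batches on the first call, 1 afterwards" behavior visible.

```python
import logging
from collections import deque
from typing import Deque, Iterator, List, Optional

logger = logging.getLogger("pipeline_sketch")


class PrefetchPipelineSketch:
    """Hypothetical stand-in for a prefetching train pipeline (not TorchRec code)."""

    def __init__(self, depth: int = 3) -> None:
        self._depth = depth                    # assumed number of in-flight batches
        self._batches: Deque[int] = deque()    # batches loaded but not yet consumed
        self._exhausted = False
        self.loads_per_call: List[int] = []    # batches loaded in each progress() call

    def _next_batch(self, it: Iterator[int]) -> Optional[int]:
        """Pull one batch; log once, and only when the iterator is exhausted."""
        if self._exhausted:
            return None
        batch = next(it, None)
        if batch is None:
            self._exhausted = True
            # Illustrative log line: emitted only on exhaustion, so the
            # steady-state path pays no logging cost.
            logger.info(
                "dataloader exhausted with %d batch(es) still queued",
                len(self._batches),
            )
        return batch

    def progress(self, it: Iterator[int]) -> int:
        """Top up the queue to `depth`, then hand back the oldest batch."""
        loaded = 0
        while not self._exhausted and len(self._batches) < self._depth:
            batch = self._next_batch(it)
            if batch is None:
                break
            self._batches.append(batch)
            loaded += 1
        self.loads_per_call.append(loaded)
        if not self._batches:
            raise StopIteration
        return self._batches.popleft()
```

Driving the sketch with a five-element iterator and inspecting `loads_per_call` afterwards shows the first call loading three batches and later calls loading one or none, which is the kind of distinction the added logs are meant to surface in a trace.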