Experimental setup: tp8pp2cp2 (tensor parallel 8, pipeline parallel 2, context parallel 2), 4 nodes with 8 GPUs each, global batch size 128
Model: llama-3.1-8b
A single iteration takes about two and a half hours:
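The parallel layout in this setup can be checked with a little arithmetic. This is a minimal sketch, assuming a Megatron-LM-style trainer where the world size factors as TP × PP × CP × DP. Under that assumption, 32 GPUs divided by 8 × 2 × 2 leaves a data-parallel size of 1, so all 128 samples of the global batch are processed by a single model replica through gradient accumulation. The function names and the micro-batch size of 1 are hypothetical; the post does not state the micro-batch size.

```python
def data_parallel_size(num_nodes: int, gpus_per_node: int, tp: int, pp: int, cp: int) -> int:
    """Return DP = world_size / (TP * PP * CP), assuming that factorization."""
    world_size = num_nodes * gpus_per_node
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(f"world size {world_size} is not divisible by TP*PP*CP={model_parallel}")
    return world_size // model_parallel


def num_microbatches(global_batch: int, micro_batch: int, dp: int) -> int:
    """Micro-batches accumulated per iteration on each data-parallel replica."""
    per_step = micro_batch * dp
    if global_batch % per_step != 0:
        raise ValueError(f"global batch {global_batch} is not divisible by micro_batch*DP={per_step}")
    return global_batch // per_step


dp = data_parallel_size(num_nodes=4, gpus_per_node=8, tp=8, pp=2, cp=2)
# micro_batch=1 is a hypothetical value; the post does not give it.
micro_batches = num_microbatches(global_batch=128, micro_batch=1, dp=dp)
print(f"DP size: {dp}, micro-batches per iteration: {micro_batches}")
# DP size: 1, micro-batches per iteration: 128
```

With DP = 1, each iteration's wall-clock time scales with the number of micro-batches that one replica has to run sequentially.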
[2024-12-30 13:25:22] [2024-12-30 13:25:21] iteration 1/ 93 | consumed samples: 128 | elapsed time per iteration (ms): 9037407.9 | throughput per GPU (TFLOP/s/GPU): 14.6 | learning rate: 9.997433E-06 | global batch size: 128 | lm loss: 1.564099E+00 | loss scale: 1.0 | grad norm: 0.672 | number of skipped iterations: 0 | number of nan iterations: 0 |
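The numbers in this log line can be turned into wall-clock and compute totals. This is a minimal sketch that uses only the printed values. It assumes that every remaining iteration runs as slowly as iteration 1, which may include one-time startup cost, so the projection is rough. It also assumes that "throughput per GPU" applies to all 32 GPUs. The variable names are illustrative.

```python
MS_PER_HOUR = 1000 * 60 * 60

elapsed_ms_per_iter = 9037407.9   # "elapsed time per iteration (ms)" from the log
total_iters = 93                  # "iteration 1/ 93"
global_batch = 128                # "global batch size"
num_gpus = 32                     # 4 nodes x 8 GPUs
tflops_per_gpu = 14.6             # "throughput per GPU (TFLOP/s/GPU)"

hours_per_iter = elapsed_ms_per_iter / MS_PER_HOUR
seconds_per_sample = elapsed_ms_per_iter / 1000 / global_batch
# Assumes every iteration runs as slowly as iteration 1.
projected_hours = hours_per_iter * total_iters
# Total FLOPs the logger attributes to one iteration, summed over all GPUs.
flops_per_iter = tflops_per_gpu * 1e12 * num_gpus * (elapsed_ms_per_iter / 1000)

print(f"{hours_per_iter:.2f} h per iteration")
print(f"{seconds_per_sample:.1f} s per sample")
print(f"{projected_hours:.1f} h ({projected_hours / 24:.1f} days) for {total_iters} iterations")
print(f"{flops_per_iter:.2e} FLOPs per iteration")
```

With the logged values, this works out to roughly 2.51 h per iteration and about 70.6 s per sample. The full 93 iterations would then take around 233 h, or about 9.7 days.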
Is this normal? Are there any parameters that need to be tuned specifically for long-context training?