Optimize inference performance of ERNIE on P40 GPU #165
Profiling results
Optimization plan
Related work from Intel
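Determine whether the reshape ops in the model are necessary and whether they can be removed; many of them appear to have identical input and output shapes. @Xreki
Conclusion: they cannot simply be removed.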
Summary of optimization results
Building on version 1, inference additionally used …
Method 1: use …
Implement a GPU kernel for fc: PaddlePaddle/Paddle#19687
Fusion of fc + elementwise_add + layer_norm
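A fused kernel has to produce exactly what the unfused op sequence produces. The sketch below is a minimal pure-Python reference for one row, assuming the pattern is `layer_norm(fc(x) + residual)` as in the BERT/ERNIE encoder, with fc weights laid out as `[in_dim][out_dim]`. The function names and the `eps` default are illustrative assumptions, not Paddle's API; such a reference is useful for checking a fused GPU kernel's output.

```python
import math
from typing import List, Sequence


def fc(x: Sequence[float], w: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Fully connected layer for one row: y[j] = sum_i x[i] * w[i][j] + b[j]."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def layer_norm(v: Sequence[float], gamma: Sequence[float], beta: Sequence[float],
               eps: float = 1e-5) -> List[float]:
    """Normalize v to zero mean / unit variance, then scale by gamma and shift by beta."""
    n = len(v)
    mean = sum(v) / n
    var = sum((t - mean) ** 2 for t in v) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(t - mean) * inv_std * g + bt for t, g, bt in zip(v, gamma, beta)]


def fc_add_layernorm(x, w, b, residual, gamma, beta, eps: float = 1e-5) -> List[float]:
    """Hypothetical reference for the fused op: layer_norm(fc(x) + residual)."""
    h = fc(x, w, b)                                     # fc output
    summed = [hv + rv for hv, rv in zip(h, residual)]   # elementwise_add
    return layer_norm(summed, gamma, beta, eps)         # layer_norm


out = fc_add_layernorm([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                       [1.0, -1.0], [1.0, 1.0], [0.0, 0.0])
print(out)
```

In the unfused graph, `h` and `summed` are separate tensors that are typically written to and re-read from GPU global memory; fusing the three ops lets one kernel keep them on-chip, which is the usual motivation for this kind of fusion.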
Multi-head attention fusion
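For reference, the sketch below shows the attention computation that a fused multi-head attention op replaces. This is a minimal pure-Python version under stated assumptions: Q, K and V have already been projected (shape `seq_len × hidden`), heads are contiguous slices of the hidden dimension, and `bias` is an optional additive attention mask. All names here are hypothetical, not the Paddle op's interface.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def multi_head_attention(q: Matrix, k: Matrix, v: Matrix, num_heads: int,
                         bias: Optional[Matrix] = None) -> Matrix:
    """softmax(Q_h K_h^T / sqrt(d) + bias) V_h for every head h, heads concatenated."""
    seq_len, hidden = len(q), len(q[0])
    assert hidden % num_heads == 0, "hidden size must be divisible by num_heads"
    d = hidden // num_heads
    scale = 1.0 / math.sqrt(d)
    out = [[0.0] * hidden for _ in range(seq_len)]
    for h in range(num_heads):
        lo, hi = h * d, (h + 1) * d  # columns that belong to head h
        for i in range(seq_len):
            scores = []
            for j in range(seq_len):
                s = sum(q[i][c] * k[j][c] for c in range(lo, hi)) * scale
                if bias is not None:
                    s += bias[i][j]  # additive mask, e.g. -1e4 for padding positions
                scores.append(s)
            probs = softmax(scores)
            for c in range(lo, hi):
                out[i][c] = sum(probs[j] * v[j][c] for j in range(seq_len))
    return out


# With identical key rows every query attends uniformly, so each output row is the mean of V.
result = multi_head_attention([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]],
                              [[2.0, 4.0], [0.0, 0.0]], num_heads=2)
print(result)
```

The unfused graph expresses this as several matmul, reshape/transpose, scale, elementwise_add and softmax ops, each a separate kernel launch with its own intermediate tensor; a fused op can compute it with far fewer launches.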
QA test results
Phase 2 optimization work
Subgraph pattern matched by embedding_eltwise_layernorm_fuse_pass
Floating-point compute capability of the P4
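As an illustration of what an embedding + elementwise_add + layer_norm subgraph computes, here is a minimal pure-Python sketch. It assumes the matched pattern is the ERNIE input layer, where several embedding tables (for example word, position and sentence embeddings) are looked up, summed with elementwise_add, and layer-normalized. The exact pattern the pass matches should be checked against the Paddle source; the function names below are hypothetical.

```python
import math
from typing import List, Sequence

Table = Sequence[Sequence[float]]


def layer_norm(v: Sequence[float], gamma: Sequence[float], beta: Sequence[float],
               eps: float = 1e-5) -> List[float]:
    """Normalize v to zero mean / unit variance, then scale by gamma and shift by beta."""
    n = len(v)
    mean = sum(v) / n
    var = sum((t - mean) ** 2 for t in v) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(t - mean) * inv_std * g + bt for t, g, bt in zip(v, gamma, beta)]


def embedding_eltwise_layernorm(ids_per_table: Sequence[Sequence[int]],
                                tables: Sequence[Table],
                                gamma: Sequence[float], beta: Sequence[float],
                                eps: float = 1e-5) -> List[List[float]]:
    """For each token position: sum the rows looked up in every table, then layer-normalize."""
    seq_len = len(ids_per_table[0])
    hidden = len(gamma)
    out = []
    for t in range(seq_len):
        acc = [0.0] * hidden
        for ids, table in zip(ids_per_table, tables):
            row = table[ids[t]]                       # lookup_table
            acc = [a + r for a, r in zip(acc, row)]   # elementwise_add
        out.append(layer_norm(acc, gamma, beta, eps)) # layer_norm
    return out


word_emb = [[0.0, 0.0], [1.0, 3.0]]
pos_emb = [[1.0, 0.0], [0.0, 1.0]]
emb_out = embedding_eltwise_layernorm([[1], [0]], [word_emb, pos_emb], [1.0, 1.0], [0.0, 0.0])
print(emb_out)
```

Unfused, each lookup and each addition produces a full `seq_len × hidden` tensor; a single fused kernel can gather the rows, accumulate and normalize per token in one pass.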
Owners: @Xreki @zhaoyuchen2018
Initial performance
NVIDIA has open-sourced Faster Transformer, its BERT inference solution