Building ppyoloe_crn_l with Paddle Inference 2.5 on Windows 10 fails with the following error. How can this be resolved? #519
Comments
Hi, thanks for the feedback. The error above is not enough to identify the cause yet. You can set the FLAGS_call_stack_level flag to get a more informative error message. You can also refer to https://www.paddlepaddle.org.cn/inference/master/guides/performance_tuning/precision_tracing.html to check whether a specific pass is causing the problem. Finally, you can try switching to version 2.6 and see whether the problem still exists. If you try any of the above, please be sure to post the results here; it will be very helpful for our further analysis.
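A minimal sketch of a debug-oriented setup along those lines (illustrative only, not the maintainer's exact snippet: it assumes the standard paddle_infer::Config API, uses SwitchIrDebug to dump the IR graph after each pass, and uses placeholder model paths):

```cpp
// Illustrative debug configuration, not taken from the original thread.
// Before launching, set the flag in cmd for a fuller error stack:
//   set FLAGS_call_stack_level=2
#include <memory>
#include <string>
#include "paddle_inference_api.h"

std::shared_ptr<paddle_infer::Predictor> MakeDebugPredictor(
    const std::string& model_file, const std::string& params_file) {
  paddle_infer::Config config;
  config.SetModel(model_file, params_file);     // *.pdmodel / *.pdiparams
  config.EnableUseGpu(500 /*MB pool*/, 0 /*GPU id*/);
  // Dump the program after every IR pass so a problematic pass can be spotted.
  config.SwitchIrDebug(true);
  return paddle_infer::CreatePredictor(config);
}
```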
On Windows 10, I set the flag in cmd:

```
set FLAGS_call_stack_level=2
```

Code (template arguments restored; function bodies omitted):

```cpp
#include <chrono>
#include <memory>
#include <vector>

#include <gflags/gflags.h>

#include "paddle_inference_api.h"

using paddle_infer::Config;

DEFINE_string(model_dir, "", "Directory of the inference model.");

using Time = decltype(std::chrono::high_resolution_clock::now());

std::shared_ptr<paddle_infer::Predictor> InitPredictor() {
  // ...
}

void run(paddle_infer::Predictor* predictor, const std::vector<float>& input,
         const std::vector<int>& input_shape, std::vector<float>* out_data) {
  // ...
}

int main(int argc, char* argv[]) {
  // ...
}
```

Problem:

```
C++ Traceback (most recent call last):
Not support stack backtrace yet.

Error Message Summary:
InvalidArgumentError: The axis is expected to be in range of [-1, 1), but got 1
```
Hi, it now looks like an internal issue: one of the inputs is not being initialized. We will submit a PR to fix it as soon as possible and will follow up in the PR once it is merged.
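For context, the optimizer log later in this thread shows the model has two feeds, image and scale_factor. A minimal illustrative sketch of initializing both with the paddle_infer Tensor API (shapes and values are placeholders; this is not the content of the actual fix):

```cpp
// Illustrative only: feed both model inputs before calling predictor->Run().
// Leaving one of them (e.g. scale_factor) uninitialized can surface as
// shape-inference errors like the axis error above.
#include <vector>
#include "paddle_inference_api.h"

void FeedInputs(paddle_infer::Predictor* predictor,
                const std::vector<float>& image_data, int height, int width) {
  auto image = predictor->GetInputHandle("image");
  image->Reshape({1, 3, height, width});
  image->CopyFromCpu(image_data.data());

  // scale_factor must be set as well, even if it is just {1.0f, 1.0f}.
  std::vector<float> scale{1.0f, 1.0f};
  auto scale_factor = predictor->GetInputHandle("scale_factor");
  scale_factor->Reshape({1, 2});
  scale_factor->CopyFromCpu(scale.data());
}
```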
This has been fixed.
Hi, I also did some debugging yesterday. Checking the settings one by one, I found it was the run_mode setting: the default is paddle_gpu, and changing it to trt_xxx makes it work.
One more question: with TensorRT, loading the model takes far too long. Is there a way to finish it within a few seconds, or a few tens of seconds?
Native GPU should also work correctly after the change; it is indeed a bug.
In theory every run_mode should work. We have now fixed the issue under native GPU, so you can try native GPU again. The long loading time with TRT is probably related to TRT's graph optimization; could you share how long loading currently takes?
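For reference, a rough sketch of how a run_mode string like the one discussed above usually maps onto the Config API (the exact demo flag is not shown in this thread, so the mode names and parameter values here are placeholders):

```cpp
// Illustrative mapping from a run_mode string to paddle_infer::Config calls.
// "paddle_gpu" = native GPU (no TensorRT); "trt_fp32"/"trt_fp16" = TRT subgraph.
#include <string>
#include "paddle_inference_api.h"

void ConfigureRunMode(paddle_infer::Config* config, const std::string& run_mode) {
  config->EnableUseGpu(500 /*MB pool*/, 0 /*GPU id*/);  // native GPU baseline

  if (run_mode == "trt_fp32" || run_mode == "trt_fp16") {
    auto precision = (run_mode == "trt_fp16")
                         ? paddle_infer::PrecisionType::kHalf
                         : paddle_infer::PrecisionType::kFloat32;
    config->EnableTensorRtEngine(1 << 30 /*workspace bytes*/, 1 /*max batch*/,
                                 3 /*min subgraph size*/, precision,
                                 false /*use_static*/, false /*use_calib_mode*/);
  }
}
```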
W0508 15:06:45.928721 8180 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
Between the two middle lines, from 15:06:48 to 15:09:45, it took almost 3 minutes. Also, regarding "native GPU": do you mean you have already updated the SDK package? Where can I download it? Could you share a link?
Native GPU means your original configuration, i.e. without TRT. The update is mainly to this repository: just pull the latest commit of this repo, or update manually following #520. As for the TRT loading time, I will report it to the relevant colleagues; it may not be fixable in the short term.
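One option not covered in the replies above: the TensorRT build time on later runs can usually be avoided by caching the serialized engine. A sketch, assuming the standard use_static option of EnableTensorRtEngine plus SetOptimCacheDir (the cache directory name is a placeholder):

```cpp
// Illustrative: serialize the built TensorRT engine so that only the first run
// pays the multi-minute optimization cost; later runs load it from the cache.
#include "paddle_inference_api.h"

void EnableTrtEngineCache(paddle_infer::Config* config) {
  config->EnableUseGpu(500, 0);
  config->EnableTensorRtEngine(1 << 30, 1, 3,
                               paddle_infer::PrecisionType::kFloat32,
                               true /*use_static: serialize engine to disk*/,
                               false /*use_calib_mode*/);
  config->SetOptimCacheDir("./trt_cache");  // placeholder cache directory
}
```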
D:\000-AI\paddle\Deploy\2.5\Paddle-Inference-Demo-master\c++\gpu\ppyoloe_crn_l\build\Release>ppyoloe_crn_l.exe --model_file ppyoloe_crn_l_300e_coco/model.pdmodel --params_file ppyoloe_crn_l_300e_coco/model.pdiparams
--- Running analysis [ir_graph_build_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0505 13:45:31.034536 5320 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [map_op_to_another_pass]
--- Running IR pass [identity_scale_op_clean_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [constant_folding_pass]
--- Running IR pass [silu_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0505 13:45:31.640825 5320 fuse_pass_base.cc:59] --- detected 78 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [fused_multi_transformer_encoder_pass]
--- Running IR pass [fused_multi_transformer_decoder_pass]
--- Running IR pass [fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [fuse_multi_transformer_layer_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
I0505 13:45:34.308128 5320 fuse_pass_base.cc:59] --- detected 9 subgraphs
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0505 13:45:34.541483 5320 fuse_pass_base.cc:59] --- detected 118 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [conv2d_fusion_layout_transfer_pass]
--- Running IR pass [transfer_layout_elim_pass]
--- Running IR pass [auto_mixed_precision_pass]
--- Running IR pass [inplace_op_var_pass]
--- Running analysis [save_optimized_model_pass]
W0505 13:45:34.565424 5320 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
I0505 13:45:34.566417 5320 ir_params_sync_among_devices_pass.cc:51] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0505 13:45:34.726987 5320 memory_optimize_pass.cc:222] Cluster name : tmp_2 size: 26214400
I0505 13:45:34.726987 5320 memory_optimize_pass.cc:222] Cluster name : batch_norm_2.tmp_2 size: 26214400
I0505 13:45:34.726987 5320 memory_optimize_pass.cc:222] Cluster name : image size: 4915200
I0505 13:45:34.727985 5320 memory_optimize_pass.cc:222] Cluster name : sigmoid_2.tmp_0 size: 26214400
I0505 13:45:34.727985 5320 memory_optimize_pass.cc:222] Cluster name : batch_norm_48.tmp_2 size: 1228800
I0505 13:45:34.727985 5320 memory_optimize_pass.cc:222] Cluster name : tmp_0 size: 13107200
I0505 13:45:34.728982 5320 memory_optimize_pass.cc:222] Cluster name : elementwise_add_0 size: 4915200
I0505 13:45:34.728982 5320 memory_optimize_pass.cc:222] Cluster name : tmp_7 size: 4915200
I0505 13:45:34.728982 5320 memory_optimize_pass.cc:222] Cluster name : elementwise_add_16 size: 614400
I0505 13:45:34.728982 5320 memory_optimize_pass.cc:222] Cluster name : pool2d_5.tmp_0 size: 768
I0505 13:45:34.729979 5320 memory_optimize_pass.cc:222] Cluster name : scale_factor size: 8
I0505 13:45:34.729979 5320 memory_optimize_pass.cc:222] Cluster name : shape_2.tmp_0_slice_0 size: 4
--- Running analysis [ir_graph_to_program_pass]
I0505 13:45:34.970336 5320 analysis_predictor.cc:1660] ======= optimize end =======
I0505 13:45:34.971334 5320 naive_executor.cc:164] --- skip [feed], feed -> scale_factor
I0505 13:45:34.973328 5320 naive_executor.cc:164] --- skip [feed], feed -> image
I0505 13:45:34.984299 5320 naive_executor.cc:164] --- skip [gather_nd_0.tmp_0], fetch -> fetch
I0505 13:45:34.984299 5320 naive_executor.cc:164] --- skip [multiclass_nms3_0.tmp_2], fetch -> fetch
W0505 13:45:34.987293 5320 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 12.2, Runtime API Version: 11.8
W0505 13:45:34.991281 5320 gpu_resources.cc:149] device: 0, cuDNN Version: 8.6.
C++ Traceback (most recent call last):
Not support stack backtrace yet.
Error Message Summary:
InvalidArgumentError: The axis is expected to be in range of [-1, 1), but got 1
[Hint: Expected axis_value >= -rank && axis_value < rank == true, but received axis_value >= -rank && axis_value < rank:0 != true:1.] (at ..\paddle\phi\infermeta\unary.cc:3567)