Segmentation fault:11 #336

dongzhenguo2016 · 2020-06-17T02:21:37Z

`06-16 23:55:24 Epoch[0] Batch [3590] Iter: 3590/26046 Lr: 0.00500 Speed: 9.42 samples/sec Train-RpnAcc=0.997272, RpnL1=0.165742, RcnnAcc_1st=0.985713, RcnnL1_1st=0.604444, RcnnAcc_2nd=0.986624, RcnnL1_2nd=1.236113, RcnnAcc_3rd=0.984117, RcnnL1_3rd=1.859310,
06-16 23:55:28 Epoch[0] Batch [3600] Iter: 3600/26046 Lr: 0.00500 Speed: 9.50 samples/sec Train-RpnAcc=0.997278, RpnL1=0.165552, RcnnAcc_1st=0.985734, RcnnL1_1st=0.603507, RcnnAcc_2nd=0.986646, RcnnL1_2nd=1.234198, RcnnAcc_3rd=0.984152, RcnnL1_3rd=1.856836,

Segmentation fault: 11`
I recently encountered the same error while training cascade_r101v1_fpn_1x. How can I solve it? It seems very strange.
My platform is Ubuntu 16.04 with mxnet-cu100 1.6.0.

RogerChern · 2020-06-17T07:59:43Z

The proposal operator does not handle invalid input well, which leads to a segmentation fault when the input contains NaN. This means your Cascade R-CNN heads or the RPN head have blown up. You can try lowering the learning rate for your task.
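One way to confirm that diagnosis before the process crashes is to check the training metrics for non-finite values. The following is a minimal sketch in plain Python, not part of the training code: `find_non_finite` is a hypothetical helper, and where it would be hooked into the training loop is an assumption. The sample values are copied from the last log line in the report.

```python
import math


def find_non_finite(metrics):
    """Return the names of metrics whose value is NaN or +/-inf."""
    return [name for name, value in metrics.items() if not math.isfinite(value)]


# Loss values copied from the last log line before the crash (Batch [3600]).
last_logged = {
    "RpnL1": 0.165552,
    "RcnnL1_1st": 0.603507,
    "RcnnL1_2nd": 1.234198,
    "RcnnL1_3rd": 1.856836,
}

bad = find_non_finite(last_logged)
if bad:
    # Stop with a readable error instead of letting NaN reach the proposal operator.
    raise FloatingPointError("non-finite metrics: " + ", ".join(bad))
```

The logged values are all still finite, so if NaN is the cause, the blow-up probably happens within a few iterations of the crash. The log prints every 10 batches, so a check like this would likely need to run every iteration to catch it.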


dongzhenguo2016 · 2020-06-19T03:41:19Z

Yes, reducing the learning rate does solve this problem. However, after lowering the learning rate from 0.01 to 0.001, I found that mAP dropped by 1 point, which is not the result I want. I suspect the local optimum reached with the smaller learning rate is worse than the one reached with the larger learning rate.
Below is my config after adjusting the learning rate:
    class OptimizeParam:
        class optimizer:
            type = "sgd"
            lr = 0.001 / 8 * len(KvstoreParam.gpus) * KvstoreParam.batch_image
            momentum = 0.9
            wd = 0.0001
            clip_gradient = None
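To make the arithmetic in that `lr` expression concrete, here is a small sketch that evaluates it for a few base values between 0.001 and 0.01. The function name `scaled_lr` and the setup of 2 GPUs with 2 images per GPU are hypothetical and chosen only for illustration. That setup happens to reproduce the `Lr: 0.00500` in the log for a base of 0.01, but the actual GPU count and batch size are not stated in this thread.

```python
def scaled_lr(base_lr, num_gpus, batch_image):
    """Same arithmetic as the config: base_lr / 8 * num_gpus * batch_image."""
    return base_lr / 8 * num_gpus * batch_image


# Hypothetical setup: 2 GPUs x 2 images per GPU (not confirmed in the thread).
NUM_GPUS = 2
BATCH_IMAGE = 2

for base_lr in (0.01, 0.005, 0.002, 0.001):
    lr = scaled_lr(base_lr, NUM_GPUS, BATCH_IMAGE)
    print(f"base_lr={base_lr:<6} -> lr={lr:.5f}")
```

An intermediate base value might avoid the NaN while losing less mAP than 0.001, though that is untested here. The config also exposes `clip_gradient`, currently `None`. Whether setting it to a finite value would help in this case is likewise an open question.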
