Segmentation fault:11 #336

Open
dongzhenguo2016 opened this issue Jun 17, 2020 · 2 comments

Comments

@dongzhenguo2016

`06-16 23:55:24 Epoch[0] Batch [3590] Iter: 3590/26046 Lr: 0.00500 Speed: 9.42 samples/sec Train-RpnAcc=0.997272, RpnL1=0.165742, RcnnAcc_1st=0.985713, RcnnL1_1st=0.604444, RcnnAcc_2nd=0.986624, RcnnL1_2nd=1.236113, RcnnAcc_3rd=0.984117, RcnnL1_3rd=1.859310,
06-16 23:55:28 Epoch[0] Batch [3600] Iter: 3600/26046 Lr: 0.00500 Speed: 9.50 samples/sec Train-RpnAcc=0.997278, RpnL1=0.165552, RcnnAcc_1st=0.985734, RcnnL1_1st=0.603507, RcnnAcc_2nd=0.986646, RcnnL1_2nd=1.234198, RcnnAcc_3rd=0.984152, RcnnL1_3rd=1.856836,

Segmentation fault: 11`
I recently encountered the same error while training cascade_r101v1_fpn_1x. How can I solve it? It seems very strange.
My platform is Ubuntu 16.04 with mxnet-cu100 1.6.0.

@RogerChern
Collaborator

RogerChern commented Jun 17, 2020 via email

The proposal operator does not handle invalid input well: it segfaults when its input contains NaN. A NaN at that point means your Cascade R-CNN heads or the RPN head have blown up. You can try lowering the learning rate for your task.

@dongzhenguo2016
Author

Yes, reducing the learning rate does solve the problem. But after lowering it from 0.01 to 0.001, mAP dropped by 1 point, which is not the result I want. I suspect the local optimum reached with the lower learning rate is not as good as the one reached with the original, larger learning rate.
Below is my code after adjusting the learning rate:
    class OptimizeParam:
        class optimizer:
            type = "sgd"
            lr = 0.001 / 8 * len(KvstoreParam.gpus) * KvstoreParam.batch_image
            momentum = 0.9
            wd = 0.0001
            clip_gradient = None
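
To put numbers on the two settings discussed here, a small NumPy sketch follows. It is illustrative only and is not SimpleDet or MXNet code: `effective_lr` and `all_finite` are hypothetical helper names, and the GPU count (4) and `batch_image` (1) in the usage lines are assumed values, since the thread does not state them. The first helper reproduces the scaling expression from the config above. The second is the kind of NaN/Inf check one might run on the head outputs (fetched as arrays) before they reach the proposal operator.

```python
import numpy as np


def effective_lr(base_lr, num_gpus, batch_image):
    """Scale a base learning rate the way the config above does:
    base_lr / 8 * num_gpus * batch_image."""
    return base_lr / 8 * num_gpus * batch_image


def all_finite(arrays):
    """Return True only if no array contains NaN or +/-Inf."""
    return all(np.isfinite(np.asarray(a)).all() for a in arrays)


if __name__ == "__main__":
    # Assumed setup (not stated in the thread): 4 GPUs, 1 image per GPU.
    for base in (0.01, 0.001):
        print(base, "->", effective_lr(base, num_gpus=4, batch_image=1))

    # Hypothetical head outputs: one healthy set, one that has blown up.
    healthy = [np.array([0.1, -0.3]), np.array([[1.0, 2.0]])]
    diverged = [np.array([0.1, np.nan]), np.array([[np.inf, 2.0]])]
    print(all_finite(healthy), all_finite(diverged))
```

Because the scaling is linear, the two base values differ by a factor of 10 in effective learning rate regardless of the GPU count or `batch_image`.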
