Add AMP to ImageNet classification and segmentation scripts + auto layout #1201

Open · wants to merge 2 commits into master from add_amp_classification

Conversation

@Kh4L Kh4L (Contributor) commented Feb 25, 2020

Signed-off-by: Serge Panev [email protected]

@Kh4L Kh4L changed the title from "Add AMP to ImageNet classification script" to "Add AMP to ImageNet classification and segmentation scripts" on Mar 11, 2020

@mli (Member) commented Mar 11, 2020

Job PR-1201-3 is done.
Docs are uploaded to http://gluon-vision-staging.s3-website-us-west-2.amazonaws.com/PR-1201/3/index.html
Code coverage of this PR (pr.svg) vs. master (master.svg)

@Kh4L Kh4L force-pushed the add_amp_classification branch from 21c9d60 to 52c5650 on March 11, 2020 03:30
@Kh4L Kh4L changed the title from "Add AMP to ImageNet classification and segmentation scripts" to "Add AMP to ImageNet classification and segmentation scripts + auto layout" on Mar 11, 2020
@Kh4L Kh4L force-pushed the add_amp_classification branch from 52c5650 to a90357b on March 11, 2020 03:55
assert not opt.auto_layout or opt.amp, "--auto-layout needs to be used with --amp"

if opt.amp:
    amp.init(layout_optimization=opt.auto_layout)

Collaborator:

Referring to the definition of amp.init() here, it seems there is no argument named layout_optimization?

Contributor (Author):

It's an internal feature; it will be added soon.

Collaborator:

Thanks for the clarification.

Collaborator:

Just curious: what happens when both --amp and --dtype float16 are set?

@Kh4L Kh4L force-pushed the add_amp_classification branch 2 times, most recently from 5682d9a to f51f405 on March 11, 2020 04:19

@mli (Member) commented Mar 11, 2020

Job PR-1201-9 is done.
Docs are uploaded to http://gluon-vision-staging.s3-website-us-west-2.amazonaws.com/PR-1201/9/index.html
Code coverage of this PR (pr.svg) vs. master (master.svg)

@@ -105,6 +106,10 @@ def parse_args():
                        help='name of training log file')
    parser.add_argument('--use-gn', action='store_true',
                        help='whether to use group norm.')
    parser.add_argument('--amp', action='store_true',
                        help='Use MXNet AMP for mixed precision training.')
    parser.add_argument('--auto-layout', action='store_true',

Collaborator:

Could you also add an option like --target-dtype, since AMP now supports not only float16 but also bfloat16? Then we can pass target-dtype to amp.init() to enable float16/bfloat16 training on GPU and CPU respectively. Thanks.
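
A minimal sketch of what this suggestion could look like, assuming MXNet's contrib AMP API (where amp.init() accepts a target_dtype argument); the --target-dtype flag, its default, and its choices are illustrative and not part of this PR:

import argparse
from mxnet.contrib import amp

parser = argparse.ArgumentParser()
parser.add_argument('--amp', action='store_true', default=False,
                    help='Use MXNet AMP for mixed precision training.')
# Hypothetical option suggested in the review: choose the low-precision dtype
# (float16 is the usual choice on GPU, bfloat16 on CPU).
parser.add_argument('--target-dtype', type=str, default='float16',
                    choices=['float16', 'bfloat16'],
                    help='low-precision dtype used by AMP.')
opt = parser.parse_args()

if opt.amp:
    # Forward the chosen dtype to AMP initialization.
    amp.init(target_dtype=opt.target_dtype)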

    if opt.resume_states != '':
        trainer.load_states(opt.resume_states)

    if opt.amp:

Collaborator:

This may need to change to if opt.amp and opt.target_dtype == 'float16':
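
A sketch of the guarded call, assuming the body of the if above is the usual amp.init_trainer(trainer) setup and that a target_dtype option exists as suggested earlier (neither opt.target_dtype nor this guard is in the PR as written):

# Dynamic loss scaling via amp.init_trainer() is only needed for float16;
# bfloat16 keeps float32's exponent range, so the call can be skipped there.
if opt.amp and opt.target_dtype == 'float16':
    amp.init_trainer(trainer)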

@@ -404,8 +417,13 @@ def train(ctx):
                                p.astype('float32', copy=False)) for yhat, y, p in zip(outputs, label, teacher_prob)]
            else:
                loss = [L(yhat, y.astype(opt.dtype, copy=False)) for yhat, y in zip(outputs, label)]
            for l in loss:
                l.backward()
            if opt.amp:

Collaborator:

This may need to change to if opt.amp and opt.target_dtype == 'float16':
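
For reference, a sketch of how the backward pass typically reads with MXNet AMP loss scaling, with the reviewer's dtype guard folded in; loss, trainer, and opt come from the surrounding training loop, and the guard plus opt.target_dtype are suggestions rather than part of the PR:

from mxnet import autograd
from mxnet.contrib import amp

if opt.amp and opt.target_dtype == 'float16':
    # Scale the loss so float16 gradients do not underflow, then backprop
    # through the scaled value; AMP unscales the gradients before the update.
    with amp.scale_loss(loss, trainer) as scaled_loss:
        autograd.backward(scaled_loss)
else:
    for l in loss:
        l.backward()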

@zhreshold (Member)

@@ -210,7 +216,12 @@ def __init__(self, args, logger):
            v.wd_mult = 0.0

        self.optimizer = gluon.Trainer(self.net.module.collect_params(), args.optimizer,
-                                      optimizer_params, kvstore=kv)
+                                      optimizer_params, update_on_kvstore=(False if args.amp else None))

Collaborator:

May I know why kvstore=kv is deleted? Could you add it back? Thanks.
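
For what it's worth, gluon.Trainer accepts both kvstore and update_on_kvstore, so the explicit kvstore could be kept alongside the AMP change. A sketch of the line the reviewer is asking for, reusing the names self, args, kv, and optimizer_params from the surrounding diff:

        # Keep the explicit kvstore while disabling kvstore-side updates when AMP is
        # enabled, so gradients can be unscaled and checked locally before the update.
        self.optimizer = gluon.Trainer(self.net.module.collect_params(), args.optimizer,
                                       optimizer_params, kvstore=kv,
                                       update_on_kvstore=(False if args.amp else None))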

@@ -95,6 +96,11 @@ def parse_args():
    # synchronized Batch Normalization
    parser.add_argument('--syncbn', action='store_true', default=False,
                        help='using Synchronized Cross-GPU BatchNorm')
    # performance related
    parser.add_argument('--amp', action='store_true',

Collaborator:

We usually add default=False for arguments. Could you add it? Thank you.
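
A sketch of the requested change; note that action='store_true' already defaults to False, so adding default=False only makes the behavior explicit. The help string here mirrors the one used in the classification script's diff above:

    parser.add_argument('--amp', action='store_true', default=False,
                        help='Use MXNet AMP for mixed precision training.')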

@xinyu-intel (Member)

@Kh4L any update on this PR?

@zhreshold (Member)

@Kh4L Any update on this? By the way, do you have numbers for the improvement?

@Kh4L Kh4L force-pushed the add_amp_classification branch from f51f405 to f2e92a4 on November 20, 2020 19:57