
[Question FBGEMM_GPU] Adam optimizer not optimized #2824

Open
JacoCheung opened this issue Jul 11, 2024 · 8 comments

Comments

@JacoCheung

Hi team, I'm using the Adam optimizer for my model, but there is a warning regarding performance. Can it be resolved? Or do you have any quantitative numbers for the perf degradation?

[FBGEMM_GPU] NOTE: The training optimizer 'adam' is marked as
        EXPERIMENTAL and thus not optimized, in order to reduce code compilation
        times and build sizes!

I also noted that there was an earlier discussion about the optimizer; it seemed that Adam was not considered for optimization. I'd like to know what the current plan for Adam is. Thanks!
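For context, a minimal sketch of the kind of TBE setup that triggers this warning (the table shape, hyperparameters, and import paths are illustrative; the exact module layout varies across fbgemm_gpu releases):

```python
from fbgemm_gpu.split_embedding_configs import EmbOptimType as OptimType, SparseType
from fbgemm_gpu.split_table_batched_embeddings_ops_common import EmbeddingLocation
from fbgemm_gpu.split_table_batched_embeddings_ops_training import (
    ComputeDevice,
    SplitTableBatchedEmbeddingBagsCodegen,
)

# One embedding table: 1000 rows x 128 dims, held on the GPU, trained with Adam.
tbe = SplitTableBatchedEmbeddingBagsCodegen(
    embedding_specs=[(1000, 128, EmbeddingLocation.DEVICE, ComputeDevice.CUDA)],
    optimizer=OptimType.ADAM,   # prints the EXPERIMENTAL warning quoted above
    learning_rate=1e-3,
    eps=1e-8,
    beta1=0.9,
    beta2=0.999,
    weights_precision=SparseType.FP32,
)
```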

@JacoCheung (Author)

Another issue: when I specify the output datatype as BF16, I get a "not implemented" error.

@sryap (Contributor) commented Jul 29, 2024

Hi @JacoCheung

I'm using the Adam optimizer for my model, but there is a warning regarding performance. Can it be resolved? Or do you have any quantitative numbers for the perf degradation?

You can move Adam off the experimental optimizer list by changing the setting "is_experimental_optimizer": True to False. This should make it more performant.

Another issue: when I specify the output datatype as BF16, I get a "not implemented" error.

We have enabled BF16 output for every optimizer. Could you share an error log?

@JacoCheung (Author)

Hi @sryap, thanks for your reply. I'll try this flag out.

Re BF16, it seems that the error is raised by the forward kernel (regardless of the optimizer). I was using v6.0.0; I checked the changelog just now and found out it's supported since v7.0.0.
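For reference, on versions that support it, BF16 output is requested through the TBE constructor's output_dtype argument. A minimal sketch (shapes, hyperparameters, and import paths are illustrative):

```python
import torch

from fbgemm_gpu.split_embedding_configs import EmbOptimType as OptimType, SparseType
from fbgemm_gpu.split_table_batched_embeddings_ops_common import EmbeddingLocation
from fbgemm_gpu.split_table_batched_embeddings_ops_training import (
    ComputeDevice,
    SplitTableBatchedEmbeddingBagsCodegen,
)

tbe = SplitTableBatchedEmbeddingBagsCodegen(
    embedding_specs=[(1000, 128, EmbeddingLocation.DEVICE, ComputeDevice.CUDA)],
    optimizer=OptimType.ADAM,
    learning_rate=1e-3,
    weights_precision=SparseType.FP32,  # embedding weights stay in FP32
    output_dtype=SparseType.BF16,       # pooled forward output comes back as BF16
)

# Two bags over the single table: bag 0 -> rows {1, 2}, bag 1 -> row {3}.
indices = torch.tensor([1, 2, 3], dtype=torch.long, device="cuda")
offsets = torch.tensor([0, 2, 3], dtype=torch.long, device="cuda")
out = tbe(indices=indices, offsets=offsets)
assert out.dtype == torch.bfloat16
```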

@JacoCheung (Author)

Regarding the fp16 output dtype, fbgemm does not have a scaler for backward/update. Is this intended?

@sryap (Contributor) commented Jul 29, 2024

Regarding the fp16 output dtype, fbgemm does not have a scaler for backward/update. Is this intended?

Which scalar are you referring to?

@JacoCheung (Author)

The scaler used in mixed-precision training.

@sryap (Contributor) commented Aug 6, 2024

Could you please share the link to the scalar that you're referring to? Thanks

@JacoCheung (Author) commented Aug 13, 2024

Sorry for the confusion. Let me clarify a little bit.

The scaler I'm referring to is a generic concept in mixed-precision training, especially FP16 training. In an FP16 training scheme, the loss is usually scaled, so the dgrad computed in the backward pass is scaled as well, and there needs to be an unscaling step for the wgrad (or dgrad).

However, fbgemm_gpu fuses the update with the backward/dgrad pass (TBE has no explicit wgrad). So I would expect the forward() function of the TBE operator to accept a scaling factor and perform the dgrad/wgrad unscaling in the backward stage.
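To illustrate the gap being described: with a dense model, PyTorch's GradScaler unscales the gradients between backward() and the optimizer step, whereas with TBE the update happens inside the fused backward, which is what the request above is getting at. A minimal sketch of the usual dense-model pattern (the model, data, and hyperparameters are placeholders):

```python
import torch

model = torch.nn.Linear(128, 1).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(torch.randn(32, 128, device="cuda")).mean()
    scaler.scale(loss).backward()  # all gradients are multiplied by the loss scale
    scaler.unscale_(optimizer)     # dense grads are divided back down here, before the update
    scaler.step(optimizer)
    scaler.update()

# With TBE, the embedding update is applied inside the fused backward, so the
# (still-scaled) wgrad would need to be unscaled there -- hence the request to
# pass the scale factor into the TBE forward()/backward.
```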
