
BUG: fused_amax_and_scale_update_after_reduction(): incompatible function arguments. The following argument types are supported: #1275

Open · cassanof opened this issue Oct 20, 2024 · 2 comments

cassanof commented Oct 20, 2024

Currently getting the following error on a simple forward pass with a transformer model when using DelayedScaling:

[rank0]:     with te.fp8_autocast(enabled=True, fp8_recipe=self.te_fp8_recipe):
[rank0]:   File "/home/federico/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 144, in __exit__
[rank0]:     next(self.gen)
[rank0]:   File "/mnt/large_shared/federico/env/lib/python3.11/site-packages/transformer_engine/pytorch/fp8.py", line 581, in fp8_autocast
[rank0]:     FP8GlobalStateManager.fp8_autocast_exit(enabled, _graph=_graph)
[rank0]:   File "/mnt/large_shared/federico/env/lib/python3.11/site-packages/transformer_engine/pytorch/fp8.py", line 435, in fp8_autocast_exit
[rank0]:     cls.reduce_and_update_fp8_tensors(forward=True, fp8_weights=False)
[rank0]:   File "/mnt/large_shared/federico/env/lib/python3.11/site-packages/transformer_engine/pytorch/fp8.py", line 365, in reduce_and_update_fp8_tensors
[rank0]:     tex.fused_amax_and_scale_update_after_reduction(
[rank0]: TypeError: fused_amax_and_scale_update_after_reduction(): incompatible function arguments. The following argument types are supported:
[rank0]:     1. (arg0: torch.Tensor, arg1: list[torch.Tensor], arg2: list[torch.Tensor], arg3: list[torch.Tensor], arg4: str, arg5: transformer_engine::DType, arg6: float) -> None

[rank0]: Invoked with: tensor([2.3625e+01, 3.7500e-01, 0.0000e+00, 2.3625e+01, 4.2188e-01, 0.0000e+00,
[rank0]:         3.0000e+00, 3.9648e-01, 0.0000e+00, 9.2578e-01, 3.4570e-01, 0.0000e+00,
[rank0]:         2.7188e+00, 3.6328e-01, 0.0000e+00, 2.7188e+00, 5.4688e-01, 0.0000e+00,
[rank0]:         5.2188e+00, 5.1172e-01, 0.0000e+00, 1.1600e+02, 3.3594e-01, 0.0000e+00,
[rank0]:         1.1600e+02, 9.4922e-01, 0.0000e+00, 2.7656e+00, 3.8477e-01, 0.0000e+00,
[rank0]:         7.3438e-01, 2.9492e-01, 0.0000e+00, 1.6750e+01, 6.2109e-01, 0.0000e+00,
[rank0]:         1.6750e+01, 4.7461e-01, 0.0000e+00, 1.6750e+01, 2.6367e-01, 0.0000e+00,
[rank0]:         2.2188e+00, 5.1953e-01, 0.0000e+00, 4.4000e+01, 3.3203e-01, 0.0000e+00,
[rank0]:         4.4000e+01, 6.6797e-01, 0.0000e+00, 1.8828e+00, 4.1211e-01, 0.0000e+00,
[rank0]:         8.1250e-01, 3.9453e-01, 0.0000e+00, 1.8750e+01, 6.2109e-01, 0.0000e+00,
[.... rest omitted; many more tensors printed ....]

The recipe is quite simple: te_recipe.DelayedScaling(te_recipe.Format.HYBRID, amax_history_len=64, amax_compute_algo="max"). If I omit the recipe from the autocast context, the forward pass works as expected.
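
For reference, a minimal sketch of the setup (I can't share the real model, so te.Linear and the shapes below are placeholders; note that the recipe arguments are passed by keyword here, since in recent TE versions DelayedScaling's first positional parameter is margin, not the format):

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe as te_recipe

# placeholder layer standing in for the real transformer model
layer = te.Linear(1024, 1024, params_dtype=torch.bfloat16).cuda()
inp = torch.randn(32, 1024, device="cuda", dtype=torch.bfloat16)

fp8_recipe = te_recipe.DelayedScaling(
    fp8_format=te_recipe.Format.HYBRID,
    amax_history_len=64,
    amax_compute_algo="max",
)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)  # in my setup, the TypeError is raised here, when the context exits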

Any ideas?

ksivaman self-assigned this Oct 24, 2024

ksivaman (Member) commented
@cassanof Do you have a script that replicates this error? I'm not able to reproduce it with the same recipe. If not, could you provide a more detailed stack trace showing the argument types passed to tex.fused_amax_and_scale_update_after_reduction?
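
If it helps, here is one way to log those argument types just before the failing call (a sketch; it assumes tex in fp8.py resolves to the transformer_engine_torch extension module, as in recent TE versions):

import transformer_engine_torch as tex  # assumption: the extension module fp8.py imports as `tex`

_orig = tex.fused_amax_and_scale_update_after_reduction

def _logged(*args):
    # print the Python type of every argument; for lists, the set of element types
    for i, a in enumerate(args):
        if isinstance(a, list):
            print(f"arg{i}: list of {sorted({type(t).__name__ for t in a})}")
        else:
            print(f"arg{i}: {type(a).__name__}")
    return _orig(*args)

tex.fused_amax_and_scale_update_after_reduction = _logged

Running the forward pass once with this patch in place and pasting the printout here would narrow down which argument fails the pybind signature check.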

cassanof (Author) commented
Hi! Unfortunately I cannot share the script, and I wasn't able to reproduce the error with any of the open models I tried. The arguments are a long list of different tensors.

In the end, I was able to get amax scaling to work by completely disabling the fused kernel in your code and falling back to the non-fused update path instead. This is obviously undesirable, though.
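
For context, the per-tensor update I fell back to is roughly the standard delayed-scaling rule below (my own sketch, not TE's actual code; names are illustrative):

import torch

def amax_and_scale_update(amax_history: torch.Tensor,
                          scale: torch.Tensor,
                          fp8_max: float,
                          margin: int = 0,
                          amax_compute_algo: str = "max") -> torch.Tensor:
    # Non-fused delayed-scaling update for a single tensor's FP8 meta.
    if amax_compute_algo == "max":
        amax = amax_history.max()  # reduce over the amax history window
    else:  # "most_recent"
        amax = amax_history[0]
    new_scale = (fp8_max / amax) / (2 ** margin)
    # keep the previous scale when amax is zero or the result is non-finite
    return torch.where(torch.isfinite(new_scale) & (amax > 0), new_scale, scale)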
