[docs] use nn.module instead of tensor as model #3157

faaany · 2024-10-11T08:34:13Z

What does this PR do?

When running the self-contained example on CUDA or XPU, I get the following error:

initial model weight is 0.00000
initial model weight is 0.00000
0 tensor([1., 2.], device='cuda:0')
Traceback (most recent call last):
  File "/mnt/disk4/fanlilin/workspace/test.py", line 35, in <module>
    outputs = inputs @ model
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

The cause is that accelerator.prepare() does not move a bare tensor used as the model to the device, so the weight stays on the CPU while the inputs are on cuda:0. The model should be a torch.nn.Module instead.

This PR updates the example code accordingly.
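As a rough illustration of the idea (a minimal sketch, not necessarily the exact code changed in this PR), the weight can be wrapped in a small torch.nn.Module so that it is registered as a parameter and follows the module when it is moved to a device. The class name `SimpleLinearModel` and the input shapes are assumptions made for this sketch, and the device move is done here with plain `.to(device)` rather than through accelerator.prepare(), so the snippet only needs PyTorch.

```python
import torch


class SimpleLinearModel(torch.nn.Module):
    """Hypothetical wrapper: a single 1x1 weight held as a Parameter."""

    def __init__(self):
        super().__init__()
        # Registered as a Parameter, so module-level device moves include it.
        self.weight = torch.nn.Parameter(torch.zeros((1, 1)))

    def forward(self, inputs):
        return inputs @ self.weight


# Use an accelerator if one is present; otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = SimpleLinearModel().to(device)  # moves every registered parameter
inputs = torch.tensor([[1.0], [2.0]], device=device)

outputs = model(inputs)  # inputs and weight now live on the same device
print(outputs.shape, model.weight.device)
```

With a module like this, the matmul no longer mixes a device tensor with a CPU tensor, which is the mismatch reported in the traceback above.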

After the update, the example runs to completion with the following output:

Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
initial model weight is 0.00000
initial model weight is 0.00000
0 tensor([1., 2.], device='cuda:0')
1 tensor([3., 4.], device='cuda:0')
2 tensor([5., 6.], device='cuda:0')
3 tensor([7., 8.], device='cuda:0')
w/ accumulation, the final model weight is 2.04000
w/o accumulation, the final model weight is 2.04000
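The matching final weights can be sanity-checked with a plain PyTorch sketch that does not use Accelerate at all. The inputs follow the batches printed above; the labels (`y = 2x`), the learning rate of 0.02, the MSE loss, and the helper `sgd_step` are assumptions chosen for this illustration and are not taken from the PR text.

```python
import torch

# Inputs match the batches printed above; labels and lr are assumed values.
x = torch.arange(1.0, 9.0).view(-1, 1)  # 1.0 .. 8.0 as a column vector
y = 2.0 * x
lr = 0.02
accumulation_steps = 4


def sgd_step(weight):
    """Apply one plain SGD update in place, then clear the gradient."""
    with torch.no_grad():
        weight -= lr * weight.grad
    weight.grad = None


# Without accumulation: a single step on the full batch of 8 samples.
w_full = torch.zeros((1, 1), requires_grad=True)
torch.nn.functional.mse_loss(x @ w_full, y).backward()
sgd_step(w_full)

# With accumulation: 4 micro-batches of 2 samples, each loss scaled by 1/4,
# gradients summed in w_acc.grad, then a single step.
w_acc = torch.zeros((1, 1), requires_grad=True)
for xb, yb in zip(x.chunk(accumulation_steps), y.chunk(accumulation_steps)):
    loss = torch.nn.functional.mse_loss(xb @ w_acc, yb) / accumulation_steps
    loss.backward()
sgd_step(w_acc)

print(f"w/ accumulation, the final model weight is {w_acc.item():.5f}")
print(f"w/o accumulation, the final model weight is {w_full.item():.5f}")
```

Scaling each micro-batch loss by the number of accumulation steps makes the summed gradient equal to the full-batch gradient, which is why both runs end at the same weight.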

muellerzr

Thanks for fixing!

Signed-off-by: Lin, Fanli <[email protected]>

faaany · 2024-10-15T06:51:22Z

I don't think the CI failure is caused by my change. Could you please retrigger the CI? Thanks a lot! @muellerzr

HuggingFaceDocBuilderDev · 2024-10-22T10:32:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

muellerzr approved these changes Oct 11, 2024

use nn.module instead of tensor

eef9223

Signed-off-by: Lin, Fanli <[email protected]>

faaany and others added 3 commits October 21, 2024 14:03

Merge branch 'huggingface:main' into doc_bug_fix

bfb0000

fix neptune

8bedb78

Merge branch 'doc_bug_fix' of https://github.com/faaany/accelerate in…

e9def4b

…to doc_bug_fix
