
Possibly incorrect gradients from autodiff #8356

Open
chenzhekl opened this issue Sep 28, 2023 · 1 comment
@chenzhekl

Describe the bug

The same algorithm, written in two different forms, produces different gradients.

To Reproduce

import taichi as ti
import torch

ti.init(arch=ti.cuda)


@ti.kernel
def foo(x: ti.types.ndarray(), y: ti.types.ndarray()):
    for i in x:
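        # Equivalent form that accumulates the sum of y into a local first: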
        # a = 0.0
        # for j in y:
        #     a += y[j]
        # x[i] += a
        for j in y:
            x[i] += y[j]


x = torch.tensor(
    [0, 0, 0, 0, 0], dtype=torch.float32, device="cuda", requires_grad=True
)
y = torch.tensor([1, 2, 3], dtype=torch.float32, device="cuda", requires_grad=True)

foo(x, y)
x.grad = torch.ones_like(x)
foo.grad(x, y)
print(x.grad, y.grad)

The above code outputs:

[Taichi] version 1.7.0, llvm 15.0.4, commit aa0619fb, linux, python 3.10.12
[Taichi] Starting on arch=cuda
tensor([1., 1., 1., 1., 1.], device='cuda:0') tensor([1., 1., 1.], device='cuda:0')

while the commented-out variant, which computes the same thing, outputs:

[Taichi] version 1.7.0, llvm 15.0.4, commit aa0619fb, linux, python 3.10.12
[Taichi] Starting on arch=cuda
tensor([1., 1., 1., 1., 1.], device='cuda:0') tensor([5., 5., 5.], device='cuda:0')
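For reference, the expected gradient can be checked with a plain PyTorch equivalent of the kernel (a minimal sketch, not part of the original report; it mirrors "x[i] += sum(y)" via broadcasting):

import torch

x = torch.zeros(5, device="cuda", requires_grad=True)
y = torch.tensor([1.0, 2.0, 3.0], device="cuda", requires_grad=True)

# Each output element receives the full sum of y, so d(out[i])/dy[j] = 1
# for every (i, j), and y.grad[j] = len(x) = 5 after backward with ones.
out = x + y.sum()
out.backward(torch.ones_like(out))
print(x.grad, y.grad)  # tensor([1., 1., 1., 1., 1.]) tensor([5., 5., 5.])

This confirms that [5., 5., 5.] is the correct y.grad, i.e. the nested struct-for variant is the one producing wrong gradients.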


@jim19930609
Contributor

OK, the problem is that the Taichi compiler does not reject the use of a "struct-for" as an inner loop -- the "for j in y" line in the code.

Write something like:

@ti.kernel
def foo(x: ti.types.ndarray(), y: ti.types.ndarray()):
    for i in x:
        for j in range(y.shape[0]): 
            x[i] += y[j]
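With the range-based inner loop, rerunning the driver code from the report should give y.grad == tensor([5., 5., 5.]), matching the commented-out variant above.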

Meanwhile, we should add a guard that emits an error message when a struct-for is used as an inner loop.

@jim19930609 jim19930609 self-assigned this Oct 13, 2023
@jim19930609 jim19930609 moved this from Untriaged to Todo in Taichi Lang Oct 13, 2023