Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure In-place correctness checks work properly #273

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
2 changes: 2 additions & 0 deletions src/liger_kernel/ops/cross_entropy.py
Original file line number Diff line number Diff line change
Expand Up @@ -213,6 +213,8 @@ def cross_entropy_forward(_input, target, ignore_index, label_smoothing, reducti
target = target.contiguous()

# Here we use a trick to store X_ptr gradient in X_ptr so we can save memory
# explicitly declare in-place operation is performed
_input.add_(0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain a bit on what this is doing exactly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How torch checks whether an in-place op on a tensor would result in an incorrect gradient calculation, is by implicitly tracking the "version" of a tensor and comparing the saved version and afterward version.

When an in-place operation is performed on a tensor, the version of the tensor is incremented by 1, achieved by an internal function bump() written in C.

Since the bump() function is only called when doing "torch" in-place operation, i.e. in-place operations in "triton kernel" cannot trigger bump(), which makes torch lose track of the version and unable to raise an error.

This approach is a hint for torch, by manually performing a torch's in-place op when we do that tensor dirty in triton kernel.

Reference:
torch in-place-correctness-checks
https://pytorch.org/docs/stable/autograd.html#in-place-correctness-checks
version tracking, bump (there's a note above the function block)
https://github.com/pytorch/pytorch/blob/190e09d8b6a13f789b143f0fbd1325f924550967/c10/core/TensorImpl.h#L382

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this operation have any performance impact? cc @ByronHsu

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impact is huge for add_(0). will try some other inplace ops.

liger_cross_entropy_kernel[(n_rows,)](
X_ptr=_input,
X_stride=_input.stride(-2),
Expand Down
Loading