🐛 [Bug] torch.ops.aten.remainder.Scalar seems not to work with big int #3230

Open
sean-xiang-applovin opened this issue Oct 12, 2024 · 3 comments

Bug Description

torch.ops.aten.remainder.Scalar seems to return the fmod result when the input number is big.

To Reproduce

Save the script below and run it:

import torch
import torch.nn as nn

a = torch.tensor([[5950571286963681280]]).cuda()
example_args = (a,)


class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()

    def forward(self, x):
        return torch.remainder(x, 196613)


model = ToyModel().eval().cuda()

with torch.no_grad():
    ep = torch.export.export(model, args=example_args)

from torch_tensorrt.dynamo._compiler import compile as dynamo_compile
from torch_tensorrt import logging as ts_logging

with ts_logging.debug():
    compiled = dynamo_compile(
        exported_program=ep,
        disable_tf32=True,
        inputs=example_args,
        min_block_size=1,
        debug=True,
    )

with torch.no_grad():
    print(compiled(*example_args))

Expected behavior

Expected to return a result like

tensor([[75722]], device='cuda:0')

However, the printed result is

tensor([[-120891]], device='cuda:0')

My full execution log:
remainder_error.log
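
For reference (an editor's check, not part of the original report), the expected value 75722 is what exact integer arithmetic gives for the same operands as the repro script:

# exact Python integer arithmetic on the operands from the repro script
print(5950571286963681280 % 196613)  # 75722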

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 10.1.0
  • PyTorch Version (e.g. 1.0): 2.4.1+cu124
  • CPU Architecture: x86_64
  • OS (e.g., Linux): linux
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version: 3.11.9
  • CUDA version: 12.6
  • GPU models and configuration: nvidia L4
  • Any other relevant information:

Additional context

sean-xiang-applovin added the bug label on Oct 12, 2024
sean-xiang-applovin (Author) commented Oct 12, 2024

BTW,

  1. The converted version of torch.ops.aten.remainder.Scalar does not even seem as fast as the original op.
  2. torch.ops.aten.remainder.Scalar seems to work when the int is not that big. Not sure if this is caused by int64; see the quick check below this list.
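
A quick check on the int64 hypothesis (an editor's sketch, not part of the original report): an int64 value of this magnitude cannot be represented exactly in float32, so any float32 math along the lowered path is already lossy.

import torch

# Round-tripping the input through float32 changes its value, so float32
# arithmetic on it cannot recover the exact remainder.
x = torch.tensor([5950571286963681280], dtype=torch.int64)
print(x.to(torch.float32).to(torch.int64))                   # prints a different value than x
print(torch.equal(x.to(torch.float32).to(torch.int64), x))   # False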

apbose (Collaborator) commented Oct 22, 2024

Thanks for pointing this out. I looked into this a bit.
TRT does not support the fmod operation directly, so in torchTRT we implement remainder as
fmod(fmod(dividend, divisor) + divisor, divisor)
and fmod in turn as sub(dividend, prod(trunc_div(dividend, divisor), divisor)).

Generally dividend > prod(trunc_div(dividend, divisor), divisor).

But with large integers, trunc_div(dividend, divisor) in this case evaluates to 30265401409536 (it should be 30,265,401,000,766), which makes prod(trunc_div(dividend, divisor), divisor) > dividend and yields the negative number.
As you said, 5950571286963681280 falls in the signed int64 range, so I am not sure why TRT is returning reduced precision; I can get this clarified further with the TRT team. It must be a loss of accuracy in the computation. Please note that float32 would also lead to accuracy loss.
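
To make the arithmetic above concrete, here is a small standalone sketch (an editor's reconstruction of the decomposition described in this comment, not the actual converter code). Exact integer arithmetic recovers 75722, while carrying the same steps out in float32, whose 24-bit significand cannot represent 5950571286963681280 exactly, does not.

import numpy as np

a, b = 5950571286963681280, 196613

def fmod_int(x, y):
    # exact integer arithmetic; for the positive values here, trunc_div == //
    return x - (x // y) * y

# remainder(a, b) = fmod(fmod(a, b) + b, b)
print(fmod_int(fmod_int(a, b) + b, b))  # 75722, matching eager torch.remainder

def fmod_f32(x, y):
    # same formula, but every intermediate value held in float32
    x, y = np.float32(x), np.float32(y)
    return np.float32(x - np.trunc(x / y) * y)

print(float(fmod_f32(fmod_f32(a, b) + b, b)))  # not 75722: float32 rounds away the low bits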

sean-xiang-applovin (Author) commented

Thanks @apbose for your help. I have tried exporting this graph to ONNX and compiling it with trtexec, and it seems to be the same issue. The result I get this way is -80369420288.

I have attached my exported ONNX in scalar.zip.

What is the suggested way to deal with these big numbers? Do you have any suggestions?
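
One possible workaround, sketched here by the editor rather than suggested by the maintainers in this thread: keep the remainder op running in PyTorch instead of lowering it to TensorRT, using the torch_executed_ops compile option (assuming it accepts this op name, as it does for other aten ops in the Torch-TensorRT examples).

# Editor's sketch, reusing ep / example_args / dynamo_compile from the repro script:
# exclude aten.remainder from TRT conversion so it runs in PyTorch with exact int64 math.
compiled = dynamo_compile(
    exported_program=ep,
    inputs=example_args,
    min_block_size=1,
    torch_executed_ops={"torch.ops.aten.remainder.Scalar"},
)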
