🐛 [Bug] torch.ops.aten.remainder.Scalar seems not to work with big int #3230

Open
sean-xiang-applovin opened this issue Oct 12, 2024 · 3 comments

Bug Description

torch.ops.aten.remainder.Scalar seems to return the fmod result when the input number is big.

To Reproduce

Save the script below and run it:

import torch
import torch.nn as nn

a = torch.tensor([[5950571286963681280]]).cuda()
example_args = (a,)


class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()

    def forward(self, x):
        return torch.remainder(x, 196613)


model = ToyModel().eval().cuda()

with torch.no_grad():
    ep = torch.export.export(model, args=example_args)

from torch_tensorrt.dynamo._compiler import compile as dynamo_compile
from torch_tensorrt import logging as ts_logging

with ts_logging.debug():
    compiled = dynamo_compile(
        exported_program=ep,
        disable_tf32=True,
        inputs=example_args,
        min_block_size=1,
        debug=True,
    )

with torch.no_grad():
    print(compiled(*example_args))

Expected behavior

Expected to return a result like

tensor([[75722]], device='cuda:0')

However, the printed result is

tensor([[-120891]], device='cuda:0')

My full execution log:
remainder_error.log
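
For reference (an editor's check, not part of the original report), the expected value 75722 is what exact integer arithmetic gives for the same operands as the repro script:

# exact Python integer arithmetic on the operands from the repro script
print(5950571286963681280 % 196613)  # 75722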

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 10.1.0
  • PyTorch Version (e.g. 1.0): 2.4.1+cu124
  • CPU Architecture: x86_64
  • OS (e.g., Linux): linux
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version: 3.11.9
  • CUDA version: 12.6
  • GPU models and configuration: nvidia L4
  • Any other relevant information:

Additional context

sean-xiang-applovin added the bug label on Oct 12, 2024
sean-xiang-applovin (Author) commented Oct 12, 2024

BTW,

  1. The converted version of torch.ops.aten.remainder.Scalar does not even seem as fast as the original op.
  2. torch.ops.aten.remainder.Scalar seems to work when the int is not that big. Not sure if this is caused by int64; see the quick check below this list.
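
A quick check on the int64 hypothesis (an editor's sketch, not part of the original report): an int64 value of this magnitude cannot be represented exactly in float32, so any float32 math along the lowered path is already lossy.

import torch

# Round-tripping the input through float32 changes its value, so float32
# arithmetic on it cannot recover the exact remainder.
x = torch.tensor([5950571286963681280], dtype=torch.int64)
print(x.to(torch.float32).to(torch.int64))                   # prints a different value than x
print(torch.equal(x.to(torch.float32).to(torch.int64), x))   # False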

apbose (Collaborator) commented Oct 22, 2024

Thanks for pointing this out. I looked into this a bit.
TRT does not support the fmod operation directly, so in torchTRT we implement remainder as
fmod(fmod(dividend, divisor) + divisor, divisor)
and fmod in turn as sub(dividend, prod(trunc_div(dividend, divisor), divisor)).

Generally dividend > prod(trunc_div(dividend, divisor), divisor).

But with large integers, trunc_div(dividend, divisor) in this case evaluates to 30265401409536 (it should be 30,265,401,000,766), which makes prod(trunc_div(dividend, divisor), divisor) > dividend and yields the negative number.
As you said, 5950571286963681280 falls in the signed int64 range, so I am not sure why TRT is returning reduced precision; I can get this clarified further with the TRT team. It must be a loss of accuracy in the computation. Please note that float32 would also lead to accuracy loss.
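
To make the arithmetic above concrete, here is a small standalone sketch (an editor's reconstruction of the decomposition described in this comment, not the actual converter code). Exact integer arithmetic recovers 75722, while carrying the same steps out in float32, whose 24-bit significand cannot represent 5950571286963681280 exactly, does not.

import numpy as np

a, b = 5950571286963681280, 196613

def fmod_int(x, y):
    # exact integer arithmetic; for the positive values here, trunc_div == //
    return x - (x // y) * y

# remainder(a, b) = fmod(fmod(a, b) + b, b)
print(fmod_int(fmod_int(a, b) + b, b))  # 75722, matching eager torch.remainder

def fmod_f32(x, y):
    # same formula, but every intermediate value held in float32
    x, y = np.float32(x), np.float32(y)
    return np.float32(x - np.trunc(x / y) * y)

print(float(fmod_f32(fmod_f32(a, b) + b, b)))  # not 75722: float32 rounds away the low bits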

sean-xiang-applovin (Author) commented

Thanks @apbose for your help. I have tried exporting this graph to ONNX and compiling it with trtexec, and it seems to be the same issue. The result I get this way is -80369420288.

I have attached my exported ONNX in scalar.zip.

What is the suggested way to deal with these big numbers? Do you have any suggestions?
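
One possible workaround, sketched here by the editor rather than suggested by the maintainers in this thread: keep the remainder op running in PyTorch instead of lowering it to TensorRT, using the torch_executed_ops compile option (assuming it accepts this op name, as it does for other aten ops in the Torch-TensorRT examples).

# Editor's sketch, reusing ep / example_args / dynamo_compile from the repro script:
# exclude aten.remainder from TRT conversion so it runs in PyTorch with exact int64 math.
compiled = dynamo_compile(
    exported_program=ep,
    inputs=example_args,
    min_block_size=1,
    torch_executed_ops={"torch.ops.aten.remainder.Scalar"},
)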
