How can I use a model built with the TorchSparse library in C++? #197

Open
sun-sey opened this issue Mar 7, 2023 · 7 comments
Labels: enhancement (New feature or request)

sun-sey commented Mar 7, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Have you followed all the steps in the FAQ?

  • I have tried the steps in the FAQ.

Current Behavior

I want to build and run inference in C++ with a TorchSparse model trained in Python.

How do I do this?

I know that when downsampling and upsampling are mixed in a sparse convolution network, the graph cannot be traced with torch.jit.trace.

Error Line

No error lines.

Environment

- PyTorch: 1.12.1
- PyTorch CUDA: 11.7

Full Error Log

No response

sun-sey changed the title from "[Installation] <title> How can I use a model built with the TorchSparse library in C++?" to "How can I use a model built with the TorchSparse library in C++?" on Mar 7, 2023
zhijian-liu (Contributor) commented:

This is not yet supported. Could you kindly let us know your use case? Thanks!

zhijian-liu added the enhancement label on Mar 10, 2023
zhijian-liu self-assigned this on Jul 15, 2023

tuxbotix commented Sep 1, 2023

Hi,
I also had the same question and did some digging to see what's needed.

Use cases:

  1. Inference with libtorch (C++), where runtime performance is better (less overhead), i.e. real-time inference, at least for R&D purposes.
  2. This also requires JIT support; once all layers are scriptable, some performance improvements can be leveraged.
  3. With some cleanup and improvements, exporting the model to TensorRT also becomes easier.

Usage Workflow:

  1. Export the model with TorchScript: torch.jit.script(model).save('output.pt')
  2. Build the C++ application and link it against the torchsparse library
  3. Register the operations (done automatically if the TORCH_LIBRARY macro is used instead of the pybind11 macro)
  4. Load the graph and call forward() (see the sketch below)
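
A minimal C++ sketch of step 4, assuming the torchsparse op library has been registered as in step 3 (linked in or loaded at runtime) and assuming, for illustration only, that the exported forward() takes raw coordinate and feature tensors rather than a SparseTensor object:

    #include <torch/script.h>
    #include <iostream>
    #include <vector>

    int main() {
      // Load the TorchScript module exported from Python with
      // torch.jit.script(model).save("output.pt"). This fails with an
      // "unknown op" error if the torchsparse ops were not registered first.
      torch::jit::Module module = torch::jit::load("output.pt");
      module.to(torch::kCUDA);
      module.eval();

      // Hypothetical inputs and output handling; the real model's signature
      // will differ (e.g. it may return a SparseTensor rather than a Tensor).
      auto coords = torch::zeros({1000, 4}, torch::dtype(torch::kInt32).device(torch::kCUDA));
      auto feats = torch::randn({1000, 4}, torch::dtype(torch::kFloat32).device(torch::kCUDA));

      std::vector<torch::jit::IValue> inputs{coords, feats};
      torch::Tensor out = module.forward(inputs).toTensor();
      std::cout << out.sizes() << std::endl;
      return 0;
    }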

TODOs:

  1. Export the backend operations via the TORCH_LIBRARY macro (instead of the pybind11 macro); see the sketch after this list.
    • This registers the ops for both C++ and Python.
    • This lets the JIT compiler see the operations (a jit-scripted graph is required for C++ export).
    • Minor changes to the APIs: replace int with int64_t, etc.
  2. The bigger changes are in the Python classes and the functional API:
    • Might need a rewrite, or a jittable()-like mechanism as in PyG: https://pytorch-geometric.readthedocs.io/en/latest/advanced/jit.html
    • No dynamically sized tuples; think of C++ tuples, where the element count is fixed (Tuple[int, ...] is not allowed).
    • The kmap dicts need changes to their key and value types. Either fix the tuple sizes for strides, or use a larger tuple and fill the redundant entries with -1 (e.g. (1, 1, 1, -1) for 3D while the type supports up to 4D).
    • No global variables! Specifically the buffer used for the faster algorithm; this is not allowed. Perhaps change the API to pass in a buffer?
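
A rough sketch of the kind of registration TODO 1 refers to; the op name, schema, and body below are hypothetical, not the actual torchsparse backend API:

    #include <ATen/ATen.h>
    #include <torch/library.h>

    // Hypothetical backend entry point. Note int64_t instead of int, as
    // required by the TorchScript schema/type system.
    at::Tensor hash_query_example(const at::Tensor& queries,
                                  const at::Tensor& references,
                                  int64_t capacity) {
      TORCH_CHECK(queries.dim() == 2 && references.dim() == 2);
      // ... dispatch to the CPU/CUDA kernel here ...
      return at::empty({capacity}, queries.options());
    }

    // TORCH_LIBRARY registers the op with the dispatcher, making it visible
    // both from Python (torch.ops.torchsparse_example.hash_query) and from
    // the TorchScript runtime, unlike a plain PYBIND11_MODULE binding.
    TORCH_LIBRARY(torchsparse_example, m) {
      m.def("hash_query(Tensor queries, Tensor references, int capacity) -> Tensor",
            &hash_query_example);
    }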

So far I have managed the first part; it was quite straightforward: replace the int parameters of the functions with int64_t and float with double.

I'm trying out the second phase now; not sure how it will end up. If anyone is interested, I can open PRs for each part.


tuxbotix commented Sep 15, 2023

Small update,
I managed to port the current master branch for JIT compilation. It is still a bit hacky, and I'm planning to open a PR once the 2.1 sources are out (#236 & #237) to integrate those changes.

Timing:

With examples/example.py, C++ inference (JIT-exported) was 1.5-2x faster than Python (with and without JIT). I expect further improvements with the torchsparse 2.1 algorithms.

What had to be done:

  • Use the TORCH_LIBRARY macro instead of pybind11 for the bindings
  • Port the conv3d autograd function to C++ and export it with the above macro (see pytorch/pytorch#69741, "JIT: Support for torch.autograd.functional.jacobian in TorchScript")
  • Change/remove all Tuple[int, ...] in favour of List[int]; dynamically sized tuples (in Python) behave like std::vectors on the C++ side, whereas std::tuples must be of fixed size
  • Change the dict key type to str in SparseTensor; I just did f"{list(input.stride)}|..." and didn't notice a significant performance penalty, and there are other optimization opportunities as well

Once all this was done, the shared object can be loaded into a C++ program with dlopen (or linked against directly), and the exported JIT-compiled model can then be loaded (see the sketch below).
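
A small sketch of that loading path, with a hypothetical library name; if the application is linked against the library instead, the op registrars run at program start-up and the dlopen step is unnecessary:

    #include <dlfcn.h>  // POSIX dynamic loading
    #include <torch/script.h>
    #include <iostream>

    int main() {
      // Load the (hypothetically named) torchsparse op library at runtime.
      // The static registrars created by TORCH_LIBRARY run as a side effect
      // of dlopen, making the custom ops visible to the TorchScript runtime.
      void* handle = dlopen("libtorchsparse_ops.so", RTLD_NOW | RTLD_GLOBAL);
      if (handle == nullptr) {
        std::cerr << "dlopen failed: " << dlerror() << std::endl;
        return 1;
      }

      // Now the exported JIT-compiled model can be deserialized and used.
      torch::jit::Module model = torch::jit::load("output.pt");
      model.eval();
      return 0;
    }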


FengYuQ commented Sep 16, 2023

> Small update, I managed to port the current master branch for jit compilation. [...]

Looking forward to your work.

fchou-labs commented:

Any update or work-in-progress branch? I am interested in this feature as well.


cama36 commented Mar 20, 2024

Is there any update?


tuxbotix commented Mar 28, 2024

Hi,

My apologies for the late reply. I managed to get the conversion to work by following the recipe I shared above.

However, I'm unable to share that implementation for various reasons. Furthermore, I realised this needs some fundamental changes to the library implementation and is better done in coordination with the original authors.

Quick start

The process requires the previously mentioned steps. Most crucially:

  1. Use the TORCH_LIBRARY macro instead of directly using pybind11.
  2. Make the Python code TorchScript compliant.
  • This took most of the time! Mainly because this codebase relies a lot on duck typing, while TorchScript is statically typed.

  • But this was greatly helped by the TorchScript compiler!

  • I wrote a small unit test like the one below, targeting the torchsparse conv3d operator. One could also start with a smaller scope (e.g. SparseTensor) and fix things from there.

    import torch
    import torchsparse
    
    def test_torch_jit_conv():
      scripted_conv3d = torch.jit.script(torchsparse.nn.functional.conv3d) # This is *just* an example
    
  3. Test-driven development is crucial.
  • In addition to the tests by the authors, extra tests were written for the JIT conversion, comparing the JIT variant against the original to ensure nothing changed during scripting (see the sketch below).
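
A rough illustration of such a parity check from the C++ side, assuming (hypothetically) that example inputs and the eager Python output were saved alongside the model as attributes of a small scripted container module, reference_io.pt:

    #include <torch/script.h>
    #include <vector>

    int main() {
      torch::jit::Module model = torch::jit::load("output.pt");
      model.eval();

      // Hypothetical archive holding example inputs and the eager-mode output,
      // saved from Python as attributes of a scripted container module.
      torch::jit::Module ref = torch::jit::load("reference_io.pt");
      torch::Tensor coords = ref.attr("coords").toTensor();
      torch::Tensor feats = ref.attr("feats").toTensor();
      torch::Tensor expected = ref.attr("output").toTensor();

      // Run the exported model and compare against the eager reference.
      std::vector<torch::jit::IValue> inputs{coords, feats};
      torch::Tensor actual = model.forward(inputs).toTensor();
      TORCH_CHECK(torch::allclose(actual, expected, /*rtol=*/1e-4, /*atol=*/1e-5),
                  "scripted output diverges from the eager reference");
      return 0;
    }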

Results:

  1. Tested a real-world model and the examples in both Python and C++.
  2. While ImplicitGEMM is good for training and inference, the newly implemented Fetch-on-Demand approach is more memory-efficient and faster for inference-only use cases.

Retrospective

I'm not 100% sure TorchScript is the way to go for this: it required significant changes to the codebase and was rather intrusive.

An alternative I am considering is to convert to ONNX via the Torch Dynamo backend (which uses FX tracing, so a different set of constraints) and mark the TorchScript ops as custom ops. The runtime (e.g. TensorRT) can then load them as a plugin.

Both approaches (TorchScript and TensorRT) have their advantages and drawbacks, and both will need significant changes to the library itself. If there is more interest, we can start breaking this down into small parts and get it going :)
