[TRT RTX EP] Add support for D3D12 external resource import #26948
Conversation
Awesome, thanks for this! You're seeing my inexperience with the CUDA API here :). I have another branch I was working on to fix some of the context stuff, but I figure this implementation will be a longer-term collaboration/hand-off at some point. Just wanted to validate the API with some real code.

Yes, sure, we (or in other words @praneshgo) will probably take it over. I made these changes for the exact same reason, to experiment with it, and I already identified a TRT RTX optimization opportunity that we will fix internally. By the way, to better test the correct async behaviour it might be better to submit multiple inferences and wait only on the last result, to ensure that we are not running synchronously due to CPU overhead.
Pull request overview
This PR adds a comprehensive test for CUDA Interop Graphics (CIG) inference to demonstrate proper usage patterns when working with external D3D12 resources. The key change is modifying context management to avoid calling cudaSetDevice when a CUDA context already exists, which prevents creating unwanted new contexts during CIG workflows.
Changes:
- Added `FullInferenceWithExternalMemoryCIG` test demonstrating CIG context usage with external memory import
- Modified context management in `nv_provider_factory.cc` to check for existing CUDA contexts before calling `cudaSetDevice`
- Migrated test API calls from `ort_api_` to `ort_interop_api_` for external resource operations
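The context-management change can be sketched roughly as follows (a hypothetical `EnsureDevice` helper illustrating the idea, not the actual `nv_provider_factory.cc` code): query the driver API for a current context first, and only fall back to `cudaSetDevice` when none is bound, so an application-created CIG context is left untouched.

```cpp
#include <cuda.h>          // CUDA driver API (cuCtxGetCurrent)
#include <cuda_runtime.h>  // CUDA runtime API (cudaSetDevice)

// Only call cudaSetDevice when no driver context is current. Calling it while
// a CIG (CUDA-in-Graphics) context is bound would switch to the device's
// primary context and break the externally managed context.
static cudaError_t EnsureDevice(int device_id) {
  CUcontext current = nullptr;
  if (cuCtxGetCurrent(&current) == CUDA_SUCCESS && current != nullptr) {
    return cudaSuccess;  // respect the existing (possibly CIG) context
  }
  return cudaSetDevice(device_id);
}
```

This sketch assumes the usual caveat about mixing the runtime and driver APIs: the runtime binds a primary context lazily, so the check must happen before any runtime call that would implicitly create one.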
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| onnxruntime/test/providers/nv_tensorrt_rtx/nv_external_resource_importer_test.cc | Added CudaDriverLoader helper class, renamed test fixture to NvExecutionProviderExternalResourceImporterTest, migrated to interop API, and added comprehensive CIG inference test |
| onnxruntime/core/providers/nv_tensorrt_rtx/nv_provider_factory.cc | Modified ImportMemory, ImportSemaphore, and CreateSyncStreamForDevice to check for existing CUDA context before calling cudaSetDevice |
Functionally, the change looks good to me.

@gaugarg-nv @ankan-ban can you please review this as well? Thanks.

Thanks, Max, for writing the test. It looks good. It's nice to see most of the interop functionality nicely abstracted out behind the new ORT interop APIs. There are just a couple of things that are still Nvidia-specific in the test (maybe consider abstracting these out too in the future with more additions to the ORT APIs):
I think resolving the above two would allow app developers to write truly IHV-agnostic code that runs everywhere (e.g., using DX12 APIs for allocating resources and doing any pre/post-processing, and generic ORT APIs to run the model).

Thanks @ankan-ban for the review.
@nieubank can you help tag this for 1.24, since we would like to make sure that this goes in with the newly added interop API? I can take care of rebasing to main and accepting some of the Copilot comments if that's all.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
#26988 added the stream in RunOptions and is now merged.

> Do you consider this a big blocker? I thought of this small driver API usage as OK for an ISV to integrate; let me know if you think differently.
I rebased on the refined structs and started providing the stream as a run option. Some changes are still missing to support this for CiG, but we are tracking that internally.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.
@gedoensmax Please review, address, comment on, and resolve the Copilot comments. They are often useful.

@yuslepukhin all the Copilot comments I resolved were ones I had already implemented. I missed the one on raw tensor size handling and made the corresponding change.

It looks good. You will need to resolve conflicts. It will also require marking all the comments as resolved.
Merged main into …porter_cig (conflicts: onnxruntime/core/providers/nv_tensorrt_rtx/nv_provider_factory.cc)
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 4 pipeline(s).
This PR adds a test for CiG inference to demonstrate what its usage should look like.
It is important not to call `cudaSetDevice` in that flow, since it will create a new context. @nieubank I am not sure why there was a `cudaSetDevice` on each import call 🤔 Is this done to enable importing semaphores of e.g. GPU:1 into a session running on GPU:0? Context management is unreliable with the current mixing of the CUDA runtime and CUDA driver APIs.