
Conversation

@gedoensmax
Contributor

This PR adds a test for CiG inference to demonstrate what usage of it should look like.
It is important not to call cudaSetDevice in that flow, since it will create a new context. @nieubank I am not sure why there was a cudaSetDevice on each import call 🤔 Is this done to enable importing semaphores of e.g. GPU:1 into a session running on GPU:0? Context management is unreliable with the current mixing of the CUDA runtime and CUDA driver APIs.
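The context-safe pattern under discussion can be sketched as follows (illustrative only, not the actual provider code; `device_id` is a placeholder): query whether a driver context is already current on the thread, and only fall back to cudaSetDevice when there is none, so an app-created CiG context is not displaced.

```cpp
#include <cuda.h>          // CUDA driver API
#include <cuda_runtime.h>  // CUDA runtime API

// Sketch: reuse the caller's context when one exists; let the
// runtime create/select a context only as a fallback.
void EnsureContext(int device_id) {
  CUcontext current = nullptr;
  cuCtxGetCurrent(&current);  // cheap query, does not create a context
  if (current == nullptr) {
    // No context bound to this thread: let the runtime set one up.
    // In a CiG flow this branch must NOT be taken, because
    // cudaSetDevice implicitly initializes the primary context.
    cudaSetDevice(device_id);
  }
  // else: keep the caller's (possibly CiG) context as-is.
}
```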

@nieubank
Contributor

nieubank commented Jan 8, 2026

> This PR adds a test for CiG inference to demonstrate what usage of it should look like. It is important not to call cudaSetDevice in that flow, since it will create a new context. @nieubank I am not sure why there was a cudaSetDevice on each import call 🤔 Is this done to enable importing semaphores of e.g. GPU:1 into a session running on GPU:0? Context management is unreliable with the current mixing of the CUDA runtime and CUDA driver APIs.

Awesome, thanks for this! You're seeing my inexperience with the CUDA API here :), I have another branch I was working on to fix some of the context stuff, but I figure this implementation will be a longer-term collaboration/hand-off at some point. Just wanted to validate the API with some real code.

@gedoensmax
Contributor Author

Yes sure, we (or in other words @praneshgo) will probably take it over. I made these changes for the exact same reason, to experiment with it. And I already identified a TRT RTX optimization opportunity that we will fix internally.

By the way, to better test correct async behaviour, it might be better to submit multiple inferences and wait only on the last result, to ensure that we are not effectively synchronous due to CPU overhead.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds a comprehensive test for CUDA in Graphics (CiG) inference to demonstrate proper usage patterns when working with external D3D12 resources. The key change is modifying context management to avoid calling cudaSetDevice when a CUDA context already exists, which prevents creating unwanted new contexts during CiG workflows.

Changes:

  • Added FullInferenceWithExternalMemoryCIG test demonstrating CIG context usage with external memory import
  • Modified context management in nv_provider_factory.cc to check for existing CUDA contexts before calling cudaSetDevice
  • Migrated test API calls from ort_api_ to ort_interop_api_ for external resource operations

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

Changed files:

  • onnxruntime/test/providers/nv_tensorrt_rtx/nv_external_resource_importer_test.cc: Added CudaDriverLoader helper class, renamed the test fixture to NvExecutionProviderExternalResourceImporterTest, migrated to the interop API, and added a comprehensive CIG inference test
  • onnxruntime/core/providers/nv_tensorrt_rtx/nv_provider_factory.cc: Modified ImportMemory, ImportSemaphore, and CreateSyncStreamForDevice to check for an existing CUDA context before calling cudaSetDevice


@praneshgo
Contributor

Functionally, the change looks good to me.

@praneshgo
Contributor

@gaugarg-nv @ankan-ban can you please review this as well? Thanks.

@ankan-ban
Contributor

ankan-ban commented Jan 15, 2026

Thanks, Max, for writing the test. It looks good. It's nice to see most of the interop functionality nicely abstracted out behind the new ORT interop APIs.

There are just a couple of things that are still Nvidia specific in the test (maybe consider abstracting out these too in the future with more additions to ORT APIs):

  1. Use of CUDA APIs by the app to create the context before invoking ORT (this requires a command queue handle that shares the same TSG with the CUDA context; the same TSG is a requirement to enable CiG on our hardware).

  2. Use of nv-specific session options (kUserComputeStream, kHasUserComputeStream, kMaxSharedMemSize). Maybe having a mechanism for passing the generic ORT-stream object to session.run() makes even more sense. The shared memory size limit is again required for running in CiG mode - and hopefully if we have a generic way of doing "1" above - it can allow the EP to automatically set the correct value depending on the GPU.

I think resolving the above 2 would allow app developers to write truly IHV-agnostic code that runs everywhere (e.g., using DX12 APIs for allocating resources and doing any pre/post-processing, and generic ORT APIs to run the model).
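For reference, point 1 currently looks roughly like the following on NVIDIA hardware. This is a sketch assuming a recent CUDA toolkit that exposes the CiG context-creation path; `d3d12_queue` and `device` are placeholders, error handling is elided, and the exact struct fields should be verified against the driver API headers.

```cpp
// Sketch: create a CUDA-in-Graphics context that shares scheduling
// (the same TSG) with an existing D3D12 command queue.
CUctxCigParam cig_param = {};
cig_param.sharedDataType = CU_CIG_DATA_TYPE_D3D12_COMMAND_QUEUE;
cig_param.sharedData = d3d12_queue;  // ID3D12CommandQueue* from the app

CUctxCreateParams ctx_params = {};
ctx_params.cigParams = &cig_param;

CUcontext cig_ctx = nullptr;
cuCtxCreate_v4(&cig_ctx, &ctx_params, /*flags=*/0, device);
// The app then makes cig_ctx current before creating the ORT session,
// so the session reuses it instead of creating its own context.
```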

@gedoensmax
Contributor Author

Thanks @ankan-ban for the review.

  1. Do you consider this a big blocker? I thought of this small driver API usage as OK for an ISV to integrate, let me know if you think differently.

  2. Fully agree! I would love to set kMaxSharedMemSize implicitly, but I did not find a way to check the maximum supported shared memory size based on the currently pushed context. Having a stream as input on Ort::Session::Run would be great, and @skottmckay has mentioned that on another thread; can you let us know what the status of this is?
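On deriving a default for kMaxSharedMemSize: the device behind the currently pushed context can be queried through the driver API, e.g. as sketched below. Note this reports the raw device capability, which may still exceed what CiG mode actually permits, which is presumably why the explicit option exists; treat it as an upper bound, not the CiG limit.

```cpp
// Sketch: derive a shared-memory limit from the current context.
CUdevice dev;
cuCtxGetDevice(&dev);  // device of the currently pushed context
int smem_per_block = 0;
cuDeviceGetAttribute(&smem_per_block,
                     CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN,
                     dev);
// smem_per_block could seed a default for kMaxSharedMemSize,
// though CiG mode may impose a lower limit than the raw device cap.
```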

@nieubank can you help tag this for 1.24, since we would like to make sure this goes in with the newly added interop API? I can take care of rebasing to main and accepting some of the Copilot comments if that's all that is needed.

@nieubank nieubank added this to the 1.24.0 milestone Jan 15, 2026
@gedoensmax gedoensmax force-pushed the maximilianm/nv_ext_importer branch from ca44c43 to 79d020e Compare January 16, 2026 23:33
@gedoensmax gedoensmax changed the base branch from nieubank/nv_ext_importer to main January 16, 2026 23:33
@gedoensmax gedoensmax changed the title Add test for CIG inference [TRT RTX EP] Add support for D3D12 external resourrce import Jan 16, 2026
@gedoensmax gedoensmax requested a review from Copilot January 16, 2026 23:34
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.



@gedoensmax gedoensmax changed the title [TRT RTX EP] Add support for D3D12 external resourrce import [TRT RTX EP] Add support for D3D12 external resource import Jan 16, 2026
@skottmckay
Contributor

#26988 added the stream in RunOptions and is now merged.

@ankan-ban
Contributor

> Do you consider this a big blocker? I thought of this small driver API usage as OK for an ISV to integrate, let me know if you think differently.

Agree that it's not a big blocker, but after #26988 it seems to be the only NV-specific thing that the app needs to do. Agree that it's likely not too much for ISVs to integrate.

@gedoensmax gedoensmax force-pushed the maximilianm/nv_ext_importer branch from 3d012ef to 468eff1 Compare January 19, 2026 15:02
@gedoensmax
Contributor Author

I rebased on the refined structs and started providing the stream as a run option. Changes to support this for CiG are still missing, but we are tracking that internally.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.



@yuslepukhin
Member

@gedoensmax Please review, address, comment on, and resolve the Copilot comments. They are often useful.

@gedoensmax
Contributor Author

@yuslepukhin all the Copilot comments I resolved were already implemented. I had missed the one on raw tensor size handling and made the corresponding change.

nieubank
nieubank previously approved these changes Jan 26, 2026
@yuslepukhin
Member

It looks good. You will need to resolve conflicts. All the review comments will also need to be marked as resolved.

…porter_cig

# Conflicts:
#	onnxruntime/core/providers/nv_tensorrt_rtx/nv_provider_factory.cc
@chilo-ms
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).


7 participants