Description
Describe the issue
During the very first inference of a model in ONNX Runtime Web, the UI freezes while a large GPU operation is executed.
The freeze comes from two sources:
-
maxDispatchNumber is a hard-coded constant: all dispatches are accumulated until this limit is reached, and then a single massive command-encoder submit is performed. The result is one long GPU workload that stalls the render thread.
-
Shader pipelines are created synchronously with GPUDevice.createComputePipeline – every required compute pipeline is compiled on the fly. Because the call blocks until compilation finishes, all pipelines are compiled back-to-back, which ends up as one huge synchronous GPU operation on the first frame. An asynchronous alternative, GPUDevice.createComputePipelineAsync, exists, and its documentation recommends it:
It is generally preferable to use this method over GPUDevice.createComputePipeline() whenever possible, as it prevents blocking of GPU operation execution on pipeline compilation.
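The recommended pattern can be sketched as follows. This is a minimal illustration, not ONNX Runtime's actual code: the helper name `buildPipeline` and the structural types are invented so the snippet stands alone (real code would use @webgpu/types).

```typescript
// Minimal structural types so the sketch is self-contained;
// real code would use the GPUDevice types from @webgpu/types.
interface ComputePipeline { label?: string }
interface Device {
  createComputePipeline(desc: object): ComputePipeline;
  createComputePipelineAsync?(desc: object): Promise<ComputePipeline>;
}

// Hypothetical helper: prefer the non-blocking async path, falling back
// to the synchronous call only where the async method is unavailable.
async function buildPipeline(
  device: Device,
  descriptor: object,
): Promise<ComputePipeline> {
  if (device.createComputePipelineAsync) {
    // Resolves once the driver finishes compiling, without stalling
    // other queued GPU work on pipeline compilation.
    return device.createComputePipelineAsync(descriptor);
  }
  // Blocking fallback: stalls until compilation completes.
  return device.createComputePipeline(descriptor);
}
```

Because the async variant resolves a promise instead of blocking, the per-pipeline compilations can overlap with rendering rather than landing back-to-back on the first frame.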
Both issues cause noticeable jank in animations and page interactions on the first model run.
I think maxDispatchNumber should either be set to 1 or made user-configurable, and for the second issue, createComputePipeline should be replaced with createComputePipelineAsync.
Right now, profiling shows heavy GPU operations that block the UI.
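The configurable-threshold idea can be sketched as below. Only the constant's name, maxDispatchNumber, comes from the ONNX Runtime Web source; the class, its methods, and the counters are invented for illustration.

```typescript
// Hypothetical sketch: a configurable flush threshold instead of a
// hard-coded maxDispatchNumber. Real code would end the current compute
// pass and call device.queue.submit([encoder.finish()]) inside flush().
class DispatchBatcher {
  private pending = 0;
  public submits = 0; // how many times work was handed to the GPU queue

  constructor(private readonly maxDispatchNumber: number = 16) {}

  // Record one compute dispatch; flush when the threshold is hit.
  dispatch(): void {
    this.pending++;
    if (this.pending >= this.maxDispatchNumber) this.flush();
  }

  // Submit all accumulated dispatches as one command buffer.
  flush(): void {
    if (this.pending === 0) return;
    this.pending = 0;
    this.submits++;
  }
}
```

With maxDispatchNumber = 1, every dispatch is submitted on its own, so no single submission can monopolize the GPU long enough to freeze the page; the trade-off is higher per-submit overhead, which is why making the value user-configurable seems preferable to forcing it to 1.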

To reproduce
I use multiple small, mostly Conv models, but the issue can easily be reproduced with transformers.js models, for example: https://huggingface.co/spaces/Xenova/realtime-whisper-webgpu

It is especially reproducible on Windows, both in the browser and in Electron. On macOS the impact is much less severe.
Urgency
No response
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.22.0
Execution Provider
'webgpu' (WebGPU)