[Web] WebGPU first‑run model warm‑up causes long GPU‑blocking operations (maxDispatchNumber & synchronous pipeline creation) #25882

@grazder

Description

Describe the issue

During the very first inference of a model in ONNX Runtime Web, the UI freezes while a large GPU operation is executed.
The freeze comes from two sources:

  1. maxDispatchNumber is a hard‑coded constant: dispatches are accumulated until this limit is reached, and then all of them are sent in a single large command‑encoder submit. This produces one long GPU workload that stalls the render thread.

  2. Shader pipelines are created synchronously with GPUDevice.createComputePipeline: every required compute pipeline is compiled on the fly. Because each call blocks until compilation finishes, all pipelines are compiled back to back, which shows up as one huge blocking GPU operation on the first frame. WebGPU also provides the asynchronous GPUDevice.createComputePipelineAsync, and its documentation recommends:

It is generally preferable to use this method over GPUDevice.createComputePipeline() whenever possible, as it prevents blocking of GPU operation execution on pipeline compilation.
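A minimal sketch of the second suggestion, assuming a simple string key per compiled program: a hypothetical `AsyncPipelineCache` that compiles each program once through `createComputePipelineAsync`, shares the in-flight promise between callers, and offers a `warmUp` helper to start all compilations up front (for example while the model loads). The `*Like` interfaces only mirror the subset of the WebGPU API that is used, so the snippet type-checks without `@webgpu/types`. None of these names come from ONNX Runtime Web.

```typescript
// Minimal structural subset of the WebGPU API used here; in a real project
// these would be GPUComputePipelineDescriptor / GPUDevice from @webgpu/types.
interface ComputePipelineDescriptorLike {
  label?: string;
  layout: "auto" | object;
  compute: { module: object; entryPoint?: string };
}

interface PipelineDeviceLike<P> {
  createComputePipelineAsync(desc: ComputePipelineDescriptorLike): Promise<P>;
}

// Hypothetical cache (not ONNX Runtime Web code): each key is compiled at most
// once, asynchronously, and concurrent callers share the same promise.
class AsyncPipelineCache<P> {
  private readonly device: PipelineDeviceLike<P>;
  private readonly pending = new Map<string, Promise<P>>();

  constructor(device: PipelineDeviceLike<P>) {
    this.device = device;
  }

  get(key: string, desc: ComputePipelineDescriptorLike): Promise<P> {
    const existing = this.pending.get(key);
    if (existing !== undefined) return existing;

    const created = this.device.createComputePipelineAsync(desc);
    this.pending.set(key, created);
    // Forget failed compilations so a later call can retry them.
    void created.catch(() => {
      if (this.pending.get(key) === created) this.pending.delete(key);
    });
    return created;
  }

  // Start compiling every known program up front (e.g. during model load)
  // so the first inference mostly finds pipelines that are already built.
  warmUp(entries: Array<[string, ComputePipelineDescriptorLike]>): Promise<P[]> {
    return Promise.all(entries.map(([key, desc]) => this.get(key, desc)));
  }
}
```

In a real backend the descriptor would carry a `GPUShaderModule`, and the key would be whatever uniquely identifies a compiled program. The compile cost is still paid on first use, but, per the recommendation quoted above, it no longer blocks other GPU work while it runs.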

Both issues lead to a noticeable “jank” of animations and page interactions on the first model run.

I think maxDispatchNumber should either be set to 1 or be made configurable by the user. For the second issue, createComputePipeline should be replaced with createComputePipelineAsync.
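The first suggestion could look roughly like this: a hypothetical `DispatchBatcher` whose submit threshold is a constructor argument instead of a hard-coded constant. Dispatches are recorded into one compute pass, and the encoder is finished and submitted once `maxDispatchesPerSubmit` dispatches have accumulated; a final `flush()` submits any remainder. This is an illustrative sketch, not ONNX Runtime Web's actual backend code, and the `*Like` interfaces mirror only the WebGPU calls it uses.

```typescript
// Minimal structural subset of GPUComputePassEncoder / GPUCommandEncoder / GPUDevice.
interface ComputePassLike {
  setPipeline(pipeline: unknown): void;
  setBindGroup(index: number, bindGroup: unknown): void;
  dispatchWorkgroups(x: number, y?: number, z?: number): void;
  end(): void;
}
interface CommandEncoderLike {
  beginComputePass(): ComputePassLike;
  finish(): unknown;
}
interface DispatchDeviceLike {
  createCommandEncoder(): CommandEncoderLike;
  queue: { submit(commandBuffers: unknown[]): void };
}

// Hypothetical batcher with a user-configurable submit threshold.
class DispatchBatcher {
  private readonly device: DispatchDeviceLike;
  private readonly maxDispatchesPerSubmit: number;
  private encoder: CommandEncoderLike | null = null;
  private pass: ComputePassLike | null = null;
  private pendingDispatches = 0;
  submitCount = 0;

  constructor(device: DispatchDeviceLike, maxDispatchesPerSubmit: number) {
    if (!Number.isInteger(maxDispatchesPerSubmit) || maxDispatchesPerSubmit < 1) {
      throw new RangeError("maxDispatchesPerSubmit must be a positive integer");
    }
    this.device = device;
    this.maxDispatchesPerSubmit = maxDispatchesPerSubmit;
  }

  // Record one dispatch; submit automatically once the threshold is reached.
  dispatch(pipeline: unknown, bindGroup: unknown, x: number, y = 1, z = 1): void {
    let pass = this.pass;
    if (pass === null) {
      const encoder = this.device.createCommandEncoder();
      pass = encoder.beginComputePass();
      this.encoder = encoder;
      this.pass = pass;
    }
    pass.setPipeline(pipeline);
    pass.setBindGroup(0, bindGroup);
    pass.dispatchWorkgroups(x, y, z);
    this.pendingDispatches++;
    if (this.pendingDispatches >= this.maxDispatchesPerSubmit) this.flush();
  }

  // End the open compute pass (if any) and submit the recorded commands.
  flush(): void {
    if (this.encoder === null || this.pass === null) return;
    this.pass.end();
    this.device.queue.submit([this.encoder.finish()]);
    this.encoder = null;
    this.pass = null;
    this.pendingDispatches = 0;
    this.submitCount++;
  }
}
```

With `maxDispatchesPerSubmit = 1` every dispatch becomes its own submit, which keeps each GPU chunk short at the cost of more submit overhead; larger values amortize that overhead but produce the long uninterrupted workloads described in the issue.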

When profiling, I currently see heavy GPU operations that block the UI.


To reproduce

I use several small models, mostly convolutional (Conv). The issue is also easy to reproduce with transformers.js models, for example https://huggingface.co/spaces/Xenova/realtime-whisper-webgpu


It is especially noticeable in browsers on Windows and in Electron. On macOS the effect is much less severe.

Urgency

No response

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.22.0

Execution Provider

'webgpu' (WebGPU)

Metadata

Assignees

No one assigned

    Labels

    ep:WebGPU (ort-web webgpu provider), model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.), platform:web (issues related to ONNX Runtime web; typically submitted using template)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests