Description
Describe the issue
During the very first inference of a model in ONNX Runtime Web, the UI freezes while a large GPU operation is executed.
The freeze comes from two sources:
-
maxDispatchNumber is a hard-coded constant: all dispatches are accumulated until this limit is reached, and then a single massive command-encoder submit is performed. The result is one long GPU workload that stalls the render thread.
-
Shader pipelines are created synchronously with GPUDevice.createComputePipeline – every required compute pipeline is compiled on the fly. Because the call blocks until compilation finishes, all pipelines are compiled back-to-back, which ends up as one huge synchronous GPU operation on the first frame. An asynchronous alternative, GPUDevice.createComputePipelineAsync, exists, and its documentation recommends it:
It is generally preferable to use this method over GPUDevice.createComputePipeline() whenever possible, as it prevents blocking of GPU operation execution on pipeline compilation.
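The recommended pattern can be sketched as follows. This is a minimal illustration, not ONNX Runtime's actual code: the helper name `buildPipeline` and the structural types are invented so the snippet stands alone (real code would use @webgpu/types).

```typescript
// Minimal structural types so the sketch is self-contained;
// real code would use the GPUDevice types from @webgpu/types.
interface ComputePipeline { label?: string }
interface Device {
  createComputePipeline(desc: object): ComputePipeline;
  createComputePipelineAsync?(desc: object): Promise<ComputePipeline>;
}

// Hypothetical helper: prefer the non-blocking async path, falling back
// to the synchronous call only where the async method is unavailable.
async function buildPipeline(
  device: Device,
  descriptor: object,
): Promise<ComputePipeline> {
  if (device.createComputePipelineAsync) {
    // Resolves once the driver finishes compiling, without stalling
    // other queued GPU work on pipeline compilation.
    return device.createComputePipelineAsync(descriptor);
  }
  // Blocking fallback: stalls until compilation completes.
  return device.createComputePipeline(descriptor);
}
```

Because the async variant resolves a promise instead of blocking, the per-pipeline compilations can overlap with rendering rather than landing back-to-back on the first frame.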
Both issues cause noticeable jank in animations and page interactions on the first model run.
I think maxDispatchNumber should either be set to 1 or made user-configurable, and for the second issue, createComputePipeline should be replaced with createComputePipelineAsync.
Right now, profiling shows heavy GPU operations that block the UI.
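The configurable-threshold idea can be sketched as below. Only the constant's name, maxDispatchNumber, comes from the ONNX Runtime Web source; the class, its methods, and the counters are invented for illustration.

```typescript
// Hypothetical sketch: a configurable flush threshold instead of a
// hard-coded maxDispatchNumber. Real code would end the current compute
// pass and call device.queue.submit([encoder.finish()]) inside flush().
class DispatchBatcher {
  private pending = 0;
  public submits = 0; // how many times work was handed to the GPU queue

  constructor(private readonly maxDispatchNumber: number = 16) {}

  // Record one compute dispatch; flush when the threshold is hit.
  dispatch(): void {
    this.pending++;
    if (this.pending >= this.maxDispatchNumber) this.flush();
  }

  // Submit all accumulated dispatches as one command buffer.
  flush(): void {
    if (this.pending === 0) return;
    this.pending = 0;
    this.submits++;
  }
}
```

With maxDispatchNumber = 1, every dispatch is submitted on its own, so no single submission can monopolize the GPU long enough to freeze the page; the trade-off is higher per-submit overhead, which is why making the value user-configurable seems preferable to forcing it to 1.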

To reproduce
I use multiple small, mostly Conv models, but the issue can easily be reproduced with transformers.js models, for example: https://huggingface.co/spaces/Xenova/realtime-whisper-webgpu

It is especially reproducible on Windows, both in the browser and in Electron. On macOS the impact is much less severe.
Urgency
No response
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.22.0
Execution Provider
'webgpu' (WebGPU)