
[V3 proposal] Improved defaults for quantization and device selection #960

Open · xenova opened this issue Oct 4, 2024 · 3 comments
Labels: enhancement (New feature or request)

xenova (Collaborator) commented Oct 4, 2024

Feature request

Currently, Transformers.js V3 defaults to using the CPU (WASM) instead of the GPU (WebGPU), due to lacking support and instability across browsers (specifically Firefox and Safari, and Chrome on Ubuntu). However, this provides a poor user experience, since performance is left on the table. As browser support for WebGPU increases (currently ~70%), this will become more important: users may experience poor performance even when better settings are available.

A better approach would be to use device: "auto" instead of device: null by default, selecting (1) the quantization and (2) the device based on the following (a rough sketch follows the list):

  1. Browser support (e.g., whether WebGPU is enabled)
  2. Device capabilities (OS, mobile vs. desktop, fp16 support)
  3. Model architecture/type (BERT-style encoder models are more likely to succeed than encoder-decoder models; some models contain ops which are not supported in WebGPU)
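A minimal sketch of what such an "auto" resolver could look like. All names here (resolveAuto, AutoChoice, the dtype fallbacks) are illustrative assumptions, not the actual Transformers.js API:

```ts
// Hypothetical sketch only — names and fallback choices are assumptions
// for illustration, not the actual Transformers.js API.
type AutoChoice = { device: "webgpu" | "wasm"; dtype: "fp32" | "fp16" | "q8" };

async function resolveAuto(): Promise<AutoChoice> {
  // (1) Browser support: fall back to WASM when WebGPU is unavailable.
  // (Cast needed unless @webgpu/types is installed.)
  const gpu = typeof navigator !== "undefined" ? (navigator as any).gpu : undefined;
  if (!gpu) return { device: "wasm", dtype: "q8" };
  try {
    const adapter = await gpu.requestAdapter();
    if (!adapter) return { device: "wasm", dtype: "q8" };
    // (2) Device capabilities: prefer fp16 when the adapter advertises it.
    const dtype = adapter.features.has("shader-f16") ? "fp16" : "fp32";
    // (3) Model architecture/type would be consulted here too, e.g. forcing
    // WASM for architectures containing ops unsupported on WebGPU.
    return { device: "webgpu", dtype };
  } catch {
    // Adapter acquisition can throw on some browsers; degrade gracefully.
    return { device: "wasm", dtype: "q8" };
  }
}
```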

Motivation

Improve user experience and performance with better defaults

Your contribution

Will work with @FL33TW00D on this

xenova added the enhancement label on Oct 4, 2024
FL33TW00D commented

[auto_device flowchart image]

xenova (Collaborator, Author) commented Oct 4, 2024

Current logic for session selection: https://github.com/xenova/transformers.js/blob/6505abb164a3eea1dd5e80e56a72f7d805715f0a/src/models.js#L148-L262

Request / commit for the ability to set the adapter in ORT-web
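For reference, this is roughly where such a setting surfaces in onnxruntime-web; the provider ordering below is illustrative, not the exact logic in the linked session-selection code:

```ts
import * as ort from "onnxruntime-web";

// ORT tries execution providers left to right, falling back on failure,
// so ["webgpu", "wasm"] means "use WebGPU if possible, else WASM".
const session = await ort.InferenceSession.create("model.onnx", {
  executionProviders: ["webgpu", "wasm"],
});
```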

FL33TW00D commented Oct 17, 2024

Some thoughts:

  • Most users of Transformers.js will want the optimal device selected for them, via Device.AUTO.
  • Some advanced users will want to force a specific ExecutionProvider.
  • We need to support both of the above use cases.

Currently, the distinction between our DEVICE_TYPES and ORT is quite blurry.

I propose:

  • Create a Device class representing a Transformers.js device.
  • Simplify DEVICE_TYPES to just handle Auto, CPU, GPU, and NPU.
  • Use conversion functions to map a Device to an ORTBackend, etc.

This class will encapsulate all of the logic with respect to devices, converting a Device into the required ExecutionProvider when needed. It will also implement the above flowchart to ensure that users get the best experience.

The Device class will primarily expose CPU, GPU, and NPU, but users will also be able to provide an ORTExecutionProvider directly, skipping our whole flowchart and forcing their required EP.

We should create the Device class and keep the DEVICE_TYPES enum here; the enum should be changed to ensure no bleeding between our naming and ORT's.
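A rough sketch of that split, under the naming assumptions in this thread (nothing here is committed API):

```ts
// All names are assumptions drawn from this discussion, not committed API.
const DEVICE_TYPES = { AUTO: "auto", CPU: "cpu", GPU: "gpu", NPU: "npu" } as const;
type DeviceType = (typeof DEVICE_TYPES)[keyof typeof DEVICE_TYPES];

// Execution provider names as understood by onnxruntime-web.
type ORTExecutionProvider = "wasm" | "webgpu" | "webnn";

class Device {
  constructor(private readonly type: DeviceType) {}

  // The only place Transformers.js device names are converted into ORT
  // execution providers, so ORT naming never leaks into the public API.
  toExecutionProviders(): ORTExecutionProvider[] {
    switch (this.type) {
      case DEVICE_TYPES.CPU:  return ["wasm"];
      case DEVICE_TYPES.GPU:  return ["webgpu", "wasm"];
      case DEVICE_TYPES.NPU:  return ["webnn", "wasm"];
      case DEVICE_TYPES.AUTO: return detectBestProviders();
    }
  }
}

// Placeholder for the capability-detection flowchart discussed above.
function detectBestProviders(): ORTExecutionProvider[] {
  return ["wasm"];
}
```

Advanced users could then bypass the flowchart entirely by passing an ORTExecutionProvider straight through, covering the second use case above.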
