Add Transformers.js as a backend for image classification #289
base: main
Conversation
Pull request overview
This pull request adds Transformers.js as a new backend for image classification in ml5.js, introducing a Vision Transformer model alongside the existing TensorFlow.js-based models. The implementation creates a new ImageClassifierTransformer class that wraps Hugging Face's Transformers.js library and integrates it into the existing imageClassifier API through a factory pattern.
Key changes:
- Added @huggingface/transformers dependency (v3.7.6) with associated dependencies for ONNX runtime and image processing
- Implemented ImageClassifierTransformer class with support for WebGPU/WASM inference
- Created three example sketches demonstrating single image classification, top-k results, and webcam classification
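For orientation, the factory routing described above might look roughly like the following sketch (illustrative only, not the PR's actual code; `createLegacyClassifier` is a hypothetical stand-in for the existing TensorFlow.js path in `src/ImageClassifier/index.js`):

```js
// Sketch of the described routing (illustrative; see src/ImageClassifier/index.js
// for the real implementation in this PR).
import ImageClassifierTransformer from "./transformer";

const imageClassifier = (modelName, options = {}, callback) => {
  if (modelName === "VisionTransformer") {
    // New path added by this PR: wrap Hugging Face's Transformers.js.
    return new ImageClassifierTransformer(options, callback);
  }
  // Every other model name falls through to the existing TF.js-based path.
  return createLegacyClassifier(modelName, options, callback); // hypothetical helper
};

export default imageClassifier;
```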
Reviewed changes
Copilot reviewed 11 out of 14 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| package.json | Added @huggingface/transformers v3.7.6 dependency |
| yarn.lock | Added all transitive dependencies for Transformers.js including ONNX runtime, Sharp, and Protobuf packages |
| webpack.config.js | Added warning suppression for ESM import.meta usage in Transformers.js |
| src/utils/imageUtilities.js | Exported existing drawToCanvas helper function for video-to-canvas conversion |
| src/ImageClassifier/transformer.js | New implementation of Vision Transformer-based image classifier |
| src/ImageClassifier/index.js | Modified factory function to route "VisionTransformer" model name to new implementation |
| examples/imageClassifier-transformer-single-image/* | Example demonstrating single image classification with default top-k |
| examples/imageClassifier-transformer-single-image-topk/* | Example demonstrating custom top-k parameter usage |
| examples/imageClassifier-transformer-webcam/* | Example demonstrating continuous webcam classification with start/stop |
This might be just for testing - thought I'd point this out just in case: I don't think it's worth having a separate example just to demonstrate the topk option (documenting it seems sufficient, imho).
```js
let confidence = "";

function preload() {
  classifier = ml5.imageClassifier("VisionTransformer", { topK: 2 });
```
Elsewhere in the codebase, we use topk as an option with a lowercase k. Probably good to retain that for consistency?
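If both spellings should keep working during a transition, normalizing once up front would do it (a sketch; the default value of 5 is an assumption):

```js
// Accept both spellings but standardize on lowercase "topk" internally.
// (Illustrative sketch; the default of 5 is an assumption.)
const topk = options.topk ?? options.topK ?? 5;
```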
```
Learn more about the ml5.js project: https://ml5js.org/
ml5.js license and Code of Conduct: https://github.com/ml5js/ml5-next-gen/blob/main/LICENSE.md

This example demonstrates detecting objects in a live video through ml5.imageClassifier.
```
add "using a transformer model" (here and in <title>, sketch.js)?
```html
<script src="sketch.js"></script>
</body>
</html>

```
nitpick: extra newline
```html
<script src="sketch.js"></script>
</body>
</html>

```
nitpick: extra newline
```js
};

export default imageClassifier;
```
Git prefers to have a newline character at the end of each file (since diff operates on whole lines)
```js
// WebGPU is very fast, so we can call the next frame immediately
if (this.device === "webgpu") next();
// Wasm is slower, so we wait for 1 second before calling the next frame
else setTimeout(next, 1000);
```
Does this limit WASM to 1 fps (regardless of compute power)? Is there any way to schedule this dynamically instead?
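One possible dynamic schedule (a sketch, not from this PR: it measures each inference and derives the idle time from that, so faster machines are not pinned to 1 fps; `classifyFrame` is a hypothetical per-frame helper):

```js
// Sketch: size the delay to the measured inference cost instead of a fixed
// 1-second wait. (Illustrative; classifyFrame is a hypothetical helper.)
const start = performance.now();
const results = await this.classifyFrame();
const elapsed = performance.now() - start;

// Idle for roughly one inference-worth of time between frames, capped so
// even very slow WASM machines still update at least every 2 seconds.
setTimeout(next, Math.min(elapsed, 2000));
```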
```js
let confidence = "";

function preload() {
  classifier = ml5.imageClassifier("VisionTransformer");
```
Personally, I'm a bit on the fence about whether "VisionTransformer" is beneficial vs. "vit-base-patch16-224" ... searching for the former brings up articles about the general architecture (which by now many different models implement); only the latter tells me that this model was, e.g., trained on 14 million images with 21 thousand classes and uses a resolution of 224x224.
If we'll be using "VisionTransformer": how about printing the actual name of the model being used to the console?
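That could be as small as a lookup plus a log line (a sketch; the alias table and checkpoint id below are illustrative assumptions):

```js
// Sketch: resolve the friendly alias to a concrete checkpoint and say so.
// (Alias table and checkpoint id are illustrative assumptions.)
const MODEL_ALIASES = {
  VisionTransformer: "Xenova/vit-base-patch16-224",
};

const modelId = MODEL_ALIASES[modelName] ?? modelName;
console.log(`ml5.imageClassifier: loading "${modelName}" (${modelId})`);
```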
Thank you @pearmini - added a few comments to specific lines. I had two general observations:

1. In China, using NYU Shanghai's otherwise excellent VPN, loading the 345 MB model took me 3.3 minutes - significantly worse than with Google's TensorFlow.js models. (Most users will have given up well before that point.) Curious whether this is a known issue (throttling?) in our corner of the world, or whether there are other CDNs in front of Hugging Face we might want to use?

2. In my Chrome, I got a warning message and two error messages in the console while it ran. Ideally we wouldn't print those, since people using the library might think they are their fault, or something they need to address. Is there any way to reduce transformers.js verbosity, or otherwise filter its output?
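If ml5 exposes (or the sketch imports) the Transformers.js `env` object, both points might be addressable along these lines (a hedged sketch; the mirror host and the log-level plumbing are assumptions worth verifying):

```js
import { env } from "@huggingface/transformers";

// Point model downloads at an alternative host (e.g. a regional mirror)
// instead of huggingface.co. The mirror URL here is an assumption to verify.
env.remoteHost = "https://hf-mirror.com";

// Transformers.js exposes the onnxruntime-web environment under
// env.backends.onnx; raising its log level should quiet the warnings,
// though whether it catches all of these messages is worth testing.
env.backends.onnx.logLevel = "fatal";
```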
These console warnings seem to be an ongoing problem with onnxruntime itself; here's a GitHub issue on the matter: huggingface/transformers.js#270


For single image:
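A minimal sketch based on this PR's single-image example (the asset path is hypothetical, and the callback shape assumes ml5's usual `results` array):

```js
let classifier;
let img;

function preload() {
  classifier = ml5.imageClassifier("VisionTransformer");
  img = loadImage("images/bird.jpg"); // hypothetical asset
}

function setup() {
  createCanvas(400, 400);
  image(img, 0, 0, width, height);
  classifier.classify(img, gotResult);
}

function gotResult(results) {
  // results: array of { label, confidence } objects, best match first
  console.log(results);
}
```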
For webcam:
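A sketch of continuous classification, assuming the `classifyStart`/`classifyStop` API used in this PR's webcam example:

```js
let classifier;
let video;
let label = "";

function preload() {
  classifier = ml5.imageClassifier("VisionTransformer");
}

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();
  classifier.classifyStart(video, gotResult); // stop with classifier.classifyStop()
}

function draw() {
  image(video, 0, 0);
  text(label, 10, height - 10);
}

function gotResult(results) {
  label = results[0].label;
}
```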
Specify options:
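Options go in the second argument; the spelling below follows this PR (see the `topK` vs. `topk` discussion above):

```js
// Request only the top 2 predictions.
classifier = ml5.imageClassifier("VisionTransformer", { topK: 2 });
```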
TODO