05 Dec 14:09

5748774

2.10.0

What's new?

🎵 New task: Zero-shot audio classification

Zero-shot audio classification is the task of classifying audio into classes that were not seen during training. See here for more information.

Example: Perform zero-shot audio classification with Xenova/clap-htsat-unfused.

import { pipeline } from '@xenova/transformers';

// Create a zero-shot audio classification pipeline
const classifier = await pipeline('zero-shot-audio-classification', 'Xenova/clap-htsat-unfused');

const audio = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/dog_barking.wav';
const candidate_labels = ['dog', 'vaccum cleaner'];
const scores = await classifier(audio, candidate_labels);
// [
//   { score: 0.9993992447853088, label: 'dog' },
//   { score: 0.0006007603369653225, label: 'vaccum cleaner' }
// ]
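
The two scores above sum to roughly 1, which is what a softmax over per-label similarity scores would produce. The sketch below shows that normalization step in isolation; whether the pipeline computes its scores exactly this way is an assumption, and the input logits are made up for illustration.

```javascript
// Numerically stable softmax: subtract the max before exponentiating.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical per-label similarity logits (not taken from the model)
const probs = softmax([2.0, 2.0, 2.0, 2.0]);
console.log(probs); // [ 0.25, 0.25, 0.25, 0.25 ]
```

Because softmax normalizes across the candidate labels, each score is only meaningful relative to the other labels passed in the same call.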

💻 New architectures: CLAP, Audio Spectrogram Transformer, ConvNeXT, and ConvNeXT-v2

We added support for 4 new architectures, bringing the total up to 65!

CLAP for zero-shot audio classification, text embeddings, and audio embeddings (#427). See here for the list of available models.

Zero-shot audio classification (same as above)

Text embeddings with Xenova/clap-htsat-unfused:

import { AutoTokenizer, ClapTextModelWithProjection } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clap-htsat-unfused');
const text_model = await ClapTextModelWithProjection.from_pretrained('Xenova/clap-htsat-unfused');

// Run tokenization
const texts = ['a sound of a cat', 'a sound of a dog'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });

// Compute embeddings
const { text_embeds } = await text_model(text_inputs);
// Tensor {
//   dims: [ 2, 512 ],
//   type: 'float32',
//   data: Float32Array(1024) [ ... ],
//   size: 1024
// }
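
A common next step with embeddings like these is to compare them. The following is a minimal sketch of cosine similarity between two rows of a 2D tensor. It relies only on the `dims` and `data` fields shown in the output above; the function name and the toy input are illustrative, not part of the library.

```javascript
// Cosine similarity between two rows of a 2D tensor-like object.
// Assumes row-major storage, i.e. row r occupies data[r * width .. (r + 1) * width).
function cosineSimilarity(tensor, rowA, rowB) {
  const [, width] = tensor.dims;
  const a = tensor.data.subarray(rowA * width, (rowA + 1) * width);
  const b = tensor.data.subarray(rowB * width, (rowB + 1) * width);
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < width; ++i) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy stand-in for `text_embeds` (2 rows, 3 columns instead of 512)
const toyEmbeds = { dims: [2, 3], data: new Float32Array([1, 0, 0, 0, 1, 0]) };
console.log(cosineSimilarity(toyEmbeds, 0, 1)); // 0 (orthogonal rows)
console.log(cosineSimilarity(toyEmbeds, 0, 0)); // 1 (identical rows)
```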

Audio embeddings with Xenova/clap-htsat-unfused:

import { AutoProcessor, ClapAudioModelWithProjection, read_audio } from '@xenova/transformers';

// Load processor and audio model
const processor = await AutoProcessor.from_pretrained('Xenova/clap-htsat-unfused');
const audio_model = await ClapAudioModelWithProjection.from_pretrained('Xenova/clap-htsat-unfused');

// Read audio and run processor
const audio = await read_audio('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cat_meow.wav');
const audio_inputs = await processor(audio);

// Compute embeddings
const { audio_embeds } = await audio_model(audio_inputs);
// Tensor {
//   dims: [ 1, 512 ],
//   type: 'float32',
//   data: Float32Array(512) [ ... ],
//   size: 512
// }

Audio Spectrogram Transformer for audio classification (#427). See here for the list of available models.

import { pipeline } from '@xenova/transformers';

// Create an audio classification pipeline
const classifier = await pipeline('audio-classification', 'Xenova/ast-finetuned-audioset-10-10-0.4593');

// Predict class
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cat_meow.wav';
const output = await classifier(url, { topk: 4 });
// [
//   { label: 'Meow', score: 0.5617874264717102 },
//   { label: 'Cat', score: 0.22365376353263855 },
//   { label: 'Domestic animals, pets', score: 0.1141069084405899 },
//   { label: 'Animal', score: 0.08985692262649536 },
// ]

ConvNeXT for image classification (#428). See here for the list of available models.

import { pipeline } from '@xenova/transformers';

// Create image classification pipeline
const classifier = await pipeline('image-classification', 'Xenova/convnext-tiny-224');

// Classify an image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
const output = await classifier(url);
// [{ label: 'tiger, Panthera tigris', score: 0.6153212785720825 }]

ConvNeXT-v2 for image classification (#428). See here for the list of available models.

import { pipeline } from '@xenova/transformers';

// Create image classification pipeline
const classifier = await pipeline('image-classification', 'Xenova/convnextv2-atto-1k-224');

// Classify an image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
const output = await classifier(url);
// [{ label: 'tiger, Panthera tigris', score: 0.6391205191612244 }]

🔨 Other improvements

Support decoding of tensors in #416

Full Changelog: 2.9.0...2.10.0

21 Nov 14:00

xenova

2.9.0

768a2e2

What's new?

😍 Exciting new tasks!

Transformers.js v2.9.0 adds support for three new tasks: (1) Depth estimation, (2) Zero-shot object detection, and (3) Optical document understanding.

🕵️‍♂️ Depth Estimation

The task of predicting the depth of objects present in an image. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create depth estimation pipeline
let depth_estimator = await pipeline('depth-estimation', 'Xenova/dpt-hybrid-midas');

// Predict depth for image
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
let output = await depth_estimator(url);

Raw output

// {
//   predicted_depth: Tensor {
//     dims: [ 384, 384 ],
//     type: 'float32',
//     data: Float32Array(147456) [ 542.859130859375, 545.2833862304688, 546.1649169921875, ... ],
//     size: 147456
//   },
//   depth: RawImage {
//     data: Uint8Array(307200) [ 86, 86, 86, ... ],
//     width: 640,
//     height: 480,
//     channels: 1
//   }
// }
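
The raw `predicted_depth` values are unbounded floats, while `depth` holds 8-bit pixels. One plausible way to turn float depths into a displayable grayscale buffer is min-max normalization, sketched below. This is an illustration under that assumption, not necessarily how the pipeline builds `depth` internally, and the helper name is hypothetical.

```javascript
// Map arbitrary float depth values onto the 0..255 range.
function depthToGrayscale(depth) {
  let min = Infinity, max = -Infinity;
  for (const v of depth) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const range = max - min || 1; // avoid division by zero for a flat depth map
  const out = new Uint8Array(depth.length);
  for (let i = 0; i < depth.length; ++i) {
    out[i] = Math.round(((depth[i] - min) / range) * 255);
  }
  return out;
}

// Toy depth values (not taken from the model output)
const toyDepth = new Float32Array([500, 750, 1000]);
console.log(depthToGrayscale(toyDepth)); // Uint8Array(3) [ 0, 128, 255 ]
```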

🎯 Zero-shot Object Detection

The task of identifying objects of classes that are unseen during training. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create zero-shot object detection pipeline
let detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');

// Predict bounding boxes
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
let candidate_labels = ['human face', 'rocket', 'helmet', 'american flag'];
let output = await detector(url, candidate_labels);

Raw output

// [
//   {
//     score: 0.24392342567443848,
//     label: 'human face',
//     box: { xmin: 180, ymin: 67, xmax: 274, ymax: 175 }
//   },
//   {
//     score: 0.15129457414150238,
//     label: 'american flag',
//     box: { xmin: 0, ymin: 4, xmax: 106, ymax: 513 }
//   },
//   {
//     score: 0.13649864494800568,
//     label: 'helmet',
//     box: { xmin: 277, ymin: 337, xmax: 511, ymax: 511 }
//   },
//   {
//     score: 0.10262022167444229,
//     label: 'rocket',
//     box: { xmin: 352, ymin: -1, xmax: 463, ymax: 287 }
//   }
// ]
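
Zero-shot detection scores tend to be low in absolute terms, as the output above shows, so callers usually pick their own cut-off. Here is a minimal sketch that keeps detections above a threshold and derives each box's size from its corners. The helper name and the threshold of 0.2 are illustrative choices.

```javascript
// Keep detections scoring at least `threshold` and attach box width/height.
function filterDetections(detections, threshold) {
  return detections
    .filter((d) => d.score >= threshold)
    .map((d) => ({
      ...d,
      width: d.box.xmax - d.box.xmin,
      height: d.box.ymax - d.box.ymin,
    }));
}

// Two entries copied from the raw output above
const sampleDetections = [
  { score: 0.24392342567443848, label: 'human face', box: { xmin: 180, ymin: 67, xmax: 274, ymax: 175 } },
  { score: 0.10262022167444229, label: 'rocket', box: { xmin: 352, ymin: -1, xmax: 463, ymax: 287 } },
];
const kept = filterDetections(sampleDetections, 0.2);
console.log(kept.map((d) => `${d.label}: ${d.width}x${d.height}`)); // [ 'human face: 94x108' ]
```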

📝 Optical Document Understanding (image-to-text)

This task involves translating images of scientific PDFs to markdown, enabling easier access to them. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create image-to-text pipeline
let pipe = await pipeline('image-to-text', 'Xenova/nougat-small');

// Generate markdown
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png';
let output = await pipe(url, {
  min_length: 1,
  max_new_tokens: 40,
  bad_words_ids: [[pipe.tokenizer.unk_token_id]],
});
// [{ generated_text: "# Nougat: Neural Optical Understanding for Academic Documents\n\nLukas Blecher\n\nCorrespondence to: [email protected]\n\nGuillem Cucur" }]

💻 New architectures: Nougat, DPT, GLPN, OwlViT

We added support for 4 new architectures, bringing the total up to 61!

DPT for depth estimation. See here for the list of available models.
GLPN for depth estimation. See here for the list of available models.
OwlViT for zero-shot object detection. See here for the list of available models.
Nougat for optical understanding of academic documents (image-to-text). See here for the list of available models.

🔨 Other improvements

Add support for Grouped Query Attention on Llama Model by @felladrin in #393
Implement max character check by @samlhuillier in #398
Add CLIPFeatureExtractor (and tests) in #387
Add jsDelivr stats to README in #395
Update sharp dependency version in #400

🐛 Bug fixes

Move tensor clone to fix Worker ownership NaN issue by @kungfooman in #404
Add default token_type_ids for multilingual-e5-* models by @do-me in #403
Ensure WASM fallback does not crash in GH actions in #402

🤗 New contributors

@felladrin made their first contribution in #393
@samlhuillier made their first contribution in #398
@do-me made their first contribution in #403

Full Changelog: 2.8.0...2.9.0

Contributors

felladrin, kungfooman, and 2 other contributors

09 Nov 16:53

xenova

2.8.0

c980730

What's new?

🖼️ New task: Image-to-image

This release adds support for image-to-image translation (e.g., super-resolution) with Swin2SR models.

As always, you can get started in just a few lines of code!

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/spaces/jjourney1125/swin2sr/resolve/main/testsets/real-inputs/0855.jpg';
let upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-compressed-sr-x4-48');
let output = await upscaler(url);
// RawImage {
//   data: Uint8Array(12582912) [165, 166, 163, ...],
//   width: 2048,
//   height: 2048,
//   channels: 3
// }
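
The buffer size in the output follows from the image shape: 2048 × 2048 × 3 = 12582912 bytes. The sketch below reads one pixel from such a buffer. It assumes the data is stored row by row with interleaved channels (height, width, channels); the helper name and the tiny test image are illustrative.

```javascript
// Read the channel values of the pixel at (x, y).
// Assumes interleaved, row-major storage: [r, g, b, r, g, b, ...].
function getPixel(image, x, y) {
  const { data, width, height, channels } = image;
  if (data.length !== width * height * channels) {
    throw new Error('Buffer size does not match width * height * channels');
  }
  const offset = (y * width + x) * channels;
  return Array.from(data.subarray(offset, offset + channels));
}

// A 2x2 RGB image standing in for the pipeline output
const tinyImage = {
  data: new Uint8Array([255, 0, 0, 0, 255, 0, 0, 0, 255, 9, 9, 9]),
  width: 2,
  height: 2,
  channels: 3,
};
console.log(getPixel(tinyImage, 1, 0)); // [ 0, 255, 0 ]
console.log(2048 * 2048 * 3); // 12582912
```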

💻 New architectures: TrOCR, Swin2SR, Mistral, and Falcon

We also added support for 4 new architectures, bringing the total up to 57! 🤯

TrOCR for optical character recognition (OCR).

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/handwriting.jpg';
let captioner = await pipeline('image-to-text', 'Xenova/trocr-small-handwritten');
let output = await captioner(url);
// [{ generated_text: 'Mr. Brown commented icily.' }]

Added in #375. See here for the list of available models.

Swin2SR for super-resolution and image restoration.

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/butterfly.jpg';
let upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-classical-sr-x2-64');
let output = await upscaler(url);
// RawImage {
//   data: Uint8Array(786432) [ 41, 31, 24,  43, ... ],
//   width: 512,
//   height: 512,
//   channels: 3
// }

Added in #381. See here for the list of available models.

Mistral and Falcon for text-generation. Added in #379.
Note: Apart from small models used for testing, we haven't yet converted any of the larger (≥7B parameter) models. Stay tuned for more updates on this!

🐛 Bug fixes:

By default, do not add special tokens at start of text-generation (see commit)
Fix Firefox bug when displaying progress events while reading file from browser cache in #374. Thanks to @felladrin for reporting this issue!
Fix text2text-generation pipeline output inconsistency with the Python library in #384

🔨 Minor improvements:

Upgrade typescript dependency version by @Kit-p in #368
Improve docs in #385

🤗 New Contributors

@Kit-p made their first contribution in #368

Full Changelog: 2.7.0...2.8.0

Contributors

felladrin and Kit-p

23 Oct 15:52

xenova

2.7.0

1ca6999

What's new?

🗣️ New task: Text to speech/audio

Due to popular demand, we've added text-to-speech support to Transformers.js! 😍

You can get started in just a few lines of code!

import { pipeline } from '@xenova/transformers';

let speaker_embeddings = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin';
let synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts', { quantized: false });
let out = await synthesizer('Hello, my dog is cute', { speaker_embeddings });
// {
//   audio: Float32Array(26112) [-0.00005657337896991521, 0.00020583874720614403, ...],
//   sampling_rate: 16000
// }
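
The two fields in this output are enough to work out the clip length: the number of samples divided by the sampling rate. A minimal sketch, using a zero-filled buffer of the same size as a stand-in for the real output (the helper name is illustrative):

```javascript
// Duration in seconds = number of samples / samples per second.
function audioDurationSeconds(output) {
  return output.audio.length / output.sampling_rate;
}

// Stand-in with the same length and sampling rate as the output above
const fakeOutput = { audio: new Float32Array(26112), sampling_rate: 16000 };
console.log(audioDurationSeconds(fakeOutput)); // 1.632
```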

You can then save the audio to a .wav file with the wavefile package:

import wavefile from 'wavefile';
import fs from 'fs';

let wav = new wavefile.WaveFile();
wav.fromScratch(1, out.sampling_rate, '32f', out.audio);
fs.writeFileSync('out.wav', wav.toBuffer());

Alternatively, you can play the file in your browser (see below).

Don't like the speaker's voice? Well, you can choose another from the >7000 speaker embeddings in the CMU Arctic dataset (see here)!

Note: Currently, we only support TTS with SpeechT5, but we plan to add others, such as Bark and MMS, in the future!

🖥️ TTS demo and example app

To showcase the power of in-browser TTS, we're also releasing a simple example app (demo, code). Feel free to make improvements to it... and if you do (or end up building your own), please tag me on Twitter! 🤗

Misc. changes

Update falcon tokenizer in #344
Add more links to example section in #343
Improve electron example template in #342
Update example app dependencies in #347
Do not post-process < and > symbols generated from docs in #335

Full Changelog: 2.6.2...2.7.0

27 Sep 14:14

xenova

2.6.2

5b31129

What's new?

📝 New task: Document Question Answering

Document Question Answering is the task of answering questions based on an image of a document. Document Question Answering models take a (document, question) pair as input and return an answer in natural language. Check out the docs for more info!

Example code

// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';

let image = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/invoice.png';
let question = 'What is the invoice number?';

// Create document question answering pipeline
let qa_pipeline = await pipeline('document-question-answering', 'Xenova/donut-base-finetuned-docvqa');

// Run the pipeline
let output = await qa_pipeline(image, question);
// [{ answer: 'us-001' }]

🤖 New models

Add support for DonutSwin models in #320
Add support for Blenderbot and BlenderbotSmall in #292
Add support for LongT5 models in #316

💻 New example application

In-browser semantic image search in #326 (demo, code, tweet)

🐛 Misc. improvements

Fixing more _call LSP errors + extra typings by @kungfooman in #304
Remove CustomCache requirement for example browser extension project in #325

Full Changelog: 2.6.1...2.6.2

Contributors

kungfooman

18 Sep 13:40

xenova

2.6.1

b3a2a5b

What's new?

Add Vanilla JavaScript tutorial by @perborgen in #271. This includes an interactive video tutorial ("scrim"), which walks you through the code! Let us know if you want to see more of these video tutorials! 🤗
Add support for min_length and min_new_tokens generation parameters in #308
Fix issues with minification in #307
Fix ByteLevel pretokenizer and improve whisper test cases in #287
Misc. documentation improvements by @rubiagatra in #293

New Contributors

@rubiagatra made their first contribution in #293

Full Changelog: 2.6.0...2.6.1

Contributors

perborgen and rubiagatra

08 Sep 15:27

xenova

2.6.0

ad7e875

What's new?

🤯 14 new architectures

In this release, we've added a ton of new architectures: BLOOM, MPT, BeiT, CamemBERT, CodeLlama, GPT NeoX, GPT-J, HerBERT, mBART, mBART-50, OPT, ResNet, WavLM, and XLM. This brings the total number of supported architectures up to 46! Here's some example code to help you get started:

Text-generation with MPT (models):

import { pipeline } from '@xenova/transformers';
const generator = await pipeline('text-generation', 'Xenova/ipt-350m', {
    quantized: false, // using unquantized to ensure it matches python version
});

const output = await generator('La nostra azienda');
// { generated_text: "La nostra azienda è specializzata nella vendita di prodotti per l'igiene orale e per la salute." }

Other text-generation models: BLOOM, GPT-NeoX, CodeLlama, GPT-J, OPT.

CamemBERT for masked language modelling, text classification, token classification, question answering, and feature extraction (models). For example:

import { pipeline } from '@xenova/transformers';
let pipe = await pipeline('token-classification', 'Xenova/camembert-ner-with-dates');
let output = await pipe("Je m'appelle jean-baptiste et j'habite à montréal depuis fevr 2012");
// [
//   { entity: 'I-PER', score: 0.9258053302764893, index: 5, word: 'jean' },
//   { entity: 'I-PER', score: 0.9048717617988586, index: 6, word: '-' },
//   { entity: 'I-PER', score: 0.9227054119110107, index: 7, word: 'ba' },
//   { entity: 'I-PER', score: 0.9385354518890381, index: 8, word: 'pt' },
//   { entity: 'I-PER', score: 0.9139659404754639, index: 9, word: 'iste' },
//   { entity: 'I-LOC', score: 0.9877734780311584, index: 15, word: 'montré' },
//   { entity: 'I-LOC', score: 0.9891639351844788, index: 16, word: 'al' },
//   { entity: 'I-DATE', score: 0.9858269691467285, index: 18, word: 'fe' },
//   { entity: 'I-DATE', score: 0.9780661463737488, index: 19, word: 'vr' },
//   { entity: 'I-DATE', score: 0.980688214302063, index: 20, word: '2012' }
// ]
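
The output above is per sub-word token, so a name like "jean-baptiste" arrives in five pieces. Below is a minimal sketch of one way to merge them: consecutive tokens (by `index`) with the same `entity` are concatenated. This is a naive post-processing step written for illustration, not a library feature. It ignores whitespace between words (so 'fe', 'vr', '2012' would become 'fevr2012'), and reporting the lowest token score for the group is an arbitrary choice.

```javascript
// Merge runs of adjacent tokens that share an entity label.
function groupEntities(tokens) {
  const groups = [];
  for (const token of tokens) {
    const last = groups[groups.length - 1];
    if (last && last.entity === token.entity && token.index === last.end + 1) {
      last.word += token.word;
      last.end = token.index;
      last.scores.push(token.score);
    } else {
      groups.push({
        entity: token.entity,
        word: token.word,
        start: token.index,
        end: token.index,
        scores: [token.score],
      });
    }
  }
  // Report the weakest token score as the group's score
  return groups.map(({ scores, ...rest }) => ({ ...rest, score: Math.min(...scores) }));
}

// First seven tokens copied from the output above
const nerTokens = [
  { entity: 'I-PER', score: 0.9258053302764893, index: 5, word: 'jean' },
  { entity: 'I-PER', score: 0.9048717617988586, index: 6, word: '-' },
  { entity: 'I-PER', score: 0.9227054119110107, index: 7, word: 'ba' },
  { entity: 'I-PER', score: 0.9385354518890381, index: 8, word: 'pt' },
  { entity: 'I-PER', score: 0.9139659404754639, index: 9, word: 'iste' },
  { entity: 'I-LOC', score: 0.9877734780311584, index: 15, word: 'montré' },
  { entity: 'I-LOC', score: 0.9891639351844788, index: 16, word: 'al' },
];
const grouped = groupEntities(nerTokens);
console.log(grouped.map((g) => `${g.entity}: ${g.word}`)); // [ 'I-PER: jean-baptiste', 'I-LOC: montréal' ]
```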

WavLM for feature-extraction (models). For example:

import { AutoProcessor, AutoModel, read_audio } from '@xenova/transformers';

// Read and preprocess audio
const processor = await AutoProcessor.from_pretrained('Xenova/wavlm-base');
const audio = await read_audio('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav', 16000);
const inputs = await processor(audio);

// Run model with inputs
const model = await AutoModel.from_pretrained('Xenova/wavlm-base');
const output = await model(inputs);
// {
//   last_hidden_state: Tensor {
//     dims: [ 1, 549, 768 ],
//     type: 'float32',
//     data: Float32Array(421632) [-0.349443256855011, -0.39341306686401367,  0.022836603224277496, ...],
//     size: 421632
//   }
// }

mBART and mBART-50 for multilingual translation (models). For example:

import { pipeline } from '@xenova/transformers';
let translator = await pipeline('translation', 'Xenova/mbart-large-50-many-to-many-mmt');
let output = await translator('संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है', {
  src_lang: 'hi_IN', // Hindi
  tgt_lang: 'fr_XX', // French
});
// [{ translation_text: "Le chef des Nations affirme qu 'il n 'y a military solution in Syria." }]

See here for the full list of languages and their corresponding codes.

BeiT for image classification (models):

import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let pipe = await pipeline('image-classification', 'Xenova/beit-base-patch16-224');
let output = await pipe(url);
// [{ label: 'tiger, Panthera tigris', score: 0.7168469429016113 }]

ResNet for image classification (models):

import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let pipe = await pipeline('image-classification', 'Xenova/resnet-50');
let output = await pipe(url);
// [{ label: 'tiger, Panthera tigris', score: 0.7576608061790466 }]

😍 Over 150 newly-converted models

To get started with these new architectures (and expand coverage for other models), we're releasing over 150 new models on the Hugging Face Hub! Check out the full list here.

🏋️ HUGE reduction in model sizes (up to -40%)

Thanks to a recent update of 🤗 Optimum, we were able to remove duplicate weights across various models. In some cases, like whisper-tiny's decoder, this resulted in a 40% reduction in size! Here are some improvements we saw:

Whisper-tiny decoder: 50MB → 30MB (-40%)
NLLB decoder: 732MB → 476MB (-35%)
bloom: 819MB → 562MB (-31%)
T5 decoder: 59MB → 42MB (-28%)
distilbert-base: 91MB → 68MB (-25%)
bart-base decoder: 207MB → 155MB (-25%)
roberta-base: 165MB → 126MB (-24%)
gpt2: 167MB → 127MB (-24%)
bert-base: 134MB → 111MB (-17%)
many more!
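
The percentages in this list are the relative size reduction, (before − after) / before. A short sketch of that arithmetic follows; the helper name is illustrative. Because the sizes listed are already rounded to whole megabytes, recomputing from them can differ from a listed figure by a point.

```javascript
// Relative reduction, rounded to the nearest whole percent.
function sizeReductionPercent(beforeMB, afterMB) {
  return Math.round(((beforeMB - afterMB) / beforeMB) * 100);
}

console.log(sizeReductionPercent(50, 30));   // 40 (Whisper-tiny decoder)
console.log(sizeReductionPercent(732, 476)); // 35 (NLLB decoder)
console.log(sizeReductionPercent(134, 111)); // 17 (bert-base)
```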

Play around with some of the smaller whisper models (for automatic speech recognition) here!

Other

Transformers.js integration with LangChain JS (docs)

import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";

const model = new HuggingFaceTransformersEmbeddings({
  modelName: "Xenova/all-MiniLM-L6-v2",
});

/* Embed queries */
const res = await model.embedQuery(
  "What would be a good company name for a company that makes colorful socks?"
);
console.log({ res });
/* Embed documents */
const documentRes = await model.embedDocuments(["Hello world", "Bye bye"]);
console.log({ documentRes });

Refactored PreTrainedModel to require significantly less code when adding new models
Typing improvements by @kungfooman

Contributors

kungfooman

28 Aug 19:06

xenova

2.5.4

0c2dcc7

What's new?

Add support for 3 new vision architectures (Swin, DeiT, Yolos) in #262. Check out the Hugging Face Hub to see which models you can use!

Swin for image classification. For example:

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let classifier = await pipeline('image-classification', 'Xenova/swin-base-patch4-window7-224-in22k');
let output = await classifier(url, { topk: null });
// [
//   { label: 'Bengal_tiger', score: 0.2258443683385849 },
//   { label: 'tiger, Panthera_tigris', score: 0.21161635220050812 },
//   { label: 'predator, predatory_animal', score: 0.09135803580284119 },
//   { label: 'tigress', score: 0.08038495481014252 },
//   // ... 21838 more items
// ]

DeiT for image classification. For example:

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let classifier = await pipeline('image-classification', 'Xenova/deit-tiny-distilled-patch16-224');
let output = await classifier(url);
// [{ label: 'tiger, Panthera tigris', score: 0.9804046154022217 }]

Yolos for object detection. For example:

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
let detector = await pipeline('object-detection', 'Xenova/yolos-small-300');
let output = await detector(url);
// [
//   { label: 'remote', score: 0.9837935566902161, box: { xmin: 331, ymin: 80, xmax: 367, ymax: 192 } },
//   { label: 'cat', score: 0.94994056224823, box: { xmin: 8, ymin: 57, xmax: 316, ymax: 470 } },
//   { label: 'couch', score: 0.9843178987503052, box: { xmin: 0, ymin: 0, xmax: 639, ymax: 474 } },
//   { label: 'remote', score: 0.9704685211181641, box: { xmin: 39, ymin: 71, xmax: 179, ymax: 114 } },
//   { label: 'cat', score: 0.9921762943267822, box: { xmin: 339, ymin: 17, xmax: 642, ymax: 380 } }
// ]
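
Object detection output is a flat list, so a frequent follow-up is to summarize it. The sketch below counts detections per label above a minimum score. The helper name and the default threshold of 0.9 are illustrative choices, not library defaults.

```javascript
// Count how many detections of each label meet the minimum score.
function countByLabel(detections, minScore = 0.9) {
  const counts = {};
  for (const { label, score } of detections) {
    if (score < minScore) continue;
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}

// Labels and scores copied from the output above (boxes omitted)
const yolosDetections = [
  { label: 'remote', score: 0.9837935566902161 },
  { label: 'cat', score: 0.94994056224823 },
  { label: 'couch', score: 0.9843178987503052 },
  { label: 'remote', score: 0.9704685211181641 },
  { label: 'cat', score: 0.9921762943267822 },
];
console.log(countByLabel(yolosDetections)); // { remote: 2, cat: 2, couch: 1 }
```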

Documentation improvements by @perborgen in #261

New contributors 🤗

@perborgen made their first contribution in #261

Full Changelog: 2.5.3...2.5.4

Contributors

perborgen

22 Aug 21:52

xenova

2.5.3

7076c8e

What's new?

Fix whisper timestamps for non-English languages in #253
Fix caching for some LFS files from the Hugging Face Hub in #251
Improve documentation (w/ example code and links) in #255 and #257. Thanks @josephrocca for helping with this!

New contributors 🤗

@josephrocca made their first contribution in #257

Full Changelog: 2.5.2...2.5.3

Contributors

josephrocca

14 Aug 21:25

xenova

2.5.2

254e99e

What's new?

Add audio-classification with MMS and Wav2Vec2 in #220. Example usage:

// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';

// Create audio classification pipeline
let classifier = await pipeline('audio-classification', 'Xenova/mms-lid-4017');

// Run inference
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jeanNL.wav';
let output = await classifier(url);
// [
//   { label: 'fra', score: 0.9995712041854858 },
//   { label: 'hat', score: 0.00003788191679632291 },
//   { label: 'lin', score: 0.00002646935718075838 },
//   { label: 'hun', score: 0.000015628289474989288 },
//   { label: 'bre', score: 0.000007014674793026643 }
// ]

Add automatic-speech-recognition for Wav2Vec2 models in #220 (MMS coming soon).
Add support for multi-label classification problem type in #249. Thanks @KiterWork for reporting!
Add M2M100 tokenizer in #250. Thanks @AAnirudh07 for the feature request!
Documentation improvements

New Contributors

@celsodias12 made their first contribution in #247

Full Changelog: 2.5.1...2.5.2

Contributors

celsodias12, AAnirudh07, and KiterWork

Releases: huggingface/transformers.js
