What's new?

💬 StableLM text-generation models

This version adds support for the StableLM family of text-generation models (up to 1.6B params), developed by Stability AI. Huge thanks to @D4ve-R for this contribution in #616! See here for the full list of supported models.

Example: Text generation with Xenova/stablelm-2-zephyr-1_6b.

import { pipeline } from '@xenova/transformers';

// Create text generation pipeline
const generator = await pipeline('text-generation', 'Xenova/stablelm-2-zephyr-1_6b');

// Define the prompt and list of messages
const prompt = "Tell me a funny joke."
const messages = [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": prompt },
]

// Apply chat template
const inputs = generator.tokenizer.apply_chat_template(messages, {
    tokenize: false,
    add_generation_prompt: true,
});

// Generate text
const output = await generator(inputs, { max_new_tokens: 20 });
console.log(output[0].generated_text);
// "<|system|>\nYou are a helpful assistant.\n<|user|>\nTell me a funny joke.\n<|assistant|>\nHere's a joke for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"

Note: these models may be too large to run in your browser at the moment, so for now, we recommend using them in Node.js. Stay tuned for updates on this!

🔉 Speaker verification and diarization models

Example: Speaker verification w/ Xenova/wavlm-base-plus-sv.

import { AutoProcessor, AutoModel, read_audio, cos_sim } from '@xenova/transformers';

// Load processor and model
const processor = await AutoProcessor.from_pretrained('Xenova/wavlm-base-plus-sv');
const model = await AutoModel.from_pretrained('Xenova/wavlm-base-plus-sv');

// Helper function to compute speaker embedding from audio URL
async function compute_embedding(url) {
    const audio = await read_audio(url, 16000);
    const inputs = await processor(audio);
    const { embeddings } = await model(inputs);
    return embeddings.data;
}

// Generate speaker embeddings
const BASE_URL = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/sv_speaker';
const speaker_1_1 = await compute_embedding(`${BASE_URL}-1_1.wav`);
const speaker_1_2 = await compute_embedding(`${BASE_URL}-1_2.wav`);
const speaker_2_1 = await compute_embedding(`${BASE_URL}-2_1.wav`);
const speaker_2_2 = await compute_embedding(`${BASE_URL}-2_2.wav`);

// Compute similarity scores
console.log(cos_sim(speaker_1_1, speaker_1_2)); // 0.959439158881247 (Both are speaker 1)
console.log(cos_sim(speaker_1_2, speaker_2_1)); // 0.618130172602329 (Different speakers)
console.log(cos_sim(speaker_2_1, speaker_2_2)); // 0.962999814169370 (Both are speaker 2)

Example: Perform speaker diarization with Xenova/wavlm-base-plus-sd.

import { AutoProcessor, AutoModelForAudioFrameClassification, read_audio } from '@xenova/transformers';

// Read and preprocess audio
const processor = await AutoProcessor.from_pretrained('Xenova/wavlm-base-plus-sd');
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
const audio = await read_audio(url, 16000);
const inputs = await processor(audio);

// Run model with inputs
const model = await AutoModelForAudioFrameClassification.from_pretrained('Xenova/wavlm-base-plus-sd');
const { logits } = await model(inputs);
// {
//   logits: Tensor {
//     dims: [ 1, 549, 2 ],  // [batch_size, num_frames, num_speakers]
//     type: 'float32',
//     data: Float32Array(1098) [-3.5301010608673096, ...],
//     size: 1098
//   }
// }

const labels = logits[0].sigmoid().tolist().map(
    frames => frames.map(speaker => speaker > 0.5 ? 1 : 0)
);
console.log(labels); // labels is a one-hot array of shape (num_frames, num_speakers)
// [
//     [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0],
//     [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0],
//     [0, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1],
//     ...
// ]

These additions were made possible thanks to the following PRs:

Add support for WavLMForXVector by @D4ve-R in #603
Add support for WavLMForAudioFrameClassification and Wav2Vec2ForAudioFrameClassification by @D4ve-R in #611
Add support for UniSpeech and UniSpeechSat models in #624

📝 Improved chat templating operation coverage

With this release, we're pleased to announce that Transformers.js is now able to parse every single valid chat template that is currently on the Hugging Face Hub! 🤯 As of 2024/03/05, this is around ~12k conversational models (of which there were ~250 unique templates). Of course, future models may introduce more complex chat templates, and we'll continue to add support for them!

For example, transformers.js can now generate the prompt for highly complex function-calling models (e.g., fireworks-ai/firefunction-v1):

See code

import { AutoTokenizer } from '@xenova/transformers';

const tokenizer = await AutoTokenizer.from_pretrained('fireworks-ai/firefunction-v1')

const function_spec = [
    {
        name: 'get_stock_price',
        description: 'Get the current stock price',
        parameters: {
            type: 'object',
            properties: {
                symbol: {
                    type: 'string',
                    description: 'The stock symbol, e.g. AAPL, GOOG'
                }
            },
            required: ['symbol']
        }
    },
    {
        name: 'check_word_anagram',
        description: 'Check if two words are anagrams of each other',
        parameters: {
            type: 'object',
            properties: {
                word1: {
                    type: 'string',
                    description: 'The first word'
                },
                word2: {
                    type: 'string',
                    description: 'The second word'
                }
            },
            required: ['word1', 'word2']
        }
    }
]

const messages = [
    { role: 'functions', content: JSON.stringify(function_spec, null, 4) },
    { role: 'system', content: 'You are a helpful assistant with access to functions. Use them if required.' },
    { role: 'user', content: 'Hi, can you tell me the current stock price of AAPL?' }
]

const inputs = tokenizer.apply_chat_template(messages, { tokenize: false });
console.log(inputs);
// <s>SYSTEM: You are a helpful assistant ...

🎨 New example applications and demos

Create video object detection demo in #607 (try it out).
Create cross-encoder demo in #617 (try it out).
Add Claude 3 and Mistral to the tokenizer playground in #625 (try it out).

🛠️ Misc. improvements

Add support for the starcoder2 architecture in #622. Note: we haven't yet added transformers.js-compatible versions of the 3B and 7B models.
Check for existence of onnx_env.wasm before updating wasmPaths in #621

🤗 New contributors

@D4ve-R made their first contribution in #603

Full Changelog: 2.15.1...2.16.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2.16.0