opus-transcriber-proxy

Real-time WebSocket transcription proxy supporting multiple speech-to-text backends. Routes audio to OpenAI, Deepgram, or Google Gemini and streams transcription results back to clients.

Features

Multi-provider support - OpenAI Realtime, Deepgram Nova, Google Gemini
Provider fallback - Configurable priority order with automatic failover
Multi-participant sessions - Single WebSocket handles multiple audio streams
Real-time streaming - Interim and final transcription results
Flexible deployment - Node.js standalone or Cloudflare Workers with Containers
Dispatcher integration - Forward transcriptions to external services
Audio debugging - Dump and replay WebSocket sessions

Quick Start

# Install dependencies
npm install

# Build WASM decoder (first time only)
npm run configure  # Install Emscripten
npm run build:wasm

# Set API key(s)
export OPENAI_API_KEY=sk-...
# or
export DEEPGRAM_API_KEY=...

# Start server
npm run dev

Connect via WebSocket:

ws://localhost:8080/transcribe?sessionId=test&sendBack=true

Installation

Prerequisites

Node.js 22+
Emscripten (for WASM compilation)

Build

npm install
npm run configure   # Set up Emscripten (one-time)
npm run build       # Build WASM + TypeScript + bundle

Docker

npm run docker:build
npm run docker:run

Configuration

Set environment variables or use a .env file (see env.example in the repository):

Provider Selection

Variable	Default	Description
`PROVIDERS_PRIORITY`	`openai,deepgram,gemini`	Provider priority order

API Keys

Variable	Description
`OPENAI_API_KEY`	OpenAI API key
`DEEPGRAM_API_KEY`	Deepgram API key
`GEMINI_API_KEY`	Google Gemini API key
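The provider fallback described under Features can be pictured as a filtered priority list. The sketch below assumes a provider is only eligible when its API key variable is set; that rule and the function name `availableProviders` are illustrative assumptions, not confirmed proxy behavior.

```typescript
// Sketch: order providers by PROVIDERS_PRIORITY, keeping only those with an API key.
// ASSUMPTION: "eligible" means the matching *_API_KEY variable is non-empty.
type Provider = "openai" | "deepgram" | "gemini";

const API_KEY_VARS: Record<Provider, string> = {
  openai: "OPENAI_API_KEY",
  deepgram: "DEEPGRAM_API_KEY",
  gemini: "GEMINI_API_KEY",
};

// hasOwnProperty avoids accepting inherited names such as "constructor".
function isProvider(name: string): name is Provider {
  return Object.prototype.hasOwnProperty.call(API_KEY_VARS, name);
}

function availableProviders(env: Record<string, string | undefined>): Provider[] {
  const priority = env.PROVIDERS_PRIORITY ?? "openai,deepgram,gemini";
  return priority
    .split(",")
    .map((p) => p.trim().toLowerCase())
    .filter(isProvider)
    .filter((p, i, all) => all.indexOf(p) === i) // drop duplicates, keep first position
    .filter((p) => Boolean(env[API_KEY_VARS[p]]));
}
```

Calling `availableProviders(process.env)` would yield the failover order; the first entry is the provider to try first.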

Provider Options

Variable	Default	Description
`OPENAI_MODEL`	`gpt-4o-mini-transcribe`	OpenAI model
`DEEPGRAM_MODEL`	`nova-2`	Deepgram model
`DEEPGRAM_LANGUAGE`	`multi`	Language code or `multi` for auto
`DEEPGRAM_ENCODING`	`linear16`	`linear16`, `opus`, or `ogg-opus`
`GEMINI_MODEL`	`gemini-2.0-flash-exp`	Gemini model
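With the default `DEEPGRAM_ENCODING=linear16`, audio is sent as 16-bit signed little-endian PCM rather than Opus. A minimal sketch of that conversion, assuming the decoder yields float samples in [-1, 1] (an assumption about the WASM decoder's output, not taken from the source):

```typescript
// Convert float PCM in [-1, 1] to linear16 (16-bit signed little-endian PCM).
// ASSUMPTION: decoder output is Float32; this is not the proxy's own conversion code.
function floatToLinear16(samples: Float32Array): Uint8Array {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    // Clamp so samples that overshoot full scale do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically: -1 maps to -32768, +1 maps to 32767.
    const v = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    view.setInt16(i * 2, v, true); // true = little-endian
  }
  return new Uint8Array(view.buffer);
}
```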

Server

Variable	Default	Description
`PORT`	`8080`	Listen port
`HOST`	`0.0.0.0`	Listen address
`DEBUG`	`false`	Enable debug logging
`FORCE_COMMIT_TIMEOUT`	`2`	Seconds before finalizing pending audio
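A sketch of reading these server variables with the documented defaults. Accepting `1` and `yes` as true for `DEBUG`, and converting `FORCE_COMMIT_TIMEOUT` to milliseconds, are choices made for this illustration; `loadServerConfig` is a hypothetical name.

```typescript
// Hypothetical loader for the Server variables, using the defaults from the table.
interface ServerConfig {
  port: number;
  host: string;
  debug: boolean;
  forceCommitTimeoutMs: number;
}

// ASSUMPTION: "1", "true" and "yes" (any case) count as true.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  return ["1", "true", "yes"].includes(value.trim().toLowerCase());
}

// Falls back when the variable is unset, empty, or not a finite number.
function parseNumber(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const n = Number(value);
  return Number.isFinite(n) ? n : fallback;
}

function loadServerConfig(env: Record<string, string | undefined>): ServerConfig {
  return {
    port: parseNumber(env.PORT, 8080),
    host: env.HOST || "0.0.0.0",
    debug: parseBool(env.DEBUG, false),
    // FORCE_COMMIT_TIMEOUT is documented in seconds; convert for use with setTimeout.
    forceCommitTimeoutMs: parseNumber(env.FORCE_COMMIT_TIMEOUT, 2) * 1000,
  };
}
```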

Dispatcher (Optional)

Variable	Default	Description
`USE_DISPATCHER`	`false`	Enable dispatcher forwarding
`DISPATCHER_WS_URL`	(empty)	Dispatcher WebSocket URL
`DISPATCHER_HEADERS`	`{}`	Auth headers (JSON)

See DISPATCHER_INTEGRATION.md for details.
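Because `DISPATCHER_HEADERS` is a JSON string, a malformed value is easiest to catch at startup. A sketch of validating it into a plain header map; rejecting non-string values is this example's own policy, and `parseDispatcherHeaders` is a hypothetical name.

```typescript
// Parse DISPATCHER_HEADERS (JSON object of header name -> value).
// ASSUMPTION: every header value must be a string; anything else is rejected.
function parseDispatcherHeaders(raw: string | undefined): Record<string, string> {
  if (!raw || raw.trim() === "") return {}; // matches the documented default `{}`
  const parsed: unknown = JSON.parse(raw); // throws SyntaxError on invalid JSON
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("DISPATCHER_HEADERS must be a JSON object");
  }
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(parsed)) {
    if (typeof value !== "string") {
      throw new Error(`DISPATCHER_HEADERS["${name}"] must be a string`);
    }
    headers[name] = value;
  }
  return headers;
}
```

For example, `DISPATCHER_HEADERS='{"Authorization": "Bearer <token>"}'` would parse to a single `Authorization` header.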

WebSocket Protocol

Connection

ws://host:port/transcribe?sessionId=xxx&sendBack=true

Query Parameters:

Parameter	Default	Description
`sessionId`	(required)	Session identifier
`sendBack`	`false`	Return final transcriptions
`sendBackInterim`	`false`	Return interim transcriptions
`provider`	(auto)	Override provider selection
`encoding`	`opus`	Audio encoding: `opus` or `ogg-opus`
`lang`	(auto)	Language hint
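A client can assemble the connection URL from these parameters with the standard `URL` API. In this sketch only `sessionId` is always sent and optional parameters are added when set; the helper name `transcribeUrl` is hypothetical.

```typescript
// Build a /transcribe URL from the documented query parameters.
interface TranscribeOptions {
  sessionId: string;
  sendBack?: boolean;
  sendBackInterim?: boolean;
  provider?: string;
  encoding?: "opus" | "ogg-opus";
  lang?: string;
}

function transcribeUrl(base: string, opts: TranscribeOptions): string {
  const url = new URL("/transcribe", base);
  url.searchParams.set("sessionId", opts.sessionId);
  const optional: Array<[string, string | boolean | undefined]> = [
    ["sendBack", opts.sendBack],
    ["sendBackInterim", opts.sendBackInterim],
    ["provider", opts.provider],
    ["encoding", opts.encoding],
    ["lang", opts.lang],
  ];
  for (const [key, value] of optional) {
    // Omit unset options so the server applies its own defaults.
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```

`transcribeUrl("ws://localhost:8080", { sessionId: "test", sendBack: true })` reproduces the Quick Start URL; `URLSearchParams` also takes care of escaping session IDs.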

Client Messages

Audio data:

{
  "event": "media",
  "media": {
    "tag": "participant-id",
    "chunk": 0,
    "timestamp": 1768341932,
    "payload": "base64-encoded-audio"
  }
}
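Each participant's stream is identified by `tag`, with `chunk` counting messages. A sketch of a per-participant sender that produces these messages; it assumes `chunk` starts at 0 and increments by one per message, and leaves the `timestamp` unit to the caller since the example value does not pin it down.

```typescript
// Build "media" messages for one participant stream.
// ASSUMPTION: chunk starts at 0 and increases by 1 per message.
interface MediaMessage {
  event: "media";
  media: { tag: string; chunk: number; timestamp: number; payload: string };
}

class MediaSender {
  private chunk = 0;
  private readonly tag: string;

  constructor(tag: string) {
    this.tag = tag;
  }

  next(frame: Uint8Array, timestamp: number): MediaMessage {
    return {
      event: "media",
      media: {
        tag: this.tag,
        chunk: this.chunk++,
        timestamp,
        payload: Buffer.from(frame).toString("base64"), // base64-encoded audio
      },
    };
  }
}
```

Each message would then be sent as text, e.g. `ws.send(JSON.stringify(sender.next(opusFrame, ts)))`.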

Ping:

{"event": "ping", "id": 123}

Server Messages

Transcription result:

{
  "type": "transcription-result",
  "is_interim": false,
  "transcript": [{"text": "hello world", "confidence": 0.98}],
  "participant": {"id": "participant-id"},
  "timestamp": 1768341932000,
  "language": "en"
}
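On the client, final results can be separated from interim ones by checking `is_interim`. This sketch uses the field names from the example above; taking the first `transcript` entry as the text is an assumption, and `parseFinal` is a hypothetical helper.

```typescript
// Extract final transcriptions from server messages; ignore interim results and pongs.
interface TranscriptionResult {
  type: "transcription-result";
  is_interim: boolean;
  transcript: Array<{ text: string; confidence?: number }>;
  participant: { id: string };
  timestamp: number;
  language?: string;
}

function parseFinal(raw: string): { participant: string; text: string } | null {
  const msg = JSON.parse(raw) as Partial<TranscriptionResult>;
  if (msg.type !== "transcription-result" || msg.is_interim) return null;
  // ASSUMPTION: the first transcript entry carries the text to display.
  const text = msg.transcript?.[0]?.text ?? "";
  if (!msg.participant || text === "") return null;
  return { participant: msg.participant.id, text };
}
```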

Pong:

{"event": "pong", "id": 123}
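Since the pong echoes the ping's `id`, a client can use it to measure round-trip time. A sketch that tracks outstanding pings; the `now` argument keeps it testable (pass `Date.now()` in practice), and `PingTracker` is a hypothetical name.

```typescript
// Match pongs to pings by id and report round-trip time in milliseconds.
class PingTracker {
  private nextId = 1;
  private readonly pending = new Map<number, number>(); // id -> send time

  ping(now: number): { event: "ping"; id: number } {
    const id = this.nextId++;
    this.pending.set(id, now);
    return { event: "ping", id };
  }

  // Returns the RTT, or undefined for non-pong messages and unknown or repeated ids.
  pong(msg: { event: string; id: number }, now: number): number | undefined {
    if (msg.event !== "pong") return undefined;
    const sentAt = this.pending.get(msg.id);
    if (sentAt === undefined) return undefined;
    this.pending.delete(msg.id);
    return now - sentAt;
  }

  // Pings still awaiting a pong; a growing count suggests a stalled connection.
  get outstanding(): number {
    return this.pending.size;
  }
}
```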

Supported Providers

Provider	Features
OpenAI	Server VAD, confidence scores, streaming
Deepgram	Punctuation, diarization, code-switching, streaming
Gemini	Multimodal, multilingual

See BACKENDS.md for detailed comparison and configuration.

Deployment

Node.js

npm start

Docker

docker build -t opus-transcriber-proxy .
docker run -p 8080:8080 -e OPENAI_API_KEY=sk-... opus-transcriber-proxy

Cloudflare Workers

npm run cf:deploy

See CLOUDFLARE_DEPLOYMENT.md for setup instructions.

Development

npm run dev          # Dev server with hot reload
npm run test         # Run tests
npm run typecheck    # Type checking

Project Structure

src/
├── server.ts              # HTTP/WebSocket server
├── transcriberproxy.ts    # Main proxy orchestration
├── OutgoingConnection.ts  # Per-participant backend handler
├── config.ts              # Configuration
├── backends/              # Transcription backends
│   ├── OpenAIBackend.ts
│   ├── DeepgramBackend.ts
│   └── GeminiBackend.ts
└── OpusDecoder/           # WASM Opus decoder
worker/
└── index.ts               # Cloudflare Worker entry

Adding a Backend

1. Create src/backends/YourBackend.ts implementing TranscriptionBackend
2. Add configuration to src/config.ts
3. Register in src/backends/BackendFactory.ts

See BACKENDS.md for the template and details.
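The register-then-create pattern behind these steps can be sketched as follows. Everything here is hypothetical: `TranscriptionBackendSketch`, `registerBackend` and `createBackend` only illustrate the idea and are not the real `TranscriptionBackend` interface or `BackendFactory` API, which BACKENDS.md documents.

```typescript
// HYPOTHETICAL backend shape and registry, for illustration only.
interface TranscriptResult {
  text: string;
  isInterim: boolean;
}

interface TranscriptionBackendSketch {
  sendAudio(pcm: Uint8Array): void;
  onResult(cb: (r: TranscriptResult) => void): void;
  close(): void;
}

type BackendConstructor = () => TranscriptionBackendSketch;

const registry = new Map<string, BackendConstructor>();

function registerBackend(name: string, ctor: BackendConstructor): void {
  if (registry.has(name)) throw new Error(`backend "${name}" already registered`);
  registry.set(name, ctor);
}

function createBackend(name: string): TranscriptionBackendSketch {
  const ctor = registry.get(name);
  if (!ctor) throw new Error(`unknown backend "${name}"`);
  return ctor();
}

// Toy backend that "transcribes" by reporting how many bytes it received.
registerBackend("echo", () => {
  let listener: ((r: TranscriptResult) => void) | undefined;
  return {
    sendAudio(pcm) {
      listener?.({ text: `${pcm.length} bytes`, isInterim: false });
    },
    onResult(cb) {
      listener = cb;
    },
    close() {
      listener = undefined; // stop delivering results after close
    },
  };
});
```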

Debugging

Dump WebSocket Messages

DUMP_WEBSOCKET_MESSAGES=true npm run dev
# Messages saved to /tmp/{sessionId}/media.jsonl
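A quick way to inspect a dump is to count frames and payload bytes per participant. This sketch assumes each line of media.jsonl holds one client message in the format shown under Client Messages; WEBSOCKET_DUMP.md describes the actual format. `summarizeDump` is a hypothetical helper.

```typescript
// Summarize a media.jsonl dump: frames and decoded payload bytes per tag.
// ASSUMPTION: one client JSON message per line; non-media lines are skipped.
interface TagStats {
  frames: number;
  bytes: number;
}

function summarizeDump(jsonl: string): Map<string, TagStats> {
  const stats = new Map<string, TagStats>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank and trailing lines
    const msg = JSON.parse(line);
    if (msg.event !== "media") continue;
    const tag: string = msg.media.tag;
    const entry = stats.get(tag) ?? { frames: 0, bytes: 0 };
    entry.frames += 1;
    entry.bytes += Buffer.from(msg.media.payload, "base64").length;
    stats.set(tag, entry);
  }
  return stats;
}
```

In Node.js it could be fed with `readFileSync("/tmp/session123/media.jsonl", "utf8")` from `node:fs`.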

Replay Recorded Session

node scripts/replay-dump.cjs media.jsonl "ws://localhost:8080/transcribe?sendBack=true"

Mix Recorded Audio

npm run mix-audio /tmp/session123/media.jsonl output.wav

See WEBSOCKET_DUMP.md and AUDIO_MIXING.md.

Documentation

BACKENDS.md - Provider details and comparison
CLOUDFLARE_DEPLOYMENT.md - Cloudflare setup
DISPATCHER_INTEGRATION.md - External dispatcher
CONTAINER_ROUTING.md - Container routing modes
WEBSOCKET_DUMP.md - Message debugging
AUDIO_MIXING.md - Audio extraction tool

License

Apache 2.0
