Real-time WebSocket transcription proxy supporting multiple speech-to-text backends. Routes audio to OpenAI, Deepgram, or Google Gemini and streams transcription results back to clients.
- Multi-provider support - OpenAI Realtime, Deepgram Nova, Google Gemini
- Provider fallback - Configurable priority order with automatic failover
- Multi-participant sessions - Single WebSocket handles multiple audio streams
- Real-time streaming - Interim and final transcription results
- Flexible deployment - Node.js standalone or Cloudflare Workers with Containers
- Dispatcher integration - Forward transcriptions to external services
- Audio debugging - Dump and replay WebSocket sessions
```bash
# Install dependencies
npm install

# Build WASM decoder (first time only)
npm run configure   # Install Emscripten
npm run build:wasm

# Set API key(s)
export OPENAI_API_KEY=sk-...
# or
export DEEPGRAM_API_KEY=...

# Start server
npm run dev
```

Connect via WebSocket:

```
ws://localhost:8080/transcribe?sessionId=test&sendBack=true
```
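For a quick programmatic smoke test, a minimal Node.js client could look like the sketch below. It assumes the `ws` package (`npm install ws`); the message shapes follow the API section further down, and the empty Opus frame is only a placeholder for real audio from your capture pipeline.

```typescript
// Minimal client sketch — not part of this repo. Assumes `npm install ws`.
import WebSocket from 'ws';

const url = 'ws://localhost:8080/transcribe?sessionId=test&sendBack=true';
const socket = new WebSocket(url);

socket.on('open', () => {
  // Send one "media" event for participant "alice".
  // `opusFrame` would normally hold an encoded Opus frame; an empty buffer
  // is used here only to show the envelope shape.
  const opusFrame = Buffer.alloc(0);
  socket.send(JSON.stringify({
    event: 'media',
    media: {
      tag: 'alice',
      chunk: 0,
      timestamp: Math.floor(Date.now() / 1000),
      payload: opusFrame.toString('base64'),
    },
  }));
});

socket.on('message', (data) => {
  const msg = JSON.parse(data.toString());
  if (msg.type === 'transcription-result') {
    console.log(msg.participant.id,
      msg.transcript.map((t: { text: string }) => t.text).join(' '));
  }
});
```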
Requirements:

- Node.js 22+
- Emscripten (for WASM compilation)
```bash
npm install
npm run configure   # Set up Emscripten (one-time)
npm run build       # Build WASM + TypeScript + bundle
```

Or build and run with Docker:

```bash
npm run docker:build
npm run docker:run
```

Set environment variables or use a `.env` file:
Provider selection:

| Variable | Default | Description |
|---|---|---|
| `PROVIDERS_PRIORITY` | `openai,deepgram,gemini` | Provider priority order |
API keys:

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key |
| `DEEPGRAM_API_KEY` | Deepgram API key |
| `GEMINI_API_KEY` | Google Gemini API key |
Provider settings:

| Variable | Default | Description |
|---|---|---|
| `OPENAI_MODEL` | `gpt-4o-mini-transcribe` | OpenAI model |
| `DEEPGRAM_MODEL` | `nova-2` | Deepgram model |
| `DEEPGRAM_LANGUAGE` | `multi` | Language code, or `multi` for automatic detection |
| `DEEPGRAM_ENCODING` | `linear16` | `linear16`, `opus`, or `ogg-opus` |
| `GEMINI_MODEL` | `gemini-2.0-flash-exp` | Gemini model |
Server:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | Listen port |
| `HOST` | `0.0.0.0` | Listen address |
| `DEBUG` | `false` | Enable debug logging |
| `FORCE_COMMIT_TIMEOUT` | `2` | Seconds before finalizing pending audio |
Dispatcher integration:

| Variable | Default | Description |
|---|---|---|
| `USE_DISPATCHER` | `false` | Enable dispatcher forwarding |
| `DISPATCHER_WS_URL` | (empty) | Dispatcher WebSocket URL |
| `DISPATCHER_HEADERS` | `{}` | Auth headers (JSON) |
See DISPATCHER_INTEGRATION.md for details.
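Putting the tables above together, a minimal `.env` might look like this (values are placeholders; set only the keys for the providers you actually use):

```bash
# Provider credentials (at least one)
OPENAI_API_KEY=sk-...
DEEPGRAM_API_KEY=...

# Provider selection / failover order (reordering the default)
PROVIDERS_PRIORITY=deepgram,openai,gemini
DEEPGRAM_MODEL=nova-2
DEEPGRAM_LANGUAGE=multi

# Server
PORT=8080
DEBUG=false
```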
```
ws://host:port/transcribe?sessionId=xxx&sendBack=true
```
Query Parameters:
| Parameter | Default | Description |
|---|---|---|
| `sessionId` | (required) | Session identifier |
| `sendBack` | `false` | Return final transcriptions |
| `sendBackInterim` | `false` | Return interim transcriptions |
| `provider` | (auto) | Override provider selection |
| `encoding` | `opus` | Audio encoding: `opus` or `ogg-opus` |
| `lang` | (auto) | Language hint |
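For example, a client that wants interim results from Deepgram for Ogg-encapsulated Opus might connect with a URL like this (parameter values are illustrative):

```
ws://localhost:8080/transcribe?sessionId=call-42&sendBack=true&sendBackInterim=true&provider=deepgram&encoding=ogg-opus&lang=en
```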
Messages sent by the client:

Audio data:

```json
{
  "event": "media",
  "media": {
    "tag": "participant-id",
    "chunk": 0,
    "timestamp": 1768341932,
    "payload": "base64-encoded-audio"
  }
}
```

Ping:

```json
{"event": "ping", "id": 123}
```

Messages sent back to the client:

Transcription result:

```json
{
  "type": "transcription-result",
  "is_interim": false,
  "transcript": [{"text": "hello world", "confidence": 0.98}],
  "participant": {"id": "participant-id"},
  "timestamp": 1768341932000,
  "language": "en"
}
```

Pong:

```json
{"event": "pong", "id": 123}
```

Supported providers:

| Provider | Features |
|---|---|
| OpenAI | Server VAD, confidence scores, streaming |
| Deepgram | Punctuation, diarization, code-switching, streaming |
| Gemini | Multimodal, multilingual |
See BACKENDS.md for detailed comparison and configuration.
Run the bundled server with Node.js:

```bash
npm start
```

Or build and run the Docker image:

```bash
docker build -t opus-transcriber-proxy .
docker run -p 8080:8080 -e OPENAI_API_KEY=sk-... opus-transcriber-proxy
```

To deploy on Cloudflare Workers:

```bash
npm run cf:deploy
```

See CLOUDFLARE_DEPLOYMENT.md for setup instructions.
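For the Docker route, a `.env` file like the one sketched in the configuration section can also be passed wholesale using Docker's standard `--env-file` flag instead of individual `-e` flags:

```bash
docker run --env-file .env -p 8080:8080 opus-transcriber-proxy
```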
For local development:

```bash
npm run dev        # Dev server with hot reload
npm run test       # Run tests
npm run typecheck  # Type checking
```

Project layout:

```
src/
├── server.ts               # HTTP/WebSocket server
├── transcriberproxy.ts     # Main proxy orchestration
├── OutgoingConnection.ts   # Per-participant backend handler
├── config.ts               # Configuration
├── backends/               # Transcription backends
│   ├── OpenAIBackend.ts
│   ├── DeepgramBackend.ts
│   └── GeminiBackend.ts
└── OpusDecoder/            # WASM Opus decoder
worker/
└── index.ts                # Cloudflare Worker entry
```
To add a new transcription backend:

- Create `src/backends/YourBackend.ts` implementing `TranscriptionBackend`
- Add configuration to `src/config.ts`
- Register it in `src/backends/BackendFactory.ts`
See BACKENDS.md for the template and details.
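As a rough orientation only, a new backend will generally look something like the sketch below. The real `TranscriptionBackend` interface and its method names are defined in this repo and documented in BACKENDS.md; every method name here is an assumption, not the actual API.

```typescript
// Illustrative sketch only: method names and signatures are assumptions.
// Follow the template in BACKENDS.md for the real TranscriptionBackend interface.

interface TranscriptionResult {
  text: string;
  confidence?: number;
  isInterim: boolean;
}

export class YourBackend /* implements TranscriptionBackend */ {
  constructor(private readonly apiKey: string) {}

  // Invoked when a transcription arrives from the provider.
  onResult?: (result: TranscriptionResult) => void;

  // Open a streaming connection to the provider's speech-to-text API.
  async connect(language?: string): Promise<void> {
    // e.g. open a WebSocket to the provider and wire its events to onResult
  }

  // Forward decoded audio for one participant to the provider.
  sendAudio(pcm: Uint8Array): void {
    // push the chunk onto the provider connection
  }

  // Flush pending audio and tear the connection down.
  async close(): Promise<void> {}
}
```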
To debug audio issues, dump the incoming WebSocket messages:

```bash
DUMP_WEBSOCKET_MESSAGES=true npm run dev
# Messages saved to /tmp/{sessionId}/media.jsonl
```

Replay a dumped session against a running server:

```bash
node scripts/replay-dump.cjs media.jsonl "ws://localhost:8080/transcribe?sendBack=true"
```

Mix a dumped session's audio streams into a single WAV file:

```bash
npm run mix-audio /tmp/session123/media.jsonl output.wav
```

See WEBSOCKET_DUMP.md and AUDIO_MIXING.md.
Further reading:

- BACKENDS.md - Provider details and comparison
- CLOUDFLARE_DEPLOYMENT.md - Cloudflare setup
- DISPATCHER_INTEGRATION.md - External dispatcher
- CONTAINER_ROUTING.md - Container routing modes
- WEBSOCKET_DUMP.md - Message debugging
- AUDIO_MIXING.md - Audio extraction tool
Licensed under Apache 2.0.