@@ -241,7 +241,7 @@ Wyoming is a peer-to-peer protocol for voice assistants that combines JSONL (JSO
 - **Wyoming Protocol + Opus Decoding**: Combines Wyoming session management with OMI Opus decoding
 - **Continuous Streaming**: OMI devices stream continuously, audio-start/stop events are optional
 - **Timestamp Preservation**: Uses timestamps from Wyoming headers when provided
-- **OMI-Optimized**: Hardcoded 16kHz mono format for OMI device compatibility
+- **Dynamic Sample Rate**: Automatically detects and adapts to the client sample rate (typically 16kHz for OMI devices, but other rates are supported)

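As a rough illustration of the dynamic sample rate behaviour described in this hunk, the sketch below reads the rate from a Wyoming `audio-chunk` JSONL header and falls back to 16 kHz when none is given. The function name `detect_sample_rate` and the constant `DEFAULT_SAMPLE_RATE` are hypothetical and not taken from the backend code; the header layout (`type`, `data.rate`) follows the Wyoming audio event format.

```python
import json

# Assumed fallback rate, chosen to match the typical OMI device rate.
DEFAULT_SAMPLE_RATE = 16000


def detect_sample_rate(header_line: str, default: int = DEFAULT_SAMPLE_RATE) -> int:
    """Return the rate declared in an audio-chunk header, or the default."""
    event = json.loads(header_line)
    if event.get("type") != "audio-chunk":
        # Control events carry no rate that this sketch relies on.
        return default
    rate = (event.get("data") or {}).get("rate")
    # Accept only positive integers; anything else falls back to the default.
    if isinstance(rate, int) and not isinstance(rate, bool) and rate > 0:
        return rate
    return default
```

A caller would typically store the returned value on the per-client state and reuse it for every later chunk from the same connection.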
 **Simple Backend (`/ws`)**:
 - **Minimal Wyoming Support**: Parses audio-chunk events, silently ignores control events
@@ -317,6 +317,24 @@ client_state = ClientState(
 - **Connection Tracking**: Real-time monitoring of active clients
 - **State Management**: Simplified client state for conversation tracking only
 - **Centralized Processing**: Application-level processors handle all background tasks
+- **Dynamic Sample Rate**: Client state tracks the actual sample rate reported in audio chunks
+- **Audio Buffer Management**: Per-client buffers with timing and collection management
+
+### Audio Buffer Management
+
+The system buffers audio per client so that segments can be extracted reliably:
+
+**Buffer Collection**:
+- **Retention**: Configurable buffer retention (default 120 seconds for speaker identification)
+- **Timeout**: 1.5-minute (90-second) collection timeout to prevent indefinite buffering
+- **Isolation**: Each client maintains isolated buffer state
+- **Dynamic Sizing**: Adapts to actual sample rate and chunk sizes
+
+**Buffer State Tracking**:
+- Sample rate detection from incoming audio chunks
+- Automatic fallback to default rates when not specified
+- Buffer timing synchronization for accurate segment extraction
+- Memory-efficient circular buffer implementation

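The retention and dynamic sizing bullets above can be pictured with a small ring buffer whose capacity is derived from the sample rate. This is a minimal sketch, not the backend's implementation: the class name `AudioRingBuffer`, its parameters, and the 16-bit mono default are assumptions made for illustration.

```python
from collections import deque


class AudioRingBuffer:
    """Keeps roughly the most recent `retention_seconds` of raw PCM audio."""

    def __init__(self, sample_rate=16000, retention_seconds=120.0,
                 sample_width=2, channels=1):
        # Capacity follows the actual stream format rather than a fixed size.
        self.bytes_per_second = sample_rate * sample_width * channels
        self.max_bytes = int(self.bytes_per_second * retention_seconds)
        self._chunks = deque()
        self._size = 0

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self._size += len(chunk)
        # Evict whole chunks from the oldest end once capacity is exceeded.
        while self._size > self.max_bytes and len(self._chunks) > 1:
            self._size -= len(self._chunks.popleft())

    @property
    def duration_seconds(self) -> float:
        return self._size / self.bytes_per_second

    def snapshot(self) -> bytes:
        """Return the retained audio as one contiguous byte string."""
        return b"".join(self._chunks)
```

With the defaults, 120 seconds of 16 kHz 16-bit mono audio comes to 3,840,000 bytes per client, which is why bounding the buffer matters when many clients are connected.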
 ### Application-Level Processing Architecture

@@ -785,6 +803,45 @@ flowchart TB
 5. **Authorization**: Per-endpoint permission checking with simplified ownership validation
 6. **Data Isolation**: User-scoped data access via client ID mapping and ownership validation

+## Speaker Recognition Integration
+
+The advanced backend integrates with an external speaker recognition service for real-time speaker identification during conversations.
+
+### Integration Architecture
+
+**Service Communication**:
+- **HTTP API**: RESTful endpoints for speaker enrollment and management
+- **Real-time Processing**: Speaker identification during live transcription
+- **Asynchronous Pipeline**: Non-blocking speaker identification that runs in parallel with transcription
+
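To make the asynchronous pipeline bullet concrete, here is a hedged sketch of running transcription and speaker identification concurrently with `asyncio.gather`. The coroutines `transcribe` and `identify_speaker` are hypothetical stand-ins that only sleep and return fixed values; the real backend calls are not shown in this diff.

```python
import asyncio


async def transcribe(segment: bytes) -> str:
    # Hypothetical stand-in for the transcription call.
    await asyncio.sleep(0.01)
    return "hello world"


async def identify_speaker(segment: bytes) -> str:
    # Hypothetical stand-in for the speaker recognition request.
    await asyncio.sleep(0.01)
    return "speaker_1"


async def process_segment(segment: bytes) -> dict:
    # Both coroutines run concurrently, so neither waits for the other.
    text, speaker = await asyncio.gather(
        transcribe(segment), identify_speaker(segment)
    )
    return {"text": text, "speaker": speaker}
```

Calling `asyncio.run(process_segment(audio_bytes))` returns a single result that carries both the transcript text and the speaker label.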
+**Key Features**:
+- **Dynamic Enrollment**: Add speakers through audio samples
+- **Live Identification**: Real-time speaker recognition during conversations
+- **Confidence Scoring**: Adjustable thresholds for identification accuracy
+- **Multi-speaker Support**: Handles conversations with multiple participants
+
+### Speaker Recognition Flow
+
+1. **Audio Collection**: Capture audio chunks with proper buffering
+2. **Feature Extraction**: Generate speaker embeddings from audio segments
+3. **Identity Matching**: Compare against enrolled speaker database
+4. **Result Integration**: Enhance transcripts with speaker identification
+
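The identity matching step in the flow above might look like the following sketch, which compares one embedding against enrolled references using cosine similarity. This is an assumption: the diff does not say which metric the speaker service uses, nor whether the threshold is a minimum similarity or a maximum distance. The names `cosine_similarity` and `match_speaker` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(embedding, enrolled, threshold=0.15):
    """Return (name, score) for the best match, or (None, score) if too weak.

    `threshold` is treated here as a minimum similarity (an assumption).
    """
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

Returning the score alongside the name lets the caller record how confident the match was when it attaches a speaker label to a transcript segment.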
+### Configuration
+
+```yaml
+# Environment variables for speaker recognition
+SPEAKER_SERVICE_URL: "http://speaker-recognition:8001"
+SPEAKER_CONFIDENCE_THRESHOLD: 0.15  # Adjustable confidence level
+```
+
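A minimal sketch of reading these two settings from the environment follows. The helper name `load_speaker_config`, the use of the values above as defaults, and the 0 to 1 range check are assumptions for illustration, not behaviour confirmed by this diff.

```python
import os


def load_speaker_config(env=None):
    """Read the speaker recognition settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    url = env.get("SPEAKER_SERVICE_URL", "http://speaker-recognition:8001")
    url = url.strip().rstrip("/")  # normalise so paths can be appended safely
    threshold = float(env.get("SPEAKER_CONFIDENCE_THRESHOLD", "0.15"))
    # Assumed valid range; the diff does not state the allowed bounds.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("SPEAKER_CONFIDENCE_THRESHOLD must be between 0 and 1")
    return {"service_url": url, "confidence_threshold": threshold}
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests.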
+### API Endpoints
+
+- `POST /api/speaker/enroll` - Enroll new speaker with audio samples
+- `GET /api/speaker/list` - List enrolled speakers
+- `POST /api/speaker/identify` - Identify speaker from audio segment
+- `DELETE /api/speaker/{speaker_id}` - Remove enrolled speaker
+
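The sketch below builds requests for the two endpoints that need no request body, using only the Python standard library. The base URL `http://localhost:8000` and the helper names are hypothetical. Authentication headers and the payload format for `enroll` and `identify` are omitted because this diff does not specify them.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical base URL; substitute the address of the actual deployment.
BASE_URL = "http://localhost:8000"


def list_speakers_request(base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a request that lists enrolled speakers."""
    return Request(f"{base_url}/api/speaker/list", method="GET")


def delete_speaker_request(speaker_id: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a request that removes one enrolled speaker."""
    # Percent-encode the ID so spaces or slashes cannot alter the path.
    safe_id = quote(speaker_id, safe="")
    return Request(f"{base_url}/api/speaker/{safe_id}", method="DELETE")
```

Either request can then be sent with `urllib.request.urlopen(req)` once any required authorization header has been added.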
 ## Security Architecture

 ### Authentication Layers