
Commit bc52add

fixie (#56)
* fixie
* fix bug
* fix names
* quickstart
Parent: c90d288

92 files changed (+4707, -2212 lines)


.gitignore

Lines changed: 6 additions & 0 deletions
@@ -42,3 +42,9 @@ extras/speaker-omni-experimental/cache/*
 
 # AI Stuff
 .claude
+
+# SSL
+extras/speaker-recognition/ssl/*
+
+# nginx
+extras/speaker-recognition/nginx.conf

CLAUDE.md

Lines changed: 17 additions & 0 deletions
@@ -307,6 +307,11 @@ websocket.send(JSON.stringify(audioStop) + '\n');
 ### Code Style
 - **Python**: Black formatter with 100-character line length, isort for imports
 - **TypeScript**: Standard React Native conventions
+- **Import Guidelines**:
+  - NEVER import modules in the middle of functions or files
+  - ALL imports must be at the top of the file after the docstring
+  - Use lazy imports sparingly and only when absolutely necessary for circular import issues
+  - Group imports: standard library, third-party, local imports
 
 ### Health Monitoring
 The system includes comprehensive health checks:
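For reference, the layout the new import guidelines describe looks like the following minimal sketch (the module names are illustrative, not taken from this repository):

```python
"""Module docstring comes first; all imports follow immediately after it."""

# Standard library
import json
import logging

# Third-party
import requests

# Local (hypothetical module path, for illustration only)
from app.utils import chunk_audio

logger = logging.getLogger(__name__)
```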
@@ -405,6 +410,11 @@ Access via: `extras/speaker-recognition/webui` → Live Inference
 3. Adjust speaker identification settings (confidence threshold)
 4. Start live session to begin real-time transcription and speaker ID
 
+**Technical Details:**
+- **Audio Processing**: Uses the browser's native sample rate (typically 44.1kHz or 48kHz, not hardcoded 16kHz)
+- **Buffer Retention**: 120 seconds of audio for improved utterance capture
+- **Real-time Updates**: Live transcription with speaker identification results
+
 #### Using Speaker Analysis
 1. Go to Speakers page → Embedding Analysis tab
 2. Select analysis method (UMAP, t-SNE, PCA)
@@ -418,6 +428,13 @@ Access via: `extras/speaker-recognition/webui` → Live Inference
 - Live inference requires Deepgram API key for streaming transcription
 - Speaker identification uses existing enrolled speakers from database
 
+### Live Inference Troubleshooting
+- **"NaN:NaN" timestamps**: Fixed in recent updates; ensure you're using the latest version
+- **Poor speaker identification**: Try adjusting the confidence threshold or re-enrolling speakers
+- **Audio processing delays**: Check the browser console for sample-rate detection logs
+- **Buffer overflow issues**: Retention was extended to 120 seconds for better performance
+- **"extraction_failed" errors**: Usually indicates audio buffer timing issues; check console logs for buffer availability
+
 ## Notes for Claude
 Check if the src/ is volume mounted. If not, do compose build so that code changes are reflected. Do not simply run `docker compose restart` as it will not rebuild the image.
 Check backend/advanced-backend/Docs for up to date information on advanced backend.

backends/advanced/.env.template

Lines changed: 3 additions & 1 deletion
@@ -83,11 +83,13 @@ DEBUG_DIR=./data/debug_dir
 # ========================================
 # These settings control how the browser accesses the backend for audio playback
 
-# The IP address or hostname where your backend is publicly accessible
+# The IP address or hostname where your backend is publicly accessible from the browser
 # Examples:
 # - For local development: localhost or 127.0.0.1
 # - For LAN access: your machine's IP (e.g., 192.168.1.100)
+# - For VPN/Tailscale access: your VPN IP (e.g., 100.64.x.x for Tailscale)
 # - For internet access: your domain or public IP (e.g., friend.example.com)
+# Note: This must be accessible from your browser, not from the Docker container
 HOST_IP=localhost
 
 # Backend API port (where audio files are served)
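To make the new Tailscale guidance concrete, a deployment reached over a tailnet would put the machine's Tailscale address in `.env`; the address below is a placeholder, not a value from this commit:

```bash
# .env — backend reachable from the browser over Tailscale (placeholder IP)
HOST_IP=100.64.12.34
```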

backends/advanced/Docs/README_speaker_enrollment.md

Lines changed: 4 additions & 4 deletions
@@ -181,10 +181,10 @@ Edit `speaker_recognition/speaker_recognition.py` to adjust:
 
 ### Audio Settings
 
-The system is configured for:
-- Sample rate: 16kHz
-- Channels: Mono
-- Format: WAV files
+The system supports:
+- Sample rate: Dynamic detection (commonly 16kHz, 44.1kHz, or 48kHz)
+- Channels: Mono (stereo converted to mono automatically)
+- Format: WAV files (recommended), WebM, MP4
 
 ## Troubleshooting
 
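As a rough illustration of the mono conversion and dynamic rate detection described in the hunk above, a loader might look like this sketch. It assumes the `soundfile` library (which covers WAV; WebM/MP4 would need an ffmpeg-based decoder) and is not the project's actual loading code:

```python
import numpy as np
import soundfile as sf  # assumed dependency; reads WAV and other libsndfile formats

def load_mono(path: str) -> tuple[np.ndarray, int]:
    """Load an audio file, collapsing stereo to mono; returns (samples, sample_rate)."""
    samples, sample_rate = sf.read(path, dtype="float32")
    if samples.ndim == 2:  # stereo or multi-channel: average channels down to mono
        samples = samples.mean(axis=1)
    return samples, sample_rate  # rate is detected from the file, not assumed 16kHz
```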

backends/advanced/Docs/architecture.md

Lines changed: 58 additions & 1 deletion
@@ -241,7 +241,7 @@ Wyoming is a peer-to-peer protocol for voice assistants that combines JSONL (JSO
 - **Wyoming Protocol + Opus Decoding**: Combines Wyoming session management with OMI Opus decoding
 - **Continuous Streaming**: OMI devices stream continuously, audio-start/stop events are optional
 - **Timestamp Preservation**: Uses timestamps from Wyoming headers when provided
-- **OMI-Optimized**: Hardcoded 16kHz mono format for OMI device compatibility
+- **Dynamic Sample Rate**: Automatically detects and adapts to client sample rate (typically 16kHz for OMI devices, but supports other rates)
 
 **Simple Backend (`/ws`)**:
 - **Minimal Wyoming Support**: Parses audio-chunk events, silently ignores control events
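The dynamic-rate behaviour follows from the Wyoming JSONL framing: each event is a JSON header line (optionally followed by a binary payload), and audio-chunk headers advertise the stream format. A minimal header parse might look like this sketch; the field names follow the Wyoming convention, but this is not the backend's actual parser:

```python
import json

DEFAULT_RATE = 16000  # fallback when the client does not advertise a rate (OMI default)

def parse_wyoming_header(line: bytes) -> tuple[str, int, int]:
    """Parse one Wyoming JSONL header; return (event_type, sample_rate, payload_length)."""
    event = json.loads(line)
    data = event.get("data") or {}
    rate = data.get("rate", DEFAULT_RATE)  # dynamic sample rate with safe fallback
    return event["type"], rate, event.get("payload_length") or 0
```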
@@ -317,6 +317,24 @@ client_state = ClientState(
 - **Connection Tracking**: Real-time monitoring of active clients
 - **State Management**: Simplified client state for conversation tracking only
 - **Centralized Processing**: Application-level processors handle all background tasks
+- **Dynamic Sample Rate**: Client state tracks actual sample rate from audio chunks
+- **Audio Buffer Management**: Sophisticated buffer system with timing and collection management
+
+### Audio Buffer Management
+
+The system implements advanced audio buffer management for reliable processing:
+
+**Buffer Collection**:
+- **Retention**: Configurable buffer retention (default 120 seconds for speaker identification)
+- **Timeout**: 1.5-minute collection timeout to prevent indefinite buffering
+- **Isolation**: Each client maintains isolated buffer state
+- **Dynamic Sizing**: Adapts to actual sample rate and chunk sizes
+
+**Buffer State Tracking**:
+- Sample rate detection from incoming audio chunks
+- Automatic fallback to default rates when not specified
+- Buffer timing synchronization for accurate segment extraction
+- Memory-efficient circular buffer implementation
 
 ### Application-Level Processing Architecture
 
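As a rough illustration of the buffer behaviour described in the hunk above (retention window, dynamic sizing, per-client isolation), a deque-backed circular buffer might look like the following sketch; it is not the backend's implementation:

```python
from collections import deque

class AudioRingBuffer:
    """Per-client circular buffer retaining the last `retention_seconds` of PCM audio."""

    def __init__(self, retention_seconds: float = 120.0, default_rate: int = 16000):
        self.retention_seconds = retention_seconds
        self.sample_rate = default_rate       # updated when a chunk advertises its rate
        self._chunks: deque[bytes] = deque()  # deque gives O(1) eviction at the head
        self._buffered_samples = 0

    def append(self, pcm: bytes, rate: int | None = None, sample_width: int = 2) -> None:
        """Add a PCM chunk, tracking the advertised rate and evicting audio past the window."""
        if rate:                              # dynamic sample rate detection with fallback
            self.sample_rate = rate
        self._chunks.append(pcm)
        self._buffered_samples += len(pcm) // sample_width
        max_samples = int(self.retention_seconds * self.sample_rate)
        while self._buffered_samples > max_samples and self._chunks:
            dropped = self._chunks.popleft()  # drop oldest audio beyond the retention window
            self._buffered_samples -= len(dropped) // sample_width
```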
@@ -785,6 +803,45 @@ flowchart TB
 5. **Authorization**: Per-endpoint permission checking with simplified ownership validation
 6. **Data Isolation**: User-scoped data access via client ID mapping and ownership validation
 
+## Speaker Recognition Integration
+
+The advanced backend integrates with an external speaker recognition service for real-time speaker identification during conversations.
+
+### Integration Architecture
+
+**Service Communication**:
+- **HTTP API**: RESTful endpoints for speaker enrollment and management
+- **Real-time Processing**: Speaker identification during live transcription
+- **Asynchronous Pipeline**: Non-blocking speaker identification parallel to transcription
+
+**Key Features**:
+- **Dynamic Enrollment**: Add speakers through audio samples
+- **Live Identification**: Real-time speaker recognition during conversations
+- **Confidence Scoring**: Adjustable thresholds for identification accuracy
+- **Multi-speaker Support**: Handles conversations with multiple participants
+
+### Speaker Recognition Flow
+
+1. **Audio Collection**: Capture audio chunks with proper buffering
+2. **Feature Extraction**: Generate speaker embeddings from audio segments
+3. **Identity Matching**: Compare against enrolled speaker database
+4. **Result Integration**: Enhance transcripts with speaker identification
+
+### Configuration
+
+```yaml
+# Environment variables for speaker recognition
+SPEAKER_SERVICE_URL: "http://speaker-recognition:8001"
+SPEAKER_CONFIDENCE_THRESHOLD: 0.15  # Adjustable confidence level
+```
+
+### API Endpoints
+
+- `POST /api/speaker/enroll` - Enroll new speaker with audio samples
+- `GET /api/speaker/list` - List enrolled speakers
+- `POST /api/speaker/identify` - Identify speaker from audio segment
+- `DELETE /api/speaker/{speaker_id}` - Remove enrolled speaker
+
 ## Security Architecture
 
 ### Authentication Layers
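Based on the endpoints and environment variables documented in the hunk above, a backend-side call to the identification endpoint might look like the following sketch. The multipart field name and the response shape are assumptions, not confirmed by this commit:

```python
import os
import requests

SPEAKER_SERVICE_URL = os.getenv("SPEAKER_SERVICE_URL", "http://speaker-recognition:8001")
CONFIDENCE_THRESHOLD = float(os.getenv("SPEAKER_CONFIDENCE_THRESHOLD", "0.15"))

def identify_speaker(wav_path: str) -> str | None:
    """POST an audio segment to /api/speaker/identify; return speaker id if above threshold."""
    with open(wav_path, "rb") as f:
        resp = requests.post(
            f"{SPEAKER_SERVICE_URL}/api/speaker/identify",
            files={"file": f},  # field name is an assumption
            timeout=10,
        )
    resp.raise_for_status()
    result = resp.json()  # assumed shape: {"speaker_id": ..., "confidence": ...}
    if result.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD:
        return result.get("speaker_id")
    return None  # below threshold: treat as unknown speaker
```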
Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
+# Memory Configuration Guide
+
+This guide helps you set up and configure the memory system for the Friend Advanced Backend.
+
+## Quick Start
+
+1. **Copy the template configuration**:
+   ```bash
+   cp memory_config.yaml.template memory_config.yaml
+   ```
+
+2. **Edit memory_config.yaml** with your preferred settings:
+   ```yaml
+   memory:
+     provider: "mem0"  # or "basic" for simpler setup
+
+     # Provider-specific configuration
+     mem0:
+       model_provider: "openai"  # or "ollama" for local
+       embedding_model: "text-embedding-3-small"
+       llm_model: "gpt-4o-mini"
+   ```
+
+3. **Set environment variables** in `.env`:
+   ```bash
+   # For OpenAI
+   OPENAI_API_KEY=your-api-key
+
+   # For Ollama (local)
+   OLLAMA_BASE_URL=http://ollama:11434
+   ```
+
+## Configuration Options
+
+### Memory Providers
+
+#### mem0 (Recommended)
+Advanced memory system with semantic search and context awareness.
+
+**Configuration**:
+```yaml
+memory:
+  provider: "mem0"
+  mem0:
+    model_provider: "openai"  # or "ollama"
+    embedding_model: "text-embedding-3-small"
+    llm_model: "gpt-4o-mini"
+    prompt_template: "custom_prompt_here"  # Optional
+```
+
+#### basic
+Simple memory storage without advanced features.
+
+**Configuration**:
+```yaml
+memory:
+  provider: "basic"
+  # No additional configuration needed
+```
+
+### Model Selection
+
+#### OpenAI Models
+- **LLM**: `gpt-4o-mini`, `gpt-4o`, `gpt-3.5-turbo`
+- **Embeddings**: `text-embedding-3-small`, `text-embedding-3-large`
+
+#### Ollama Models (Local)
+- **LLM**: `llama3`, `mistral`, `qwen2.5`
+- **Embeddings**: `nomic-embed-text`, `all-minilm`
+
+## Hot Reload
+
+The configuration supports hot reloading: changes are applied automatically without restarting the service.
+
+## Validation
+
+The system validates your configuration on startup and logs any issues:
+- Missing required fields
+- Invalid provider names
+- Incompatible model combinations
+
+## Troubleshooting
+
+### Common Issues
+
+1. **"Provider not found"**: Check spelling in `provider` field
+2. **"API key missing"**: Ensure environment variables are set
+3. **"Model not available"**: Verify model names match provider's available models
+4. **"Connection refused"**: Check Ollama is running if using local models
+
+### Debug Mode
+
+Enable debug logging by setting:
+```bash
+DEBUG=true
+```
+
+This provides detailed information about memory processing and configuration loading.
+
+## Examples
+
+### OpenAI Setup
+```yaml
+memory:
+  provider: "mem0"
+  mem0:
+    model_provider: "openai"
+    embedding_model: "text-embedding-3-small"
+    llm_model: "gpt-4o-mini"
+```
+
+### Local Ollama Setup
+```yaml
+memory:
+  provider: "mem0"
+  mem0:
+    model_provider: "ollama"
+    embedding_model: "nomic-embed-text"
+    llm_model: "llama3"
+```
+
+### Minimal Setup
+```yaml
+memory:
+  provider: "basic"
+```
+
+## Next Steps
+
+- Configure action items detection in `memory_config.yaml`
+- Set up custom prompt templates for your use case
+- Monitor memory processing in the debug dashboard
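To make the guide's validation and hot-reload behaviour concrete, here is a minimal sketch of a loader that checks the fields described above (PyYAML assumed; this is not the backend's actual loader, and a hot-reload wrapper would simply re-invoke it when the file's mtime changes):

```python
import os
import yaml  # PyYAML, assumed available

VALID_PROVIDERS = {"mem0", "basic"}

def load_memory_config(path: str = "memory_config.yaml") -> dict:
    """Load and validate memory_config.yaml, mirroring the startup checks above."""
    with open(path) as f:
        config = yaml.safe_load(f) or {}
    memory = config.get("memory", {})
    provider = memory.get("provider")
    if provider not in VALID_PROVIDERS:        # "Provider not found"
        raise ValueError(f"Unknown memory provider: {provider!r}")
    if provider == "mem0":
        mem0 = memory.get("mem0", {})
        if mem0.get("model_provider") == "openai" and not os.getenv("OPENAI_API_KEY"):
            raise ValueError("OPENAI_API_KEY is not set")  # "API key missing"
    return memory
```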

backends/advanced/docker-compose.yml

Lines changed: 1 addition & 31 deletions
@@ -104,36 +104,6 @@ services:
 #   - ./nginx.conf:/etc/nginx/nginx.conf:ro
 #   ports: ["80:80"]  # publish once; ngrok points here
 
-# speaker-recognition:
-#   build:
-#     context: ../../extras/speaker-recognition
-#     dockerfile: Dockerfile
-#   # image: speaker-recognition:latest
-#   ports:
-#     - "8001:8001"
-#   volumes:
-#     # Persist Hugging Face cache (models) between restarts
-#     - ./data/speaker_model_cache:/models
-#     - ./data/audio_chunks:/app/audio_chunks  # Share audio chunks with backend
-#     - ./data/speaker_debug:/app/debug
-#   deploy:
-#     resources:
-#       reservations:
-#         devices:
-#           - driver: nvidia
-#             count: all
-#             capabilities: [gpu]
-#   environment:
-#     - HF_HOME=/models
-#     - HF_TOKEN=${HF_TOKEN}
-#     - SIMILARITY_THRESHOLD=0.85
-#   restart: unless-stopped
-#   healthcheck:
-#     test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
-#     interval: 30s
-#     timeout: 10s
-#     retries: 3
-
 # ollama:
 #   image: ollama/ollama:latest
 #   container_name: ollama

@@ -172,4 +142,4 @@
 # neo4j_data:
 #   driver: local
 # neo4j_logs:
-#   driver: local
+#   driver: local
