@vishesh-sachan

Summary

This PR adds the most requested feature: a recording overlay with real-time audio visualization, transformation progress, and cross-recorder support. The overlay gives visual feedback during recording, transcription, and AI transformation across all three recording methods (Navigator, CPAL, and FFmpeg).

Type of Change

  • feat: New feature
  • docs: Documentation update

Related Issue

Closes #848, #1129

Changes Made

Recording Overlay System

  • Real-time audio visualization with 9-bar frequency spectrum display
  • Always-on-top overlay window that follows your cursor across monitors
  • Three display modes:
    • Recording: Animated audio bars showing mic input levels
    • Transcribing: Pulsing text animation during transcription
    • Transforming: Pulsing text animation during AI transformations
  • Configurable positioning: Top, Bottom, or None (disabled)
  • Cancel button: Stop recording directly from the overlay

Centralized Overlay Service Architecture

  • Type-safe TypeScript service (OverlayService) for managing overlay state
  • Unified Rust commands replacing scattered overlay logic
  • Event-driven communication between main window and overlay window
  • Settings integration: Position automatically read from user preferences
  • Error handling: Full Result<T, E> pattern with user-friendly error messages
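For reviewers who want a mental model before opening the code, here is a minimal sketch of how a service like this could combine the pieces above: a discriminated-union state, a position read from settings, events sent to the overlay window, and `Result` return values. Every name in it (`OverlayServiceSketch`, `Emit`, the `overlay:show` / `overlay:hide` event names, the error shape) is an assumption made for illustration and is not the actual `OverlayService` API from this PR.

```typescript
// Hypothetical sketch: none of these names are taken from the PR's source.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

type OverlayPosition = "top" | "bottom" | "none";
type OverlayMode = "recording" | "transcribing" | "transforming";
type OverlayState =
  | { visible: false }
  | { visible: true; mode: OverlayMode; position: "top" | "bottom" };
type OverlayError = { title: string; description: string };

// Stand-in for whatever transport carries events to the overlay window.
type Emit = (event: string, payload: unknown) => void;

class OverlayServiceSketch {
  private state: OverlayState = { visible: false };
  private readonly emit: Emit;
  private readonly readPosition: () => OverlayPosition;

  constructor(emit: Emit, readPosition: () => OverlayPosition) {
    this.emit = emit;
    this.readPosition = readPosition;
  }

  show(mode: OverlayMode): Result<OverlayState, OverlayError> {
    const position = this.readPosition();
    if (position === "none") {
      // Overlay disabled in settings: stay hidden, but do not treat it as an error.
      this.state = { visible: false };
      return { ok: true, value: this.state };
    }
    try {
      this.emit("overlay:show", { mode, position });
    } catch (cause) {
      return { ok: false, error: { title: "Overlay unavailable", description: String(cause) } };
    }
    this.state = { visible: true, mode, position };
    return { ok: true, value: this.state };
  }

  hide(): Result<OverlayState, OverlayError> {
    try {
      this.emit("overlay:hide", null);
    } catch (cause) {
      return { ok: false, error: { title: "Overlay unavailable", description: String(cause) } };
    }
    this.state = { visible: false };
    return { ok: true, value: this.state };
  }

  current(): OverlayState {
    return this.state;
  }
}
```

Injecting `emit` and `readPosition` keeps the sketch testable without a running Tauri window. The real service may be wired differently.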

Cross-Recorder Support

  • Navigator (Browser): RMS-based audio analysis with 5.0x amplification
  • CPAL (Rust): Native audio capture with 8.0x amplification and event forwarding
  • FFmpeg: Pulsing animation fallback (CLI tool provides no real-time levels)
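To make the three paths above concrete, here is a rough sketch of the level math: RMS over time-domain samples, scaled by a per-recorder gain and clamped, plus a synthetic pulse for recorders with no live metering. Only the gain values (5.0 and 8.0) come from this PR. The function names, the way a single level is spread over 9 bars, and the pulse period are assumptions for illustration.

```typescript
// Hypothetical helpers; only the gain values (5.0, 8.0) come from the PR description.
const BAR_COUNT = 9;

/** RMS of time-domain samples in [-1, 1], multiplied by `gain` and clamped to [0, 1]. */
function rmsLevel(samples: Float32Array, gain: number): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  const rms = Math.sqrt(sumSquares / samples.length);
  return Math.min(1, Math.max(0, rms * gain));
}

/** Spread one level across the bars, tallest in the middle (cosmetic shaping, assumed). */
function barsFromLevel(level: number, bars: number = BAR_COUNT): number[] {
  const mid = (bars - 1) / 2;
  return Array.from({ length: bars }, (_, i) => {
    const falloff = mid === 0 ? 1 : 1 - 0.6 * (Math.abs(i - mid) / mid);
    return level * falloff;
  });
}

/** Synthetic level for recorders without live metering: a sine pulse between 0.2 and 0.8. */
function pulseLevel(elapsedMs: number, periodMs: number = 1200): number {
  const phase = (2 * Math.PI * elapsedMs) / periodMs;
  return 0.5 + 0.3 * Math.sin(phase);
}

// Usage: a quiet constant signal of amplitude 0.1 with the Navigator gain of 5.0
const quiet = new Float32Array(256).fill(0.1);
const navigatorBars = barsFromLevel(rmsLevel(quiet, 5.0));
```

The gain exists because speech at normal volume has a low RMS relative to full scale, so unamplified bars would barely move. Clamping keeps loud input from overflowing the bar height.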

Transformation Progress

  • Clipboard transformations: Shows "Transforming..." overlay during API calls
  • Recording transformations: Shows progress after transcription completes
  • Automatic hide: Overlay disappears when transformation finishes
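The show-then-always-hide behavior described above can be sketched as a small wrapper around the API call. `OverlayLike` and `withTransformingOverlay` are hypothetical names used only for this example; the PR's actual integration in `src/lib/query/actions.ts` may look different.

```typescript
// Hypothetical: `OverlayLike` stands in for the PR's OverlayService, whose real API is not shown here.
interface OverlayLike {
  show(mode: "transforming"): void;
  hide(): void;
}

async function withTransformingOverlay<T>(
  overlay: OverlayLike,
  task: () => Promise<T>,
): Promise<T> {
  overlay.show("transforming");
  try {
    return await task();
  } finally {
    // Runs on success and on failure, so a rejected API call cannot leave the overlay on screen.
    overlay.hide();
  }
}
```

Putting `hide()` in `finally` is the design point: the overlay disappears whether the transformation resolves or throws, and the caller still receives the original result or error.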

Svelte inspector disabled

  • Vite plugin inspector set to showToggleButton: 'never' to prevent interference with overlay window

📁 Files Changed

New Files (11)

  • src/overlay/ - New overlay window entry point
    • index.html - Overlay HTML page
    • main.ts - Overlay app initialization
    • RecordingOverlay.svelte - Main overlay UI component
    • RecordingOverlay.css - Overlay styling
    • icons.ts - SVG icons (microphone, transcription, cancel)
  • src/lib/services/overlay/ - Overlay service module
    • overlay-service.ts - Main service class
    • types.ts - TypeScript types
    • index.ts - Module exports
  • src/lib/services/cpal-audio-forwarder.ts - Bridge for CPAL audio events
  • docs/overlay-service-architecture.md - Architecture documentation
  • docs/overlay-service-developer-guide.md - Developer guide

Modified Files (15)

  • src/lib/services/recorder/ - Recorder integrations
    • navigator.ts - Added overlay service calls
    • cpal.ts - Added overlay service calls
    • ffmpeg.ts - Added overlay service with pulsing fallback
  • src/lib/query/actions.ts - Added overlay to transformation pipeline
  • src/lib/services/audio-levels.ts - RMS time-domain audio analysis
  • src/routes/(app)/+layout.svelte - Added overlay cancel event listener
  • src-tauri/src/overlay.rs - Unified overlay commands
  • src-tauri/src/lib.rs - Registered overlay commands
  • vite.overlay.config.ts - Separate Vite config for overlay build
  • vite.config.ts - Added overlay dev server middleware
  • package.json - Updated build scripts
  • And more...

Testing

Prerequisites

⚠️ IMPORTANT: New packages and permissions were added. You must:

# 1. Install new dependencies
bun install

# 2. Register new Tauri permissions
cd apps/whispering/src-tauri
cargo build  # This will register the new event listener permissions

# 3. If on macOS, you may need to re-grant microphone permissions
# System Settings → Privacy & Security → Microphone → Re-enable Whispering

Testing the Overlay

1. Basic Recording Test

  1. Start Whispering in dev mode: bun run dev
  2. Navigate to Settings → Recorder → Overlay Position
  3. Select "Bottom" or "Top" position
  4. Click "Preview Overlay" button → Overlay should appear for 3 seconds
  5. Start a manual recording (⌘+Shift+R or equivalent)
  6. Expected: Overlay appears with animated audio bars
  7. Speak into microphone
  8. Expected: Bars respond to audio input (Navigator/CPAL) or pulse (FFmpeg)
  9. Stop recording
  10. Expected: Overlay switches to "Transcribing..." with pulsing animation
  11. Expected: Overlay hides after transcription completes

2. Cancel Button Test

  1. Start a manual recording
  2. Click the X button in the overlay
  3. Expected: Recording stops immediately, overlay hides
  4. Expected: No transcription occurs
  5. Expected: Audio file is deleted

3. Transformation Test (Clipboard)

  1. Copy some text: "Hello world"
  2. Trigger clipboard transformation (⌘+Shift+T or equivalent)
  3. Expected: Overlay appears showing "Transforming..."
  4. Expected: Overlay hides when transformation completes
  5. Expected: Transformed text is pasted

4. Transformation Test (Recording)

  1. Start a recording, speak, and stop
  2. Wait for transcription to complete
  3. Select a transformation (e.g., "Fix grammar")
  4. Expected: Overlay appears showing "Transforming..."
  5. Expected: Overlay hides when transformation completes

5. FFmpeg Recorder Test

  1. Go to Settings → Recorder → Method
  2. Select "FFmpeg"
  3. Start a recording
  4. Expected: Overlay shows pulsing animated bars (no real audio levels)
  5. Stop recording
  6. Expected: Overlay switches to "Transcribing..."

6. Multi-Recorder Test

  1. Test with Navigator (browser): Real audio bars
  2. Test with CPAL (Rust native): Real audio bars
  3. Test with FFmpeg: Pulsing animation
  4. Expected: All three recorders show overlay correctly

7. Position Test

  1. Change overlay position in settings:
    • Test "Top" position
    • Test "Bottom" position
    • Test "None" (overlay disabled)
  2. Expected: Overlay appears at correct position or not at all

8. Multi-Monitor Test (if available)

  1. Move cursor to different monitor
  2. Start recording
  3. Expected: Overlay appears on the monitor that contains the cursor. Known limitation: it currently always appears on the primary monitor.
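
For context on tests 7 and 8, the placement logic they exercise can be sketched as two small geometry helpers: pick the monitor under the cursor, then compute a top or bottom origin on it. This is an illustration under assumed names (`monitorAt`, `overlayOrigin`), an assumed 24 px margin, and an assumed shared coordinate space. It is not the code in `src-tauri/src/overlay.rs`.

```typescript
// Hypothetical geometry helpers; all coordinates share one virtual-desktop space.
type Rect = { x: number; y: number; width: number; height: number };
type Point = { x: number; y: number };
type Size = { width: number; height: number };

/** Monitor containing the cursor, falling back to the first (assumed primary) monitor. */
function monitorAt(monitors: Rect[], cursor: Point): Rect | undefined {
  const hit = monitors.find(
    (m) =>
      cursor.x >= m.x &&
      cursor.x < m.x + m.width &&
      cursor.y >= m.y &&
      cursor.y < m.y + m.height,
  );
  return hit ?? monitors[0];
}

/** Top-left corner for the overlay, centered horizontally; null when the overlay is disabled. */
function overlayOrigin(
  monitor: Rect,
  overlay: Size,
  position: "top" | "bottom" | "none",
  margin: number = 24,
): Point | null {
  if (position === "none") return null;
  const x = monitor.x + Math.round((monitor.width - overlay.width) / 2);
  const y =
    position === "top"
      ? monitor.y + margin
      : monitor.y + monitor.height - overlay.height - margin;
  return { x, y };
}
```

With the known limitation noted above, the current behavior corresponds to always using the fallback branch of `monitorAt`.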

Screenshots/Recordings

Screenshot 2025-12-21 at 5 39 34 PM
rec.mov

📖 Documentation

Comprehensive documentation added:

  • Architecture Guide (apps/whispering/docs/overlay-service-architecture.md): System design, data flow, audio pipelines
  • Developer Guide (apps/whispering/docs/overlay-service-developer-guide.md): Step-by-step integration examples

Checklist

  • All recorders integrated (Navigator, CPAL, FFmpeg)
  • Transformation pipeline integrated
  • Cancel button working from overlay
  • Settings integration (position preference)
  • Error handling with Result types
  • Type-safe API with TypeScript
  • Documentation (architecture + developer guide)
  • Audio level forwarding (CPAL bridge)
  • Pulsing animation fallback (FFmpeg)


Development

Successfully merging this pull request may close these issues.

Meta: Restore / Add Minimize-to-Tray or Close-to-Tray Support (with Visualizer Enhancements)
