Add Audio Input Widget #7363

MarcSkovMadsen · 2024-10-06T07:58:45Z

Closing #4048.

For now, this is an exploration to inform the design of the widget.

Related issues

Make it easy to play audio input in Audio pane #7035
pn.pane.Audio fails with stereo numpy ndarray #7090
Integrate SpeechToText with ChatInterface #7021 (We should show how to integrate audio input with chat)
Enable easy AI workflows #4861

Design

Inspiration

Gradio Real time speech recognition: https://www.gradio.app/guides/real-time-speech-recognition
- https://github.com/gradio-app/gradio/blob/main/gradio/components/audio.py
- https://github.com/gradio-app/gradio/blob/main/js/audio/recorder/AudioRecorder.svelte
Conversion in the browser: https://stackoverflow.com/questions/57365486/converting-blob-webm-to-audio-file-wav-or-mp3
Wave Surfer: https://wavesurfer.xyz/docs/ and recording example https://wavesurfer.xyz/examples/?record.js
Streamlit Experimental Audio Recorder https://docs.streamlit.io/develop/api-reference/widgets/st.audio_input and https://github.com/streamlit/streamlit/tree/develop/frontend/lib/src/components/widgets/AudioInput.
Streamlit Audio Recorder, https://github.com/stefanrmmr/streamlit-audio-recorder, Audio React Recorder: https://doppelgunner.github.io/audio-react-recorder/, https://github.com/doppelgunner/audio-react-recorder?tab=readme-ov-file
Audio Recorder Streamlit: https://github.com/Joooohan/audio-recorder-streamlit
Streamlit-audiorecorder: https://github.com/theevann/streamlit-audiorecorder and https://github.com/samhirtarif/react-audio-recorder
https://github.com/whitphx/streamlit-webrtc
OpenAI Realtime API https://platform.openai.com/docs/guides/realtime/overview

Questions

Design Decisions to be taken

Do we want to focus on audio input or combine audio and video? The Media Stream API supports both.
- Audio
- Video
What should the name be?
- Microphone
- AudioInput
- AudioRecorder
Do we want to enable incremental streaming? Or just sending value when recording is finished?
- Final Value
- Streaming value
Do we want to support more value formats than the default webm, such as mp3, ogg, or wav? Converting on the client side might require cross-origin isolation. Converting on the server side might require an ffmpeg installation.
- mp3, ogg, wav
- Conversion on client side
Do we want bare minimum UI (Start, Stop, Pause)? Or extra features:
- submit button
- playback button?
- wave graph?
- editing possibilities?
Do we want compact UI like Streamlit or large UI like Gradio?
- Compact UI
- Large UI
Do we want to build on the raw Media Stream Recording API or on a library?
- Wavesurfer https://wavesurfer.xyz/docs/ (what Gradio and Streamlit are using)
- React Audio Recorder https://github.com/samhirtarif/react-audio-recorder (looks really simple to implement, but uses React)
Do we want to make it easy for users to:
- play the value in the Audio pane?
- work with the value as a NumPy array?
- work with the value as a data URL?
- save the value to a file?
How do we most efficiently transfer the media from client to server?
- Bytes (I don't yet know how to do this)
- Data URL (this is easily doable)
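
For the data URL option, here is a minimal sketch of what the server side might do with the value. It assumes the client sends a base64 data URL such as `data:audio/webm;codecs=opus;base64,...`. The names `decode_data_url` and `save_recording` are hypothetical helpers for illustration, not part of Panel.

```python
import base64
from pathlib import Path


def decode_data_url(data_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into its MIME type and raw bytes."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:"):
        raise ValueError("not a data URL")
    # The header looks like "data:<mime>[;param=value...];base64"
    parts = header[len("data:"):].split(";")
    if parts[-1] != "base64":
        raise ValueError("this sketch only handles base64-encoded data URLs")
    mime_type = parts[0] or "text/plain"
    return mime_type, base64.b64decode(payload)


def save_recording(data_url: str, path: Path) -> str:
    """Write the decoded audio bytes to `path` and return the MIME type."""
    mime_type, raw = decode_data_url(data_url)
    path.write_bytes(raw)
    return mime_type
```

Note that base64 encoding makes the payload roughly one third larger than the raw bytes, which is the main cost of the data URL approach compared to sending bytes directly.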

Items marked with [x] are the choices we should select to implement.
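
On the format question, server-side conversion could look roughly like the sketch below. It assumes an `ffmpeg` executable is installed and available on the PATH, which is exactly the extra dependency noted in the question. `build_ffmpeg_command` and `convert_recording` are hypothetical helper names, not an existing Panel API.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build the ffmpeg argument list for converting `src` to `dst`."""
    # -y overwrites dst if it already exists; -i names the input file.
    # ffmpeg infers the output format from the extension of dst.
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_recording(src: Path, dst: Path) -> Path:
    """Convert a recording (e.g. .webm) to another format (e.g. .wav)."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return dst
```

For example, `convert_recording(Path("recording.webm"), Path("recording.wav"))` would produce a wav file next to the original, provided ffmpeg is installed.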

codecov · 2024-10-06T08:08:43Z

Codecov Report

Attention: Patch coverage is 0% with 39 lines in your changes missing coverage. Please review.

Project coverage is 82.14%. Comparing base (5ef8909) to head (a0d9dce).

Files with missing lines	Patch %	Lines
panel/widgets/microphone.py	0.00%	39 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7363      +/-   ##
==========================================
- Coverage   82.21%   82.14%   -0.08%     
==========================================
  Files         337      338       +1     
  Lines       50513    50552      +39     
==========================================
- Hits        41529    41524       -5     
- Misses       8984     9028      +44


MarcSkovMadsen · 2024-10-06T08:14:52Z

If you are interested in audio input, feel free to comment on the questions above, @philippjfr and @ahuang11.

philippjfr · 2024-10-15T12:24:24Z

Not questions (yet), but is there a corresponding JS file?

MarcSkovMadsen · 2024-10-15T17:53:29Z

No. Currently just exploring the design.

add initial design

a0d9dce

MarcSkovMadsen added the "in progress" and "need input from Philipp" labels · Oct 6, 2024
