Feature/ support (alibaba tts + alibaba function calling + network search) #6588
base: main
Conversation
Walkthrough

Introduces Yarn/Volta tooling, widens the LLM API speech types and adds an optional streaming API, implements Alibaba (Qwen) SSE PCM audio streaming, refactors the TTS config and player into a streaming-capable manager, adds a network toggle and config, and adds locale strings for the toggle.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as Chat UI
    participant Client as ClientApi.llm
    participant Qwen as Alibaba SSE
    participant Player as TTSPlayManager
    UI->>Client: request streamSpeech(options, audioManager)
    Client->>Qwen: POST /Alibaba.SpeechPath (SSE, params include tools/enable_search)
    UI->>Player: startStreamPlay(onended)
    loop SSE chunks
        Qwen-->>Client: SSE chunk (data: JSON with output.audio.data = PCM base64)
        Client->>Client: decode base64 → AudioBuffer (pcmBase64ToAudioBuffer)
        Client-->>UI: yield AudioBuffer
        UI->>Player: addToQueue(AudioBuffer)
    end
    Client-->>UI: stream complete
    UI->>Player: finishStreamPlay()
    Player-->>UI: onended()
```
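The flow above can be sketched as a runnable producer/consumer. This is a simplified illustration, not the repo's implementation: string chunks stand in for the browser-only AudioBuffer, and the names (streamSpeech, TTSPlayManager, addToQueue) mirror the diagram only.

```typescript
// Simplified sketch of the streaming TTS flow: an async generator yields
// decoded chunks; the UI pushes each one into a player queue, then signals
// end-of-stream so the player can fire its onended callback.
async function* streamSpeech(text: string): AsyncGenerator<string> {
  // Stands in for SSE chunks arriving and being decoded one by one
  for (const word of text.split(" ")) {
    yield word;
  }
}

class TTSPlayManager<T> {
  private queue: T[] = [];
  private finished = false;
  private onended?: () => void;

  startStreamPlay(onended: () => void) {
    this.queue = [];
    this.finished = false;
    this.onended = onended;
  }

  addToQueue(chunk: T) {
    if (this.finished) return; // ignore chunks arriving after finish
    this.queue.push(chunk);
  }

  finishStreamPlay() {
    this.finished = true;
    this.onended?.();
  }

  get played(): T[] {
    return this.queue;
  }
}

async function run(): Promise<string[]> {
  const player = new TTSPlayManager<string>();
  let ended = false;
  player.startStreamPlay(() => {
    ended = true;
  });
  for await (const chunk of streamSpeech("hello streaming tts")) {
    player.addToQueue(chunk);
  }
  player.finishStreamPlay();
  return ended ? player.played : [];
}
```

The key property the diagram encodes is that playback can begin as soon as the first chunk is queued, rather than waiting for the full audio response.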
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 5
🔭 Outside diff range comments (2)
app/components/chat.tsx (2)
287-287: Remove unused state variable speechCooldown.

This state variable and its setter are declared but never used throughout the component. Consider removing it to reduce code clutter.

```diff
- const [speechCooldown, setSpeechCooldown] = useState(false);
```
302-311: Add error handling for the Edge TTS branch.

The Edge TTS implementation lacks the try-catch error handling that exists in other branches. This could lead to unhandled promise rejections.

```diff
  if (config.ttsConfig.engine === "Edge") {
    const edgeVoiceName = accessStore.edgeVoiceName();
    const tts = new MsEdgeTTS();
-   await tts.setMetadata(
-     edgeVoiceName,
-     OUTPUT_FORMAT.AUDIO_24KHZ_96KBITRATE_MONO_MP3,
-   );
-   audioBuffer = await tts.toArrayBuffer(textContent);
-   playSpeech(audioBuffer);
+   try {
+     await tts.setMetadata(
+       edgeVoiceName,
+       OUTPUT_FORMAT.AUDIO_24KHZ_96KBITRATE_MONO_MP3,
+     );
+     audioBuffer = await tts.toArrayBuffer(textContent);
+     playSpeech(audioBuffer);
+   } catch (e) {
+     console.error("[Edge Speech]", e);
+     showToast(prettyObject(e));
+     setSpeechStatus(false);
+     setSpeechLoading(false);
+   }
```
🧹 Nitpick comments (3)
app/utils/audio.ts (2)
57-58: Translate the Chinese comment to English for consistency.

Please translate the comment to maintain code consistency across the codebase.

```diff
- // 在流模式下,如果队列为空但流还没结束,等待
+ // In stream mode, if queue is empty but stream hasn't finished, wait
```
109-114: Consider logging when rejecting audio buffer additions to finished streams.

When streamFinished is true, the method silently ignores the addition. This could make debugging difficult.

```diff
  const addToQueue = (audioBuffer: ArrayBuffer | AudioBuffer) => {
    if (streamFinished) {
+     console.debug("[TTSPlayer] Cannot add to queue - stream already finished");
      return;
    }
    playQueue.push(audioBuffer);
  };
```
app/constant.ts (1)
483-488: Document the empty arrays for the Edge TTS configuration.

The Edge TTS configuration has empty Model and Voices arrays. If this is intentional for dynamic loading, consider adding a comment to clarify the design decision.

```diff
  export const EDGE_TTS = {
    Provider: "Edge" as const,
    ModelProvider: ModelProvider.GPT,
+   // Models and voices are dynamically loaded for Edge TTS
    Model: [] as string[],
    Voices: [] as string[],
  } as const;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- .yarnrc.yml (1 hunks)
- app/client/api.ts (1 hunks)
- app/client/platforms/alibaba.ts (3 hunks)
- app/components/chat.tsx (1 hunks)
- app/components/tts-config.tsx (4 hunks)
- app/constant.ts (3 hunks)
- app/store/config.ts (3 hunks)
- app/utils/audio.ts (2 hunks)
- package.json (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
app/store/config.ts (1)
Learnt from: Noctiro
PR: #6139
File: app/store/access.ts:251-258
Timestamp: 2025-02-04T03:46:50.895Z
Learning: The getModelProvider function in app/utils/model.ts uses safe string operations to parse model names with optional provider (format: modelName@providerId), making explicit error handling unnecessary as it always returns a valid [model, provider] tuple where provider can be undefined.
🧬 Code Graph Analysis (1)
app/store/config.ts (1)
app/constant.ts (2)
- DEFAULT_TTS_ENGINE (465-465)
- TTSEngineType (490-490)
🔇 Additional comments (14)
.yarnrc.yml (1)
1-1: LGTM! Valid Yarn configuration.

Setting nodeLinker to node-modules is a standard configuration that switches from Yarn's PnP to the traditional node_modules structure, which can improve compatibility with certain dependencies.

app/client/api.ts (1)
110-111: Well-designed API extension for streaming TTS support.

The changes properly extend the LLMApi interface to support streaming speech synthesis:
- The speech method maintains backward compatibility by supporting both ArrayBuffer and AudioBuffer return types
- The optional streamSpeech method allows implementations to provide streaming audio without breaking existing code
- The AsyncGenerator<AudioBuffer> type is appropriate for streaming audio data

app/store/config.ts (2)
90-91: LGTM! Well-structured TTS configuration extension.

The addition of the modelProvider field with proper typing, and the updated engine typing to TTSEngineType, properly support the expanded TTS engine functionality while maintaining type safety.

143-145: LGTM! Consistent validator pattern.

The new modelProvider validator follows the established pattern used by other validators in the codebase and provides proper type safety.

app/components/chat.tsx (3)
296-296: Good refactoring for dynamic TTS provider support.

The change from a hardcoded provider to configuration-based provider selection improves flexibility and supports multiple TTS engines.

312-337: Well-implemented streaming TTS with proper error handling.

The streaming implementation correctly:
- Manages playback state before starting the stream
- Uses the async generator pattern for chunk processing
- Handles errors with appropriate user feedback
- Ensures loading state is cleared in the finally block

1350-1362: Good extraction of common audio playback logic.

The playSpeech helper function effectively eliminates code duplication and provides consistent error handling across different TTS branches.

app/utils/audio.ts (2)
32-53: Good implementation of flexible audio buffer handling!

The play method now elegantly handles both ArrayBuffer and AudioBuffer types, maintaining backward compatibility while supporting pre-decoded audio buffers for better performance.

132-149: Excellent async handling in the stop method!

The stop method properly awaits audioContext.close() and comprehensively resets all state variables, ensuring a clean shutdown.

app/constant.ts (2)
235-235: LGTM!

The addition of SpeechPath to the Alibaba configuration is consistent with the existing API path structure.

465-507: Excellent refactoring of TTS configuration!

The transformation from simple constants to a structured, typed configuration system is well-designed:
- Type safety with const assertions
- Centralized configuration management via TTS_CONFIGS
- Clear separation of concerns for different TTS providers
- Extensible design that supports adding new providers easily
app/client/platforms/alibaba.ts (3)

101-161: Well-implemented streaming speech functionality!

The streamSpeech method is well-structured with:
- Proper async generator pattern for streaming
- Comprehensive error handling and timeout management
- Clean SSE parsing logic
- Appropriate cleanup with reader.releaseLock()

398-409: Efficient PCM conversion implementation.

The pcm16ToFloat32 method correctly handles:
- Little-endian byte order
- Signed 16-bit integer conversion
- Proper normalization to the [-1, 1] range
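The three properties listed above can be illustrated with a minimal, self-contained conversion sketch. This is an assumed implementation of the described technique, not the repo's exact pcm16ToFloat32:

```typescript
// Convert 16-bit little-endian PCM bytes to normalized 32-bit floats.
function pcm16ToFloat32(pcmData: Uint8Array): Float32Array {
  const sampleCount = Math.floor(pcmData.length / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // Little-endian: low byte first, high byte second
    let sample = pcmData[2 * i] | (pcmData[2 * i + 1] << 8);
    // Reinterpret the unsigned 16-bit value as a signed integer
    if (sample >= 0x8000) sample -= 0x10000;
    // Normalize: -32768 maps to -1.0, 32767 maps to just under 1.0
    out[i] = sample / 32768;
  }
  return out;
}
```

Dividing by 32768 (not 32767) keeps negative full scale at exactly -1.0, which matches how Web Audio expects channel data.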
368-369: Audio parameters validated.

The hardcoded values already match Alibaba Cloud's Qwen-TTS output (mono, 24 kHz PCM16). You can mark them in code for clarity or expose them as config if you anticipate supporting other services. Suggested update:

```diff
  private convertToAudioBuffer(pcmData: Uint8Array) {
    const audioContext = new (window.AudioContext ||
      window.webkitAudioContext)();
+   // Alibaba Qwen-TTS returns 16-bit PCM WAV: mono (1 channel) at 24 000 Hz
    const channels = 1;
    const sampleRate = 24000;
```
Actionable comments posted: 0
♻️ Duplicate comments (1)
app/client/platforms/alibaba.ts (1)
98-100: The unimplemented speech method issue persists.

This is the same issue flagged in the previous review: the method still throws an unimplemented error without clear documentation about why Alibaba only supports streaming speech.
🧹 Nitpick comments (3)
app/client/platforms/alibaba.ts (3)
102-162: Well-structured streaming implementation with minor improvements needed.

The async generator approach for streaming TTS is well-implemented with proper SSE handling, error management, and cleanup. However, consider these improvements:
- Add null safety for audio data access (Line 150)
- Improve timeout handling: the timeout is cleared only on success
- Enhance buffer parsing robustness

Apply these improvements:

```diff
- if (json.output.audio.data) {
+ if (json.output?.audio?.data) {
    yield this.PCMBase64ToAudioBuffer(json.output.audio.data);
  }

  const res = await fetch(speechPath, speechPayload);
+ clearTimeout(requestTimeoutId); // Clear timeout on successful connection
  const reader = res.body!.getReader();
```
345-363: Consider using modern APIs and consistent error messages.

The base64 decoding works correctly, but consider these improvements:
- Use Uint8Array.from() for a cleaner conversion
- Use consistent error messages in English

```diff
- // 解码 base64
- const binaryString = atob(base64Data);
- const bytes = new Uint8Array(binaryString.length);
- for (let i = 0; i < binaryString.length; i++) {
-   bytes[i] = binaryString.charCodeAt(i);
- }
+ // Decode base64
+ const binaryString = atob(base64Data);
+ const bytes = Uint8Array.from(binaryString, c => c.charCodeAt(0));

- console.error("播放 PCM 数据失败:", error);
+ console.error("Failed to convert PCM data:", error);
```
366-400: AudioContext reuse partially implemented, but could be optimized further.

The lazy initialization of audioContext improves on creating a new instance each time. However, the previous review's suggestion of a static getter method would be more efficient:

```diff
+ private static audioContext: AudioContext | null = null;
+
+ private static getAudioContext(): AudioContext {
+   if (!QwenApi.audioContext) {
+     QwenApi.audioContext = new (window.AudioContext ||
+       window.webkitAudioContext)();
+   }
+   return QwenApi.audioContext;
+ }

  private convertToAudioBuffer(pcmData: Uint8Array) {
-   if (!this.audioContext) {
-     this.audioContext = new (window.AudioContext ||
-       window.webkitAudioContext)();
-   }
-   const audioContext = this.audioContext;
+   const audioContext = QwenApi.getAudioContext();
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- app/client/platforms/alibaba.ts (4 hunks)
🔇 Additional comments (2)
app/client/platforms/alibaba.ts (2)
67-67: Good addition of the audioContext property for reuse.

Adding the private audioContext property is a step in the right direction for reusing the AudioContext instance, which addresses performance concerns from the previous review.

402-413: Correct PCM conversion implementation.

The 16-bit PCM to 32-bit float conversion is mathematically sound, with proper signed integer handling and normalization. The bit manipulation and range conversion are implemented correctly.
Actionable comments posted: 1
♻️ Duplicate comments (1)
app/client/platforms/alibaba.ts (1)
101-103: Previous review comment still applies.

The speech method remains unimplemented, as noted in the previous review. The suggestion to add documentation explaining why it's not implemented is still valid.
🧹 Nitpick comments (1)
app/client/platforms/alibaba.ts (1)
354-422: Make audio parameters configurable and improve error messages.

The audio processing implementation is technically sound, but consider these improvements:
- Hardcoded audio parameters: consider making channels and sample rate configurable
- Internationalization: replace the Chinese error message with English
- Endianness assumption: document that the PCM conversion assumes little-endian format

```diff
  // 播放 PCM base64 数据
  private async PCMBase64ToAudioBuffer(base64Data: string) {
    try {
      // 解码 base64
      const binaryString = atob(base64Data);
      const bytes = new Uint8Array(binaryString.length);
      for (let i = 0; i < binaryString.length; i++) {
        bytes[i] = binaryString.charCodeAt(i);
      }
      // 转换为 AudioBuffer
      const audioBuffer = await this.convertToAudioBuffer(bytes);
      return audioBuffer;
    } catch (error) {
-     console.error("播放 PCM 数据失败:", error);
+     console.error("Failed to convert PCM data to AudioBuffer:", error);
      throw error;
    }
  }

  // 将 PCM 字节数据转换为 AudioBuffer
- private convertToAudioBuffer(pcmData: Uint8Array) {
+ private convertToAudioBuffer(pcmData: Uint8Array, channels = 1, sampleRate = 24000) {
    if (!this.audioContext) {
      this.audioContext = new (window.AudioContext ||
        window.webkitAudioContext)();
    }
    const audioContext = this.audioContext;
-   const channels = 1;
-   const sampleRate = 24000;
    return new Promise<AudioBuffer>((resolve, reject) => {
```

Consider adding JSDoc comments to document the endianness assumption:

```diff
- // 16位 PCM 转 32位浮点数
+ /**
+  * Convert 16-bit PCM data to 32-bit float (assumes little-endian format)
+  */
  private pcm16ToFloat32(pcmData: Uint8Array) {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- app/icons/network.svg is excluded by !**/*.svg
📒 Files selected for processing (25)
- app/client/platforms/alibaba.ts (8 hunks)
- app/components/chat.tsx (6 hunks)
- app/locales/ar.ts (1 hunks)
- app/locales/bn.ts (1 hunks)
- app/locales/cn.ts (1 hunks)
- app/locales/cs.ts (1 hunks)
- app/locales/da.ts (1 hunks)
- app/locales/de.ts (2 hunks)
- app/locales/en.ts (1 hunks)
- app/locales/es.ts (1 hunks)
- app/locales/fr.ts (1 hunks)
- app/locales/id.ts (1 hunks)
- app/locales/it.ts (1 hunks)
- app/locales/jp.ts (1 hunks)
- app/locales/ko.ts (1 hunks)
- app/locales/no.ts (1 hunks)
- app/locales/pt.ts (1 hunks)
- app/locales/ru.ts (1 hunks)
- app/locales/sk.ts (1 hunks)
- app/locales/tr.ts (1 hunks)
- app/locales/tw.ts (1 hunks)
- app/locales/vi.ts (1 hunks)
- app/store/config.ts (3 hunks)
- app/utils.ts (2 hunks)
- public/plugins.json (1 hunks)
✅ Files skipped from review due to trivial changes (21)
- app/locales/cs.ts
- public/plugins.json
- app/locales/it.ts
- app/locales/fr.ts
- app/locales/vi.ts
- app/locales/ko.ts
- app/locales/pt.ts
- app/locales/no.ts
- app/locales/jp.ts
- app/locales/id.ts
- app/locales/ar.ts
- app/locales/da.ts
- app/locales/sk.ts
- app/locales/en.ts
- app/locales/ru.ts
- app/locales/tw.ts
- app/locales/de.ts
- app/locales/tr.ts
- app/locales/bn.ts
- app/locales/cn.ts
- app/locales/es.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- app/utils.ts
- app/components/chat.tsx
- app/store/config.ts
🔇 Additional comments (3)
app/client/platforms/alibaba.ts (3)
14-14: LGTM! Interface properly extended for function calling support.

The addition of the FunctionToolItem import and the new tools and enable_search fields in RequestParam correctly support the function calling capabilities mentioned in the PR objectives.

Also applies to: 60-61

70-70: Good optimization! AudioContext reuse implemented.

Adding the private audioContext property enables reuse of the AudioContext instance across multiple audio conversions, addressing the performance concern from previous reviews.

203-203: LGTM! Function calling and network search properly integrated.

The modifications correctly implement:
- Network search capability via the enable_search parameter
- Function calling support with proper tools integration
- Safe argument concatenation to prevent undefined errors

Also applies to: 236-240, 278-278
Actionable comments posted: 0
♻️ Duplicate comments (2)
app/client/platforms/alibaba.ts (2)
101-103: The previous review comment about documenting the unimplemented speech method still applies.

The speech method continues to throw an unimplemented error without explanation. Consider adding a comment explaining that Alibaba only supports streaming speech via the streamSpeech method.

105-165: Previous suggestions for input validation, JSON parsing safety, and dynamic timeout haven't been implemented.

The streamSpeech method still lacks:
- Input parameter validation for required fields
- Safe JSON parsing with try-catch for malformed SSE data
- Dynamic timeout using getTimeoutMSByModel instead of the hardcoded REQUEST_TIMEOUT_MS

These improvements would enhance robustness and prevent runtime errors.
🧹 Nitpick comments (1)
app/client/platforms/alibaba.ts (1)
354-426: Excellent audio processing implementation!

The audio processing methods properly implement:
- Base64 decoding and PCM-to-AudioBuffer conversion
- Static AudioContext reuse (addressing previous performance suggestions)
- Correct 16-bit PCM to 32-bit float conversion with proper bit manipulation
- Appropriate error handling throughout the pipeline

Consider using English comments for consistency with the rest of the codebase:

```diff
- // 播放 PCM base64 数据
+ // Convert PCM base64 data to AudioBuffer
  private async PCMBase64ToAudioBuffer(base64Data: string) {
    try {
-     // 解码 base64
+     // Decode base64
      const binaryString = atob(base64Data);
      // ... rest of method

- // 将 PCM 字节数据转换为 AudioBuffer
+ // Convert PCM byte data to AudioBuffer
  private convertToAudioBuffer(pcmData: Uint8Array) {

- // 16位 PCM 转 32位浮点数
+ // Convert 16-bit PCM to 32-bit float
  private pcm16ToFloat32(pcmData: Uint8Array) {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- app/client/platforms/alibaba.ts (8 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: in app/components/tts-config.tsx, the non-null assertion operator (!) on tts_configs[props.ttsconfig...
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because the engine values are constrained to keys that exist in TTS_CONFIGS through the dropdown generation (Object.keys(TTS_CONFIGS)) and validation (TTSConfigValidator.engine()).
Applied to files:
app/client/platforms/alibaba.ts
📚 Learning: in app/components/tts-config.tsx, the non-null assertion operator (!) on tts_configs[props.ttsconfig...
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because: 1) TTSEngineType is a union type that exactly matches TTS_CONFIGS keys, 2) dropdown options are generated from Object.keys(TTS_CONFIGS) constraining user input, 3) TTS_CONFIGS is typed as Record<TTSEngineType, TTSConfigItem> guaranteeing key-value correspondence.
Applied to files:
app/client/platforms/alibaba.ts
🔇 Additional comments (3)
app/client/platforms/alibaba.ts (3)

2-7: LGTM!

The new imports for REQUEST_TIMEOUT_MS and FunctionToolItem are properly used in the streaming speech and function calling implementations.

Also applies to: 14-14

60-62: LGTM!

The interface extensions properly support function calling (tools) and network search (enable_search) features. The static audioContext property addresses the previous performance optimization suggestion.

Also applies to: 70-70

245-245: Passing an empty array for tools is safe.

The streamWithThink signature in app/utils/chat.ts clearly defines tools: any[]. Supplying [] simply omits the tools field in the request payload (it only includes tools when the array is non-empty), which matches the intended "no tools" behavior. No further changes are needed.
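The conditional-inclusion behavior described above can be sketched as follows; buildPayload is a hypothetical helper for illustration, not code from the repo:

```typescript
// Only attach the tools field when the array is non-empty, so an empty
// array and "no tools" produce the same request payload.
function buildPayload(model: string, tools: object[]): Record<string, unknown> {
  const payload: Record<string, unknown> = { model };
  if (tools.length > 0) {
    payload.tools = tools; // omitted entirely for the "no tools" case
  }
  return payload;
}
```

This keeps the serialized request free of an empty `"tools": []` field, which some APIs reject or treat differently from an absent field.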
Actionable comments posted: 1
🧹 Nitpick comments (1)
app/components/chat.tsx (1)
1307-1307: Unused state variable detected.

The speechCooldown state is declared but doesn't appear to be used anywhere in the visible code. Consider removing it if it's not needed, or implement its usage if it serves a purpose.

```diff
- const [speechCooldown, setSpeechCooldown] = useState(false);
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- app/components/chat.tsx (6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: in app/components/tts-config.tsx, the non-null assertion operator (!) on tts_configs[props.ttsconfig...
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because the engine values are constrained to keys that exist in TTS_CONFIGS through the dropdown generation (Object.keys(TTS_CONFIGS)) and validation (TTSConfigValidator.engine()).
Applied to files:
app/components/chat.tsx
📚 Learning: in app/components/tts-config.tsx, the non-null assertion operator (!) on tts_configs[props.ttsconfig...
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because: 1) TTSEngineType is a union type that exactly matches TTS_CONFIGS keys, 2) dropdown options are generated from Object.keys(TTS_CONFIGS) constraining user input, 3) TTS_CONFIGS is typed as Record<TTSEngineType, TTSConfigItem> guaranteeing key-value correspondence.
Applied to files:
app/components/chat.tsx
🧬 Code Graph Analysis (1)
app/components/chat.tsx (6)
- app/utils.ts (1): canUseNetWork (299-306)
- app/client/api.ts (2): ClientApi (137-230), config (186-186)
- app/store/access.ts (1): edgeVoiceName (169-173)
- app/utils/ms_edge_tts.ts (1): MsEdgeTTS (119-391)
- app/components/ui-lib.tsx (1): showToast (232-256)
- app/utils/format.ts (1): prettyObject (1-13)
🔇 Additional comments (6)
app/components/chat.tsx (6)
51-51: LGTM!

The new imports for NetWorkIcon and the canUseNetWork utility function are correctly added and necessary for the network toggle feature.

Also applies to: 79-79

515-515: LGTM!

The network toggle implementation follows the established pattern used by other toggles in this component. State management and session updates are handled correctly.

Also applies to: 525-530

845-853: LGTM!

The network toggle UI implementation is consistent with other chat actions. Conditional rendering based on model capability and proper localization are correctly implemented.

1316-1316: Good refactoring of TTS configuration handling.

The dynamic API client initialization based on ttsConfig.modelProvider and the conditional engine handling improve the flexibility and maintainability of the TTS system.

Also applies to: 1319-1319, 1322-1322

1323-1330: LGTM!

The Edge TTS implementation is well-structured, with proper async handling and consistent use of the MsEdgeTTS API. The use of the playSpeech helper function improves code organization.

1370-1382: Excellent refactoring with the playSpeech helper function.

The extraction of common speech playback logic into a dedicated helper function improves code organization and maintainability. Error handling and state management are properly implemented.
Actionable comments posted: 4
🔭 Outside diff range comments (2)
app/client/platforms/alibaba.ts (2)
236-252: Likely bug: tools passed as [] into streamWithThink.

You attach tools to requestPayload.parameters.tools but pass an empty array to streamWithThink. This may break the tool execution loop on subsequent turns.

```diff
  return streamWithThink(
    chatPath,
    requestPayload,
    headers,
-   [],
+   _tools,
    funcs,
    controller,
```
255-285: Make SSE parsing in chat resilient to malformed JSON and guard the runTools index.

A bad chunk will currently throw and break the stream. Also guard when updating arguments by index.

```diff
  (text: string, runTools: ChatMessageTool[]) => {
    // console.log("parseSSE", text, runTools);
-   const json = JSON.parse(text);
+   let json: any;
+   try {
+     json = JSON.parse(text);
+   } catch (e) {
+     console.warn("[Alibaba Chat] Bad SSE JSON chunk:", e, text);
+     return { isThinking: false, content: "" };
+   }
-   const choices = json.output.choices as Array<{
+   const choices = json?.output?.choices as Array<{
      message: {
        content: string | null | MultimodalContentForAlibaba[];
        tool_calls: ChatMessageTool[];
        reasoning_content: string | null;
      };
    }>;
@@
    if (tool_calls?.length > 0) {
      const index = tool_calls[0]?.index;
      const id = tool_calls[0]?.id;
      const args = tool_calls[0]?.function?.arguments;
      if (id) {
        runTools.push({
          id,
          type: tool_calls[0]?.type,
          function: {
            name: tool_calls[0]?.function?.name as string,
            arguments: args,
          },
        });
      } else {
-       // @ts-ignore
-       runTools[index]["function"]["arguments"] += args || "";
+       if (
+         typeof index === "number" &&
+         runTools[index] &&
+         runTools[index].function
+       ) {
+         // @ts-ignore
+         runTools[index]["function"]["arguments"] += args || "";
+       } else {
+         console.warn(
+           "[Alibaba Chat] tool_calls missing id and invalid index:",
+           index,
+         );
+       }
      }
    }
```
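The hardened parsing suggested above can be sketched in isolation. parseSSEChunk and the slice shape below are illustrative, not the repo's exact parseSSE callback:

```typescript
// On malformed JSON, return an empty slice instead of throwing, so one
// bad chunk cannot kill the whole SSE stream.
interface SSESlice {
  isThinking: boolean;
  content: string;
}

function parseSSEChunk(text: string): SSESlice {
  let json: any;
  try {
    json = JSON.parse(text);
  } catch (e) {
    // Bad or partial chunk: log and continue with an empty slice
    console.warn("Bad SSE JSON chunk:", e);
    return { isThinking: false, content: "" };
  }
  // Optional chaining guards against missing nesting in otherwise-valid JSON
  const content = json?.output?.choices?.[0]?.message?.content ?? "";
  return {
    isThinking: false,
    content: typeof content === "string" ? content : "",
  };
}
```

The caller can treat an empty slice as "nothing to append" and keep consuming the stream.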
♻️ Duplicate comments (1)
app/client/platforms/alibaba.ts (1)
96-98: Implement or explicitly document the unimplemented speech() method.

Still throws. If Alibaba only supports streaming, add a clear comment and guide callers to use streamSpeech to avoid runtime surprises.

```diff
  async speech(options: SpeechOptions): Promise<ArrayBuffer> {
-   throw new Error("Method not implemented.");
+   // Note: Alibaba Qwen TTS supports streaming via streamSpeech only.
+   // Non-streaming speech is not supported by this provider.
+   throw new Error("Alibaba speech() is not supported. Use streamSpeech().");
  }
```
🧹 Nitpick comments (2)
app/client/platforms/alibaba.ts (2)
388-419: Optional: use copyToChannel for simpler, faster buffer writes.

Modern browsers support AudioBuffer.copyToChannel, avoiding manual loops.

```diff
- // 复制数据到 AudioBuffer
- for (let channel = 0; channel < channels; channel++) {
-   const channelData = audioBuffer.getChannelData(channel);
-   for (let i = 0; i < channelData.length; i++) {
-     channelData[i] = float32Array[i * channels + channel];
-   }
- }
+ // 复制数据到 AudioBuffer
+ audioBuffer.copyToChannel(float32Array, 0);
```

389-432: Assumption check: 16-bit little-endian PCM.

pcm16ToFloat32 assumes 16-bit LE PCM with 1 channel. If the server outputs a different bit depth or channel count, decoding will be wrong. Consider plumbing bit depth/channels from the API or response metadata, or document the assumption inline.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- app/client/platforms/alibaba.ts (8 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/client/platforms/alibaba.ts (6)
- app/store/plugin.ts (1): FunctionToolItem (25-32)
- app/client/api.ts (1): SpeechOptions (67-74)
- app/constant.ts (1): Alibaba (227-236)
- app/utils.ts (2): getTimeoutMSByModel (308-320), fetch (368-376)
- app/utils/stream.ts (1): fetch (22-108)
- app/utils/chat.ts (1): streamWithThink (392-667)
🔇 Additional comments (2)
app/client/platforms/alibaba.ts (2)
55-57: Confirm the server contract for the new parameters (tools, enable_search).

Ensure the Alibaba endpoint expects tools under parameters.tools and the network flag as enable_search (snake_case). If the API expects different nesting or naming, requests may be ignored or rejected.

Would you like a quick repo scan to confirm other platforms use the same shape, or to gate this behind a capability flag?

65-65: Good: reusing a single AudioContext instance.

This addresses browser limits and the CPU overhead of multiple contexts. LGTM.
Actionable comments posted: 0
♻️ Duplicate comments (1)
package.json (1)
96-100: Yarn versions are now consistent across packageManager and Volta.

Good fix: both are set to 1.22.19, resolving the earlier mismatch noted in previous reviews.
🧹 Nitpick comments (2)
package.json (2)
96-100: Add engines to communicate the required Node/Yarn versions to non-Volta environments.

Volta enforces versions locally and in CI if used, but adding engines helps external contributors and hosting platforms align tooling.

Apply this diff:

```diff
  "volta": {
    "node": "20.19.4",
    "yarn": "1.22.19"
- }
+ },
+ "engines": {
+   "node": ">=20.19.4",
+   "yarn": "^1.22.19 <2"
+ }
```
96-100: Yarn configuration mismatch: Yarn 1.x is pinned, but a Berry-only .yarnrc.yml is still present.

We've confirmed that:
- package.json pins Yarn Classic v1.22.19 (both packageManager and Volta)
- A .yarnrc.yml exists with the nodeLinker setting (Berry-only)

Yarn 1 ignores .yarnrc.yml, which can confuse contributors. Please choose one of these paths:

Option A (recommended): Migrate to Yarn Berry (3.x+)
- Update package.json to pin a Berry version, e.g.:

```diff
  "packageManager": "[email protected]",
  "volta": {
    "node": "20.19.4",
-   "yarn": "1.22.19"
+   "yarn": "3.6.0"
  }
```

- Keep your existing `.yarnrc.yml` (Berry will use it)

Option B: Stay on Yarn Classic (1.x)
- Remove or rename .yarnrc.yml; use a classic .yarnrc instead
- Drop Berry-only keys (nodeLinker, yarnPath, etc.)

Let me know which direction you'd like, and I can draft the necessary changes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- `package.json` (1 hunks)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/client/platforms/alibaba.ts (1)
274-339
: Wrap JSON.parse in try/catch inside the parseSSE callback. Malformed or partial SSE chunks can throw and break the stream. Return an empty slice on parse error.
```diff
-const json = JSON.parse(text);
+let json: any;
+try {
+  json = JSON.parse(text);
+} catch (e) {
+  console.warn("[Alibaba SSE] JSON parse error:", e);
+  return { isThinking: false, content: "" };
+}
```
♻️ Duplicate comments (2)
app/components/chat.tsx (1)
1335-1363
: Streaming TTS: use English comment and show user-friendly error toast
The Chinese comment persists and the toast leaks technical details to users. Keep console logs technical; show a concise user-facing message.

```diff
-      if (api.llm.streamSpeech) {
-        // 使用流式播放,边接收边播放
+      if (api.llm.streamSpeech) {
+        // Use streaming playback for real-time audio
         setSpeechStatus(true);
         ttsPlayer.startStreamPlay(() => {
           setSpeechStatus(false);
         });
@@
-      } catch (e) {
-        console.error("[Stream Speech]", e);
-        showToast(prettyObject(e));
+      } catch (e) {
+        console.error("[Stream Speech]", e);
+        showToast("Speech generation failed. Please try again.");
         setSpeechStatus(false);
         ttsPlayer.stop();
       } finally {
         setSpeechLoading(false);
       }
```

app/client/platforms/alibaba.ts (1)
96-99
: Unimplemented speech() should be documented or delegate to streamSpeech. speech() throws without context, which can surprise integrators when streamSpeech is unavailable. Either implement speech via a non-SSE endpoint (if supported) or document that Alibaba TTS is streaming-only and instruct callers to use streamSpeech.
```diff
 async speech(options: SpeechOptions): Promise<ArrayBuffer> {
-  throw new Error("Method not implemented.");
+  // Alibaba Qwen currently provides TTS via SSE streaming.
+  // Use streamSpeech(options) for incremental playback.
+  // Non-streaming speech synthesis is not supported at the moment.
+  throw new Error(
+    "[Alibaba TTS] Non-streaming speech is not supported; use streamSpeech instead.",
+  );
 }
```
🧹 Nitpick comments (7)
app/utils/audio.ts (3)
39-40
: Tighten the type of currentOnended. `(() => void | null) | null` is odd; either a function or null is sufficient. Use `(() => void) | null` for clarity.
```diff
-private currentOnended: (() => void | null) | null = null;
+private currentOnended: (() => void) | null = null;
```
115-120
: Consider suspending instead of closing the AudioContext to reduce churn. Closing the AudioContext on every stop forces a new context for subsequent plays and may hit browser limits. Suspending is often enough and cheaper; only close when tearing down the app.
If you want to keep close() as a hard stop, consider adding a separate destroy() and make stop() only stop playback and suspend the context.
122-132
: Stream wait loop: prefer await over a recursive setTimeout chain. playNext uses setTimeout + recursion every 100 ms while waiting for streamed chunks, which can grow the call stack and complicate testing. A small async delay keeps the stack flat.
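The awaited-delay shape can be shown in isolation with a minimal queue (illustrative types and names, not the PR's actual manager):

```typescript
// Simplified streaming queue: drain() awaits a short delay while the
// producer is still pushing chunks, instead of chaining setTimeout recursion.
class StreamQueue<T> {
  private queue: T[] = [];
  private streamFinished = false;
  readonly played: T[] = [];

  add(chunk: T) {
    this.queue.push(chunk);
  }

  finish() {
    this.streamFinished = true;
  }

  async drain(): Promise<void> {
    while (true) {
      const next = this.queue.shift();
      if (next !== undefined) {
        this.played.push(next); // stand-in for actually playing the buffer
        continue;
      }
      if (this.streamFinished) return; // queue empty and producer is done
      await new Promise((r) => setTimeout(r, 10)); // flat stack while waiting
    }
  }
}
```

Each wait is a single awaited timeout rather than a new playNext stack frame.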
```diff
-if (this.isStreamMode && !this.streamFinished) {
-  setTimeout(() => this.playNext(), 100);
-  return;
-}
+if (this.isStreamMode && !this.streamFinished) {
+  await new Promise((r) => setTimeout(r, 100));
+  return this.playNext();
+}
```

app/components/chat.tsx (3)
1376-1388
: Non-streaming playback: show a user-friendly error toast and avoid leaking internals. Mirror the streaming branch's behavior: keep console.error, present a clean toast.
```diff
 function playSpeech(audioBuffer: ArrayBuffer | AudioBuffer) {
   setSpeechStatus(true);
   ttsPlayer
     .play(audioBuffer, () => {
       setSpeechStatus(false);
     })
     .catch((e) => {
       console.error("[OpenAI Speech]", e);
-      showToast(prettyObject(e));
+      showToast("Speech playback failed. Please try again.");
       setSpeechStatus(false);
     })
     .finally(() => setSpeechLoading(false));
 }
```
1310-1311
: Remove the unused speechCooldown state. The speechCooldown state is declared but never used. Drop it to reduce noise.
```diff
-const [speechCooldown, setSpeechCooldown] = useState(false);
```
848-856
: Naming consistency: NetWork → Network. enableNetWork and canUseNetWork are inconsistent with the conventional spelling (“Network”). Consider renaming in a future cleanup to reduce confusion. Not blocking this PR.
Also applies to: 710-713, 515-531
app/client/platforms/alibaba.ts (1)
261-304
: Tool-calls accumulation: guard array access and undefined args. When accumulating streamed tool arguments, runTools[index] might be undefined and args can be falsy. Guard both to avoid runtime errors.
```diff
 if (tool_calls?.length > 0) {
   const index = tool_calls[0]?.index;
   const id = tool_calls[0]?.id;
   const args = tool_calls[0]?.function?.arguments;
   if (id) {
     runTools.push({
       id,
       type: tool_calls[0]?.type,
       function: {
         name: tool_calls[0]?.function?.name as string,
         arguments: args,
       },
     });
   } else {
-    // @ts-ignore
-    runTools[index]["function"]["arguments"] += args || "";
+    if (
+      typeof index === "number" &&
+      runTools[index] &&
+      runTools[index]["function"]
+    ) {
+      // @ts-ignore
+      runTools[index]["function"]["arguments"] += args || "";
+    }
   }
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- `app/client/api.ts` (2 hunks)
- `app/client/platforms/alibaba.ts` (6 hunks)
- `app/components/chat.tsx` (7 hunks)
- `app/utils/audio.ts` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/client/api.ts
🧰 Additional context used
🧬 Code graph analysis (2)
app/client/platforms/alibaba.ts (5)
- app/store/plugin.ts (1): `FunctionToolItem` (25-32)
- app/client/api.ts (1): `SpeechOptions` (68-75)
- app/utils/audio.ts (1): `TTSPlayManager` (34-286)
- app/constant.ts (1): `Alibaba` (227-236)
- app/utils.ts (2): `getTimeoutMSByModel` (308-320), `fetch` (368-376)
app/components/chat.tsx (6)
- app/utils.ts (1): `canUseNetWork` (299-306)
- app/client/api.ts (2): `ClientApi` (141-234), `config` (190-190)
- app/store/access.ts (1): `edgeVoiceName` (169-173)
- app/utils/ms_edge_tts.ts (1): `MsEdgeTTS` (119-391)
- app/components/ui-lib.tsx (1): `showToast` (232-256)
- app/utils/format.ts (1): `prettyObject` (1-13)
🔇 Additional comments (2)
app/components/chat.tsx (1)
848-856
: Network toggle locale keys verified
All `app/locales/*.ts` files define `Chat.InputActions.NetWork.on` and `…off`, so `Locale.Chat.InputActions.NetWork[enableNetWork ? "on" : "off"]` cannot be undefined at runtime. No changes required.

app/client/platforms/alibaba.ts (1)
218-231
: Enable network search field integration looks good. Passing enable_search from modelConfig.enableNetWork into parameters matches the UI toggle and keeps the request payload minimal when false. LGTM.
```ts
  async *streamSpeech(
    options: SpeechOptions,
    audioManager?: TTSPlayManager,
  ): AsyncGenerator<AudioBuffer> {
    if (!options.input || !options.model) {
      throw new Error("Missing required parameters: input and model");
    }
    const requestPayload = {
      model: options.model,
      input: {
        text: options.input,
        voice: options.voice,
      },
      speed: options.speed,
      response_format: options.response_format,
    };
    const controller = new AbortController();
    options.onController?.(controller);

    if (audioManager) {
      audioManager.setStreamController(controller);
    }
    try {
```
audioManager is optional but used with non-null assertion; guard or make it required
streamSpeech() references audioManager! which will throw if undefined. Either require the param or create a local TTSPlayManager fallback so decoding still works.
- async *streamSpeech(
- options: SpeechOptions,
- audioManager?: TTSPlayManager,
- ): AsyncGenerator<AudioBuffer> {
+ async *streamSpeech(
+ options: SpeechOptions,
+ audioManager?: TTSPlayManager,
+ ): AsyncGenerator<AudioBuffer> {
@@
- if (audioManager) {
- audioManager.setStreamController(controller);
- }
+ const player = audioManager ?? new TTSPlayManager();
+ player.setStreamController(controller);
@@
- if (json.output?.audio?.data) {
- yield await audioManager!.pcmBase64ToAudioBuffer(
- json.output.audio.data,
- { channels: 1, sampleRate: 24000, bitDepth: 16 },
- );
- }
+ if (json.output?.audio?.data) {
+ const sr = json.output?.audio?.sample_rate ?? 24000;
+ yield await player.pcmBase64ToAudioBuffer(
+ json.output.audio.data,
+ { channels: 1, sampleRate: sr, bitDepth: 16 },
+ );
+ }
@@
- if (audioManager) {
- audioManager.clearStreamController();
- }
+ player.clearStreamController();
Also applies to: 161-165, 185-189
🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 100 to 122 (and similarly at
161-165 and 185-189), streamSpeech currently uses audioManager with a non-null
assertion which will throw if the optional parameter is undefined; update the
function to either (a) require audioManager by making it a mandatory parameter,
or (b) guard every usage by checking if audioManager exists before calling
methods (e.g., if (audioManager) audioManager.setStreamController(controller)),
and provide a local no-op or fallback TTSPlayManager instance for
decoding/playback when audioManager is not provided so the function safely
handles undefined without runtime errors.
```ts
    const speechPayload = {
      method: "POST",
      body: JSON.stringify(requestPayload),
      signal: controller.signal,
      headers: {
        ...getHeaders(),
        "X-DashScope-SSE": "enable",
      },
    };
```
🛠️ Refactor suggestion
Harden SSE request: set headers, check res.ok/body, and clear timeout on all paths
Missing Accept/Content-Type, no res.ok check, and no guard for res.body. Also ensure timeout is cleared in finally.
const speechPayload = {
method: "POST",
body: JSON.stringify(requestPayload),
signal: controller.signal,
headers: {
...getHeaders(),
"X-DashScope-SSE": "enable",
+ Accept: "text/event-stream",
+ "Content-Type": "application/json",
},
};
@@
- const res = await fetch(speechPath, speechPayload);
- clearTimeout(requestTimeoutId); // Clear timeout on successful connection
+ const res = await fetch(speechPath, speechPayload);
+ if (!res.ok) {
+ const errText = await res.text().catch(() => "");
+ throw new Error(
+ `[Alibaba TTS] HTTP ${res.status} ${res.statusText} ${errText}`,
+ );
+ }
+ if (!res.body) {
+ throw new Error("[Alibaba TTS] Missing response body for SSE stream.");
+ }
And move timeout cleanup into finally (see next comment).
Also applies to: 140-146
🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 124-132 (and similarly for lines
140-146), the SSE request is missing explicit Accept and Content-Type headers,
does not check res.ok or guard against a null res.body, and does not clear the
timeout on all code paths; update the speechPayload.headers to include "Accept":
"text/event-stream" and "Content-Type": "application/json" (or appropriate
content type), perform the fetch then check if (!res.ok) throw or handle the
error before proceeding, ensure you verify res.body exists before using it
(guard null), and move timeout cleanup into a finally block so the controller
timeout is cleared regardless of success or error.
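The cleanup the prompt asks for can be factored into a small helper; this is an illustrative sketch (the helper name and shape are not from the PR):

```typescript
// Abort a long-running operation after `ms`, and always clear the timer in
// `finally` so it cannot fire after completion, error, or abort.
async function withAbortTimeout<T>(
  ms: number,
  run: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer); // cleared on every path
  }
}
```

The SSE fetch could then run as, e.g., `withAbortTimeout(getTimeoutMSByModel(model), (signal) => fetch(url, { ...payload, signal }))`.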
```ts
      const requestTimeoutId = setTimeout(
        () => controller.abort(),
        getTimeoutMSByModel(options.model),
      );

      const res = await fetch(speechPath, speechPayload);
      clearTimeout(requestTimeoutId); // Clear timeout on successful connection

      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let buffer = "";
      while (true) {
        const { done, value } = await reader.read();
        if (done) {
          break;
        }
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() || "";

        for (const line of lines) {
          const data = line.slice(5);
          try {
            if (line.startsWith("data:")) {
              const json = JSON.parse(data);
              if (json.output?.audio?.data) {
                yield await audioManager!.pcmBase64ToAudioBuffer(
                  json.output.audio.data,
                  { channels: 1, sampleRate: 24000, bitDepth: 16 },
                );
              }
            }
          } catch (parseError) {
            console.warn(
              "[StreamSpeech] Failed to parse SSE data:",
              parseError,
            );
            continue;
          }
        }
      }
      reader.releaseLock();
```
🛠️ Refactor suggestion
Always release reader and clear timeout (finally), and improve SSE parsing
Ensure requestTimeoutId and reader are cleaned up on all paths. Also, only slice when the line starts with data:, and handle the [DONE] sentinel. Keep parsing in try/catch.
- const requestTimeoutId = setTimeout(
+ let requestTimeoutId: any = setTimeout(
() => controller.abort(),
getTimeoutMSByModel(options.model),
);
-
- const res = await fetch(speechPath, speechPayload);
- const reader = res.body!.getReader();
+ const res = await fetch(speechPath, speechPayload);
+ let reader: ReadableStreamDefaultReader<Uint8Array> | undefined;
+ reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
const { done, value } = await reader.read();
if (done) {
break;
}
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() || "";
- for (const line of lines) {
- const data = line.slice(5);
+ for (const rawLine of lines) {
+ const line = rawLine.trim();
try {
- if (line.startsWith("data:")) {
- const json = JSON.parse(data);
+ if (!line.startsWith("data:")) continue;
+ const data = line.slice(5).trim();
+ if (data === "[DONE]") {
+ // end-of-stream marker
+ break;
+ }
+ const json = JSON.parse(data);
if (json.output?.audio?.data) {
- yield await player.pcmBase64ToAudioBuffer(
- json.output.audio.data,
- { channels: 1, sampleRate: 24000, bitDepth: 16 },
- );
+ const sr = json.output?.audio?.sample_rate ?? 24000;
+ yield await player.pcmBase64ToAudioBuffer(
+ json.output.audio.data,
+ { channels: 1, sampleRate: sr, bitDepth: 16 },
+ );
}
} catch (parseError) {
console.warn(
"[StreamSpeech] Failed to parse SSE data:",
parseError,
);
continue;
}
}
}
- reader.releaseLock();
+ reader?.releaseLock();
@@
- } finally {
- if (audioManager) {
- audioManager.clearStreamController();
- }
+ } finally {
+ try {
+ clearTimeout(requestTimeoutId);
+ } catch {}
+ try {
+ // releasing is idempotent; safe to attempt
+ // @ts-ignore - reader may be undefined in some error paths
+ reader?.releaseLock?.();
+ } catch {}
+ try {
+ (audioManager ?? player)?.clearStreamController();
+ } catch {}
}
Also applies to: 185-189
🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 135-176 (and similarly 185-189),
the fetch stream handling doesn't guarantee cleanup and incorrectly slices every
line; wrap the read loop and downstream parsing in a try/finally so you always
clearTimeout(requestTimeoutId) and releaseLock() on the reader regardless of
errors or early returns, check line.startsWith("data:") before slicing and
handle the "[DONE]" sentinel (break or return when encountered), and keep
JSON.parse inside a try/catch around each data segment so parsing errors for one
event don't break the stream.
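The framing rules from the prompt (only `data:` lines carry JSON, a `[DONE]` sentinel ends the stream, per-chunk try/catch) can be sketched as a pure function. Whether DashScope actually emits `[DONE]` is an assumption here; many SSE APIs do:

```typescript
// Minimal SSE data-line extractor for this framing: only lines starting
// with "data:" carry JSON; "[DONE]" (assumed sentinel) ends the stream;
// malformed chunks are skipped instead of breaking the whole stream.
function extractSSEPayloads(lines: string[]): string[] {
  const payloads: string[] = [];
  for (const rawLine of lines) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // ignore comments/blank lines
    const data = line.slice(5).trim();
    if (data === "[DONE]") break; // end-of-stream sentinel
    try {
      JSON.parse(data); // validate; skip malformed chunks
      payloads.push(data);
    } catch {
      continue;
    }
  }
  return payloads;
}
```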
```ts
  async function openaiSpeech(text: string) {
    if (speechStatus) {
      ttsPlayer.stop();
      setSpeechStatus(false);
    } else {
```
🛠️ Refactor suggestion
Await ttsPlayer.stop() to match the async API and ensure cleanup before toggling state
After aligning TTSPlayer.stop to return Promise, await it here to avoid races (e.g., stop finishes after UI flips status).
- if (speechStatus) {
- ttsPlayer.stop();
+ if (speechStatus) {
+ await ttsPlayer.stop();
setSpeechStatus(false);
} else {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
  async function openaiSpeech(text: string) {
    if (speechStatus) {
      await ttsPlayer.stop();
      setSpeechStatus(false);
    } else {
```
🤖 Prompt for AI Agents
In app/components/chat.tsx around lines 1312 to 1316, the code calls
ttsPlayer.stop() without awaiting it which can race with state toggling; update
the call to await ttsPlayer.stop() (inside an async function) before calling
setSpeechStatus(false) so the stop operation completes and any cleanup finishes
before the UI state is flipped.
```ts
  startStreamPlay: (onended: () => void | null) => void;
  finishStreamPlay: () => void;
  setStreamController: (controller: AbortController) => void;
  clearStreamController: () => void;
  stop: () => void;
```
Interface vs implementation mismatch: stop() is async but typed as sync
TTSPlayManager.stop is async (returns Promise) but the TTSPlayer type declares stop: () => void. This will break “implements TTSPlayer” type checking and can leak unhandled promises at call sites.
Apply this diff to align the public API:
type TTSPlayer = {
init: () => void;
@@
- stop: () => void;
+ stop: () => Promise<void>;
Also applies to: 86-121
🤖 Prompt for AI Agents
In app/utils/audio.ts around lines 16 and 86-121, the TTSPlayer interface
declares stop: () => void but the concrete TTSPlayManager.stop implementation is
async (returns Promise<void>), causing an interface/implementation mismatch and
potential unhandled promises; change the TTSPlayer.stop signature to stop: () =>
Promise<void>, update any related type aliases/exports, ensure
TTSPlayManager.stop remains async or explicitly returns a Promise<void>, and
update all call sites to await or handle the returned promise so the public API
and implementations align.
```ts
// Audio processing utilities
export interface PCMConfig {
  channels?: number;
  sampleRate?: number;
  bitDepth?: 16 | 24 | 32;
}
```
🛠️ Refactor suggestion
Declare only supported bit depths (16-bit) or implement 24/32
PCMConfig advertises 16 | 24 | 32, but pcmDataToAudioBuffer only implements 16-bit and throws otherwise. Either narrow the type to 16 or finish the 24/32-bit implementations to avoid misleading API.
Minimal change (narrow the API):
export interface PCMConfig {
channels?: number;
sampleRate?: number;
- bitDepth?: 16 | 24 | 32;
+ bitDepth?: 16;
}
Optionally, leave the runtime guard in place for defense-in-depth:
- switch (bitDepth) {
- case 16:
- float32Array = this.pcm16ToFloat32(pcmData);
- break;
- default:
- throw new Error(`Unsupported bit depth: ${bitDepth}`);
- }
+ if (bitDepth !== 16) {
+ throw new Error(`Unsupported bit depth: ${bitDepth}`);
+ }
+ float32Array = this.pcm16ToFloat32(pcmData);
Also applies to: 242-249
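For reference, the implemented 16-bit path is small enough to show standalone. This sketch assumes little-endian signed 16-bit PCM, matching the `{ channels: 1, sampleRate: 24000, bitDepth: 16 }` parameters used elsewhere in this PR; `atob` stands in for the browser decode step:

```typescript
// Decode base64-encoded 16-bit little-endian PCM into Float32 samples
// normalized to [-1, 1). Only the 16-bit depth is handled, matching the
// narrowed PCMConfig suggested above.
function pcm16Base64ToFloat32(base64: string): Float32Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  const view = new DataView(bytes.buffer);
  const out = new Float32Array(bytes.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768; // little-endian, normalize
  }
  return out;
}
```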
```diff
   async play(
     audioBuffer: ArrayBuffer | AudioBuffer,
     onended: () => void | null,
   ) {
     if (this.audioBufferSourceNode) {
       this.audioBufferSourceNode.stop();
       this.audioBufferSourceNode.disconnect();
     }
     let buffer: AudioBuffer;
     if (audioBuffer instanceof AudioBuffer) {
       buffer = audioBuffer;
     } else {
       buffer = await TTSPlayManager.audioContext!.decodeAudioData(audioBuffer);
     }
     this.audioBufferSourceNode =
       TTSPlayManager.audioContext!.createBufferSource();
     this.audioBufferSourceNode.buffer = buffer;
     this.audioBufferSourceNode.connect(
       TTSPlayManager.audioContext!.destination,
     );
     this.getAudioContext.resume().then(() => {
       this.audioBufferSourceNode!.start();
     });
-    audioBufferSourceNode.onended = onended;
-  };
+    this.audioBufferSourceNode.onended = onended;
+  }
```
🛠️ Refactor suggestion
Use the initialized AudioContext via the getter to avoid null deref
play() uses TTSPlayManager.audioContext directly for decodeAudioData/createBufferSource. If init() wasn’t called or the context was closed by stop(), this can be null. Always go through the getter which lazily initializes the context.
async play(
audioBuffer: ArrayBuffer | AudioBuffer,
onended: () => void | null,
) {
@@
- let buffer: AudioBuffer;
- if (audioBuffer instanceof AudioBuffer) {
- buffer = audioBuffer;
- } else {
- buffer = await TTSPlayManager.audioContext!.decodeAudioData(audioBuffer);
- }
- this.audioBufferSourceNode =
- TTSPlayManager.audioContext!.createBufferSource();
+ const ctx = this.getAudioContext;
+ let buffer: AudioBuffer;
+ if (audioBuffer instanceof AudioBuffer) {
+ buffer = audioBuffer;
+ } else {
+ buffer = await ctx.decodeAudioData(audioBuffer);
+ }
+ this.audioBufferSourceNode = ctx.createBufferSource();
this.audioBufferSourceNode.buffer = buffer;
- this.audioBufferSourceNode.connect(
- TTSPlayManager.audioContext!.destination,
- );
- this.getAudioContext.resume().then(() => {
+ this.audioBufferSourceNode.connect(ctx.destination);
+ ctx.resume().then(() => {
this.audioBufferSourceNode!.start();
});
this.audioBufferSourceNode.onended = onended;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
  async play(
    audioBuffer: ArrayBuffer | AudioBuffer,
    onended: () => void | null,
  ) {
    if (this.audioBufferSourceNode) {
      this.audioBufferSourceNode.stop();
      this.audioBufferSourceNode.disconnect();
    }
    const ctx = this.getAudioContext;
    let buffer: AudioBuffer;
    if (audioBuffer instanceof AudioBuffer) {
      buffer = audioBuffer;
    } else {
      buffer = await ctx.decodeAudioData(audioBuffer);
    }
    this.audioBufferSourceNode = ctx.createBufferSource();
    this.audioBufferSourceNode.buffer = buffer;
    this.audioBufferSourceNode.connect(ctx.destination);
    ctx.resume().then(() => {
      this.audioBufferSourceNode!.start();
    });
    this.audioBufferSourceNode.onended = onended;
  }
```
🤖 Prompt for AI Agents
In app/utils/audio.ts around lines 60–84, play() currently accesses
TTSPlayManager.audioContext directly which can be null; instead obtain the
initialized context via the existing getter (await this.getAudioContext()),
assign it to a local const (e.g., const ctx = await this.getAudioContext()),
then use ctx.decodeAudioData(...) and ctx.createBufferSource() and connect to
ctx.destination; use ctx.resume() for resuming before starting the source and
replace direct null assertions on TTSPlayManager.audioContext with the local ctx
to avoid null deref.
```ts
    // Create the AudioBuffer
    const audioBuffer = audioContext.createBuffer(
      channels,
      float32Array.length / channels,
      sampleRate,
    );

    // Copy data into the AudioBuffer
    for (let channel = 0; channel < channels; channel++) {
      const channelData = audioBuffer.getChannelData(channel);
      for (let i = 0; i < channelData.length; i++) {
        channelData[i] = float32Array[i * channels + channel];
      }
    }
```
🛠️ Refactor suggestion
Channel/sample alignment: guard against misaligned PCM and avoid fractional frame lengths
If pcmData length is not divisible by (bytesPerSample * channels), createBuffer will still accept length but the interleaving copy will read past the end or skip samples. Compute frameCount explicitly and validate alignment.
- const audioBuffer = audioContext.createBuffer(
- channels,
- float32Array.length / channels,
- sampleRate,
- );
+ if (float32Array.length % channels !== 0) {
+ console.warn(
+ "[TTSPlayManager] PCM length not divisible by channel count; truncating last incomplete frame",
+ );
+ }
+ const frameCount = Math.floor(float32Array.length / channels);
+ const audioBuffer = audioContext.createBuffer(
+ channels,
+ frameCount,
+ sampleRate,
+ );
@@
- for (let channel = 0; channel < channels; channel++) {
- const channelData = audioBuffer.getChannelData(channel);
- for (let i = 0; i < channelData.length; i++) {
- channelData[i] = float32Array[i * channels + channel];
- }
- }
+ for (let channel = 0; channel < channels; channel++) {
+ const channelData = audioBuffer.getChannelData(channel);
+ for (let i = 0; i < frameCount; i++) {
+ channelData[i] = float32Array[i * channels + channel] ?? 0;
+ }
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
    // Create the AudioBuffer
    if (float32Array.length % channels !== 0) {
      console.warn(
        "[TTSPlayManager] PCM length not divisible by channel count; truncating last incomplete frame",
      );
    }
    const frameCount = Math.floor(float32Array.length / channels);
    const audioBuffer = audioContext.createBuffer(
      channels,
      frameCount,
      sampleRate,
    );

    // Copy data into the AudioBuffer
    for (let channel = 0; channel < channels; channel++) {
      const channelData = audioBuffer.getChannelData(channel);
      for (let i = 0; i < frameCount; i++) {
        channelData[i] = float32Array[i * channels + channel] ?? 0;
      }
    }
```
🤖 Prompt for AI Agents
In app/utils/audio.ts around lines 251 to 264, the code creates an AudioBuffer
using float32Array.length / channels without validating alignment, which may
read past the end or skip samples if pcm length isn't divisible by channels;
compute an explicit frameCount = Math.floor(float32Array.length / channels),
validate that float32Array.length === frameCount * channels (or else throw or
log and handle remaining bytes), use frameCount when creating the AudioBuffer
and in the channel copy loops, and ensure the inner copy bounds use frameCount
to avoid reading beyond the float32Array (or pad/truncate as appropriate).
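The whole-frame guard described in the prompt can be shown as a standalone de-interleave helper (illustrative; the real code writes into an AudioBuffer's channel data rather than fresh arrays):

```typescript
// Alignment-safe de-interleaving: compute a whole-frame count first, so a
// trailing partial frame is truncated instead of being read past the end.
function deinterleave(samples: Float32Array, channels: number): Float32Array[] {
  const frameCount = Math.floor(samples.length / channels);
  const out: Float32Array[] = [];
  for (let ch = 0; ch < channels; ch++) {
    const channelData = new Float32Array(frameCount);
    for (let i = 0; i < frameCount; i++) {
      channelData[i] = samples[i * channels + ch];
    }
    out.push(channelData);
  }
  return out;
}
```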
💻 变更类型 | Change Type
🔀 变更说明 | Description of Change
📝 补充信息 | Additional Information
20250731110339_rec_.mp4
Summary by CodeRabbit
New Features
Improvements
Chores