
Conversation

@Little-LittleProgrammer (Contributor) commented Jul 31, 2025

💻 Change Type

  • feat
  • fix
  • refactor
  • perf
  • style
  • test
  • docs
  • ci
  • chore
  • build

🔀 Description of Change

  • Add Alibaba Cloud Qwen speech models (qwen-tts, qwen-tts-latest)
  • Support streaming speech playback: audio plays while it is still being received
  • Refactor the audio processing logic and unify the PCM-to-AudioBuffer implementation
  • Update the TTS configuration options to support dynamic switching between multiple engines
  • Support function calling for Alibaba models
  • Add web search, with the related configuration and multi-language support

📝 Additional Information

20250731110339_rec_.mp4

Summary by CodeRabbit

  • New Features

    • Web Search toggle in chat for supported models.
    • Streaming TTS playback with real-time audio streaming and queued playback.
  • Improvements

    • Dynamic multi-engine TTS settings (OpenAI, Alibaba, Edge) with selectable models/voices/providers.
    • Robust, class-based TTS playback manager with streaming controls and PCM support.
    • Many new localized strings for the Web Search toggle.
  • Chores

    • Yarn/node versions pinned via Volta; Yarn set to use node_modules.

coderabbitai bot (Contributor) commented Jul 31, 2025

Walkthrough

Introduces Yarn/Volta tooling, widens LLM API speech types and adds optional streaming API, implements Alibaba (Qwen) SSE PCM audio streaming, refactors TTS config and player into a streaming-capable manager, adds a network toggle and config, and adds locale strings for the toggle.

Changes

  • Tooling (./.yarnrc.yml, package.json): Add .yarnrc.yml with nodeLinker: node-modules; add a Volta pin in package.json (node 20.19.4, yarn 1.22.19).
  • API surface (app/client/api.ts): Widen the LLMApi.speech return type to Promise<ArrayBuffer | AudioBuffer> and add optional streamSpeech?(options, audioManager?): AsyncGenerator<AudioBuffer>.
  • Alibaba platform & constants (app/client/platforms/alibaba.ts, app/constant.ts): Add QwenApi.streamSpeech SSE-based audio streaming (parses SSE JSON, converts PCM base64 → AudioBuffer), make speech async, extend request params with tools? and enable_search?, and add Alibaba.SpeechPath.
  • Chat UI & TTS behavior (app/components/chat.tsx, app/components/tts-config.tsx, app/store/config.ts, app/utils.ts): Add a NetWork toggle UI and enableNetWork config; dynamic TTS provider selection; streaming TTS via llm.streamSpeech and new TTS provider flows; refactor TTS configs into typed TTS_CONFIGS; add modelProvider to ttsConfig; add a canUseNetWork util and update plugin visibility.
  • TTS player / audio utils (app/utils/audio.ts): Replace the function-based player with a TTSPlayManager class implementing TTSPlayer; add queueing, streaming controls (start/finish stream, stream controller), and PCM conversion helpers (pcmBase64ToAudioBuffer, pcmDataToAudioBuffer); broaden play to accept ArrayBuffer | AudioBuffer.
  • Locales (app/locales/*.ts): Add Chat.InputActions.NetWork translations (on/off) across locale files (ar, bn, cn, cs, da, de, en, es, fr, id, it, jp, ko, no, pt, ru, sk, tr, tw, vi).
  • Small additions (./.yarnrc.yml): New Yarn config file to force the node_modules linker.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant UI as Chat UI
  participant Client as ClientApi.llm
  participant Qwen as Alibaba SSE
  participant Player as TTSPlayManager

  UI->>Client: request streamSpeech(options, audioManager)
  Client->>Qwen: POST /Alibaba.SpeechPath (SSE, params include tools/enable_search)
  UI->>Player: startStreamPlay(onended)
  loop SSE chunks
    Qwen-->>Client: SSE chunk (data: JSON with output.audio.data = PCM base64)
    Client->>Client: decode base64 → AudioBuffer (pcmBase64ToAudioBuffer)
    Client-->>UI: yield AudioBuffer
    UI->>Player: addToQueue(AudioBuffer)
  end
  Client-->>UI: stream complete
  UI->>Player: finishStreamPlay()
  Player-->>UI: onended()
```
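To make the flow concrete, here is a hedged consumer-side sketch of the loop in the diagram, written against the names from the walkthrough (streamSpeech, startStreamPlay, addToQueue, finishStreamPlay). The import paths and exact signatures are assumptions; the real ones live in app/client/api.ts and app/utils/audio.ts.

```ts
import { ClientApi, SpeechOptions } from "../client/api";
import { TTSPlayManager } from "../utils/audio";

// Drive a streaming TTS session: start queued playback, feed decoded
// chunks into the player as they arrive, then signal end-of-stream.
async function speakStreaming(
  api: ClientApi,
  player: TTSPlayManager,
  options: SpeechOptions,
) {
  // The callback fires once the queue drains after finishStreamPlay().
  player.startStreamPlay(() => console.log("[TTS] playback finished"));
  try {
    // streamSpeech yields an already-decoded AudioBuffer per SSE chunk.
    for await (const chunk of api.llm.streamSpeech!(options, player)) {
      player.addToQueue(chunk);
    }
  } finally {
    // Tell the player no more chunks will arrive so onended can fire.
    player.finishStreamPlay();
  }
}
```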

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested labels

enhancement

Suggested reviewers

  • lloydzhou
  • Dogtiti
  • ElricLiu

Poem

I twitch my ears at streams that flow,
Small PCM crumbs in audio snow.
I queue each nibble, hop in time,
While toggled nets go fetch and find.
New engines hum — a rabbit’s rhyme. 🐇🎧




coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

🔭 Outside diff range comments (2)
app/components/chat.tsx (2)

287-287: Remove unused state variable speechCooldown.

This state variable and its setter are declared but never used throughout the component. Consider removing it to reduce code clutter.

-  const [speechCooldown, setSpeechCooldown] = useState(false);

302-311: Add error handling for Edge TTS branch.

The Edge TTS implementation lacks try-catch error handling that exists in other branches. This could lead to unhandled promise rejections.

   if (config.ttsConfig.engine === "Edge") {
     const edgeVoiceName = accessStore.edgeVoiceName();
     const tts = new MsEdgeTTS();
-    await tts.setMetadata(
-      edgeVoiceName,
-      OUTPUT_FORMAT.AUDIO_24KHZ_96KBITRATE_MONO_MP3,
-    );
-    audioBuffer = await tts.toArrayBuffer(textContent);
-    playSpeech(audioBuffer);
+    try {
+      await tts.setMetadata(
+        edgeVoiceName,
+        OUTPUT_FORMAT.AUDIO_24KHZ_96KBITRATE_MONO_MP3,
+      );
+      audioBuffer = await tts.toArrayBuffer(textContent);
+      playSpeech(audioBuffer);
+    } catch (e) {
+      console.error("[Edge Speech]", e);
+      showToast(prettyObject(e));
+      setSpeechStatus(false);
+      setSpeechLoading(false);
+    }
🧹 Nitpick comments (3)
app/utils/audio.ts (2)

57-58: Translate Chinese comment to English for consistency.

Please translate the comment to maintain code consistency across the codebase.

-      // 在流模式下,如果队列为空但流还没结束,等待
+      // In stream mode, if queue is empty but stream hasn't finished, wait

109-114: Consider logging when rejecting audio buffer additions to finished streams.

When streamFinished is true, the method silently ignores the addition. This could make debugging difficult.

 const addToQueue = (audioBuffer: ArrayBuffer | AudioBuffer) => {
   if (streamFinished) {
+    console.debug("[TTSPlayer] Cannot add to queue - stream already finished");
     return;
   }
   playQueue.push(audioBuffer);
 };
app/constant.ts (1)

483-488: Document the empty arrays for Edge TTS configuration.

The Edge TTS configuration has empty Model and Voices arrays. If this is intentional for dynamic loading, consider adding a comment to clarify this design decision.

 export const EDGE_TTS = {
     Provider: "Edge" as const,
     ModelProvider: ModelProvider.GPT,
+    // Models and voices are dynamically loaded for Edge TTS
     Model: [] as string[],
     Voices: [] as string[],
 } as const;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 557a2cc and 221229c.

📒 Files selected for processing (9)
  • .yarnrc.yml (1 hunks)
  • app/client/api.ts (1 hunks)
  • app/client/platforms/alibaba.ts (3 hunks)
  • app/components/chat.tsx (1 hunks)
  • app/components/tts-config.tsx (4 hunks)
  • app/constant.ts (3 hunks)
  • app/store/config.ts (3 hunks)
  • app/utils/audio.ts (2 hunks)
  • package.json (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
app/store/config.ts (1)

Learnt from: Noctiro
PR: #6139
File: app/store/access.ts:251-258
Timestamp: 2025-02-04T03:46:50.895Z
Learning: The getModelProvider function in app/utils/model.ts uses safe string operations to parse model names with optional provider (format: modelName@providerId), making explicit error handling unnecessary as it always returns a valid [model, provider] tuple where provider can be undefined.

🧬 Code Graph Analysis (1)
app/store/config.ts (1)
app/constant.ts (2)
  • DEFAULT_TTS_ENGINE (465-465)
  • TTSEngineType (490-490)
🔇 Additional comments (14)
.yarnrc.yml (1)

1-1: LGTM! Valid Yarn configuration.

Setting nodeLinker to node-modules is a standard configuration that switches from Yarn's PnP to traditional node_modules structure, which can improve compatibility with certain dependencies.

app/client/api.ts (1)

110-111: Well-designed API extension for streaming TTS support.

The changes properly extend the LLMApi interface to support streaming speech synthesis:

  • The speech method maintains backward compatibility by supporting both ArrayBuffer and AudioBuffer return types
  • The optional streamSpeech method allows implementations to provide streaming audio without breaking existing code
  • The AsyncGenerator<AudioBuffer> type is appropriate for streaming audio data
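A minimal sketch of that widened surface follows; the SpeechOptions fields are taken from the code quoted later in this thread, and the non-speech members of LLMApi are elided.

```ts
// Assumed shape of SpeechOptions, based on the fields used in this PR.
type SpeechOptions = {
  model: string;
  input: string;
  voice?: string;
  speed?: number;
  response_format?: string;
  onController?: (controller: AbortController) => void;
};

declare class TTSPlayManager {} // the real class lives in app/utils/audio.ts

interface LLMApi {
  // Non-streaming synthesis; may return raw bytes or a decoded buffer.
  speech(options: SpeechOptions): Promise<ArrayBuffer | AudioBuffer>;
  // Optional streaming synthesis; yields decoded chunks for queued playback.
  streamSpeech?(
    options: SpeechOptions,
    audioManager?: TTSPlayManager,
  ): AsyncGenerator<AudioBuffer>;
  // ...chat/usage/models members elided
}
```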
app/store/config.ts (2)

90-91: LGTM! Well-structured TTS configuration extension.

The addition of modelProvider field with proper typing and the updated engine typing to TTSEngineType properly supports the expanded TTS engine functionality while maintaining type safety.


143-145: LGTM! Consistent validator pattern.

The new modelProvider validator follows the established pattern used by other validators in the codebase and provides proper type safety.

app/components/chat.tsx (3)

296-296: Good refactoring for dynamic TTS provider support.

The change from hardcoded provider to configuration-based provider selection improves flexibility and supports multiple TTS engines.


312-337: Well-implemented streaming TTS with proper error handling.

The streaming implementation correctly:

  • Manages playback state before starting the stream
  • Uses async generator pattern for chunk processing
  • Handles errors with appropriate user feedback
  • Ensures loading state is cleared in the finally block

1350-1362: Good extraction of common audio playback logic.

The playSpeech helper function effectively eliminates code duplication and provides consistent error handling across different TTS branches.

app/utils/audio.ts (2)

32-53: Good implementation of flexible audio buffer handling!

The play method now elegantly handles both ArrayBuffer and AudioBuffer types, maintaining backward compatibility while supporting pre-decoded audio buffers for better performance.
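For reference, a minimal sketch of a play routine that branches on the two input types, assuming a caller-supplied AudioContext (the actual method in app/utils/audio.ts may differ):

```ts
// Accept either raw encoded bytes or a pre-decoded AudioBuffer.
async function playAudio(
  ctx: AudioContext,
  audio: ArrayBuffer | AudioBuffer,
  onended?: () => void,
): Promise<void> {
  const buffer =
    audio instanceof AudioBuffer
      ? audio // already decoded (e.g. streamed PCM), play as-is
      : await ctx.decodeAudioData(audio); // decode MP3/WAV bytes first
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  if (onended) source.onended = onended;
  source.start();
}
```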


132-149: Excellent async handling in stop method!

The stop method properly awaits audioContext.close() and comprehensively resets all state variables, ensuring clean shutdown.

app/constant.ts (2)

235-235: LGTM!

The addition of SpeechPath to the Alibaba configuration is consistent with the existing API path structure.


465-507: Excellent refactoring of TTS configuration!

The transformation from simple constants to a structured, typed configuration system is well-designed:

  • Type safety with const assertions
  • Centralized configuration management via TTS_CONFIGS
  • Clear separation of concerns for different TTS providers
  • Extensible design that supports adding new providers easily
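As a hedged illustration of that shape (the Record<TTSEngineType, TTSConfigItem> typing is confirmed by the learnings quoted later in this thread; the OpenAI model/voice values and the Alibaba voice below are assumptions):

```ts
type TTSEngineType = "OpenAI" | "Alibaba" | "Edge";

interface TTSConfigItem {
  Provider: TTSEngineType;
  Model: readonly string[];
  Voices: readonly string[];
}

const TTS_CONFIGS: Record<TTSEngineType, TTSConfigItem> = {
  OpenAI: { Provider: "OpenAI", Model: ["tts-1"], Voices: ["alloy"] },
  Alibaba: {
    Provider: "Alibaba",
    Model: ["qwen-tts", "qwen-tts-latest"], // models added by this PR
    Voices: ["Cherry"], // assumed; see app/constant.ts for the real list
  },
  // Edge models/voices are loaded dynamically, hence the empty arrays.
  Edge: { Provider: "Edge", Model: [], Voices: [] },
};
```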
app/client/platforms/alibaba.ts (3)

101-161: Well-implemented streaming speech functionality!

The streamSpeech method is well-structured with:

  • Proper async generator pattern for streaming
  • Comprehensive error handling and timeout management
  • Clean SSE parsing logic
  • Appropriate cleanup with reader.releaseLock()

398-409: Efficient PCM conversion implementation.

The pcm16ToFloat32 method correctly handles:

  • Little-endian byte order
  • Signed 16-bit integer conversion
  • Proper normalization to [-1, 1] range
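A minimal sketch of such a conversion, under the same assumptions (16-bit signed little-endian samples, normalized into [-1, 1)):

```ts
function pcm16ToFloat32(pcmData: Uint8Array): Float32Array {
  const view = new DataView(
    pcmData.buffer,
    pcmData.byteOffset,
    pcmData.byteLength,
  );
  const out = new Float32Array(pcmData.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    // true = little-endian; divide by 32768 to map int16 into [-1, 1)
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```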

368-369: Audio parameters validated
The hardcoded values already match Alibaba Cloud’s Qwen-TTS output (mono, 24 kHz PCM16). You can mark them in code for clarity or expose them as config if you anticipate supporting other services.

Suggested update:

   private convertToAudioBuffer(pcmData: Uint8Array) {
     const audioContext = new (window.AudioContext ||
       window.webkitAudioContext)();
+    // Alibaba Qwen-TTS returns 16-bit PCM WAV: mono (1 channel) at 24 000 Hz
     const channels = 1;
     const sampleRate = 24000;

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
app/client/platforms/alibaba.ts (1)

98-100: The unimplemented speech method issue persists.

This is the same issue flagged in the previous review - the method still throws an unimplemented error without clear documentation about why Alibaba only supports streaming speech.

🧹 Nitpick comments (3)
app/client/platforms/alibaba.ts (3)

102-162: Well-structured streaming implementation with minor improvements needed.

The async generator approach for streaming TTS is well-implemented with proper SSE handling, error management, and cleanup. However, consider these improvements:

  1. Add null safety for audio data access (Line 150)
  2. Improve timeout handling - the timeout is cleared only on success
  3. Enhance buffer parsing robustness

Apply these improvements:

-            if (json.output.audio.data) {
+            if (json.output?.audio?.data) {
               yield this.PCMBase64ToAudioBuffer(json.output.audio.data);
             }
       const res = await fetch(speechPath, speechPayload);
+      clearTimeout(requestTimeoutId); // Clear timeout on successful connection
 
       const reader = res.body!.getReader();

345-363: Consider using modern APIs and consistent error messages.

The base64 decoding works correctly, but consider these improvements:

  1. Use Uint8Array.from() for cleaner conversion
  2. Consistent error messages in English
-      // 解码 base64
-      const binaryString = atob(base64Data);
-      const bytes = new Uint8Array(binaryString.length);
-      for (let i = 0; i < binaryString.length; i++) {
-        bytes[i] = binaryString.charCodeAt(i);
-      }
+      // Decode base64
+      const binaryString = atob(base64Data);
+      const bytes = Uint8Array.from(binaryString, c => c.charCodeAt(0));
-      console.error("播放 PCM 数据失败:", error);
+      console.error("Failed to convert PCM data:", error);

366-400: AudioContext reuse partially implemented, but could be optimized further.

The lazy initialization of audioContext improves upon creating a new instance each time. However, the previous review suggestion of using a static getter method would be more efficient and follow better patterns.

The previous review's suggestion of a static AudioContext getter would still be beneficial:

+ private static audioContext: AudioContext | null = null;
+ 
+ private static getAudioContext(): AudioContext {
+   if (!QwenApi.audioContext) {
+     QwenApi.audioContext = new (window.AudioContext || 
+       window.webkitAudioContext)();
+   }
+   return QwenApi.audioContext;
+ }

  private convertToAudioBuffer(pcmData: Uint8Array) {
-   if (!this.audioContext) {
-     this.audioContext = new (window.AudioContext ||
-       window.webkitAudioContext)();
-   }
-   const audioContext = this.audioContext;
+   const audioContext = QwenApi.getAudioContext();
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 221229c and fe484fd.

📒 Files selected for processing (1)
  • app/client/platforms/alibaba.ts (4 hunks)
🔇 Additional comments (2)
app/client/platforms/alibaba.ts (2)

67-67: Good addition of audioContext property for reuse.

Adding the private audioContext property is a step in the right direction for reusing the AudioContext instance, which addresses performance concerns from the previous review.


402-413: Correct PCM conversion implementation.

The 16-bit PCM to 32-bit float conversion is mathematically sound with proper signed integer handling and normalization. The bit manipulation and range conversion are implemented correctly.

@Little-LittleProgrammer Little-LittleProgrammer changed the title Feature/alibaba tts support Feature/ support (alibaba tts + alibaba function calling + network search) Aug 5, 2025
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
app/client/platforms/alibaba.ts (1)

101-103: Previous review comment still applies.

The speech method remains unimplemented as noted in the previous review. The suggestion to add documentation explaining why it's not implemented is still valid.

🧹 Nitpick comments (1)
app/client/platforms/alibaba.ts (1)

354-422: Make audio parameters configurable and improve error messages.

The audio processing implementation is technically sound, but consider these improvements:

  1. Hardcoded audio parameters: Consider making channels and sample rate configurable
  2. Internationalization: Replace Chinese error message with English
  3. Endianness assumption: Document that the PCM conversion assumes little-endian format
 // 播放 PCM base64 数据
 private async PCMBase64ToAudioBuffer(base64Data: string) {
   try {
     // 解码 base64
     const binaryString = atob(base64Data);
     const bytes = new Uint8Array(binaryString.length);
     for (let i = 0; i < binaryString.length; i++) {
       bytes[i] = binaryString.charCodeAt(i);
     }

     // 转换为 AudioBuffer
     const audioBuffer = await this.convertToAudioBuffer(bytes);

     return audioBuffer;
   } catch (error) {
-    console.error("播放 PCM 数据失败:", error);
+    console.error("Failed to convert PCM data to AudioBuffer:", error);
     throw error;
   }
 }

 // 将 PCM 字节数据转换为 AudioBuffer
-private convertToAudioBuffer(pcmData: Uint8Array) {
+private convertToAudioBuffer(pcmData: Uint8Array, channels = 1, sampleRate = 24000) {
   if (!this.audioContext) {
     this.audioContext = new (window.AudioContext ||
       window.webkitAudioContext)();
   }
   const audioContext = this.audioContext;
-  const channels = 1;
-  const sampleRate = 24000;
   return new Promise<AudioBuffer>((resolve, reject) => {

Consider adding JSDoc comments to document the endianness assumption:

-// 16位 PCM 转 32位浮点数
+/**
+ * Convert 16-bit PCM data to 32-bit float (assumes little-endian format)
+ */
 private pcm16ToFloat32(pcmData: Uint8Array) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e3f166 and 044298e.

⛔ Files ignored due to path filters (1)
  • app/icons/network.svg is excluded by !**/*.svg
📒 Files selected for processing (25)
  • app/client/platforms/alibaba.ts (8 hunks)
  • app/components/chat.tsx (6 hunks)
  • app/locales/ar.ts (1 hunks)
  • app/locales/bn.ts (1 hunks)
  • app/locales/cn.ts (1 hunks)
  • app/locales/cs.ts (1 hunks)
  • app/locales/da.ts (1 hunks)
  • app/locales/de.ts (2 hunks)
  • app/locales/en.ts (1 hunks)
  • app/locales/es.ts (1 hunks)
  • app/locales/fr.ts (1 hunks)
  • app/locales/id.ts (1 hunks)
  • app/locales/it.ts (1 hunks)
  • app/locales/jp.ts (1 hunks)
  • app/locales/ko.ts (1 hunks)
  • app/locales/no.ts (1 hunks)
  • app/locales/pt.ts (1 hunks)
  • app/locales/ru.ts (1 hunks)
  • app/locales/sk.ts (1 hunks)
  • app/locales/tr.ts (1 hunks)
  • app/locales/tw.ts (1 hunks)
  • app/locales/vi.ts (1 hunks)
  • app/store/config.ts (3 hunks)
  • app/utils.ts (2 hunks)
  • public/plugins.json (1 hunks)
✅ Files skipped from review due to trivial changes (21)
  • app/locales/cs.ts
  • public/plugins.json
  • app/locales/it.ts
  • app/locales/fr.ts
  • app/locales/vi.ts
  • app/locales/ko.ts
  • app/locales/pt.ts
  • app/locales/no.ts
  • app/locales/jp.ts
  • app/locales/id.ts
  • app/locales/ar.ts
  • app/locales/da.ts
  • app/locales/sk.ts
  • app/locales/en.ts
  • app/locales/ru.ts
  • app/locales/tw.ts
  • app/locales/de.ts
  • app/locales/tr.ts
  • app/locales/bn.ts
  • app/locales/cn.ts
  • app/locales/es.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • app/utils.ts
  • app/components/chat.tsx
  • app/store/config.ts
🔇 Additional comments (3)
app/client/platforms/alibaba.ts (3)

14-14: LGTM! Interface properly extended for function calling support.

The addition of FunctionToolItem import and the new tools and enable_search fields in RequestParam correctly support the function calling capabilities mentioned in the PR objectives.

Also applies to: 60-61


70-70: Good optimization! AudioContext reuse implemented.

Adding the private audioContext property enables reuse of the AudioContext instance across multiple audio conversions, addressing the performance concern from previous reviews.


203-203: LGTM! Function calling and network search properly integrated.

The modifications correctly implement:

  • Network search capability via enable_search parameter
  • Function calling support with proper tools integration
  • Safe argument concatenation to prevent undefined errors

Also applies to: 236-240, 278-278
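For orientation, a hedged sketch of the extended parameter shape (tools and enable_search are the fields added by this PR; the other fields are assumptions about the pre-existing RequestParam):

```ts
import { FunctionToolItem } from "../../store/plugin";

interface RequestParam {
  result_format?: string;
  incremental_output?: boolean;
  temperature?: number;
  tools?: FunctionToolItem[]; // function-calling tool definitions
  enable_search?: boolean; // DashScope web-search toggle from the UI
}
```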

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
app/client/platforms/alibaba.ts (2)

101-103: The previous review comment about documenting the unimplemented speech method still applies.

The speech method continues to throw an unimplemented error without explanation. Consider adding a comment explaining that Alibaba only supports streaming speech via streamSpeech method.


105-165: Previous suggestions for input validation, JSON parsing safety, and dynamic timeout haven't been implemented.

The streamSpeech method still lacks:

  1. Input parameter validation for required fields
  2. Safe JSON parsing with try-catch for malformed SSE data
  3. Dynamic timeout using getTimeoutMSByModel instead of hardcoded REQUEST_TIMEOUT_MS

These improvements would enhance robustness and prevent runtime errors.

🧹 Nitpick comments (1)
app/client/platforms/alibaba.ts (1)

354-426: Excellent audio processing implementation!

The audio processing methods properly implement:

  • Base64 decoding and PCM-to-AudioBuffer conversion
  • Static AudioContext reuse (addressing previous performance suggestions)
  • Correct 16-bit PCM to 32-bit float conversion with proper bit manipulation
  • Appropriate error handling throughout the pipeline

Consider using English comments for consistency with the rest of the codebase:

-  // 播放 PCM base64 数据
+  // Convert PCM base64 data to AudioBuffer
   private async PCMBase64ToAudioBuffer(base64Data: string) {
     try {
-      // 解码 base64
+      // Decode base64
       const binaryString = atob(base64Data);
       // ... rest of method

-  // 将 PCM 字节数据转换为 AudioBuffer  
+  // Convert PCM byte data to AudioBuffer
   private convertToAudioBuffer(pcmData: Uint8Array) {

-  // 16位 PCM 转 32位浮点数
+  // Convert 16-bit PCM to 32-bit float
   private pcm16ToFloat32(pcmData: Uint8Array) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 044298e and 86f2c67.

📒 Files selected for processing (1)
  • app/client/platforms/alibaba.ts (8 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning:
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because the engine values are constrained to keys that exist in TTS_CONFIGS through the dropdown generation (Object.keys(TTS_CONFIGS)) and validation (TTSConfigValidator.engine()).

Applied to files:

  • app/client/platforms/alibaba.ts
📚 Learning:
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because: 1) TTSEngineType is a union type that exactly matches TTS_CONFIGS keys, 2) dropdown options are generated from Object.keys(TTS_CONFIGS) constraining user input, 3) TTS_CONFIGS is typed as Record<TTSEngineType, TTSConfigItem> guaranteeing key-value correspondence.

Applied to files:

  • app/client/platforms/alibaba.ts
🔇 Additional comments (3)
app/client/platforms/alibaba.ts (3)

2-7: LGTM!

The new imports for REQUEST_TIMEOUT_MS and FunctionToolItem are properly used in the streaming speech and function calling implementations.

Also applies to: 14-14


60-62: LGTM!

The interface extensions properly support function calling (tools) and network search (enable_search) features. The static audioContext property addresses the previous performance optimization suggestion.

Also applies to: 70-70


245-245: Passing an Empty Array for tools Is Safe

The streamWithThink signature in app/utils/chat.ts clearly defines tools: any[]. Supplying [] simply omits the tools field in the request payload (it only includes tools when the array is non-empty), which matches the intended “no tools” behavior. No further changes are needed.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
app/components/chat.tsx (1)

1307-1307: Unused state variable detected.

The speechCooldown state is declared but doesn't appear to be used anywhere in the visible code. Consider removing it if it's not needed or implement its usage if it serves a purpose.

-  const [speechCooldown, setSpeechCooldown] = useState(false);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86f2c67 and 9cb7275.

📒 Files selected for processing (1)
  • app/components/chat.tsx (6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning:
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because the engine values are constrained to keys that exist in TTS_CONFIGS through the dropdown generation (Object.keys(TTS_CONFIGS)) and validation (TTSConfigValidator.engine()).

Applied to files:

  • app/components/chat.tsx
📚 Learning:
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because: 1) TTSEngineType is a union type that exactly matches TTS_CONFIGS keys, 2) dropdown options are generated from Object.keys(TTS_CONFIGS) constraining user input, 3) TTS_CONFIGS is typed as Record<TTSEngineType, TTSConfigItem> guaranteeing key-value correspondence.

Applied to files:

  • app/components/chat.tsx
🧬 Code Graph Analysis (1)
app/components/chat.tsx (6)
app/utils.ts (1)
  • canUseNetWork (299-306)
app/client/api.ts (2)
  • ClientApi (137-230)
  • config (186-186)
app/store/access.ts (1)
  • edgeVoiceName (169-173)
app/utils/ms_edge_tts.ts (1)
  • MsEdgeTTS (119-391)
app/components/ui-lib.tsx (1)
  • showToast (232-256)
app/utils/format.ts (1)
  • prettyObject (1-13)
🔇 Additional comments (6)
app/components/chat.tsx (6)

51-51: LGTM!

The new imports for NetWorkIcon and canUseNetWork utility function are correctly added and necessary for the network toggle feature.

Also applies to: 79-79


515-515: LGTM!

The network toggle implementation follows the established pattern used by other toggles in this component. State management and session updates are handled correctly.

Also applies to: 525-530


845-853: LGTM!

The network toggle UI implementation is consistent with other chat actions. Conditional rendering based on model capability and proper localization are correctly implemented.


1316-1316: Good refactoring of TTS configuration handling.

The dynamic API client initialization based on ttsConfig.modelProvider and the conditional engine handling improve the flexibility and maintainability of the TTS system.

Also applies to: 1319-1319, 1322-1322


1323-1330: LGTM!

The Edge TTS implementation is well-structured with proper async handling and consistent use of the MsEdgeTTS API. The use of the playSpeech helper function improves code organization.


1370-1382: Excellent refactoring with the playSpeech helper function.

The extraction of common speech playback logic into a dedicated helper function improves code organization and maintainability. Error handling and state management are properly implemented.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🔭 Outside diff range comments (2)
app/client/platforms/alibaba.ts (2)

236-252: Likely bug: tools passed as [] into streamWithThink.

You attach tools to requestPayload.parameters.tools, but pass an empty array to streamWithThink. This may break the tool execution loop on subsequent turns.

         return streamWithThink(
           chatPath,
           requestPayload,
           headers,
-          [],
+          _tools,
           funcs,
           controller,

255-285: Make SSE parsing in chat resilient to malformed JSON and guard runTools index.

A bad chunk will currently throw and break the stream. Also guard when updating arguments by index.

-          (text: string, runTools: ChatMessageTool[]) => {
+          (text: string, runTools: ChatMessageTool[]) => {
             // console.log("parseSSE", text, runTools);
-            const json = JSON.parse(text);
+            let json: any;
+            try {
+              json = JSON.parse(text);
+            } catch (e) {
+              console.warn("[Alibaba Chat] Bad SSE JSON chunk:", e, text);
+              return { isThinking: false, content: "" };
+            }
-            const choices = json.output.choices as Array<{
+            const choices = json?.output?.choices as Array<{
               message: {
                 content: string | null | MultimodalContentForAlibaba[];
                 tool_calls: ChatMessageTool[];
                 reasoning_content: string | null;
               };
             }>;
@@
-            if (tool_calls?.length > 0) {
+            if (tool_calls?.length > 0) {
               const index = tool_calls[0]?.index;
               const id = tool_calls[0]?.id;
               const args = tool_calls[0]?.function?.arguments;
               if (id) {
                 runTools.push({
                   id,
                   type: tool_calls[0]?.type,
                   function: {
                     name: tool_calls[0]?.function?.name as string,
                     arguments: args,
                   },
                 });
               } else {
-                // @ts-ignore
-                runTools[index]["function"]["arguments"] += args || "";
+                if (
+                  typeof index === "number" &&
+                  runTools[index] &&
+                  runTools[index].function
+                ) {
+                  // @ts-ignore
+                  runTools[index]["function"]["arguments"] += args || "";
+                } else {
+                  console.warn(
+                    "[Alibaba Chat] tool_calls missing id and invalid index:",
+                    index,
+                  );
+                }
               }
             }
♻️ Duplicate comments (1)
app/client/platforms/alibaba.ts (1)

96-98: Implement or explicitly document the unimplemented speech() method.

Still throws. If Alibaba only supports streaming, add a clear comment and guide callers to use streamSpeech to avoid runtime surprises.

 async speech(options: SpeechOptions): Promise<ArrayBuffer> {
-  throw new Error("Method not implemented.");
+  // Note: Alibaba Qwen TTS supports streaming via streamSpeech only.
+  // Non-streaming speech is not supported by this provider.
+  throw new Error("Alibaba speech() is not supported. Use streamSpeech().");
 }
🧹 Nitpick comments (2)
app/client/platforms/alibaba.ts (2)

388-419: Optional: use copyToChannel for simpler, faster buffer writes.

Modern browsers support AudioBuffer.copyToChannel, avoiding manual loops.

-        // 复制数据到 AudioBuffer
-        for (let channel = 0; channel < channels; channel++) {
-          const channelData = audioBuffer.getChannelData(channel);
-          for (let i = 0; i < channelData.length; i++) {
-            channelData[i] = float32Array[i * channels + channel];
-          }
-        }
+        // 复制数据到 AudioBuffer
+        audioBuffer.copyToChannel(float32Array, 0);

389-432: Assumption check: 16-bit little-endian PCM.

pcm16ToFloat32 assumes 16-bit LE PCM with 1 channel. If server outputs different bit depth/channels, decoding will be wrong. Consider plumbing bit depth/channels from the API or response meta, or document the assumption inline.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45eb96f and 800c96c.

📒 Files selected for processing (1)
  • app/client/platforms/alibaba.ts (8 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/client/platforms/alibaba.ts (6)
app/store/plugin.ts (1)
  • FunctionToolItem (25-32)
app/client/api.ts (1)
  • SpeechOptions (67-74)
app/constant.ts (1)
  • Alibaba (227-236)
app/utils.ts (2)
  • getTimeoutMSByModel (308-320)
  • fetch (368-376)
app/utils/stream.ts (1)
  • fetch (22-108)
app/utils/chat.ts (1)
  • streamWithThink (392-667)
🔇 Additional comments (2)
app/client/platforms/alibaba.ts (2)

55-57: Confirm server contract for new parameters (tools, enable_search).

Ensure Alibaba endpoint expects tools under parameters.tools and the network flag as enable_search (snake_case). If the API expects different nesting or naming, requests may be ignored or rejected.

Would you like a quick repo scan to confirm other platforms use the same shape, or to gate this behind a capability flag?


65-65: Good: Reusing a single AudioContext instance.

This addresses browser limits and CPU overhead from multiple contexts. LGTM.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
package.json (1)

96-100: Yarn versions are now consistent across packageManager and Volta.

Good fix—both are set to 1.22.19, resolving the earlier mismatch noted in previous reviews.

🧹 Nitpick comments (2)
package.json (2)

96-100: Add engines to communicate required Node/Yarn to non-Volta environments.

Volta enforces versions locally/CI if used, but adding engines helps external contributors and hosting platforms align tooling.

Apply this diff:

   "volta": {
     "node": "20.19.4",
     "yarn": "1.22.19"
-  }
+  },
+  "engines": {
+    "node": ">=20.19.4",
+    "yarn": "^1.22.19 <2"
+  }

96-100: Yarn configuration mismatch: Yarn 1.x pinned but .yarnrc.yml (Berry-only) still present

We’ve confirmed that:

  • package.json pins Yarn Classic v1.22.19 (both packageManager and Volta)
  • A .yarnrc.yml exists with the nodeLinker setting (Berry-only)

Yarn 1 ignores .yarnrc.yml, which can confuse contributors. Please choose one of these paths:

Option A (recommended): Migrate to Yarn Berry (3.x+)
– Update package.json to pin a Berry version, e.g.
```diff
-  "packageManager": "yarn@1.22.19",
+  "packageManager": "yarn@3.6.0",
   "volta": {
     "node": "20.19.4",
-    "yarn": "1.22.19"
+    "yarn": "3.6.0"
   }
```

– Keep your existing .yarnrc.yml (Berry will use it)
    
    

Option B: Stay on Yarn Classic (1.x)
– Remove or rename .yarnrc.yml → use a classic .yarnrc instead
– Drop Berry-only keys (nodeLinker, yarnPath, etc.)

Let me know which direction you’d like, and I can draft the necessary changes.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 800c96c and 16c3255.

📒 Files selected for processing (1)
  • package.json (1 hunks)

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/client/platforms/alibaba.ts (1)

274-339: Wrap JSON.parse in try/catch inside parseSSE callback

Malformed or partial SSE chunks can throw and break the stream. Return an empty slice on parse error.

-            const json = JSON.parse(text);
+            let json: any;
+            try {
+              json = JSON.parse(text);
+            } catch (e) {
+              console.warn("[Alibaba SSE] JSON parse error:", e);
+              return { isThinking: false, content: "" };
+            }
♻️ Duplicate comments (2)
app/components/chat.tsx (1)

1335-1363: Streaming TTS: use English comment and show user-friendly error toast
The Chinese comment persists and the toast leaks technical details to users. Keep console logs technical; show a concise user-facing message.

-        if (api.llm.streamSpeech) {
-          // 使用流式播放,边接收边播放
+        if (api.llm.streamSpeech) {
+          // Use streaming playback for real-time audio
           setSpeechStatus(true);
           ttsPlayer.startStreamPlay(() => {
             setSpeechStatus(false);
           });
@@
-          } catch (e) {
-            console.error("[Stream Speech]", e);
-            showToast(prettyObject(e));
+          } catch (e) {
+            console.error("[Stream Speech]", e);
+            showToast("Speech generation failed. Please try again.");
             setSpeechStatus(false);
             ttsPlayer.stop();
           } finally {
             setSpeechLoading(false);
           }
app/client/platforms/alibaba.ts (1)

96-99: Unimplemented speech() should be documented or delegate to streamSpeech

speech() throws without context. This can surprise integrators when streamSpeech is unavailable. Either implement speech via a non-SSE endpoint (if supported) or document that Alibaba TTS is streaming-only and instruct callers to use streamSpeech.

   async speech(options: SpeechOptions): Promise<ArrayBuffer> {
-    throw new Error("Method not implemented.");
+    // Alibaba Qwen currently provides TTS via SSE streaming.
+    // Use streamSpeech(options) for incremental playback.
+    // Non-streaming speech synthesis is not supported at the moment.
+    throw new Error("[Alibaba TTS] Non-streaming speech is not supported; use streamSpeech instead.");
   }
🧹 Nitpick comments (7)
app/utils/audio.ts (3)

39-40: Tighten type of currentOnended

(() => void | null) | null is odd—either a function or null is sufficient. Use (() => void) | null for clarity.

-  private currentOnended: (() => void | null) | null = null;
+  private currentOnended: (() => void) | null = null;

115-120: Consider suspending instead of closing AudioContext to reduce churn

Closing the AudioContext on every stop forces a new context for subsequent plays and may hit browser limits. Suspending is often enough and cheaper; only close when tearing down the app.

If you want to keep close() as a hard stop, consider adding a separate destroy() and make stop() only stop playback and suspend the context.
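A hedged sketch of that split, with assumed field names (stop() stays cheap and resumable; destroy() is the hard teardown):

```ts
class AudioPlayerSketch {
  private audioContext: AudioContext | null = null;
  private currentSource: AudioBufferSourceNode | null = null;
  private playQueue: AudioBuffer[] = [];

  // Stop playback and suspend the context; it can be resumed later.
  async stop(): Promise<void> {
    this.currentSource?.stop();
    this.currentSource = null;
    this.playQueue.length = 0;
    await this.audioContext?.suspend();
  }

  // Hard teardown for app shutdown; the context cannot be reused after.
  async destroy(): Promise<void> {
    await this.stop();
    await this.audioContext?.close();
    this.audioContext = null;
  }
}
```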


122-132: Stream wait loop: prefer await over recursive setTimeout chain

playNext uses setTimeout + recursion every 100ms while waiting for streamed chunks. This can grow the call stack and complicate testing. A small async delay keeps the stack flat.

-      if (this.isStreamMode && !this.streamFinished) {
-        setTimeout(() => this.playNext(), 100);
-        return;
-      }
+      if (this.isStreamMode && !this.streamFinished) {
+        await new Promise((r) => setTimeout(r, 100));
+        return this.playNext();
+      }
app/components/chat.tsx (3)

1376-1388: Non-streaming playback: show user-friendly error toast and avoid leaking internals

Mirror the streaming branch behavior: keep console.error, present a clean toast.

   function playSpeech(audioBuffer: ArrayBuffer | AudioBuffer) {
     setSpeechStatus(true);
     ttsPlayer
       .play(audioBuffer, () => {
         setSpeechStatus(false);
       })
       .catch((e) => {
         console.error("[OpenAI Speech]", e);
-        showToast(prettyObject(e));
+        showToast("Speech playback failed. Please try again.");
         setSpeechStatus(false);
       })
       .finally(() => setSpeechLoading(false));
   }

1310-1311: Remove unused speechCooldown state

The speechCooldown state is declared but not used. Drop it to reduce noise.

-  const [speechCooldown, setSpeechCooldown] = useState(false);

848-856: Naming consistency: NetWork → Network

enableNetWork and canUseNetWork are inconsistent with common naming (“Network”). Consider renaming in a future cleanup to reduce confusion. Not blocking this PR.

Also applies to: 710-713, 515-531

app/client/platforms/alibaba.ts (1)

261-304: Tool-calls accumulation: guard array access and undefined args

When accumulating streamed tool arguments, runTools[index] might be undefined and args can be falsy. Guard both to avoid runtime errors.

-            if (tool_calls?.length > 0) {
+            if (tool_calls?.length > 0) {
               const index = tool_calls[0]?.index;
               const id = tool_calls[0]?.id;
               const args = tool_calls[0]?.function?.arguments;
               if (id) {
                 runTools.push({
                   id,
                   type: tool_calls[0]?.type,
                   function: {
                     name: tool_calls[0]?.function?.name as string,
                     arguments: args,
                   },
                 });
               } else {
-                // @ts-ignore
-                runTools[index]["function"]["arguments"] += args || "";
+                if (
+                  typeof index === "number" &&
+                  runTools[index] &&
+                  runTools[index]["function"]
+                ) {
+                  // @ts-ignore
+                  runTools[index]["function"]["arguments"] += args || "";
+                }
               }
             }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 16c3255 and bf999b9.

📒 Files selected for processing (4)
  • app/client/api.ts (2 hunks)
  • app/client/platforms/alibaba.ts (6 hunks)
  • app/components/chat.tsx (7 hunks)
  • app/utils/audio.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/client/api.ts
🧰 Additional context used
🧬 Code graph analysis (2)
app/client/platforms/alibaba.ts (5)
app/store/plugin.ts (1)
  • FunctionToolItem (25-32)
app/client/api.ts (1)
  • SpeechOptions (68-75)
app/utils/audio.ts (1)
  • TTSPlayManager (34-286)
app/constant.ts (1)
  • Alibaba (227-236)
app/utils.ts (2)
  • getTimeoutMSByModel (308-320)
  • fetch (368-376)
app/components/chat.tsx (6)
app/utils.ts (1)
  • canUseNetWork (299-306)
app/client/api.ts (2)
  • ClientApi (141-234)
  • config (190-190)
app/store/access.ts (1)
  • edgeVoiceName (169-173)
app/utils/ms_edge_tts.ts (1)
  • MsEdgeTTS (119-391)
app/components/ui-lib.tsx (1)
  • showToast (232-256)
app/utils/format.ts (1)
  • prettyObject (1-13)
🔇 Additional comments (2)
app/components/chat.tsx (1)

848-856: Network toggle locale keys verified
All app/locales/*.ts files define Chat.InputActions.NetWork.on and …off, so Locale.Chat.InputActions.NetWork[enableNetWork ? "on" : "off"] cannot be undefined at runtime. No changes required.

app/client/platforms/alibaba.ts (1)

218-231: Enable network search field integration looks good

Passing enable_search from modelConfig.enableNetWork into parameters matches the UI toggle and keeps the request payload minimal when false. LGTM.

Comment on lines +100 to +122

```ts
  async *streamSpeech(
    options: SpeechOptions,
    audioManager?: TTSPlayManager,
  ): AsyncGenerator<AudioBuffer> {
    if (!options.input || !options.model) {
      throw new Error("Missing required parameters: input and model");
    }
    const requestPayload = {
      model: options.model,
      input: {
        text: options.input,
        voice: options.voice,
      },
      speed: options.speed,
      response_format: options.response_format,
    };
    const controller = new AbortController();
    options.onController?.(controller);

    if (audioManager) {
      audioManager.setStreamController(controller);
    }
    try {
```

⚠️ Potential issue

audioManager is optional but used with non-null assertion; guard or make it required

streamSpeech() references audioManager! which will throw if undefined. Either require the param or create a local TTSPlayManager fallback so decoding still works.

-  async *streamSpeech(
-    options: SpeechOptions,
-    audioManager?: TTSPlayManager,
-  ): AsyncGenerator<AudioBuffer> {
+  async *streamSpeech(
+    options: SpeechOptions,
+    audioManager?: TTSPlayManager,
+  ): AsyncGenerator<AudioBuffer> {
@@
-    if (audioManager) {
-      audioManager.setStreamController(controller);
-    }
+    const player = audioManager ?? new TTSPlayManager();
+    player.setStreamController(controller);
@@
-              if (json.output?.audio?.data) {
-                yield await audioManager!.pcmBase64ToAudioBuffer(
-                  json.output.audio.data,
-                  { channels: 1, sampleRate: 24000, bitDepth: 16 },
-                );
-              }
+              if (json.output?.audio?.data) {
+                const sr = json.output?.audio?.sample_rate ?? 24000;
+                yield await player.pcmBase64ToAudioBuffer(
+                  json.output.audio.data,
+                  { channels: 1, sampleRate: sr, bitDepth: 16 },
+                );
+              }
@@
-      if (audioManager) {
-        audioManager.clearStreamController();
-      }
+      player.clearStreamController();

Also applies to: 161-165, 185-189

🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 100 to 122 (and similarly at
161-165 and 185-189), streamSpeech currently uses audioManager with a non-null
assertion which will throw if the optional parameter is undefined; update the
function to either (a) require audioManager by making it a mandatory parameter,
or (b) guard every usage by checking if audioManager exists before calling
methods (e.g., if (audioManager) audioManager.setStreamController(controller)),
and provide a local no-op or fallback TTSPlayManager instance for
decoding/playback when audioManager is not provided so the function safely
handles undefined without runtime errors.

Comment on lines +124 to +132
const speechPayload = {
method: "POST",
body: JSON.stringify(requestPayload),
signal: controller.signal,
headers: {
...getHeaders(),
"X-DashScope-SSE": "enable",
},
};

🛠️ Refactor suggestion

Harden SSE request: set headers, check res.ok/body, and clear timeout on all paths

Missing Accept/Content-Type, no res.ok check, and no guard for res.body. Also ensure timeout is cleared in finally.

       const speechPayload = {
         method: "POST",
         body: JSON.stringify(requestPayload),
         signal: controller.signal,
         headers: {
           ...getHeaders(),
           "X-DashScope-SSE": "enable",
+          Accept: "text/event-stream",
+          "Content-Type": "application/json",
         },
       };
@@
-      const res = await fetch(speechPath, speechPayload);
-      clearTimeout(requestTimeoutId); // Clear timeout on successful connection
+      const res = await fetch(speechPath, speechPayload);
+      if (!res.ok) {
+        const errText = await res.text().catch(() => "");
+        throw new Error(
+          `[Alibaba TTS] HTTP ${res.status} ${res.statusText} ${errText}`,
+        );
+      }
+      if (!res.body) {
+        throw new Error("[Alibaba TTS] Missing response body for SSE stream.");
+      }

And move timeout cleanup into finally (see next comment).

Also applies to: 140-146

🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 124-132 (and similarly for lines
140-146), the SSE request is missing explicit Accept and Content-Type headers,
does not check res.ok or guard against a null res.body, and does not clear the
timeout on all code paths; update the speechPayload.headers to include "Accept":
"text/event-stream" and "Content-Type": "application/json" (or appropriate
content type), perform the fetch then check if (!res.ok) throw or handle the
error before proceeding, ensure you verify res.body exists before using it
(guard null), and move timeout cleanup into a finally block so the controller
timeout is cleared regardless of success or error.

Comment on lines +135 to +176

```ts
      const requestTimeoutId = setTimeout(
        () => controller.abort(),
        getTimeoutMSByModel(options.model),
      );

      const res = await fetch(speechPath, speechPayload);
      clearTimeout(requestTimeoutId); // Clear timeout on successful connection

      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let buffer = "";
      while (true) {
        const { done, value } = await reader.read();
        if (done) {
          break;
        }
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() || "";

        for (const line of lines) {
          const data = line.slice(5);
          try {
            if (line.startsWith("data:")) {
              const json = JSON.parse(data);
              if (json.output?.audio?.data) {
                yield await audioManager!.pcmBase64ToAudioBuffer(
                  json.output.audio.data,
                  { channels: 1, sampleRate: 24000, bitDepth: 16 },
                );
              }
            }
          } catch (parseError) {
            console.warn(
              "[StreamSpeech] Failed to parse SSE data:",
              parseError,
            );
            continue;
          }
        }
      }
      reader.releaseLock();
```

🛠️ Refactor suggestion

Always release reader and clear timeout (finally), and improve SSE parsing

Ensure requestTimeoutId and reader are cleaned up on all paths. Also, only slice when the line starts with data:, and handle the [DONE] sentinel. Keep parsing in try/catch.

```diff
-      const requestTimeoutId = setTimeout(
+      let requestTimeoutId: any = setTimeout(
         () => controller.abort(),
         getTimeoutMSByModel(options.model),
       );
-
-      const res = await fetch(speechPath, speechPayload);
-      const reader = res.body!.getReader();
+      const res = await fetch(speechPath, speechPayload);
+      let reader: ReadableStreamDefaultReader<Uint8Array> | undefined;
+      reader = res.body!.getReader();
       const decoder = new TextDecoder();
       let buffer = "";
       while (true) {
         const { done, value } = await reader.read();
         if (done) {
           break;
         }
         buffer += decoder.decode(value, { stream: true });
         const lines = buffer.split("\n");
         buffer = lines.pop() || "";

-        for (const line of lines) {
-          const data = line.slice(5);
+        for (const rawLine of lines) {
+          const line = rawLine.trim();
           try {
-            if (line.startsWith("data:")) {
-              const json = JSON.parse(data);
+            if (!line.startsWith("data:")) continue;
+            const data = line.slice(5).trim();
+            if (data === "[DONE]") {
+              // end-of-stream marker
+              break;
+            }
+            const json = JSON.parse(data);
             if (json.output?.audio?.data) {
-              yield await audioManager!.pcmBase64ToAudioBuffer(
-                json.output.audio.data,
-                { channels: 1, sampleRate: 24000, bitDepth: 16 },
-              );
+              const sr = json.output?.audio?.sample_rate ?? 24000;
+              yield await audioManager!.pcmBase64ToAudioBuffer(
+                json.output.audio.data,
+                { channels: 1, sampleRate: sr, bitDepth: 16 },
+              );
             }
           } catch (parseError) {
             console.warn(
               "[StreamSpeech] Failed to parse SSE data:",
               parseError,
             );
             continue;
           }
         }
       }
-      reader.releaseLock();
+      reader?.releaseLock();
@@
     } finally {
-      if (audioManager) {
-        audioManager.clearStreamController();
-      }
+      try {
+        clearTimeout(requestTimeoutId);
+      } catch {}
+      try {
+        // releasing is idempotent; safe to attempt
+        reader?.releaseLock?.();
+      } catch {}
+      try {
+        audioManager?.clearStreamController();
+      } catch {}
     }
```

Also applies to: 185-189

🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 135-176 (and similarly 185-189),
the fetch stream handling doesn't guarantee cleanup and incorrectly slices every
line; wrap the read loop and downstream parsing in a try/finally so you always
clearTimeout(requestTimeoutId) and releaseLock() on the reader regardless of
errors or early returns, check line.startsWith("data:") before slicing and
handle the "[DONE]" sentinel (break or return when encountered), and keep
JSON.parse inside a try/catch around each data segment so parsing errors for one
event don't break the stream.
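
For illustration, here is a self-contained sketch of the cleanup shape described above. `readSseAudio` and `decodePcmChunk` are hypothetical names (the latter standing in for `pcmBase64ToAudioBuffer`), and the timeout handle is assumed to come from the caller:

```ts
// Sketch only: an SSE read loop whose cleanup runs on every exit path.
async function* readSseAudio(
  res: Response,
  timeoutId: ReturnType<typeof setTimeout>,
  decodePcmChunk: (b64: string) => Promise<AudioBuffer>,
): AsyncGenerator<AudioBuffer> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop() || "";
      for (const rawLine of lines) {
        const line = rawLine.trim();
        if (!line.startsWith("data:")) continue;
        const data = line.slice(5).trim();
        if (data === "[DONE]") return; // end-of-stream sentinel
        try {
          const json = JSON.parse(data);
          if (json.output?.audio?.data) {
            yield await decodePcmChunk(json.output.audio.data);
          }
        } catch (e) {
          console.warn("[StreamSpeech] bad SSE chunk:", e); // skip one bad event
        }
      }
    }
  } finally {
    clearTimeout(timeoutId); // runs on success, error, and early generator return
    reader.releaseLock();
  }
}
```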

Comment on lines 1312 to 1316
```ts
  async function openaiSpeech(text: string) {
    if (speechStatus) {
      ttsPlayer.stop();
      setSpeechStatus(false);
    } else {
```
🛠️ Refactor suggestion

Await ttsPlayer.stop() to match the async API and ensure cleanup before toggling state

After aligning TTSPlayer.stop to return a Promise<void>, await it here to avoid races (e.g., stop finishing after the UI has already flipped the status).

```diff
-    if (speechStatus) {
-      ttsPlayer.stop();
+    if (speechStatus) {
+      await ttsPlayer.stop();
       setSpeechStatus(false);
     } else {
```
📝 Committable suggestion


Suggested change
```diff
   async function openaiSpeech(text: string) {
     if (speechStatus) {
-      ttsPlayer.stop();
+      await ttsPlayer.stop();
       setSpeechStatus(false);
     } else {
```
🤖 Prompt for AI Agents
In app/components/chat.tsx around lines 1312 to 1316, the code calls
ttsPlayer.stop() without awaiting it which can race with state toggling; update
the call to await ttsPlayer.stop() (inside an async function) before calling
setSpeechStatus(false) so the stop operation completes and any cleanup finishes
before the UI state is flipped.

```ts
  startStreamPlay: (onended: () => void | null) => void;
  finishStreamPlay: () => void;
  setStreamController: (controller: AbortController) => void;
  clearStreamController: () => void;
  stop: () => void;
```
⚠️ Potential issue

Interface vs implementation mismatch: stop() is async but typed as sync

TTSPlayManager.stop is async (returns a Promise<void>) but the TTSPlayer type declares stop: () => void. This breaks “implements TTSPlayer” type checking and can leak unhandled promises at call sites.

Apply this diff to align the public API:

```diff
 type TTSPlayer = {
   init: () => void;
@@
-  stop: () => void;
+  stop: () => Promise<void>;
```

Also applies to: 86-121

🤖 Prompt for AI Agents
In app/utils/audio.ts around lines 16 and 86-121, the TTSPlayer interface
declares stop: () => void but the concrete TTSPlayManager.stop implementation is
async (returns Promise<void>), causing an interface/implementation mismatch and
potential unhandled promises; change the TTSPlayer.stop signature to stop: () =>
Promise<void>, update any related type aliases/exports, ensure
TTSPlayManager.stop remains async or explicitly returns a Promise<void>, and
update all call sites to await or handle the returned promise so the public API
and implementations align.
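
For reference, a minimal sketch of what an awaitable stop() could look like, assuming a per-instance source node and a static AudioContext as described in the comments above (names mirror the review, not the actual implementation):

```ts
// Sketch only: stop() resolves after the source node and context are torn down.
class TTSPlayManagerSketch {
  private static audioContext: AudioContext | null = null;
  private audioBufferSourceNode: AudioBufferSourceNode | null = null;

  async stop(): Promise<void> {
    if (this.audioBufferSourceNode) {
      this.audioBufferSourceNode.onended = null; // don't fire onended during teardown
      try {
        this.audioBufferSourceNode.stop();
      } catch {} // stop() throws if the source never started
      this.audioBufferSourceNode.disconnect();
      this.audioBufferSourceNode = null;
    }
    if (TTSPlayManagerSketch.audioContext) {
      await TTSPlayManagerSketch.audioContext.close(); // resolves once released
      TTSPlayManagerSketch.audioContext = null;
    }
  }
}
```

Call sites can then `await ttsPlayer.stop()` before flipping UI state, as the earlier comment suggests.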

Comment on lines +27 to +33
```ts
// Audio processing utilities
export interface PCMConfig {
  channels?: number;
  sampleRate?: number;
  bitDepth?: 16 | 24 | 32;
}
```


🛠️ Refactor suggestion

Declare only supported bit depths (16-bit) or implement 24/32

PCMConfig advertises 16 | 24 | 32, but pcmDataToAudioBuffer only implements 16-bit and throws otherwise. Either narrow the type to 16 or finish the 24/32-bit implementations to avoid a misleading API.

Minimal change (narrow the API):

```diff
 export interface PCMConfig {
   channels?: number;
   sampleRate?: number;
-  bitDepth?: 16 | 24 | 32;
+  bitDepth?: 16;
 }
```

Optionally, leave the runtime guard in place for defense-in-depth:

```diff
-        switch (bitDepth) {
-          case 16:
-            float32Array = this.pcm16ToFloat32(pcmData);
-            break;
-          default:
-            throw new Error(`Unsupported bit depth: ${bitDepth}`);
-        }
+        if (bitDepth !== 16) {
+          throw new Error(`Unsupported bit depth: ${bitDepth}`);
+        }
+        float32Array = this.pcm16ToFloat32(pcmData);
```

Also applies to: 242-249
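
If the API is narrowed to 16-bit only, the one conversion that has to be right is int16 to float32. A minimal sketch under assumed semantics (the repo's `pcm16ToFloat32` is not shown in this thread):

```ts
// Sketch: convert 16-bit little-endian PCM to normalized Float32 samples.
function pcm16ToFloat32(pcm: ArrayBuffer): Float32Array {
  const view = new DataView(pcm);
  const sampleCount = Math.floor(pcm.byteLength / 2); // 2 bytes per 16-bit sample
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    const s = view.getInt16(i * 2, true); // little-endian
    out[i] = s < 0 ? s / 32768 : s / 32767; // map [-32768, 32767] into [-1, 1]
  }
  return out;
}
```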

Comment on lines +60 to +84
```diff
   async play(
     audioBuffer: ArrayBuffer | AudioBuffer,
     onended: () => void | null,
   ) {
     if (this.audioBufferSourceNode) {
       this.audioBufferSourceNode.stop();
       this.audioBufferSourceNode.disconnect();
     }
     let buffer: AudioBuffer;
     if (audioBuffer instanceof AudioBuffer) {
       buffer = audioBuffer;
     } else {
       buffer = await TTSPlayManager.audioContext!.decodeAudioData(audioBuffer);
     }
     this.audioBufferSourceNode =
       TTSPlayManager.audioContext!.createBufferSource();
     this.audioBufferSourceNode.buffer = buffer;
     this.audioBufferSourceNode.connect(
       TTSPlayManager.audioContext!.destination,
     );
     this.getAudioContext.resume().then(() => {
       this.audioBufferSourceNode!.start();
     });
-      audioBufferSourceNode.onended = onended;
-    };
+    this.audioBufferSourceNode.onended = onended;
   }
```
🛠️ Refactor suggestion

Use the initialized AudioContext via the getter to avoid null deref

play() uses TTSPlayManager.audioContext directly for decodeAudioData/createBufferSource. If init() wasn't called, or the context was closed by stop(), this can be null. Always go through the getter, which lazily initializes the context.

```diff
   async play(
     audioBuffer: ArrayBuffer | AudioBuffer,
     onended: () => void | null,
   ) {
@@
-    let buffer: AudioBuffer;
-    if (audioBuffer instanceof AudioBuffer) {
-      buffer = audioBuffer;
-    } else {
-      buffer = await TTSPlayManager.audioContext!.decodeAudioData(audioBuffer);
-    }
-    this.audioBufferSourceNode =
-      TTSPlayManager.audioContext!.createBufferSource();
+    const ctx = this.getAudioContext;
+    let buffer: AudioBuffer;
+    if (audioBuffer instanceof AudioBuffer) {
+      buffer = audioBuffer;
+    } else {
+      buffer = await ctx.decodeAudioData(audioBuffer);
+    }
+    this.audioBufferSourceNode = ctx.createBufferSource();
     this.audioBufferSourceNode.buffer = buffer;
-    this.audioBufferSourceNode.connect(
-      TTSPlayManager.audioContext!.destination,
-    );
-    this.getAudioContext.resume().then(() => {
+    this.audioBufferSourceNode.connect(ctx.destination);
+    ctx.resume().then(() => {
       this.audioBufferSourceNode!.start();
     });
     this.audioBufferSourceNode.onended = onended;
   }
```
📝 Committable suggestion


Suggested change
```diff
   async play(
     audioBuffer: ArrayBuffer | AudioBuffer,
     onended: () => void | null,
   ) {
     if (this.audioBufferSourceNode) {
       this.audioBufferSourceNode.stop();
       this.audioBufferSourceNode.disconnect();
     }
-    let buffer: AudioBuffer;
-    if (audioBuffer instanceof AudioBuffer) {
-      buffer = audioBuffer;
-    } else {
-      buffer = await TTSPlayManager.audioContext!.decodeAudioData(audioBuffer);
-    }
-    this.audioBufferSourceNode =
-      TTSPlayManager.audioContext!.createBufferSource();
-    this.audioBufferSourceNode.buffer = buffer;
-    this.audioBufferSourceNode.connect(
-      TTSPlayManager.audioContext!.destination,
-    );
-    this.getAudioContext.resume().then(() => {
-      this.audioBufferSourceNode!.start();
-    });
-    audioBufferSourceNode.onended = onended;
-  };
-    this.audioBufferSourceNode.onended = onended;
-  }
+    const ctx = this.getAudioContext;
+    let buffer: AudioBuffer;
+    if (audioBuffer instanceof AudioBuffer) {
+      buffer = audioBuffer;
+    } else {
+      buffer = await ctx.decodeAudioData(audioBuffer);
+    }
+    this.audioBufferSourceNode = ctx.createBufferSource();
+    this.audioBufferSourceNode.buffer = buffer;
+    this.audioBufferSourceNode.connect(ctx.destination);
+    ctx.resume().then(() => {
+      this.audioBufferSourceNode!.start();
+    });
+    this.audioBufferSourceNode.onended = onended;
+  }
```
🤖 Prompt for AI Agents
In app/utils/audio.ts around lines 60–84, play() currently accesses
TTSPlayManager.audioContext directly which can be null; instead obtain the
initialized context via the existing getter (await this.getAudioContext()),
assign it to a local const (e.g., const ctx = await this.getAudioContext()),
then use ctx.decodeAudioData(...) and ctx.createBufferSource() and connect to
ctx.destination; use ctx.resume() for resuming before starting the source and
replace direct null assertions on TTSPlayManager.audioContext with the local ctx
to avoid null deref.
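
The lazy getter this comment leans on could look like the following sketch, assuming the context is stored statically and may have been closed by stop() (class and member names are illustrative):

```ts
// Sketch: a getter that re-creates the AudioContext when missing or closed.
class AudioContextHolder {
  private static audioContext: AudioContext | null = null;

  get getAudioContext(): AudioContext {
    const ctx = AudioContextHolder.audioContext;
    if (!ctx || ctx.state === "closed") {
      AudioContextHolder.audioContext = new AudioContext();
    }
    return AudioContextHolder.audioContext!;
  }
}
```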

Comment on lines +251 to +264
```ts
        // Create the AudioBuffer
        const audioBuffer = audioContext.createBuffer(
          channels,
          float32Array.length / channels,
          sampleRate,
        );

        // Copy data into the AudioBuffer
        for (let channel = 0; channel < channels; channel++) {
          const channelData = audioBuffer.getChannelData(channel);
          for (let i = 0; i < channelData.length; i++) {
            channelData[i] = float32Array[i * channels + channel];
          }
        }
```
🛠️ Refactor suggestion

Channel/sample alignment: guard against misaligned PCM and avoid fractional frame lengths

If the pcmData length is not divisible by (bytesPerSample * channels), createBuffer will still accept the length, but the interleaving copy will read past the end or skip samples. Compute frameCount explicitly and validate alignment.

```diff
-        const audioBuffer = audioContext.createBuffer(
-          channels,
-          float32Array.length / channels,
-          sampleRate,
-        );
+        if (float32Array.length % channels !== 0) {
+          console.warn(
+            "[TTSPlayManager] PCM length not divisible by channel count; truncating last incomplete frame",
+          );
+        }
+        const frameCount = Math.floor(float32Array.length / channels);
+        const audioBuffer = audioContext.createBuffer(
+          channels,
+          frameCount,
+          sampleRate,
+        );
@@
-        for (let channel = 0; channel < channels; channel++) {
-          const channelData = audioBuffer.getChannelData(channel);
-          for (let i = 0; i < channelData.length; i++) {
-            channelData[i] = float32Array[i * channels + channel];
-          }
-        }
+        for (let channel = 0; channel < channels; channel++) {
+          const channelData = audioBuffer.getChannelData(channel);
+          for (let i = 0; i < frameCount; i++) {
+            channelData[i] = float32Array[i * channels + channel] ?? 0;
+          }
+        }
```
📝 Committable suggestion


Suggested change
```diff
         // Create the AudioBuffer
+        if (float32Array.length % channels !== 0) {
+          console.warn(
+            "[TTSPlayManager] PCM length not divisible by channel count; truncating last incomplete frame",
+          );
+        }
+        const frameCount = Math.floor(float32Array.length / channels);
         const audioBuffer = audioContext.createBuffer(
           channels,
-          float32Array.length / channels,
+          frameCount,
           sampleRate,
         );
         // Copy data into the AudioBuffer
         for (let channel = 0; channel < channels; channel++) {
           const channelData = audioBuffer.getChannelData(channel);
-          for (let i = 0; i < channelData.length; i++) {
-            channelData[i] = float32Array[i * channels + channel];
+          for (let i = 0; i < frameCount; i++) {
+            channelData[i] = float32Array[i * channels + channel] ?? 0;
           }
         }
```
🤖 Prompt for AI Agents
In app/utils/audio.ts around lines 251 to 264, the code creates an AudioBuffer
using float32Array.length / channels without validating alignment, which may
read past the end or skip samples if pcm length isn't divisible by channels;
compute an explicit frameCount = Math.floor(float32Array.length / channels),
validate that float32Array.length === frameCount * channels (or else throw or
log and handle remaining bytes), use frameCount when creating the AudioBuffer
and in the channel copy loops, and ensure the inner copy bounds use frameCount
to avoid reading beyond the float32Array (or pad/truncate as appropriate).
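
As a quick sanity check on the math: with 2 channels, a 1001-sample Float32Array holds 500 whole frames plus one dangling sample, which the guard below truncates. A sketch with a hypothetical helper name:

```ts
// Sketch: compute a whole-frame count before allocating the AudioBuffer.
function frameCountFor(samples: Float32Array, channels: number): number {
  if (samples.length % channels !== 0) {
    // e.g. 1001 interleaved samples over 2 channels -> 500 frames, 1 sample dropped
    console.warn("PCM length not divisible by channel count; truncating partial frame");
  }
  return Math.floor(samples.length / channels);
}
```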
