
Conversation

@Little-LittleProgrammer (Contributor) commented Jul 31, 2025

💻 Change Type

  • feat
  • fix
  • refactor
  • perf
  • style
  • test
  • docs
  • ci
  • chore
  • build

🔀 Description of Change

  • Add Alibaba Cloud Qwen speech models (qwen-tts, qwen-tts-latest)
  • Support streaming speech playback: audio plays while it is still being received
  • Refactor the audio processing logic and unify the PCM-to-AudioBuffer implementation
  • Update the TTS configuration options to support dynamic switching between multiple engines
  • Support function calling for Alibaba models
  • Add web search, with the related configuration and multi-language support

📝 Additional Information

20250731110339_rec_.mp4

Summary by CodeRabbit

  • New Features

    • Web Search toggle in chat for supported models.
    • Streaming TTS playback with real-time audio streaming and queued playback.
  • Improvements

    • Dynamic multi-engine TTS settings (OpenAI, Alibaba, Edge) with selectable models/voices/providers.
    • Robust, class-based TTS playback manager with streaming controls and PCM support.
    • Many new localized strings for the Web Search toggle.
  • Chores

    • Yarn/node versions pinned via Volta; Yarn set to use node_modules.

coderabbitai bot (Contributor) commented Jul 31, 2025

Walkthrough

Introduces Yarn/Volta tooling, widens LLM API speech types and adds optional streaming API, implements Alibaba (Qwen) SSE PCM audio streaming, refactors TTS config and player into a streaming-capable manager, adds a network toggle and config, and adds locale strings for the toggle.

Changes

  • Tooling (./.yarnrc.yml, package.json): Add .yarnrc.yml with nodeLinker: node-modules; add a Volta pin in package.json (node 20.19.4, yarn 1.22.19).
  • API surface (app/client/api.ts): Widen the LLMApi.speech return type to Promise<ArrayBuffer | AudioBuffer> and add optional streamSpeech?(options, audioManager?): AsyncGenerator<AudioBuffer>.
  • Alibaba platform & constants (app/client/platforms/alibaba.ts, app/constant.ts): Add QwenApi.streamSpeech SSE-based audio streaming (parses SSE JSON, converts PCM base64 → AudioBuffer), make speech async, extend request params with tools? and enable_search?, and add Alibaba.SpeechPath.
  • Chat UI & TTS behavior (app/components/chat.tsx, app/components/tts-config.tsx, app/store/config.ts, app/utils.ts): Add a NetWork toggle UI and enableNetWork config; dynamic TTS provider selection; streaming TTS via llm.streamSpeech and new TTS provider flows; refactor TTS configs into typed TTS_CONFIGS; add modelProvider to ttsConfig; add a canUseNetWork util and update plugin visibility.
  • TTS player / audio utils (app/utils/audio.ts): Replace the function-based player with a TTSPlayManager class implementing TTSPlayer; add queueing, streaming controls (start/finish stream, stream controller), and PCM conversion helpers (pcmBase64ToAudioBuffer, pcmDataToAudioBuffer); broaden play to accept ArrayBuffer | AudioBuffer.
  • Locales (app/locales/*.ts): Add Chat.InputActions.NetWork translations (on/off) across locale files (ar, bn, cn, cs, da, de, en, es, fr, id, it, jp, ko, no, pt, ru, sk, tr, tw, vi).
  • Small additions (./.yarnrc.yml): New Yarn config file to force the node_modules linker.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant UI as Chat UI
  participant Client as ClientApi.llm
  participant Qwen as Alibaba SSE
  participant Player as TTSPlayManager

  UI->>Client: request streamSpeech(options, audioManager)
  Client->>Qwen: POST /Alibaba.SpeechPath (SSE, params include tools/enable_search)
  UI->>Player: startStreamPlay(onended)
  loop SSE chunks
    Qwen-->>Client: SSE chunk (data: JSON with output.audio.data = PCM base64)
    Client->>Client: decode base64 → AudioBuffer (pcmBase64ToAudioBuffer)
    Client-->>UI: yield AudioBuffer
    UI->>Player: addToQueue(AudioBuffer)
  end
  Client-->>UI: stream complete
  UI->>Player: finishStreamPlay()
  Player-->>UI: onended()
```
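To make the flow concrete, here is a hedged consumer-side sketch of the loop in the diagram, written against the names from the walkthrough (streamSpeech, startStreamPlay, addToQueue, finishStreamPlay). The import paths and exact signatures are assumptions; the real ones live in app/client/api.ts and app/utils/audio.ts.

```ts
import { ClientApi, SpeechOptions } from "../client/api";
import { TTSPlayManager } from "../utils/audio";

// Drive a streaming TTS session: start queued playback, feed decoded
// chunks into the player as they arrive, then signal end-of-stream.
async function speakStreaming(
  api: ClientApi,
  player: TTSPlayManager,
  options: SpeechOptions,
) {
  // The callback fires once the queue drains after finishStreamPlay().
  player.startStreamPlay(() => console.log("[TTS] playback finished"));
  try {
    // streamSpeech yields an already-decoded AudioBuffer per SSE chunk.
    for await (const chunk of api.llm.streamSpeech!(options, player)) {
      player.addToQueue(chunk);
    }
  } finally {
    // Tell the player no more chunks will arrive so onended can fire.
    player.finishStreamPlay();
  }
}
```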

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested labels

enhancement

Suggested reviewers

  • lloydzhou
  • Dogtiti
  • ElricLiu

Poem

I twitch my ears at streams that flow,
Small PCM crumbs in audio snow.
I queue each nibble, hop in time,
While toggled nets go fetch and find.
New engines hum — a rabbit’s rhyme. 🐇🎧




coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

🔭 Outside diff range comments (2)
app/components/chat.tsx (2)

287-287: Remove unused state variable speechCooldown.

This state variable and its setter are declared but never used throughout the component. Consider removing it to reduce code clutter.

-  const [speechCooldown, setSpeechCooldown] = useState(false);

302-311: Add error handling for Edge TTS branch.

The Edge TTS implementation lacks try-catch error handling that exists in other branches. This could lead to unhandled promise rejections.

   if (config.ttsConfig.engine === "Edge") {
     const edgeVoiceName = accessStore.edgeVoiceName();
     const tts = new MsEdgeTTS();
-    await tts.setMetadata(
-      edgeVoiceName,
-      OUTPUT_FORMAT.AUDIO_24KHZ_96KBITRATE_MONO_MP3,
-    );
-    audioBuffer = await tts.toArrayBuffer(textContent);
-    playSpeech(audioBuffer);
+    try {
+      await tts.setMetadata(
+        edgeVoiceName,
+        OUTPUT_FORMAT.AUDIO_24KHZ_96KBITRATE_MONO_MP3,
+      );
+      audioBuffer = await tts.toArrayBuffer(textContent);
+      playSpeech(audioBuffer);
+    } catch (e) {
+      console.error("[Edge Speech]", e);
+      showToast(prettyObject(e));
+      setSpeechStatus(false);
+      setSpeechLoading(false);
+    }
🧹 Nitpick comments (3)
app/utils/audio.ts (2)

57-58: Translate Chinese comment to English for consistency.

Please translate the comment to maintain code consistency across the codebase.

-      // 在流模式下,如果队列为空但流还没结束,等待
+      // In stream mode, if queue is empty but stream hasn't finished, wait

109-114: Consider logging when rejecting audio buffer additions to finished streams.

When streamFinished is true, the method silently ignores the addition. This could make debugging difficult.

 const addToQueue = (audioBuffer: ArrayBuffer | AudioBuffer) => {
   if (streamFinished) {
+    console.debug("[TTSPlayer] Cannot add to queue - stream already finished");
     return;
   }
   playQueue.push(audioBuffer);
 };
app/constant.ts (1)

483-488: Document the empty arrays for Edge TTS configuration.

The Edge TTS configuration has empty Model and Voices arrays. If this is intentional for dynamic loading, consider adding a comment to clarify this design decision.

 export const EDGE_TTS = {
     Provider: "Edge" as const,
     ModelProvider: ModelProvider.GPT,
+    // Models and voices are dynamically loaded for Edge TTS
     Model: [] as string[],
     Voices: [] as string[],
 } as const;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 557a2cc and 221229c.

📒 Files selected for processing (9)
  • .yarnrc.yml (1 hunks)
  • app/client/api.ts (1 hunks)
  • app/client/platforms/alibaba.ts (3 hunks)
  • app/components/chat.tsx (1 hunks)
  • app/components/tts-config.tsx (4 hunks)
  • app/constant.ts (3 hunks)
  • app/store/config.ts (3 hunks)
  • app/utils/audio.ts (2 hunks)
  • package.json (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
app/store/config.ts (1)

Learnt from: Noctiro
PR: #6139
File: app/store/access.ts:251-258
Timestamp: 2025-02-04T03:46:50.895Z
Learning: The getModelProvider function in app/utils/model.ts uses safe string operations to parse model names with optional provider (format: modelName@providerId), making explicit error handling unnecessary as it always returns a valid [model, provider] tuple where provider can be undefined.

🧬 Code Graph Analysis (1)
app/store/config.ts (1)
app/constant.ts (2)
  • DEFAULT_TTS_ENGINE (465-465)
  • TTSEngineType (490-490)
🔇 Additional comments (14)
.yarnrc.yml (1)

1-1: LGTM! Valid Yarn configuration.

Setting nodeLinker to node-modules is a standard configuration that switches from Yarn's PnP to traditional node_modules structure, which can improve compatibility with certain dependencies.

app/client/api.ts (1)

110-111: Well-designed API extension for streaming TTS support.

The changes properly extend the LLMApi interface to support streaming speech synthesis:

  • The speech method maintains backward compatibility by supporting both ArrayBuffer and AudioBuffer return types
  • The optional streamSpeech method allows implementations to provide streaming audio without breaking existing code
  • The AsyncGenerator<AudioBuffer> type is appropriate for streaming audio data
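A minimal sketch of that widened surface follows; the SpeechOptions fields are taken from the code quoted later in this thread, and the non-speech members of LLMApi are elided.

```ts
// Assumed shape of SpeechOptions, based on the fields used in this PR.
type SpeechOptions = {
  model: string;
  input: string;
  voice?: string;
  speed?: number;
  response_format?: string;
  onController?: (controller: AbortController) => void;
};

declare class TTSPlayManager {} // the real class lives in app/utils/audio.ts

interface LLMApi {
  // Non-streaming synthesis; may return raw bytes or a decoded buffer.
  speech(options: SpeechOptions): Promise<ArrayBuffer | AudioBuffer>;
  // Optional streaming synthesis; yields decoded chunks for queued playback.
  streamSpeech?(
    options: SpeechOptions,
    audioManager?: TTSPlayManager,
  ): AsyncGenerator<AudioBuffer>;
  // ...chat/usage/models members elided
}
```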
app/store/config.ts (2)

90-91: LGTM! Well-structured TTS configuration extension.

The addition of modelProvider field with proper typing and the updated engine typing to TTSEngineType properly supports the expanded TTS engine functionality while maintaining type safety.


143-145: LGTM! Consistent validator pattern.

The new modelProvider validator follows the established pattern used by other validators in the codebase and provides proper type safety.

app/components/chat.tsx (3)

296-296: Good refactoring for dynamic TTS provider support.

The change from hardcoded provider to configuration-based provider selection improves flexibility and supports multiple TTS engines.


312-337: Well-implemented streaming TTS with proper error handling.

The streaming implementation correctly:

  • Manages playback state before starting the stream
  • Uses async generator pattern for chunk processing
  • Handles errors with appropriate user feedback
  • Ensures loading state is cleared in the finally block

1350-1362: Good extraction of common audio playback logic.

The playSpeech helper function effectively eliminates code duplication and provides consistent error handling across different TTS branches.

app/utils/audio.ts (2)

32-53: Good implementation of flexible audio buffer handling!

The play method now elegantly handles both ArrayBuffer and AudioBuffer types, maintaining backward compatibility while supporting pre-decoded audio buffers for better performance.
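For reference, a minimal sketch of a play routine that branches on the two input types, assuming a caller-supplied AudioContext (the actual method in app/utils/audio.ts may differ):

```ts
// Accept either raw encoded bytes or a pre-decoded AudioBuffer.
async function playAudio(
  ctx: AudioContext,
  audio: ArrayBuffer | AudioBuffer,
  onended?: () => void,
): Promise<void> {
  const buffer =
    audio instanceof AudioBuffer
      ? audio // already decoded (e.g. streamed PCM), play as-is
      : await ctx.decodeAudioData(audio); // decode MP3/WAV bytes first
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  if (onended) source.onended = onended;
  source.start();
}
```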


132-149: Excellent async handling in stop method!

The stop method properly awaits audioContext.close() and comprehensively resets all state variables, ensuring clean shutdown.

app/constant.ts (2)

235-235: LGTM!

The addition of SpeechPath to the Alibaba configuration is consistent with the existing API path structure.


465-507: Excellent refactoring of TTS configuration!

The transformation from simple constants to a structured, typed configuration system is well-designed:

  • Type safety with const assertions
  • Centralized configuration management via TTS_CONFIGS
  • Clear separation of concerns for different TTS providers
  • Extensible design that supports adding new providers easily
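As a hedged illustration of that shape (the Record<TTSEngineType, TTSConfigItem> typing is confirmed by the learnings quoted later in this thread; the OpenAI model/voice values and the Alibaba voice below are assumptions):

```ts
type TTSEngineType = "OpenAI" | "Alibaba" | "Edge";

interface TTSConfigItem {
  Provider: TTSEngineType;
  Model: readonly string[];
  Voices: readonly string[];
}

const TTS_CONFIGS: Record<TTSEngineType, TTSConfigItem> = {
  OpenAI: { Provider: "OpenAI", Model: ["tts-1"], Voices: ["alloy"] },
  Alibaba: {
    Provider: "Alibaba",
    Model: ["qwen-tts", "qwen-tts-latest"], // models added by this PR
    Voices: ["Cherry"], // assumed; see app/constant.ts for the real list
  },
  // Edge models/voices are loaded dynamically, hence the empty arrays.
  Edge: { Provider: "Edge", Model: [], Voices: [] },
};
```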
app/client/platforms/alibaba.ts (3)

101-161: Well-implemented streaming speech functionality!

The streamSpeech method is well-structured with:

  • Proper async generator pattern for streaming
  • Comprehensive error handling and timeout management
  • Clean SSE parsing logic
  • Appropriate cleanup with reader.releaseLock()

398-409: Efficient PCM conversion implementation.

The pcm16ToFloat32 method correctly handles:

  • Little-endian byte order
  • Signed 16-bit integer conversion
  • Proper normalization to [-1, 1] range
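A minimal sketch of such a conversion, under the same assumptions (16-bit signed little-endian samples, normalized into [-1, 1)):

```ts
function pcm16ToFloat32(pcmData: Uint8Array): Float32Array {
  const view = new DataView(
    pcmData.buffer,
    pcmData.byteOffset,
    pcmData.byteLength,
  );
  const out = new Float32Array(pcmData.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    // true = little-endian; divide by 32768 to map int16 into [-1, 1)
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```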

368-369: Audio parameters validated
The hardcoded values already match Alibaba Cloud’s Qwen-TTS output (mono, 24 kHz PCM16). You can mark them in code for clarity or expose them as config if you anticipate supporting other services.

Suggested update:

   private convertToAudioBuffer(pcmData: Uint8Array) {
     const audioContext = new (window.AudioContext ||
       window.webkitAudioContext)();
+    // Alibaba Qwen-TTS returns 16-bit PCM WAV: mono (1 channel) at 24 000 Hz
     const channels = 1;
     const sampleRate = 24000;

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
app/client/platforms/alibaba.ts (1)

98-100: The unimplemented speech method issue persists.

This is the same issue flagged in the previous review - the method still throws an unimplemented error without clear documentation about why Alibaba only supports streaming speech.

🧹 Nitpick comments (3)
app/client/platforms/alibaba.ts (3)

102-162: Well-structured streaming implementation with minor improvements needed.

The async generator approach for streaming TTS is well-implemented with proper SSE handling, error management, and cleanup. However, consider these improvements:

  1. Add null safety for audio data access (Line 150)
  2. Improve timeout handling - the timeout is cleared only on success
  3. Enhance buffer parsing robustness

Apply these improvements:

-            if (json.output.audio.data) {
+            if (json.output?.audio?.data) {
               yield this.PCMBase64ToAudioBuffer(json.output.audio.data);
             }
       const res = await fetch(speechPath, speechPayload);
+      clearTimeout(requestTimeoutId); // Clear timeout on successful connection
 
       const reader = res.body!.getReader();

345-363: Consider using modern APIs and consistent error messages.

The base64 decoding works correctly, but consider these improvements:

  1. Use Uint8Array.from() for cleaner conversion
  2. Consistent error messages in English
-      // 解码 base64
-      const binaryString = atob(base64Data);
-      const bytes = new Uint8Array(binaryString.length);
-      for (let i = 0; i < binaryString.length; i++) {
-        bytes[i] = binaryString.charCodeAt(i);
-      }
+      // Decode base64
+      const binaryString = atob(base64Data);
+      const bytes = Uint8Array.from(binaryString, c => c.charCodeAt(0));
-      console.error("播放 PCM 数据失败:", error);
+      console.error("Failed to convert PCM data:", error);

366-400: AudioContext reuse partially implemented, but could be optimized further.

The lazy initialization of audioContext improves upon creating a new instance each time. However, the previous review suggestion of using a static getter method would be more efficient and follow better patterns.

The previous review's suggestion of a static AudioContext getter would still be beneficial:

+ private static audioContext: AudioContext | null = null;
+ 
+ private static getAudioContext(): AudioContext {
+   if (!QwenApi.audioContext) {
+     QwenApi.audioContext = new (window.AudioContext || 
+       window.webkitAudioContext)();
+   }
+   return QwenApi.audioContext;
+ }

  private convertToAudioBuffer(pcmData: Uint8Array) {
-   if (!this.audioContext) {
-     this.audioContext = new (window.AudioContext ||
-       window.webkitAudioContext)();
-   }
-   const audioContext = this.audioContext;
+   const audioContext = QwenApi.getAudioContext();
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 221229c and fe484fd.

📒 Files selected for processing (1)
  • app/client/platforms/alibaba.ts (4 hunks)
🔇 Additional comments (2)
app/client/platforms/alibaba.ts (2)

67-67: Good addition of audioContext property for reuse.

Adding the private audioContext property is a step in the right direction for reusing the AudioContext instance, which addresses performance concerns from the previous review.


402-413: Correct PCM conversion implementation.

The 16-bit PCM to 32-bit float conversion is mathematically sound with proper signed integer handling and normalization. The bit manipulation and range conversion are implemented correctly.

@Little-LittleProgrammer Little-LittleProgrammer changed the title Feature/alibaba tts support Feature/ support (alibaba tts + alibaba function calling + network search) Aug 5, 2025
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
app/client/platforms/alibaba.ts (1)

101-103: Previous review comment still applies.

The speech method remains unimplemented as noted in the previous review. The suggestion to add documentation explaining why it's not implemented is still valid.

🧹 Nitpick comments (1)
app/client/platforms/alibaba.ts (1)

354-422: Make audio parameters configurable and improve error messages.

The audio processing implementation is technically sound, but consider these improvements:

  1. Hardcoded audio parameters: Consider making channels and sample rate configurable
  2. Internationalization: Replace Chinese error message with English
  3. Endianness assumption: Document that the PCM conversion assumes little-endian format
 // 播放 PCM base64 数据
 private async PCMBase64ToAudioBuffer(base64Data: string) {
   try {
     // 解码 base64
     const binaryString = atob(base64Data);
     const bytes = new Uint8Array(binaryString.length);
     for (let i = 0; i < binaryString.length; i++) {
       bytes[i] = binaryString.charCodeAt(i);
     }

     // 转换为 AudioBuffer
     const audioBuffer = await this.convertToAudioBuffer(bytes);

     return audioBuffer;
   } catch (error) {
-    console.error("播放 PCM 数据失败:", error);
+    console.error("Failed to convert PCM data to AudioBuffer:", error);
     throw error;
   }
 }

 // 将 PCM 字节数据转换为 AudioBuffer
-private convertToAudioBuffer(pcmData: Uint8Array) {
+private convertToAudioBuffer(pcmData: Uint8Array, channels = 1, sampleRate = 24000) {
   if (!this.audioContext) {
     this.audioContext = new (window.AudioContext ||
       window.webkitAudioContext)();
   }
   const audioContext = this.audioContext;
-  const channels = 1;
-  const sampleRate = 24000;
   return new Promise<AudioBuffer>((resolve, reject) => {

Consider adding JSDoc comments to document the endianness assumption:

-// 16位 PCM 转 32位浮点数
+/**
+ * Convert 16-bit PCM data to 32-bit float (assumes little-endian format)
+ */
 private pcm16ToFloat32(pcmData: Uint8Array) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e3f166 and 044298e.

⛔ Files ignored due to path filters (1)
  • app/icons/network.svg is excluded by !**/*.svg
📒 Files selected for processing (25)
  • app/client/platforms/alibaba.ts (8 hunks)
  • app/components/chat.tsx (6 hunks)
  • app/locales/ar.ts (1 hunks)
  • app/locales/bn.ts (1 hunks)
  • app/locales/cn.ts (1 hunks)
  • app/locales/cs.ts (1 hunks)
  • app/locales/da.ts (1 hunks)
  • app/locales/de.ts (2 hunks)
  • app/locales/en.ts (1 hunks)
  • app/locales/es.ts (1 hunks)
  • app/locales/fr.ts (1 hunks)
  • app/locales/id.ts (1 hunks)
  • app/locales/it.ts (1 hunks)
  • app/locales/jp.ts (1 hunks)
  • app/locales/ko.ts (1 hunks)
  • app/locales/no.ts (1 hunks)
  • app/locales/pt.ts (1 hunks)
  • app/locales/ru.ts (1 hunks)
  • app/locales/sk.ts (1 hunks)
  • app/locales/tr.ts (1 hunks)
  • app/locales/tw.ts (1 hunks)
  • app/locales/vi.ts (1 hunks)
  • app/store/config.ts (3 hunks)
  • app/utils.ts (2 hunks)
  • public/plugins.json (1 hunks)
✅ Files skipped from review due to trivial changes (21)
  • app/locales/cs.ts
  • public/plugins.json
  • app/locales/it.ts
  • app/locales/fr.ts
  • app/locales/vi.ts
  • app/locales/ko.ts
  • app/locales/pt.ts
  • app/locales/no.ts
  • app/locales/jp.ts
  • app/locales/id.ts
  • app/locales/ar.ts
  • app/locales/da.ts
  • app/locales/sk.ts
  • app/locales/en.ts
  • app/locales/ru.ts
  • app/locales/tw.ts
  • app/locales/de.ts
  • app/locales/tr.ts
  • app/locales/bn.ts
  • app/locales/cn.ts
  • app/locales/es.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • app/utils.ts
  • app/components/chat.tsx
  • app/store/config.ts
🔇 Additional comments (3)
app/client/platforms/alibaba.ts (3)

14-14: LGTM! Interface properly extended for function calling support.

The addition of FunctionToolItem import and the new tools and enable_search fields in RequestParam correctly support the function calling capabilities mentioned in the PR objectives.

Also applies to: 60-61


70-70: Good optimization! AudioContext reuse implemented.

Adding the private audioContext property enables reuse of the AudioContext instance across multiple audio conversions, addressing the performance concern from previous reviews.


203-203: LGTM! Function calling and network search properly integrated.

The modifications correctly implement:

  • Network search capability via enable_search parameter
  • Function calling support with proper tools integration
  • Safe argument concatenation to prevent undefined errors

Also applies to: 236-240, 278-278
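For orientation, a hedged sketch of the extended parameter shape (tools and enable_search are the fields added by this PR; the other fields are assumptions about the pre-existing RequestParam):

```ts
import { FunctionToolItem } from "../../store/plugin";

interface RequestParam {
  result_format?: string;
  incremental_output?: boolean;
  temperature?: number;
  tools?: FunctionToolItem[]; // function-calling tool definitions
  enable_search?: boolean; // DashScope web-search toggle from the UI
}
```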

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
app/client/platforms/alibaba.ts (2)

101-103: The previous review comment about documenting the unimplemented speech method still applies.

The speech method continues to throw an unimplemented error without explanation. Consider adding a comment explaining that Alibaba only supports streaming speech via streamSpeech method.


105-165: Previous suggestions for input validation, JSON parsing safety, and dynamic timeout haven't been implemented.

The streamSpeech method still lacks:

  1. Input parameter validation for required fields
  2. Safe JSON parsing with try-catch for malformed SSE data
  3. Dynamic timeout using getTimeoutMSByModel instead of hardcoded REQUEST_TIMEOUT_MS

These improvements would enhance robustness and prevent runtime errors.

🧹 Nitpick comments (1)
app/client/platforms/alibaba.ts (1)

354-426: Excellent audio processing implementation!

The audio processing methods properly implement:

  • Base64 decoding and PCM-to-AudioBuffer conversion
  • Static AudioContext reuse (addressing previous performance suggestions)
  • Correct 16-bit PCM to 32-bit float conversion with proper bit manipulation
  • Appropriate error handling throughout the pipeline

Consider using English comments for consistency with the rest of the codebase:

-  // 播放 PCM base64 数据
+  // Convert PCM base64 data to AudioBuffer
   private async PCMBase64ToAudioBuffer(base64Data: string) {
     try {
-      // 解码 base64
+      // Decode base64
       const binaryString = atob(base64Data);
       // ... rest of method

-  // 将 PCM 字节数据转换为 AudioBuffer  
+  // Convert PCM byte data to AudioBuffer
   private convertToAudioBuffer(pcmData: Uint8Array) {

-  // 16位 PCM 转 32位浮点数
+  // Convert 16-bit PCM to 32-bit float
   private pcm16ToFloat32(pcmData: Uint8Array) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 044298e and 86f2c67.

📒 Files selected for processing (1)
  • app/client/platforms/alibaba.ts (8 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning:
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because the engine values are constrained to keys that exist in TTS_CONFIGS through the dropdown generation (Object.keys(TTS_CONFIGS)) and validation (TTSConfigValidator.engine()).

Applied to files:

  • app/client/platforms/alibaba.ts
📚 Learning:
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because: 1) TTSEngineType is a union type that exactly matches TTS_CONFIGS keys, 2) dropdown options are generated from Object.keys(TTS_CONFIGS) constraining user input, 3) TTS_CONFIGS is typed as Record<TTSEngineType, TTSConfigItem> guaranteeing key-value correspondence.

Applied to files:

  • app/client/platforms/alibaba.ts
🔇 Additional comments (3)
app/client/platforms/alibaba.ts (3)

2-7: LGTM!

The new imports for REQUEST_TIMEOUT_MS and FunctionToolItem are properly used in the streaming speech and function calling implementations.

Also applies to: 14-14


60-62: LGTM!

The interface extensions properly support function calling (tools) and network search (enable_search) features. The static audioContext property addresses the previous performance optimization suggestion.

Also applies to: 70-70


245-245: Passing an Empty Array for tools Is Safe

The streamWithThink signature in app/utils/chat.ts clearly defines tools: any[]. Supplying [] simply omits the tools field in the request payload (it only includes tools when the array is non-empty), which matches the intended “no tools” behavior. No further changes are needed.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
app/components/chat.tsx (1)

1307-1307: Unused state variable detected.

The speechCooldown state is declared but doesn't appear to be used anywhere in the visible code. Consider removing it if it's not needed or implement its usage if it serves a purpose.

-  const [speechCooldown, setSpeechCooldown] = useState(false);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86f2c67 and 9cb7275.

📒 Files selected for processing (1)
  • app/components/chat.tsx (6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning:
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because the engine values are constrained to keys that exist in TTS_CONFIGS through the dropdown generation (Object.keys(TTS_CONFIGS)) and validation (TTSConfigValidator.engine()).

Applied to files:

  • app/components/chat.tsx
📚 Learning:
Learnt from: Little-LittleProgrammer
PR: ChatGPTNextWeb/NextChat#6588
File: app/components/tts-config.tsx:90-90
Timestamp: 2025-07-31T09:13:27.479Z
Learning: In app/components/tts-config.tsx, the non-null assertion operator (!) on TTS_CONFIGS[props.ttsConfig.engine] is safe because: 1) TTSEngineType is a union type that exactly matches TTS_CONFIGS keys, 2) dropdown options are generated from Object.keys(TTS_CONFIGS) constraining user input, 3) TTS_CONFIGS is typed as Record<TTSEngineType, TTSConfigItem> guaranteeing key-value correspondence.

Applied to files:

  • app/components/chat.tsx
🧬 Code Graph Analysis (1)
app/components/chat.tsx (6)
app/utils.ts (1)
  • canUseNetWork (299-306)
app/client/api.ts (2)
  • ClientApi (137-230)
  • config (186-186)
app/store/access.ts (1)
  • edgeVoiceName (169-173)
app/utils/ms_edge_tts.ts (1)
  • MsEdgeTTS (119-391)
app/components/ui-lib.tsx (1)
  • showToast (232-256)
app/utils/format.ts (1)
  • prettyObject (1-13)
🔇 Additional comments (6)
app/components/chat.tsx (6)

51-51: LGTM!

The new imports for NetWorkIcon and canUseNetWork utility function are correctly added and necessary for the network toggle feature.

Also applies to: 79-79


515-515: LGTM!

The network toggle implementation follows the established pattern used by other toggles in this component. State management and session updates are handled correctly.

Also applies to: 525-530


845-853: LGTM!

The network toggle UI implementation is consistent with other chat actions. Conditional rendering based on model capability and proper localization are correctly implemented.


1316-1316: Good refactoring of TTS configuration handling.

The dynamic API client initialization based on ttsConfig.modelProvider and the conditional engine handling improve the flexibility and maintainability of the TTS system.

Also applies to: 1319-1319, 1322-1322


1323-1330: LGTM!

The Edge TTS implementation is well-structured with proper async handling and consistent use of the MsEdgeTTS API. The use of the playSpeech helper function improves code organization.


1370-1382: Excellent refactoring with the playSpeech helper function.

The extraction of common speech playback logic into a dedicated helper function improves code organization and maintainability. Error handling and state management are properly implemented.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🔭 Outside diff range comments (2)
app/client/platforms/alibaba.ts (2)

236-252: Likely bug: tools passed as [] into streamWithThink.

You attach tools to requestPayload.parameters.tools, but pass an empty array to streamWithThink. This may break the tool execution loop on subsequent turns.

         return streamWithThink(
           chatPath,
           requestPayload,
           headers,
-          [],
+          _tools,
           funcs,
           controller,

255-285: Make SSE parsing in chat resilient to malformed JSON and guard runTools index.

A bad chunk will currently throw and break the stream. Also guard when updating arguments by index.

-          (text: string, runTools: ChatMessageTool[]) => {
+          (text: string, runTools: ChatMessageTool[]) => {
             // console.log("parseSSE", text, runTools);
-            const json = JSON.parse(text);
+            let json: any;
+            try {
+              json = JSON.parse(text);
+            } catch (e) {
+              console.warn("[Alibaba Chat] Bad SSE JSON chunk:", e, text);
+              return { isThinking: false, content: "" };
+            }
-            const choices = json.output.choices as Array<{
+            const choices = json?.output?.choices as Array<{
               message: {
                 content: string | null | MultimodalContentForAlibaba[];
                 tool_calls: ChatMessageTool[];
                 reasoning_content: string | null;
               };
             }>;
@@
-            if (tool_calls?.length > 0) {
+            if (tool_calls?.length > 0) {
               const index = tool_calls[0]?.index;
               const id = tool_calls[0]?.id;
               const args = tool_calls[0]?.function?.arguments;
               if (id) {
                 runTools.push({
                   id,
                   type: tool_calls[0]?.type,
                   function: {
                     name: tool_calls[0]?.function?.name as string,
                     arguments: args,
                   },
                 });
               } else {
-                // @ts-ignore
-                runTools[index]["function"]["arguments"] += args || "";
+                if (
+                  typeof index === "number" &&
+                  runTools[index] &&
+                  runTools[index].function
+                ) {
+                  // @ts-ignore
+                  runTools[index]["function"]["arguments"] += args || "";
+                } else {
+                  console.warn(
+                    "[Alibaba Chat] tool_calls missing id and invalid index:",
+                    index,
+                  );
+                }
               }
             }
♻️ Duplicate comments (1)
app/client/platforms/alibaba.ts (1)

96-98: Implement or explicitly document the unimplemented speech() method.

Still throws. If Alibaba only supports streaming, add a clear comment and guide callers to use streamSpeech to avoid runtime surprises.

 async speech(options: SpeechOptions): Promise<ArrayBuffer> {
-  throw new Error("Method not implemented.");
+  // Note: Alibaba Qwen TTS supports streaming via streamSpeech only.
+  // Non-streaming speech is not supported by this provider.
+  throw new Error("Alibaba speech() is not supported. Use streamSpeech().");
 }
🧹 Nitpick comments (2)
app/client/platforms/alibaba.ts (2)

388-419: Optional: use copyToChannel for simpler, faster buffer writes.

Modern browsers support AudioBuffer.copyToChannel, avoiding manual loops.

-        // 复制数据到 AudioBuffer
-        for (let channel = 0; channel < channels; channel++) {
-          const channelData = audioBuffer.getChannelData(channel);
-          for (let i = 0; i < channelData.length; i++) {
-            channelData[i] = float32Array[i * channels + channel];
-          }
-        }
+        // 复制数据到 AudioBuffer
+        audioBuffer.copyToChannel(float32Array, 0);

389-432: Assumption check: 16-bit little-endian PCM.

pcm16ToFloat32 assumes 16-bit LE PCM with 1 channel. If server outputs different bit depth/channels, decoding will be wrong. Consider plumbing bit depth/channels from the API or response meta, or document the assumption inline.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45eb96f and 800c96c.

📒 Files selected for processing (1)
  • app/client/platforms/alibaba.ts (8 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/client/platforms/alibaba.ts (6)
app/store/plugin.ts (1)
  • FunctionToolItem (25-32)
app/client/api.ts (1)
  • SpeechOptions (67-74)
app/constant.ts (1)
  • Alibaba (227-236)
app/utils.ts (2)
  • getTimeoutMSByModel (308-320)
  • fetch (368-376)
app/utils/stream.ts (1)
  • fetch (22-108)
app/utils/chat.ts (1)
  • streamWithThink (392-667)
🔇 Additional comments (2)
app/client/platforms/alibaba.ts (2)

55-57: Confirm server contract for new parameters (tools, enable_search).

Ensure Alibaba endpoint expects tools under parameters.tools and the network flag as enable_search (snake_case). If the API expects different nesting or naming, requests may be ignored or rejected.

Would you like a quick repo scan to confirm other platforms use the same shape, or to gate this behind a capability flag?


65-65: Good: Reusing a single AudioContext instance.

This addresses browser limits and CPU overhead from multiple contexts. LGTM.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
package.json (1)

96-100: Yarn versions are now consistent across packageManager and Volta.

Good fix—both are set to 1.22.19, resolving the earlier mismatch noted in previous reviews.

🧹 Nitpick comments (2)
package.json (2)

96-100: Add engines to communicate required Node/Yarn to non-Volta environments.

Volta enforces versions locally/CI if used, but adding engines helps external contributors and hosting platforms align tooling.

Apply this diff:

   "volta": {
     "node": "20.19.4",
     "yarn": "1.22.19"
-  }
+  },
+  "engines": {
+    "node": ">=20.19.4",
+    "yarn": "^1.22.19 <2"
+  }

96-100: Yarn configuration mismatch: Yarn 1.x pinned but .yarnrc.yml (Berry-only) still present

We’ve confirmed that:

  • package.json pins Yarn Classic v1.22.19 (both packageManager and Volta)
  • A .yarnrc.yml exists with the nodeLinker setting (Berry-only)

Yarn 1 ignores .yarnrc.yml, which can confuse contributors. Please choose one of these paths:

Option A (recommended): Migrate to Yarn Berry (3.x+)
– Update package.json to pin a Berry version, e.g.
```diff
-  "packageManager": "yarn@1.22.19",
+  "packageManager": "yarn@3.6.0",
   "volta": {
     "node": "20.19.4",
-    "yarn": "1.22.19"
+    "yarn": "3.6.0"
   }
```

– Keep your existing .yarnrc.yml (Berry will use it)
    
    

Option B: Stay on Yarn Classic (1.x)
– Remove or rename .yarnrc.yml → use a classic .yarnrc instead
– Drop Berry-only keys (nodeLinker, yarnPath, etc.)

Let me know which direction you’d like, and I can draft the necessary changes.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 800c96c and 16c3255.

📒 Files selected for processing (1)
  • package.json (1 hunks)

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/client/platforms/alibaba.ts (1)

274-339: Wrap JSON.parse in try/catch inside parseSSE callback

Malformed or partial SSE chunks can throw and break the stream. Return an empty slice on parse error.

-            const json = JSON.parse(text);
+            let json: any;
+            try {
+              json = JSON.parse(text);
+            } catch (e) {
+              console.warn("[Alibaba SSE] JSON parse error:", e);
+              return { isThinking: false, content: "" };
+            }
♻️ Duplicate comments (2)
app/components/chat.tsx (1)

1335-1363: Streaming TTS: use English comment and show user-friendly error toast
The Chinese comment persists and the toast leaks technical details to users. Keep console logs technical; show a concise user-facing message.

-        if (api.llm.streamSpeech) {
-          // 使用流式播放,边接收边播放
+        if (api.llm.streamSpeech) {
+          // Use streaming playback for real-time audio
           setSpeechStatus(true);
           ttsPlayer.startStreamPlay(() => {
             setSpeechStatus(false);
           });
@@
-          } catch (e) {
-            console.error("[Stream Speech]", e);
-            showToast(prettyObject(e));
+          } catch (e) {
+            console.error("[Stream Speech]", e);
+            showToast("Speech generation failed. Please try again.");
             setSpeechStatus(false);
             ttsPlayer.stop();
           } finally {
             setSpeechLoading(false);
           }
app/client/platforms/alibaba.ts (1)

96-99: Unimplemented speech() should be documented or delegate to streamSpeech

speech() throws without context. This can surprise integrators when streamSpeech is unavailable. Either implement speech via a non-SSE endpoint (if supported) or document that Alibaba TTS is streaming-only and instruct callers to use streamSpeech.

   async speech(options: SpeechOptions): Promise<ArrayBuffer> {
-    throw new Error("Method not implemented.");
+    // Alibaba Qwen currently provides TTS via SSE streaming.
+    // Use streamSpeech(options) for incremental playback.
+    // Non-streaming speech synthesis is not supported at the moment.
+    throw new Error("[Alibaba TTS] Non-streaming speech is not supported; use streamSpeech instead.");
   }
🧹 Nitpick comments (7)
app/utils/audio.ts (3)

39-40: Tighten type of currentOnended

(() => void | null) | null is odd—either a function or null is sufficient. Use (() => void) | null for clarity.

-  private currentOnended: (() => void | null) | null = null;
+  private currentOnended: (() => void) | null = null;

115-120: Consider suspending instead of closing AudioContext to reduce churn

Closing the AudioContext on every stop forces a new context for subsequent plays and may hit browser limits. Suspending is often enough and cheaper; only close when tearing down the app.

If you want to keep close() as a hard stop, consider adding a separate destroy() and make stop() only stop playback and suspend the context.
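A hedged sketch of that split, with assumed field names (stop() stays cheap and resumable; destroy() is the hard teardown):

```ts
class AudioPlayerSketch {
  private audioContext: AudioContext | null = null;
  private currentSource: AudioBufferSourceNode | null = null;
  private playQueue: AudioBuffer[] = [];

  // Stop playback and suspend the context; it can be resumed later.
  async stop(): Promise<void> {
    this.currentSource?.stop();
    this.currentSource = null;
    this.playQueue.length = 0;
    await this.audioContext?.suspend();
  }

  // Hard teardown for app shutdown; the context cannot be reused after.
  async destroy(): Promise<void> {
    await this.stop();
    await this.audioContext?.close();
    this.audioContext = null;
  }
}
```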


122-132: Stream wait loop: prefer await over recursive setTimeout chain

playNext uses setTimeout + recursion every 100ms while waiting for streamed chunks. This can grow the call stack and complicate testing. A small async delay keeps the stack flat.

-      if (this.isStreamMode && !this.streamFinished) {
-        setTimeout(() => this.playNext(), 100);
-        return;
-      }
+      if (this.isStreamMode && !this.streamFinished) {
+        await new Promise((r) => setTimeout(r, 100));
+        return this.playNext();
+      }
app/components/chat.tsx (3)

1376-1388: Non-streaming playback: show user-friendly error toast and avoid leaking internals

Mirror the streaming branch behavior: keep console.error, present a clean toast.

   function playSpeech(audioBuffer: ArrayBuffer | AudioBuffer) {
     setSpeechStatus(true);
     ttsPlayer
       .play(audioBuffer, () => {
         setSpeechStatus(false);
       })
       .catch((e) => {
         console.error("[OpenAI Speech]", e);
-        showToast(prettyObject(e));
+        showToast("Speech playback failed. Please try again.");
         setSpeechStatus(false);
       })
       .finally(() => setSpeechLoading(false));
   }

1310-1311: Remove unused speechCooldown state

The speechCooldown state is declared but not used. Drop it to reduce noise.

-  const [speechCooldown, setSpeechCooldown] = useState(false);

848-856: Naming consistency: NetWork → Network

enableNetWork and canUseNetWork are inconsistent with common naming (“Network”). Consider renaming in a future cleanup to reduce confusion. Not blocking this PR.

Also applies to: 710-713, 515-531

app/client/platforms/alibaba.ts (1)

261-304: Tool-calls accumulation: guard array access and undefined args

When accumulating streamed tool arguments, runTools[index] might be undefined and args can be falsy. Guard both to avoid runtime errors.

-            if (tool_calls?.length > 0) {
+            if (tool_calls?.length > 0) {
               const index = tool_calls[0]?.index;
               const id = tool_calls[0]?.id;
               const args = tool_calls[0]?.function?.arguments;
               if (id) {
                 runTools.push({
                   id,
                   type: tool_calls[0]?.type,
                   function: {
                     name: tool_calls[0]?.function?.name as string,
                     arguments: args,
                   },
                 });
               } else {
-                // @ts-ignore
-                runTools[index]["function"]["arguments"] += args || "";
+                if (
+                  typeof index === "number" &&
+                  runTools[index] &&
+                  runTools[index]["function"]
+                ) {
+                  // @ts-ignore
+                  runTools[index]["function"]["arguments"] += args || "";
+                }
               }
             }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 16c3255 and bf999b9.

📒 Files selected for processing (4)
  • app/client/api.ts (2 hunks)
  • app/client/platforms/alibaba.ts (6 hunks)
  • app/components/chat.tsx (7 hunks)
  • app/utils/audio.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/client/api.ts
🧰 Additional context used
🧬 Code graph analysis (2)
app/client/platforms/alibaba.ts (5)
app/store/plugin.ts (1)
  • FunctionToolItem (25-32)
app/client/api.ts (1)
  • SpeechOptions (68-75)
app/utils/audio.ts (1)
  • TTSPlayManager (34-286)
app/constant.ts (1)
  • Alibaba (227-236)
app/utils.ts (2)
  • getTimeoutMSByModel (308-320)
  • fetch (368-376)
app/components/chat.tsx (6)
app/utils.ts (1)
  • canUseNetWork (299-306)
app/client/api.ts (2)
  • ClientApi (141-234)
  • config (190-190)
app/store/access.ts (1)
  • edgeVoiceName (169-173)
app/utils/ms_edge_tts.ts (1)
  • MsEdgeTTS (119-391)
app/components/ui-lib.tsx (1)
  • showToast (232-256)
app/utils/format.ts (1)
  • prettyObject (1-13)
🔇 Additional comments (2)
app/components/chat.tsx (1)

848-856: Network toggle locale keys verified
All app/locales/*.ts files define Chat.InputActions.NetWork.on and …off, so Locale.Chat.InputActions.NetWork[enableNetWork ? "on" : "off"] cannot be undefined at runtime. No changes required.

app/client/platforms/alibaba.ts (1)

218-231: Enable network search field integration looks good

Passing enable_search from modelConfig.enableNetWork into parameters matches the UI toggle and keeps the request payload minimal when false. LGTM.

Comment on lines +100 to +122

```ts
  async *streamSpeech(
    options: SpeechOptions,
    audioManager?: TTSPlayManager,
  ): AsyncGenerator<AudioBuffer> {
    if (!options.input || !options.model) {
      throw new Error("Missing required parameters: input and model");
    }
    const requestPayload = {
      model: options.model,
      input: {
        text: options.input,
        voice: options.voice,
      },
      speed: options.speed,
      response_format: options.response_format,
    };
    const controller = new AbortController();
    options.onController?.(controller);

    if (audioManager) {
      audioManager.setStreamController(controller);
    }
    try {
```

⚠️ Potential issue

audioManager is optional but used with non-null assertion; guard or make it required

streamSpeech() references audioManager! which will throw if undefined. Either require the param or create a local TTSPlayManager fallback so decoding still works.

-  async *streamSpeech(
-    options: SpeechOptions,
-    audioManager?: TTSPlayManager,
-  ): AsyncGenerator<AudioBuffer> {
+  async *streamSpeech(
+    options: SpeechOptions,
+    audioManager?: TTSPlayManager,
+  ): AsyncGenerator<AudioBuffer> {
@@
-    if (audioManager) {
-      audioManager.setStreamController(controller);
-    }
+    const player = audioManager ?? new TTSPlayManager();
+    player.setStreamController(controller);
@@
-              if (json.output?.audio?.data) {
-                yield await audioManager!.pcmBase64ToAudioBuffer(
-                  json.output.audio.data,
-                  { channels: 1, sampleRate: 24000, bitDepth: 16 },
-                );
-              }
+              if (json.output?.audio?.data) {
+                const sr = json.output?.audio?.sample_rate ?? 24000;
+                yield await player.pcmBase64ToAudioBuffer(
+                  json.output.audio.data,
+                  { channels: 1, sampleRate: sr, bitDepth: 16 },
+                );
+              }
@@
-      if (audioManager) {
-        audioManager.clearStreamController();
-      }
+      player.clearStreamController();

Also applies to: 161-165, 185-189

🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 100 to 122 (and similarly at
161-165 and 185-189), streamSpeech currently uses audioManager with a non-null
assertion which will throw if the optional parameter is undefined; update the
function to either (a) require audioManager by making it a mandatory parameter,
or (b) guard every usage by checking if audioManager exists before calling
methods (e.g., if (audioManager) audioManager.setStreamController(controller)),
and provide a local no-op or fallback TTSPlayManager instance for
decoding/playback when audioManager is not provided so the function safely
handles undefined without runtime errors.

Comment on lines +124 to +132
const speechPayload = {
method: "POST",
body: JSON.stringify(requestPayload),
signal: controller.signal,
headers: {
...getHeaders(),
"X-DashScope-SSE": "enable",
},
};

🛠️ Refactor suggestion

Harden SSE request: set headers, check res.ok/body, and clear timeout on all paths

Missing Accept/Content-Type, no res.ok check, and no guard for res.body. Also ensure timeout is cleared in finally.

       const speechPayload = {
         method: "POST",
         body: JSON.stringify(requestPayload),
         signal: controller.signal,
         headers: {
           ...getHeaders(),
           "X-DashScope-SSE": "enable",
+          Accept: "text/event-stream",
+          "Content-Type": "application/json",
         },
       };
@@
-      const res = await fetch(speechPath, speechPayload);
-      clearTimeout(requestTimeoutId); // Clear timeout on successful connection
+      const res = await fetch(speechPath, speechPayload);
+      if (!res.ok) {
+        const errText = await res.text().catch(() => "");
+        throw new Error(
+          `[Alibaba TTS] HTTP ${res.status} ${res.statusText} ${errText}`,
+        );
+      }
+      if (!res.body) {
+        throw new Error("[Alibaba TTS] Missing response body for SSE stream.");
+      }

And move timeout cleanup into finally (see next comment).

Also applies to: 140-146

🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 124-132 (and similarly for lines
140-146), the SSE request is missing explicit Accept and Content-Type headers,
does not check res.ok or guard against a null res.body, and does not clear the
timeout on all code paths; update the speechPayload.headers to include "Accept":
"text/event-stream" and "Content-Type": "application/json" (or appropriate
content type), perform the fetch then check if (!res.ok) throw or handle the
error before proceeding, ensure you verify res.body exists before using it
(guard null), and move timeout cleanup into a finally block so the controller
timeout is cleared regardless of success or error.

Comment on lines +135 to +176

```ts
      const requestTimeoutId = setTimeout(
        () => controller.abort(),
        getTimeoutMSByModel(options.model),
      );

      const res = await fetch(speechPath, speechPayload);
      clearTimeout(requestTimeoutId); // Clear timeout on successful connection

      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let buffer = "";
      while (true) {
        const { done, value } = await reader.read();
        if (done) {
          break;
        }
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() || "";

        for (const line of lines) {
          const data = line.slice(5);
          try {
            if (line.startsWith("data:")) {
              const json = JSON.parse(data);
              if (json.output?.audio?.data) {
                yield await audioManager!.pcmBase64ToAudioBuffer(
                  json.output.audio.data,
                  { channels: 1, sampleRate: 24000, bitDepth: 16 },
                );
              }
            }
          } catch (parseError) {
            console.warn(
              "[StreamSpeech] Failed to parse SSE data:",
              parseError,
            );
            continue;
          }
        }
      }
      reader.releaseLock();
```

🛠️ Refactor suggestion

Always release reader and clear timeout (finally), and improve SSE parsing

Ensure requestTimeoutId and reader are cleaned up on all paths. Also, only slice when the line starts with data:, and handle the [DONE] sentinel. Keep parsing in try/catch.

```diff
-      const requestTimeoutId = setTimeout(
+      let requestTimeoutId: any = setTimeout(
         () => controller.abort(),
         getTimeoutMSByModel(options.model),
       );
-
-      const res = await fetch(speechPath, speechPayload);
-      const reader = res.body!.getReader();
+      const res = await fetch(speechPath, speechPayload);
+      let reader: ReadableStreamDefaultReader<Uint8Array> | undefined;
+      reader = res.body!.getReader();
       const decoder = new TextDecoder();
       let buffer = "";
       while (true) {
         const { done, value } = await reader.read();
         if (done) {
           break;
         }
         buffer += decoder.decode(value, { stream: true });
         const lines = buffer.split("\n");
         buffer = lines.pop() || "";

-        for (const line of lines) {
-          const data = line.slice(5);
+        for (const rawLine of lines) {
+          const line = rawLine.trim();
           try {
-            if (line.startsWith("data:")) {
-              const json = JSON.parse(data);
+            if (!line.startsWith("data:")) continue;
+            const data = line.slice(5).trim();
+            if (data === "[DONE]") {
+              // end-of-stream marker
+              break;
+            }
+            const json = JSON.parse(data);
             if (json.output?.audio?.data) {
-              yield await audioManager!.pcmBase64ToAudioBuffer(
-                json.output.audio.data,
-                { channels: 1, sampleRate: 24000, bitDepth: 16 },
-              );
+              const sr = json.output?.audio?.sample_rate ?? 24000;
+              yield await audioManager!.pcmBase64ToAudioBuffer(
+                json.output.audio.data,
+                { channels: 1, sampleRate: sr, bitDepth: 16 },
+              );
             }
           } catch (parseError) {
             console.warn(
               "[StreamSpeech] Failed to parse SSE data:",
               parseError,
             );
             continue;
           }
         }
       }
-      reader.releaseLock();
+      reader?.releaseLock();
@@
     } finally {
-      if (audioManager) {
-        audioManager.clearStreamController();
-      }
+      try {
+        clearTimeout(requestTimeoutId);
+      } catch {}
+      try {
+        // releasing is idempotent; safe to attempt
+        reader?.releaseLock?.();
+      } catch {}
+      try {
+        audioManager?.clearStreamController();
+      } catch {}
     }
```

Also applies to: 185-189

🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 135-176 (and similarly 185-189),
the fetch stream handling doesn't guarantee cleanup and incorrectly slices every
line; wrap the read loop and downstream parsing in a try/finally so you always
clearTimeout(requestTimeoutId) and releaseLock() on the reader regardless of
errors or early returns, check line.startsWith("data:") before slicing and
handle the "[DONE]" sentinel (break or return when encountered), and keep
JSON.parse inside a try/catch around each data segment so parsing errors for one
event don't break the stream.
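
For illustration, here is a self-contained sketch of the cleanup shape described above. `readSseAudio` and `decodePcmChunk` are hypothetical names (the latter standing in for `pcmBase64ToAudioBuffer`), and the timeout handle is assumed to come from the caller:

```ts
// Sketch only: an SSE read loop whose cleanup runs on every exit path.
async function* readSseAudio(
  res: Response,
  timeoutId: ReturnType<typeof setTimeout>,
  decodePcmChunk: (b64: string) => Promise<AudioBuffer>,
): AsyncGenerator<AudioBuffer> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop() || "";
      for (const rawLine of lines) {
        const line = rawLine.trim();
        if (!line.startsWith("data:")) continue;
        const data = line.slice(5).trim();
        if (data === "[DONE]") return; // end-of-stream sentinel
        try {
          const json = JSON.parse(data);
          if (json.output?.audio?.data) {
            yield await decodePcmChunk(json.output.audio.data);
          }
        } catch (e) {
          console.warn("[StreamSpeech] bad SSE chunk:", e); // skip one bad event
        }
      }
    }
  } finally {
    clearTimeout(timeoutId); // runs on success, error, and early generator return
    reader.releaseLock();
  }
}
```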

Comment on lines 1312 to 1316
```ts
  async function openaiSpeech(text: string) {
    if (speechStatus) {
      ttsPlayer.stop();
      setSpeechStatus(false);
    } else {
```
🛠️ Refactor suggestion

Await ttsPlayer.stop() to match the async API and ensure cleanup before toggling state

After aligning TTSPlayer.stop to return a Promise<void>, await it here to avoid races (e.g., stop finishing after the UI has already flipped the status).

```diff
-    if (speechStatus) {
-      ttsPlayer.stop();
+    if (speechStatus) {
+      await ttsPlayer.stop();
       setSpeechStatus(false);
     } else {
```
📝 Committable suggestion


Suggested change
```diff
   async function openaiSpeech(text: string) {
     if (speechStatus) {
-      ttsPlayer.stop();
+      await ttsPlayer.stop();
       setSpeechStatus(false);
     } else {
```
🤖 Prompt for AI Agents
In app/components/chat.tsx around lines 1312 to 1316, the code calls
ttsPlayer.stop() without awaiting it which can race with state toggling; update
the call to await ttsPlayer.stop() (inside an async function) before calling
setSpeechStatus(false) so the stop operation completes and any cleanup finishes
before the UI state is flipped.

```ts
  startStreamPlay: (onended: () => void | null) => void;
  finishStreamPlay: () => void;
  setStreamController: (controller: AbortController) => void;
  clearStreamController: () => void;
  stop: () => void;
```
⚠️ Potential issue

Interface vs implementation mismatch: stop() is async but typed as sync

TTSPlayManager.stop is async (returns a Promise<void>) but the TTSPlayer type declares stop: () => void. This breaks “implements TTSPlayer” type checking and can leak unhandled promises at call sites.

Apply this diff to align the public API:

```diff
 type TTSPlayer = {
   init: () => void;
@@
-  stop: () => void;
+  stop: () => Promise<void>;
```

Also applies to: 86-121

🤖 Prompt for AI Agents
In app/utils/audio.ts around lines 16 and 86-121, the TTSPlayer interface
declares stop: () => void but the concrete TTSPlayManager.stop implementation is
async (returns Promise<void>), causing an interface/implementation mismatch and
potential unhandled promises; change the TTSPlayer.stop signature to stop: () =>
Promise<void>, update any related type aliases/exports, ensure
TTSPlayManager.stop remains async or explicitly returns a Promise<void>, and
update all call sites to await or handle the returned promise so the public API
and implementations align.
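
For reference, a minimal sketch of what an awaitable stop() could look like, assuming a per-instance source node and a static AudioContext as described in the comments above (names mirror the review, not the actual implementation):

```ts
// Sketch only: stop() resolves after the source node and context are torn down.
class TTSPlayManagerSketch {
  private static audioContext: AudioContext | null = null;
  private audioBufferSourceNode: AudioBufferSourceNode | null = null;

  async stop(): Promise<void> {
    if (this.audioBufferSourceNode) {
      this.audioBufferSourceNode.onended = null; // don't fire onended during teardown
      try {
        this.audioBufferSourceNode.stop();
      } catch {} // stop() throws if the source never started
      this.audioBufferSourceNode.disconnect();
      this.audioBufferSourceNode = null;
    }
    if (TTSPlayManagerSketch.audioContext) {
      await TTSPlayManagerSketch.audioContext.close(); // resolves once released
      TTSPlayManagerSketch.audioContext = null;
    }
  }
}
```

Call sites can then `await ttsPlayer.stop()` before flipping UI state, as the earlier comment suggests.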

Comment on lines +27 to +33
```ts
// Audio processing utilities
export interface PCMConfig {
  channels?: number;
  sampleRate?: number;
  bitDepth?: 16 | 24 | 32;
}
```


🛠️ Refactor suggestion

Declare only supported bit depths (16-bit) or implement 24/32

PCMConfig advertises 16 | 24 | 32, but pcmDataToAudioBuffer only implements 16-bit and throws otherwise. Either narrow the type to 16 or finish the 24/32-bit implementations to avoid a misleading API.

Minimal change (narrow the API):

```diff
 export interface PCMConfig {
   channels?: number;
   sampleRate?: number;
-  bitDepth?: 16 | 24 | 32;
+  bitDepth?: 16;
 }
```

Optionally, leave the runtime guard in place for defense-in-depth:

```diff
-        switch (bitDepth) {
-          case 16:
-            float32Array = this.pcm16ToFloat32(pcmData);
-            break;
-          default:
-            throw new Error(`Unsupported bit depth: ${bitDepth}`);
-        }
+        if (bitDepth !== 16) {
+          throw new Error(`Unsupported bit depth: ${bitDepth}`);
+        }
+        float32Array = this.pcm16ToFloat32(pcmData);
```

Also applies to: 242-249
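
If the API is narrowed to 16-bit only, the one conversion that has to be right is int16 to float32. A minimal sketch under assumed semantics (the repo's `pcm16ToFloat32` is not shown in this thread):

```ts
// Sketch: convert 16-bit little-endian PCM to normalized Float32 samples.
function pcm16ToFloat32(pcm: ArrayBuffer): Float32Array {
  const view = new DataView(pcm);
  const sampleCount = Math.floor(pcm.byteLength / 2); // 2 bytes per 16-bit sample
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    const s = view.getInt16(i * 2, true); // little-endian
    out[i] = s < 0 ? s / 32768 : s / 32767; // map [-32768, 32767] into [-1, 1]
  }
  return out;
}
```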

Comment on lines +60 to +84
```diff
   async play(
     audioBuffer: ArrayBuffer | AudioBuffer,
     onended: () => void | null,
   ) {
     if (this.audioBufferSourceNode) {
       this.audioBufferSourceNode.stop();
       this.audioBufferSourceNode.disconnect();
     }
     let buffer: AudioBuffer;
     if (audioBuffer instanceof AudioBuffer) {
       buffer = audioBuffer;
     } else {
       buffer = await TTSPlayManager.audioContext!.decodeAudioData(audioBuffer);
     }
     this.audioBufferSourceNode =
       TTSPlayManager.audioContext!.createBufferSource();
     this.audioBufferSourceNode.buffer = buffer;
     this.audioBufferSourceNode.connect(
       TTSPlayManager.audioContext!.destination,
     );
     this.getAudioContext.resume().then(() => {
       this.audioBufferSourceNode!.start();
     });
-      audioBufferSourceNode.onended = onended;
-    };
+    this.audioBufferSourceNode.onended = onended;
   }
```
🛠️ Refactor suggestion

Use the initialized AudioContext via the getter to avoid null deref

play() uses TTSPlayManager.audioContext directly for decodeAudioData/createBufferSource. If init() wasn't called, or the context was closed by stop(), this can be null. Always go through the getter, which lazily initializes the context.

```diff
   async play(
     audioBuffer: ArrayBuffer | AudioBuffer,
     onended: () => void | null,
   ) {
@@
-    let buffer: AudioBuffer;
-    if (audioBuffer instanceof AudioBuffer) {
-      buffer = audioBuffer;
-    } else {
-      buffer = await TTSPlayManager.audioContext!.decodeAudioData(audioBuffer);
-    }
-    this.audioBufferSourceNode =
-      TTSPlayManager.audioContext!.createBufferSource();
+    const ctx = this.getAudioContext;
+    let buffer: AudioBuffer;
+    if (audioBuffer instanceof AudioBuffer) {
+      buffer = audioBuffer;
+    } else {
+      buffer = await ctx.decodeAudioData(audioBuffer);
+    }
+    this.audioBufferSourceNode = ctx.createBufferSource();
     this.audioBufferSourceNode.buffer = buffer;
-    this.audioBufferSourceNode.connect(
-      TTSPlayManager.audioContext!.destination,
-    );
-    this.getAudioContext.resume().then(() => {
+    this.audioBufferSourceNode.connect(ctx.destination);
+    ctx.resume().then(() => {
       this.audioBufferSourceNode!.start();
     });
     this.audioBufferSourceNode.onended = onended;
   }
```
📝 Committable suggestion


Suggested change
```diff
   async play(
     audioBuffer: ArrayBuffer | AudioBuffer,
     onended: () => void | null,
   ) {
     if (this.audioBufferSourceNode) {
       this.audioBufferSourceNode.stop();
       this.audioBufferSourceNode.disconnect();
     }
-    let buffer: AudioBuffer;
-    if (audioBuffer instanceof AudioBuffer) {
-      buffer = audioBuffer;
-    } else {
-      buffer = await TTSPlayManager.audioContext!.decodeAudioData(audioBuffer);
-    }
-    this.audioBufferSourceNode =
-      TTSPlayManager.audioContext!.createBufferSource();
-    this.audioBufferSourceNode.buffer = buffer;
-    this.audioBufferSourceNode.connect(
-      TTSPlayManager.audioContext!.destination,
-    );
-    this.getAudioContext.resume().then(() => {
-      this.audioBufferSourceNode!.start();
-    });
-    audioBufferSourceNode.onended = onended;
-  };
-    this.audioBufferSourceNode.onended = onended;
-  }
+    const ctx = this.getAudioContext;
+    let buffer: AudioBuffer;
+    if (audioBuffer instanceof AudioBuffer) {
+      buffer = audioBuffer;
+    } else {
+      buffer = await ctx.decodeAudioData(audioBuffer);
+    }
+    this.audioBufferSourceNode = ctx.createBufferSource();
+    this.audioBufferSourceNode.buffer = buffer;
+    this.audioBufferSourceNode.connect(ctx.destination);
+    ctx.resume().then(() => {
+      this.audioBufferSourceNode!.start();
+    });
+    this.audioBufferSourceNode.onended = onended;
+  }
```
🤖 Prompt for AI Agents
In app/utils/audio.ts around lines 60–84, play() currently accesses
TTSPlayManager.audioContext directly which can be null; instead obtain the
initialized context via the existing getter (await this.getAudioContext()),
assign it to a local const (e.g., const ctx = await this.getAudioContext()),
then use ctx.decodeAudioData(...) and ctx.createBufferSource() and connect to
ctx.destination; use ctx.resume() for resuming before starting the source and
replace direct null assertions on TTSPlayManager.audioContext with the local ctx
to avoid null deref.
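
The lazy getter this comment leans on could look like the following sketch, assuming the context is stored statically and may have been closed by stop() (class and member names are illustrative):

```ts
// Sketch: a getter that re-creates the AudioContext when missing or closed.
class AudioContextHolder {
  private static audioContext: AudioContext | null = null;

  get getAudioContext(): AudioContext {
    const ctx = AudioContextHolder.audioContext;
    if (!ctx || ctx.state === "closed") {
      AudioContextHolder.audioContext = new AudioContext();
    }
    return AudioContextHolder.audioContext!;
  }
}
```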

Comment on lines +251 to +264
```ts
        // Create the AudioBuffer
        const audioBuffer = audioContext.createBuffer(
          channels,
          float32Array.length / channels,
          sampleRate,
        );

        // Copy data into the AudioBuffer
        for (let channel = 0; channel < channels; channel++) {
          const channelData = audioBuffer.getChannelData(channel);
          for (let i = 0; i < channelData.length; i++) {
            channelData[i] = float32Array[i * channels + channel];
          }
        }
```
🛠️ Refactor suggestion

Channel/sample alignment: guard against misaligned PCM and avoid fractional frame lengths

If the pcmData length is not divisible by (bytesPerSample * channels), createBuffer will still accept the length, but the interleaving copy will read past the end or skip samples. Compute frameCount explicitly and validate alignment.

```diff
-        const audioBuffer = audioContext.createBuffer(
-          channels,
-          float32Array.length / channels,
-          sampleRate,
-        );
+        if (float32Array.length % channels !== 0) {
+          console.warn(
+            "[TTSPlayManager] PCM length not divisible by channel count; truncating last incomplete frame",
+          );
+        }
+        const frameCount = Math.floor(float32Array.length / channels);
+        const audioBuffer = audioContext.createBuffer(
+          channels,
+          frameCount,
+          sampleRate,
+        );
@@
-        for (let channel = 0; channel < channels; channel++) {
-          const channelData = audioBuffer.getChannelData(channel);
-          for (let i = 0; i < channelData.length; i++) {
-            channelData[i] = float32Array[i * channels + channel];
-          }
-        }
+        for (let channel = 0; channel < channels; channel++) {
+          const channelData = audioBuffer.getChannelData(channel);
+          for (let i = 0; i < frameCount; i++) {
+            channelData[i] = float32Array[i * channels + channel] ?? 0;
+          }
+        }
```
📝 Committable suggestion


Suggested change
```diff
         // Create the AudioBuffer
+        if (float32Array.length % channels !== 0) {
+          console.warn(
+            "[TTSPlayManager] PCM length not divisible by channel count; truncating last incomplete frame",
+          );
+        }
+        const frameCount = Math.floor(float32Array.length / channels);
         const audioBuffer = audioContext.createBuffer(
           channels,
-          float32Array.length / channels,
+          frameCount,
           sampleRate,
         );
         // Copy data into the AudioBuffer
         for (let channel = 0; channel < channels; channel++) {
           const channelData = audioBuffer.getChannelData(channel);
-          for (let i = 0; i < channelData.length; i++) {
-            channelData[i] = float32Array[i * channels + channel];
+          for (let i = 0; i < frameCount; i++) {
+            channelData[i] = float32Array[i * channels + channel] ?? 0;
           }
         }
```
🤖 Prompt for AI Agents
In app/utils/audio.ts around lines 251 to 264, the code creates an AudioBuffer
using float32Array.length / channels without validating alignment, which may
read past the end or skip samples if pcm length isn't divisible by channels;
compute an explicit frameCount = Math.floor(float32Array.length / channels),
validate that float32Array.length === frameCount * channels (or else throw or
log and handle remaining bytes), use frameCount when creating the AudioBuffer
and in the channel copy loops, and ensure the inner copy bounds use frameCount
to avoid reading beyond the float32Array (or pad/truncate as appropriate).
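
As a quick sanity check on the math: with 2 channels, a 1001-sample Float32Array holds 500 whole frames plus one dangling sample, which the guard below truncates. A sketch with a hypothetical helper name:

```ts
// Sketch: compute a whole-frame count before allocating the AudioBuffer.
function frameCountFor(samples: Float32Array, channels: number): number {
  if (samples.length % channels !== 0) {
    // e.g. 1001 interleaved samples over 2 channels -> 500 frames, 1 sample dropped
    console.warn("PCM length not divisible by channel count; truncating partial frame");
  }
  return Math.floor(samples.length / channels);
}
```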
