14 commits
9990a89
feat: add TTS engine config, update the Alibaba speech API, support real-time speech synthesis
Little-LittleProgrammer Jul 30, 2025
c5e6b12
feat: update the speech synthesis API to support streaming playback and multiple audio formats
Little-LittleProgrammer Jul 30, 2025
e836dc0
refactor: remove unneeded TTS config and models, restore the runtime section
Little-LittleProgrammer Jul 30, 2025
221229c
refactor: restore runtime code, remove debug console statements
Little-LittleProgrammer Jul 31, 2025
fe484fd
feat: add audio context management, improve conversion of PCM data to AudioBuffer
Little-LittleProgrammer Jul 31, 2025
4e3f166
feat: support function calling for Alibaba Qwen models
Little-LittleProgrammer Aug 5, 2025
044298e
feat: add web search feature, update related config and multi-language support
Little-LittleProgrammer Aug 5, 2025
86f2c67
feat: address code review feedback, improve audio context management, fix PCM data conversion logic, ensure the timeout is cleared on successful connection
Little-LittleProgrammer Aug 5, 2025
9cb7275
feat: update network config management, fix network state logic when switching themes
Little-LittleProgrammer Aug 5, 2025
45eb96f
feat: restore the web-search setting when selecting a model
Little-LittleProgrammer Aug 5, 2025
b73e65d
refactor: revert plugins.json
Little-LittleProgrammer Aug 8, 2025
800c96c
feat: add error handling for streaming speech synthesis, improve request timeout logic
Little-LittleProgrammer Aug 11, 2025
16c3255
fix: roll yarn back to 1.22.19 to stay consistent with packageManager
Little-LittleProgrammer Aug 21, 2025
bf999b9
feat: enhance audio playback management, add TTSPlayManager class, improve streaming speech synthesis logic, support PCM data and base64 conversion
Little-LittleProgrammer Aug 21, 2025
1 change: 1 addition & 0 deletions .yarnrc.yml
@@ -0,0 +1 @@
nodeLinker: node-modules
7 changes: 6 additions & 1 deletion app/client/api.ts
@@ -25,6 +25,7 @@ import { XAIApi } from "./platforms/xai";
import { ChatGLMApi } from "./platforms/glm";
import { SiliconflowApi } from "./platforms/siliconflow";
import { Ai302Api } from "./platforms/ai302";
import type { TTSPlayManager } from "../utils/audio";

export const ROLES = ["system", "user", "assistant"] as const;
export type MessageRole = (typeof ROLES)[number];
@@ -107,7 +108,11 @@ export interface LLMModelProvider {

export abstract class LLMApi {
abstract chat(options: ChatOptions): Promise<void>;
abstract speech(options: SpeechOptions): Promise<ArrayBuffer>;
abstract speech(options: SpeechOptions): Promise<ArrayBuffer | AudioBuffer>;
abstract streamSpeech?(
options: SpeechOptions,
audioManager?: TTSPlayManager,
): AsyncGenerator<AudioBuffer>;
abstract usage(): Promise<LLMUsage>;
abstract models(): Promise<LLMModel[]>;
}
108 changes: 105 additions & 3 deletions app/client/platforms/alibaba.ts
@@ -6,7 +6,9 @@ import {
useChatStore,
ChatMessageTool,
usePluginStore,
FunctionToolItem,
} from "@/app/store";
import { TTSPlayManager } from "@/app/utils/audio";
import {
preProcessImageContentForAlibabaDashScope,
streamWithThink,
@@ -51,6 +53,8 @@ interface RequestParam {
repetition_penalty?: number;
top_p: number;
max_tokens?: number;
tools?: FunctionToolItem[];
enable_search?: boolean;
}
interface RequestPayload {
model: string;
@@ -89,10 +93,102 @@ export class QwenApi implements LLMApi {
return res?.output?.choices?.at(0)?.message?.content ?? "";
}

speech(options: SpeechOptions): Promise<ArrayBuffer> {
async speech(options: SpeechOptions): Promise<ArrayBuffer> {
throw new Error("Method not implemented.");
}

async *streamSpeech(
options: SpeechOptions,
audioManager?: TTSPlayManager,
): AsyncGenerator<AudioBuffer> {
if (!options.input || !options.model) {
throw new Error("Missing required parameters: input and model");
}
const requestPayload = {
model: options.model,
input: {
text: options.input,
voice: options.voice,
},
speed: options.speed,
response_format: options.response_format,
};
const controller = new AbortController();
options.onController?.(controller);

if (audioManager) {
audioManager.setStreamController(controller);
}
try {
Comment on lines +100 to +122
⚠️ Potential issue

audioManager is optional but used with non-null assertion; guard or make it required

streamSpeech() references audioManager! which will throw if undefined. Either require the param or create a local TTSPlayManager fallback so decoding still works.

-  async *streamSpeech(
-    options: SpeechOptions,
-    audioManager?: TTSPlayManager,
-  ): AsyncGenerator<AudioBuffer> {
+  async *streamSpeech(
+    options: SpeechOptions,
+    audioManager?: TTSPlayManager,
+  ): AsyncGenerator<AudioBuffer> {
@@
-    if (audioManager) {
-      audioManager.setStreamController(controller);
-    }
+    const player = audioManager ?? new TTSPlayManager();
+    player.setStreamController(controller);
@@
-              if (json.output?.audio?.data) {
-                yield await audioManager!.pcmBase64ToAudioBuffer(
-                  json.output.audio.data,
-                  { channels: 1, sampleRate: 24000, bitDepth: 16 },
-                );
-              }
+              if (json.output?.audio?.data) {
+                const sr = json.output?.audio?.sample_rate ?? 24000;
+                yield await player.pcmBase64ToAudioBuffer(
+                  json.output.audio.data,
+                  { channels: 1, sampleRate: sr, bitDepth: 16 },
+                );
+              }
@@
-      if (audioManager) {
-        audioManager.clearStreamController();
-      }
+      player.clearStreamController();

Also applies to: 161-165, 185-189

🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 100 to 122 (and similarly at
161-165 and 185-189), streamSpeech currently uses audioManager with a non-null
assertion which will throw if the optional parameter is undefined; update the
function to either (a) require audioManager by making it a mandatory parameter,
or (b) guard every usage by checking if audioManager exists before calling
methods (e.g., if (audioManager) audioManager.setStreamController(controller)),
and provide a local no-op or fallback TTSPlayManager instance for
decoding/playback when audioManager is not provided so the function safely
handles undefined without runtime errors.
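
For context, pcmBase64ToAudioBuffer lives in app/utils/audio and its body is not part of this diff; a minimal sketch of what such a helper typically does with the Web Audio API — assuming 16-bit little-endian PCM and the option shape used in the calls above — is:

// Hypothetical sketch only; the real TTSPlayManager.pcmBase64ToAudioBuffer may differ.
async function pcmBase64ToAudioBuffer(
  base64: string,
  opts: { channels: number; sampleRate: number; bitDepth: number },
  ctx: AudioContext,
): Promise<AudioBuffer> {
  // Decode base64 into raw PCM bytes.
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  // Assumes opts.bitDepth === 16: interpret bytes as signed 16-bit LE samples.
  const samples = new Int16Array(bytes.buffer, 0, bytes.byteLength / 2);
  const frames = samples.length / opts.channels;
  const buffer = ctx.createBuffer(opts.channels, frames, opts.sampleRate);
  for (let ch = 0; ch < opts.channels; ch++) {
    const channelData = buffer.getChannelData(ch);
    for (let i = 0; i < frames; i++) {
      // Normalize int16 [-32768, 32767] to float32 [-1, 1].
      channelData[i] = samples[i * opts.channels + ch] / 32768;
    }
  }
  return buffer;
}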

const speechPath = this.path(Alibaba.SpeechPath);
const speechPayload = {
method: "POST",
body: JSON.stringify(requestPayload),
signal: controller.signal,
headers: {
...getHeaders(),
"X-DashScope-SSE": "enable",
},
};
Comment on lines +124 to +132
🛠️ Refactor suggestion

Harden SSE request: set headers, check res.ok/body, and clear timeout on all paths

Missing Accept/Content-Type, no res.ok check, and no guard for res.body. Also ensure timeout is cleared in finally.

       const speechPayload = {
         method: "POST",
         body: JSON.stringify(requestPayload),
         signal: controller.signal,
         headers: {
           ...getHeaders(),
           "X-DashScope-SSE": "enable",
+          Accept: "text/event-stream",
+          "Content-Type": "application/json",
         },
       };
@@
-      const res = await fetch(speechPath, speechPayload);
-      clearTimeout(requestTimeoutId); // Clear timeout on successful connection
+      const res = await fetch(speechPath, speechPayload);
+      if (!res.ok) {
+        const errText = await res.text().catch(() => "");
+        throw new Error(
+          `[Alibaba TTS] HTTP ${res.status} ${res.statusText} ${errText}`,
+        );
+      }
+      if (!res.body) {
+        throw new Error("[Alibaba TTS] Missing response body for SSE stream.");
+      }

And move timeout cleanup into finally (see next comment).

Also applies to: 140-146

🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 124-132 (and similarly for lines
140-146), the SSE request is missing explicit Accept and Content-Type headers,
does not check res.ok or guard against a null res.body, and does not clear the
timeout on all code paths; update the speechPayload.headers to include "Accept":
"text/event-stream" and "Content-Type": "application/json" (or appropriate
content type), perform the fetch then check if (!res.ok) throw or handle the
error before proceeding, ensure you verify res.body exists before using it
(guard null), and move timeout cleanup into a finally block so the controller
timeout is cleared regardless of success or error.


// make a fetch request
const requestTimeoutId = setTimeout(
() => controller.abort(),
getTimeoutMSByModel(options.model),
);

const res = await fetch(speechPath, speechPayload);
clearTimeout(requestTimeoutId); // Clear timeout on successful connection

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
const { done, value } = await reader.read();
if (done) {
break;
}
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() || "";

for (const line of lines) {
const data = line.slice(5);
try {
if (line.startsWith("data:")) {
const json = JSON.parse(data);
if (json.output?.audio?.data) {
yield await audioManager!.pcmBase64ToAudioBuffer(
json.output.audio.data,
{ channels: 1, sampleRate: 24000, bitDepth: 16 },
);
}
}
} catch (parseError) {
console.warn(
"[StreamSpeech] Failed to parse SSE data:",
parseError,
);
continue;
}
}
}
reader.releaseLock();
Comment on lines +135 to +176
🛠️ Refactor suggestion

Always release reader and clear timeout (finally), and improve SSE parsing

Ensure requestTimeoutId and reader are cleaned up on all paths. Also, only slice when the line starts with data:, and handle the [DONE] sentinel. Keep parsing in try/catch.

-      const requestTimeoutId = setTimeout(
+      let requestTimeoutId: any = setTimeout(
         () => controller.abort(),
         getTimeoutMSByModel(options.model),
       );
-
-      const res = await fetch(speechPath, speechPayload);
-      const reader = res.body!.getReader();
+      const res = await fetch(speechPath, speechPayload);
+      let reader: ReadableStreamDefaultReader<Uint8Array> | undefined;
+      reader = res.body!.getReader();
       const decoder = new TextDecoder();
       let buffer = "";
       while (true) {
         const { done, value } = await reader.read();
         if (done) {
           break;
         }
         buffer += decoder.decode(value, { stream: true });
         const lines = buffer.split("\n");
         buffer = lines.pop() || "";
 
-        for (const line of lines) {
-          const data = line.slice(5);
+        for (const rawLine of lines) {
+          const line = rawLine.trim();
           try {
-            if (line.startsWith("data:")) {
-              const json = JSON.parse(data);
+            if (!line.startsWith("data:")) continue;
+            const data = line.slice(5).trim();
+            if (data === "[DONE]") {
+              // end-of-stream marker
+              break;
+            }
+            const json = JSON.parse(data);
             if (json.output?.audio?.data) {
-              yield await player.pcmBase64ToAudioBuffer(
-                json.output.audio.data,
-                { channels: 1, sampleRate: 24000, bitDepth: 16 },
-              );
+              const sr = json.output?.audio?.sample_rate ?? 24000;
+              yield await player.pcmBase64ToAudioBuffer(
+                json.output.audio.data,
+                { channels: 1, sampleRate: sr, bitDepth: 16 },
+              );
             }
           } catch (parseError) {
             console.warn(
               "[StreamSpeech] Failed to parse SSE data:",
               parseError,
             );
             continue;
           }
         }
       }
-      reader.releaseLock();
+      reader?.releaseLock();
@@
-    } finally {
-      if (audioManager) {
-        audioManager.clearStreamController();
-      }
+    } finally {
+      try {
+        clearTimeout(requestTimeoutId);
+      } catch {}
+      try {
+        // releasing is idempotent; safe to attempt
+        // @ts-ignore - reader may be undefined in some error paths
+        reader?.releaseLock?.();
+      } catch {}
+      try {
+        (audioManager ?? player)?.clearStreamController();
+      } catch {}
     }

Also applies to: 185-189

🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 135-176 (and similarly 185-189),
the fetch stream handling doesn't guarantee cleanup and incorrectly slices every
line; wrap the read loop and downstream parsing in a try/finally so you always
clearTimeout(requestTimeoutId) and releaseLock() on the reader regardless of
errors or early returns, check line.startsWith("data:") before slicing and
handle the "[DONE]" sentinel (break or return when encountered), and keep
JSON.parse inside a try/catch around each data segment so parsing errors for one
event don't break the stream.

} catch (e) {
// If the user aborted the request (AbortError), don't treat it as an error
if (e instanceof Error && e.name === "AbortError") {
console.log("[Request] Stream speech was aborted by user");
return; // exit normally without throwing
}
console.log("[Request] failed to make a speech request", e);
throw e;
} finally {
if (audioManager) {
audioManager.clearStreamController();
}
}
}

async chat(options: ChatOptions) {
const modelConfig = {
...useAppConfig.getState().modelConfig,
@@ -129,6 +225,7 @@ export class QwenApi implements LLMApi {
temperature: modelConfig.temperature,
// max_tokens: modelConfig.max_tokens,
top_p: modelConfig.top_p === 1 ? 0.99 : modelConfig.top_p, // qwen top_p should be < 1
enable_search: modelConfig.enableNetWork,
},
};

@@ -161,11 +258,16 @@ export class QwenApi implements LLMApi {
.getAsTools(
useChatStore.getState().currentSession().mask?.plugin || [],
);
// console.log("getAsTools", tools, funcs);
const _tools = tools as unknown as FunctionToolItem[];
if (_tools && _tools.length > 0) {
requestPayload.parameters.tools = _tools;
}
return streamWithThink(
chatPath,
requestPayload,
headers,
tools as any,
[],
funcs,
controller,
// parseSSE
@@ -198,7 +300,7 @@ export class QwenApi implements LLMApi {
});
} else {
// @ts-ignore
runTools[index]["function"]["arguments"] += args;
runTools[index]["function"]["arguments"] += args || "";
}
}

101 changes: 79 additions & 22 deletions app/components/chat.tsx
@@ -48,6 +48,7 @@ import PluginIcon from "../icons/plugin.svg";
import ShortcutkeyIcon from "../icons/shortcutkey.svg";
import McpToolIcon from "../icons/tool.svg";
import HeadphoneIcon from "../icons/headphone.svg";
import NetWorkIcon from "../icons/network.svg";
import {
BOT_HELLO,
ChatMessage,
@@ -75,6 +76,7 @@ import {
useMobileScreen,
selectOrCopy,
showPlugins,
canUseNetWork,
} from "../utils";

import { uploadImage as uploadImageRemote } from "@/app/utils/chat";
@@ -101,8 +103,6 @@ import {
import { useNavigate } from "react-router-dom";
import {
CHAT_PAGE_SIZE,
DEFAULT_TTS_ENGINE,
ModelProvider,
Path,
REQUEST_TIMEOUT_MS,
ServiceProvider,
@@ -512,6 +512,7 @@ export function ChatActions(props: {

// switch themes
const theme = config.theme;
const enableNetWork = session.mask.modelConfig.enableNetWork || false;

function nextTheme() {
const themes = [Theme.Auto, Theme.Light, Theme.Dark];
@@ -521,6 +522,13 @@ export function ChatActions(props: {
config.update((config) => (config.theme = nextTheme));
}

function nextNetWork() {
chatStore.updateTargetSession(session, (session) => {
session.mask.modelConfig.enableNetWork =
!session.mask.modelConfig.enableNetWork;
});
}

// stop all responses
const couldStop = ChatControllerPool.hasPending();
const stopAll = () => ChatControllerPool.stopAll();
@@ -699,6 +707,9 @@ export function ChatActions(props: {
session.mask.modelConfig.providerName =
providerName as ServiceProvider;
session.mask.syncGlobalConfig = false;
session.mask.modelConfig.enableNetWork = canUseNetWork(model)
? session.mask.modelConfig.enableNetWork
: false;
});
if (providerName == "ByteDance") {
const selectedModel = models.find(
@@ -833,6 +844,16 @@ export function ChatActions(props: {
/>
)}
{!isMobileScreen && <MCPAction />}

{canUseNetWork(currentModel) && (
<ChatAction
onClick={nextNetWork}
text={
Locale.Chat.InputActions.NetWork[enableNetWork ? "on" : "off"]
}
icon={<NetWorkIcon />}
/>
)}
</>
<div className={styles["chat-input-actions-end"]}>
{config.realtimeConfig.enable && (
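
Note: canUseNetWork is imported from ../utils and its implementation is not shown in this diff. The sketch below is only an assumption of what such a gate could look like (a simple model-name check); the real helper may key off provider or config instead:

// Hypothetical sketch; not the actual app/utils implementation.
export function canUseNetWork(model: string): boolean {
  // Assumption: only Alibaba Qwen chat models accept the enable_search flag.
  return model.toLowerCase().startsWith("qwen");
}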
@@ -1286,50 +1307,86 @@ function _Chat() {
const accessStore = useAccessStore();
const [speechStatus, setSpeechStatus] = useState(false);
const [speechLoading, setSpeechLoading] = useState(false);
const [speechCooldown, setSpeechCooldown] = useState(false);

async function openaiSpeech(text: string) {
if (speechStatus) {
ttsPlayer.stop();
setSpeechStatus(false);
} else {
Comment on lines 1312 to 1316
🛠️ Refactor suggestion

Await ttsPlayer.stop() to match the async API and ensure cleanup before toggling state

After aligning TTSPlayer.stop to return Promise, await it here to avoid races (e.g., stop finishes after UI flips status).

-    if (speechStatus) {
-      ttsPlayer.stop();
+    if (speechStatus) {
+      await ttsPlayer.stop();
       setSpeechStatus(false);
     } else {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
   async function openaiSpeech(text: string) {
     if (speechStatus) {
-      ttsPlayer.stop();
+      await ttsPlayer.stop();
       setSpeechStatus(false);
     } else {
🤖 Prompt for AI Agents
In app/components/chat.tsx around lines 1312 to 1316, the code calls
ttsPlayer.stop() without awaiting it which can race with state toggling; update
the call to await ttsPlayer.stop() (inside an async function) before calling
setSpeechStatus(false) so the stop operation completes and any cleanup finishes
before the UI state is flipped.

var api: ClientApi;
api = new ClientApi(ModelProvider.GPT);
const config = useAppConfig.getState();
api = new ClientApi(config.ttsConfig.modelProvider);
setSpeechLoading(true);
ttsPlayer.init();
let audioBuffer: ArrayBuffer;
let audioBuffer: ArrayBuffer | AudioBuffer;
const { markdownToTxt } = require("markdown-to-txt");
const textContent = markdownToTxt(text);
if (config.ttsConfig.engine !== DEFAULT_TTS_ENGINE) {
if (config.ttsConfig.engine === "Edge") {
const edgeVoiceName = accessStore.edgeVoiceName();
const tts = new MsEdgeTTS();
await tts.setMetadata(
edgeVoiceName,
OUTPUT_FORMAT.AUDIO_24KHZ_96KBITRATE_MONO_MP3,
);
audioBuffer = await tts.toArrayBuffer(textContent);
playSpeech(audioBuffer);
} else {
audioBuffer = await api.llm.speech({
model: config.ttsConfig.model,
input: textContent,
voice: config.ttsConfig.voice,
speed: config.ttsConfig.speed,
});
if (api.llm.streamSpeech) {
// Use streaming playback: play chunks as they arrive
setSpeechStatus(true);
ttsPlayer.startStreamPlay(() => {
setSpeechStatus(false);
});

try {
for await (const chunk of api.llm.streamSpeech(
{
model: config.ttsConfig.model,
input: textContent,
voice: config.ttsConfig.voice,
speed: config.ttsConfig.speed,
},
ttsPlayer,
)) {
ttsPlayer.addToQueue(chunk);
}
ttsPlayer.finishStreamPlay();
} catch (e) {
console.error("[Stream Speech]", e);
showToast(prettyObject(e));
setSpeechStatus(false);
ttsPlayer.stop();
} finally {
setSpeechLoading(false);
}
} else {
audioBuffer = await api.llm.speech({
model: config.ttsConfig.model,
input: textContent,
voice: config.ttsConfig.voice,
speed: config.ttsConfig.speed,
});
playSpeech(audioBuffer);
}
}
setSpeechStatus(true);
ttsPlayer
.play(audioBuffer, () => {
setSpeechStatus(false);
})
.catch((e) => {
console.error("[OpenAI Speech]", e);
showToast(prettyObject(e));
setSpeechStatus(false);
})
.finally(() => setSpeechLoading(false));
}
}
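
Note: the queue API used above (startStreamPlay / addToQueue / finishStreamPlay) belongs to TTSPlayManager in app/utils/audio, which is not shown in this diff. A minimal sketch of how such a queue-based player could schedule decoded AudioBuffers gaplessly — including an awaitable stop() as suggested in the review comment above — might look like this (names and details assumed, not taken from the actual file):

// Hypothetical sketch of a queue-based streaming player; the actual TTSPlayManager may differ.
class StreamingAudioQueue {
  private ctx = new AudioContext();
  private sources: AudioBufferSourceNode[] = [];
  private nextStartTime = 0;
  private pending = 0;
  private finished = false;
  private onEnded?: () => void;

  startStreamPlay(onEnded: () => void) {
    this.finished = false;
    this.pending = 0;
    this.nextStartTime = this.ctx.currentTime;
    this.onEnded = onEnded;
  }

  addToQueue(buffer: AudioBuffer) {
    const source = this.ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(this.ctx.destination);
    // Schedule chunks back-to-back so playback stays gapless.
    const startAt = Math.max(this.nextStartTime, this.ctx.currentTime);
    source.start(startAt);
    this.nextStartTime = startAt + buffer.duration;
    this.sources.push(source);
    this.pending++;
    source.onended = () => {
      this.pending--;
      if (this.finished && this.pending === 0) this.onEnded?.();
    };
  }

  finishStreamPlay() {
    this.finished = true;
    if (this.pending === 0) this.onEnded?.();
  }

  // Awaitable stop, in the spirit of the review suggestion above.
  async stop(): Promise<void> {
    this.finished = true;
    for (const s of this.sources) {
      try {
        s.stop();
      } catch {}
    }
    this.sources = [];
    this.pending = 0;
    await this.ctx.suspend();
  }
}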

function playSpeech(audioBuffer: ArrayBuffer | AudioBuffer) {
setSpeechStatus(true);
ttsPlayer
.play(audioBuffer, () => {
setSpeechStatus(false);
})
.catch((e) => {
console.error("[OpenAI Speech]", e);
showToast(prettyObject(e));
setSpeechStatus(false);
})
.finally(() => setSpeechLoading(false));
}

const context: RenderMessage[] = useMemo(() => {
return session.mask.hideContext ? [] : session.mask.context.slice();
}, [session.mask.context, session.mask.hideContext]);