Changes from 12 commits
14 commits
9990a89
feat: add TTS engine configuration, update the Alibaba speech API, support real-time speech synthesis
Little-LittleProgrammer Jul 30, 2025
c5e6b12
feat: update the speech synthesis API to support streaming playback and multiple audio formats
Little-LittleProgrammer Jul 30, 2025
e836dc0
refactor: remove unnecessary TTS config and models, restore the runtime content
Little-LittleProgrammer Jul 30, 2025
221229c
refactor: restore the runtime code, remove debug console statements
Little-LittleProgrammer Jul 31, 2025
fe484fd
feat: add audio context management, optimize the PCM-to-AudioBuffer conversion
Little-LittleProgrammer Jul 31, 2025
4e3f166
feat: support function calling for Alibaba Qwen models
Little-LittleProgrammer Aug 5, 2025
044298e
feat: add web search, update related config and multi-language support
Little-LittleProgrammer Aug 5, 2025
86f2c67
feat: address CR feedback, improve audio context management, fix the PCM conversion logic, ensure the timeout is cleared on successful connection
Little-LittleProgrammer Aug 5, 2025
9cb7275
feat: update network config management, fix the network state logic on theme switching
Little-LittleProgrammer Aug 5, 2025
45eb96f
feat: restore the web search config when selecting a model
Little-LittleProgrammer Aug 5, 2025
b73e65d
refactor: revert plugins.json
Little-LittleProgrammer Aug 8, 2025
800c96c
feat: add error handling for streaming speech synthesis, improve the request timeout logic
Little-LittleProgrammer Aug 11, 2025
16c3255
fix: revert yarn to 1.22.19 to stay consistent with packageManager
Little-LittleProgrammer Aug 21, 2025
bf999b9
feat: enhance audio playback management, add a TTSPlayManager class, improve the streaming speech synthesis logic, support PCM data and base64 conversion
Little-LittleProgrammer Aug 21, 2025
1 change: 1 addition & 0 deletions .yarnrc.yml
@@ -0,0 +1 @@
nodeLinker: node-modules
3 changes: 2 additions & 1 deletion app/client/api.ts
@@ -107,7 +107,8 @@ export interface LLMModelProvider {

export abstract class LLMApi {
abstract chat(options: ChatOptions): Promise<void>;
abstract speech(options: SpeechOptions): Promise<ArrayBuffer>;
abstract speech(options: SpeechOptions): Promise<ArrayBuffer | AudioBuffer>;
abstract streamSpeech?(options: SpeechOptions): AsyncGenerator<AudioBuffer>;
abstract usage(): Promise<LLMUsage>;
abstract models(): Promise<LLMModel[]>;
}
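The new optional `streamSpeech` returns an `AsyncGenerator<AudioBuffer>`, so callers consume audio chunks with `for await` as they arrive. A minimal sketch of such a consumer, where `drainSpeech` and `playChunk` are hypothetical names, not part of the PR:

```typescript
// Sketch: drain an async generator sequentially, handling each chunk in order.
// playChunk is a placeholder (e.g. schedule the AudioBuffer on an AudioContext).
async function drainSpeech<T>(
  stream: AsyncGenerator<T>,
  playChunk: (chunk: T) => void | Promise<void>,
): Promise<number> {
  let played = 0;
  for await (const chunk of stream) {
    await playChunk(chunk);
    played++;
  }
  return played; // number of chunks consumed
}
```

Because the generator is pulled chunk by chunk, awaiting `playChunk` inside the loop naturally applies backpressure to the SSE stream.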
163 changes: 160 additions & 3 deletions app/client/platforms/alibaba.ts
@@ -6,6 +6,7 @@ import {
useChatStore,
ChatMessageTool,
usePluginStore,
FunctionToolItem,
} from "@/app/store";
import {
preProcessImageContentForAlibabaDashScope,
@@ -51,6 +52,8 @@ interface RequestParam {
repetition_penalty?: number;
top_p: number;
max_tokens?: number;
tools?: FunctionToolItem[];
enable_search?: boolean;
}
interface RequestPayload {
model: string;
@@ -59,6 +62,7 @@ interface RequestPayload {
}

export class QwenApi implements LLMApi {
private static audioContext: AudioContext | null = null;
path(path: string): string {
const accessStore = useAccessStore.getState();

@@ -89,10 +93,83 @@ export class QwenApi implements LLMApi {
return res?.output?.choices?.at(0)?.message?.content ?? "";
}

speech(options: SpeechOptions): Promise<ArrayBuffer> {
async speech(options: SpeechOptions): Promise<ArrayBuffer> {
throw new Error("Method not implemented.");
}

async *streamSpeech(options: SpeechOptions): AsyncGenerator<AudioBuffer> {
if (!options.input || !options.model) {
throw new Error("Missing required parameters: input and model");
}
const requestPayload = {
model: options.model,
input: {
text: options.input,
voice: options.voice,
},
speed: options.speed,
response_format: options.response_format,
};
const controller = new AbortController();
options.onController?.(controller);
try {
const speechPath = this.path(Alibaba.SpeechPath);
const speechPayload = {
method: "POST",
body: JSON.stringify(requestPayload),
signal: controller.signal,
headers: {
...getHeaders(),
"X-DashScope-SSE": "enable",
},
};
Comment on lines +124 to +132

🛠️ Refactor suggestion

Harden SSE request: set headers, check res.ok/body, and clear timeout on all paths

Missing Accept/Content-Type headers, no res.ok check, and no guard for a null res.body. Also ensure the timeout is cleared in a finally block.

       const speechPayload = {
         method: "POST",
         body: JSON.stringify(requestPayload),
         signal: controller.signal,
         headers: {
           ...getHeaders(),
           "X-DashScope-SSE": "enable",
+          Accept: "text/event-stream",
+          "Content-Type": "application/json",
         },
       };
@@
-      const res = await fetch(speechPath, speechPayload);
-      clearTimeout(requestTimeoutId); // Clear timeout on successful connection
+      const res = await fetch(speechPath, speechPayload);
+      if (!res.ok) {
+        const errText = await res.text().catch(() => "");
+        throw new Error(
+          `[Alibaba TTS] HTTP ${res.status} ${res.statusText} ${errText}`,
+        );
+      }
+      if (!res.body) {
+        throw new Error("[Alibaba TTS] Missing response body for SSE stream.");
+      }

And move timeout cleanup into finally (see next comment).

Also applies to: 140-146

🤖 Prompt for AI Agents
In app/client/platforms/alibaba.ts around lines 124-132 (and similarly for lines
140-146), the SSE request is missing explicit Accept and Content-Type headers,
does not check res.ok or guard against a null res.body, and does not clear the
timeout on all code paths; update the speechPayload.headers to include "Accept":
"text/event-stream" and "Content-Type": "application/json" (or appropriate
content type), perform the fetch then check if (!res.ok) throw or handle the
error before proceeding, ensure you verify res.body exists before using it
(guard null), and move timeout cleanup into a finally block so the controller
timeout is cleared regardless of success or error.
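The "clear the timer on every path" pattern the review asks for can be factored into a small helper. This is an illustrative sketch under the review's assumptions, not code from the PR:

```typescript
// Run async work under a timeout: abort after ms, and always clear the timer
// in finally, whether the work resolves, throws, or is aborted.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await work(controller.signal);
  } finally {
    clearTimeout(timer); // runs on success, error, and abort alike
  }
}
```

With this shape, the fetch and the subsequent stream consumption both run inside `work`, so no code path can leak the timer.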


// make a fetch request
const requestTimeoutId = setTimeout(
() => controller.abort(),
getTimeoutMSByModel(options.model),
);

const res = await fetch(speechPath, speechPayload);
clearTimeout(requestTimeoutId); // Clear timeout on successful connection

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
const { done, value } = await reader.read();
if (done) {
break;
}
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() || "";

for (const line of lines) {
const data = line.slice(5);
try {
if (line.startsWith("data:")) {
const json = JSON.parse(data);
if (json.output?.audio?.data) {
yield this.PCMBase64ToAudioBuffer(json.output.audio.data);
}
}
} catch (parseError) {
console.warn(
"[StreamSpeech] Failed to parse SSE data:",
parseError,
);
continue;
}
}
}
reader.releaseLock();
} catch (e) {
console.log("[Request] failed to make a speech request", e);
throw e;
}
}
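The newline-buffering step in the loop above can be isolated as pure functions, which makes the carry-over of partial lines easy to see. An illustrative sketch (names are not from the PR):

```typescript
// Split accumulated SSE text into complete lines, carrying the trailing
// partial line over to the next chunk.
function splitSseLines(
  buffer: string,
  chunk: string,
): { lines: string[]; rest: string } {
  const lines = (buffer + chunk).split("\n");
  const rest = lines.pop() ?? "";
  return { lines, rest };
}

// A line carries a payload only if it starts with the 5-character "data:" prefix.
function extractSsePayload(line: string): string | null {
  return line.startsWith("data:") ? line.slice(5) : null;
}
```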

async chat(options: ChatOptions) {
const modelConfig = {
...useAppConfig.getState().modelConfig,
@@ -129,6 +206,7 @@ export class QwenApi implements LLMApi {
temperature: modelConfig.temperature,
// max_tokens: modelConfig.max_tokens,
top_p: modelConfig.top_p === 1 ? 0.99 : modelConfig.top_p, // qwen top_p should be < 1
enable_search: modelConfig.enableNetWork,
},
};

@@ -161,11 +239,16 @@
.getAsTools(
useChatStore.getState().currentSession().mask?.plugin || [],
);
// console.log("getAsTools", tools, funcs);
const _tools = tools as unknown as FunctionToolItem[];
if (_tools && _tools.length > 0) {
requestPayload.parameters.tools = _tools;
}
return streamWithThink(
chatPath,
requestPayload,
headers,
tools as any,
[],
funcs,
controller,
// parseSSE
@@ -198,7 +281,7 @@
});
} else {
// @ts-ignore
runTools[index]["function"]["arguments"] += args;
runTools[index]["function"]["arguments"] += args || "";
}
}

@@ -273,5 +356,79 @@
async models(): Promise<LLMModel[]> {
return [];
}

// Decode base64 PCM data and convert it to an AudioBuffer
private async PCMBase64ToAudioBuffer(base64Data: string) {
try {
// Decode the base64 payload
const binaryString = atob(base64Data);
const bytes = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
bytes[i] = binaryString.charCodeAt(i);
}

// Convert to an AudioBuffer
const audioBuffer = await this.convertToAudioBuffer(bytes);

return audioBuffer;
} catch (error) {
console.error("Failed to decode PCM data:", error);
throw error;
}
}
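The base64 decode above can be exercised on its own; a standalone sketch of the same loop (`atob` is available in browsers and in Node 16+):

```typescript
// Decode a base64 string into raw bytes, one charCode per byte.
function base64ToBytes(base64Data: string): Uint8Array {
  const binaryString = atob(base64Data);
  const bytes = new Uint8Array(binaryString.length);
  for (let i = 0; i < binaryString.length; i++) {
    bytes[i] = binaryString.charCodeAt(i);
  }
  return bytes;
}
```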

private static getAudioContext(): AudioContext {
if (!QwenApi.audioContext) {
QwenApi.audioContext = new (window.AudioContext ||
window.webkitAudioContext)();
}
return QwenApi.audioContext;
}

// Convert PCM byte data to an AudioBuffer
private convertToAudioBuffer(pcmData: Uint8Array) {
const audioContext = QwenApi.getAudioContext();
const channels = 1;
const sampleRate = 24000;
return new Promise<AudioBuffer>((resolve, reject) => {
try {
// Convert 16-bit PCM samples to 32-bit floats
const float32Array = this.pcm16ToFloat32(pcmData);

// Create the AudioBuffer
const audioBuffer = audioContext.createBuffer(
channels,
float32Array.length / channels,
sampleRate,
);

// Copy the samples into the AudioBuffer
for (let channel = 0; channel < channels; channel++) {
const channelData = audioBuffer.getChannelData(channel);
for (let i = 0; i < channelData.length; i++) {
channelData[i] = float32Array[i * channels + channel];
}
}

resolve(audioBuffer);
} catch (error) {
reject(error);
}
});
}
// Convert 16-bit PCM to 32-bit floats
private pcm16ToFloat32(pcmData: Uint8Array) {
const length = pcmData.length / 2;
const float32Array = new Float32Array(length);

for (let i = 0; i < length; i++) {
const int16 = (pcmData[i * 2 + 1] << 8) | pcmData[i * 2];
const int16Signed = int16 > 32767 ? int16 - 65536 : int16;
float32Array[i] = int16Signed / 32768;
}

return float32Array;
}
}
export { Alibaba };
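The little-endian 16-bit PCM decode in `pcm16ToFloat32` can be checked in isolation; a standalone sketch of the same conversion:

```typescript
// Decode little-endian signed 16-bit PCM into floats normalized to [-1, 1).
function pcm16ToFloat32(pcmData: Uint8Array): Float32Array {
  const length = pcmData.length / 2;
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    // Low byte first (little-endian), then sign-extend the 16-bit value.
    const int16 = (pcmData[i * 2 + 1] << 8) | pcmData[i * 2];
    const signed = int16 > 32767 ? int16 - 65536 : int16;
    out[i] = signed / 32768;
  }
  return out;
}
```

Dividing by 32768 maps INT16_MIN exactly to -1, while INT16_MAX lands just below 1, which is the convention Web Audio expects for channel data.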