feat: invoke anthropic SDK to call Claude #197

Merged
merged 5 commits into from
Dec 23, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
20 changes: 19 additions & 1 deletion apps/site/docs/en/model-provider.md
@@ -40,7 +40,7 @@ export MIDSCENE_AZURE_OPENAI_INIT_CONFIG_JSON='{"apiVersion": "2024-11-01-previe

## Choose a model other than `gpt-4o`

-We find that `gpt-4o` performs the best for Midscene at this moment. The other known supported models are: `gemini-1.5-pro`, `qwen-vl-max-latest`, `doubao-vision-pro-32k`
+We find that `gpt-4o` performs the best for Midscene at this moment. The other known supported models are `claude-3-opus-20240229`, `gemini-1.5-pro`, `qwen-vl-max-latest`, `doubao-vision-pro-32k`

If you want to use other models, please follow these steps:

@@ -49,6 +49,18 @@ If you want to use other models, please follow these steps:
3. If you find it not working well after changing the model, you can try using a short and clear prompt (or roll back to the previous model). See more details in [Prompting Tips](./prompting-tips.html).
4. Remember to follow the terms of use of each model.

## Example: Using `claude-3-opus-20240229` from Anthropic

When configuring `MIDSCENE_USE_ANTHROPIC_SDK=1`, Midscene will use the Anthropic SDK (`@anthropic-ai/sdk`) to call the model.

Configure the environment variables:

```bash
export MIDSCENE_USE_ANTHROPIC_SDK=1
export ANTHROPIC_API_KEY="....."
export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
```
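
To sanity-check the setup end to end, a short script can confirm that agent calls are actually served by Claude. This is a minimal sketch (not part of this PR), assuming the `PuppeteerAgent` integration from `@midscene/web` and a placeholder target page; adapt it to your own project.

```typescript
// Minimal sketch: assumes the env vars above are exported and that
// @midscene/web's PuppeteerAgent is available in your project.
import puppeteer from 'puppeteer';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com'); // placeholder URL

  // With MIDSCENE_USE_ANTHROPIC_SDK=1, this query is routed through
  // @anthropic-ai/sdk to claude-3-opus-20240229.
  const agent = new PuppeteerAgent(page);
  const heading = await agent.aiQuery('string, the main heading of the page');
  console.log('Claude answered:', heading);

  await browser.close();
})();
```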

## Example: Using `gemini-1.5-pro` from Google

Configure the environment variables:
@@ -80,3 +92,9 @@ export OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="ep-202....."
```

## Troubleshooting LLM Service Connectivity Issues

If you want to troubleshoot connectivity issues, you can use the `connectivity-test` folder in our example project: [https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test](https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test)

Put your `.env` file in the `connectivity-test` folder, and run the test with `npm i && npm run test`.
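
For a quicker inline check before cloning the example project, a few lines against an OpenAI-compatible endpoint are usually enough. This is a rough sketch that assumes your provider speaks the OpenAI chat-completions protocol; it is not the actual script from the `connectivity-test` folder.

```typescript
// Standalone connectivity probe (illustrative only).
import OpenAI from 'openai';

async function probe() {
  const client = new OpenAI({
    baseURL: process.env.OPENAI_BASE_URL, // omit to use the default endpoint
    apiKey: process.env.OPENAI_API_KEY,
  });
  const res = await client.chat.completions.create({
    model: process.env.MIDSCENE_MODEL_NAME || 'gpt-4o',
    messages: [{ role: 'user', content: 'ping' }],
  });
  console.log('reply:', res.choices[0].message.content);
}

// A 401 usually points to a bad key; a DNS error points to a bad base URL.
probe().catch((err) => {
  console.error('connectivity check failed:', err);
});
```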
36 changes: 27 additions & 9 deletions apps/site/docs/zh/model-provider.md
@@ -37,7 +37,7 @@ export MIDSCENE_AZURE_OPENAI_INIT_CONFIG_JSON='{"apiVersion": "2024-11-01-previe

## Choose a model other than `gpt-4o`

-We find that `gpt-4o` is currently the best-performing model. The other known supported models are: `qwen-vl-max-latest` (Qwen), `gemini-1.5-pro`, `doubao-vision-pro-32k` (Doubao)
+We find that `gpt-4o` is currently the best-performing model. The other known supported models are: `claude-3-opus-20240229`, `gemini-1.5-pro`, `qwen-vl-max-latest` (Qwen), `doubao-vision-pro-32k` (Doubao)

If you want to use other models, please follow these steps:

@@ -46,24 +46,36 @@ export MIDSCENE_AZURE_OPENAI_INIT_CONFIG_JSON='{"apiVersion": "2024-11-01-previe
3. If you find it not working well after changing the model, you can try using a short and clear prompt (or roll back to the previous model). See more details in [Prompting Tips](./prompting-tips.html).
4. Remember to follow the terms of use of each model.

-## Example: Using `gemini-1.5-pro` from Google
+## Example: Using `qwen-vl-max-latest` from Alibaba Cloud

Configure the environment variables:

```bash
-export OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai"
-export OPENAI_API_KEY="....."
-export MIDSCENE_MODEL_NAME="gemini-1.5-pro"
+export OPENAI_API_KEY="sk-..."
+export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
+export MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
```

-## Example: Using `qwen-vl-max-latest` from Alibaba Cloud
+## Example: Using `claude-3-opus-20240229` from Anthropic

When configuring `MIDSCENE_USE_ANTHROPIC_SDK=1`, Midscene will use the Anthropic SDK (`@anthropic-ai/sdk`) to call the model.

Configure the environment variables:

```bash
-export OPENAI_API_KEY="sk-..."
-export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
-export MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
+export MIDSCENE_USE_ANTHROPIC_SDK=1
+export ANTHROPIC_API_KEY="....."
+export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
```

## Example: Using `gemini-1.5-pro` from Google

Configure the environment variables:

```bash
export OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai"
export OPENAI_API_KEY="....."
export MIDSCENE_MODEL_NAME="gemini-1.5-pro"
```

## Example: Using `doubao-vision-pro-32k` (Doubao) from Volcengine
@@ -77,3 +89,9 @@ export OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="ep-202....."
```

## Troubleshooting LLM Service Connectivity Issues

If you want to troubleshoot LLM service connectivity issues, you can use the `connectivity-test` folder in our example project: [https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test](https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test)

Put your `.env` file in the `connectivity-test` folder, and run the test with `npm i && npm run test`.
1 change: 1 addition & 0 deletions packages/midscene/package.json
@@ -37,6 +37,7 @@
"prepublishOnly": "npm run build"
},
"dependencies": {
"@anthropic-ai/sdk": "0.33.1",
"@azure/identity": "4.5.0",
"@midscene/shared": "workspace:*",
"dirty-json": "0.9.2",
21 changes: 11 additions & 10 deletions packages/midscene/src/ai-model/common.ts
@@ -1,11 +1,13 @@
import assert from 'node:assert';
import { MIDSCENE_MODEL_TEXT_ONLY, getAIConfig } from '@/env';
import type { AIUsageInfo } from '@/types';

import type {
ChatCompletionContentPart,
ChatCompletionSystemMessageParam,
ChatCompletionUserMessageParam,
} from 'openai/resources';
-import { callToGetJSONObject, preferOpenAIModel } from './openai';
+import { callToGetJSONObject, checkAIConfig } from './openai';

export type AIArgs = [
ChatCompletionSystemMessageParam,
@@ -24,17 +26,16 @@ export async function callAiFn<T>(options: {
AIActionType: AIActionType;
}): Promise<{ content: T; usage?: AIUsageInfo }> {
  const { msgs, AIActionType: AIActionTypeValue } = options;
-  if (preferOpenAIModel('openAI')) {
-    const { content, usage } = await callToGetJSONObject<T>(
-      msgs,
-      AIActionTypeValue,
-    );
-    return { content, usage };
-  }
-
-  throw Error(
-    'Cannot find OpenAI config. You should set it before using. https://midscenejs.com/model-provider.html',
-  );
+  assert(
+    checkAIConfig(),
+    'Cannot find config for AI model service. You should set it before using. https://midscenejs.com/model-provider.html',
+  );
+
+  const { content, usage } = await callToGetJSONObject<T>(
+    msgs,
+    AIActionTypeValue,
+  );
+  return { content, usage };
}

export function transformUserMessages(msgs: ChatCompletionContentPart[]) {
127 changes: 100 additions & 27 deletions packages/midscene/src/ai-model/openai/index.ts
@@ -1,5 +1,6 @@
import assert from 'node:assert';
import { AIResponseFormat, type AIUsageInfo } from '@/types';
import { Anthropic } from '@anthropic-ai/sdk';
import {
DefaultAzureCredential,
getBearerTokenProvider,
@@ -10,6 +11,7 @@ import OpenAI, { AzureOpenAI } from 'openai';
import type { ChatCompletionMessageParam } from 'openai/resources';
import { SocksProxyAgent } from 'socks-proxy-agent';
import {
ANTHROPIC_API_KEY,
MIDSCENE_AZURE_OPENAI_INIT_CONFIG_JSON,
MIDSCENE_AZURE_OPENAI_SCOPE,
MIDSCENE_DANGEROUSLY_PRINT_ALL_CONFIG,
@@ -18,6 +20,7 @@ import {
MIDSCENE_MODEL_NAME,
MIDSCENE_OPENAI_INIT_CONFIG_JSON,
MIDSCENE_OPENAI_SOCKS_PROXY,
MIDSCENE_USE_ANTHROPIC_SDK,
MIDSCENE_USE_AZURE_OPENAI,
OPENAI_API_KEY,
OPENAI_BASE_URL,
@@ -31,10 +34,11 @@ import { findElementSchema } from '../prompt/element_inspector';
import { planSchema } from '../prompt/planning';
import { assertSchema } from '../prompt/util';

-export function preferOpenAIModel(preferVendor?: 'coze' | 'openAI') {
+export function checkAIConfig(preferVendor?: 'coze' | 'openAI') {
if (preferVendor && preferVendor !== 'openAI') return false;
if (getAIConfig(OPENAI_API_KEY)) return true;
if (getAIConfig(MIDSCENE_USE_AZURE_OPENAI)) return true;
if (getAIConfig(ANTHROPIC_API_KEY)) return true;

return Boolean(getAIConfig(MIDSCENE_OPENAI_INIT_CONFIG_JSON));
}
@@ -50,8 +54,11 @@ export function getModelName() {
return modelName;
}

-async function createOpenAI() {
-  let openai: OpenAI | AzureOpenAI;
+async function createChatClient(): Promise<{
+  completion: OpenAI.Chat.Completions;
+  style: 'openai' | 'anthropic';
+}> {
+  let openai: OpenAI | AzureOpenAI | undefined;
const extraConfig = getAIConfigInJson(MIDSCENE_OPENAI_INIT_CONFIG_JSON);

const socksProxy = getAIConfig(MIDSCENE_OPENAI_SOCKS_PROXY);
@@ -65,7 +72,7 @@ async function createOpenAI() {
httpAgent: socksAgent,
...extraConfig,
dangerouslyAllowBrowser: true,
-    });
+    }) as OpenAI;
} else if (getAIConfig(MIDSCENE_USE_AZURE_OPENAI)) {
// sample code: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/cookbook/simpleCompletionsPage/app.js
const scope = getAIConfig(MIDSCENE_AZURE_OPENAI_SCOPE);
@@ -87,7 +94,7 @@ async function createOpenAI() {
...extraConfig,
...extraAzureConfig,
});
-  } else {
+  } else if (!getAIConfig(MIDSCENE_USE_ANTHROPIC_SDK)) {
openai = new OpenAI({
baseURL: getAIConfig(OPENAI_BASE_URL),
apiKey: getAIConfig(OPENAI_API_KEY),
@@ -97,7 +104,7 @@
});
}

-  if (getAIConfig(MIDSCENE_LANGSMITH_DEBUG)) {
+  if (openai && getAIConfig(MIDSCENE_LANGSMITH_DEBUG)) {
if (ifInBrowser) {
throw new Error('langsmith is not supported in browser');
}
@@ -106,7 +113,30 @@
openai = wrapOpenAI(openai);
}

-  return openai;
+  if (typeof openai !== 'undefined') {
+    return {
+      completion: openai.chat.completions,
+      style: 'openai',
+    };
+  }
+
+  // Anthropic
+  if (getAIConfig(MIDSCENE_USE_ANTHROPIC_SDK)) {
+    const apiKey = getAIConfig(ANTHROPIC_API_KEY);
+    assert(apiKey, 'ANTHROPIC_API_KEY is required');
+    openai = new Anthropic({
+      apiKey,
+    }) as any;
+  }
+
+  if (typeof openai !== 'undefined' && (openai as any).messages) {
+    return {
+      completion: (openai as any).messages,
+      style: 'anthropic',
+    };
+  }
+
+  throw new Error('Openai SDK or Anthropic SDK is not initialized');
}

export async function call(
@@ -115,32 +145,74 @@ export async function call(
| OpenAI.ChatCompletionCreateParams['response_format']
| OpenAI.ResponseFormatJSONObject,
): Promise<{ content: string; usage?: AIUsageInfo }> {
-  const openai = await createOpenAI();
+  const { completion, style } = await createChatClient();
const shouldPrintTiming =
typeof getAIConfig(MIDSCENE_DEBUG_AI_PROFILE) === 'string';
if (getAIConfig(MIDSCENE_DANGEROUSLY_PRINT_ALL_CONFIG)) {
console.log(allAIConfig());
}

  const startTime = Date.now();
  const model = getModelName();
-  const completion = await openai.chat.completions.create({
-    model,
-    messages,
-    response_format: responseFormat,
-    // betas: ['computer-use-2024-10-22'],
-  } as any);
-  shouldPrintTiming &&
-    console.log(
-      'Midscene - AI call',
-      model,
-      completion.usage,
-      `${Date.now() - startTime}ms`,
-    );
-  const { content } = completion.choices[0].message;
-  assert(content, 'empty content');
-  return { content, usage: completion.usage };
+  let content: string | undefined;
+  let usage: OpenAI.CompletionUsage | undefined;
+  const commonConfig = {
+    temperature: 0.1,
+    stream: false,
+    max_tokens: 3000,
+  };
+  if (style === 'openai') {
+    const result = await completion.create({
+      model,
+      messages,
+      response_format: responseFormat,
+      ...commonConfig,
+      // betas: ['computer-use-2024-10-22'],
+    } as any);
+    shouldPrintTiming &&
+      console.log(
+        'Midscene - AI call',
+        model,
+        result.usage,
+        `${Date.now() - startTime}ms`,
+      );
+    content = result.choices[0].message.content!;
+    assert(content, 'empty content');
+    usage = result.usage;
+  } else if (style === 'anthropic') {
+    const convertImageContent = (content: any) => {
+      if (content.type === 'image_url') {
+        const imgBase64 = content.image_url.url;
+        assert(imgBase64, 'image_url is required');
+        return {
+          source: {
+            type: 'base64',
+            media_type: imgBase64.includes('data:image/png;base64,')
+              ? 'image/png'
+              : 'image/jpeg',
+            data: imgBase64.split(',')[1],
+          },
+          type: 'image',
+        };
+      }
+      return content;
+    };
+
+    const result = await completion.create({
+      model,
+      system: 'You are a versatile professional in software UI automation',
+      messages: messages.map((m) => ({
+        role: 'user',
+        content: Array.isArray(m.content)
+          ? (m.content as any).map(convertImageContent)
+          : m.content,
+      })),
+      response_format: responseFormat,
+      ...commonConfig,
+    } as any);
+    content = (result as any).content[0].text as string;
+    assert(content, 'empty content');
+    usage = result.usage;
+  }
+
+  return { content: content || '', usage };
}

export async function callToGetJSONObject<T>(
Expand All @@ -166,13 +238,14 @@ export async function callToGetJSONObject<T>(
case AIActionType.EXTRACT_DATA:
//TODO: Currently the restriction type can only be a json subset of the constraint, and the way the extract api is used needs to be adjusted to limit the user's data to this as well
// targetResponseFormat = extractDataSchema;
-      responseFormat = { type: AIResponseFormat.JSON };
break;
case AIActionType.PLAN:
responseFormat = planSchema;
break;
}

-  if (model === 'gpt-4o-2024-05-13') {
+  if (model === 'gpt-4o-2024-05-13' || !responseFormat) {
responseFormat = { type: AIResponseFormat.JSON };
}
}
6 changes: 6 additions & 0 deletions packages/midscene/src/env.ts
@@ -21,6 +21,9 @@ export const MIDSCENE_AZURE_OPENAI_SCOPE = 'MIDSCENE_AZURE_OPENAI_SCOPE';
export const MIDSCENE_AZURE_OPENAI_INIT_CONFIG_JSON =
'MIDSCENE_AZURE_OPENAI_INIT_CONFIG_JSON';

export const MIDSCENE_USE_ANTHROPIC_SDK = 'MIDSCENE_USE_ANTHROPIC_SDK';
export const ANTHROPIC_API_KEY = 'ANTHROPIC_API_KEY';

// @deprecated
export const OPENAI_USE_AZURE = 'OPENAI_USE_AZURE';

@@ -54,6 +57,9 @@ const allConfigFromEnv = () => {
'https://cognitiveservices.azure.com/.default',
[MIDSCENE_AZURE_OPENAI_INIT_CONFIG_JSON]:
process.env[MIDSCENE_AZURE_OPENAI_INIT_CONFIG_JSON] || undefined,
[MIDSCENE_USE_ANTHROPIC_SDK]:
process.env[MIDSCENE_USE_ANTHROPIC_SDK] || undefined,
[ANTHROPIC_API_KEY]: process.env[ANTHROPIC_API_KEY] || undefined,
};
};
