
Using the application inference profile in Bedrock results in failed model invocations. #740

Open
moritalous opened this issue Nov 7, 2024 · 2 comments

@moritalous
Contributor

Amazon Bedrock has added a new feature called "application inference profiles".

Using application inference profiles is like adding an alias to a base model.

  • Creating an application inference profile
import boto3

bedrock = boto3.Session(region_name="us-west-2").client("bedrock")

# Create application inference profile
response = bedrock.create_inference_profile(
    inferenceProfileName="sonnet-inference-profile",
    modelSource={
        "copyFrom": "arn:aws:bedrock:us-west-2:637423213562:inference-profile/us.anthropic.claude-3-5-sonnet-20241022-v2:0"
    },
)

inference_profile_arn = response["inferenceProfileArn"]
print(inference_profile_arn)

arn:aws:bedrock:us-west-2:637423213562:application-inference-profile/hq2of259skzs

For Bedrock's InvokeModel API, you can specify the application inference profile ARN as the modelId.

import json

bedrock_runtime = boto3.Session(region_name="us-west-2").client("bedrock-runtime")

response = bedrock_runtime.invoke_model(
    modelId=inference_profile_arn,
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "messages": [
                {
                    "role": "user",
                    "content": "Hello!",
                }
            ],
        }
    ),
)

response_body = json.loads(response.get("body").read())
print(response_body["content"][0]["text"])

However, when using the Anthropic SDK, specifying the application inference profile as the model results in an error.

from anthropic import AnthropicBedrock

anthropic = AnthropicBedrock(aws_region="us-west-2")

response = anthropic.messages.create(
    model=inference_profile_arn,
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response)

Message(id=None, content=None, model=None, role=None, stop_reason=None, stop_sequence=None, type=None, usage=None, Output={'__type': 'com.amazon.coral.service#UnknownOperationException'}, Version='1.0')

This is likely because the model parameter does not expect an ARN to be passed.
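One possible interim workaround (an untested sketch; whether it helps depends on how the SDK and its HTTP layer treat a pre-encoded path segment) would be to percent-encode the ARN yourself before passing it as model:

```python
from urllib.parse import quote

inference_profile_arn = (
    "arn:aws:bedrock:us-west-2:637423213562:"
    "application-inference-profile/hq2of259skzs"
)

# Encode every reserved character, including the "/", so the ARN stays a
# single path segment instead of being split at the slash:
encoded_arn = quote(inference_profile_arn, safe="")
print(encoded_arn)
# arn%3Aaws%3Abedrock%3Aus-west-2%3A637423213562%3Aapplication-inference-profile%2Fhq2of259skzs

# Then (hypothetically) pass the pre-encoded ARN as the model id:
# anthropic.messages.create(model=encoded_arn, max_tokens=1024, ...)
```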

Please let me know if you have any further questions regarding this.

@RobertCraigie
Collaborator

Thanks for the report, do you know what the expected HTTP path is? What endpoint is .invoke_model() hitting?

@moritalous
Contributor Author

moritalous commented Nov 8, 2024

I tried outputting the debug logs.

  • use boto3

    response = bedrock_runtime.invoke_model(
        modelId=inference_profile_arn,
        body=json.dumps(
            {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 1000,
                "messages": [
                    {
                        "role": "user",
                        "content": "Hello!",
                    }
                ],
            }
        ),
    )
    2024-11-08 15:51:17,031 botocore.auth [DEBUG] CanonicalRequest:
    POST
    /model/arn%253Aaws%253Abedrock%253Aus-west-2%253A637423213562%253Aapplication-inference-profile%252Fhq2of259skzs/invoke
    
    host:bedrock-runtime.us-west-2.amazonaws.com
    x-amz-date:20241108T155117Z
    x-amz-security-token:**********
    host;x-amz-date;x-amz-security-token
    dec27b832eccd8d562578f99b60945183245f8876193a23d309e951df15eaab9
    
  • use Anthropic SDK

    response = anthropic.messages.create(
        model=inference_profile_arn,
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    
    2024-11-08 15:55:34,077 botocore.auth [DEBUG] CanonicalRequest:
    POST
    /model/arn%3Aaws%3Abedrock%3Aus-west-2%3A637423213562%3Aapplication-inference-profile/hq2of259skzs/invoke
    
    accept:application/json
    accept-encoding:gzip, deflate
    content-length:116
    content-type:application/json
    host:bedrock-runtime.us-west-2.amazonaws.com
    x-amz-date:20241108T155534Z
    x-amz-security-token:*****
    x-stainless-arch:x64
    x-stainless-lang:python
    x-stainless-os:Linux
    x-stainless-package-version:0.39.0
    x-stainless-retry-count:0
    x-stainless-runtime:CPython
    x-stainless-runtime-version:3.10.12
    
    accept;accept-encoding;content-length;content-type;host;x-amz-date;x-amz-security-token;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version
    33fcea8bfcee2180d557bf027b19a7e6b4394deee5c905ef5afe381afd0b4d83
    

I hope this helps.
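For what it's worth, the difference between the two canonical paths above can be reproduced with the standard library's urllib.parse.quote (a sketch; the %253A/%252F in the boto3 log appears to be the already-encoded path being percent-encoded a second time when SigV4 builds the canonical request):

```python
from urllib.parse import quote

arn = (
    "arn:aws:bedrock:us-west-2:637423213562:"
    "application-inference-profile/hq2of259skzs"
)

# boto3 encodes the whole modelId as one path segment: ":" -> %3A, "/" -> %2F
boto3_style = quote(arn, safe="")

# The Anthropic SDK encodes ":" but leaves "/" intact (quote()'s default),
# so the ARN is split into two path segments at the slash:
sdk_style = quote(arn, safe="/")

print(f"/model/{boto3_style}/invoke")
print(f"/model/{sdk_style}/invoke")
```

The second print matches the Anthropic SDK canonical request above exactly, which suggests the slash inside the application inference profile ARN is what breaks the route.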
