client.sentence_similarity() does not use correct route by default #2494

Open
MoritzLaurer opened this issue Aug 29, 2024 · 2 comments · May be fixed by #2596
Labels
bug Something isn't working

Comments

@MoritzLaurer
Contributor

Describe the bug

I have a TEI embedding model endpoint created like this:

import huggingface_hub  # needed for the huggingface_hub.* calls below
from huggingface_hub import create_inference_endpoint


repository = "thenlper/gte-large"  #"BAAI/bge-reranker-large-base"
endpoint_name = "gte-large-001"
namespace = "MoritzLaurer"  # your user or organization name


# check if an endpoint with this name already exists from previous tests
available_endpoints_names = [endpoint.name for endpoint in huggingface_hub.list_inference_endpoints()]
endpoint_exists = endpoint_name in available_endpoints_names
print("Does the endpoint already exist?", endpoint_exists)


# create new endpoint
if not endpoint_exists:
    endpoint = create_inference_endpoint(
        endpoint_name,
        repository=repository,
        namespace=namespace,
        framework="pytorch",
        task="sentence-similarity",
        # see the available hardware options here: https://huggingface.co/docs/inference-endpoints/pricing#pricing
        accelerator="gpu",
        vendor="aws",
        region="us-east-1",
        instance_size="x1",
        instance_type="nvidia-a10g",
        min_replica=2,
        max_replica=4,
        type="protected",
        custom_image={
            "health_route":"/health",
            "env": {
                "MAX_BATCH_TOKENS":"16384",
                "MAX_CONCURRENT_REQUESTS":"512",
                "MAX_BATCH_REQUESTS": "124",
                "MODEL_ID": "/repository"},
            "url":"ghcr.io/huggingface/text-embeddings-inference:latest"
        }
    )
    print("Waiting for endpoint to be created")
    endpoint.wait()
    print("Endpoint ready")

# if endpoint with this name already exists, get existing endpoint
else:
    endpoint = huggingface_hub.get_inference_endpoint(name=endpoint_name, namespace=namespace)
    if endpoint.status in ["paused", "scaledToZero"]:
        print("Resuming endpoint")
        endpoint.resume()
    print("Waiting for endpoint to start")
    endpoint.wait()
    print("Endpoint ready")

Based on the docs here, I should be able to call it like this:

from huggingface_hub import InferenceClient
client = InferenceClient()
client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
        "I can't believe how much I struggled with this.",
    ],
    model=endpoint.url
)

This fails with the following error message, which is hard to interpret:

HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://c5hhcabur7dqwyj7.us-east-1.aws.endpoints.huggingface.cloud/ (Request ID: nEd4Xz) Make sure 'sentence-similarity' task is supported by the model.

The call does work when the TEI /similarity route is made explicit:

from huggingface_hub import InferenceClient
client = InferenceClient()
client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
        "I can't believe how much I struggled with this.",
    ],
    model=endpoint.url + "/similarity"
)
# output: [0.9319057, 0.81048536, 0.75192505]
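
A minimal sketch of the workaround above, wrapped in a small helper. `with_route` is a hypothetical name, not part of `huggingface_hub`; it only guards against a doubled or missing slash when appending the route to `endpoint.url`. The `example.invalid` URL is a placeholder.

```python
def with_route(base_url: str, route: str) -> str:
    """Join an endpoint base URL and a route with exactly one slash between them.

    Hypothetical helper for illustration; huggingface_hub does not provide it.
    """
    return base_url.rstrip("/") + "/" + route.lstrip("/")


# Placeholder URL; in the snippets above this would be endpoint.url.
print(with_route("https://example.invalid/", "/similarity"))
```

The result could then be passed as `model=with_route(endpoint.url, "/similarity")` until the client picks the route itself.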

It seems the client does not set the route correctly: by default it posts to the endpoint's root URL instead of the /similarity route.

Reproduction

No response

Logs

No response

System info

{'huggingface_hub version': '0.24.6',
 'Platform': 'Linux-5.10.205-195.807.amzn2.x86_64-x86_64-with-glibc2.31',
 'Python version': '3.9.5',
 'Running in iPython ?': 'Yes',
 'iPython shell': 'ZMQInteractiveShell',
 'Running in notebook ?': 'Yes',
 'Running in Google Colab ?': 'No',
 'Token path ?': '/home/user/.cache/huggingface/token',
 'Has saved token ?': True,
 'Who am I ?': 'MoritzLaurer',
 'Configured git credential helpers': 'store',
 'FastAI': 'N/A',
 'Tensorflow': 'N/A',
 'Torch': 'N/A',
 'Jinja2': '3.1.4',
 'Graphviz': 'N/A',
 'keras': 'N/A',
 'Pydot': 'N/A',
 'Pillow': 'N/A',
 'hf_transfer': 'N/A',
 'gradio': 'N/A',
 'tensorboard': 'N/A',
 'numpy': 'N/A',
 'pydantic': 'N/A',
 'aiohttp': 'N/A',
 'ENDPOINT': 'https://huggingface.co',
 'HF_HUB_CACHE': '/home/user/.cache/huggingface/hub',
 'HF_ASSETS_CACHE': '/home/user/.cache/huggingface/assets',
 'HF_TOKEN_PATH': '/home/user/.cache/huggingface/token',
 'HF_HUB_OFFLINE': False,
 'HF_HUB_DISABLE_TELEMETRY': False,
 'HF_HUB_DISABLE_PROGRESS_BARS': None,
 'HF_HUB_DISABLE_SYMLINKS_WARNING': False,
 'HF_HUB_DISABLE_EXPERIMENTAL_WARNING': False,
 'HF_HUB_DISABLE_IMPLICIT_TOKEN': False,
 'HF_HUB_ENABLE_HF_TRANSFER': False,
 'HF_HUB_ETAG_TIMEOUT': 10,
 'HF_HUB_DOWNLOAD_TIMEOUT': 10}
@MoritzLaurer MoritzLaurer added the bug Something isn't working label Aug 29, 2024
@MoritzLaurer
Contributor Author

(It could be useful to double-check all the TEI routes (Swagger here) against the related client methods to make sure that each one works correctly.)
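
One way such a route check could be sketched is shown below. Everything here is an assumption for illustration: `probe_routes` and `fake_send` are hypothetical names, the payload shape is a guess at what the client sends for sentence similarity, and `fake_send` is a stand-in that only mimics the behaviour reported in this issue (root route rejected, `/similarity` accepted). A real check would replace it with an authenticated HTTP POST against the endpoint.

```python
from typing import Any, Callable, Dict


def probe_routes(
    base_url: str,
    payloads: Dict[str, Any],
    send: Callable[[str, Any], int],
) -> Dict[str, int]:
    """Send one request per route and collect the returned status codes."""
    statuses = {}
    for route, payload in payloads.items():
        url = base_url.rstrip("/") + "/" + route.lstrip("/")
        statuses[route] = send(url, payload)
    return statuses


def fake_send(url: str, payload: Any) -> int:
    # Stand-in for a real HTTP POST; mimics what this issue reports.
    return 200 if url.endswith("/similarity") else 422


# Assumed payload shape, used here only as sample data.
similarity_payload = {"inputs": {"source_sentence": "a", "sentences": ["b", "c"]}}

statuses = probe_routes(
    "https://example.invalid",  # placeholder for endpoint.url
    {"/": similarity_payload, "/similarity": similarity_payload},
    fake_send,
)
print(statuses)
```

Passing the sender in as a function keeps the loop testable without network access; the same loop could be pointed at each route listed in the TEI Swagger page with the payload that route expects.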

@Wauplin
Contributor

Wauplin commented Aug 30, 2024

Thanks for reporting this with a reproducible example @MoritzLaurer. I'm working on a solution to avoid this kind of problem, where we don't call the correct route because of differences between the Inference API and Inference Endpoints (similar to #2484). Will keep you posted.

@hanouticelina hanouticelina linked a pull request Oct 9, 2024 that will close this issue