This repository contains a lightweight Sanic API for creating embeddings with the Instructor model. It is provided as a Docker container and is based on the hkunlp/instructor-large model. The embeddings can be used for tasks such as text classification, similarity, or clustering.
For more information about the Instructor model, visit the official links:
Prerequisites:
- Docker, version 20.10 or newer
- Clone this repository:
  git clone https://github.com/flexchar/instructor-embedding-api.git
  cd instructor-embedding-api
- Build a Docker image:
  make build
- Run the Docker container:
  make run
The API will be available at http://127.0.0.1:8000/.
You can also use the pre-built container available on GitHub Packages:
docker pull ghcr.io/flexchar/instructor-embedding-api:latest
docker run --rm -p 8000:8000 ghcr.io/flexchar/instructor-embedding-api:latest
To generate embeddings, send a POST request to http://127.0.0.1:8000/ with a JSON payload in the following format:
{
  "input": [instruction_sentence_pairs]
}
Here, instruction_sentence_pairs is a list of pairs, where each pair contains two strings: an instruction and a sentence.
For example:
{
  "input": [
    [
      "Represent the Fitness title:",
      "What is the easiest training plan for a newbie?"
    ]
  ]
}
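The request body above can also be assembled programmatically. The following is a minimal Python sketch, not part of this repository: the helper name build_payload is hypothetical, and the validation it performs is an assumption about what a well-formed pair looks like, based only on the format described above.

```python
import json


def build_payload(pairs):
    """Return the JSON request body for a list of (instruction, sentence) pairs.

    Hypothetical helper: it only checks that every pair holds exactly two
    strings, as the request format above describes.
    """
    body = []
    for pair in pairs:
        if len(pair) != 2 or not all(isinstance(item, str) for item in pair):
            raise ValueError(f"expected [instruction, sentence], got {pair!r}")
        body.append(list(pair))
    return json.dumps({"input": body})


# Same example as above: one instruction-sentence pair.
payload = build_payload([
    ("Represent the Fitness title:",
     "What is the easiest training plan for a newbie?"),
])
```

Several pairs can be passed in one call, since "input" is a list.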
A valid response will have the following structure:
{
  "model": "hkunlp/instructor-large",
  "data": [embeddings]
}
Here, embeddings is a list of arrays representing the embeddings for the given instruction-sentence pairs.
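As an illustration of the full round trip, here is a minimal Python client sketch that uses only the standard library. The function names embed and parse_response are hypothetical, and both the Content-Type header and the 60-second timeout are assumptions rather than documented requirements of this API. The sketch assumes the container is running locally as described above.

```python
import json
import urllib.request

# Address taken from the run instructions; adjust if the port is mapped differently.
API_URL = "http://127.0.0.1:8000/"


def parse_response(raw):
    """Extract (model, embeddings) from a response body in the documented shape."""
    body = json.loads(raw)
    return body["model"], body["data"]


def embed(pairs, url=API_URL, timeout=60):
    """POST instruction-sentence pairs to the API and return (model, embeddings)."""
    payload = json.dumps({"input": [list(pair) for pair in pairs]}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        # Assumption: the server accepts a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read())
```

With the container running, calling embed([["Represent the Fitness title:", "What is the easiest training plan for a newbie?"]]) should return the model name together with one embedding per input pair.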
This project is licensed under the MIT License. See the LICENSE file for details.