-
Notifications
You must be signed in to change notification settings - Fork 304
Description
What?
A subsystem in DAB to fetch embeddings from text.
Why?
For the sake of future use around cache and parameter substitution.
Configuration (runtime.embeddings)
{
"runtime": {
"embeddings": {
"enabled": true,
"provider": "azure-openai",
"base-url": "@env('EMBEDDINGS_ENDPOINT')",
"api-key": "@env('EMBEDDINGS_API_KEY')",
"model": "text-embedding-ada-002",
"api-version": "2024-02-01",
"dimensions": 1536,
"timeout-ms": 30000,
"endpoint": {
"enabled": true,
"path": "/embed",
"roles": ["authenticated"]
},
"health": {
"enabled": true,
"threshold-ms": 1000
}
}
}
}Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
enabled |
bool |
No | true |
Master toggle for embedding subsystem |
provider |
string |
Yes | - | azure-openai or openai |
base-url |
string |
Yes | - | Provider base URL |
api-key |
string |
Yes | - | Authentication key (can use @env() indirection) |
model |
string |
Yes | OpenAI: text-embedding-3-small |
Deployment name (Azure) or model name (OpenAI) |
api-version |
string |
Azure only | 2024-02-01 |
Azure API version (prevent deprecation issues) |
dimensions |
int |
No | Model default | Output vector size (used for Redis schema alignment) |
timeout-ms |
int |
No | 30000 |
Request timeout in milliseconds |
endpoint.enabled |
bool |
No | false |
Whether to expose /embed REST API |
endpoint.path |
string |
No | /embed |
Embed endpoint path (customizable) |
endpoint.roles |
string[] |
No | ["authenticated"] |
List of roles allowed to access embed endpoint |
health.enabled |
bool |
No | true |
Enable embedding health check |
health.threshold-ms |
int |
No | 5000 |
Acceptable embedding latency threshold in ms for health check |
Command line
- Easily configure via
dab configure --runtime.embeddings.*
REST Endpoint
- Optionally exposes
/embedfor embedding on demand (secured by role) - When
host.modeisdevelopment, thenanonymousis automatically allowed.
Telemetry & Caching
- OpenTelemetry spans/metrics for latency, cache hits/misses, and error rate
- L1 (SHA256-based, 24h TTL, cache keys incorporate provider & model)
Health, Hot Reload, Startup
- Health check covers embedding provider & endpoint
- Listens for hot-reload event and reloads
- Startup logs embedding configuration info
REST API Reference
Azure OpenAI:
- URL: POST {base-url}/openai/deployments/{model}/embeddings?api-version={api-version}
- Headers: api-key: {api-key}
- Body:
{ "input": string | string[], "dimensions": 1536 }
OpenAI:
- URL: POST {base-url}/v1/embeddings
- Headers: Authorization: Bearer {api-key}
- Body:
{ "model": "text-embedding-3-small", "input": string | string[], "dimensions": 1536 }
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels
Type
Projects
Status
Todo