subcategory |
---|
Serving |
This resource allows you to manage Model Serving endpoints in Databricks.
resource "databricks_model_serving" "this" {
name = "ads-serving-endpoint"
config {
served_models {
name = "prod_model"
model_name = "ads-model"
model_version = "2"
workload_size = "Small"
scale_to_zero_enabled = true
}
served_models {
name = "candidate_model"
model_name = "ads-model"
model_version = "4"
workload_size = "Small"
scale_to_zero_enabled = false
}
traffic_config {
routes {
served_model_name = "prod_model"
traffic_percentage = 90
}
routes {
served_model_name = "candidate_model"
traffic_percentage = 10
}
}
}
}
The following arguments are supported:
name
- (Required) The name of the model serving endpoint. This field is required and must be unique across a workspace. An endpoint name can consist of alphanumeric characters, dashes, and underscores. NOTE: Changing this name will delete the existing endpoint and create a new endpoint with the update name.config
- (Required) The model serving endpoint configuration.
served_models
- (Required) Each block represents a served model for the endpoint to serve. A model serving endpoint can have up to 10 served models.traffic_config
- A single block represents the traffic split configuration amongst the served models.
name
- The name of a served model. It must be unique across an endpoint. If not specified, this field will default tomodelname-modelversion
. A served model name can consist of alphanumeric characters, dashes, and underscores.model_name
- (Required) The name of the model in Databricks Model Registry to be served.model_version
- (Required) The version of the model in Databricks Model Registry to be served.workload_size
- (Required) The workload size of the served model. The workload size corresponds to a range of provisioned concurrency that the compute will autoscale between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are "Small" (4 - 4 provisioned concurrency), "Medium" (8 - 16 provisioned concurrency), and "Large" (16 - 64 provisioned concurrency).scale_to_zero_enabled
- Whether the compute resources for the served model should scale down to zero. If scale-to-zero is enabled, the lower bound of the provisioned concurrency for each workload size will be 0. The default value istrue
.
routes
- (Required) Each block represents a route that defines traffic to each served model. Eachserved_models
block needs to have a correspondingroutes
block
served_model_name
- (Required) The name of the served model this route configures traffic for. This needs to match the name of aserved_models
blocktraffic_percentage
- (Required) The percentage of endpoint traffic to send to this route. It must be an integer between 0 and 100 inclusive.
The timeouts
block allows you to specify create
and update
timeouts. The default right now is 45 minutes for both operations.
timeouts {
create = "30m"
}
The model serving resource can be imported using the name of the endpoint.
$ terraform import databricks_model_serving.this <model-serving-endpoint-name>
The following resources are often used in the same context:
- End to end workspace management guide.
- databricks_directory to manage directories in Databricks Workspace.
- databricks_mlflow_model to create MLflow models in Databricks.
- databricks_notebook to manage Databricks Notebooks.
- databricks_notebook data to export a notebook from Databricks Workspace.
- databricks_repo to manage Databricks Repos.