Initial toml docs
Katsie011 committed Dec 7, 2023
1 parent 33d79db commit ab4f9dd
Showing 11 changed files with 363 additions and 177 deletions.
2 changes: 1 addition & 1 deletion available-hardware.mdx
@@ -26,7 +26,7 @@ We have the following graphics cards available on the platform:

_NOTE: The maximum model sizes are calculated as a guideline, assuming that the model is the only thing loaded into VRAM. Longer inputs will result in a smaller maximum model size. Your mileage may vary._

These GPUs can be selected using the `--hardware` flag when deploying your model on Cortex or can be specified in your config.yaml.
These GPUs can be selected using the `--gpu` flag when deploying your model on Cortex or can be specified in your `cerebrium.toml`.
For more help with deciding which GPU you require, see this section [here](#choosing-a-gpu).
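For example, a minimal sketch of selecting a GPU in `cerebrium.toml` (the GPU name below is just one of the options listed above):

```toml
# Sketch: choose the GPU for your deployment in the hardware section of cerebrium.toml.
[cerebrium.hardware]
gpu = "AMPERE_A5000" # any GPU from the table above can go here
```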

_Due to the global shortage of GPUs at the moment, we may not always have the Enterprise edition of your GPU available. In this case, we will deploy to the Workstation edition of the GPU._
4 changes: 2 additions & 2 deletions cerebrium/environments/custom-images.mdx
@@ -3,7 +3,7 @@ title: Custom Images
description: Specify your versions, dependencies and packages to use
---

By default, Cerebrium models are executed in Python 3.9 unless the Python version specified by you in your **config.yaml** is different. However, Cerebrium only supports version 3.9 and above.
By default, Cerebrium models are executed in Python 3.9 unless the Python version specified by you in your **cerebrium.toml** is different. However, Cerebrium only supports version 3.9 and above.

Traditionally, when working with Python, you will need access to Apt packages, Pip packages and Conda packages, and so we replicate this functionality as if you were developing locally.
When creating your Cortex project, you can include the following files:
@@ -15,4 +15,4 @@ When creating your Cortex project, you can contain the following files
Each package must be listed on a new line, just as it would be locally. All of the files above are optional; however, they must use these exact file names.
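For example, a short, illustrative `requirements.txt` might look like this (the package names and versions are placeholders, not recommendations):

```
# requirements.txt (illustrative) - one package per line, as you would locally
transformers==4.35.0
numpy>=1.24.0
```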

Typically, specifying versions for packages leads to faster builds. However, if you would like to change version numbers or find that your library versions aren't
updating, please add the following flag to your deploy command: `cerebrium deploy model-name --force-rebuild`
updating, please add the following flag to your deploy command: `cerebrium deploy --name model-name --force-rebuild`
101 changes: 67 additions & 34 deletions cerebrium/environments/initial-setup.mdx
@@ -14,13 +14,10 @@ This will create a Cortex project in the specified directory with the following
```
project_name/
├── main.py
├── requirements.txt
├── pkglist.txt
├── conda_pkglist.txt
└── config.yaml
└── cerebrium.toml
```
Cortex supports the use of config YAML files to configure various aspects of your project such as hardware requirements, memory and much more.
Cortex supports the use of `toml` config files to configure various aspects of your project such as hardware requirements, scaling parameters and much more.
Using config files makes it easier to keep track of your Cerebrium deployments, share them and use git versioning to show changes over time.
To deploy your model with a specific config file, you can use the `cerebrium deploy` command with the `--config-file` flag to specify the path to your config file. Otherwise `cerebrium deploy` will use the only yaml in the file directory.
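For example (a sketch; the path is just a placeholder):

```bash
# Deploy using an explicitly specified config file
cerebrium deploy --config-file ./cerebrium.toml
```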
@@ -35,36 +32,72 @@ Your config file can be named anything you want and can be placed anywhere on yo
The parameters for your config file are the same as those which you would use as flags for a normal `cerebrium deploy` command. They're tabulated below for your convenience:
| Parameter | Description | Type | Default |
| ------------------- | ----------------------------------------------------------------------------------------------- | ------- | ------------------------------------------------------------------ |
| `name` | Name of the deployment | string | |
| `api_key` | API key for the deployment | string | not included for safety |
| `hardware` | Hardware to use for the deployment | string | GPU |
| `gpu_count` | The number of GPUs to specify | int | 2 |
| `cpu` | The number of CPU cores to use | int | 2 |
| `memory` | The amount of Memory to use in GB | int | 14.5 |
| `log_level` | Log level for the deployment | string | INFO |
| `include` | Local files to include in the deployment | string | '[./*, main.py, requirements.txt, pkglist.txt, conda_pkglist.txt]' |
| `exclude` | Local Files to exclude from the deployment | string | '[./.*, ./__*]' |
| `disable_animation` | Whether to disable the animation in the logs. | boolean | false |
| `python_version` | The Python version you would like to run | float | 3.9 |
| `min_replicas` | The minimum number of replicas to run. | int | 0 |
| `max_replicas` | The maximum number of replicas to scale to. | int | \*plan limit |
| `cooldown` | The number of seconds to keep your model warm after each request. It resets after every request | int | 60 |
| Section | Parameter | Description | Type | Default |
| --- | --- | --- | --- | --- |
| `cerebrium.build` | | A section for all the parameters governing your Cortex builds | | |
| | `predict_data` | The data to use to test your predict function on build. This is the same as the payload in an inference call | string | '{"prompt": "Here is some example predict data for your cerebrium.toml which will be used to test your predict function on build."}' |
| | `force_rebuild` | Whether to force a rebuild of your deployment | boolean | false |
| | `disable_animation` | Whether to disable the animation in the logs. | boolean | false |
| | `log_level` | Log level for the deployment | string | INFO |
| | `disable_deployment_confirmation` | Whether to disable the pre-deployment confirmation prompt | boolean | false |
| `cerebrium.deployment` | | All the parameters related to the lifetime of your deployment live here. | | |
| | `python_version` | The Python version you would like to run | float | 3.9 |
| | `include` | Local files to include in the deployment | string | '[./*, main.py, requirements.txt, pkglist.txt, conda_pkglist.txt]' |
| | `exclude` | Local Files to exclude from the deployment | string | '[./.*, ./__*]' |
| `cerebrium.hardware` | | Select the specifics for the machine you would like to run here. | | |
| | `gpu` | The GPU you would like to use. | string | AMPERE_A5000 |
| | `cpu` | The number of CPU cores to use | int | 2 |
| | `memory` | The amount of Memory to use in GB | float | 14.5 |
| | `gpu_count` | The number of GPUs to specify | int | 2 |
| `cerebrium.scaling` | | All the parameters related to the auto scaling of your deployment when live are placed here. | | |
| | `min_replicas` | The minimum number of replicas to run. | int | 0 |
| | `max_replicas` | The maximum number of replicas to scale to. | int | \*plan limit |
| | `cooldown` | The number of seconds to keep your model warm after each request. It resets after every request ends. | int | 60 |
| `cerebrium.requirements` | | All the parameters related to the packages you would like to install on your deployment are placed here. | | |
| | `pip` | The pip packages you would like to install. In the format 'module' = 'version_constraints' | dict (toml) | |
| | `conda` | The conda packages you would like to install. In the format 'module' = 'version_constraints' | dict (toml) | |
| | `apt` | The apt packages you would like to install. | list (toml) | |
## Config File Example
```yaml
%YAML 1.2
---
name: an-optional-name
api_key: an-optional-api-key
hardware: GPU
exclude: "[./.*, ./__*]"
include: "[./*, main.py, requirements.txt, pkglist.txt, conda_pkglist.txt]"
log_level: INFO
disable_animation: false
python_version: 3.9
min_replicas: 0
max_replicas: 30
```toml
# This file was automatically generated by Cerebrium as a starting point for your project.
# You can edit it as you wish.
# If you would like to learn more about your Cerebrium config, please visit https://docs.cerebrium.ai/cerebrium/environments/initial-setup#config-file-example
[cerebrium.build]
predict_data = "{\"prompt\": \"Here is some example predict data for your cerebrium.toml which will be used to test your predict function on build.\"}"
force_rebuild = false
disable_animation = false
log_level = "INFO"
disable_deployment_confirmation = false
[cerebrium.deployment]
python_version = "3.10"
include = "[./*, main.py, requirements.txt, pkglist.txt, conda_pkglist.txt]"
exclude = "[./.*, ./__*]"
[cerebrium.hardware]
gpu = "AMPERE_A5000"
cpu = 2
memory = 16.0
gpu_count = 1
[cerebrium.scaling]
min_replicas = 0
cooldown = 60
[cerebrium.requirements.pip]
torch = ">=2.0.0"
[cerebrium.requirements.conda]
cuda = ">=11.7"
cudatoolkit = "==11.7"
[cerebrium.requirements]
apt = [ "libgl1-mesa-glx", "libglib2.0-0"]
```
4 changes: 2 additions & 2 deletions cerebrium/environments/warm-models.mdx
@@ -8,11 +8,11 @@ There are two ways to do this based on your use case:

1. Set min replicas to 1 or more.

This is set through the **min_replicas** option in your `config.yaml` file. This is typically the best option if you would like to sustain a base load or would like
This is set through the **min_replicas** option in your `cerebrium.toml` file. This is typically the best option if you would like to sustain a base load or would like
to meet minimum SLAs with customers. Please note that you are charged for 24/7 usage of the instances.

2. Set your cooldown period

You set this using the **cooldown** parameter in your `config.yaml` and is by default set to 60 seconds. This is the number of seconds of inactivity from when your last
You set this using the **cooldown** parameter in your `cerebrium.toml`; it is set to 60 seconds by default. This is the number of seconds of inactivity from when your last
request finishes that a container must experience before terminating. Every time you get a new request, this time is reset. It is important to note that you are charged
for the cooldown time since your container is constantly running.
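As a rough sketch, both options live in the scaling section of your `cerebrium.toml` (the values below are illustrative, not recommendations):

```toml
# Illustrative values - tune these to your own traffic and budget.
[cerebrium.scaling]
min_replicas = 1 # option 1: keep at least one instance warm (billed 24/7)
cooldown = 120   # option 2: keep the container alive for 120s after the last request ends
```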
6 changes: 2 additions & 4 deletions cerebrium/getting-started/quickstart.mdx
@@ -12,10 +12,8 @@ cerebrium init first-project
Currently, our implementation has five components:

- **main.py** - This is where your Python code lives. This is mandatory to include.
- **requirements.txt** - This is where you define your Python packages where each package should be on a new line. Deployment will be quicker if you specify specific versions. This is optional to include.
- **pkglist.txt** - This is where you can define Linux packages where each package should be on a new line. We run the apt-install command for items here. This is optional to include.
- **conda_pkglist.txt** - This is where you can define Conda packages where each package should be on a new line. if you prefer using it for some libraries over pip. You can use both conda and pip in conjunction. This is optional to include.
- **config.yaml** - This is where you define all the configurations around your model such as the hardware you use, memory required, min replicas etc. Check [here](../environments/initial-setup) for a full list

- **cerebrium.toml** - This is where you define all the configurations around your model such as the hardware you use, scaling parameters, deployment config, build parameters, etc. Check [here](../environments/initial-setup) for a full list.

Every main.py you deploy needs the following mandatory layout:

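The layout itself is collapsed in this diff. Purely as a hypothetical sketch (based only on the `predict(item, run_id, logger)` signature that appears later in this commit), a minimal main.py could look something like:

```python
# Hypothetical sketch - not the collapsed content of this diff.

def predict(item, run_id, logger):
    """Entry point called for each request."""
    logger.info(f"Handling request {run_id}")  # logger is provided by the platform
    prompt = item.get("prompt", "")            # assumes the payload arrives as a dict
    return {"result": prompt}                  # the returned object is assumed to be the response payload
```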
87 changes: 57 additions & 30 deletions examples/langchain.mdx
@@ -18,25 +18,24 @@ First we create our project:
cerebrium init langchain-QA
```

We need certain Python packages to implement this project. Let's add those to our **_requirements.txt_** file:

```
pytube # For audio downloading
langchain
faiss-gpu
ffmpeg
openai-whisper
transformers
sentence_transformers
cerebrium
We need certain Python packages to implement this project. Let's add those to the **[cerebrium.requirements.pip]** section of our `cerebrium.toml` file:

```toml
[cerebrium.requirements.pip]
pytube = "" # For audio downloading
langchain = ""
faiss-gpu = ""
ffmpeg = ""
openai-whisper = ""
transformers = ">=4.35.0"
sentence_transformers = ">=2.2.0"
```

To use Whisper, we also have to install ffmpeg and a few other packages as a Linux package and therefore have to define these in **pkglist.txt** - this is to install all Linux-based packages.
To use Whisper, we also have to install ffmpeg and a few other packages as Linux packages, and therefore have to define these in **[cerebrium.requirements]** - this is where all Linux-based packages are installed.

```
ffmpeg
libopenblas-base
libomp-dev
```toml
[cerebrium.requirements]
apt = [ "ffmpeg", "libopenblas-base", "libomp-dev"]
```

Our **main.py** file will contain our main Python code. This is a relatively simple implementation, so we can do everything in 1 file. We would like a user to send in a link to a YouTube video with a question and return to them the answer as well as the time segment of where we got that response.
@@ -147,26 +146,54 @@ We then integrate Langchain with a Cerebrium deployed endpoint to answer questions

## Deploy

Your config.yaml file is where you can set your compute/environment. Please make sure that the hardware you specify is a AMPERE_A5000, and that you have enough memory (RAM) on your instance to run the models. You config.yaml file should look like:
Your `cerebrium.toml` file is where you can set your compute/environment. Please make sure that the hardware you specify is an AMPERE_A5000, and that you have enough memory (RAM) on your instance to run the models. Your `cerebrium.toml` file should look like:


```toml

[cerebrium.build]
predict_data = "{\"prompt\": \"Here is some example predict data for your cerebrium.toml which will be used to test your predict function on build.\"}"
force_rebuild = false
disable_animation = false
log_level = "INFO"
disable_deployment_confirmation = false

[cerebrium.deployment]
name = "langchain-qa"
python_version = "3.10"
include = "[./*, main.py, requirements.txt, pkglist.txt, conda_pkglist.txt]"
exclude = "[./.*, ./__*]"

[cerebrium.hardware]
gpu = "AMPERE_A5000"
cpu = 2
memory = 16.0
gpu_count = 1

[cerebrium.scaling]
min_replicas = 0
cooldown = 60

[cerebrium.requirements]
apt = [ "ffmpeg", "libopenblas-base", "libomp-dev"]

[cerebrium.requirements.pip]
pytube = "" # For audio downloading
langchain = ""
faiss-gpu = ""
ffmpeg = ""
openai-whisper = ""
transformers = ">=4.35.0"
sentence_transformers = ">=2.2.0"

[cerebrium.requirements.conda]

```
%YAML 1.2
---
hardware: AMPERE_A5000
memory: 14
cpu: 2
min_replicas: 0
log_level: INFO
include: '[./*, main.py, requirements.txt, pkglist.txt, conda_pkglist.txt]'
exclude: '[./.*, ./__*]'
cooldown: 60
disable_animation: false
```

To deploy the model use the following command:

```bash
cerebrium deploy langchain-QA
cerebrium deploy
```

Once deployed, we can make the following request:
68 changes: 47 additions & 21 deletions examples/logo-controlnet.mdx
@@ -22,14 +22,15 @@ cerebrium init controlnet-logo

It is important to note that the way you develop models using Cerebrium should be identical to developing on a virtual machine or Google Colab - so converting this should be very easy!

Let us create our **_requirements.txt_** file and add the following packages:

```
accelerate
transformers
safetensors
opencv-python
diffusers
Let us add the following packages to the **[cerebrium.requirements.pip]** section of our `cerebrium.toml` file:

```toml
[cerebrium.requirements.pip]
accelerate = ""
transformers = ">=4.35.0"
safetensors = ""
opencv-python = ""
diffusers = ""
```

To start, we need to create a **main.py** file which will contain our main Python code. This is a relatively simple implementation, so we can do everything in 1 file. We would like a user to send in a link to a YouTube video with a question and return to them the answer as well as the time segment of where we got that response.
@@ -120,20 +121,45 @@ def predict(item, run_id, logger):

## Deploy

Your config.yaml file is where you can set your compute/environment. Please make sure that the hardware you specify is a AMPERE_A5000 and that you have enough memory (RAM) on your instance to run the models. You config.yaml file should look like:
Your `cerebrium.toml` file is where you can set your compute/environment. Please make sure that the hardware you specify is an AMPERE_A5000 and that you have enough memory (RAM) on your instance to run the models. Your `cerebrium.toml` file should look like:

```toml

[cerebrium.build]
predict_data = "{\"prompt\": \"Here is some example predict data for your cerebrium.toml which will be used to test your predict function on build.\"}"
force_rebuild = false
disable_animation = false
log_level = "INFO"
disable_deployment_confirmation = false

[cerebrium.deployment]
name = "controlnet-logo"
python_version = "3.10"
include = "[./*, main.py]"
exclude = "[./.*, ./__*]"

[cerebrium.hardware]
gpu = "AMPERE_A5000"
cpu = 2
memory = 16.0
gpu_count = 1

[cerebrium.scaling]
min_replicas = 0
cooldown = 60

[cerebrium.requirements]
apt = ["ffmpeg"]

[cerebrium.requirements.pip]
accelerate = ""
transformers = ">=4.35.0"
safetensors = ""
opencv-python = ""
diffusers = ""

[cerebrium.requirements.conda]

```
%YAML 1.2
---
hardware: AMPERE_A5000
memory: 14
cpu: 2
min_replicas: 0
log_level: INFO
include: '[./*, main.py, requirements.txt, pkglist.txt, conda_pkglist.txt]'
exclude: '[./.*, ./__*]'
cooldown: 60
disable_animation: false
```

To deploy the model, use the following command:
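The command itself is collapsed in this diff; going by the Langchain example above, it is presumably just:

```bash
cerebrium deploy
```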