Merge branch 'master' into hotfix/disable_deployment_confirmation
# Conflicts:
#	cerebrium/environments/initial-setup.mdx
#	examples/logo-controlnet.mdx
#	v4/examples/sdxl.mdx
#	v4/examples/transcribe-whisper.mdx
jonoirwinrsa committed Oct 25, 2024
2 parents a50e2b6 + af39641 commit 5a6c5b2
Showing 66 changed files with 3,768 additions and 1,117 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -2,7 +2,7 @@

Welcome to Cerebrium's documentation hub, currently available at [docs.cerebrium.ai](https://docs.cerebrium.ai)

Cerebrium is an AWS Sagemaker alternative providing all the features you need to quickly build an ML product.
Cerebrium is an AWS SageMaker alternative providing all the features you need to quickly build an ML product.

### 🚀 Setup

@@ -26,7 +26,7 @@ yarn installed already run `npm install --global yarn` in your terminal.

### 😎 Publishing Changes

Changes will be deployed to production automatically after pushing to the default (`master`) branch.
Changes are deployed to production automatically after pushing to the default (`master`) branch.

You can also preview changes using PRs, which generates a preview link of the docs.

42 changes: 20 additions & 22 deletions available-hardware.mdx
@@ -10,28 +10,26 @@ This page lists the hardware that is currently available on the platform. If you

# Hardware

## GPU's
## GPUs

We have the following graphics cards available on the platform:
| Name | Cerebrium Name | VRAM | Minimum Plan | Max fp32 Model Params | Max fp16 Model Params
| --------------------------------------------------------------------------------------------------- | :------: |------ | :-------------------: | :-------------------: | :------------------: |
| [NVIDIA H100](https://www.nvidia.com/en-us/data-center/h100/) | Special Request | 80GB | Enterprise | 18B | 36B
| [NVIDIA A100](https://www.nvidia.com/en-us/data-center/a100/) | Special Request | 80GB | Standard | 18B | 36B
| [NVIDIA A100](https://www.nvidia.com/en-us/data-center/a100/) | AMPERE_A100 | 40GB | Standard | 9B | 18B
| [NVIDIA RTX A6000](https://www.nvidia.com/en-us/design-visualization/rtx-a6000/) | AMPERE_A6000 | 48GB | Hobby | 10B | 21B
| [NVIDIA RTX A5000](https://www.nvidia.com/en-us/design-visualization/rtx-a5000/) | AMPERE_A5000 | 24GB | Hobby | 5B | 10B
| [NVIDIA RTX A4000](https://www.nvidia.com/en-us/design-visualization/rtx-a4000/) | AMPERE_A4000 | 16GB | Hobby | 3B | 7B
| [NVIDIA Quadro RTX 5000](https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/quadro-product-literature/quadro-rtx-5000-data-sheet-us-nvidia-704120-r4-web.pdf) | TURING_5000 | 16GB | Hobby | 3B | 7B
| [NVIDIA Quadro RTX 4000](https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/quadro-product-literature/quadro-rtx-4000-datasheet-us-nvidia-1060942-r2-web.pdf) | TURING_4000 | 8GB | Hobby | 1B | 3B

_NOTE: The maximum model sizes are calculated as a guideline, assuming that the model is the only thing loaded into VRAM. Longer inputs will result in a smaller maximum model size. Your mileage may vary._

These GPUs can be selected using the `--gpu` flag when deploying your model on Cortex or can be specified in your `cerebrium.toml`.
| Name | Cerebrium Name | VRAM | Minimum Plan | Provider
| --------------------------------------------------------------------------------------------------- | :------: |------ | :-------------------: | :-------------------: |
| [NVIDIA H100](https://www.nvidia.com/en-us/data-center/h100/) | Special Request | 80GB | Enterprise | [AWS]
| [NVIDIA A100](https://www.nvidia.com/en-us/data-center/a100/) | Special Request | 80GB | Enterprise | [AWS]
| [NVIDIA A100_80GB](https://www.nvidia.com/en-us/data-center/a100/) | AMPERE_A100 | 80GB | Enterprise | [AWS]
| [NVIDIA A100_40GB](https://www.nvidia.com/en-us/data-center/a100/) | AMPERE_A100 | 40GB | Enterprise | [AWS]
| [NVIDIA A10](https://www.nvidia.com/en-us/data-center/a100/) | AMPERE_A10 | 24GB | Hobby | [AWS]
| [NVIDIA L4](https://www.nvidia.com/en-us/data-center/l4/) | ADA_L4 | 24GB | Hobby | [AWS]
| [NVIDIA L40s](https://www.nvidia.com/en-us/data-center/l40s/) | ADA_L40 | 48GB | Hobby | [AWS]
| [NVIDIA T4](https://www.nvidia.com/en-us/data-center/tesla-t4/) | TURING_T4 | 16GB | Hobby | [AWS]
| [AWS INFERENTIA](https://aws.amazon.com/machine-learning/inferentia/) | INF2 | 32GB | Hobby | [AWS]
| [AWS TRAINIUM](https://aws.amazon.com/machine-learning/trainium/) | TRN1 | 32GB | Hobby | [AWS]


These GPUs can be selected using the `--gpu` flag when deploying your app on Cortex or can be specified in your `cerebrium.toml`.
For more help with deciding which GPU you require, see this section [here](#choosing-a-gpu).

_Due to the global shortage of GPUs at the moment, we may not always have the Enterprise edition of your GPU available. In this case, we will deploy to the Workstation edition of the GPU._
_These are the same GPUs, and it will not affect the performance of your model in any way._

## CPUs

We select the CPU based on your choice of hardware, choosing the best available options so you can get the performance you need.
@@ -42,13 +40,13 @@ You can choose the number of CPU cores you require for your deployment. If you d

We let you select the amount of memory you require for your deployment.
All the memory you request is dedicated to your deployment and is not shared with any other deployments, ensuring that you get the performance you need.
This is the amount of memory that is available to your code when it is running and you should choose an adequate amount for your model to be loaded into VRAM if you are deploying onto a GPU.
This is the amount of memory that is available to your code when it is running, and you should choose an adequate amount for your model to be loaded into VRAM if you are deploying onto a GPU.
Once again, you only pay for what you need!

## Storage

We provide you with a persistent storage volume attached to your deployment.
You can use this storage volume to store any data that you need to persist between deployments. Accessing your persistent storage is covered in depth for [cortex here](./cerebrium/data-sharing-storage/persistent-storage).
You can use this storage volume to store any data that you need to persist between deployments. Accessing your persistent storage is covered in depth for [cortex here](/cerebrium/data-sharing-storage/persistent-storage).

The storage volume is backed by high-performance SSDs so that you can get the best performance possible.
Pricing for storage is based on the amount of storage you use and is charged per GB per month.
@@ -60,10 +58,10 @@ On one hand, you want the best performance possible, but on the other hand, you

## Choosing a GPU

Choosing a GPU can be a complicated task of calculating VRAM usage based on the number of parameters you have as well as the length of your inputs. Additionally, some variables are dependent on your inputs to your model which will affect the VRAM usage substantially. For example, with LLMs and transformer-based architectures, you need to factor in attention processes as well as any memory-heavy positional encoding that may be happening which can increase VRAM usage exponentially for some methods. Similarly, for CNNs, you need to look at the number of filters you are using as well as the size of your inputs.
Choosing a GPU can be a complicated task of calculating VRAM usage based on the number of parameters you have as well as the length of your inputs. Additionally, some variables are dependent on your inputs to your app, which will affect the VRAM usage substantially. For example, with LLMs and transformer-based architectures, you need to factor in attention processes as well as any memory-heavy positional encoding that may be happening, which can increase VRAM usage exponentially for some methods. Similarly, for CNNs, you need to look at the number of filters you are using as well as the size of your inputs.

As a rule of thumb, the easiest way is to choose the GPU that has at least 1.5x the minimum amount of VRAM that your model requires.
This approach is conservative and will ensure that your model will fit on the GPU you choose even if you have longer inputs than you expect. However, it is just a rule of thumb and you should test the VRAM usage of your model to ensure that it will fit on the GPU you choose.
This approach is conservative and will ensure that your model will fit on the GPU you choose even if you have longer inputs than you expect. However, it's just a rule of thumb, and you should test the VRAM usage of your model to ensure that it will fit on the GPU you choose.
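
For a quick sanity check, here is a minimal sketch of that rule of thumb (an illustrative approximation only: it counts weights as parameters times bytes per parameter and adds the 1.5x headroom; it is not an official Cerebrium calculator):

```python
def estimate_vram_gb(num_params: float, bytes_per_param: int = 2, headroom: float = 1.5) -> float:
    """Rough weights-only VRAM estimate (fp16 = 2 bytes/param, fp32 = 4 bytes/param)."""
    weight_gb = num_params * bytes_per_param / 1e9
    return weight_gb * headroom  # headroom for activations, KV caches and longer inputs

# Example: a 7B-parameter model in fp16 needs ~14GB for the weights alone,
# so ~21GB with headroom, which points at a 24GB card such as AMPERE_A10 or ADA_L4.
print(round(estimate_vram_gb(7e9, bytes_per_param=2), 1))
```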

You can calculate the VRAM usage of your model by using the following formula:

12 changes: 6 additions & 6 deletions calculating-cost.mdx
@@ -9,19 +9,19 @@ view the pricing of various compute on our [pricing page](https://www.cerebrium.

When you deploy a model, there are two processes we charge you for:

1. We charge you for the build process where we set up your model environment. In this step, we set up a Python environment according to your parameters before downloading and installing the required apt packages, Conda and Python packages as well as any model files you require.
1. We charge you for the build process where we set up your app environment. In this step, we set up a Python environment according to your parameters before downloading and installing the required apt packages, Conda and Python packages as well as any model files you require.
You are only charged for a build if we need to rebuild your environment, i.e. you have run a `build` or `deploy` command and have changed your requirements, parameters or code. Note that we cache each of the steps in a build so subsequent builds will cost substantially less than the first.

2. The model runtime. This is the amount of time it takes your code to run from start to finish on each request. There are 3 costs to consider here:
2. The app runtime. This is the amount of time it takes your code to run from start to finish on each request. There are 3 costs to consider here:

- <u>Cold start</u>: This is the amount of time it takes to spin up a server(s),
load your environment, connect storage etc. This is part of the Cerebrium
service and something we are working on every day to get as low as possible.
<b>We do not charge you for this!</b>
- <u>Model initialization</u>: This part of your code is outside of the predict
function and only runs when your model incurs a cold start. You are charged
for the amount of time it takes for this code to run. Typically this is
loading a model into GPU RAM.
function and only runs when your app incurs a cold start. You are charged for
the amount of time it takes for this code to run. Typically this is loading a
model into GPU RAM.
- <u>Predict runtime</u>: This is the code stored in your predict function and
runs every time a request hits your endpoint.

@@ -34,7 +34,7 @@ The model you wish to deploy requires:
- 20GB Memory: 20 \* $0.00000659 per second
- 10 GB persistent storage: 10 \* $0.3 per month

In our situation, your model works on the first deployment and so you incur only one build process of 2 minutes. Additionally, let's say that the model has 10 cold starts a day with an average initialization of 2 seconds and lastly and average runtime (predict) of 2 seconds. Let us calculate your
In our situation, your app works on the first deployment and so you incur only one build process of 2 minutes. Additionally, let's say that the app has 10 cold starts a day with an average initialization of 2 seconds and, lastly, an average runtime (predict) of 2 seconds. Let us calculate your
expected cost at month end with you expecting to do 100 000 model inferences.
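
As a rough sketch of that arithmetic (using only the memory and storage rates quoted above over a 30-day month; the GPU and CPU per-second rates, which are also charged, are left out, and it assumes the build minutes are billed at the same memory rate):

```python
MEMORY_GB = 20
MEMORY_RATE_PER_GB_SECOND = 0.00000659  # $ per GB per second (from the example above)
STORAGE_GB = 10
STORAGE_RATE_PER_GB_MONTH = 0.30        # $ per GB per month

build_seconds = 2 * 60            # one build of 2 minutes
init_seconds = 10 * 2 * 30        # 10 cold starts/day * 2s initialization * 30 days
predict_seconds = 100_000 * 2     # 100 000 inferences * 2s predict each

billable_seconds = build_seconds + init_seconds + predict_seconds
memory_cost = MEMORY_GB * MEMORY_RATE_PER_GB_SECOND * billable_seconds
storage_cost = STORAGE_GB * STORAGE_RATE_PER_GB_MONTH

# Roughly $26.45 for memory plus $3.00 for storage; GPU/CPU charges come on top.
print(f"memory: ${memory_cost:.2f}, storage: ${storage_cost:.2f}")
```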

108 changes: 77 additions & 31 deletions cerebrium/data-sharing-storage/persistent-storage.mdx
@@ -1,53 +1,99 @@
---
title: "Persistent Storage"
title: "Persistent Volumes"
---

Cerebrium gives to access to persistent storage to store model weights, files and much more. This storage volume persists across your project, meaning that if
you refer to model weights or a file created in a different deployment, you will be able to access it!
Cerebrium gives you access to persistent volumes to store model weights and files.
This volume persists across your project, meaning that if
you refer to model weights or files created in a different app (but in the same project), you're able to access them.

This allows you to load in model weights more efficiently as well as reduce the size of your deployment container images. Currently,
the volume can be accessed through `/persistent-storage` in your container instance, should you wish to access it directly and store other artifacts.
This allows model weights to be loaded more efficiently, as well as reducing the size of your app container image.

While you have full access to this drive, we recommend that you only store files in directories other than `/persistent-storage/cache`, as this and its subdirectories
are used by Cerebrium to store your models. As a simple example, suppose you have an external SAM model that you want to use in your custom deployment. You can download it to the cache
as such:
### How it works

```python
import os
import torch
Every Cerebrium Project comes with a 50GB volume by default. This volume is mounted on all apps as `/persistent-storage`.

file_path = "/persistent-storage/segment-anything/model.pt"
# Check if the file already exists, if not download it
if not os.path.exists("/persistent-storage/segment-anything/"):
    response = requests.get("https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth")
    with open(file_path, "wb") as f:
        f.write(response.content)
### Uploading files

# Load the model
model = torch.jit.load(file_path)
... # Continue with your initialization
```
To upload files to your persistent volume, you can use the `cerebrium cp local_path dest_path` command. This command copies files from your local machine to the specified destination path in the volume. The `dest_path` is optional; if not provided, the files will be uploaded to the root of the persistent volume.

Now, in subsequent deployments, the model will load from the cache rather than download it again.
```bash
Usage: cerebrium cp [OPTIONS] LOCAL_PATH REMOTE_PATH (Optional)

## Increasing your Persistent Storage Size
Copy contents to persistent volume.

<Note>Once increased, your persistent storage size cannot be decreased.</Note>
Options:
-h, --help Show this message and exit.

By default, your account is given 50GB of persistent storage to start with. However, if you find you need more (for example, you get an error saying `disk quote exceeded`) then you can increase your allocation using the following steps:
Examples:
# Copy a single file
cerebrium cp src_file_name.txt # copies to /src_file_name.txt

1. Check your current persistent storage allocation by running:
cerebrium cp src_file_name.txt dest_file_name.txt # copies to /dest_file_name.txt

# Copy a directory
cerebrium cp dir_name # copies to the root directory
cerebrium cp dir_name sub_folder/ # copies to sub_folder/
```

### Listing files

To list the files on your persistent volume, you can use the `cerebrium ls [remote_path]` command. This command lists all files and directories within the specified `remote_path`. If no `remote_path` is provided, it lists the contents of the root directory of the persistent volume.

```bash
cerebrium storage --get-capacity
Usage: cerebrium ls [OPTIONS] REMOTE_PATH (Optional)

List contents of persistent volume.

Options:
-h, --help Show this message and exit.

Examples:
# List all files in the root directory
cerebrium ls

# List all files in a specific folder
cerebrium ls sub_folder/
```

This will return your current persistent storage allocation in GB.
### Deleting files

2. To increase your persistent storage allocation run:
To delete files or directories from your persistent volume, use the `cerebrium rm remote_path` command. This command removes the specified file or directory from the persistent volume. Be careful, as this operation is irreversible.

```bash
cerebrium storage --increase-in-gb <number of GB to increase by>
Usage: cerebrium rm [OPTIONS] REMOTE_PATH

Remove a file or directory from persistent volume.

Options:
-h, --help Show this message and exit.

Examples:
# Remove a specific file
cerebrium rm /file_name.txt

# Remove a directory and all its contents
cerebrium rm /sub_folder/
```

### Real world example

```bash
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
cerebrium cp sam_vit_h_4b8939.pth segment-anything/sam_vit_h_4b8939.pth
```

As a simple example, suppose you have an external SAM model that you want to use in your custom deployment. You can download it to a cache directory on your persistent volume as shown above, and then load it as such:

```python
import os
import torch

file_path = "/persistent-storage/segment-anything/sam_vit_h_4b8939.pth"

# Load the model
model = torch.jit.load(file_path)
... # Continue with your initialization
```

This will return a confirmation message and your new persistent storage allocation in GB if successful.
Now, in later inference requests, the model loads from the persistent volume instead of downloading again.
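
If you'd rather fetch the checkpoint from inside your app than copy it up with `cerebrium cp`, a minimal download-once sketch (assuming the `requests` package is installed in your environment) could look like this:

```python
import os

import requests
import torch

CHECKPOINT_URL = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth"
CHECKPOINT_PATH = "/persistent-storage/segment-anything/sam_vit_h_4b8939.pth"

# Download the checkpoint only if it is not already on the persistent volume.
if not os.path.exists(CHECKPOINT_PATH):
    os.makedirs(os.path.dirname(CHECKPOINT_PATH), exist_ok=True)
    response = requests.get(CHECKPOINT_URL, timeout=600)
    response.raise_for_status()
    with open(CHECKPOINT_PATH, "wb") as f:
        f.write(response.content)

# Later cold starts skip the download and read straight from the volume.
# The SAM checkpoint is a plain state_dict; build your model and call load_state_dict as needed.
state_dict = torch.load(CHECKPOINT_PATH, map_location="cpu")
```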
37 changes: 0 additions & 37 deletions cerebrium/deployments/async-functions.mdx

This file was deleted.
