diff --git a/cerebrium/fine-tuning/auto-deploying.mdx b/cerebrium/fine-tuning/auto-deploying.mdx
deleted file mode 100644
index cf017845..00000000
--- a/cerebrium/fine-tuning/auto-deploying.mdx
+++ /dev/null
@@ -1,49 +0,0 @@
----
-title: Auto-Deploying your Trained Model
-description: Learn how to seamlessly deploy your trained model to a production environment.
----
-
-# Introduction
-
-Now that your model has been successfully trained, you have two options for deploying to a production environment:
-
-- The first is to download the trained adapter weights using `cerebrium download-model` and write your own deployment code.
-- The second is to use the built-in auto-deployment feature of Cerebrium's trainer.
-
-While writing custom deployment code gives you more flexibility with your fine-tuned weights, the second option is faster and easier: your deployment starts as soon as training completes, giving you a seamless way to test your model's performance before considering any modifications to the deployment code.
-
-# Configuration of your Auto-deployment
-
-While an auto-deployment will generate all the code needed to deploy your model, you'll need to configure a few parameters to get it up and running. These parameters are added to the `config.yml` file of your training under the title `autodeploy`.
-
-We've kept the configuration as simple as possible, using the same format and parameters that you're used to in the [config file](cerebrium/cortex/advanced-functionality/init-cortex-project) of a Cortex deployment.
-
-We suggest supplying the following parameters for an auto-deployment:
-
-```yaml
-autodeploy:
- name: your-auto-deployed-model-name
- hardware: AMPERE_A5000 # Hardware to use for the deployment
- cpu: 2 # Number of CPUs to allocate to the deployment
- memory: 14.5 # Memory to allocate to the deployment. This depends on your model but for most 7B models, 14.5GB is sufficient.
-```
-
-
- While we've tried to include all the requirements your model may need, you can
- overwrite our defaults by providing files for `requirements-file` and
- `pkglist-file` under the autodeployment parameters. This is particularly
- useful if you want to use a specific version of a library or find you want to
- include an extra package.
-
-
-Feel free to include additional parameters such as `min_replicas`, `max_replicas`, or any [other parameters](cerebrium/cortex/introduction) typically used in the config of a Cortex deployment.
-
-With these steps completed, all that's left to do is run your `cerebrium train` command to upload your data and config. In no time your trained model will be up and running in a production environment, ready to make predictions!
-
-## Post Auto-Deployment Steps
-
-Once your training is finished and the auto-deployment is executed, your auto-deployed model will be visible in your dashboard. To test your model, use the endpoint provided in the training logs or within the `Example Code` tab of the model's dashboard.
-
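-For example, a minimal request sketch using Python's `requests` library might look like the following. The endpoint URL, header, and request body below are placeholders; copy the real endpoint from the training logs or the `Example Code` tab, and shape the payload to match your deployment's `predict` function.
-
-```python
-import requests
-
-# Placeholders: use the endpoint and API key shown in your dashboard
-endpoint = "https://<your-auto-deployed-model-endpoint>/predict"
-api_key = "YOUR PRIVATE CEREBRIUM API KEY"
-
-response = requests.post(
-    endpoint,
-    headers={"Authorization": api_key, "Content-Type": "application/json"},
-    json={"prompt": "A quick test of my fine-tuned model"},
-)
-print(response.status_code, response.json())
-```
-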
-You can obtain the deployment code from the `builds` tab of the model dashboard. Clicking the download button will fetch a zip file containing your code used for the auto-deployment. This is particularly useful if you intend to customize deployment parameters or functionalities.
-Additionally, your model weights can be found in the downloaded zip file. If you plan to modify the deployment code, make sure to include the updated weights in the deployment folder.
diff --git a/cerebrium/fine-tuning/diffusion-models/config.mdx b/cerebrium/fine-tuning/diffusion-models/config.mdx
deleted file mode 100644
index ebfce68d..00000000
--- a/cerebrium/fine-tuning/diffusion-models/config.mdx
+++ /dev/null
@@ -1,137 +0,0 @@
----
-title: "Config Files"
-description: "Guide for the config files for fine-tuning a diffusion model on Cerebrium"
----
-
-In order to simplify the training experience and make deployments easier, we use YAML config files. This lets you keep track of all your training parameters in one place, giving you the flexibility to train your model with the parameters you need while still keeping the deployment process streamlined.
-
-In this section, we introduce the parameters you may want to manipulate for your training, however, we'll set the defaults to what we've found works well if you'd like to leave them out!
-
-If you would like to override the parameters in your config file during a deployment, the `--config-string` option in the command line accepts a JSON string. The parameters provided in the JSON string will override the values they may have been assigned in your config file.
-
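-If you build that JSON string in Python, a small sketch like the one below (using parameter names from the tables on this page) helps avoid quoting mistakes:
-
-```python
-import json
-
-# Override only the values you want to change; everything else keeps the value
-# it has in your YAML config file.
-overrides = {
-    "name": "my-finetune-run",
-    "training_args": {"num_train_epochs": 25, "learning_rate": 1.0e-4},
-}
-
-# Pass the printed string to the CLI via --config-string
-print(json.dumps(overrides))
-```
-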
-For your convenience, an example of the config file is available [here](#example-yaml-config-file)
-
-## Setting up a config file
-
-Your YAML config file can be placed anywhere on your system; just point the trainer to the file.
-
-In your YAML file, you need to include the following required parameters or parse them into the `cerebrium train` command:
-
-### Required parameters
-
-The following parameters need to be provided either in your command-line call of the trainer or in your config file:
-
-| Parameter | Suggested Value | Description |
-| --------------- | ------------------------------ | ------------------------------------------------------------------------------------ |
-| training_type | diffuser | Type of training to run. Either diffuser or transformer. In this case, use diffuser. |
-| name | test-config-file | Name of the experiment. |
-| api_key | Your private API key | Your Cerebrium API key. Can also be retrieved if you have run `cerebrium login` |
-| hf_model_path | runwayml/stable-diffusion-v1-5 | Path to your stable diffusion model on huggingface. |
-| train_prompt | | Your prompt to train. |
-| log_level       | INFO                           | Level for logging. Can be DEBUG, INFO, WARNING, ERROR.                                 |
-| train_image_dir | data/training_class_images/ | Directory of training images. |
-
-### Optional parameters
-
-Additionally, you can include the following parameters in your config file:
-
-| Parameter | Suggested Value | Description |
-| ----------------------- | ----------------------- | -------------------------------------------------------------------------- |
-| **Diffuser Parameters** | | |
-| revision | main | Revision of the diffuser model to use. |
-| validation_prompt | ~ | an optional validation prompt to use. Will use the training prompt if "~". |
-| custom_tokenizer | ~ | custom tokenizer from AutoTokenizer if required. |
-| **Dataset params** | | |
-| prior_class_image_dir | data/prior_class_images | Optional directory of images to use for prior class. |
-| prior_class_prompt    | Photo of a dog.         | Optional prompt used to generate the prior class images.   |
-
-#### Training parameters
-
-The following training parameters can be included if you need to modify the training requirements. These must be child parameters which fall under the **training_args** parameter in your config file.
-
-| Parameter | Sub parameter | Suggested Value | Description |
-| ------------- | --------------------------------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| training_args | | | |
-| | num_validation_images | 4 | Number of images to generate at each validation step. |
-| | num_train_epochs | 50 | Number of epochs to train for |
-| | seed | ~ | manual seed to set for training. |
-| | resolution | 512 | Resolution to train images at. |
-| | center_crop | False | Whether to center crop images to resolution. |
-| | train_batch_size | 2 | Batch size for training. Will significantly affect memory usage so we suggest leaving at 2. |
-| | num_prior_class_images | 10 | Number of images to generate using the **prior class prompt** |
-| | prior_class_generation_batch_size | 2 | Batch size for generating prior class images. We suggest leaving at 2. |
-| | prior_generation_precision | ~ | Precision used to generate prior class images if applicable. |
-| | prior_loss_weight | 1.0 | Weight of prior loss in the total loss. |
-| | max_train_steps | ~ | maximum training steps which overrides number of training epochs |
-| | validation_epochs | 5 | number of epochs before running validation and checkpointing |
-| | gradient_accumulation_steps | 1 | Number of gradient accumulation steps. |
-| | learning_rate | 0.0005 | The learning rate you would like to train with. Can be more aggressive than you would use to train a full network. |
-| | lr_scheduler | constant | Learning rate schedule. Can be one of ["constant", "linear", "cosine", "polynomial"] |
-| | lr_warmup_steps | 5 | Number of learning rate warmup steps. |
-| | lr_num_cycles | 1 | Number of learning rate cycles to use. |
-| | lr_power | 1.0 | Power factor if using a polynomial scheduler. |
-| | max_grad_norm | 1.0 | Maximum gradient norm. |
-| | mixed_precision | no | If you would like to use mixed precision. Supports 'fp16' and 'bf16' else defaults to 'no' which will train in fp32. Use with caution as it can cause the safety checker to misbehave. |
-| | scale_lr | False | Scale the learning rate according to the gradient accumulation steps, training batch size and number of processes. |
-| | allow_tf32 | False | Allow matmul using tf32. Defaults to False. |
-| | use_8bit_adam | True | Use 8 bit adam for lower memory usage. |
-| | use_xformers | True | Whether to use xformers memory efficient attention or not. |
-
-## Example yaml config file
-
-```yaml
-%YAML 1.2
----
-###############################################################
-# Mandatory Parameters. Must be provided here or in the CLI.
-###############################################################
-training_type: "diffuser" # Type of training to run. Either "diffuser" or "transformer".
-name: "test-config-file" # Name of the experiment.
-api_key: "YOUR PRIVATE CEREBRIUM API KEY" # Your Cerebrium API key.
-hf_model_path: "runwayml/stable-diffusion-v1-5" # Path to the huggingface diffusion model to train.
-train_prompt: "INSERT YOUR TRAINING PROMPT" # Your prompt to train.
-log_level: "INFO" # log_level level for logging. Can be "DEBUG", "INFO", "WARNING", "ERROR".
-train_image_dir: data/training_class_images/ # Directory of training images.
-
-###############################################################
-# Optional Parameters
-###############################################################
-# Diffuser params
-revision: "main" # Revision of the diffuser model to use.
-validation_prompt: ~ # an optional validation prompt to use. If ~, will use the training prompt.
-custom_tokenizer: "" # custom tokenizer from AutoTokenizer if required.
-
-# Dataset params
-prior_class_image_dir: ~ # or "path/to/your/prior_class_images". Optional directory of images to use if you would like to train prior class images as well.
-prior_class_prompt: ~ # Set your prior class prompt here. If ~, will not use prior classes. Only use prior class preservation if you want to preserve the class in your results.
-
-# Training params
-training_args:
- # General training params
- learning_rate: 1.0E-5
- num_validation_images: 4 # Number of images to generate in validation.
- num_train_epochs: 50
- seed: 1
- resolution: 512 # Resolution to train images at.
- center_crop: False # Whether to center crop images to resolution.
- train_batch_size: 2
- num_prior_class_images: 5
- prior_class_generation_batch_size: 2
- prior_loss_weight: 1.0 # Weight of prior loss in the total loss.
- max_train_steps: ~ # maximum training steps which overrides number of training epochs
- validation_epochs: 5 # number of epochs before running validation and checkpointing
-
- # Training loop params
- gradient_accumulation_steps: 1
- lr_scheduler: "constant"
- lr_warmup_steps: 50
- lr_num_cycles: 1
- lr_power: 1.0
- allow_tf32: False
- max_grad_norm: 1.0
- mixed_precision: "no" # If you would like to use mixed precision. Supports fp16 and bf16. Defaults to 'no'
- prior_generation_precision: ~
- scale_lr: False
- use_8bit_adam: True
- use_xformers: True # Whether to use xformers memory efficient attention or not.
-```
diff --git a/cerebrium/fine-tuning/diffusion-models/dataset.mdx b/cerebrium/fine-tuning/diffusion-models/dataset.mdx
deleted file mode 100644
index d48c62b7..00000000
--- a/cerebrium/fine-tuning/diffusion-models/dataset.mdx
+++ /dev/null
@@ -1,78 +0,0 @@
----
-title: "Dataset Curation"
-description: "Guide to curating a dataset for success when fine-tuning your diffusion model on Cerebrium."
----
-
-
- Cerebrium's fine-tuning functionality is in public beta and we are adding more
- functionality each week! If there are any issues or if you have an urgent
- requirement, please reach out to [support](mailto:support@cerebrium.ai)
-
-
-Curating your dataset is arguably the most important step in fine-tuning a diffusion model.
-The quality of your dataset will determine the quality of your outputs, so spending extra time on this step is highly recommended.
-
-## Curation guidelines
-
-The following is a set of guidelines to keep in mind when curating your dataset:
-
-- Images should be sized to 512 x 512 for stable diffusion v1.5.
- - If not, we'll resize them for you, but this may result in a loss of quality.
-- Your images should all be upright or in the same orientation.
-- Keep the number of training images low. Aim for between 10 and 15 as your model may not converge if you use too many.
-- Make sure your images are of the same class. For example, if you are fine-tuning the prompt "Photo of a dog", all of your images should be of dogs.
-
-If you are doing style-based fine-tuning:
-
-- Fine-tune a single style at a time.
-- Keep the style consistent across all of your images.
-- Keep the colouring consistent across all of your images.
-
-For object-focused fine-tuning, each of your images should contribute new information on the subject. You can do this by varying:
-
-- camera angle
-  (although try to keep the side of the object that is photographed consistent across your dataset, e.g. only take photos from the front)
-- pose
-- props or styles
-- background in the photos (if you would like varied backgrounds in your outputs.)
-
-If your results are not what you expected, try training for more epochs or adding a new token to describe your object.
-
-## File structure
-
-The following is the file structure that Cerebrium expects for your local dataset:
-
-```
-training_dataset/
-├── image0.jpg
-├── image1.jpg
-├── image2.jpg
-├── image3.jpg
-```
-
-Where your image files can be named whatever you choose provided they are images in any of the following formats:
-
-- jpg
-- png
-- jpeg
-
-It is important to keep all your images under the same folder and use the `train_image_dir` parameter to specify the path to your dataset folder. Similarly, if you are using prior preservation, you can also specify the path to your prior class dataset folder using the `prior_class_image_dir` parameter.
-
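-If you would like to check and resize your images yourself before uploading, a minimal sketch using Pillow could look like this (the folder name is just an example):
-
-```python
-from pathlib import Path
-
-from PIL import Image
-
-dataset_dir = Path("training_dataset")  # the folder you point train_image_dir at
-allowed_extensions = {".jpg", ".jpeg", ".png"}
-
-for path in sorted(dataset_dir.iterdir()):
-    if path.suffix.lower() not in allowed_extensions:
-        print(f"Skipping non-image file: {path.name}")
-        continue
-    image = Image.open(path)
-    if image.size != (512, 512):
-        # Resize to the 512 x 512 expected by Stable Diffusion v1.5
-        image.convert("RGB").resize((512, 512), Image.LANCZOS).save(path)
-```
-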
-## Using prior preservation
-
-Stable diffusion models are prone to catastrophic forgetting of classes when training.
-This means that fine-tuning a model on a new prompt may cause it to forget how to generate other prompts in the same class.
-For example, if you have fine-tuned your model on a new prompt, say "Photo of a sdf dog", and supplied photos of a Labrador, your model may only generate photos of a Labrador when asked for photos of a dog.
-
-Prior preservation therefore acts as a kind of regularizer, helping the fine-tuned model continue to generate other prompts in the same class.
-
-The idea is to supervise the fine-tuning process with the model's own generated samples of the class noun.
-In practice, this means having the model fit both the new images and the generated images simultaneously during training.
-
-To use a prior class in training, there are two options:
-
-- The first is to provide the generation prompt for the prior class (eg. `prior_class_prompt: "Photo of a dog"`) and the number of prior images you require (eg. `num_prior_class_images: 10`).
- Cerebrium will generate these images at training time.
-
-- Alternatively, if you would like to generate these images beforehand, you can upload the prior class image dataset and the prompt used to generate them.
- When curating a prior class dataset, the goal is to select a wide variety of outputs from your model.
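-
-If you go the second route, a minimal sketch for generating a prior class dataset with `diffusers` (using the same base model you plan to fine-tune) could look like this:
-
-```python
-from pathlib import Path
-
-import torch
-from diffusers import StableDiffusionPipeline
-
-prior_class_prompt = "Photo of a dog"
-output_dir = Path("data/prior_class_images")
-output_dir.mkdir(parents=True, exist_ok=True)
-
-pipeline = StableDiffusionPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
-).to("cuda")
-
-# Generate a varied set of samples of the class noun for prior preservation
-num_prior_class_images = 10
-for i in range(num_prior_class_images):
-    image = pipeline(prior_class_prompt, num_inference_steps=25).images[0]
-    image.save(output_dir / f"prior_{i}.png")
-```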
diff --git a/cerebrium/fine-tuning/diffusion-models/introduction.mdx b/cerebrium/fine-tuning/diffusion-models/introduction.mdx
deleted file mode 100644
index 8ba90bb2..00000000
--- a/cerebrium/fine-tuning/diffusion-models/introduction.mdx
+++ /dev/null
@@ -1,25 +0,0 @@
----
-title: "Introduction"
-description: "Quickly and conveniently fine-tune your diffusion model with just one command!"
----
-
-
-  Cerebrium's fine-tuning functionality is in public beta, so we are adding
-  more functionality each week! Currently, we only support the training of
-  stable-diffusion models. If you have an urgent requirement, we can help;
-  just reach out to [support](mailto:support@cerebrium.ai).
-
-
-Diffusion models are incredible at generating images from art, designs, logos and beyond. Fine-tuning stable diffusion models allows you to create your very own personal model which generates images of your object or in your style.
-Using Cerebrium's trainer, you can fine-tune your model with one command using your very own data without having to worry about any of the hardware or implementation details!
-
-Our diffusion fine-tuner leverages parameter-efficient fine-tuning (PEFT) and low-rank adaptation (LoRA) to train only a small subset of the diffusion model. This drastically improves training time while focusing on the parts of the model that make the biggest difference to your results, saving you time while yielding results comparable to full-network fine-tuning.
-
-While in the beta phase, the following models are supported for fine-tuning on Cerebrium:
-
-| Model Name | Huggingface Path |
-| ------------------------------------ | ------------------------------ |
-| Stable Diffusion V1.5 | runwayml/stable-diffusion-v1-5 |
-| Stable Diffusion V2.1 (coming soon!) | stabilityai/stable-diffusion-2 |
-
-We're actively working on adding more models based on demand so please let the team know if there's a model you would like us to add support for.
diff --git a/cerebrium/fine-tuning/diffusion-models/training.mdx b/cerebrium/fine-tuning/diffusion-models/training.mdx
deleted file mode 100644
index 05f2c20a..00000000
--- a/cerebrium/fine-tuning/diffusion-models/training.mdx
+++ /dev/null
@@ -1,125 +0,0 @@
----
-title: "Training Diffusion Models"
-description: "Guide to fine-tuning a diffusion model on Cerebrium"
----
-
-
- Cerebrium's fine-tuning functionality is in public beta and we are adding more
- functionality each week! If there are any issues or if you have an urgent
- requirement, please reach out to [support](mailto:support@cerebrium.ai)
-
-
-This guide will walk you through the process of fine-tuning a diffusion model on Cerebrium.
-While this guide is a high-level overview of the process, you can find more detailed information on the available parameters in the [config](/cerebrium/fine-tuning/diffusion-models/config) and [dataset](/cerebrium/fine-tuning/diffusion-models/dataset) sections.
-
-### Creating your project
-
-You can quickly set up a Cerebrium fine-tuning project by running the following command:
-
-```bash
-cerebrium init-trainer <> <>
-```
-
-The above variables are:
-
-- the type of fine-tuning (transformer or diffuser)
-- a path of where to create your config file.
-
-This will set up a YAML config file with a sensible set of default parameters to help you get started quickly. Whether you are deploying a diffuser or transformer
-type of model, the default parameters are tailored to the type of model you are deploying so you can get good results immediately. All you need to do is fill in the
-model name and dataset parameters and you're good to go!
-
-### Starting a job with the CLI
-
-Starting a job on Cerebrium requires four things:
-
-- A name for you to use to identify the training job.
-- Your API key.
-- A config file or JSON string. See [this section](/cerebrium/fine-tuning/diffusion-models/config) for more info.
-- Your local dataset of training images along with the corresponding prompt.
-
-Once you have these, you can start a fine-tuning job using the following command:
-
-```bash
-cerebrium train --config-file <>
-```
-
-Alternatively, all of the other parameters can be provided in your `config-file` or `config-string`.
-
-If you would like to provide the `name`, `training-type` or `api-key` from the command line, you can add them as follows:
-
-```bash
-cerebrium train --config-file <> --name <> --training-type "diffuser" --api-key <>
-```
-
-_Note that if these parameters are present in your `config-file` or `config-string`, they will be overridden by the command line args._
-
-#### Monitoring
-
-Once your training job has been deployed, you will receive a **job-id**. This can be used to access the job status as well as retrieve the logs.
-
-Checking the training status:
-
-```bash
-cerebrium get-training-jobs --api-key <> --last-n <>
-```
-
-Retrieving the training logs can be done with:
-
-```bash
-cerebrium get-training-logs <> --api-key <>
-```
-
-_Note that your training logs will only be available once the job has started running and will not be stored after it is complete._
-
-_Coming soon: logging your training with **Weights & Biases** is in the final stages of development._
-
-## Retrieving your training results
-
-Once your training is complete, you can download the training results using:
-
-```bash
-cerebrium download-model <> --api-key <> --download-path <>
-```
-
-This will return a zip file which contains your diffusion model's UNet attention processors and the validation images generated by your model during training.
-
-## Using your Fine-tuned Diffuser
-
-You can use your fine-tuning results as follows:
-
-```python
-
-import os
-
-import torch
-from diffusers import (
-    DiffusionPipeline,
-    DPMSolverMultistepScheduler,
-)
-
-# Paths and names from your training run (placeholders)
-your_model_name = "runwayml/stable-diffusion-v1-5"  # the base model you fine-tuned
-your_model_revision = "main"  # revision of the base model
-final_output_dir = "your_results/checkpoints/final"  # folder from the downloaded training results
-
-# Boilerplate loading of model
-pipeline = DiffusionPipeline.from_pretrained(
-    your_model_name, revision=your_model_revision, torch_dtype=torch.float16
-)
-pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
-pipeline = pipeline.to("cuda")
-
-
-# LOAD IN YOUR TRAINING RESULTS
-# load attention processors from where they are saved in your_results/checkpoints/final/attn_procs/pytorch_lora_weights.bin
-pipeline.unet.load_attn_procs(os.path.join(final_output_dir, "attn_procs"))
-# And that's all you need to do to load in the finetuned result!
-# Now you can run your inference as you would like with the pipeline.
-
-
-# some inference variables
-your_prompt = "Your training prompt that you would like to use here"
-num_images = 4 # number of images to generate
-your_manual_seed = 42 # a manual seed if you would like repeatable results
-
-# run inference as you normally would
-generator = torch.Generator(device="cuda").manual_seed(your_manual_seed)
-images = [
- pipeline(your_prompt, num_inference_steps=25, generator=generator).images[0]
- for _ in range(num_images)
-]
-```
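-
-The pipeline returns standard PIL images, so you can save the generated images to disk as you normally would:
-
-```python
-# Save each generated image
-for i, image in enumerate(images):
-    image.save(f"generated_image_{i}.png")
-```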
diff --git a/cerebrium/fine-tuning/language-models/config.mdx b/cerebrium/fine-tuning/language-models/config.mdx
deleted file mode 100644
index a04d88f3..00000000
--- a/cerebrium/fine-tuning/language-models/config.mdx
+++ /dev/null
@@ -1,122 +0,0 @@
----
-title: "Config Files"
-description: "Guide for the config files for fine-tuning a large language model on Cerebrium"
----
-
-
- Cerebrium's fine-tuning functionality is in public beta and we are adding more
- functionality each week! If there are any issues or if you have an urgent
- requirement, please reach out to [support](mailto:support@cerebrium.ai)
-
-
-In order to simplify the training experience and make deployments easier, we use YAML config files. This lets you keep track of all your training parameters in one place, giving you the flexibility to train your model with the parameters you need while still keeping the deployment process streamlined.
-
-In this section, we introduce the parameters you may want to manipulate for your training, however, we'll set the defaults to what we've found works well if you'd like to leave them out!
-
-If you would like to override the parameters in your config file during a deployment, the `--config-string` option in the command line accepts a JSON string. The parameters provided in the JSON string will override the values they may have been assigned in your config file.
-
-For your convenience, an example of the config file is available [here](#example-yaml-config-file)
-
-## Setting up a config file
-
-Your YAML config file can be placed anywhere on your system; just point the trainer to the file.
-
-In your YAML file, you need to include the following required parameters or parse them into the `cerebrium train` command:
-
-### Required parameters
-
-| Parameter | Suggested Value | Description |
-| ------------- | ----------------------------- | ----------------------------------------------------------------------------------- |
-| training_type | transformer | Type of training to run. Either diffuser or transformer. In this case, transformer. |
-| name | | Your name for the fine-tuning run. |
-| api_key | | Your Cerebrium private API key |
-| hf_model_path | decapoda-research/llama-7b-hf | Path to your base model on Hugging Face.                                              |
-| model_type    | AutoModelForCausalLM          | The transformers model class used to load the base model.                            |
-| dataset_path | path/to/your/dataset.json | path to your local JSON dataset. |
-| log_level     | INFO                          | Level for logging.                                                                    |
-
-### Optional parameters
-
-| Parameter | Suggested Value | Description |
-| ---------------- | --------------- | ------------------------------------------------ |
-| custom_tokenizer | ~ | custom tokenizer from AutoTokenizer if required. |
-| seed | 42 | random seed for reproducibility. |
-
-| Parameter | Sub parameter | Suggested Value | Description |
-| ------------------- | --------------------------- | -------------------- | ---------------------------------------------------------------------- |
-| training_args       |                             |                      |                                                                         |
-| | logging_steps | 10 | Number of steps between logging. |
-| | per_device_train_batch_size | 15 | Batch size per GPU for training. |
-| | per_device_eval_batch_size | 15 | Batch size per GPU for evaluation. |
-| | warmup_steps | 0 | Number of warmup steps for learning rate scheduler. |
-| | gradient_accumulation_steps | 4 | Number of gradient accumulation steps. |
-| | num_train_epochs | 50 | Number of training epochs. |
-| | learning_rate | 1.0e-4 | Learning rate for training. |
-| | group_by_length | False | Whether to group batches by length. |
-| **base_model_args** | | | The kwargs for loading in the base model with _AutoModelForCausalLM()_ |
-| | load_in_8bit | True | Whether to load in the model in 8bit. |
-| | device_map | "auto" | Device map for loading in the model. |
-| **peft_lora_args** | | | Peft lora kwargs for use by _PeftConfig()_ |
-| | r | 8 | The r value for LoRA. |
-| | lora_alpha | 32 | The lora_alpha value for LoRA. |
-| | lora_dropout | 0.05 | The lora_dropout value for LoRA. |
-| | target_modules | ["q_proj", "v_proj"] | The target_modules for LoRA. These are the suggested values for Llama |
-| | bias | "none" | Bias for LoRA. |
-| | task_type | "CAUSAL_LM" | The task_type for LoRA. |
-| **dataset_args** | | | Custom args for your dataset. |
-| | prompt_template | "short" | Prompt template to use. Either "short" or "long". |
-| | instruction_column | "prompt" | Column name of your prompt in the dataset.json |
-| | label_column | "completion" | Column name of your label/completion in the dataset.json |
-| | context_column | "context" | Optional column name of your context in the dataset.json |
-| | cutoff_len | 512 | Cutoff length for the prompt. |
-| | train_val_ratio | 0.9 | Ratio of training to validation data in the dataset split. |
-
-## Example yaml config file
-
-```yaml
-%YAML 1.2
----
-training_type: "transformer" # Type of training to run. Either "diffuser" or "transformer". In this case, "transformer".
-
-name: test-config-file # Your name for the fine-tuning run.
-api_key: <<>>
-auth_token: YOUR HUGGINGFACE API TOKEN # Optional. You will need this if you are finetuning llama2
-# Model params:
-hf_model_path: "decapoda-research/llama-7b-hf"
-model_type: "AutoModelForCausalLM"
-dataset_path: path/to/your/dataset.json # path to your local JSON dataset.
-custom_tokenizer: "" # custom tokenizer from AutoTokenizer if required.
-seed: 42 # random seed for reproducibility.
-log_level: "INFO" # log_level level for logging.
-
-# Training params:
-training_args:
- logging_steps: 10
- per_device_train_batch_size: 15
- per_device_eval_batch_size: 15
- warmup_steps: 0
- gradient_accumulation_steps: 4
- num_train_epochs: 50
- learning_rate: 1.0E-4
- group_by_length: False
-
-base_model_args: # args for loading in the base model with AutoModelForCausalLM
- load_in_8bit: True
- device_map: "auto"
-
-peft_lora_args: # peft lora args.
- r: 32
- lora_alpha: 16
- lora_dropout: 0.05
- target_modules: ["q_proj", "v_proj"]
- bias: "none"
- task_type: "CAUSAL_LM"
-
-dataset_args:
- prompt_template: "short" # Prompt template to use. Either "short" or "long".
- instruction_column: "prompt" # column name of your prompt in the dataset.json
- label_column: "completion" # column name of your label/completion in the dataset.json
- context_column: "context" # optional column name of your context in the dataset.json
- cutoff_len: 512 # cutoff length for the prompt.
- train_val_ratio: 0.9 # ratio of training to validation data in the dataset split.
-```
diff --git a/cerebrium/fine-tuning/language-models/custom-templates.mdx b/cerebrium/fine-tuning/language-models/custom-templates.mdx
deleted file mode 100644
index e7ca13c9..00000000
--- a/cerebrium/fine-tuning/language-models/custom-templates.mdx
+++ /dev/null
@@ -1,69 +0,0 @@
----
-title: "Prompt Templating"
-description: "Guide to prompt templating when fine-tuning your language model on Cerebrium."
----
-
-
- Cerebrium's fine-tuning functionality is in public beta and we are adding more
- functionality each week! If there are any issues or if you have an urgent
- requirement, please reach out to [support](mailto:support@cerebrium.ai)
-
-
-## Builtin Templates
-
-By default, we provide two templates for question-and-answering which we have adapted from the [Alpaca-Lora format](https://github.com/TianyiPeng/alpaca-lora/tree/main/templates) as our prompt template format. You can use these templates or create your own, as described in the [Custom Templates](#custom-templates) section below.
-
-There are two options for the templates, `short` and `long`. The `short` template is a single sentence prompt and the `long` template is a multi-sentence prompt. They are shown below:
-
-### Alpaca template (long)
-
-```yaml
-description: "Generic long template for fine-tuning."
-prompt_input: "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
-prompt_no_input: "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"
-response_split: "### Response:"
-```
-
-This would be used by adding the following to your **config.yaml** file:
-
-```yaml
-dataset_args:
- prompt_template: "long"
-```
-
-### Alpaca template (short)
-
-```yaml
-description: "A shorter template to experiment with."
-prompt_input: "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
-prompt_no_input: "### Instruction:\n{instruction}\n\n### Response:\n"
-response_split: "### Response:"
-```
-
-This would be used by adding the following to your **config.yaml** file:
-
-```yaml
-dataset_args:
- prompt_template: "short"
-```
-
-## Custom Templates
-
-If you would like to use a custom template, you can do so by adding it to your **config.yaml** file under the `prompt_template` parameter. This should be in the format shown below:
-
-```yaml
-dataset_args:
- prompt_template:
-    description: "An optional description for your template"
-    prompt_input: "### Your Instruction:\n{instruction}\n\n### Some context label:\n{input}\n\n### Your response prefix:\n"
-    prompt_no_input: "### Your Instruction:\n{instruction}\n\n### Your response prefix:\n"
-    response_split: "### Your response prefix:"
-```
-
-Where your template contains the following parameters in the template string:
-
-- `{instruction}`: The instruction for the prompt. This will be filled in with the `instruction_column` parameter from your dataset.
-- `{input}`: The context for the prompt. This will be filled in with the `context_column` parameter from your dataset.
-- `{response}`: The response prefix for the prompt. This will be filled in with the `label_column` parameter from your dataset.
-
-where `instruction_column`, `context_column`, and `label_column` are the parameters you have defined for your dataset in the `dataset_args` section of your config file and correspond to the labels in your dataset.json file.
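-
-To make the templating concrete, the sketch below fills the `short` template with a single dataset row by hand; this is purely illustrative, since the trainer applies the template for you:
-
-```python
-# The "short" Alpaca-style templates from above
-prompt_input = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
-prompt_no_input = "### Instruction:\n{instruction}\n\n### Response:\n"
-
-# One row from dataset.json, using the default column names
-row = {"prompt": "What is Cerebrium?", "completion": "Cerebrium is ...", "context": ""}
-
-if row.get("context"):
-    text = prompt_input.format(instruction=row["prompt"], input=row["context"])
-else:
-    text = prompt_no_input.format(instruction=row["prompt"])
-
-# During training, the completion (label_column) follows the response prefix
-text += row["completion"]
-print(text)
-```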
diff --git a/cerebrium/fine-tuning/language-models/dataset.mdx b/cerebrium/fine-tuning/language-models/dataset.mdx
deleted file mode 100644
index b521cd08..00000000
--- a/cerebrium/fine-tuning/language-models/dataset.mdx
+++ /dev/null
@@ -1,83 +0,0 @@
----
-title: "Dataset Curation"
-description: "Advice on how best to structure your datasets"
----
-
-
- Cerebrium's fine-tuning functionality is in public beta and we are adding more
- functionality each week! If there are any issues or if you have an urgent
- requirement, please reach out to [support](mailto:support@cerebrium.ai)
-
-
-The fine-tuning library has been built to leverage Parameter-Efficient Fine-tuning (PEFT) and Low-Rank Adaptation (LoRA) to reduce the number of trainable parameters by >99.9% while providing results that are comparable to full fine-tuning.
-Due to the much lower number of trainable parameters, the datasets used do not need to be greater than 1000 examples. Often much smaller datasets of 200-500 examples are more than sufficient to achieve good results. However, no matter the size of your dataset, it is always best to use a diverse set of training examples such that your dataset has good variation and covers a wide variety of potential inputs.
-
-By default, we provide two templates for question-and-answering which we have adapted from [Alpaca-Lora format](https://github.com/TianyiPeng/alpaca-lora/tree/main/templates) as our prompt template format. You can use these templates or create your own. For more information on templating see [this page](/cerebrium/fine-tuning/language-models/custom-templates).
-The dataset is a JSON or JSONL file in which each record contains a "prompt", a "completion", and, if needed, a "context" field.
-For your convenience, an example dataset has been provided for testing [here](#example-dataset).
-
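-A quick way to sanity-check your dataset before starting a job is a small script like the one below (the file name is just an example):
-
-```python
-import json
-
-with open("dataset.json") as f:
-    records = json.load(f)
-
-for i, record in enumerate(records):
-    # "prompt" and "completion" are required; "context" is optional
-    missing = [key for key in ("prompt", "completion") if not record.get(key)]
-    if missing:
-        print(f"Record {i} is missing: {', '.join(missing)}")
-
-print(f"Checked {len(records)} records")
-```
-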
-Below are some helpful tips to take into consideration when preparing your datasets for fine-tuning:
-
-**1. Data Quality:**
-
-Before fine-tuning models, ensure your data is of high quality. Clean your data by removing outliers and duplicates, handling missing values appropriately, and removing any irrelevant data.
-It is best to use a tool for this such as [NomicAI](https://home.nomic.ai/) in order to see a map of your data and easily remove data that is not relevant to your use case.
-
-It is important to keep a diverse dataset and make sure your dataset is balanced. For example, make sure there is a similar number of positive outcomes as negative outcomes, as well as a similar number of female to male records, etc.
-
-**2. Use Case Understanding:**
-
-The approach for fine-tuning should align with the specifics of your use case. Different problems require varying degrees of precision, recall, speed, etc.
-For example, if the model is being used for a task that requires factual accuracy, then the data used to fine-tune the model should be carefully curated to ensure that it is accurate.
-
-**3. Data Structure:**
-
-Consider the structure of your data. Sequential data should be formatted differently than non-sequential data. Also, understand which features of your data are most relevant to your model.
-
-While the initial curation of data is a difficult and time-consuming process, it is also very important to continuously monitor your model and whether the predictions/generations
-it gives your users are good or bad. In doing this, you can fine-tune your model further, provide better generations, and cover more edge cases.
-
-## Example Dataset
-
-Below is an example of a short JSON dataset that can be used for fine-tuning:
-
-````json
-[
- {
- "prompt": "What is Cerebrium?",
- "completion": "Cerebrium is an AWS Sagemaker alternative providing all the features you need to quickly build an ML product.",
- "source": "https://docs.cerebrium.ai/introduction"
- },
- {
- "prompt": "What is the goal of Cerebrium?",
- "completion": "The goal of Cerebrium is to help companies create value through machine learning as quickly and as painlessly as possible by abstracting away a lot of the complexity and mundane infrastructure setup.",
- "source": "https://docs.cerebrium.ai/introduction"
- },
- {
- "prompt": "How can I send feedback to Cerebrium?",
- "completion": "You can send feedback requests to Cerebrium at support@cerebrium.ai.",
- "source": "https://docs.cerebrium.ai/introduction"
- },
- {
- "prompt": "Where can I start a project and deploy my first model?",
- "completion": "You can start a project and deploy your first model at https://dashboard.cerebrium.ai.",
- "source": "https://docs.cerebrium.ai/introduction"
- },
- {
- "prompt": "How do I set up a Cerebrium account?",
- "completion": "To set up a Cerebrium account, follow these steps: 1. Go to [dashboard.cerebrium.ai](https://dashboard.cerebrium.ai) 2. Signup or Login 3. Navigate to the API Keys page 4. You should see an API key with the source 'Cerebrium'. Click the eye icon to display it. It will be in the format: c_api_key-xxx",
- "source": "https://docs.cerebrium.ai/quickstarts/xgboost"
- },
- {
- "prompt": "How do I install the Cerebrium framework?",
- "completion": "To install the Cerebrium framework, run the following command in your notebook or terminal: `pip install cerebrium`",
- "source": "https://docs.cerebrium.ai/quickstarts/xgboost"
- },
- {
- "prompt": "How do I create and save an XGBoost model?",
- "completion": "To create and save an XGBoost model, use the following code snippet: ```from sklearn.datasets import load_iris from xgboost import XGBClassifier iris = load_iris() X, y = iris.data, iris.target xgb = XGBClassifier() xgb.fit(X, y) xgb.save_model('iris.json')``` This code creates a simple XGBoost classifier for the Iris dataset and saves the model to a JSON file.",
- "source": "https://docs.cerebrium.ai/quickstarts/xgboost"
- }
-]
-````
diff --git a/cerebrium/fine-tuning/language-models/introduction.mdx b/cerebrium/fine-tuning/language-models/introduction.mdx
deleted file mode 100644
index eb8d0162..00000000
--- a/cerebrium/fine-tuning/language-models/introduction.mdx
+++ /dev/null
@@ -1,30 +0,0 @@
----
-title: "Introduction"
-description: "Quickly and conveniently fine-tune your LLMs with just one line of code!"
----
-
-
-  Cerebrium's fine-tuning functionality is in public beta, so we are adding
-  more functionality each week! Currently, we only support the training of
-  CausalLM models. If you have an urgent requirement, we can help; just reach
-  out to [support](mailto:support@cerebrium.ai)
-
-
-The fine-tuning functionality on Cerebrium allows you to quickly and conveniently fine-tune your LLMs with just one line of code. Cerebrium leverages
-the latest techniques, such as PEFT and LoRA, to train models in the shortest amount of time (and therefore cost) while still achieving
-performance comparable to full fine-tuning.
-
-Currently, our fine-tuning capabilities are limited to causal language models that support 8bit quantisation/LoRA from the HuggingFace transformers library. Some of these models include:
-
-- [GPT-2](https://huggingface.co/gpt2), [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6b), [GPT-Neo](https://huggingface.co/EleutherAI/gpt-neo-2.7B)
-- [OPT](https://huggingface.co/facebook/opt-1.3b)
-- [Bloom](https://huggingface.co/bigscience/bloom-560m)
-- [GPT-Neox-20B](https://huggingface.co/EleutherAI/gpt-neox-20b)
-- [LLaMa](https://huggingface.co/decapoda-research/llama-7b-hf)
-- [LlaMa2](https://huggingface.co/meta-llama/Llama-2-7b-hf)
-- [ChatGLM](https://huggingface.co/THUDM/chatglm2-6b)
-
-You can use any size variations of the models above. Additionally, we currently use the [Alpaca-Lora format](https://github.com/TianyiPeng/alpaca-lora/tree/main/templates) as the prompt template format. In future,
-we plan to release capabilities for custom prompting templates.
-
-We recommend you look at our guide on how to best curate your dataset in order to maximize the performance of your model and make sure it can handle sufficient edge cases.
diff --git a/cerebrium/fine-tuning/language-models/model-specific-docs/training-falcon.mdx b/cerebrium/fine-tuning/language-models/model-specific-docs/training-falcon.mdx
deleted file mode 100644
index 8b945b8a..00000000
--- a/cerebrium/fine-tuning/language-models/model-specific-docs/training-falcon.mdx
+++ /dev/null
@@ -1,207 +0,0 @@
----
-title: Training Falcon Models
-description: "Guide to fine-tuning Falcon models using Cerebrium"
----
-
-## Introduction
-
-`Falcon` is a family of models released by the Technology Innovation Institute (TII) of the UAE.
-At the time of release, it topped the Huggingface leaderboards while also being one of the fastest LLMs available.
-Additionally, the TII team have released the model under an Apache 2.0 license, making it available for commercial use.
-
-The `Falcon` family of models are available in 2 sizes:
-
-- `Falcon 7B` - 7 billion parameters
-- `Falcon 40B` - 40 billion parameters
-
-of which, there are two variants:
-
-- The raw, pre-trained models `Falcon-7B/40B`
-- The instruct-trained models `Falcon-7B/40B-instruct`,
-  which have been further trained for instruct/chat capabilities.
-
-We provide full support for the finetuning of the [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) model.
-The finetuning of the Falcon 40B model is currently in testing and will be released to the public soon.
-
-## Getting Started
-
-### Creating a Project
-
-To create a project, you will need to initialise a training [configuration file](/cerebrium/fine-tuning/language-models/config) and curate a [dataset](/cerebrium/fine-tuning/language-models/dataset).
-We provide an example configuration file for the Falcon 7B model [here](#example-config-file) which can be used as a starting point for your own configuration file.
-
-### Adjusting your config file
-
-Some important parameters to take note of in your config file are the following:
-
-| Parameter | Description | Required Value |
-| ----------------- | ------------------------------------------------------------------------------------------ | --------------------- |
-| target_modules    | The modules to apply PEFT to. For Falcon, this parameter is different from Llama, GPT2, etc. | `["query_key_value"]` |
-| trust_remote_code | Whether to trust the remote code. Required to set up the Falcon model                        | `true`                |
-| load_in_8bit | Whether to load the model in 8bit. Set to `True` for Falcon-7B. | `true` |
-| load_in_4bit | Whether to load the model in 4bit. Set to `True` for Falcon-40B. | `true` |
-
-Additionally, you can adjust the following parameters based on your needs to optimise your training:
-
-- num_train_epochs
-- learning_rate
-- lr_scheduler_type
-- prompt_template (see [here](/cerebrium/fine-tuning/language-models/custom-templates) for more information on prompt templates)
-
-## Train your model
-
-Training your model follows the same process as training any other model on Cerebrium, you'll need to run the same `cerebrium train` command.
-
-```bash
-cerebrium train --config-file <>
-```
-
-See the [training](/cerebrium/fine-tuning/language-models/training) page for more information on training your model.
-
-## Evaluate your model
-
-The output of training a Falcon model on Cerebrium is an adapter file. Once you have downloaded your adapter file (see the instructions [here](/cerebrium/fine-tuning/language-models/training)), it can be used to load the model into your own code and run inference on it.
-
-### Deploying your model on Cortex
-
-To load the model into your Cortex deployment, you can use the following code snippet:
-
-```python
-# Your normal imports in Cortex:
-import base64
-from typing import Optional
-
-# ADD THE FOLLOWING TO YOUR IMPORTS
-from transformers import AutoModelForCausalLM, AutoTokenizer
-from peft import PeftModel, PeftConfig
-import torch
-
-# ADD THE FOLLOWING TO YOUR MODEL SETUP IN YOUR MAIN.PY
-peft_model_id = "results/" # Path to your results/ folder which contains "adapter_model.bin" and "adapter_config.json"
-config = PeftConfig.from_pretrained(peft_model_id)
-model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
- low_cpu_mem_usage=True, # You may require this file to avoid memory issues
- load_in_8bit=True,
- trust_remote_code=True,
- device_map="auto")
-model = PeftModel.from_pretrained(model, peft_model_id, trust_remote_code=True) # Add the adapter to the model
-tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
-
-model = model.to("cuda")
-```
-
-Your model is now ready to run inference.
-
-If you would like control over your generation parameters, you can add them to your `Item` class in your `main.py` file. Below is an example adding `max_new_tokens` which controls the maximum number of tokens to generate.
-
-```python
-class Item(BaseModel):
- prompt: str
- max_new_tokens: Optional[int] = 250 # An example optional parameter
-```
-
-You can then add the following code to your `predict` function to run inference on your model:
-
-```python
-# ADD THE FOLLOWING TO YOUR PREDICT FUNCTION IN YOUR MAIN.PY
-def predict(item, run_id, logger):
- item = Item(**item)
- # REPLACE THIS WITH YOUR TEMPLATE USED FOR TRAINING
- template = "### Instruction:\n{instruction}\n\n### Response:\n"
-
-    question = template.format(instruction=item.prompt)  # Place the prompt into the template
- inputs = tokenizer(question, return_tensors="pt")
-
- with torch.no_grad(): # Run inference on the model
- outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=item.max_new_tokens)
- result = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
- return result
-```
-
-To run Falcon, you will need the following in your **requirements.txt**
-
-```txt
-torch
-git+https://github.com/huggingface/transformers.git
-git+https://github.com/huggingface/peft.git
-bitsandbytes
-trl
-einops
-```
-
-Your model can now be deployed using Cerebrium's Cortex. Just ensure that your adapter files are in the same directory as your `main.py` file and run `cerebrium deploy` as you would for any other model.
-
-## Falcon 7B vs Falcon 40B
-
-Training the Falcon 7B and Falcon 40B models on Cerebrium uses almost the same config files. The main differences are the following:
-
-| Parameter | Falcon 7B | Falcon 40B |
-| --------------------------- | ------------------ | ------------------- |
-| hf_model_path | `tiiuae/falcon-7b` | `tiiuae/falcon-40b` |
-| per_device_train_batch_size | 10 | 2 |
-| per_device_eval_batch_size | 10 | 2 |
-| load_in_8bit | True | False |
-| load_in_4bit | False | True |
-
-If you have a GPU with less than 40GB of memory, use `load_in_4bit` for the Falcon 40B model. If your GPU has more than 40GB of memory, you can use `load_in_8bit` instead.
-
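-The same choice applies when you load the model for inference. Assuming a recent `transformers` and `bitsandbytes` (as in the requirements above), a 4-bit loading sketch for the 40B base model could look like this:
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-base_model = "tiiuae/falcon-40b"
-
-# load_in_4bit mirrors the training-time setting and roughly halves memory vs 8-bit
-model = AutoModelForCausalLM.from_pretrained(
-    base_model,
-    load_in_4bit=True,
-    trust_remote_code=True,
-    device_map="auto",
-)
-tokenizer = AutoTokenizer.from_pretrained(base_model)
-```
-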
-## Example config file
-
-```yaml
-%YAML 1.2
----
-training_type: "transformer" # Type of training to run. Either "diffuser" or "transformer".
-
-name: your-falcon-7b-name # Name of the experiment.
-api_key: Your API KEY HERE # Your Cerebrium API key.
-
-# Model params:
-hf_model_path: "tiiuae/falcon-7b"
-model_type: "AutoModelForCausalLM"
-dataset_path: /path/to/your/dataset.json # path to your local JSON dataset.
-custom_tokenizer: "" # custom tokenizer from AutoTokenizer if required.
-seed: 42 # random seed for reproducibility.
-log_level: "WARNING" # log_level level for logging.
-
-# Training params:
-training_args:
- logging_steps: 100
- per_device_train_batch_size: 10
- per_device_eval_batch_size: 10
- warmup_steps: 0
- gradient_accumulation_steps: 4
- num_train_epochs: 30
- learning_rate: 2.0e-4
- group_by_length: False
- fp16: True
- max_grad_norm: 0.3
- # max_steps: 1000 # an optional if you would like to use steps instead of epochs.
- lr_scheduler_type: "constant"
-
-base_model_args: # args for loading in the base model.
- load_in_8bit: True
- device_map: "auto"
- trust_remote_code: True
-
-peft_lora_args: # peft lora args.
- r: 32
- lora_alpha: 16
- lora_dropout: 0.05
- target_modules: ["query_key_value"] # This has to be query_key_value for falcon
- bias: "none"
- task_type: "CAUSAL_LM"
-
-dataset_args:
- # prompt_template: "short"
- # if you would like a custom prompt template it's possible to specify it here as below:
- prompt_template:
- description: "A shorter template to experiment with."
- prompt_input: "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
- prompt_no_input: "### Instruction:\n{instruction}\n\n### Response:\n"
- response_split: "### Response:"
- instruction_column: "prompt"
- label_column: "completion"
- context_column: "context"
- cutoff_len: 512
- train_val_ratio: 0.9
-```
diff --git a/cerebrium/fine-tuning/language-models/model-specific-docs/training-llama2.mdx b/cerebrium/fine-tuning/language-models/model-specific-docs/training-llama2.mdx
deleted file mode 100644
index a9dbd16b..00000000
--- a/cerebrium/fine-tuning/language-models/model-specific-docs/training-llama2.mdx
+++ /dev/null
@@ -1,166 +0,0 @@
----
-title: Training Llama2 Models
-description: "Guide to fine-tuning Llama2 models using Cerebrium"
----
-
-## Introduction
-
-Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models
-ranging in scale from 7 billion to 70 billion parameters. The fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases.
-Additionally, the Meta team have released the model under the Llama 2 Community License, making it available for commercial use to organizations with fewer than 700 million monthly active users.
-
-The `Llama 2` family of models is available in 3 sizes:
-
-- `Llama 2 7B` - 7 billion parameters
-- `Llama 2 13B` - 13 billion parameters
-- `Llama 2 70B` - 70 billion parameters
-
-of which, there are two variants on Hugging Face:
-
-- The raw, pre-trained models `llama-7B-hf/13B/70B`
-- The chat-dialogue-trained models `llama-7B-chat-hf/13B/70B`.
-
-We provide full support for the fine-tuning of the [Llama 2 family](https://huggingface.co/meta-llama) of models.
-The fine-tuning of the Llama 2 70B model is currently in testing and will be released to the public soon. Contact us for early access.
-
-## Getting Started
-
-### Creating a Project
-
-To create a project, you will need to initialise a training configuration file (below) and curate a [dataset](/cerebrium/fine-tuning/language-models/dataset).
-
-### Adjusting your config file
-
-Some important parameters to take note of in your config file are the following:
-
-| Parameter | Description | Required Value |
-| ------------ | ------------------------------------------------------------------------------------- | -------------- |
-| auth_token | We need your Hugging Face authentication token in order to download the model weights | `true` |
-| load_in_8bit | Whether to load the model in 8bit. Set to `True` for Llama 2 7B.                        | `true`         |
-| load_in_4bit | Whether to load the model in 4bit. Set to `True` for Llama 2 13B.                       | `true`         |
-
-Additionally, you can adjust the following parameters based on your needs to optimise your training:
-
-- num_train_epochs
-- learning_rate
-- lr_scheduler_type
-- prompt_template (see [here](/cerebrium/fine-tuning/language-models/custom-templates) for more information on prompt templates)
-
-## Config File
-
-The config file for Llama2 is similar to the config file for Llama.
-The only difference is that you need to specify your Huggingface Auth token that has access to the weights. This lets Cerebrium download the weights from Huggingface and run your fine-tuning job.
-
-You can find an example config file [here](#example-config-file).
-
-## Inferencing with Llama2
-
-Inferencing with Llama2 is as simple as inferencing with any other model on Cerebrium.
-Once you have your adapter files downloaded, place them in your Cortex deployment directory and add the following to your main.py file:
-
-For the model setup:
-
-```python
-
-from transformers import logging, AutoTokenizer, GenerationConfig, AutoModelForCausalLM
-from peft import PeftModel, PeftConfig # Add the peft libraries we need for the adapter
-# Loading in base model and tokenizer
-base_model_name = "meta-llama/Llama-2-7b-hf" # or meta-llama/Llama-2-7b-chat-hf
-auth_token = "YOUR HUGGINGFACE AUTH TOKEN"
-
-model = AutoModelForCausalLM.from_pretrained(
- base_model_name,
- use_auth_token=auth_token,
-    torch_dtype=torch.float16,  # Load the model in 16bit so it will fit on the A6000 GPU.
- # load_in_8bit=True, # Alternatively, Load the model in 8bit and use much larger batch sizes for significantly faster training
- device_map="auto",
-)
-
-peft_model_id = "./training-output" # where your adapter_model.bin and adapter_config.json are stored
-config = PeftConfig.from_pretrained(peft_model_id)
-model = PeftModel.from_pretrained(model, peft_model_id) # Add the adapter to the model
-tokenizer = AutoTokenizer.from_pretrained(base_model_name, use_auth_token=auth_token)
-```
-
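-The `predict` function below reads its generation parameters from the request item. A minimal Pydantic `Item` sketch for your `main.py` (the default values here are just suggestions) could be:
-
-```python
-from typing import Optional
-
-from pydantic import BaseModel
-
-class Item(BaseModel):
-    prompt: str
-    top_p: Optional[float] = 0.9
-    top_k: Optional[int] = 40
-    num_beams: Optional[int] = 1
-    max_new_tokens: Optional[int] = 250
-```
-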
-You can then add the following code to your `predict` function to run inference on your model:
-
-```python
-def predict(item, run_id, logger):
- item = Item(**item)
- # Replace this with your template used in training
- template = "### Instruction:\n{instruction}\n\n### Response:\n"
-
- prompt = item.prompt
- question = template.format(instruction=prompt)
- inputs = tokenizer(question, return_tensors="pt")
-
- generation_config = GenerationConfig(
- top_p=item.top_p,
- top_k=item.top_k,
- num_beams=item.num_beams,
- max_new_tokens=item.max_new_tokens,
- )
-
- outputs = model.generate(
- input_ids=inputs["input_ids"].to("cuda"),
- generation_config=generation_config,
- )
- result = tokenizer.batch_decode(
- outputs.detach().cpu().numpy(), skip_special_tokens=True
- )[0]
-
- return {"Prediction": result}
-```
-
-And you should have a working Llama2 model on Cerebrium!
-
-## Example Config File
-
-```yaml
-%YAML 1.2
----
-training_type: "transformer" # Type of training to run. Either "diffuser" or "transformer".
-
-name: llama2 # Name of the experiment.
-api_key: Your Cerebrium API key.
-auth_token: YOUR HUGGINGFACE API TOKEN THAT HAS ACCESS TO THE WEIGHTS
-
-# Model params:
-hf_model_path: "meta-llama/Llama-2-7b-hf"
-model_type: "AutoModelForCausalLM"
-dataset_path: /path/to/your/dataset.json # path to your local JSON dataset.
-custom_tokenizer: "" # custom tokenizer from AutoTokenizer if required.
-seed: 42 # random seed for reproducibility.
-log_level: "INFO" # log_level level for logging.
-
-# Training params:
-training_args:
- logging_steps: 10
- per_device_train_batch_size: 15
- per_device_eval_batch_size: 15
- warmup_steps: 0
- gradient_accumulation_steps: 4
- num_train_epochs: 30
- learning_rate: 0.0001
- group_by_length: False
-
-base_model_args: # args for loading in the base model.
- load_in_8bit: True
- device_map: "auto"
-
-peft_lora_args: # peft lora args.
- r: 8
- lora_alpha: 32
- lora_dropout: 0.05
- target_modules: ["q_proj", "v_proj"]
- bias: "none"
- task_type: "CAUSAL_LM"
-
-dataset_args:
- prompt_template: "short" # Prompt template to use. Either "short" or "long". Otherwise look at our docs on templating
- instruction_column: "prompt"
- label_column: "completion"
- context_column: "context"
- cutoff_len: 512
- train_val_ratio: 0.9
-```
diff --git a/cerebrium/fine-tuning/language-models/training.mdx b/cerebrium/fine-tuning/language-models/training.mdx
deleted file mode 100644
index f4af41d9..00000000
--- a/cerebrium/fine-tuning/language-models/training.mdx
+++ /dev/null
@@ -1,115 +0,0 @@
----
-title: "Training"
-description: "Commands to use with fine-tuning"
----
-
-Below is the set of commands to get you started with fine-tuning. Please note that all fine-tuning functionality is currently done through the terminal; a frontend is coming soon.
-
-### Creating your project
-
-You can quickly set up a Cerebrium fine-tuning project by running the following command:
-
-```bash
-cerebrium init-trainer <> <>
-```
-
-The above variables are:
-
-- the type of fine-tuning (transformer or diffuser)
-- a path of where to create your config file.
-
-This will set up a YAML config file with a sensible set of default parameters to help you get started quickly. We recommend you look at the default config files
-based on the model you are training [here](/cerebrium/fine-tuning/language-models/model-specific-docs/training-falcon)
-
-### Starting a job with the CLI
-
-Starting a job on Cerebrium requires four things:
-
-- A name for you to use to identify the training job.
-- Your API key.
-- A config file or JSON string. See [this section](/cerebrium/fine-tuning/language-models/config) for more info.
-- Your local dataset of training prompts and completions. See [this section](/cerebrium/fine-tuning/language-models/dataset) for info on creating your dataset.
-
-Once you have these, you can start a fine-tuning job using the following command:
-
-```bash
-cerebrium train --config-file <>
-```
-
-Alternatively, all of the other parameters can be provided in your `config-file` or `config-string`.
-
-If you would like to provide the `name`, `training-type` or `api-key` from the command line, you can add them as follows:
-
-```bash
-cerebrium train --config-file <> --name <> --training-type "diffuser" --api-key <>
-```
-
-_Note that if these parameters are present in your `config-file` or `config-string`, they will be overridden by the command line args._
-
-## Retrieving your most recent training jobs
-
-Keeping track of the `jobIds` for all your different experiments can be challenging.
-To retrieve the status and information on your most recent fine-tuning jobs, you can run the following command:
-
-```bash
-cerebrium get-training-jobs --api-key <> --last-n <>
-```
-
-Where your API key is the key for the project under which your fine-tuning job has been deployed. Remember, if you used the `cerebrium login` command, you don't have to paste your API key.
-
-## Stream the logs of your fine-tuning job
-
-To stream the logs of a specific fine-tuning job use:
-
-```bash
-cerebrium get-training-logs <> --api-key <>
-```
-
-## Retrieving your training results
-
-Once your training is complete, you can download the training results using:
-
-```bash
-cerebrium download-model <> --api-key <> --download-path <>
-```
-
-This will return a zip file containing your **adapter** and **adapter config**, which should be on the order of 10MB for a 7B parameter model due to the efficiency of PEFT fine-tuning.
-
-## Deploy your fine-tuned model
-
-To deploy your model you can use Cortex. Below is an example that you can simply adapt in order to deploy your model in just a few lines of code. We will be releasing
-auto-deploy functionality soon!
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-from peft import PeftModel, PeftConfig  # Add the peft libraries we need for the adapter
-
-peft_model_id = "path/toYourAdapter"
-config = PeftConfig.from_pretrained(peft_model_id)
-model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
-model = PeftModel.from_pretrained(model, peft_model_id)  # Add the adapter to the model
-tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
-
-model = model.to("cuda")
-model.eval()  # set the model to inference mode
-
-```
-
-_Note: if you have fine-tuned a Llama based model, ensure that you are using the latest huggingface transformers release that supports Llama models as part of the AutoModelForCausalLM class._
-
-Now for inference, you just need to place the prompt into the template used for training. In this example, we do it as follows
-
-```python
-import torch
-
-template = "### Instruction:\n{instruction}\n\n### Response:\n"
-prompt = "Your input prompt here"
-question = template.format(instruction=prompt)
-inputs = tokenizer(question, return_tensors="pt")
-
-with torch.no_grad():
-    outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=10)
-    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])
-
-```
-
-These adapters can be combined with others when using your model at inference time.
-For more information, see
-[Using Adapter Transformers at Hugging Face](https://huggingface.co/docs/hub/adapter-transformers#exploring-adaptertransformers-in-the-hub)
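-
-For example, with recent versions of `peft` you can load an additional adapter onto the same base model and switch between them; the adapter path and name below are hypothetical:
-
-```python
-# Assumes `model` is the PeftModel created above with your first adapter loaded
-model.load_adapter("path/to/anotherAdapter", adapter_name="second-task")
-
-# Switch the active adapter before generating
-model.set_adapter("second-task")
-```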
diff --git a/mint.json b/mint.json
index 26de414f..9e25cc16 100644
--- a/mint.json
+++ b/mint.json
@@ -95,38 +95,6 @@
}
]
},
- {
- "group": "Fine-tuning",
- "pages": [
- {
- "group": "Language Models",
- "pages": [
- "cerebrium/fine-tuning/language-models/introduction",
- "cerebrium/fine-tuning/language-models/training",
- "cerebrium/fine-tuning/language-models/dataset",
- "cerebrium/fine-tuning/language-models/config",
- "cerebrium/fine-tuning/language-models/custom-templates",
- {
- "group": "Model Specific Docs",
- "pages": [
- "cerebrium/fine-tuning/language-models/model-specific-docs/training-falcon",
- "cerebrium/fine-tuning/language-models/model-specific-docs/training-llama2"
- ]
- }
- ]
- },
- {
- "group": "Diffusion Models",
- "pages": [
- "cerebrium/fine-tuning/diffusion-models/introduction",
- "cerebrium/fine-tuning/diffusion-models/training",
- "cerebrium/fine-tuning/diffusion-models/dataset",
- "cerebrium/fine-tuning/diffusion-models/config"
- ]
- },
- "cerebrium/fine-tuning/auto-deploying"
- ]
- },
{
"group": "Prebuilt Models",
"pages": [