Quantized tutorial update (#12)
* Update README.md

Add link to quantized tutorial in readme

* add quantized tutorial ipynb

* Create README.md

* Update tutorial readme

* update tutorial

---------

Co-authored-by: avitrost <[email protected]>
Co-authored-by: Nihal Nayak <[email protected]>
4 people authored Mar 14, 2024
1 parent d1f05b2 commit 76899e3
Showing 1 changed file with 93 additions and 67 deletions.
160 changes: 93 additions & 67 deletions tutorials/Quantized_Bonito_Tutorial.ipynb
@@ -1,43 +1,26 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Quantized Bonito Tutorial\n",
"This is a tutorial to set up and run a quantized version of [Bonito](https://github.com/BatsResearch/bonito) on a Google Colab T4 instance. The quantized model was graciously created by GitHub/HuggingFace user `alexandreteles` and we thank them for their contributions! The versions they created are:\n",
" - [alexandreteles/bonito-v1-awq](https://huggingface.co/alexandreteles/bonito-v1-awq) (can be used directly with vllm)\n",
" - [alexandreteles/bonito-v1-gguf](https://huggingface.co/alexandreteles/bonito-v1-gguf) (for llama.cpp inference)\n",
"\n",
"In this tutorial, we demonstrate the `awq` version with the `transformers` package, which works well in a Colab environment.\n"
],
"metadata": {
"id": "-K1cD9V8SDIG"
}
},
"source": [
"# Quantized Bonito Tutorial\n",
"This is a tutorial to set up and run a quantized version of [Bonito](https://github.com/BatsResearch/bonito) on a Google Colab T4 instance using the `transformers` package (instead of `vllm` as in the original repo). The quantized model was graciously created by GitHub/HuggingFace user `alexandreteles` and we thank them for their contributions! Note that quantized models may behave differently than their non-quantized counterparts. The versions they created are:\n",
" - [alexandreteles/bonito-v1-awq](https://huggingface.co/alexandreteles/bonito-v1-awq) (`awq` quantized model, this is the one we'll be using)\n",
" - [alexandreteles/bonito-v1-gguf](https://huggingface.co/alexandreteles/bonito-v1-gguf) (for llama.cpp inference)\n"
]
},
{
"cell_type": "markdown",
"source": [
"First we clone into the repo and install the dependencies. This will take several minutes."
],
"metadata": {
"id": "Gyh5HAFxQlaH"
}
},
"source": [
"## Setup\n",
"First we clone into the repo and install the dependencies. This will take several minutes."
]
},
{
"cell_type": "code",
@@ -53,39 +36,45 @@
},
{
"cell_type": "markdown",
"source": [
"To use this quantized model, we need to install the [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) package, which deals with AWQ ([Activation-aware Weight Quantization](https://arxiv.org/abs/2306.00978)) models, such as the one we'll be using. AWQ is a quantization technique that treats different weight parameters differently based on their importance. To get it to work with Colab, we have to install the kernel from a specialized wheel so the CUDA versions match."
],
"metadata": {
"id": "13b49FjqvRur"
}
},
"source": [
"To use this quantized model, we need to install the [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) package, which deals with AWQ ([Activation-aware Weight Quantization](https://arxiv.org/abs/2306.00978)) models, such as the one we'll be using. AWQ is a quantization technique that treats different weight parameters differently based on their importance. To get it to work with Colab, we have to install the kernel from a specialized wheel so the CUDA versions match."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5Y6L6xYP1KTe"
},
"outputs": [],
"source": [
"!pip install autoawq\n",
"!git clone https://github.com/Boltuzamaki/AutoAWQ_kernels.git\n",
"!pip install AutoAWQ_kernels/builds/autoawq_kernels-0.0.6+cu122-cp310-cp310-linux_x86_64.whl"
],
"metadata": {
"id": "5Y6L6xYP1KTe"
},
"execution_count": null,
"outputs": []
]
},
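As a quick sanity check (not part of the notebook itself), you can confirm that the Colab runtime actually matches the `cp310`/`cu122` wheel installed above; the expected values below are assumptions about the current T4 runtime:

```python
import sys
import torch

# The wheel above targets CPython 3.10 (cp310) and CUDA 12.2 (cu122).
print(sys.version_info[:2])  # expect (3, 10)
print(torch.version.cuda)    # expect "12.2"; a mismatch means the kernels may fail to load
```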
{
"cell_type": "markdown",
"source": [
"This cell includes the code to work with the quantized Bonito model, utilizing the `transformers` package. It's similar to the `Bonito` code made for `vllm` in the repo."
],
"metadata": {
"id": "xWYY7FYfQyAD"
}
},
"source": [
"## Quantized Bonito Wrapper\n",
"This cell includes the code to work with the quantized Bonito model, utilizing the `transformers` package. It's similar to the `Bonito` code made for `vllm` in the repo."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "NmsYTEdR-m59"
},
"outputs": [],
"source": [
"from typing import Optional\n",
"from typing import Optional, List\n",
"from datasets import Dataset\n",
"from awq import AutoAWQForCausalLM\n",
"from transformers import AutoTokenizer\n",
@@ -170,7 +159,25 @@
"\n",
" return synthetic_dataset\n",
"\n",
" def _generate_text(self, dataset: Dataset, sampling_params: dict):\n",
" def _generate_text(\n",
" self,\n",
" dataset: Dataset,\n",
" sampling_params: dict,\n",
" ) -> List[str]:\n",
" \"\"\"\n",
" Generate text using the model.\n",
"\n",
" This method takes a dataset of prompts, encodes them,\n",
" generates text using the model, decodes the generated\n",
" text, and appends it to a list.\n",
"\n",
" Args:\n",
" dataset (Dataset): A dataset containing prompts for text generation.\n",
" sampling_params (dict): Parameters for sampling during generation.\n",
"\n",
" Returns:\n",
" List[str]: A list of generated texts corresponding to the prompts.\n",
" \"\"\"\n",
" generated_texts = []\n",
"\n",
" for prompt in dataset:\n",
@@ -279,24 +286,25 @@
" )\n",
"\n",
" return processed_synthetic_dataset\n"
],
"metadata": {
"id": "NmsYTEdR-m59"
},
"execution_count": null,
"outputs": []
]
},
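The body of `_generate_text` is collapsed in the diff above. For orientation, here is a minimal, hypothetical sketch of such a `transformers`-style generation loop; it is not the notebook's actual code, and it assumes `model` exposes a `transformers`-style `generate()` (as AutoAWQ models do) with a matching `tokenizer`, both on the Colab GPU:

```python
from typing import List

def generate_text(model, tokenizer, prompts: List[str], sampling_params: dict) -> List[str]:
    # Encode each prompt, generate a continuation, and decode only the new tokens.
    generated_texts = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
        output_ids = model.generate(**inputs, **sampling_params)
        # Slice off the echoed prompt so only the newly generated text remains.
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        generated_texts.append(tokenizer.decode(new_tokens, skip_special_tokens=True))
    return generated_texts
```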
{
"cell_type": "markdown",
"source": [
"This is where we load in the model and unannotated dataset. With them, we can generate a synthetic dataset of instructions. This example generates synthetic instructions from a subset of size 10 of the unannotated dataset."
],
"metadata": {
"id": "86OvwN74RcS8"
}
},
"source": [
"## Synthetic Data Generation\n",
"This is where we load in the model and unannotated dataset. With them, we can generate a synthetic dataset of instructions. This example generates synthetic instructions from a subset of size 10 of the unannotated dataset. Note that `sampling_params` is modified to use `transformers` keywords instead of `vllm`'s."
]
},
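For reference, a rough correspondence between the `vllm` sampling parameters used in the original repo and the `transformers` `generate()` keywords used below (the `vllm` side is an assumption based on its `SamplingParams` API, not code from this notebook):

```python
# vllm (original repo):         SamplingParams(max_tokens=256, top_p=0.95,
#                                              temperature=0.5, n=1)
# transformers (this notebook): {'max_new_tokens': 256, 'top_p': 0.95,
#                                'temperature': 0.5, 'num_return_sequences': 1}
```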
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "k4lreUPb0LUX"
},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"\n",
@@ -310,7 +318,7 @@
")[\"train\"].select(range(10))\n",
"\n",
"# Generate synthetic instruction tuning dataset\n",
"sampling_params = {'max_length':256, 'top_p':0.95, 'temperature':0.5, 'num_return_sequences':1}\n",
"sampling_params = {'max_new_tokens':256, 'top_p':0.95, 'temperature':0.5, 'num_return_sequences':1}\n",
"synthetic_dataset = bonito.generate_tasks(\n",
" unannotated_text,\n",
" context_col=\"input\",\n",
@@ -319,21 +327,39 @@
")\n",
"\n",
"print(synthetic_dataset)"
],
"metadata": {
"id": "k4lreUPb0LUX"
},
"execution_count": null,
"outputs": []
]
},
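The collapsed region above hides the model construction. If you want to load the quantized checkpoint directly with AutoAWQ, a minimal sketch looks like this (the notebook's wrapper class may construct things differently):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Load the AWQ-quantized Bonito checkpoint; fuse_layers speeds up inference.
model = AutoAWQForCausalLM.from_quantized("alexandreteles/bonito-v1-awq", fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained("alexandreteles/bonito-v1-awq")
```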
{
"cell_type": "markdown",
"source": [
"Now go try it out with your own datasets! You can vary the `task_type` for different types of generated instructions."
],
"metadata": {
"id": "mEU1lp5TVjGj"
},
"source": [
"Now go try it out with your own datasets! You can vary the `task_type` for different types of generated instructions."
]
}
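For example, switching the task type is a one-argument change. This is illustrative only: `"ynqa"` (yes-no question answering) is taken from the task types listed in the Bonito repo, and `bonito`, `unannotated_text`, and `sampling_params` are the objects created above:

```python
# Generate yes-no QA instructions instead of the task type used above.
ynqa_dataset = bonito.generate_tasks(
    unannotated_text,
    context_col="input",
    task_type="ynqa",
    sampling_params=sampling_params,
)
print(ynqa_dataset)
```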
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "zero-shift",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.0 | packaged by conda-forge | (default, Nov 26 2020, 07:55:15) \n[Clang 11.0.0 ]"
},
"vscode": {
"interpreter": {
"hash": "346a5e91c5f8ad4f8eff3966c4562c80ad00d9220d6c3c49b6573b9ba7f5857f"
}
}
]
}
},
"nbformat": 4,
"nbformat_minor": 0
}
