Quantized tutorial update (#12)
* Update README.md

Add link to quantized tutorial in readme

* add quantized tutorial ipynb

* Create README.md

* Update tutorial readme

* update tutorial

---------

Co-authored-by: avitrost <[email protected]>
Co-authored-by: Nihal Nayak <[email protected]>
4 people authored Mar 14, 2024
1 parent d1f05b2 commit 76899e3
Showing 1 changed file with 93 additions and 67 deletions.
160 changes: 93 additions & 67 deletions tutorials/Quantized_Bonito_Tutorial.ipynb
@@ -1,43 +1,26 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Quantized Bonito Tutorial\n",
"This is a tutorial to set up and run a quantized version of [Bonito](https://github.com/BatsResearch/bonito) on a Google Colab T4 instance. The quantized model was graciously created by GitHub/HuggingFace user `alexandreteles` and we thank them for their contributions! The versions they created are:\n",
" - [alexandreteles/bonito-v1-awq](https://huggingface.co/alexandreteles/bonito-v1-awq) (can be used directly with vllm)\n",
" - [alexandreteles/bonito-v1-gguf](https://huggingface.co/alexandreteles/bonito-v1-gguf) (for llama.cpp inference)\n",
"\n",
"In this tutorial, we demonstrate the `awq` version with the `transformers` package, which works well in a Colab environment.\n"
],
"metadata": {
"id": "-K1cD9V8SDIG"
}
},
"source": [
"# Quantized Bonito Tutorial\n",
"This is a tutorial to set up and run a quantized version of [Bonito](https://github.com/BatsResearch/bonito) on a Google Colab T4 instance using the `transformers` package (instead of `vllm` as in the original repo). The quantized model was graciously created by GitHub/HuggingFace user `alexandreteles` and we thank them for their contributions! Note that quantized models may behave differently than their non-quantized counterparts. The versions they created are:\n",
" - [alexandreteles/bonito-v1-awq](https://huggingface.co/alexandreteles/bonito-v1-awq) (`awq` quantized model, this is the one we'll be using)\n",
" - [alexandreteles/bonito-v1-gguf](https://huggingface.co/alexandreteles/bonito-v1-gguf) (for llama.cpp inference)\n"
]
},
{
"cell_type": "markdown",
"source": [
"First we clone into the repo and install the dependencies. This will take several minutes."
],
"metadata": {
"id": "Gyh5HAFxQlaH"
}
},
"source": [
"## Setup\n",
"First we clone into the repo and install the dependencies. This will take several minutes."
]
},
{
"cell_type": "code",
@@ -53,39 +36,45 @@
},
{
"cell_type": "markdown",
"source": [
"To use this quantized model, we need to install the [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) package, which deals with AWQ ([Activation-aware Weight Quantization](https://arxiv.org/abs/2306.00978)) models, such as the one we'll be using. AWQ is a quantization technique that treats different weight parameters differently based on their importance. To get it to work with Colab, we have to install the kernel from a specialized wheel so the CUDA versions match."
],
"metadata": {
"id": "13b49FjqvRur"
}
},
"source": [
"To use this quantized model, we need to install the [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) package, which deals with AWQ ([Activation-aware Weight Quantization](https://arxiv.org/abs/2306.00978)) models, such as the one we'll be using. AWQ is a quantization technique that treats different weight parameters differently based on their importance. To get it to work with Colab, we have to install the kernel from a specialized wheel so the CUDA versions match."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5Y6L6xYP1KTe"
},
"outputs": [],
"source": [
"!pip install autoawq\n",
"!git clone https://github.com/Boltuzamaki/AutoAWQ_kernels.git\n",
"!pip install AutoAWQ_kernels/builds/autoawq_kernels-0.0.6+cu122-cp310-cp310-linux_x86_64.whl"
],
"metadata": {
"id": "5Y6L6xYP1KTe"
},
"execution_count": null,
"outputs": []
]
},
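As a quick sanity check (not part of the notebook itself), you can confirm that the Colab runtime actually matches the `cp310`/`cu122` wheel installed above; the expected values below are assumptions about the current T4 runtime:

```python
import sys
import torch

# The wheel above targets CPython 3.10 (cp310) and CUDA 12.2 (cu122).
print(sys.version_info[:2])  # expect (3, 10)
print(torch.version.cuda)    # expect "12.2"; a mismatch means the kernels may fail to load
```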
{
"cell_type": "markdown",
"source": [
"This cell includes the code to work with the quantized Bonito model, utilizing the `transformers` package. It's similar to the `Bonito` code made for `vllm` in the repo."
],
"metadata": {
"id": "xWYY7FYfQyAD"
}
},
"source": [
"## Quantized Bonito Wrapper\n",
"This cell includes the code to work with the quantized Bonito model, utilizing the `transformers` package. It's similar to the `Bonito` code made for `vllm` in the repo."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "NmsYTEdR-m59"
},
"outputs": [],
"source": [
"from typing import Optional\n",
"from typing import Optional, List\n",
"from datasets import Dataset\n",
"from awq import AutoAWQForCausalLM\n",
"from transformers import AutoTokenizer\n",
@@ -170,7 +159,25 @@
"\n",
" return synthetic_dataset\n",
"\n",
" def _generate_text(self, dataset: Dataset, sampling_params: dict):\n",
" def _generate_text(\n",
" self,\n",
" dataset: Dataset,\n",
" sampling_params: dict,\n",
" ) -> List[str]:\n",
" \"\"\"\n",
" Generate text using the model.\n",
"\n",
" This method takes a dataset of prompts, encodes them,\n",
" generates text using the model, decodes the generated\n",
" text, and appends it to a list.\n",
"\n",
" Args:\n",
" dataset (Dataset): A dataset containing prompts for text generation.\n",
" sampling_params (dict): Parameters for sampling during generation.\n",
"\n",
" Returns:\n",
" List[str]: A list of generated texts corresponding to the prompts.\n",
" \"\"\"\n",
" generated_texts = []\n",
"\n",
" for prompt in dataset:\n",
@@ -279,24 +286,25 @@
" )\n",
"\n",
" return processed_synthetic_dataset\n"
],
"metadata": {
"id": "NmsYTEdR-m59"
},
"execution_count": null,
"outputs": []
]
},
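The body of `_generate_text` is collapsed in the diff above. For orientation, here is a minimal, hypothetical sketch of such a `transformers`-style generation loop; it is not the notebook's actual code, and it assumes `model` exposes a `transformers`-style `generate()` (as AutoAWQ models do) with a matching `tokenizer`, both on the Colab GPU:

```python
from typing import List

def generate_text(model, tokenizer, prompts: List[str], sampling_params: dict) -> List[str]:
    # Encode each prompt, generate a continuation, and decode only the new tokens.
    generated_texts = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
        output_ids = model.generate(**inputs, **sampling_params)
        # Slice off the echoed prompt so only the newly generated text remains.
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        generated_texts.append(tokenizer.decode(new_tokens, skip_special_tokens=True))
    return generated_texts
```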
{
"cell_type": "markdown",
"source": [
"This is where we load in the model and unannotated dataset. With them, we can generate a synthetic dataset of instructions. This example generates synthetic instructions from a subset of size 10 of the unannotated dataset."
],
"metadata": {
"id": "86OvwN74RcS8"
}
},
"source": [
"## Synthetic Data Generation\n",
"This is where we load in the model and unannotated dataset. With them, we can generate a synthetic dataset of instructions. This example generates synthetic instructions from a subset of size 10 of the unannotated dataset. Note that `sampling_params` is modified to use `transformers` keywords instead of `vllm`'s."
]
},
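For reference, a rough correspondence between the `vllm` sampling parameters used in the original repo and the `transformers` `generate()` keywords used below (the `vllm` side is an assumption based on its `SamplingParams` API, not code from this notebook):

```python
# vllm (original repo):         SamplingParams(max_tokens=256, top_p=0.95,
#                                              temperature=0.5, n=1)
# transformers (this notebook): {'max_new_tokens': 256, 'top_p': 0.95,
#                                'temperature': 0.5, 'num_return_sequences': 1}
```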
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "k4lreUPb0LUX"
},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"\n",
@@ -310,7 +318,7 @@
")[\"train\"].select(range(10))\n",
"\n",
"# Generate synthetic instruction tuning dataset\n",
"sampling_params = {'max_length':256, 'top_p':0.95, 'temperature':0.5, 'num_return_sequences':1}\n",
"sampling_params = {'max_new_tokens':256, 'top_p':0.95, 'temperature':0.5, 'num_return_sequences':1}\n",
"synthetic_dataset = bonito.generate_tasks(\n",
" unannotated_text,\n",
" context_col=\"input\",\n",
@@ -319,21 +327,39 @@
")\n",
"\n",
"print(synthetic_dataset)"
],
"metadata": {
"id": "k4lreUPb0LUX"
},
"execution_count": null,
"outputs": []
]
},
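The collapsed region above hides the model construction. If you want to load the quantized checkpoint directly with AutoAWQ, a minimal sketch looks like this (the notebook's wrapper class may construct things differently):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Load the AWQ-quantized Bonito checkpoint; fuse_layers speeds up inference.
model = AutoAWQForCausalLM.from_quantized("alexandreteles/bonito-v1-awq", fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained("alexandreteles/bonito-v1-awq")
```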
{
"cell_type": "markdown",
"source": [
"Now go try it out with your own datasets! You can vary the `task_type` for different types of generated instructions."
],
"metadata": {
"id": "mEU1lp5TVjGj"
},
"source": [
"Now go try it out with your own datasets! You can vary the `task_type` for different types of generated instructions."
]
}
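For example, switching the task type is a one-argument change. This is illustrative only: `"ynqa"` (yes-no question answering) is taken from the task types listed in the Bonito repo, and `bonito`, `unannotated_text`, and `sampling_params` are the objects created above:

```python
# Generate yes-no QA instructions instead of the task type used above.
ynqa_dataset = bonito.generate_tasks(
    unannotated_text,
    context_col="input",
    task_type="ynqa",
    sampling_params=sampling_params,
)
print(ynqa_dataset)
```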
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "zero-shift",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.0 | packaged by conda-forge | (default, Nov 26 2020, 07:55:15) \n[Clang 11.0.0 ]"
},
"vscode": {
"interpreter": {
"hash": "346a5e91c5f8ad4f8eff3966c4562c80ad00d9220d6c3c49b6573b9ba7f5857f"
}
}
]
}
},
"nbformat": 4,
"nbformat_minor": 0
}
