
Commit 819a676

Merge pull request #228 from leestott/patch-5
Update
2 parents a598b35 + a88531d commit 819a676

File tree

8 files changed: +610 −2 lines


README.md

Lines changed: 4 additions & 2 deletions

```diff
@@ -12,6 +12,7 @@
 [![GitHub forks](https://img.shields.io/github/forks/microsoft/phi-3cookbook.svg?style=social&label=Fork)](https://GitHub.com/microsoft/phi-3cookbook/network/?WT.mc_id=aiml-137032-kinfeylo)
 [![GitHub stars](https://img.shields.io/github/stars/microsoft/phi-3cookbook?style=social&label=Star)](https://GitHub.com/microsoft/phi-3cookbook/stargazers/?WT.mc_id=aiml-137032-kinfeylo)
 
+
 [![Azure AI Community Discord](https://dcbadge.vercel.app/api/server/ByRwuEEgH4)](https://discord.com/invite/ByRwuEEgH4?WT.mc_id=aiml-137032-kinfeylo)
 
 Phi is a family of open AI models developed by Microsoft. Phi models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and next size up across a variety of language, reasoning, coding, and math benchmarks. The Phi-3 Family includes mini, small, medium and vision versions, trained based on different parameter amounts to serve various application scenarios. For more detailed information about Microsoft's Phi family, please visit the [Welcome to the Phi Family](/md/01.Introduce/Phi3Family.md) page.
@@ -32,8 +33,8 @@ Follow these steps:
 - [Phi-3 Hardware Support](./md/01.Introduce/Hardwaresupport.md)(✅)
 - [Phi-3 Models & Availability across platforms](./md/01.Introduce/Edgeandcloud.md)(✅)
 - [Using Guidance-ai and Phi](./md/01.Introduce/Guidance.md)(✅)
-- [GitHub Marketplace Models](https://github.com/marketplace/models)
-- [Azure AI Model Catalog](https://ai.azure.com)
+- [GitHub Marketplace Models](https://github.com/marketplace/models)(✅)
+- [Azure AI Model Catalog](https://ai.azure.com)(✅)
 
 - Quick Start
 - [Using Phi-3 in GitHub Model Catalog](./md/02.QuickStart/GitHubModel_QuickStart.md)(✅)
@@ -76,6 +77,7 @@ Follow these steps:
 - [Fine-tuning Phi-3 with Azure AI Studio](./md/04.Fine-tuning/FineTuning_AIStudio.md)(✅)
 - [Fine-tuning Phi-3 with Azure ML CLI/SDK](./md/04.Fine-tuning/FineTuning_MLSDK.md)(✅)
 - [Fine-tuning with Microsoft Olive](./md/04.Fine-tuning/FineTuning_MicrosoftOlive.md)(✅)
+- [Fine-tuning with Microsoft Olive Hands-On Lab](./code/04.Finetuning/olive-lab/readme.md)(✅)
 - [Fine-tuning Phi-3-vision with Weights and Bias](./md/04.Fine-tuning/FineTuning_Phi-3-visionWandB.md)(✅)
 - [Fine-tuning Phi-3 with Apple MLX Framework](./md/04.Fine-tuning/FineTuning_MLX.md)(✅)
 - [Fine-tuning Phi-3-vision (official support)](./md/04.Fine-tuning/FineTuning_Vision.md)(✅)
```

code/04.Finetuning/olive-lab/data/data_sample_travel.jsonl

Lines changed: 289 additions & 0 deletions (large diff not rendered by default)

A new image file (138 KB) is also included in this commit but is not rendered here.
code/04.Finetuning/olive-lab/readme.md

Lines changed: 234 additions & 0 deletions
# Lab. Optimize AI models for on-device inference

## Introduction

> [!IMPORTANT]
> This lab requires an **Nvidia A10 or A100 GPU** with associated drivers and the CUDA toolkit (version 12+) installed.

> [!NOTE]
> This is a **35-minute** lab that will give you a hands-on introduction to the core concepts of optimizing models for on-device inference using OLIVE.

## Learning Objectives

By the end of this lab, you will be able to use OLIVE to:

- Quantize an AI model using the AWQ quantization method.
- Fine-tune an AI model for a specific task.
- Generate LoRA adapters (fine-tuned model) for efficient on-device inference on the ONNX Runtime.

### What is Olive

Olive (*O*NNX *live*) is a model optimization toolkit with an accompanying CLI that enables you to ship models for the ONNX Runtime +++https://onnxruntime.ai+++ with quality and performance.

![Olive Flow](./images/olive-flow.png)

The input to Olive is typically a PyTorch or Hugging Face model, and the output is an optimized ONNX model that is executed on a device (deployment target) running the ONNX Runtime. Olive optimizes the model for the deployment target's AI accelerator (NPU, GPU, or CPU) provided by a hardware vendor such as Qualcomm, AMD, Nvidia, or Intel.

Olive executes a *workflow*, which is an ordered sequence of individual model optimization tasks called *passes* - example passes include model compression, graph capture, quantization, and graph optimization. Each pass has a set of parameters that can be tuned to achieve the best metrics, such as accuracy and latency, as judged by the respective evaluator. Olive employs a search strategy that uses a search algorithm to auto-tune each pass individually or sets of passes together.
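To make the pass/workflow idea concrete, here is a deliberately simplified, hypothetical Python sketch. It is not Olive's real API or config schema; it only illustrates the notion of an ordered list of passes, each with its own tunable parameters:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical illustration only: this is not Olive's actual API or schema.
@dataclass
class Pass:
    name: str
    run: Callable[[Any, Dict[str, Any]], Any]
    params: Dict[str, Any] = field(default_factory=dict)

def run_workflow(model: Any, passes: List[Pass]) -> Any:
    """Apply each pass in order, the way an Olive-style workflow chains optimization tasks."""
    for p in passes:
        print(f"running pass '{p.name}' with params {p.params}")
        model = p.run(model, p.params)
    return model

# Toy passes standing in for quantization and graph optimization.
quantize = Pass("quantize", lambda m, p: f"{m}-int{p['bits']}", {"bits": 4})
graph_opt = Pass("graph_opt", lambda m, p: f"{m}-fused")

print(run_workflow("phi-3.5-mini", [quantize, graph_opt]))  # phi-3.5-mini-int4-fused
```

In real Olive workflows the pass sequence and parameters come from a YAML/JSON config (or the CLI commands used later in this lab), and Olive searches over the tunable parameters instead of applying them blindly.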
#### Benefits of Olive

- **Reduce the frustration and time** of trial-and-error manual experimentation with different techniques for graph optimization, compression, and quantization. Define your quality and performance constraints and let Olive automatically find the best model for you.
- **40+ built-in model optimization components** covering cutting-edge techniques in quantization, compression, graph optimization, and fine-tuning.
- **Easy-to-use CLI** for common model optimization tasks, for example `olive quantize`, `olive auto-opt`, and `olive finetune`.
- Model packaging and deployment built in.
- Support for generating models for **Multi LoRA serving**.
- Construct workflows using YAML/JSON to orchestrate model optimization and deployment tasks.
- **Hugging Face** and **Azure AI** integration.
- Built-in **caching** mechanism to **save costs**.

## Lab Instructions

> [!NOTE]
> Please ensure you have provisioned your Azure AI Hub and Project and set up your A100 compute as described in Lab 1.
### Step 0: Connect to your Azure AI Compute

You'll connect to the Azure AI compute using the remote feature in **VS Code**.

1. Open your **VS Code** desktop application.
1. Open the **command palette** using **Shift+Ctrl+P**.
1. In the command palette, search for **AzureML - remote: Connect to compute instance in New Window**.
1. Follow the on-screen instructions to connect to the compute. This involves selecting the Azure Subscription, Resource Group, Project, and Compute name you set up in Lab 1.
1. Once you're connected to your Azure ML compute node, this is displayed in the **bottom left of VS Code**: `><Azure ML: Compute Name`.

### Step 1: Clone this repo

In VS Code, you can open a new terminal with **Ctrl+J** and clone this repo.

In the terminal you should see the prompt:

```
azureuser@computername:~/cloudfiles/code$
```

Clone the solution:

```bash
cd ~/localfiles
git clone https://github.com/microsoft/phi-3cookbook.git
```
### Step 2: Open Folder in VS Code

To open VS Code in the relevant folder, execute the following command in the terminal, which will open a new window:

```bash
code phi-3cookbook/code/04.Finetuning/olive-lab
```

Alternatively, you can open the folder by selecting **File** > **Open Folder**.
### Step 3: Dependencies

Open a terminal window in VS Code on your Azure AI Compute Instance (tip: **Ctrl+J**) and execute the following commands to install the dependencies:

```bash
conda create -n olive-ai python=3.11 -y
conda activate olive-ai
pip install -r requirements.txt
az extension remove -n azure-cli-ml
az extension add -n ml
```

> [!NOTE]
> It will take ~5 minutes to install all the dependencies.

In this lab you'll download and upload models to the Azure AI model catalog. So that you can access the model catalog, you'll need to log in to Azure using:

```bash
az login
```

> [!NOTE]
> At login time you'll be asked to select your subscription. Ensure you set the subscription to the one provided for this lab.
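Before running the Olive commands, it can be worth confirming that the GPU and CUDA toolkit required by this lab are visible from Python. The short check below is a sketch rather than an official lab step, and it assumes PyTorch is available in the `olive-ai` environment (it is pulled in transitively by the dependencies above):

```python
import torch

# Quick sanity check that the Nvidia GPU and CUDA toolkit are usable.
print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime version:", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```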
### Step 4: Execute Olive commands

Open a terminal window in VS Code on your Azure AI Compute Instance (tip: **Ctrl+J**) and ensure the `olive-ai` conda environment is activated:

```bash
conda activate olive-ai
```

Next, execute the following Olive commands in the command line.

1. **Inspect the data:** In this example, you're going to fine-tune the Phi-3.5-mini model so that it is specialized in answering travel-related questions. The code below displays the first few records of the dataset, which are in JSON lines format:

    ```bash
    head data/data_sample_travel.jsonl
    ```

1. **Quantize the model:** Before training the model, you first quantize it with the following command, which uses a technique called Activation-aware Weight Quantization (AWQ) +++https://arxiv.org/abs/2306.00978+++. AWQ quantizes the weights of a model by considering the activations produced during inference. This means that the quantization process takes into account the actual data distribution in the activations, leading to better preservation of model accuracy compared to traditional weight quantization methods.

    ```bash
    olive quantize \
       --model_name_or_path microsoft/Phi-3.5-mini-instruct \
       --trust_remote_code \
       --algorithm awq \
       --output_path models/phi/awq \
       --log_level 1
    ```

    It takes **~8 minutes** to complete the AWQ quantization, which will **reduce the model size from ~7.5GB to ~2.5GB**.

    In this lab, we're showing you how to input models from Hugging Face (for example: `microsoft/Phi-3.5-mini-instruct`). However, Olive also allows you to input models from the Azure AI catalog by updating the `model_name_or_path` argument to an Azure AI asset ID (for example: `azureml://registries/azureml/models/Phi-3.5-mini-instruct/versions/4`).

1. **Train the model:** Next, the `olive finetune` command fine-tunes the quantized model. Quantizing the model *before* fine-tuning instead of afterwards gives better accuracy, as the fine-tuning process recovers some of the loss from quantization. (The sketch after these steps shows how the `--text_template` argument renders each dataset record into the Phi-3 chat format.)

    ```bash
    olive finetune \
        --method lora \
        --model_name_or_path models/phi/awq \
        --data_files "data/data_sample_travel.jsonl" \
        --data_name "json" \
        --text_template "<|user|>\n{prompt}<|end|>\n<|assistant|>\n{response}<|end|>" \
        --max_steps 100 \
        --output_path ./models/phi/ft \
        --log_level 1
    ```

    It takes **~6 minutes** to complete the fine-tuning (with 100 steps).

1. **Optimize:** With the model trained, you now optimize it using Olive's `auto-opt` command, which will capture the ONNX graph and automatically perform a number of optimizations to improve the model's performance on CPU by compressing the model and doing fusions. Note that you can also optimize for other devices such as NPU or GPU by updating the `--device` and `--provider` arguments - but for the purposes of this lab we'll use CPU.

    ```bash
    olive auto-opt \
       --model_name_or_path models/phi/ft/model \
       --adapter_path models/phi/ft/adapter \
       --device cpu \
       --provider CPUExecutionProvider \
       --use_ort_genai \
       --output_path models/phi/onnx-ao \
       --log_level 1
    ```

    It takes **~5 minutes** to complete the optimization.
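For reference, here is a small Python sketch (not part of the lab files) showing how the `--text_template` used above turns a record from `data/data_sample_travel.jsonl` into the Phi-3 chat format seen during fine-tuning. It assumes each record has `prompt` and `response` fields, as implied by the template:

```python
import json

# Phi-3 chat template passed to `olive finetune` above.
TEXT_TEMPLATE = "<|user|>\n{prompt}<|end|>\n<|assistant|>\n{response}<|end|>"

with open("data/data_sample_travel.jsonl") as f:
    first_record = json.loads(f.readline())  # JSON lines: one JSON object per line

# Render the record exactly as the fine-tuning pass would see it.
print(TEXT_TEMPLATE.format(**first_record))
```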
### Step 5: Model inference quick test

To test inferencing the model, create a Python file in your folder called **app.py** and copy-and-paste the following code:

```python
import onnxruntime_genai as og
import numpy as np

# Load the optimized ONNX model and the travel LoRA adapter produced by auto-opt.
print("loading model and adapters...", end="", flush=True)
model = og.Model("models/phi/onnx-ao/model")
adapters = og.Adapters(model)
adapters.load("models/phi/onnx-ao/model/adapter_weights.onnx_adapter", "travel")
print("DONE!")

tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=100, past_present_share_buffer=False)
user_input = "what is the best thing to see in chicago"
params.input_ids = tokenizer.encode(f"<|user|>\n{user_input}<|end|>\n<|assistant|>\n")

generator = og.Generator(model, params)

generator.set_active_adapter(adapters, "travel")

print(f"{user_input}")

# Stream tokens until generation completes.
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end='', flush=True)

print("\n")
```

Execute the code using:

```bash
python app.py
```
### Step 6: Upload model to Azure AI

Uploading the model to an Azure AI model repository makes the model shareable with other members of your development team and also handles version control of the model. To upload the model, run the following command:

> [!NOTE]
> Update the `{}` placeholders with the name of your resource group and Azure AI Project name.

To find your resource group and Azure AI Project name, run the following command:

```bash
az ml workspace show
```

Alternatively, go to +++ai.azure.com+++ and select **management center** > **project** > **overview**.

```bash
az ml model create \
    --name ft-for-travel \
    --version 1 \
    --path ./models/phi/onnx-ao \
    --resource-group {RESOURCE_GROUP_NAME} \
    --workspace-name {PROJECT_NAME}
```

You can then see your uploaded model and deploy it at https://ml.azure.com/model/list
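As an alternative to the CLI, the `azure-ai-ml` package pinned in the lab's requirements can register the same model programmatically. The following is a sketch, not an official lab step; the subscription ID, resource group, and project name are placeholders you would replace with the values returned by `az ml workspace show`:

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# Placeholders: replace with your own subscription, resource group, and project name.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP_NAME>",
    workspace_name="<PROJECT_NAME>",
)

# Register the optimized ONNX model folder as a versioned model asset.
model = Model(
    path="./models/phi/onnx-ao",
    name="ft-for-travel",
    version="1",
    type=AssetTypes.CUSTOM_MODEL,
)
ml_client.models.create_or_update(model)
```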
Lines changed: 13 additions & 0 deletions (pinned Python dependencies; the requirements file installed in Step 3)

```
olive-ai==0.7.1
transformers==4.44.2
autoawq==0.2.6
optimum==1.23.1
peft==0.13.2
bitsandbytes==0.44.1
accelerate>=0.30.0
scipy==1.14.1
azure-ai-ml==1.21.1
onnxruntime-genai-cuda==0.5.0
tabulate==0.9.0
openai==1.54.4
python-dotenv==1.0.1
```
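A quick, optional way to confirm these pins resolved correctly inside the `olive-ai` environment (a sketch, not part of the lab):

```python
from importlib.metadata import version

# Print the installed versions of the key pinned packages.
for pkg in ["olive-ai", "transformers", "autoawq", "onnxruntime-genai-cuda", "azure-ai-ml"]:
    print(pkg, version(pkg))
```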
Lines changed: 33 additions & 0 deletions (an inference script for the fine-tuned travel adapter that also reports tokens per second)

```python
import onnxruntime_genai as og
import numpy as np
import time

# Load the optimized ONNX model and the travel LoRA adapter.
model = og.Model("models/phi/onnx-ao/model")
adapters = og.Adapters(model)
adapters.load("models/phi/onnx-ao/model/adapter_weights.onnx_adapter", "travel")

tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=100, past_present_share_buffer=False)
params.input_ids = tokenizer.encode("<|user|>\nwhere is the best place in london<|end|>\n<|assistant|>\n")

generator = og.Generator(model, params)

generator.set_active_adapter(adapters, "travel")

print(f"[Travel]: Tell me what to do in London")
start = time.time()
token_count = 0
# Stream tokens and count them so throughput can be reported.
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end='', flush=True)
    token_count = token_count + 1

print("\n")
end = time.time()
print(f"Tk.sec:{token_count/(end - start)}")
```
Lines changed: 31 additions & 0 deletions (a script that runs the quantize, finetune, and auto-opt commands end to end, pulling the base model from the Azure AI registry)

```bash
echo -e "\n>>>>>> running awq quantization >>>>>>>>\n"

olive quantize \
   --model_name_or_path azureml://registries/azureml/models/Phi-3.5-mini-instruct/versions/4 \
   --algorithm awq \
   --output_path models/phi/awq \
   --log_level 1

echo -e "\n>>>>>> running finetuning >>>>>>>>\n"

olive finetune \
    --method lora \
    --model_name_or_path models/phi/awq \
    --trust_remote_code \
    --data_files "data/data_sample_travel.jsonl" \
    --data_name "json" \
    --text_template "<|user|>\n{prompt}<|end|>\n<|assistant|>\n{response}<|end|>" \
    --max_steps 100 \
    --output_path ./models/phi/ft \
    --log_level 1

echo -e "\n>>>>>> running optimizer >>>>>>>>\n"

olive auto-opt \
    --model_name_or_path models/phi/ft/model \
    --adapter_path models/phi/ft/adapter \
    --device cpu \
    --provider CPUExecutionProvider \
    --use_ort_genai \
    --output_path models/phi/onnx-ao \
    --log_level 1
```
Lines changed: 6 additions & 0 deletions (registers the optimized model with Azure AI; replace RESOURCE_GROUP and PROJECT_NAME with your own values)

```bash
az ml model create \
    --name ft-for-travel \
    --version 1 \
    --path ./models/phi/onnx-ao \
    --resource-group RESOURCE_GROUP \
    --workspace-name PROJECT_NAME
```

0 commit comments
