feat: support stable diffusion v1-5 with qnn #234
base: main
Conversation
Pull request overview
Adds an AITK workflow + supporting scripts/configs to optimize and run Stable Diffusion v1.5 with ONNX Runtime QNN EP (plus supporting CPU/CUDA/OpenVINO paths), including data generation for static quantization and model adaptation/monkey-patching for QNN.
Changes:
- Introduces a full `sd-legacy-stable-diffusion-v1-5/aitk` workflow (configs, optimization/inference scripts, evaluation tooling, sample notebook).
- Adds QDQ/QNN pipeline utilities (EP registration, QDQ config shaping, ORT/OpenVINO pipelines, ONNX save/patch helpers).
- Registers the model + dataset in `.aitk` configs and adds SD-specific requirements.
Reviewed changes
Copilot reviewed 30 out of 31 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| sd-legacy-stable-diffusion-v1-5/olive/winml.py | Helper to enumerate WinML execution provider library paths. |
| sd-legacy-stable-diffusion-v1-5/olive/sd_utils/ort.py | Updates Olive footprint filename lookup for optimized ONNX extraction. |
| sd-legacy-stable-diffusion-v1-5/aitk/winml.py | AITK-side helper to enumerate WinML execution provider library paths. |
| sd-legacy-stable-diffusion-v1-5/aitk/user_script.py | Model loaders, input builders, and dataloaders for Olive passes (incl. LoRA merge + QNN patch hook). |
| sd-legacy-stable-diffusion-v1-5/aitk/stable_diffusion.py | End-to-end CLI for optimizing and running SD v1.5 across providers/formats (incl. QDQ). |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/qdq_xl.py | ORT SDXL pipeline wrappers with data-save hooks (for QDQ data generation). |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/qdq.py | QDQ-specific Olive config shaping + ONNX pipeline wrapper + EP registration + QDQ pipeline loader. |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/ov.py | OpenVINO pipeline implementation and Olive config helpers for OV conversion/runtime. |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/ort.py | ORT/CUDA optimization helpers, footprint parsing, and pipeline materialization (incl. QNN ctx bin copy). |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/onnx_patch.py | Patched ONNX model wrapper to support saving external weights alongside ONNX artifacts. |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/config.py | Shared runtime config values (sample sizes, flags, data dir). |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_qnn_workflow.py | AITK workflow driver orchestrating conversion, data generation, and quantized model generation. |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_qnn_workflow.json.config | AITK workflow UI/config template for QNN conversion + quantization + evaluation. |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_qnn_workflow.json | Olive/AITK workflow definition for SD v1.5 QNN target. |
| sd-legacy-stable-diffusion-v1-5/aitk/model_project.config | Registers the workflow in the model project configuration. |
| sd-legacy-stable-diffusion-v1-5/aitk/model_adaptations.py | QNN-focused UNet monkey-patches (attention/activations/norm/proj changes) for compatibility/perf. |
| sd-legacy-stable-diffusion-v1-5/aitk/info.yml | AITK metadata for the SD v1.5 QNN recipe. |
| sd-legacy-stable-diffusion-v1-5/aitk/inference_sample.ipynb | Sample notebook demonstrating QNN EP registration and inference. |
| sd-legacy-stable-diffusion-v1-5/aitk/evaluation.py | Evaluation/data-generation script (CLIP/FID/MSE/HPSv2 hooks, dataset streaming). |
| sd-legacy-stable-diffusion-v1-5/aitk/config_vae_encoder.json | Olive config for VAE encoder conversion/optimization/quantization. |
| sd-legacy-stable-diffusion-v1-5/aitk/config_vae_decoder.json | Olive config for VAE decoder conversion/optimization/quantization (+ optional EP context bin). |
| sd-legacy-stable-diffusion-v1-5/aitk/config_unet.json | Olive config for UNet conversion/optimization/quantization (+ optional EP context bin). |
| sd-legacy-stable-diffusion-v1-5/aitk/config_text_encoder.json | Olive config for text encoder conversion/optimization/quantization (+ surgery/context bin). |
| sd-legacy-stable-diffusion-v1-5/aitk/config_safety_checker.json | Olive config for safety checker conversion/optimization. |
| sd-legacy-stable-diffusion-v1-5/aitk/README.md | Usage documentation for data generation, optimization, and evaluation. |
| sd-legacy-stable-diffusion-v1-5/aitk/.gitignore | Ignores generated caches, artifacts, and results. |
| .aitk/scripts/project_processor.py | Adds HuggingFace icon mapping for the SD v1.5 model family key. |
| .aitk/requirements/requirements-WCR-SD.txt | Adds SD workflow runtime requirements (accelerate/diffusers/torch-fidelity pins). |
| .aitk/configs/model_list.json | Registers SD v1.5 model entry and adds dataset link for phiyodr/coco2017. |
| .aitk/configs/checks.json | Updates check counters to reflect the new model/workflow assets. |
```python
optimized_model_dir = script_dir / "models" / "optimized" / model_id

if common_args.clean_cache:
    shutil.rmtree(common_args.cache_dir, ignore_errors=True)
```
Copilot AI (Feb 10, 2026)
If --clean_cache is provided without --cache_dir, shutil.rmtree(common_args.cache_dir, ...) will raise a TypeError because cache_dir is None. Consider either making --cache_dir required when --clean_cache is set, or defaulting to a known cache path (e.g., script_dir / "cache") and guarding against None.
Suggested change:

```diff
-    shutil.rmtree(common_args.cache_dir, ignore_errors=True)
+    # If no cache_dir was provided, default to a "cache" directory under script_dir.
+    cache_dir = common_args.cache_dir or (script_dir / "cache")
+    if cache_dir is not None:
+        shutil.rmtree(cache_dir, ignore_errors=True)
```
```python
worker_script = os.path.abspath('winml.py')
result = subprocess.check_output([sys.executable, worker_script], text=True)
paths = json.loads(result)
```
Copilot AI (Feb 10, 2026)
worker_script = os.path.abspath('winml.py') depends on the current working directory, so this can fail (or pick up an unintended/malicious winml.py) when the script is launched from another directory. Use a path relative to this module (e.g., Path(__file__).resolve().parents[1] / "winml.py" or similar) to ensure the intended helper is executed.
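A CWD-independent variant could anchor the lookup at the module's own location. A minimal sketch (`sibling_script` is a hypothetical helper name, not code from the PR):

```python
from pathlib import Path

def sibling_script(module_file: str, name: str) -> Path:
    """Resolve a helper script that lives next to the given module,
    independent of the process's current working directory."""
    # Path(module_file).resolve() normalizes to an absolute path, so the
    # result does not change when the CLI is launched from elsewhere.
    return Path(module_file).resolve().parent / name
```

Call it as `sibling_script(__file__, "winml.py")` and pass `str(...)` to `subprocess.check_output`, so the intended helper runs regardless of the caller's working directory.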
```python
if qdq_args.save_data:
    pipeline.save_data_dir = script_dir / qdq_args.data_dir / common_args.prompt
    os.makedirs(pipeline.save_data_dir, exist_ok=True)
else:
```
Copilot AI (Feb 10, 2026)
pipeline.save_data_dir = script_dir / qdq_args.data_dir / common_args.prompt uses the raw prompt as a directory name. Prompts can contain path separators or characters invalid on Windows, which can break saving or allow writing outside the intended directory. Sanitize/slugify the prompt (or hash it) before using it in a filesystem path.
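One way to make the prompt filesystem-safe is a slug plus a short hash, so prompts that sanitize to the same slug still get distinct directories. A sketch (the helper name is hypothetical):

```python
import hashlib
import re

def prompt_to_dirname(prompt: str, max_len: int = 40) -> str:
    """Map an arbitrary prompt to a safe directory name."""
    # Collapse everything outside [A-Za-z0-9] (path separators, colons,
    # dots) into hyphens, then cap the length for Windows path limits.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", prompt).strip("-")[:max_len]
    # A short digest of the raw prompt keeps sanitized-but-distinct
    # prompts from colliding on disk.
    digest = hashlib.sha1(prompt.encode("utf-8")).hexdigest()[:8]
    return f"{slug}-{digest}" if slug else digest
```

The save path would then be `script_dir / qdq_args.data_dir / prompt_to_dirname(common_args.prompt)`, which cannot escape the data directory.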
```markdown
### Test and evaluate

`python .\evaluation.py --script_dir .\ --model_id stable-diffusion-v1-5/stable-diffusion-v1-5 --num_inference_steps 25 --seed 0 --num_data 100 --guidance_scale 7.5 --provider QNNExecutionProvider --model_dir optimized-qnn_qdq`
```
Copilot AI (Feb 10, 2026)
The README’s evaluation example uses --model_dir optimized-qnn_qdq, but stable_diffusion.py writes optimized models under models/optimized/<model_id> (no provider/format suffix). As written, the evaluation command will look in a directory that is never created; update the README command (or align the output directory naming in code).
Suggested change:

```diff
-`python .\evaluation.py --script_dir .\ --model_id stable-diffusion-v1-5/stable-diffusion-v1-5 --num_inference_steps 25 --seed 0 --num_data 100 --guidance_scale 7.5 --provider QNNExecutionProvider --model_dir optimized-qnn_qdq`
+`python .\evaluation.py --script_dir .\ --model_id stable-diffusion-v1-5/stable-diffusion-v1-5 --num_inference_steps 25 --seed 0 --num_data 100 --guidance_scale 7.5 --provider QNNExecutionProvider --model_dir models/optimized/stable-diffusion-v1-5/stable-diffusion-v1-5`
```
```python
if not common_args.optimize:
    model_dir = unoptimized_model_dir if common_args.test_unoptimized else optimized_model_dir
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        if provider == "openvino":
            from sd_utils.ov import get_ov_pipeline

            pipeline = get_ov_pipeline(common_args, ov_args, optimized_model_dir)
        elif common_args.format == "qdq":
            from sd_utils.qdq import get_qdq_pipeline

            pipeline = get_qdq_pipeline(model_dir, common_args, qdq_args, script_dir)
        else:
```
Copilot AI (Feb 10, 2026)
The QDQ export path uses fixed batch=1 shapes (see dynamic_shape_to_fixed in the configs), but the CLI still allows --batch_size > 1 and passes it through to get_qdq_pipeline. This will likely fail at runtime with shape mismatches. Consider enforcing batch_size==1 when --format qdq (or documenting/handling larger batches).
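Failing fast at argument-parsing time would surface the mismatch before any expensive export runs. A sketch of such a guard (the helper name is hypothetical; the flag names mirror the CLI discussed above):

```python
import argparse

def validate_qdq_batch(args: argparse.Namespace) -> None:
    """Reject batch sizes the fixed-shape QDQ export cannot serve."""
    # The QDQ configs pin the graph to batch=1 via dynamic_shape_to_fixed,
    # so any larger batch would fail later with a shape mismatch.
    if args.format == "qdq" and args.batch_size != 1:
        raise SystemExit("--format qdq exports fixed batch=1 graphs; rerun with --batch_size 1")
```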
```python
elif provider == "qnn" and submodel_name not in ("vae_encoder"):
    config["systems"]["local_system"]["accelerators"][0]["device"] = "npu"
    config["systems"]["local_system"]["accelerators"][0]["execution_providers"] = ["QNNExecutionProvider"]
    config["passes"]["convert"]["target_opset"] = 20

    # Quantization params
    if submodel_name not in ("text_encoder"):
```
Copilot AI (Feb 10, 2026)
The condition submodel_name not in ("vae_encoder") is using a string instead of a 1-element tuple, so it performs substring membership rather than comparing names. This is brittle and can lead to incorrect branching if submodel_name ever changes; use submodel_name != "vae_encoder" (or not in ("vae_encoder",)) instead. Same issue for the ("text_encoder") check below.
Suggested change:

```diff
-elif provider == "qnn" and submodel_name not in ("vae_encoder"):
+elif provider == "qnn" and submodel_name != "vae_encoder":
     config["systems"]["local_system"]["accelerators"][0]["device"] = "npu"
     config["systems"]["local_system"]["accelerators"][0]["execution_providers"] = ["QNNExecutionProvider"]
     config["passes"]["convert"]["target_opset"] = 20
     # Quantization params
-    if submodel_name not in ("text_encoder"):
+    if submodel_name != "text_encoder":
```
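The pitfall is easy to confirm in isolation: parentheses without a trailing comma do not create a tuple, so `not in` degrades to substring search on a plain string:

```python
# ("vae_encoder") is just the string "vae_encoder"; the parentheses are
# plain grouping, so `in` / `not in` perform substring search.
assert ("vae_encoder") == "vae_encoder"
assert isinstance(("vae_encoder",), tuple)   # trailing comma makes a real tuple
assert "vae" in ("vae_encoder")              # substring hit: wrong branch taken
assert "vae" not in ("vae_encoder",)         # tuple membership behaves as intended
```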
```python
else:
    if "src_height" in meta:
        orig_height, orig_width = meta["src_height"], meta["src_width"]
        image = [cv2.resize(img, (orig_width, orig_width)) for img in image]
```
Copilot AI (Feb 10, 2026)
cv2.resize(img, (orig_width, orig_width)) uses the width value for both dimensions, which will distort images when orig_height != orig_width. This should resize to (orig_width, orig_height) (and note OpenCV expects size as (width, height)).
Suggested change:

```diff
-image = [cv2.resize(img, (orig_width, orig_width)) for img in image]
+image = [cv2.resize(img, (orig_width, orig_height)) for img in image]
```
```python
        model (nn.Module): The model in which to replace Attention modules.

    """
    traverse_and_replace(model, attention_processor.Attention, lambda orig_attn: SHAAttention(orig_attn))
```
Copilot AI (Feb 10, 2026)
This 'lambda' is just a simple wrapper around a callable object. Use that object directly.
Suggested change:

```diff
-traverse_and_replace(model, attention_processor.Attention, lambda orig_attn: SHAAttention(orig_attn))
+traverse_and_replace(model, attention_processor.Attention, SHAAttention)
```
```python
try:
    shutil.copyfile(src_path, dst_path)
except shutil.SameFileError:
```
Copilot AI (Feb 10, 2026)
'except' clause does nothing but pass and there is no explanatory comment.
```python
dst_path = Path(save_directory).joinpath(ONNX_EXTERNAL_WEIGHTS_NAME)
try:
    shutil.copyfile(src_path, dst_path)
except shutil.SameFileError:
```
Copilot AI (Feb 10, 2026)
'except' clause does nothing but pass and there is no explanatory comment.
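If the swallow is intentional, a comment explaining why makes that explicit and satisfies the linter's concern. A sketch of the pattern (the helper name is hypothetical):

```python
import shutil

def copy_external_weights(src_path, dst_path):
    """Copy the external-weights file next to the saved ONNX model."""
    try:
        shutil.copyfile(src_path, dst_path)
    except shutil.SameFileError:
        # Source and destination resolve to the same file, e.g. when the
        # model is re-saved into the directory it was loaded from; the
        # weights are already in place, so skipping the copy is correct.
        pass
```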
```python
copy_olive_config(history_folder, config_name, cache_dir, output_dir, activation_type, precision)

# run stable_diffusion.py to generate onnx unoptimized model
subprocess.run([sys.executable, "stable_diffusion.py",
```
We should follow the whisper recipe here: share the original model and skip regeneration if it already exists.
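The skip-if-exists pattern the comment suggests could look like this sketch (the helper, flag, and directory names are assumptions, not the actual whisper recipe code):

```python
import subprocess
import sys
from pathlib import Path

def ensure_unoptimized_model(model_dir: Path, script_dir: Path) -> None:
    """Regenerate the shared unoptimized ONNX model only when missing."""
    if model_dir.exists():
        return  # shared artifact already present; skip the expensive export
    subprocess.run(
        [sys.executable, "stable_diffusion.py", "--optimize"],
        cwd=script_dir,
        check=True,
    )
```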
```python
# # run evaluation.py to generate data
subprocess.run([sys.executable, "evaluation.py",
    "--script_dir", history_folder,
```
Same for the generated data: share it so there is no need to rerun if it already exists.
```diff
@@ -0,0 +1,21 @@
+import json
```
remove
Do we need this file? If not, please remove it, since it is large.
No description provided.