v0.9.0

Pre-release · released by @ssube on 29 Mar 02:16
Features

  • add prompt tokens for LoRAs and Textual Inversions (#212, #213)
    • blend additional networks without writing huge files to disk
    • works with ONNX acceleration, but not all optimizations
  • improve support for fp16 models (#121, #290)
    • supports ONNX partial fp16 and PyTorch full fp16
    • supports AMD and Nvidia, but not CPU
    • works with LoRAs and Textual Inversions (#274)
  • add more ONNX optimizations (#241)
  • add noise level parameter to upscaling tab (#196)
  • add diagnostic scripts to check your pip environment or a model file (#210)
  • add a way to set the CUDA memory limit for each ONNX runtime session (#211); see the sketch after this list
  • add an error state and retry button to the image loading card when the image fails to load (#225)
  • experimental support for prompt-based CLIP skip (#202)
  • fix an error when using the long prompt weighting pipeline with diffusers >= 0.14.0 (#298)
  • improvements to the device worker pool
    • increment job counter when job starts rather than when it is queued (#283)
    • occasionally sweep for pending jobs and idle devices (#284)
    • clear the cancelled flag before starting a new job (#269)
  • remove the appearance of a prompt length limit (#268)
    • there has not been a real limit since v0.7.1
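
The per-session memory limit is based on ONNX Runtime's CUDAExecutionProvider options. Here is a minimal sketch of what that looks like at the ONNX Runtime level, with an illustrative model path and limit rather than the server's own code:

```python
import onnxruntime as ort

# Cap the CUDA memory arena for this session at 4 GiB (value is illustrative).
cuda_options = {
    "device_id": 0,
    "gpu_mem_limit": 4 * 1024 * 1024 * 1024,
}

session = ort.InferenceSession(
    "unet/model.onnx",  # illustrative model path
    providers=[
        ("CUDAExecutionProvider", cuda_options),
        "CPUExecutionProvider",
    ],
)
```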

LoRAs and Textual Inversions

You can now blend additional networks with the diffusion model at runtime, rather than including them during conversion, using `<type:name:weight>` tokens. I've tried to keep these compatible with the Auto1111 prompt syntax and other Stable Diffusion UIs, but some tokens depend on the model's filename; the details are explained in the user guide.
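
For example, a prompt along these lines would blend a LoRA and a Textual Inversion at runtime (the `lora` and `inversion` token types and the file names here are illustrative, so check the user guide for the exact names your models use):

```
a detailed portrait of a wizard in a library <lora:arcane-style:0.8> <inversion:night-sky:1.0>
```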

You can still permanently blend the additional models by including them in your extras.json file.

FP16 and other ONNX optimizations

Using ONNX for inference requires a little bit more memory than some other runtimes, but offers some optimizations to help counter that. This release adds broad support for FP16 models, using both the ONNX runtime's optimization tools and PyTorch's native support. This should expand support to 8GB cards and may work on 6GB cards, although 4GB is not quite there yet.

The ONNX optimizations are supported on both AMD and Nvidia, while the PyTorch fp16 mode only works with CUDA on Nvidia.
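
As a rough sketch of the ONNX half of this, the usual technique is to convert an exported float32 model with onnxconverter-common; this illustrates the general approach rather than the project's own conversion script, and the paths are placeholders:

```python
import onnx
from onnxconverter_common import float16

# Load the exported float32 UNet (path is a placeholder).
model = onnx.load("unet/model.onnx")

# Convert initializers and intermediate tensors to fp16, keeping float32
# inputs and outputs so callers do not need to change their tensors.
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)

onnx.save(model_fp16, "unet/model_fp16.onnx")
```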

Artifacts

Release checklist: #261
Release milestone: https://github.com/ssube/onnx-web/milestone/8?closed=1
Release pipeline: https://git.apextoaster.com/ssube/onnx-web/-/pipelines/50223