This repository provides scripts for training LoRA (Low-Rank Adaptation) models with HunyuanVideo, Wan2.1/2.2, FramePack, FLUX.1 Kontext, FLUX.2 dev/klein, Qwen-Image, and Z-Image architectures.
This repository is unofficial and not affiliated with the official repositories of these architectures.
This repository is under development.
We are grateful to the following companies for their generous sponsorship:
If you find this project helpful, please consider supporting its development via GitHub Sponsors. Your support is greatly appreciated!
GitHub Discussions Enabled: We've enabled GitHub Discussions for community Q&A, knowledge sharing, and technical information exchange. Please use Issues for bug reports and feature requests, and Discussions for questions and sharing experiences.
- January 24, 2026
  - Fixed an issue where LoRA training for FLUX.2 [klein] did not work. Also made various bug fixes and feature additions related to FLUX.2. See PR #858.
    - The `--model_version` specification has changed from `flux.2-dev` or `flux.2-klein-4b` to `dev` or `klein-4b`, etc.
    - fp8 optimization and other features also work. Please refer to the documentation for details.
    - Since klein 9B, dev models, and training with multiple control images have not been sufficiently tested, please report any issues via Issue.
- January 21, 2026
  - Added support for LoRA training of FLUX.2 [dev]/[klein]. See PR #841. Many thanks to christopher5106 from https://www.scenario.com for this contribution.
    - Please refer to the documentation for details.
- January 17, 2026
  - Changed to use `convert_lora.py` for converting Z-Image LoRA for ComfyUI to improve compatibility. See PR #851.
    - The previous `convert_z_image_lora_to_comfy.py` can still be used, but the converted weights may not work correctly with nunchaku.
    - Please refer to the documentation for details.
    - Many thanks to fai-9 for providing the solution in Issue #847.
  - Added `--remove_first_image_from_target` option for LoRA training of Qwen-Image-Layered. See PR #852.
    - Please refer to the documentation for details.
- January 11, 2026
  - Added support for LoRA training of Qwen-Image-Layered. See PR #816.
    - Please refer to the documentation for details.
    - In the caching, training, and inference scripts, specify the `--model_version` option as `layered`.
- December 27, 2025
  - Added support for Qwen-Image-Edit-2511. See PR #808.
    - Please refer to the documentation for details such as checkpoints and options.
    - In the caching, training, and inference scripts, specify the `--model_version` option as `edit-2511`.
- December 25, 2025
  - Added support for LoRA training of Kandinsky 5. See PR #774. Many thanks to AkaneTendo25 for this contribution.
    - Please refer to the documentation for details.
    - Note that some weights are currently specified in Hugging Face ID format. We plan to switch to direct `.safetensors` specification, as with other models, soon, so please be aware.
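The January 24 change to `--model_version` means that older FLUX.2 launch commands need updating. The following is a minimal sketch, assuming the rename amounts to dropping the `flux.2-` prefix; only `flux.2-dev` to `dev` and `flux.2-klein-4b` to `klein-4b` are confirmed by the changelog. The helper names are hypothetical and not part of this repository.

```python
# Hypothetical migration helper; not part of the repository.
OLD_PREFIX = "flux.2-"


def migrate_model_version(value: str) -> str:
    """Map an old-style --model_version value to the new short form.

    Confirmed by the changelog: flux.2-dev -> dev, flux.2-klein-4b -> klein-4b.
    Stripping the prefix from any other value is an assumption.
    """
    if value.startswith(OLD_PREFIX):
        return value[len(OLD_PREFIX):]
    # Already in the new form, or a value for another architecture: keep it.
    return value


def migrate_args(argv: list[str]) -> list[str]:
    """Rewrite the value given to --model_version in an argument list."""
    out = list(argv)
    for i, arg in enumerate(out):
        if arg == "--model_version" and i + 1 < len(out):
            # Form: --model_version VALUE
            out[i + 1] = migrate_model_version(out[i + 1])
        elif arg.startswith("--model_version="):
            # Form: --model_version=VALUE
            out[i] = "--model_version=" + migrate_model_version(arg.split("=", 1)[1])
    return out


if __name__ == "__main__":
    print(migrate_args(["--model_version", "flux.2-klein-4b", "--fp8_llm"]))
```

Values used by other architectures, such as `layered` or `edit-2511`, pass through unchanged.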
We are grateful to everyone who has been contributing to the Musubi Tuner ecosystem through documentation and third-party tools. To support these valuable contributions, we recommend working with our releases as stable reference points, as this project is under active development and breaking changes may occur.
You can find the latest release and version history on our releases page.
This repository provides recommended instructions to help AI agents like Claude and Gemini understand our project context and coding standards.
To use them, you need to opt in by creating your own configuration file in the project root.
Quick Setup:
- Create a `CLAUDE.md`, `GEMINI.md`, and/or `AGENTS.md` file in the project root.
- Add the following line to your `CLAUDE.md` to import the repository's recommended prompt (the two prompts are currently almost identical):
      @./.ai/claude.prompt.md
  or for Gemini:
      @./.ai/gemini.prompt.md
  Depending on the agent you are using, you can also import the prompt from another custom prompt file such as `AGENTS.md`.
- You can now add your own personal instructions below the import line (e.g., `Always include a short summary of the change before diving into details.`).
This approach ensures that you have full control over the instructions given to your agent while benefiting from the shared project context. Your CLAUDE.md, GEMINI.md and AGENTS.md (as well as Claude's .mcp.json) are already listed in .gitignore, so they won't be committed to the repository.
- VRAM: 12GB or more recommended for image training, 24GB or more for video training
  - Actual requirements depend on resolution and training settings. For 12GB, use a resolution of 960x544 or lower and use memory-saving options such as `--blocks_to_swap`, `--fp8_llm`, etc.
- Main Memory: 64GB or more recommended, 32GB + swap may work
- Memory-efficient implementation
- Windows compatibility confirmed (Linux compatibility confirmed by community)
- Multi-GPU training (using Accelerate), documentation will be added later
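The hardware recommendations above can be turned into a quick self-check. This is only a sketch: the thresholds come from the list above, while the function name, the returned keys, and the choice to show the 12GB tips for any card below 24GB are assumptions made for illustration.

```python
def hardware_advice(vram_gb: float, ram_gb: float) -> dict:
    """Compare a machine against the recommendations listed above.

    These are recommendations, not hard limits; actual requirements depend
    on resolution and training settings.
    """
    notes = []
    if vram_gb < 24:
        # Assumption: the 12GB tips are shown for any card below the
        # 24GB video recommendation.
        notes.append(
            "Use a resolution of 960x544 or lower and memory-saving options "
            "such as --blocks_to_swap or --fp8_llm."
        )
    if ram_gb < 32:
        notes.append("Main memory is below 32GB; 64GB or more is recommended.")
    elif ram_gb < 64:
        notes.append("Main memory is below the recommended 64GB; 32GB + swap may work.")
    return {
        "meets_image_recommendation": vram_gb >= 12,
        "meets_video_recommendation": vram_gb >= 24,
        "notes": notes,
    }


if __name__ == "__main__":
    advice = hardware_advice(vram_gb=12, ram_gb=32)
    print(advice["meets_image_recommendation"], advice["meets_video_recommendation"])
    for note in advice["notes"]:
        print("-", note)
```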
For detailed information on specific architectures, configurations, and advanced features, please refer to the documentation below.
Architecture-specific:
- HunyuanVideo
- Wan2.1/2.2
- Wan2.1/2.2 (Single Frame)
- FramePack
- FramePack (Single Frame)
- FLUX.1 Kontext
- Qwen-Image
- Z-Image
- HunyuanVideo 1.5
- Kandinsky 5
- FLUX.2
Common Configuration & Usage:
- Dataset Configuration
- Advanced Configuration
- Sampling during Training
- Tools and Utilities
- Using torch.compile
Python 3.10 or later is required (verified with 3.10).
Create a virtual environment and install PyTorch and torchvision matching your CUDA version.
PyTorch 2.5.1 or later is required (see note).
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
Install the required dependencies using the following command.
    pip install -e .
Optionally, you can use FlashAttention and SageAttention (for inference only; see SageAttention Installation for installation instructions).
Optional dependencies for additional features:
- `ascii-magic`: Used for dataset verification
- `matplotlib`: Used for timestep visualization
- `tensorboard`: Used for logging training progress
- `prompt-toolkit`: Used for interactive prompt editing in Wan2.1 and FramePack inference scripts. If installed, it will be automatically used in interactive mode. Especially useful in Linux environments for easier prompt editing.
    pip install ascii-magic matplotlib tensorboard prompt-toolkit
You can also install using uv, but installation with uv is experimental. Feedback is welcome.
- Install uv (if not already present on your OS).
  On Linux/macOS:
      curl -LsSf https://astral.sh/uv/install.sh | sh
  Follow the instructions to add the uv path manually until you restart your session...
  On Windows:
      powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
  Follow the instructions to add the uv path manually until you reboot your system... or just reboot your system at this point.
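After installation, the version requirements stated above (Python 3.10 or later, PyTorch 2.5.1 or later) can be checked without importing PyTorch. This is a minimal sketch using only the standard library; the function names are hypothetical, and the version parser is deliberately simple (it treats a pre-release such as `2.5.1rc1` as `2.5.1`).

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 5, 1)


def parse_version(text: str) -> tuple:
    """Turn a string such as '2.5.1+cu124' into (2, 5, 1).

    Local suffixes after '+' and non-digit tails are ignored.
    """
    core = text.split("+", 1)[0]
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_environment(python_version=None, torch_version=None) -> list:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    py = tuple(python_version or sys.version_info[:2])
    if py < MIN_PYTHON:
        problems.append(f"Python {py[0]}.{py[1]} found; 3.10 or later is required.")
    if torch_version is None:
        try:
            # Reads package metadata only; does not import torch.
            torch_version = metadata.version("torch")
        except metadata.PackageNotFoundError:
            problems.append("PyTorch is not installed in this environment.")
            return problems
    if parse_version(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} found; 2.5.1 or later is required.")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```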
Model download procedures vary by architecture. Please refer to the architecture-specific documents in the Documentation section for instructions.
For dataset configuration, please refer to the Dataset Configuration document listed in the Documentation section.
Pre-caching procedures vary by architecture. Please refer to the architecture-specific documents in the Documentation section for instructions.
Run accelerate config to configure Accelerate. Choose appropriate values for each question based on your environment (either input values directly or use arrow keys and enter to select; uppercase is default, so if the default value is fine, just press enter without inputting anything). For training with a single GPU, answer the questions as follows:
- In which compute environment are you running?: This machine
- Which type of machine are you using?: No distributed training
- Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)?[yes/NO]: NO
- Do you wish to optimize your script with torch dynamo?[yes/NO]: NO
- Do you want to use DeepSpeed? [yes/NO]: NO
- What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]: all
- Would you like to enable numa efficiency? (Currently only supported on NVIDIA hardware). [yes/NO]: NO
- Do you wish to use mixed precision?: bf16
Note: In some cases, you may encounter the error `ValueError: fp16 mixed precision requires a GPU`. If this happens, answer "0" to the sixth question (`What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:`). This means that only the first GPU (id 0) will be used.
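Once the questions have been answered, it can be useful to confirm what was saved. The sketch below reads the saved file and compares it with the single-GPU answers listed above. It rests on assumptions that should be verified locally: the default file location and the key names (`distributed_type`, `mixed_precision`, `use_cpu`) are not stated in this document. The parser handles only flat top-level `key: value` lines and is not a general YAML parser.

```python
from pathlib import Path

# Assumed default location of the file written by `accelerate config`.
DEFAULT_CONFIG = Path.home() / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"

# Assumed key names corresponding to the single-GPU answers listed above.
EXPECTED = {"distributed_type": "NO", "mixed_precision": "bf16", "use_cpu": "false"}


def parse_flat_yaml(text: str) -> dict:
    """Collect top-level `key: value` pairs, skipping nested and list lines."""
    result = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith((" ", "\t", "#", "-")) or ":" not in line:
            continue
        key, value = line.split(":", 1)
        result[key.strip()] = value.strip().strip("'\"")
    return result


def check_single_gpu(cfg: dict) -> list:
    """Return a description of each setting that differs from EXPECTED."""
    mismatches = []
    for key, wanted in EXPECTED.items():
        found = cfg.get(key)
        if found is None or found.lower() != wanted.lower():
            mismatches.append(f"{key}: expected {wanted!r}, found {found!r}")
    return mismatches


if __name__ == "__main__":
    if DEFAULT_CONFIG.exists():
        problems = check_single_gpu(parse_flat_yaml(DEFAULT_CONFIG.read_text()))
        print("Config matches" if not problems else "\n".join(problems))
    else:
        print(f"No config found at {DEFAULT_CONFIG}")
```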
Training and inference procedures vary significantly by architecture. Please refer to the architecture-specific documents in the Documentation section and the various configuration documents for detailed instructions.
sdbds has provided a Windows-compatible SageAttention implementation and pre-built wheels here: https://github.com/sdbds/SageAttention-for-windows. After installing triton, if your Python, PyTorch, and CUDA versions match, you can download and install the pre-built wheel from the Releases page. Thanks to sdbds for this contribution.
For reference, the build and installation instructions are as follows. You may need to update Microsoft Visual C++ Redistributable to the latest version.
- Download and install the triton 3.1.0 wheel matching your Python version from here.
- Install Microsoft Visual Studio 2022 or Build Tools for Visual Studio 2022, configured for C++ builds.
- Clone the SageAttention repository in your preferred directory:
      git clone https://github.com/thu-ml/SageAttention.git
- Open `x64 Native Tools Command Prompt for VS 2022` from the Start menu under Visual Studio 2022.
- Activate your venv, navigate to the SageAttention folder, and run the following command. If you get a DISTUTILS not configured error, run `set DISTUTILS_USE_SDK=1` and try again:
      python setup.py install
This completes the SageAttention installation.
If you specify torch for --attn_mode, use PyTorch 2.5.1 or later (earlier versions may result in black videos).
If you use an earlier version, use xformers or SageAttention.
This repository is unofficial and not affiliated with the official repositories of the supported architectures.
This repository is experimental and under active development. While we welcome community usage and feedback, please note:
- This is not intended for production use
- Features and APIs may change without notice
- Some functionalities are still experimental and may not work as expected
- Video training features are still under development
If you encounter any issues or bugs, please create an Issue in this repository with:
- A detailed description of the problem
- Steps to reproduce
- Your environment details (OS, GPU, VRAM, Python version, etc.)
- Any relevant error messages or logs
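The environment details requested above can be gathered with a short script and pasted into the Issue. This is a sketch rather than an official reporting tool: the function names are hypothetical, the GPU query relies on `nvidia-smi` being on the PATH (NVIDIA GPUs only), and it falls back to `unknown` otherwise.

```python
import platform
import shutil
import subprocess
from importlib import metadata


def collect_environment() -> dict:
    """Collect OS, Python, PyTorch, and GPU/VRAM details for a bug report."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    try:
        # Reads package metadata only; does not import torch.
        info["torch"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        info["torch"] = "not installed"

    info["gpu"] = "unknown"
    if shutil.which("nvidia-smi"):
        try:
            result = subprocess.run(
                ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
                capture_output=True, text=True, timeout=10, check=True,
            )
            lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
            if lines:
                # One entry per GPU: "name, total memory"
                info["gpu"] = "; ".join(lines)
        except (subprocess.SubprocessError, OSError):
            pass  # keep "unknown" if the query fails
    return info


def format_report(info: dict) -> str:
    """Render the details as a bullet list suitable for an Issue."""
    return "\n".join(f"- {key}: {value}" for key, value in info.items())


if __name__ == "__main__":
    print(format_report(collect_environment()))
```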
We welcome contributions! Please see CONTRIBUTING.md for details.
Code under the hunyuan_model directory is modified from HunyuanVideo and follows their license.
Code under the hunyuan_video_1_5 directory is modified from HunyuanVideo 1.5 and follows their license.
Code under the wan directory is modified from Wan2.1 and is licensed under the Apache License 2.0.
Code under the frame_pack directory is modified from FramePack and is licensed under the Apache License 2.0.
Other code is under the Apache License 2.0. Some code is copied and modified from Diffusers.