
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios


UltraFlux is a diffusion transformer that extends Flux backbones to native 4K synthesis with consistent quality across a wide range of aspect ratios. The project unifies data, architecture, objectives, and optimization so that positional encoding, VAE compression, and loss design reinforce each other rather than compete.

Sample gallery: sixteen UltraFlux generations, each rendered at 4096×4096 resolution.

👥 Authors

Tian Ye1*‡, Song Fei1*, Lei Zhu1,2

1The Hong Kong University of Science and Technology (Guangzhou)
2The Hong Kong University of Science and Technology

*Equal Contribution, ‡Project Leader, †Corresponding Author


📰 News ✨✨

[2025.11.21] – We released the UltraFlux-v1.1 transformer checkpoint. It is fine-tuned on a carefully curated set of high-aesthetic synthetic images to further improve visual aesthetics and composition quality. Enable it by uncommenting the corresponding lines in inf_ultraflux.py!

[2025.11.20] – We released the UltraFlux-v1 checkpoint, inference code, and the accompanying tech report.


Inference Quickstart

  • The script inf_ultraflux.py downloads the latest Owen777/UltraFlux-v1 weights (transformer + VAE) and runs a set of curated prompts.
  • Ensure PyTorch, diffusers, and CUDA are available, then run:

```bash
python inf_ultraflux.py
```

  • Generated images are saved to results/ultra_flux_*.jpeg at 4096×4096 resolution; edit the prompt list or pipeline arguments inside the script to customize inference.
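If you prefer to script inference yourself, the sketch below shows roughly what inf_ultraflux.py does, assuming the checkpoint repo follows the standard diffusers Flux layout (transformer/ and vae/ subfolders) and that the remaining components come from a base Flux model. The base-model ID, step count, and guidance scale here are illustrative, not the script's actual defaults.

```python
# Minimal sketch of UltraFlux inference with diffusers, under the assumptions
# stated above; check inf_ultraflux.py for the exact pipeline arguments.
import torch
from diffusers import AutoencoderKL, FluxPipeline, FluxTransformer2DModel

repo = "Owen777/UltraFlux-v1"

# UltraFlux's fine-tuned transformer and post-trained 4K VAE.
transformer = FluxTransformer2DModel.from_pretrained(
    repo, subfolder="transformer", torch_dtype=torch.bfloat16)
vae = AutoencoderKL.from_pretrained(
    repo, subfolder="vae", torch_dtype=torch.bfloat16)

# Reuse the text encoders and scheduler from a base Flux stack.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed base model; check the script
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="A snow-covered alpine village at dusk, ultra-detailed",
    height=4096, width=4096,
    num_inference_steps=28, guidance_scale=3.5,  # illustrative values
).images[0]
image.save("results/ultra_flux_demo.jpeg")
```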

Why UltraFlux?

  • 4K positional robustness. Resonance 2D RoPE with YaRN keeps training-window awareness while remaining band-aware and aspect-ratio aware to avoid ghosting (a band-aware scaling sketch follows this list).
  • Detail-preserving compression. A lightweight, non-adversarial post-training routine sharpens Flux VAE reconstructions at 4K without sacrificing throughput, resolving the usual trade-off between speed and micro-detail.
  • 4K-aware objectives. The SNR-Aware Huber Wavelet Training Objective emphasizes high-frequency fidelity in the latent space so gradients stay balanced across timesteps and frequency bands.
  • Aesthetic-aware scheduling. Stage-wise Aesthetic Curriculum Learning (SACL) routes high-aesthetic supervision toward high-noise steps, sculpting the model prior where it matters most for vivid detail and alignment.
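To make the band-aware idea concrete, here is a minimal sketch of YaRN-style per-band frequency scaling along one axis. This is the generic YaRN construction, not UltraFlux's actual Resonance 2D RoPE; the function name, the alpha/beta ramp bounds, and the window sizes are illustrative assumptions.

```python
# Sketch of YaRN-style band-aware RoPE frequency scaling (one axis).
import torch

def yarn_scaled_freqs(dim: int, base: float = 10000.0,
                      train_len: int = 1024, target_len: int = 4096,
                      alpha: float = 1.0, beta: float = 32.0) -> torch.Tensor:
    """Per-dimension rotary frequencies for one spatial axis (H or W)."""
    scale = target_len / train_len
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)

    # Number of full rotations each band completes inside the training window.
    rotations = train_len * freqs / (2 * torch.pi)

    # Ramp from 0 (low-frequency bands, few rotations: interpolate) to
    # 1 (high-frequency bands, many rotations: keep as trained).
    gamma = ((rotations - alpha) / (beta - alpha)).clamp(0.0, 1.0)

    # Blend: interpolated frequencies for low bands, original for high bands.
    return (freqs / scale) * (1.0 - gamma) + freqs * gamma
```

In a 2D RoPE this scaling would be applied independently per axis, each with its own train/target window, which is one way such a scheme can stay aspect-ratio aware.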

MultiAspect-4K-1M Dataset

  • Scale and coverage. 1M native-4K and near-4K images with controlled aspect-ratio sampling, so that wide and portrait regimes are equally represented.
  • Content balance. A dual-channel collection pipeline debiases landscape-heavy sources toward human-centric content.
  • Rich metadata. Every sample includes bilingual captions, subject tags, CLIP/VLM-based quality and aesthetic scores, and classical IQA metrics, enabling targeted subset sampling for specific training stages.
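As a concrete, purely hypothetical example of stage-targeted subset sampling, the snippet below filters the metadata with pandas; the file name, column names, and thresholds are assumptions until the metadata loaders are released.

```python
import pandas as pd

# Assumed file and column names; the real schema ships with the metadata loaders.
meta = pd.read_parquet("multiaspect4k_metadata.parquet")

# e.g. a high-aesthetic, human-centric, near-4K subset for a late training stage
subset = meta[
    (meta["aesthetic_score"] >= 6.5)                    # hypothetical score column
    & meta["subject_tags"].str.contains("person")       # hypothetical tag column
    & (meta["width"] * meta["height"] >= 3840 * 2160)   # near-4K pixel count
]
print(f"selected {len(subset)} / {len(meta)} samples")
```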

Model & Training Recipe

  1. Backbone. Flux-style DiT trained directly on MultiAspect-4K-1M with token-efficient blocks and Resonance 2D RoPE + YaRN for aspect-ratio-aware positional encoding.
  2. Objective. The SNR-Aware Huber Wavelet loss aligns gradient magnitudes with 4K statistics, reinforcing high-frequency fidelity under strong VAE compression (see the sketch after this list).
  3. Curriculum. SACL injects high-aesthetic data primarily into high-noise timesteps so the model’s prior captures human-desired structure early in the trajectory.
  4. VAE Post-training. A simple, non-adversarial fine-tuning pass boosts 4K reconstruction quality while keeping inference cost low.
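A minimal PyTorch reading of such an objective is sketched below: a Huber loss per Haar subband, with the high-frequency bands up-weighted and a min-SNR-style timestep weight. The subband weights, Huber delta, and the exact SNR weighting are assumptions, not the released training code.

```python
import torch
import torch.nn.functional as F

def haar_subbands(x: torch.Tensor):
    """Split a latent (B, C, H, W) into orthonormal LL/LH/HL/HH Haar subbands."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)

def snr_huber_wavelet_loss(pred, target, snr, hf_weight=2.0, delta=1.0):
    """Huber loss per subband, high-frequency bands up-weighted, scaled by a
    min-SNR-style factor (assumed) so high-noise timesteps do not dominate."""
    weights = (1.0, hf_weight, hf_weight, hf_weight)  # LL, LH, HL, HH
    per_sample = torch.zeros_like(snr)                # snr: per-sample, shape (B,)
    for w, p, t in zip(weights, haar_subbands(pred), haar_subbands(target)):
        l = F.huber_loss(p, t, delta=delta, reduction="none")
        per_sample = per_sample + w * l.mean(dim=(1, 2, 3))
    snr_weight = torch.clamp(snr, max=5.0) / snr      # min-SNR-gamma, gamma = 5
    return (snr_weight * per_sample).mean()
```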

Results

UltraFlux surpasses recent native-4K and training-free scaling baselines on standard 4K benchmarks spanning:

  • Image fidelity at 4096×4096 and higher
  • Aesthetic preference scores
  • Text-image alignment metrics across diverse aspect ratios

Resources

We will release the full stack upon publication:

  • MultiAspect-4K-1M dataset with metadata loaders
  • Training pipelines
  • Evaluation code covering fidelity, aesthetic, and alignment metrics

🚀 Updates

To foster research and support the open-source community, we plan to open-source the entire project, including training, inference, and weights. Thank you for your patience and support! 🌟

  • Release GitHub repo.
  • Release inference code (inf_ultraflux.py).
  • Release training code.
  • Release model checkpoints.
  • Release arXiv paper.
  • Release HuggingFace Space demo.
  • Release dataset (MultiAspect-4K-1M).

Stay tuned for links and usage instructions. For updates, please watch this repository or open an issue.
