```bibtex
@article{wang2024parallelized,
  title={Parallelized Autoregressive Visual Generation},
  author={Wang, Yuqing and Ren, Shuhuai and Lin, Zhijie and Han, Yujin and Guo, Haoyuan and Yang, Zhenheng and Zou, Difan and Feng, Jiashi and Liu, Xihui},
  journal={arXiv preprint arXiv:2412.15119},
  year={2024}
}
```
- Linux with Python ≥ 3.7
- PyTorch ≥ 2.1
- A100 GPUs
We use the same environment as LlamaGen. For more details, please refer to the LlamaGen repository.
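A minimal environment sketch, assuming a fresh conda environment; the exact package versions follow the LlamaGen setup, and the `requirements.txt` step is an assumption about the repo layout, not a confirmed file:

```bash
# Hypothetical setup; follow the LlamaGen instructions for the authoritative versions.
conda create -n par python=3.10 -y
conda activate par
pip install "torch>=2.1" torchvision    # PyTorch >= 2.1, per the requirements above
pip install -r requirements.txt         # assumes the repo ships a requirements.txt
```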
Method | params | tokens | rFID (256x256) | weight |
---|---|---|---|---|
vq_ds16_c2i | 72M | 16x16 | 2.19 | vq_ds16_c2i.pt |
Method | params | tokens | FID (256x256) | weight |
---|---|---|---|---|
PAR-XL-4x | 775M | 24x24 | 2.61 | PAR-XL-4x.pt |
PAR-XXL-4x | 1.4B | 24x24 | 2.35 | PAR-XXL-4x.pt |
PAR-3B-4x | 3.1B | 24x24 | 2.29 | PAR-3B-4x.pt |
PAR-3B-16x | 3.1B | 24x24 | 2.88 | PAR-3B-16x.pt |
Please download the above models and put them in the folder `./pretrained_models`.
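A minimal sketch of the expected checkpoint layout (folder name from the instruction above, file names from the tables):

```bash
mkdir -p pretrained_models
# After downloading, the folder should contain:
#   pretrained_models/vq_ds16_c2i.pt
#   pretrained_models/PAR-XL-4x.pt
#   pretrained_models/PAR-XXL-4x.pt
#   pretrained_models/PAR-3B-4x.pt
#   pretrained_models/PAR-3B-16x.pt
```

The tokenizer checkpoint is then used to pre-extract the training codes with the script below.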
```bash
bash scripts/autoregressive/extract_codes_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --data-path /path/to/imagenet/train --code-path /path/to/imagenet_code_c2i_flip_ten_crop --ten-crop --crop-range 1.1 --image-size 384
```
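The extraction step writes the pre-tokenized ImageNet codes to `--code-path`, which the training scripts below read instead of raw images. A hypothetical sanity check (not part of the repo) before launching training:

```bash
# Confirm the code directory is populated after extraction.
ls /path/to/imagenet_code_c2i_flip_ten_crop | head
```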
Before running, please change `nnodes`, `nproc_per_node`, `node_rank`, `master_addr`, and `master_port` in the `.sh` script. The `spe-token-num` and `ar-token-num` arguments represent the number of learnable tokens (`n-1`) and the number of tokens for parallel generation (`n`), respectively.
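As a concrete illustration (the values below are hypothetical placeholders, not defaults from the repo), a single-node launch with 8 GPUs might set the distributed variables as follows. Note that with `--ar-token-num 4`, four tokens are generated per step, so the 24x24 = 576-token map takes roughly 576 / 4 = 144 decoding steps instead of 576 fully sequential ones.

```bash
# Hypothetical single-node values; edit these inside the .sh script for your cluster.
nnodes=1                # number of machines
nproc_per_node=8        # GPUs per machine
node_rank=0             # rank of this machine (0 for the master node)
master_addr=127.0.0.1   # address of the rank-0 machine
master_port=29500       # any free TCP port
```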
```bash
bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-XL
bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-XXL
bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-3B
```
```bash
bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-3B-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-3B --image-size 384 --image-size-eval 256 --cfg-scale 1.345
bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-XXL-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-XXL --image-size 384 --image-size-eval 256 --cfg-scale 1.435
bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-XL-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-XL --image-size 384 --image-size-eval 256 --cfg-scale 1.5
bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-3B-16x.pt --spe-token-num 15 --ar-token-num 16 --gpt-model GPT-3B --image-size 384 --image-size-eval 256 --cfg-scale 1.5
```
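Each sampling command packs its outputs into an `.npz` archive under `./samples/`; the filename encodes the model, image size, and sampling settings, as used by the evaluation command below. A hypothetical quick check (not part of the repo):

```bash
# List the generated sample archives before running the evaluator.
ls samples/*.npz
```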
Before evaluation, please refer to the evaluation readme to install the required packages.
```bash
python3 evaluations/c2i/evaluator.py VIRTUAL_imagenet256_labeled.npz samples/GPT-XXL-PAR-XXL-4x-size-384-size-256-VQ-16-topk-0-topp-1.0-temperature-1.0-cfg-1.435-seed-0.npz
```
The development of PAR is based on LlamaGen. We deeply appreciate this contribution to the community.