Predicts aesthetic scores for images. Trained on AI Horde community ratings of Stable Diffusion-generated images.
There is a ratings model and an artifacts model for each supported OpenCLIP backbone:

| Ratings model | Artifacts model | OpenCLIP backbone |
| --- | --- | --- |
| ratings | artifacts | openclip_vit_bigg_14 |
| ratings | artifacts | openclip_vit_h_14 |
| ratings | artifacts | openclip_vit_l_14 |
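Scoring an image with one of these checkpoints amounts to embedding it with the matching OpenCLIP backbone and running a small regression head on the embedding. A minimal sketch, assuming the checkpoints are PyTorch state dicts for an MLP head; the head layout and file name below are hypothetical, so check the actual files in `aesthetics_scorer/models`:

```python
import torch
import open_clip
from PIL import Image

# Backbone must match the scorer variant; ViT-H-14 shown here.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
model.eval()

image = preprocess(Image.open("example.png")).unsqueeze(0)
with torch.no_grad():
    embedding = model.encode_image(image)  # shape (1, 1024) for ViT-H-14

# Hypothetical head layout; the real checkpoints define their own architecture.
head = torch.nn.Sequential(
    torch.nn.Linear(embedding.shape[-1], 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1),
)
# Illustrative file name; use the actual checkpoint from aesthetics_scorer/models.
head.load_state_dict(torch.load("aesthetics_scorer/models/rating_vit_h_14.pth"))
head.eval()

with torch.no_grad():
    print("predicted rating:", head(embedding.float()).item())
```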
Accuracy scores on the test set from https://github.com/THUDM/ImageReward#reproduce-experiments-in-table-2.
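Roughly speaking, that benchmark ranks several generations per prompt by human preference and scores a model by how often its predicted ordering of an image pair agrees with the human ranking. A rough sketch of such a pairwise accuracy, with a hypothetical `samples` structure (the exact protocol is defined by the ImageReward repo):

```python
from itertools import combinations

def pairwise_accuracy(samples):
    """samples: one list per prompt of (human_rank, model_score) tuples,
    where a lower human_rank means the image was preferred."""
    correct = total = 0
    for images in samples:
        for (rank_a, score_a), (rank_b, score_b) in combinations(images, 2):
            if rank_a == rank_b:
                continue  # ties in the human ranking carry no ordering signal
            total += 1
            # Count agreement: the preferred (lower-rank) image got the higher score.
            correct += (rank_a < rank_b) == (score_a > score_b)
    return correct / total if total else 0.0
```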
Model files are in the `aesthetics_scorer/models` folder.
Run the simple Gradio demo with `python aesthetics_scorer/demo.py`.
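The demo is essentially a Gradio `Interface` wrapped around the scorer. A minimal sketch of that shape, with a hypothetical `predict_score` helper standing in for the real model call in `aesthetics_scorer/demo.py`:

```python
import gradio as gr

def predict_score(image):
    # Hypothetical helper: embed the image with OpenCLIP and run the scorer
    # head, as in the usage sketch above.
    return 5.0  # placeholder score

demo = gr.Interface(
    fn=predict_score,
    inputs=gr.Image(type="pil"),
    outputs=gr.Number(label="Predicted rating"),
    title="Aesthetics scorer",
)
demo.launch()
```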
1. `dataset-process/dataset_downloader.py` downloads the zipped DiffusionDB dataset images; change the path to wherever you want them stored (~200 GB).
2. `dataset-process/dataset_parquet_files.py` downloads the dataset parquet files and sets up the train and validation splits.
3. `dataset-process/dataset_image_extract.py` extracts the rated images from the zipped dataset files.
4. `dataset-process/clip_encode_dataset.py` precomputes CLIP embeddings for all rated images (change the config if you don't need all the CLIP versions); a rough sketch of this step follows below.
If the download from step 1 is already in place, step 2 can be rerun to update the dataset parquet files, and steps 3 and 4 will only do work on new images that haven't already been processed.
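For reference, step 4 boils down to encoding every rated image with each OpenCLIP backbone and saving the results. A simplified sketch with illustrative paths and output format; the real script is driven by its config:

```python
import glob
import torch
import open_clip
from PIL import Image

# One backbone shown; the real script can loop over several CLIP versions.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="laion2b_s32b_b82k"
)
model.eval()

names, embeddings = [], []
for path in glob.glob("rated_images/*.png"):  # illustrative location
    image = preprocess(Image.open(path)).unsqueeze(0)
    with torch.no_grad():
        embeddings.append(model.encode_image(image).squeeze(0))
    names.append(path)

# One tensor of embeddings plus the matching file names, in a single file.
torch.save({"names": names, "embeddings": torch.stack(embeddings)},
           "openclip_vit_l_14_embeddings.pt")
```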
In `aesthetics_scorer/train.py`, change whichever configs you want; most importantly, set `EMBEDDING_FILE` to the embedding file you preprocessed. There are a number of other hyperparameters that can be tuned. Then start training with `python aesthetics_scorer/train.py`.
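Under the hood this is regression from precomputed embeddings to ratings. A stripped-down sketch of such a loop, assuming the embedding file also carries the ratings; the real `train.py` defines the actual file format and many more hyperparameters:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

EMBEDDING_FILE = "openclip_vit_l_14_embeddings.pt"  # illustrative name

data = torch.load(EMBEDDING_FILE)
# Assumes ratings were stored next to the embeddings; adapt to the real format.
dataset = TensorDataset(data["embeddings"].float(), data["ratings"].float())
loader = DataLoader(dataset, batch_size=256, shuffle=True)

# Small MLP head mapping an embedding to a single predicted rating.
head = torch.nn.Sequential(
    torch.nn.Linear(data["embeddings"].shape[-1], 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1),
)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for epoch in range(10):
    for embeddings, ratings in loader:
        optimizer.zero_grad()
        loss = loss_fn(head(embeddings).squeeze(-1), ratings)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")

torch.save(head.state_dict(), "rating_head.pth")
```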
- Inspired by https://github.com/christophschuhmann/improved-aesthetic-predictor
- Image dataset https://poloclub.github.io/diffusiondb/
- Image ratings by https://aihorde.net/
- Benchmark from https://github.com/THUDM/ImageReward/
@article{wangDiffusionDBLargescalePrompt2022,
title = {DiffusionDB: A Large-Scale Prompt Gallery Dataset for Text-to-Image Generative Models},
author = {Wang, Zijie J. and Montoya, Evan and Munechika, David and Yang, Haoyang and Hoover, Benjamin and Chau, Duen Horng},
year = {2022},
journal = {arXiv:2210.14896 [cs]},
url = {https://arxiv.org/abs/2210.14896}
}
@software{ilharco_gabriel_2021_5143773,
author = {Ilharco, Gabriel and
Wortsman, Mitchell and
Wightman, Ross and
Gordon, Cade and
Carlini, Nicholas and
Taori, Rohan and
Dave, Achal and
Shankar, Vaishaal and
Namkoong, Hongseok and
Miller, John and
Hajishirzi, Hannaneh and
Farhadi, Ali and
Schmidt, Ludwig},
title = {OpenCLIP},
month = jul,
year = 2021,
note = {If you use this software, please cite it as below.},
publisher = {Zenodo},
version = {0.1},
doi = {10.5281/zenodo.5143773},
url = {https://doi.org/10.5281/zenodo.5143773}
}
@misc{xu2023imagereward,
title={ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation},
author={Jiazheng Xu and Xiao Liu and Yuchen Wu and Yuxuan Tong and Qinkai Li and Ming Ding and Jie Tang and Yuxiao Dong},
year={2023},
eprint={2304.05977},
archivePrefix={arXiv},
primaryClass={cs.CV}
}