PyTorch implementation of Domain-Scalable Unpaired Image Translation via Latent Space Anchoring
Siyu Huang* (Harvard), Jie An* (Rochester), Donglai Wei (BC), Zudi Lin (Amazon Alexa), Jiebo Luo (Rochester), Hanspeter Pfister (Harvard)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Given an unpaired image-to-image translation (UNIT) model trained on certain domains, it is challenging to incorporate new domains. This work presents a domain-scalable UNIT method, termed latent space anchoring, which anchors images of different domains to the same latent space of a frozen GAN by learning lightweight encoder and regressor models to reconstruct single-domain images. At inference time, the learned encoders and regressors of different domains can be arbitrarily combined to translate images between any two domains without fine-tuning.
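The arbitrary encoder/regressor combination at inference can be sketched with toy stand-ins (all names and numeric maps below are hypothetical illustrations, not this repo's actual API):

```python
# Toy sketch of the inference pipeline. Each domain d contributes an
# encoder E_d (image -> latent of the frozen GAN) and a regressor R_d
# (GAN features -> image); any encoder can be paired with any regressor.

def frozen_generator(w):            # stands in for the frozen StyleGAN2
    return [2 * v for v in w]       # "features" derived from the latent

def make_encoder(offset):           # per-domain encoder: image -> latent
    return lambda x: [v + offset for v in x]

def make_regressor(scale):          # per-domain regressor: features -> image
    return lambda f: [scale * v for v in f]

encoders = {"mask": make_encoder(1), "sketch": make_encoder(2)}
regressors = {"face": make_regressor(1), "sketch": make_regressor(3)}

def translate(x, src, dst):
    """Translate image x from domain `src` to domain `dst`."""
    w = encoders[src](x)              # anchor x into the shared latent space
    features = frozen_generator(w)    # run the frozen GAN backbone
    return regressors[dst](features)  # decode into the target domain

# Mask -> sketch works even though no mask/sketch pair was ever trained:
print(translate([0, 1], src="mask", dst="sketch"))
```

Because every domain is anchored to the same latent space, adding a new domain only requires training its own encoder and regressor; existing models are untouched.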
We recommend installing with Anaconda. All dependencies are provided in env.yaml.
conda env create -f env.yaml
conda activate lsa
Please download the pre-trained models from the following links.
Name | Enc/Dec Domain | Generator Backbone |
---|---|---|
seg2ffhq.pt | facial segmentation mask (CelebAMask-HQ) | StyleGAN2 trained on FFHQ face. |
sketch2ffhq.pt | facial sketch (CUFSF) | StyleGAN2 trained on FFHQ face. |
cat2dog.pt | cat face (AFHQ-cat) | StyleGAN2 trained on AFHQ-dog. |
In addition, we provide the auxiliary pre-trained models used for training our models.
Name | Description |
---|---|
stylegan2-ffhq-config-f.pt | StyleGAN2 generator on FFHQ face. |
psp_ffhq_encode.pt | The encoder for StyleGAN2-FFHQ inversion. |
model_ir_se50.pth | IR-SE50 model used for encoder's weight initialization. |
Unpaired image translation from CelebAMask-HQ mask to FFHQ image. Figures: input, regressor output, generator output.
Please download the CelebAMask-HQ dataset and put it in ./data/. Download the pre-trained model seg2ffhq.pt and put it in ./pretrained_models. The folder structure is:
Latent-Space-Anchoring
├── data
│ ├── CelebAMask-HQ
│ │ ├── face_parsing
│ │ │ ├── Data_preprocessing
│   │   │   │   ├── train_img
│   │   │   │   ├── train_label
│   │   │   │   ├── test_img
│   │   │   │   ├── test_label
├── pretrained_models
│ ├── seg2ffhq.pt
├── commands
│ ├── test_seg2ffhq.sh
Run:
bash commands/test_seg2ffhq.sh
Unpaired image translation from CUFSF facial sketch to FFHQ image. Figures: input, regressor output, generator output.
Please download the CUFSF dataset and put it in ./data/. Manually split the dataset into training and test sets (we use the first 1k images as the training set and the rest as the test set). Download the pre-trained model sketch2ffhq.pt and put it in ./pretrained_models. The folder structure is:
Latent-Space-Anchoring
├── data
│ ├── CUFSF
│ │ ├── train
│ │ ├── test
├── pretrained_models
│ ├── sketch2ffhq.pt
├── commands
│ ├── test_sketch2ffhq.sh
Run:
bash commands/test_sketch2ffhq.sh
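The manual CUFSF split described above (first 1k images for training, the rest for testing) can be scripted. The helper below is a hypothetical sketch, assuming images are split in sorted filename order; adjust the source path and file extensions to your copy of the dataset:

```python
# Hypothetical helper for the manual CUFSF train/test split: copies the
# first n_train images (sorted by filename) to train/, the rest to test/.
import shutil
from pathlib import Path

def split_cufsf(src_dir, train_dir, test_dir, n_train=1000):
    src, train, test = Path(src_dir), Path(train_dir), Path(test_dir)
    train.mkdir(parents=True, exist_ok=True)
    test.mkdir(parents=True, exist_ok=True)
    # Sort so the split is deterministic across runs.
    images = sorted(p for p in src.iterdir()
                    if p.suffix.lower() in {".jpg", ".png"})
    for i, p in enumerate(images):
        shutil.copy2(p, (train if i < n_train else test) / p.name)
    return len(images)

# Example (paths are illustrative):
# split_cufsf("data/CUFSF_raw", "data/CUFSF/train", "data/CUFSF/test")
```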
Unpaired image translation from FFHQ image to CelebAMask-HQ mask. Figures: input, regressor output, generator output.
Please download the FFHQ dataset and put it in ./data/. Download the pre-trained models seg2ffhq.pt and psp_ffhq_encode.pt and put them in ./pretrained_models. The folder structure is:
Latent-Space-Anchoring
├── data
│ ├── ffhq
│ │ ├── images1024x1024
├── pretrained_models
│ ├── seg2ffhq.pt
│ ├── psp_ffhq_encode.pt
├── commands
│ ├── test_ffhq2seg.sh
Run:
bash commands/test_ffhq2seg.sh
Unpaired image translation from FFHQ image to CUFSF facial sketch. Figures: input, regressor output, generator output.
Please download the FFHQ dataset and put it in ./data/. Download the pre-trained models sketch2ffhq.pt and psp_ffhq_encode.pt and put them in ./pretrained_models. The folder structure is:
Latent-Space-Anchoring
├── data
│ ├── ffhq
│ │ ├── images1024x1024
├── pretrained_models
│ ├── sketch2ffhq.pt
│ ├── psp_ffhq_encode.pt
├── commands
│ ├── test_ffhq2sketch.sh
Run:
bash commands/test_ffhq2sketch.sh
Unpaired image translation from CelebAMask-HQ mask to CUFSF facial sketch. Figures: input, regressor output, generator output.
Please download the CelebAMask-HQ dataset and put it in ./data/. Download the pre-trained models seg2ffhq.pt and sketch2ffhq.pt and put them in ./pretrained_models. The folder structure is:
Latent-Space-Anchoring
├── data
│ ├── CelebAMask-HQ
│ │ ├── face_parsing
│ │ │ ├── Data_preprocessing
│   │   │   │   ├── train_img
│   │   │   │   ├── train_label
│   │   │   │   ├── test_img
│   │   │   │   ├── test_label
├── pretrained_models
│ ├── seg2ffhq.pt
│ ├── sketch2ffhq.pt
├── commands
│ ├── test_seg2sketch.sh
Run:
bash commands/test_seg2sketch.sh
Unpaired image translation from AFHQ-cat to AFHQ-dog. Figures: input, regressor output, generator output.
Please download the AFHQ dataset and put it in ./data/. Download the pre-trained model cat2dog.pt and put it in ./pretrained_models. The folder structure is:
Latent-Space-Anchoring
├── data
│ ├── AFHQ
│ │ ├── afhq
│ │ │ ├── train
│   │   │   │   ├── cat
│   │   │   │   ├── dog
│   │   │   │   ├── wild
│   │   │   ├── test
│   │   │   │   ├── cat
│   │   │   │   ├── dog
│   │   │   │   ├── wild
├── pretrained_models
│ ├── cat2dog.pt
├── commands
│ ├── test_cat2dog.sh
Run:
bash commands/test_cat2dog.sh
Training requires a single GPU with at least 16 GB of memory. Training with less GPU memory and a smaller batch size is potentially feasible, although we have not tested it.
Train the encoder and regressor for the CelebAMask-HQ mask domain, using StyleGAN2-FFHQ as the generator backbone.
Please download the CelebAMask-HQ and FFHQ datasets and put them in ./data/. Download the pre-trained models stylegan2-ffhq-config-f.pt and model_ir_se50.pth and put them in ./pretrained_models. The folder structure is:
Latent-Space-Anchoring
├── data
│ ├── CelebAMask-HQ
│ │ ├── face_parsing
│ │ │ ├── Data_preprocessing
│   │   │   │   ├── train_img
│   │   │   │   ├── train_label
│   │   │   │   ├── test_img
│   │   │   │   ├── test_label
│ ├── ffhq
│ │ ├── images1024x1024
├── pretrained_models
│ ├── stylegan2-ffhq-config-f.pt
│ ├── model_ir_se50.pth
├── commands
│ ├── train_seg2ffhq.sh
Run:
bash commands/train_seg2ffhq.sh
The training results and model checkpoints will be saved in ./logs/seg2ffhq.
Train the encoder and regressor for the CUFSF facial sketch domain, using StyleGAN2-FFHQ as the generator backbone.
Please download the CUFSF and FFHQ datasets and put them in ./data/. Download the pre-trained models stylegan2-ffhq-config-f.pt and model_ir_se50.pth and put them in ./pretrained_models. The folder structure is:
Latent-Space-Anchoring
├── data
│ ├── CUFSF
│ │ ├── train
│ │ ├── test
│ ├── ffhq
│ │ ├── images1024x1024
├── pretrained_models
│ ├── stylegan2-ffhq-config-f.pt
│ ├── model_ir_se50.pth
├── commands
│ ├── train_sketch2ffhq.sh
Run:
bash commands/train_sketch2ffhq.sh
The training results and model checkpoints will be saved in ./logs/sketch2ffhq.
We support generating diverse outputs for a single input. Please follow Testing: CelebAMask-to-FFHQ to prepare the dataset and pre-trained models. Run:
bash commands/inference_seg2ffhq.sh
We support sampling high-resolution (1024x1024) masks and images from random noise. Please follow Testing: CelebAMask-to-FFHQ to prepare the dataset and pre-trained models. Run:
bash commands/sampling_seg2ffhq.sh
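Conceptually, sampling decodes a single random latent with the regressors of several domains, so the sampled mask and face are aligned by construction. A toy sketch (hypothetical names and maps, not this repo's actual API):

```python
# Toy sketch of paired sampling: one random latent z is pushed through the
# frozen generator once, then decoded by two per-domain regressors.
import random

def frozen_generator(z):               # stands in for the frozen StyleGAN2
    return [2 * v for v in z]

regressors = {
    "mask": lambda f: [v % 19 for v in f],  # e.g. discrete mask classes
    "face": lambda f: [v + 1 for v in f],   # e.g. RGB face decoding
}

rng = random.Random(0)
z = [rng.randrange(100) for _ in range(4)]  # random noise vector
features = frozen_generator(z)              # shared features for all domains
mask = regressors["mask"](features)         # sampled segmentation mask
face = regressors["face"](features)         # sampled face, aligned with mask
```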
This implementation is built upon StyleGAN2 and pixel2style2pixel.
@article{huang2023domain,
author={Huang, Siyu and An, Jie and Wei, Donglai and Lin, Zudi and Luo, Jiebo and Pfister, Hanspeter},
title={Domain-Scalable Unpaired Image Translation Via Latent Space Anchoring},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2023},
}