
Commit f30da45

PR NVlabs#80, NVlabs#143 and NVlabs#111, NVlabs#116, NVlabs#125 added for correctly building conda env in Windows; update README; add models in gen_utils.py; general code linting
1 parent 4ce9d6f commit f30da45

File tree: 4 files changed (+140 −81 lines)

README.md (+111 −63)

@@ -8,69 +8,113 @@ of being backwards-compatible. As such, we can use our previously-trained models
 get acquainted with the official repository and its codebase, as we will be building upon it and as such, increase its
 capabilities (but hopefully not its complexity!).

-This repository adds the following (not yet the complete list):
+This repository adds/has the following changes (not yet the complete list):

 * Dataset tool
   * Add `--center-crop-tall`: add vertical black bars to the sides instead, in the same vein as the horizontal bars in
   `--center-crop-wide`.
   * Grayscale images in the dataset are converted to `RGB`.
   * If the dataset tool encounters an error, print it along the offending image, but continue with the rest of the dataset
-  ([pull #39](https://github.com/NVlabs/stylegan3/pull/39) from [Andreas Jansson](https://github.com/andreasjansson)).
-  * *TODO*: Add multi-crop, as used in [Earth View](https://github.com/PDillis/earthview#multi-crop---data_augmentpy).
+  ([PR #39](https://github.com/NVlabs/stylegan3/pull/39) from [Andreas Jansson](https://github.com/andreasjansson)).
+  * ***TODO:*** Add multi-crop, as used in [Earth View](https://github.com/PDillis/earthview#multi-crop---data_augmentpy).
 * Training
-  * `--mirrory`: Added vertical mirroring for doubling the dataset size
-  * `--gamma`: If no R1 regularization is provided, the heuristic formula will be used from [StyleGAN2](https://github.com/NVlabs/stylegan2).
-  * `--aug`: ***TODO*** add [Deceive-D/APA](https://github.com/EndlessSora/DeceiveD) as an option.
-  * `--augpipe`: Now available to use is [StyleGAN2-ADA's](https://github.com/NVlabs/stylegan2-ada-pytorch) full list of augpipe, e,g., `blit`, `geom`, `bgc`, `bgcfnc`, etc.
-  * `--img-snap`: When to save snapshot images, so now it's independent of when the model is saved;
-  * `--snap-res`: The resolution of the snapshots, depending on your screen resolution, or how many images you wish to see per tick. Available resolutions: `1080p`, `4k`, and `8k`.
+  * `--mirrory`: Added vertical mirroring for doubling the dataset size (quadrupling if `--mirror` is also used; make sure your dataset has either or both
+  of these symmetries in order for it to make sense to use them)
+  * `--gamma`: If no R1 regularization is provided, the heuristic formula from [StyleGAN2](https://github.com/NVlabs/stylegan2) will be used.
+  * `--aug`: ***TODO:*** add [Deceive-D/APA](https://github.com/EndlessSora/DeceiveD) as an option.
+  * `--augpipe`: [StyleGAN2-ADA's](https://github.com/NVlabs/stylegan2-ada-pytorch) full list of augmentation pipelines is now available, i.e., individual augmentations (`blit`, `geom`, `color`, `filter`, `noise`, `cutout`) or their combinations (`bg`, `bgc`, `bgcf`, `bgcfn`, `bgcfnc`).
+  * `--img-snap`: Set when to save snapshot images, so it is now independent of when the model is saved (e.g., save image snapshots more often to see how the model is training without saving the model itself, to save space).
+  * `--snap-res`: The resolution of the snapshots, depending on how many images you wish to see per snapshot. Available resolutions: `1080p`, `4k`, and `8k`.
   * `--resume-kimg`: Starting number of `kimg`, useful when continuing training a previous run
-  * `--outdir`: Automatically set as `training-runs`
+  * `--outdir`: Automatically set to `training-runs`, so there is no need to set it beforehand (in general, this is true throughout the repository)
   * `--metrics`: Now set by default to `None`, so there's no need to worry about this one
-  * `--resume`: All available pre-trained models from NVIDIA can be found with a simple dictionary, depending on the `--cfg` used.
-  For example, if `--cfg=stylegan3-r`, then to transfer learn from FFHQU at 1024 resolution, set `--resume=ffhqu1024`.
-  ***TODO***: finish the following table, but full list available [here](https://github.com/PDillis/stylegan3-fun/blob/0bfa8e108487b50d6ecb73718c60497f063d8c17/train.py#L297).
-  <table>
-    <tr>
-      <td>Available Models</td>
-      <td><pre>ffhq256</pre></td>
-      <td><pre>ffhqu256</pre></td>
-      <td><pre>ffhq512</pre></td>
-      <td><pre>ffhq1024</pre></td>
-      <td><pre>ffhqu1024</pre></td>
-    </tr>
-    <tr>
-      <td><pre>stylegan2</pre></td>
-      <td>:heavy_check_mark:</td>
-      <td>:heavy_check_mark:</td>
-      <td>:heavy_check_mark:</td>
-      <td>:heavy_check_mark:</td>
-      <td>:heavy_check_mark:</td>
-    </tr>
-    <tr>
-      <td><pre>stylegan3-t</pre></td>
-      <td></td>
-      <td>:heavy_check_mark:</td>
-      <td></td>
-      <td>:heavy_check_mark:</td>
-      <td>:heavy_check_mark:</td>
-    </tr>
-    <tr>
-      <td><pre>stylegan3-r</pre></td>
-      <td></td>
-      <td>:heavy_check_mark:</td>
-      <td></td>
-      <td>:heavy_check_mark:</td>
-      <td>:heavy_check_mark:</td>
-    </tr>
-  </table>
+  * `--resume`: All available pre-trained models from NVIDIA (and more) can be used with a simple dictionary, depending on the `--cfg` used.
+  For example, if you wish to use StyleGAN3's `config-r`, then set `--cfg=stylegan3-r`. In addition, if you wish to transfer learn from FFHQU at 1024 resolution, set `--resume=ffhqu1024`.
+    * The full list of currently available models to transfer learn from (or synthesize new images with) is the following (***TODO:*** add a small description of each model,
+    so the user can better know which to use for their particular use case; proper citation to original authors as well):
+
+    <details>
+    <summary>StyleGAN2 models</summary>
+
+    1. Majority, if not all, are `config-f`: set `--cfg=stylegan2`
+       * `ffhq256`
+       * `ffhqu256`
+       * `ffhq512`
+       * `ffhq1024`
+       * `ffhqu1024`
+       * `celebahq256`
+       * `lsundog256`
+       * `afhqcat512`
+       * `afhqdog512`
+       * `afhqwild512`
+       * `afhq512`
+       * `brecahad512`
+       * `cifar10` (conditional, 10 classes)
+       * `metfaces1024`
+       * `metfacesu1024`
+       * `lsuncar512` (config-f)
+       * `lsuncat256` (config-f)
+       * `lsunchurch256` (config-f)
+       * `lsunhorse256` (config-f)
+       * `minecraft1024` (thanks to @jeffheaton)
+       * `imagenet512` (thanks to @shawwn)
+       * `wikiart1024-C` (conditional, 167 classes; thanks to @pbaylies)
+       * `wikiart1024-U` (thanks to @pbaylies)
+       * `maps1024` (thanks to @tjukanov)
+       * `fursona512` (thanks to @arfafax)
+       * `mlpony512` (thanks to @arfafax)
+       * `afhqcat256` (Deceive-D/APA models)
+       * `anime256` (Deceive-D/APA models)
+       * `cub256` (Deceive-D/APA models)
+       * `sddogs1024` (Self-Distilled StyleGAN models)
+       * `sdelephant512` (Self-Distilled StyleGAN models)
+       * `sdhorses512` (Self-Distilled StyleGAN models)
+       * `sdbicycles256` (Self-Distilled StyleGAN models)
+       * `sdlions512` (Self-Distilled StyleGAN models)
+       * `sdgiraffes512` (Self-Distilled StyleGAN models)
+       * `sdparrots512` (Self-Distilled StyleGAN models)
+    </details>
+
+    <details>
+    <summary>StyleGAN3 models</summary>
+
+    1. `config-t`: set `--cfg=stylegan3-t`
+       * `afhq512`
+       * `ffhqu256`
+       * `ffhq1024`
+       * `ffhqu1024`
+       * `metfaces1024`
+       * `metfacesu1024`
+       * `landscapes256` (thanks to @justinpinkney)
+       * `wikiart1024` (thanks to @justinpinkney)
+       * `mechfuture256` (thanks to @edstoica; 29 kimg tick)
+       * `vivflowers256` (thanks to @edstoica; 68 kimg tick)
+       * `alienglass256` (thanks to @edstoica; 38 kimg tick)
+       * `scificity256` (thanks to @edstoica; 210 kimg tick)
+       * `scifiship256` (thanks to @edstoica; 168 kimg tick)
+    2. `config-r`: set `--cfg=stylegan3-r`
+       * `afhq512`
+       * `ffhq1024`
+       * `ffhqu1024`
+       * `ffhqu256`
+       * `metfaces1024`
+       * `metfacesu1024`
+    </details>
+
+    * The main sources of these pretrained models are the [official NVIDIA repository](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/research/models/stylegan3),
+    as well as other community repositories, such as [Justin Pinkney](https://github.com/justinpinkney)'s [Awesome Pretrained StyleGAN2](https://github.com/justinpinkney/awesome-pretrained-stylegan2)
+    and [Awesome Pretrained StyleGAN3](https://github.com/justinpinkney/awesome-pretrained-stylegan3), [Deceive-D/APA](https://github.com/EndlessSora/DeceiveD),
+    [Self-Distilled StyleGAN/Internet Photos](https://github.com/self-distilled-stylegan/self-distilled-internet-photos), and [edstoica](https://github.com/edstoica)'s
+    [Wombo Dream](https://www.wombo.art/)[-based models](https://github.com/edstoica/lucid_stylegan3_datasets_models). Others can be found around the net and are properly credited in this repository,
+    so long as they can be easily downloaded with [`dnnlib.util.open_url`](https://github.com/PDillis/stylegan3-fun/blob/4ce9d6f7601641ba1e2906ed97f2739a63fb96e2/dnnlib/util.py#L396).

 * Interpolation videos
   * [Random interpolation](https://youtu.be/DNfocO1IOUE)
   * Style-mixing
   * Sightseeding
   * [Circular interpolation](https://youtu.be/4nktYGjSVHg)
   * [Visual-reactive interpolation](https://youtu.be/KoEAkPnE-zA) (Beta)
+  * Audiovisual-reactive interpolation (TODO)
 * Projection into the latent space
   * [Project into W+](https://arxiv.org/abs/1904.03189)
   * Additional losses to use for better projection (e.g., using VGG16 or [CLIP](https://github.com/openai/CLIP))
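
As a quick illustration of the new training options above, a transfer-learning run from the FFHQU 1024 model with `config-r` could look as follows (a minimal sketch: the dataset path, GPU count, and batch size are placeholders to adapt to your own setup; `--gamma` is omitted so the StyleGAN2 heuristic kicks in, and `--outdir`/`--metrics` now have defaults, so neither needs to be passed):

```bash
# Fine-tune StyleGAN3-R from the pre-trained FFHQU-1024 model via the --resume dictionary.
python train.py --cfg=stylegan3-r --data=~/datasets/mydataset-1024x1024.zip \
    --gpus=2 --batch=16 --mirror=1 --resume=ffhqu1024 --snap-res=4k
```
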
@@ -79,12 +123,16 @@ This repository adds the following (not yet the complete list):
   * Start from a random image (`random` or `perlin`, using [Mathieu Duchesneau's implementation](https://github.com/duchesneaumathieu/pyperlin)) or from an existing one
 * Expansion on GUI/`visualizer.py`
   * Added the rest of the affine transformations
+  * Added widget for class-conditional models (***TODO:*** mix classes with continuous values for `cls`!)
 * General model and code additions
-  * No longer necessary to specify `--outdir` when running the code, as the output directory will be automatically generated
-  * [Better sampling?](https://arxiv.org/abs/2110.08009) (TODO)
-  * StyleGAN3: anchor the latent space for easier to follow interpolations
+  * ***TODO:*** [Better sampling?](https://arxiv.org/abs/2110.08009)
+  * [Multi-modal truncation trick](https://arxiv.org/abs/2202.12211): find the different clusters in your model and use the one closest to your dlatent, in order to increase the fidelity (TODO: finish skeleton implementation)
+  * StyleGAN3: anchor the latent space for easier-to-follow interpolations (thanks to [Rivers Have Wings](https://github.com/crowsonkb) and [nshepperd](https://github.com/nshepperd)).
+  * Use CPU instead of GPU if desired (not recommended, but perfectly fine for generating images whenever the custom CUDA kernels fail to compile).
+  * Add missing dependencies and channels so that the [`conda`](https://docs.conda.io/en/latest/) environment is correctly set up on Windows
+  (PRs [#111](https://github.com/NVlabs/stylegan3/pull/111)/[#116](https://github.com/NVlabs/stylegan3/pull/116)/[#125](https://github.com/NVlabs/stylegan3/pull/125) and [#80](https://github.com/NVlabs/stylegan3/pull/80)/[#143](https://github.com/NVlabs/stylegan3/pull/143) from the base repository, respectively)

-***TODO:*** Finish documentation for better user experience, add videos/images, code samples.
+***TODO:*** Finish documentation for better user experience, add videos/images, code samples, visuals...

 ---

@@ -161,7 +209,7 @@ See [Troubleshooting](./docs/troubleshooting.md) for help on common installation

 Pre-trained networks are stored as `*.pkl` files that can be referenced using local filenames or URLs:

-```.bash
+```bash
 # Generate an image using pre-trained AFHQv2 model ("Ours" in Figure 1, left).
 python gen_images.py --outdir=out --trunc=1 --seeds=2 \
     --network=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-afhqv2-512x512.pkl
@@ -175,7 +223,7 @@ Outputs from the above commands are placed under `out/*.png`, controlled by `--o

 **Docker**: You can run the above curated image example using Docker as follows:

-```.bash
+```bash
 # Build the stylegan3:latest image
 docker build --tag stylegan3 .

@@ -199,7 +247,7 @@ The `docker run` invocation may look daunting, so let's unpack its contents here

 This release contains an interactive model visualization tool that can be used to explore various characteristics of a trained model. To start it, run:

-```.bash
+```bash
 python visualizer.py
 ```

@@ -209,7 +257,7 @@ python visualizer.py

 You can use pre-trained networks in your own Python code as follows:

-```.python
+```python
 with open('ffhq.pkl', 'rb') as f:
     G = pickle.load(f)['G_ema'].cuda()  # torch.nn.Module
 z = torch.randn([1, G.z_dim]).cuda()    # latent codes
@@ -223,7 +271,7 @@ The pickle contains three networks. `'G'` and `'D'` are instantaneous snapshots

 The generator consists of two submodules, `G.mapping` and `G.synthesis`, that can be executed separately. They also support various additional options:

-```.python
+```python
 w = G.mapping(z, c, truncation_psi=0.5, truncation_cutoff=8)
 img = G.synthesis(w, noise_mode='const', force_fp32=True)
 ```
@@ -236,7 +284,7 @@ Datasets are stored as uncompressed ZIP archives containing uncompressed PNG fil

 **FFHQ**: Download the [Flickr-Faces-HQ dataset](https://github.com/NVlabs/ffhq-dataset) as 1024x1024 images and create a zip archive using `dataset_tool.py`:

-```.bash
+```bash
 # Original 1024x1024 resolution.
 python dataset_tool.py --source=/tmp/images1024x1024 --dest=~/datasets/ffhq-1024x1024.zip

@@ -249,21 +297,21 @@ See the [FFHQ README](https://github.com/NVlabs/ffhq-dataset) for information on

 **MetFaces**: Download the [MetFaces dataset](https://github.com/NVlabs/metfaces-dataset) and create a ZIP archive:

-```.bash
+```bash
 python dataset_tool.py --source=~/downloads/metfaces/images --dest=~/datasets/metfaces-1024x1024.zip
 ```

 See the [MetFaces README](https://github.com/NVlabs/metfaces-dataset) for information on how to obtain the unaligned MetFaces dataset images. Use the same steps as above to create a ZIP archive for training and validation.

 **AFHQv2**: Download the [AFHQv2 dataset](https://github.com/clovaai/stargan-v2/blob/master/README.md#animal-faces-hq-dataset-afhq) and create a ZIP archive:

-```.bash
+```bash
 python dataset_tool.py --source=~/downloads/afhqv2 --dest=~/datasets/afhqv2-512x512.zip
 ```

 Note that the above command creates a single combined dataset using all images of all three classes (cats, dogs, and wild animals), matching the setup used in the StyleGAN3 paper. Alternatively, you can also create a separate dataset for each class:

-```.bash
+```bash
 python dataset_tool.py --source=~/downloads/afhqv2/train/cat --dest=~/datasets/afhqv2cat-512x512.zip
 python dataset_tool.py --source=~/downloads/afhqv2/train/dog --dest=~/datasets/afhqv2dog-512x512.zip
 python dataset_tool.py --source=~/downloads/afhqv2/train/wild --dest=~/datasets/afhqv2wild-512x512.zip
@@ -273,7 +321,7 @@ python dataset_tool.py --source=~/downloads/afhqv2/train/wild --dest=~/datasets/

 You can train new networks using `train.py`. For example:

-```.bash
+```bash
 # Train StyleGAN3-T for AFHQv2 using 8 GPUs.
 python train.py --outdir=~/training-runs --cfg=stylegan3-t --data=~/datasets/afhqv2-512x512.zip \
     --gpus=8 --batch=32 --gamma=8.2 --mirror=1
@@ -298,7 +346,7 @@ By default, `train.py` automatically computes FID for each network pickle export

 Additional quality metrics can also be computed after the training:

-```.bash
+```bash
 # Previous training run: look up options automatically, save result to JSONL file.
 python calc_metrics.py --metrics=eqt50k_int,eqr50k \
     --network=~/training-runs/00000-stylegan3-r-mydataset/network-snapshot-000000.pkl
@@ -339,7 +387,7 @@ References:

 The easiest way to inspect the spectral properties of a given generator is to use the built-in FFT mode in `visualizer.py`. In addition, you can visualize average 2D power spectra (Appendix A, Figure 15) as follows:

-```.bash
+```bash
 # Calculate dataset mean and std, needed in subsequent steps.
 python avg_spectra.py stats --source=~/datasets/ffhq-1024x1024.zip

environment.yml (+4 −1)

@@ -2,6 +2,7 @@ name: stylegan3
 channels:
   - pytorch
   - nvidia
+  - conda-forge  # PR #80 by @SetZero / #143 by @coldwaterq
 dependencies:
   - python >= 3.8
   - pip
@@ -10,7 +11,7 @@ dependencies:
   - pillow=8.3.1
   - scipy=1.7.1
   - pytorch=1.9.1
-  - cudatoolkit=11.1
+  - cudatoolkit>=11.1  # PR #116 by @edstoica
   - requests=2.26.0
   - tqdm=4.62.2
   - ninja=1.10.2
@@ -22,3 +23,5 @@ dependencies:
   - pyopengl==3.1.5
   - imageio-ffmpeg==0.4.3
   - pyspng
+  - psutil  # PR #125 by @fastflair / #111 by @siddharthksah
+  - tensorboard  # PR #125 by @fastflair
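
With the added channel and dependencies, the environment should now resolve on Windows as well as Linux. The usual setup commands are (assuming `conda` is installed and the commands are run from the repository root, where `environment.yml` lives):

```bash
# Create the "stylegan3" environment defined in environment.yml and switch to it.
conda env create -f environment.yml
conda activate stylegan3
```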

style_mixing.py (+11 −12)

@@ -48,23 +48,22 @@ def style_names(max_style: int, file_name: str, desc: str, col_styles: List[int]
     to both the file name and the new directory to be created.
     """
     if list(range(0, 4)) == col_styles:
-        file_name = f'{file_name}-coarse_styles'
-        desc = f'{desc}-coarse_styles'
+        styles = 'coarse_styles'
     elif list(range(4, 8)) == col_styles:
-        file_name = f'{file_name}-middle_styles'
-        desc = f'{desc}-middle_styles'
+        styles = 'middle_styles'
     elif list(range(8, max_style)) == col_styles:
-        file_name = f'{file_name}-fine_styles'
-        desc = f'{desc}-fine_styles'
+        styles = 'fine_styles'
     elif list(range(0, 8)) == col_styles:
-        file_name = f'{file_name}-coarse+middle_styles'
-        desc = f'{desc}-coarse+middle_styles'
+        styles = 'coarse+middle_styles'
     elif list(range(4, max_style)) == col_styles:
-        file_name = f'{file_name}-middle+fine_styles'
-        desc = f'{desc}-middle+fine_styles'
+        styles = 'middle+fine_styles'
     elif list(range(0, 4)) + list(range(8, max_style)) == col_styles:
-        file_name = f'{file_name}-coarse+fine_styles'
-        desc = f'{desc}-coarse+fine_styles'
+        styles = 'coarse+fine_styles'
+    else:
+        styles = 'custom_styles'
+
+    file_name = f'{file_name}-{styles}'
+    desc = f'{desc}-{styles}'

     return file_name, desc
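
The refactor produces the same names as before for the predefined ranges and adds a `custom_styles` fallback for any other layer selection. A minimal sketch of the helper's behavior (assuming `style_mixing.py` can be imported from the repository root; in practice `max_style` comes from the loaded generator):

```python
from style_mixing import style_names  # assumption: the module imports without side effects

# Coarse styles (layers 0-3): the first branch matches, so both names get the '-coarse_styles' suffix.
print(style_names(max_style=14, file_name='grid', desc='stylemix', col_styles=list(range(0, 4))))
# -> ('grid-coarse_styles', 'stylemix-coarse_styles')

# Any selection outside the predefined ranges now falls through to the new 'custom_styles' label.
print(style_names(max_style=14, file_name='grid', desc='stylemix', col_styles=[0, 5, 11]))
# -> ('grid-custom_styles', 'stylemix-custom_styles')
```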
