BigGAN Audio Visualizer

Description

This visualizer explores the latent space of BigGAN (Brock et al., 2018) by using the pitch and tempo of an audio file to generate, and interpolate between, the noise and class vectors fed to the model. Classes are chosen manually or, optionally, by semantic similarity on BERT encodings of a lyrics corpus.

Usage:

usage: visualizer.py [-h] -s SONG [-r {128,256,512}] [-d DURATION]
                     [-ps [200-295]] [-ts [0.05-0.8]]
                     [-c CLASSES [CLASSES ...]] [-n NUM_CLASSES] [-j [0-1]]
                     [-fl i*2^6] [-t [0.1-1]] [-sf [10-30]] [-bs BATCH_SIZE]
                     [-o OUTPUT_FILE] [--use_last_vectors]
                     [--use_last_classes] [--sort_pitch] [-l LYRICS]
                     [-e {sbert,doc2vec}] [-es {best,random,ransac}]

To speed up runtime, the code can be run on Google Colab GPUs (or another cloud notebook provider) using biggan_music_visualizer.ipynb (hosted here).
The [-n NUM_CLASSES] parameter selects the number of classes to interpolate between.
By default, [-n NUM_CLASSES] random classes are selected. The [-c CLASSES [CLASSES ...]] parameter can be used to select specific ImageNet classes. A full list can be found here, and a list categorized by coarse descriptors here. Be sure to use the integer ids, not the string labels, and set [-n NUM_CLASSES] to the number of chosen classes.
Use the [--sort_by_power] flag to map classes to the [-n NUM_CLASSES] highest power pitches. By default, classes are mapped to a chromatic scale.
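The two mappings can be illustrated with a small sketch. It assumes the audio has already been reduced to a 12-row chromagram (one row per pitch class, one column per frame) and that the chromatic scale starts at C. The function name `assign_classes` and its arguments are hypothetical and are not taken from `visualizer.py`.

```python
# Hypothetical sketch: assign ImageNet class ids to pitch classes.
PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def assign_classes(chroma, classes, sort_by_power=False):
    """Return a dict {pitch index: class id}.

    chroma:  12 lists of per-frame energies, one list per pitch class.
    classes: between 1 and 12 ImageNet class ids.
    """
    if len(chroma) != len(PITCHES):
        raise ValueError("expected one row per pitch class (12 rows)")
    if not 1 <= len(classes) <= len(PITCHES):
        raise ValueError("expected between 1 and 12 classes")
    if sort_by_power:
        # Rank pitches by total energy, strongest first (ties keep scale order).
        order = sorted(range(12), key=lambda p: sum(chroma[p]), reverse=True)
    else:
        # Assumed default: walk the chromatic scale upwards from C.
        order = list(range(12))
    return {pitch: cls for pitch, cls in zip(order, classes)}
```

With three classes and a song dominated by A and E, the default mapping would use C, C# and D, while the power-sorted mapping would use A, E and then the next strongest pitch.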
The [-d DURATION] parameter can be useful to generate short videos while tweaking other parameters. Once the desired parameters are set, use the [--use_last_vectors] flag and remove the [-d DURATION] parameter to generate the same video at full length.
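This preview-then-render workflow can be scripted. The sketch below only builds argument lists from the flags listed in the usage string; the helper names `preview_args` and `full_args` and the file name `song.mp3` are hypothetical.

```python
import sys


def preview_args(song, duration, extra=()):
    """Arguments for a short test render limited by -d DURATION."""
    return [sys.executable, "visualizer.py", "-s", song,
            "-d", str(duration), *extra]


def full_args(preview):
    """Drop -d DURATION and reuse the saved vectors for the full render."""
    args = list(preview)
    if "-d" in args:
        i = args.index("-d")
        del args[i:i + 2]  # remove the flag and its value
    return args + ["--use_last_vectors"]
```

For example, `subprocess.run(preview_args("song.mp3", 30, ["-r", "128"]), check=True)` would start the preview, and passing the same list through `full_args` gives the command for the full-length run with all other parameters unchanged.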
Reducing the output resolution with [-r {128,256,512}] and/or increasing the frame length with [-fl i*2^6] can help reduce the runtime.
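A rough estimate shows why a longer frame length helps. The sketch assumes that one video frame is produced per `frame_length` audio samples and that the audio is analysed at 22050 Hz; both are assumptions for illustration, and the helper names are hypothetical.

```python
import math


def check_frame_length(frame_length):
    """Frame lengths are documented as i*2^6, i.e. positive multiples of 64."""
    if frame_length <= 0 or frame_length % 64 != 0:
        raise ValueError("frame length must be a positive multiple of 64")
    return frame_length


def estimated_frames(seconds, frame_length, sample_rate=22050):
    """Approximate number of video frames BigGAN would have to generate."""
    check_frame_length(frame_length)
    return math.ceil(seconds * sample_rate / frame_length)
```

Under these assumptions a 30 second clip needs about 1292 frames at the default frame length of 512 and about 646 at 1024, so doubling the frame length roughly halves the number of images to generate.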
To compute classes through semantic similarity to a lyrics file, use the [-l LYRICS] parameter. The embedding technique and strategy for choosing classes can be set with [-e {sbert,doc2vec}] and [-es {best,random,ransac}] respectively.
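The idea behind choosing classes by semantic similarity can be sketched with plain cosine similarity. The code below is one plausible reading of a "best" strategy (take the classes whose label embeddings are closest to the lyrics embedding); the exact behaviour of `best`, `random` and `ransac` in `encoding.py` is not described here, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_classes(lyrics_vec, class_vecs, n):
    """Return the n class ids whose embeddings are most similar to the lyrics.

    class_vecs maps a class id to the embedding of its label.
    """
    ranked = sorted(class_vecs,
                    key=lambda cid: cosine(lyrics_vec, class_vecs[cid]),
                    reverse=True)
    return ranked[:n]
```

In practice the vectors would come from a sentence-embedding model rather than being written by hand, and `n` would correspond to [-n NUM_CLASSES].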
Pitch and tempo sensitivity can be set with [-ps [200-295]] and [-ts [0.05-0.8]] respectively. Jitter, truncation and smooth factor can be set with [-j [0-1]], [-t [0.1-1]] and [-sf [10-30]] respectively.
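To give an intuition for what smoothing does to a sequence of class vectors, here is a minimal sketch using a trailing moving average. This is a stand-in technique chosen for illustration, not necessarily the interpolation the visualizer performs, and the function name `smooth` and its `window` argument are hypothetical.

```python
def smooth(vectors, window):
    """Trailing moving average over a list of equal-length vectors.

    A larger window damps rapid frame-to-frame changes more strongly.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    for i in range(len(vectors)):
        # Average the current vector with up to window-1 preceding ones.
        chunk = vectors[max(0, i - window + 1): i + 1]
        out.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return out
```

A sequence that flips between two classes on every frame settles to an even blend of both once the window covers two frames, which is the kind of flicker reduction a larger smooth factor is meant to provide.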
See the help column of the arguments section for details on all parameters.

Arguments

short	long	default	range	help
`-h`	`--help`			show this help message and exit
`-s`	`--song`			path to input audio file `[REQUIRED]`
`-r`	`--resolution`	`512`	`{128,256,512}`	output video resolution
`-d`	`--duration`	`None`	`int`	output video duration
`-ps`	`--pitch_sensitivity`	`220`	`[200-295]`	controls the sensitivity of the class vector to changes in pitch
`-ts`	`--tempo_sensitivity`	`0.25`	`[0.05-0.8]`	controls the sensitivity of the noise vector to changes in volume and tempo
`-c`	`--classes`	`None`		manually specify `[--num_classes]` ImageNet classes
`-n`	`--num_classes`	`12`	`[1-12]`	number of unique classes to use
`-j`	`--jitter`	`0.5`	`[0-1]`	controls jitter of the noise vector to reduce repetition
`-fl`	`--frame_length`	`512`	`i*2^6`	number of audio frames per video frame in the output
`-t`	`--truncation`	`1`	`[0.1-1]`	BigGAN truncation parameter; controls complexity of structure within frames
`-sf`	`--smooth_factor`	`20`	`[10-30]`	controls interpolation between class vectors to smooth rapid fluctuations
`-bs`	`--batch_size`	`20`	`int`	BigGAN batch_size
`-o`	`--output_file`			name of output file stored in `output/`, defaults to `[--song]` path base_name
	`--use_last_vectors`	`False`	`bool`	set flag to use previous saved class/noise vectors
	`--use_last_classes`	`False`	`bool`	set flag to use previous classes
	`--sort_pitches`	`False`	`bool`	set flag to sort pitches by the ordering of classes
`-l`	`--lyrics`	`None`		path to lyrics file; setting `[--lyrics LYRICS]` computes classes by semantic similarity under BERT encodings
`-e`	`--encoding`	`sbert`	`{sbert,doc2vec}`	controls choice of sentence embeddings technique
`-es`	`--encoding_strategy`	`None`	`{random,best,ransac}`	controls strategy for choosing classes: `[-e sbert]` can use `best` or `random` while `[-e doc2vec]` can use `ransac`
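For readers who want to see how the defaults and ranges in the table translate into a parser, here is a sketch covering a subset of the arguments with `argparse`. It is not the project's actual parser: the helper `in_range`, the choice of `int` for pitch sensitivity, and the strict range checks are assumptions made for illustration.

```python
import argparse


def in_range(lo, hi, cast=float):
    """Build an argparse type that rejects values outside [lo, hi]."""
    def parse(text):
        value = cast(text)
        if not lo <= value <= hi:
            raise argparse.ArgumentTypeError(f"{value} is outside [{lo}, {hi}]")
        return value
    return parse


def build_parser():
    p = argparse.ArgumentParser(prog="visualizer.py")
    p.add_argument("-s", "--song", required=True,
                   help="path to input audio file")
    p.add_argument("-r", "--resolution", type=int,
                   choices=[128, 256, 512], default=512)
    p.add_argument("-d", "--duration", type=int, default=None)
    p.add_argument("-ps", "--pitch_sensitivity",
                   type=in_range(200, 295, int), default=220)
    p.add_argument("-ts", "--tempo_sensitivity",
                   type=in_range(0.05, 0.8), default=0.25)
    p.add_argument("-c", "--classes", type=int, nargs="+", default=None)
    p.add_argument("-n", "--num_classes",
                   type=in_range(1, 12, int), default=12)
    p.add_argument("--use_last_vectors", action="store_true")
    return p
```

Calling `build_parser().parse_args(["-s", "song.mp3"])` yields the defaults from the table, while an out-of-range value such as `-ps 300` makes `argparse` exit with an error message.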

Acknowledgments

Thanks to Matt Siegelman for providing the inspiration as well as a boilerplate for the project.
