Last layer of the generator in the CNN (size 16384) #103

spagliarini · 2021-03-16T15:58:58Z

Hi!

I would like to clean up a doubt I have.
Is the fact that the last layer of the generator has dimension 16384 related to CNN constraints or is it important that the dimension is a bit more than one would like to obtain? Here, 1 s.

Thank you in advance!!!

chrisdonahue · 2021-03-17T16:11:13Z

Ah, good question. The last layer of the generator has dimension 16384 simply because it is a power of four; each of the five layers of the generator increases the number of timesteps by a factor of four, starting from 16 (arbitrary choice).

This output length could represent any amount of time depending on sampling rate, but 16kHz is a common sampling rate in speech processing and conveniently works out to around one second of generated audio (our goal), so we went with that.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Last layer of the generator in the CNN (size 16384) #103

Last layer of the generator in the CNN (size 16384) #103

spagliarini commented Mar 16, 2021

chrisdonahue commented Mar 17, 2021

Last layer of the generator in the CNN (size 16384) #103

Last layer of the generator in the CNN (size 16384) #103

Comments

spagliarini commented Mar 16, 2021

chrisdonahue commented Mar 17, 2021