About music generation with perceiver-ar model #3
Comments
🎶🤖😄 |
@feizc how are you approaching the problem of generating starting from a length that is less than the prefix? |
Actually, I use a fixed-length conditional context, i.e., a fixed-length prefix of prior music, and continue writing the next melody from it. In my opinion, to start from zero we could either pad the prefix to the required length with a special token such as [pad], or use only the decoder to generate an initial sequence and then generate conditioned on the latents. I read the source code and found that the authors begin from zero :) |
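A minimal sketch of the padding idea mentioned above, assuming a hypothetical pad token id of 0 and plain Python lists of token ids. The names `PAD_ID` and `pad_to_prefix` are made up for illustration and are not taken from either repository. It left-pads a short prompt up to the prefix length and returns a mask so that padded positions could be ignored in attention.

```python
from typing import List, Sequence, Tuple

PAD_ID = 0  # assumption: id of a [pad] token in a hypothetical vocabulary


def pad_to_prefix(
    tokens: Sequence[int], prefix_len: int, pad_id: int = PAD_ID
) -> Tuple[List[int], List[bool]]:
    """Left-pad `tokens` to at least `prefix_len` entries.

    Returns the padded ids and a mask that is True for real tokens and
    False for padding. Prompts that are already long enough are returned
    unchanged (with an all-True mask).
    """
    n_pad = max(0, prefix_len - len(tokens))
    padded = [pad_id] * n_pad + list(tokens)
    mask = [False] * n_pad + [True] * len(tokens)
    return padded, mask


if __name__ == "__main__":
    ids, mask = pad_to_prefix([5, 6, 7], prefix_len=5)
    print(ids)   # padded token ids
    print(mask)  # False marks the padded positions
```

Left padding keeps the real tokens adjacent to the position where generation continues. Whether a model trained without padded prefixes handles this well is an open question here, not something the thread confirms.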
After reviewing the current implementation (autoregressive_wrapper), it seems you generate each subsequent token one at a time, as most architectures do. The authors of the Perceiver AR paper outline a strided approach (the stride is typically the size of the self-attention sequence length) in which sampled tokens are cached up to a certain size and the buffer is then freed. Have you considered implementing this? The released perceiver-ar implementation is relatively easy to follow. |
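To make the question concrete, here is a rough sketch of one reading of the strided idea described above. It is not the paper's or the released code's implementation. It only shows the buffer bookkeeping: the window of tokens grows until it reaches `max_len`, then `stride` of the oldest tokens are dropped in one go instead of sliding by one token per step. Activation caching is omitted, and `generate_strided`, `next_token`, `max_len`, and `stride` are hypothetical names.

```python
from typing import Callable, List, Sequence


def generate_strided(
    next_token: Callable[[List[int]], int],
    prompt: Sequence[int],
    n_new: int,
    max_len: int,
    stride: int,
) -> List[int]:
    """Sample `n_new` tokens while keeping at most `max_len` tokens of context.

    `next_token` is a stand-in for a model call that maps the current
    window of token ids to one sampled token id.
    """
    if not 0 < stride <= max_len:
        raise ValueError("stride must be in the range 1..max_len")

    window = list(prompt)[-max_len:]  # keep only the most recent context
    generated: List[int] = []
    for _ in range(n_new):
        if len(window) >= max_len:
            # Free `stride` slots at once; the next `stride` steps then
            # extend the same window without any further truncation.
            window = window[stride:]
        token = next_token(window)
        window.append(token)
        generated.append(token)
    return generated


if __name__ == "__main__":
    # Toy "model": always predicts the last token plus one.
    toy_model = lambda window: window[-1] + 1
    print(generate_strided(toy_model, [0, 1, 2, 3], n_new=6, max_len=4, stride=2))
```

In a real model the benefit would come from reusing cached activations between truncations, which this toy loop does not attempt to show.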
noo not yet, i haven't implemented their special caching strategy at inference but if i keep hearing more positive results, i may implement it! have to admit i was doubtful about the architecture initially |
I’m curious to see how well this would work at inference, particularly when using a vqvae / vqgan to encode images. If you could decode in only a few steps, that would really speed up generation. I suspect quality would suffer, but the paper’s ImageNet results seem promising. |
Hi @lucidrains,
Thanks for the implementation of the Perceiver-AR model.
We conducted experiments on pop music generation at: https://github.com/feizc/Perceiver-Music-Generation.
The results are encouraging. We are grateful to you : )