Way to relax the constraint on prefix length #2
@inspirit but you need at least one token that outputs a logit
yeah, or maybe the other way is to randomly curtail the prefix during training, in which case it will generalize to being conditioned on anything from a 0-length prefix up to the maximum
feel like the paper should have addressed this, especially if book-level autoregressive generation is the goal here
i think having the prefix and query dynamically sized is best, both for robustness and for inference usage
@inspirit yeah, maybe i'll just have to push this responsibility to the dataloading
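
A minimal sketch of what that could look like on the dataloading side, assuming each sample is a `(seq_len, dim)` tensor of latents; `random_prefix_split` and its `min_query_len` argument are hypothetical names for illustration, not from the repo:

```python
import torch

def random_prefix_split(latents, min_query_len = 1):
    # latents: (seq_len, dim) tensor for a single sample
    # sample a random prefix length, keeping at least one query token
    # so there is always a logit to compute the loss on
    seq_len = latents.shape[0]
    prefix_len = torch.randint(0, seq_len - min_query_len + 1, (1,)).item()
    return latents[:prefix_len], latents[prefix_len:]
```

Sampling a different prefix length per example means the model is trained on everything from an empty prefix up to the maximum, which is what lets it generalize at inference.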
@lucidrains I searched the paper and don't see a prefix length mentioned. I am confused about this prefix length issue: wouldn't you want the prefix length to be the full size of the context window? (I guess you mean the length of the latents)
I think we can get away with having learnable null-latents in case we don't have a prefix initially
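
A rough sketch of the learnable null-latent idea: when no real prefix is available, substitute a small set of learned latents so the model is never conditioned on an empty sequence. `NullPrefix` and its `num_null` parameter are made-up names, not from the paper or the repo:

```python
import torch
from torch import nn

class NullPrefix(nn.Module):
    # learned latents used in place of a real prefix when none exists
    def __init__(self, num_null, dim):
        super().__init__()
        self.null_latents = nn.Parameter(torch.randn(num_null, dim) * 0.02)

    def forward(self, prefix, batch_size):
        # prefix: (batch, prefix_len, dim) or None
        if prefix is None or prefix.shape[1] == 0:
            # broadcast the learned null latents across the batch
            return self.null_latents.unsqueeze(0).expand(batch_size, -1, -1)
        return prefix
```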