
Way to relax constraint on prefix length #2

Open
inspirit opened this issue Jun 22, 2022 · 6 comments

Comments

@inspirit

I think we can get away with having learnable null latents for the case where we don't have a prefix initially
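
Roughly this kind of thing, as a minimal sketch (the module name, shapes, and `num_null_tokens` are illustrative, not this repo's actual API):

```python
import torch
from torch import nn

class NullPrefix(nn.Module):
    # learned "null" latents that stand in when no real prefix is given
    def __init__(self, dim, num_null_tokens = 1):
        super().__init__()
        self.null_latents = nn.Parameter(torch.randn(num_null_tokens, dim))

    def forward(self, prefix):
        # prefix: (batch, prefix_len, dim), where prefix_len may be 0
        if prefix.shape[1] > 0:
            return prefix
        # no prefix given - fall back to the learned null latents
        return self.null_latents.unsqueeze(0).expand(prefix.shape[0], -1, -1)
```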

@lucidrains
Owner

@inspirit but you need at least one token that outputs a logit

@lucidrains
Owner

yeah, or the other way would be to randomly curtail the prefix during training, in which case it would generalize to being conditioned on anything from zero prefix length up to the maximum
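
something like this sketch during training (assuming the batch is laid out as [prefix, query] along the sequence dimension; the function is illustrative, not actual repo code):

```python
import torch

def randomly_curtail_prefix(seq, max_prefix_len):
    # seq: (batch, max_prefix_len + query_len) of token ids
    # sample a prefix length uniformly in [0, max_prefix_len] and drop the
    # rest, so the model sees every amount of conditioning during training
    prefix_len = int(torch.randint(0, max_prefix_len + 1, (1,)))
    return seq[:, (max_prefix_len - prefix_len):]
```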

@lucidrains
Owner

feel like the paper should have addressed this, especially if book-level autoregressive generation is the goal here

@inspirit
Author

inspirit commented Jun 22, 2022

i think having the prefix and query dynamically sized is best, both for robustness and for inference usage

@lucidrains
Owner

@inspirit yeah, maybe i'll just have to push this responsibility to the dataloading
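
i.e. have the dataset sample the prefix length per example, something like the sketch below (class and argument names are illustrative):

```python
import random
from torch.utils.data import Dataset

class VariablePrefixDataset(Dataset):
    # samples a prefix length per example, so training covers everything
    # from zero-length prefixes up to the maximum
    def __init__(self, token_ids, query_len, max_prefix_len):
        self.token_ids = token_ids        # 1d LongTensor over the corpus
        self.query_len = query_len
        self.max_prefix_len = max_prefix_len

    def __len__(self):
        return len(self.token_ids) - self.query_len - self.max_prefix_len

    def __getitem__(self, idx):
        prefix_len = random.randint(0, self.max_prefix_len)
        start = idx + self.max_prefix_len - prefix_len
        end = idx + self.max_prefix_len + self.query_len
        return self.token_ids[start:end], prefix_len
```

the collate function would then need to pad or bucket by prefix length, since sequences come out variable-sized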

@ArEnSc

ArEnSc commented Aug 11, 2022

@lucidrains I searched the paper and I don't see a prefix length mentioned. I'm confused about this prefix length issue: wouldn't you want the prefix length to be the full size of the context window? (I guess you mean the length of the latents.)
