Zero-division error when args.n_layer = 1, caused by ratio_0_to_1. Can I set ratio_0_to_1 = 0 when n_layer = 1? #243

Open
zdxdsw opened this issue May 8, 2024 · 1 comment

zdxdsw commented May 8, 2024

Can you intuitively explain what ratio_0_to_1 is doing in RWKV_Tmix_x060?
https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v5/src/model.py#L290

I see that ratio_0_to_1 is defined as: ratio_0_to_1 = layer_id / (args.n_layer - 1)
That value is then used to define several things for time_mix and time_decay.

However, I want to set args.n_layer = 1, which leads to a zero-division error.
Does it make sense to hardcode ratio_0_to_1 = 0 when args.n_layer = 1?

Owner

BlinkDL commented Jul 5, 2024

You can hardcode ratio_0_to_1 to 0.5 in this case.
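
For anyone landing here with the same error, a minimal sketch of that workaround might look like the following. The helper name `ratio_0_to_1_for_layer` and the `single_layer_value` parameter are hypothetical and are not part of the RWKV-LM code; the sketch only assumes the formula quoted in the question, `layer_id / (args.n_layer - 1)`, and the 0.5 fallback suggested above.

```python
def ratio_0_to_1_for_layer(layer_id: int, n_layer: int,
                           single_layer_value: float = 0.5) -> float:
    """Return the 0-to-1 layer ratio, guarding the single-layer case.

    Hypothetical helper for illustration only. With n_layer == 1 the
    original expression layer_id / (n_layer - 1) divides by zero, so a
    fixed fallback value (0.5, as suggested in this thread) is returned.
    """
    if n_layer < 1:
        raise ValueError("n_layer must be at least 1")
    if not 0 <= layer_id < n_layer:
        raise ValueError("layer_id must be in [0, n_layer)")
    if n_layer == 1:
        # Only one layer: there is no first-to-last range to interpolate over.
        return single_layer_value
    # Same formula as quoted in the question: 0.0 for the first layer,
    # 1.0 for the last layer.
    return layer_id / (n_layer - 1)


if __name__ == "__main__":
    print(ratio_0_to_1_for_layer(0, 1))   # single-layer model
    print([ratio_0_to_1_for_layer(i, 4) for i in range(4)])
```

In model.py the equivalent change would presumably be a one-line conditional where ratio_0_to_1 is assigned, leaving the multi-layer behaviour untouched.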
