How do I use state tuning with rwkv6-7B? #246

xinyinan9527 · 2024-05-23T06:39:38Z

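I tried following the instructions on the official site.
As I understand it, only time_state should be trained, but I get this error: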

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
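For context, PyTorch raises this message when `backward()` is called on a tensor that is not connected to anything requiring gradients. The sketch below reproduces the message with toy tensors; the names `w`, `x`, and `message` are illustrative and are not taken from RWKV-LM.

```python
import torch

# A "frozen" weight: requires_grad defaults to False for plain tensors.
w = torch.randn(4, 4)
x = torch.randn(2, 4)

# Nothing in this graph requires grad, so the loss has no grad_fn.
loss = (x @ w).sum()

message = ""
try:
    loss.backward()
except RuntimeError as e:
    message = str(e)

print(message)
```

If the loss in a state-tuning run ends up detached from the trainable state in the same way, the same error would be expected.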

JL-er · 2024-07-06T03:58:31Z

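Are you using the RWKV-LM project directly, or a version you modified yourself? If it is your own modified project: when gradients are frozen, DeepSpeed's checkpoint raises this error, and you need to use torch.checkpoint instead. See RWKV-PEFT for details.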
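A minimal sketch of the suggestion above, assuming "torch.checkpoint" refers to `torch.utils.checkpoint.checkpoint`. The `ToyBlock` module is hypothetical and only imitates the situation in which every weight is frozen except a `time_state` parameter; it is not the RWKV-PEFT implementation. The non-reentrant mode (`use_reentrant=False`) is used here because it does not require the checkpointed input to carry gradients.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ToyBlock(nn.Module):
    """Hypothetical stand-in for a block with frozen weights and a trainable state."""

    def __init__(self, dim=8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)                    # will be frozen
        self.time_state = nn.Parameter(torch.zeros(dim))   # stays trainable

    def forward(self, x):
        return self.proj(x) + self.time_state


torch.manual_seed(0)
block = ToyBlock()

# Freeze everything except time_state, as in state tuning.
for name, p in block.named_parameters():
    p.requires_grad_(name == "time_state")

# The input does not require grad, like the output of a frozen embedding.
x = torch.randn(2, 8)

# Non-reentrant checkpointing still tracks the trainable parameter inside the block.
out = checkpoint(block, x, use_reentrant=False)
loss = out.sum()
loss.backward()

print(block.time_state.grad is not None, block.proj.weight.grad is None)
```

Only `time_state` receives a gradient here; the frozen projection does not.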

shouldsee · 2024-07-30T14:16:38Z

@JL-er Thanks. Why is only the time_state 64*64 matrix tuned? Why aren't the other two groups of state fine-tuned together with it?

            state[i*3+0] = torch.zeros(args.n_embd, dtype=atype, requires_grad=False, device=dev).contiguous()
            state[i*3+1] = state_xueshan_raw[f'blocks.{i}.att.time_state'].transpose(1,2).to(dtype=torch.float, device=dev).requires_grad_(False).contiguous()
            state[i*3+2] = torch.zeros(args.n_embd, dtype=atype, requires_grad=False, device=dev).contiguous()

JL-er · 2024-08-01T15:05:23Z

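Those two parameters (the other two states, state[i*3+0] and state[i*3+2]) are very small and have little effect, so only the core part of the state is used, to keep things simple.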
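One way to read "very small" is in terms of element count. The arithmetic below is a rough sketch under assumptions not stated in the thread: `n_embd = 4096` for the 7B model, a head size of 64 (matching the 64*64 matrix mentioned above), and a per-layer `time_state` of shape `(n_head, head_size, head_size)`.

```python
# Assumed values, not taken from the thread.
n_embd = 4096                      # assumed embedding width of the 7B model
head_size = 64                     # from the "64*64" matrix in the question
n_head = n_embd // head_size       # 64 heads under these assumptions

# state[i*3+0] and state[i*3+2] are each a vector of n_embd elements.
other_state_elems = 2 * n_embd

# time_state is assumed to hold one head_size x head_size matrix per head.
time_state_elems = n_head * head_size * head_size

ratio = time_state_elems / other_state_elems
print(other_state_elems, time_state_elems, ratio)
```

Under these assumptions the per-layer time_state holds far more elements than the two other vectors combined, which fits the explanation that tuning it alone covers the core of the state.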

shouldsee · 2024-08-02T05:16:47Z

OK, understood. Thanks!
