Regarding RUN_CUDA_RWKV6: it would be better to implement this part in PyTorch, otherwise it is hard to port #252
Comments
Thanks for your interest. Inference does not require CUDA (although CUDA does make prefill faster): https://github.com/BlinkDL/ChatRWKV/blob/main/RWKV_v6_demo.py And this is the chat demo (it uses \n\n as the stop sequence, because I replace every \n\n in the user's input with \n).
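For illustration, a minimal sketch of that stop-sequence convention (function names are hypothetical, not taken from the demo): sanitize the user's input so it can never contain \n\n, then cut generation at the first \n\n produced by the model.

```python
# Hypothetical sketch of the "\n\n" stop-sequence convention described above.

def sanitize_user_input(text: str) -> str:
    # Collapse any "\n\n" in user input to "\n" so the stop sequence
    # can only ever be produced by the model itself.
    while "\n\n" in text:
        text = text.replace("\n\n", "\n")
    return text

def truncate_at_stop(generated: str, stop: str = "\n\n") -> str:
    # Cut the model output at the first occurrence of the stop sequence.
    idx = generated.find(stop)
    return generated if idx == -1 else generated[:idx]
```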
Hi, this repo is currently maintained by me, and I'm working with the RWKV team, so you can treat it as an official version. The whole model is still in PyTorch, except for the wkv kernel. If you look at Transformers, the attention kernel is likewise written in CUDA/C (inside PyTorch) or in Triton. It's the same story for both RWKV and Transformers: if we used native torch ops for the same computation it would be really slow, roughly 50x slower, because in torch's eager mode a 4096-token prefill launches on the order of 10,000 small kernels. So a fused kernel written in CUDA or Triton is necessary. You can look at rwkv-fla for more details. Thank you! By the way, rwkv-kit can initialize RWKV 0x60 from scratch.
You can also consider rwkv.cpp/llama.cpp, and we provide ONNX and pure-torch code as well: https://github.com/TorchRWKV/flash-linear-attention/blob/main/fla/ops/rwkv6/recurrent_naive.py
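To make the portability point concrete, below is a minimal pure-PyTorch sketch of the per-token RWKV6 wkv recurrence that the fused CUDA/Triton kernel accelerates. It follows the general shape of the linked recurrent_naive.py but is not copied from it; the tensor shapes and the decay parameterization (here w is assumed to already hold the log of a per-channel decay in (0, 1)) are assumptions.

```python
import torch

def naive_rwkv6_recurrence(q, k, v, w, u):
    """Sketch of the RWKV6 wkv recurrence in plain PyTorch (eager, O(T) loop).

    Assumed shapes (not necessarily identical to the fla reference):
      q, k, w: [B, H, T, K]   (w = log of per-channel decay, so w.exp() is in (0, 1))
      v:       [B, H, T, V]
      u:       [H, K]         (per-head "bonus" applied to the current token)
    Returns o: [B, H, T, V]
    """
    B, H, T, K = q.shape
    V = v.shape[-1]
    state = torch.zeros(B, H, K, V, dtype=q.dtype, device=q.device)
    o = torch.zeros_like(v)
    for t in range(T):
        # outer product k_t v_t^T: [B, H, K, V]
        kv = k[:, :, t, :, None] * v[:, :, t, None, :]
        # current token gets the u bonus; past tokens come from the decayed state
        o[:, :, t] = ((state + u[None, :, :, None] * kv)
                      * q[:, :, t, :, None]).sum(dim=-2)
        # decay the old state and accumulate the new kv
        state = state * w[:, :, t, :, None].exp() + kv
    return o
```

The fused kernel replaces this Python-level loop over T with a single launch, which is where the roughly 50x eager-mode gap mentioned in the previous comment comes from.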
I looked at the direction of the paper and it's great, but the overall design is very unfriendly to people who actually want to take it further. Those who want to use this framework generally hope to port it to edge devices, yet the core code is implemented in CUDA, which makes porting very troublesome and requires manually verifying that the results match. It seems every generation except v1 has been done this way?
I also tested the demo, and the recommendation around the stop token doesn't feel great either. For such a good theoretical framework, I'd suggest making the design more convenient for people to experiment with; only then does it stand a chance of being adopted in practice.
Just for reference.