Is the 1-D depthwise conv still critical for RWKV? #4

Chong-Chen-UNLV · 2022-10-21T22:51:43Z

It seems RWKV-v4 doesn't use the 1-D depthwise kernel you developed earlier. Is the 1-D depthwise CUDA kernel in this repo still a critical operator for RWKV?

I'm asking because I'd like to contribute to this project and want to know which CUDA kernel to work on: the code in the 1-D depthwise folder, or the code in the WKV folders of this repo?

BlinkDL · 2022-10-22T01:53:38Z

The latest RWKV-4 is only using the WKV kernel :)

Chong-Chen-UNLV · 2022-10-31T18:05:51Z

> The latest RWKV-4 is only using the WKV kernel :)

I see that the WKV kernel uses the exponential function. That operator costs many FLOPs, so the WKV kernel should have a high FLOP/Byte ratio. If so, I believe there is no need to optimize the WKV kernel further, because it should already be able to use 100% of the GPU's floating-point compute capability. Please let me know if I am wrong, or if your test results indicate that the WKV kernel can't make use of the full power of the GPU.
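To make the FLOP/Byte argument concrete, here is a minimal roofline-style sketch in Python. The function names and all numbers are hypothetical and chosen only for illustration; they are not measurements of the WKV kernel or of any real GPU. The idea is that a kernel counts as compute-bound in this model only if its FLOP/Byte ratio is at least the GPU's peak FLOP/s divided by its peak memory bandwidth.

```python
def arithmetic_intensity(flops_per_element, bytes_per_element):
    """FLOPs performed per byte moved to or from global memory."""
    return flops_per_element / bytes_per_element


def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_flops, intensity * peak_bandwidth)


def is_compute_bound(intensity, peak_flops, peak_bandwidth):
    """True when the intensity reaches the machine balance (FLOP per byte)."""
    return intensity >= peak_flops / peak_bandwidth


if __name__ == "__main__":
    # Hypothetical placeholders: replace with counts taken from the real
    # kernel and with the datasheet figures of the GPU being tested.
    intensity = arithmetic_intensity(flops_per_element=40.0,
                                     bytes_per_element=8.0)
    peak_flops = 100.0      # arbitrary units of FLOP/s
    peak_bandwidth = 10.0   # arbitrary units of byte/s
    print(intensity)
    print(attainable_flops(intensity, peak_flops, peak_bandwidth))
    print(is_compute_bound(intensity, peak_flops, peak_bandwidth))
```

Note that this model only gives an upper bound. Whether a kernel actually reaches that bound still has to be checked with a profiler, which is why the test results asked about above matter.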
