
Does Bitnet support ROCm testing? #281

Open
ym-guan opened this issue Jan 14, 2025 · 4 comments
Labels: enhancement (New feature or request)


ym-guan commented Jan 14, 2025

Hi Lei, I'm trying to reproduce the BitNet results on an MI308, but I'm running into some errors.

Running the main branch gives me this error:

[screenshot of error]

I tried modifying the code to support ROCm like this:

[screenshot of changes]

Unfortunately I failed; it told me:

[screenshot of error]

I guess there is no scheduler function for the MI308 in the tl backend.

So I checked out the "amd_hip" branch, ran it, and found this error:

[screenshot of error]

That branch uses the tir backend, and there are a lot of differences between "amd_hip" and "main". I have two questions: does the tl backend support ROCm? Does BitNet support testing on ROCm?

Please tell me how to test BitNet on ROCm, thanks!

My system:

ubuntu==20.04
rocm==6.3.1
torch==2.5.1+rocm6.2
GPU==MI308

LeiWang1999 (Contributor) commented Jan 14, 2025

@ym-guan Thank you for reporting this. We've recently migrated the backend from TensorIR to TileLang. While we do plan to support ROCm and CDNA matrix-core code, and already have implementations, they haven't been migrated yet. I'll work on adding this as soon as possible; I'm currently juggling two upcoming deadlines, so I'll take a look at this next week.

@LeiWang1999 LeiWang1999 self-assigned this Jan 14, 2025
@LeiWang1999 LeiWang1999 added the enhancement New feature or request label Jan 14, 2025

LeiWang1999 commented Jan 14, 2025

@ym-guan Btw, I noticed that you've made some great modifications to support ROCm; awesome work! It looks like the bug might be in this section, where the kernel builder is dispatched to CUDA instead of HIP in certain places.
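For context, the failure mode described here (the builder falling back to CUDA on a HIP target) can be sketched as a small target-dispatch check. The function name and target strings below are illustrative only, not BitBLAS's actual API:

```python
# Hypothetical sketch of target-based kernel-builder dispatch; names are
# illustrative, not the real BitBLAS code path. The bug class described above
# is a dispatcher that falls back to CUDA even when the target is HIP/ROCm.

def select_kernel_builder(target: str) -> str:
    """Pick a code-generation backend from a TVM-style target string."""
    t = target.lower()
    if t.startswith("hip") or "rocm" in t:
        return "hip"    # route MI-series (CDNA) GPUs to the HIP builder
    if t.startswith("cuda"):
        return "cuda"
    raise ValueError(f"unsupported target: {target}")
```

If any branch of the real dispatcher omits the HIP case, every call site that reaches it will silently generate CUDA code, which matches the symptom above.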

Feel free to ask any questions :)

ym-guan (Author) commented Jan 14, 2025

Thank you @LeiWang1999! I followed your instructions and modified the code further. I did get the matrix-core path working (https://github.com/microsoft/BitBLAS/blob/main/testing/python/amd/test_backend_hip_wrapper_matmul.py#L54). However, it doesn't support matmul_backend_code_wrap(1, 256, 256, "float16", "float16", "uint4", "float16", False):

[screenshot of error]

This is because the scheduler_ir_module cannot be lowered correctly by tl.lower() (https://github.com/microsoft/BitBLAS/blob/main/bitblas/ops/operator.py#L195); it told me:

[screenshot of error]

I guess the MatmulDequantizeScheduler cannot be translated to HIP correctly. If it can be, please correct me.

In short, when I test a matmul with a config like int8xint2, I get this error:

/tl/tl_templates/cuda/common.h:3:10: fatal error: 'cuda_runtime.h' file not found

But it should be compiled through HIP. How can I do that?
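The 'cuda_runtime.h' file not found error suggests the compile step is handing the HIP toolchain CUDA-only templates. A minimal sketch of target-aware compiler selection is below; hipcc and nvcc are the real compiler drivers, but the function and flags are hypothetical, not the actual BitBLAS compile path:

```python
# Hypothetical sketch, not the real BitBLAS compile path: both the runtime
# header template and the compiler driver must match the target, otherwise
# the HIP toolchain is handed CUDA-only headers such as cuda_runtime.h.

def compile_command(target: str, src: str, out: str) -> list:
    if target == "hip":
        # HIP kernels include <hip/hip_runtime.h> and compile with hipcc
        return ["hipcc", "-O3", "--shared", "-fPIC", src, "-o", out]
    if target == "cuda":
        # CUDA kernels include <cuda_runtime.h> and compile with nvcc
        return ["nvcc", "-O3", "--shared", "-Xcompiler=-fPIC", src, "-o", out]
    raise ValueError(f"unknown target: {target}")
```

In the reported error both halves went wrong the same way: the template under /tl/tl_templates/cuda/ was selected, so the generated source pulls in cuda_runtime.h regardless of which compiler runs.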

Thanks for your reply!

LeiWang1999 (Contributor) commented:

PR #282 made a simple fix, but it only supports target inference and consistent matmul (A and W in the same datatype); the dequantize path may take a few days to implement.
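To illustrate what the "consistent matmul" restriction means in practice (the helper below is hypothetical, not part of BitBLAS): only configurations where the activation and weight datatypes match are covered by the fix, while mixed-precision dequantize configs are not yet.

```python
# Illustrative helper (not BitBLAS code) for the restriction described above:
# the fix covers only "consistent" matmuls where A and W share a datatype.

def is_supported_by_fix(a_dtype: str, w_dtype: str) -> bool:
    """True if this config falls under the PR #282 fix as described."""
    return a_dtype == w_dtype

# float16 x float16 is a consistent matmul; float16 x uint4 needs the
# dequantize path, which is the part still to be implemented.
```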
