cmake : fix ARM feature detection #10543
I am not sure that I understand what is happening. Why does the ARM feature detection in ggml report that i8mm is supported if it is not? Why does the repacked Q4_0 work with i8mm enabled? |
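One thing that may help when untangling this: what the compiler was told to target and what the CPU reports at runtime are two separate questions. A minimal sketch of the compile-time half, assuming the build relies on the standard ACLE macros `__ARM_FEATURE_DOTPROD` and `__ARM_FEATURE_MATMUL_INT8`; the helper names here are hypothetical and are not part of ggml:

```cpp
#include <cstdio>

// True only if this translation unit was compiled with flags that enable
// the Arm dot-product extension (the compiler then defines the ACLE macro).
constexpr bool build_has_dotprod() {
#if defined(__ARM_FEATURE_DOTPROD)
    return true;
#else
    return false;
#endif
}

// True only if this translation unit was compiled with flags that enable
// int8 matrix multiply (i8mm).
constexpr bool build_has_i8mm() {
#if defined(__ARM_FEATURE_MATMUL_INT8)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: print what the build was compiled for.
// This says nothing about what the CPU running the binary supports.
void print_build_features() {
    std::printf("compiled with dotprod: %d\n", build_has_dotprod() ? 1 : 0);
    std::printf("compiled with i8mm:    %d\n", build_has_i8mm() ? 1 : 0);
}
```

Calling `print_build_features()` from two builds made with different flags should show whether the i8mm code paths were compiled in at all, independent of what runtime detection reports.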
So far I have 3 datapoints:
I don't understand yet why M2 Ultra does not work. |
I wonder if OS support is also necessary. I get this on M3 Max:
$ sysctl -a | grep hw.optional.arm.
hw.optional.arm.FEAT_FlagM: 1
hw.optional.arm.FEAT_FlagM2: 1
hw.optional.arm.FEAT_FHM: 1
hw.optional.arm.FEAT_DotProd: 1
hw.optional.arm.FEAT_SHA3: 1
hw.optional.arm.FEAT_RDM: 1
hw.optional.arm.FEAT_LSE: 1
hw.optional.arm.FEAT_SHA256: 1
hw.optional.arm.FEAT_SHA512: 1
hw.optional.arm.FEAT_SHA1: 1
hw.optional.arm.FEAT_AES: 1
hw.optional.arm.FEAT_PMULL: 1
hw.optional.arm.FEAT_SPECRES: 0
hw.optional.arm.FEAT_SB: 1
hw.optional.arm.FEAT_FRINTTS: 1
hw.optional.arm.FEAT_LRCPC: 1
hw.optional.arm.FEAT_LRCPC2: 1
hw.optional.arm.FEAT_FCMA: 1
hw.optional.arm.FEAT_JSCVT: 1
hw.optional.arm.FEAT_PAuth: 1
hw.optional.arm.FEAT_PAuth2: 1
hw.optional.arm.FEAT_FPAC: 1
hw.optional.arm.FEAT_DPB: 1
hw.optional.arm.FEAT_DPB2: 1
hw.optional.arm.FEAT_BF16: 1
>>hw.optional.arm.FEAT_I8MM: 1
hw.optional.arm.FEAT_WFxT: 0
hw.optional.arm.FEAT_RPRES: 1
hw.optional.arm.FEAT_ECV: 1
hw.optional.arm.FEAT_AFP: 1
hw.optional.arm.FEAT_LSE2: 1
hw.optional.arm.FEAT_CSV2: 1
hw.optional.arm.FEAT_CSV3: 1
hw.optional.arm.FEAT_DIT: 1
hw.optional.arm.FEAT_FP16: 1
hw.optional.arm.FEAT_SSBS: 1
hw.optional.arm.FEAT_BTI: 1
hw.optional.arm.FEAT_SME: 0
hw.optional.arm.FEAT_SME2: 0
hw.optional.arm.SME_F32F32: 0
hw.optional.arm.SME_BI32I32: 0
hw.optional.arm.SME_B16F32: 0
hw.optional.arm.SME_F16F32: 0
hw.optional.arm.SME_I8I32: 0
hw.optional.arm.SME_I16I32: 0
hw.optional.arm.FEAT_SME_F64F64: 0
hw.optional.arm.FEAT_SME_I16I64: 0
hw.optional.arm.FP_SyncExceptions: 1
hw.optional.armv8_1_atomics: 1
hw.optional.armv8_2_fhm: 1
hw.optional.armv8_2_sha512: 1
hw.optional.armv8_2_sha3: 1
hw.optional.armv8_3_compnum: 1
hw.optional.armv8_crc32: 1
hw.optional.armv8_gpi: 1
hw.optional.arm64: 1 |
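For reference, the same key can be read programmatically. A minimal sketch, assuming macOS and the `sysctlbyname` call from `<sys/sysctl.h>`; the wrapper name `query_hw_optional` is hypothetical, and on other platforms it simply reports that the key is unavailable:

```cpp
#include <cassert>
#include <cstddef>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Returns 1 if the key is reported as present, 0 if it is reported as
// absent, and -1 if the key cannot be read (including on non-Apple platforms).
int query_hw_optional(const char * name) {
    assert(name != nullptr);
#if defined(__APPLE__)
    int    value = 0;
    size_t size  = sizeof(value);
    if (sysctlbyname(name, &value, &size, nullptr, 0) != 0) {
        return -1; // key unknown to this OS version
    }
    return value != 0 ? 1 : 0;
#else
    (void) name;
    return -1;
#endif
}
```

`query_hw_optional("hw.optional.arm.FEAT_I8MM")` would correspond to the line marked with `>>` in the output above. Note that this only reflects what the OS reports, which is the open question in this thread.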
M2 Ultra also reports
28c28
< hw.optional.arm.FEAT_RPRES: 1
---
> hw.optional.arm.FEAT_RPRES: 0
30c30
< hw.optional.arm.FEAT_AFP: 1
---
> hw.optional.arm.FEAT_AFP: 0
48a49
> hw.optional.arm.caps: 868632327146696703
|
Btw, it's weird because:
# this passes all tests on M2 Ultra
make -j && ./bin/test-backend-ops -o MUL_MAT -b Metal
# this fails the 2 Q4_0 tests as shown earlier
make -j && ./bin/test-backend-ops -o MUL_MAT -b CPU
But both commands would run the CPU backend, correct? |
Correction, the
rm -rf build-mm
mkdir build-mm
cd build-mm
cmake ..
make -j && ./bin/test-backend-ops -o MUL_MAT -b CPU |
It also fails for me. The only way I can see this happening is if the second matrix multiplication on the CPU produces different results, which I don't understand how it could be happening.
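To check whether two CPU runs of the same multiplication really diverge, the outputs can be compared directly. A minimal sketch, assuming a normalized mean squared error as the comparison metric; `matmul_ref` and `nmse` below are illustrative stand-ins written for this comment, not the functions used by test-backend-ops:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference row-major matmul: C[m x n] = A[m x k] * B[k x n].
std::vector<float> matmul_ref(const std::vector<float> & a,
                              const std::vector<float> & b,
                              size_t m, size_t k, size_t n) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                sum += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Normalized mean squared error of `out` relative to `ref`.
// Falls back to the raw squared error when `ref` is all zeros.
double nmse(const std::vector<float> & ref, const std::vector<float> & out) {
    assert(ref.size() == out.size());
    double err  = 0.0;
    double norm = 0.0;
    for (size_t i = 0; i < ref.size(); ++i) {
        const double d = static_cast<double>(out[i]) - static_cast<double>(ref[i]);
        err  += d * d;
        norm += static_cast<double>(ref[i]) * static_cast<double>(ref[i]);
    }
    return norm > 0.0 ? err / norm : err;
}
```

Two runs of a deterministic kernel on the same inputs should give an error of exactly zero; a nonzero value would point at the second run taking a different code path or reading corrupted memory.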
|
Almost positive that it is related to the compile flags somehow because it does not fail before 25669aa. Also, using the
cd llama.cpp
make -j tests && ./tests/test-backend-ops -o MUL_MAT -b CPU
And the only difference is that with the make, we don't pass
Edit: Ah, but he |
In case it isn't clear, a sentinel mismatch means that there is a buffer overflow. I tried allocating a different buffer per tensor to see where it happens, and I got this:
To reproduce:
diff --git a/tests/test-backend-ops.cpp b/tests/test-backend-ops.cpp
index da66ed85..31fe4c33 100644
--- a/tests/test-backend-ops.cpp
+++ b/tests/test-backend-ops.cpp
@@ -473,6 +473,10 @@ struct test_case {
return false;
}
+ for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
+ t->data = malloc(ggml_nbytes(t));
+ }
+
// build graph
ggml_build_forward_expand(gf, out); |
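For anyone unfamiliar with the sentinel idea: guard bytes with a known pattern are placed right after a buffer, and a changed pattern afterwards means something wrote past the end. A minimal sketch, assuming a 16-byte guard filled with `0xA5`; the `guarded_buffer` type and its constants are hypothetical and do not mirror how test-backend-ops implements its check:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative values, not taken from ggml.
constexpr size_t  GUARD_SIZE = 16;
constexpr uint8_t GUARD_BYTE = 0xA5;

// A payload of `n` bytes followed by GUARD_SIZE sentinel bytes.
struct guarded_buffer {
    std::vector<uint8_t> storage;
    size_t               payload_size;

    explicit guarded_buffer(size_t n)
        : storage(n + GUARD_SIZE, GUARD_BYTE), payload_size(n) {
        assert(storage.size() == n + GUARD_SIZE);
        std::memset(storage.data(), 0, n); // zero the payload, keep the guard pattern
    }

    uint8_t * data() { return storage.data(); }

    // False if any guard byte was overwritten, i.e. a write ran past the payload.
    bool sentinel_intact() const {
        for (size_t i = payload_size; i < storage.size(); ++i) {
            if (storage[i] != GUARD_BYTE) {
                return false;
            }
        }
        return true;
    }
};
```

A kernel that writes even one byte beyond `payload_size` flips `sentinel_intact()` to false. The patch above takes a complementary approach: giving each tensor its own `malloc` allocation makes it easier to see which tensor is being overrun.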
|
5d7868c to 0adfd0f
The changes look good to me. I'll approve if requested or once more feedback is gathered. |
cont #10487
Fix MSVC I8MM feature detection + add logs.