
Refactoring: add helper class to bind qnn tensor -> ggml tensor #2

Conversation

chraac

@chraac chraac commented Jun 17, 2024

  • Self Reported Review Complexity:
    • Review Complexity : Low
    • Review Complexity : Medium
    • Review Complexity : High
  • I have read the contributing guidelines

As I said in your upstream PR, it would be better to have a function that wraps a ggml_tensor into a Qnn_Tensor_t, so I've created this PR for it.
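For context, here is a rough sketch of the kind of binding the helper provides (my own illustration of the idea, not the PR's actual class, which also creates and owns the corresponding Qnn_Tensor_t): it keeps the name, rank, dimensions and data pointer that a Qnn_Tensor_t needs in one place, instead of converting them ad hoc at every call site.

#include <array>
#include <cstdint>
#include "ggml.h"

// Illustrative wrapper around a ggml_tensor; the QNN-side population is elided
// because the Qnn_Tensor_t layout is SDK-version dependent.
class ggml_qnn_tensor_binding {
public:
    explicit ggml_qnn_tensor_binding(ggml_tensor *tensor) : _tensor(tensor) {
        // ggml stores dimensions as int64_t ne[GGML_MAX_DIMS]; QNN expects uint32_t
        for (int i = 0; i < GGML_MAX_DIMS; ++i) {
            _dimensions[i] = static_cast<uint32_t>(tensor->ne[i]);
        }
    }

    const char *name() const { return _tensor->name; }
    uint32_t rank() const { return static_cast<uint32_t>(ggml_n_dims(_tensor)); }
    const uint32_t *dims() const { return _dimensions.data(); }
    void *data() const { return _tensor->data; }

private:
    ggml_tensor *_tensor = nullptr;
    std::array<uint32_t, GGML_MAX_DIMS> _dimensions = {};
};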

A test run on the CPU backend works well:
[screenshot]

A run on the NPU backend also works well:
[screenshot]

@chraac chraac force-pushed the dev-function-to-map-tensor branch from 4d70039 to 65a14d9 Compare June 18, 2024 15:09
@chraac chraac requested a review from zhouwg June 19, 2024 02:48
@chraac chraac force-pushed the dev-function-to-map-tensor branch from 7a77028 to dfe159f Compare June 19, 2024 03:16
@myan-o

myan-o commented Aug 18, 2024

@chraac

Thank you for the development. I used the dev-refactoring branch, but creating a tensor for a matmul node fails with an error and doesn't work.

Device: Snapdragon 8 Gen 3, 16 GB

llama-server -m models/Kitsunebi-v1-Gemma2-8k-9B.Q4_K_M.gguf -ngl 40

...
[ggml_qnn_graph, 27]: graph name MUL_MAT_3584x2048x1x1_3584x2x1x1_2048x2x1x1
[ggml_qnn_graph, 75]: can't create qnn graph handle with graph name MUL_MAT_3584x2048x1x1_3584x2x1x1_2048x2x1x1, error = 6003
diff --git a/ggml/src/ggml-backend.c b/ggml/src/ggml-backend.c
index a8eafac4..e2b421e2 100644
--- a/ggml/src/ggml-backend.c
+++ b/ggml/src/ggml-backend.c
@@ -287,6 +287,7 @@ bool ggml_backend_supports_op(ggml_backend_t backend, const struct ggml_tensor *
 }

 bool ggml_backend_supports_buft(ggml_backend_t backend, ggml_backend_buffer_type_t buft) {
+    if (NULL == backend->iface.supports_buft) return true;
     return backend->iface.supports_buft(backend, buft);
 }

@chraac
Author

chraac commented Aug 18, 2024

@chraac

Thank you for the development. I used the dev-refactoring branch, but creating a tensor for a matmul node fails with an error and doesn't work.

Device: Snapdragon 8 Gen 3, 16 GB

llama-server -m models/Kitsunebi-v1-Gemma2-8k-9B.Q4_K_M.gguf -ngl 40

...
[ggml_qnn_graph, 27]: graph name MUL_MAT_3584x2048x1x1_3584x2x1x1_2048x2x1x1
[ggml_qnn_graph, 75]: can't create qnn graph handle with graph name MUL_MAT_3584x2048x1x1_3584x2x1x1_2048x2x1x1, error = 6003
diff --git a/ggml/src/ggml-backend.c b/ggml/src/ggml-backend.c
index a8eafac4..e2b421e2 100644
--- a/ggml/src/ggml-backend.c
+++ b/ggml/src/ggml-backend.c
@@ -287,6 +287,7 @@ bool ggml_backend_supports_op(ggml_backend_t backend, const struct ggml_tensor *
 }

 bool ggml_backend_supports_buft(ggml_backend_t backend, ggml_backend_buffer_type_t buft) {
+    if (NULL == backend->iface.supports_buft) return true;
     return backend->iface.supports_buft(backend, buft);
 }

Hi @myan-o, thanks for the feedback. As I said before, in ggml the input tensor of the matmul operator needs to be transposed, and to achieve that I have a lot more refactoring work to do, so the mulmat operator is still under construction. For more information, have a look here: chraac@63dc587
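For anyone following along, a minimal sketch of ggml's mul_mat dimension convention (my own illustration, not code from this PR), which is why the first operand effectively needs a transpose before it can be fed to a QNN MatMul that multiplies its operands as given:

#include <cstddef>

// For ggml_mul_mat(ctx, src0, src1): src0 has ne = [K, M], src1 has ne = [K, N],
// and dst has ne = [M, N] with dst[n][m] = sum_k src0[m][k] * src1[n][k],
// i.e. src0 acts as a transposed weight matrix.
static void mul_mat_reference(const float *src0, const float *src1, float *dst,
                              size_t K, size_t M, size_t N) {
    for (size_t n = 0; n < N; ++n) {
        for (size_t m = 0; m < M; ++m) {
            float sum = 0.0f;
            for (size_t k = 0; k < K; ++k) {
                sum += src0[m * K + k] * src1[n * K + k]; // both operands are indexed along K
            }
            dst[n * M + m] = sum;
        }
    }
}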

@myan-o

myan-o commented Aug 18, 2024

@chraac

Thank you for your answer. So does that mean that matmul operations are not implemented yet?

I have also made a pull request for some minor fixes, so please take a look.

@myan-o

myan-o commented Aug 18, 2024

@chraac

The Termux development environment lacks parts of the C++ standard library, so the build fails. Missing symbols:

  • std::aligned_alloc
  • atomic_is_lock_free
[  5%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-qnn/utils.cpp.o
/data/data/com.termux/files/home/git/llama.cpp/ggml/src/ggml-qnn/utils.cpp:124:23: error: reference to unresolved using declaration
  124 |     void *data = std::aligned_alloc(alignment, size_aligned);
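A possible workaround sketch for this (an assumption on my side, not part of the branch): std::aligned_alloc is only available in bionic from Android API level 28, so falling back to posix_memalign keeps older Termux/NDK targets building. The function name is illustrative.

#include <cstdlib>

static void *qnn_aligned_alloc(size_t alignment, size_t size) {
#if defined(__ANDROID__) && __ANDROID_API__ < 28
    void *ptr = nullptr;
    // posix_memalign needs a power-of-two alignment that is a multiple of sizeof(void *)
    if (posix_memalign(&ptr, alignment, size) != 0) {
        return nullptr;
    }
    return ptr;
#else
    return std::aligned_alloc(alignment, size);
#endif
}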

@FranzKafkaYu

@chraac Sorry to bother you. I'm also deploying an LLM with llama.cpp on a Snapdragon 8 Gen 2 device, using the Qwen2 0.5B model. I've noticed that your repository has multiple branches; which branch should I use for testing?

@chraac
Author

chraac commented Aug 28, 2024

@chraac Sorry to bother you. I'm also deploying an LLM with llama.cpp on a Snapdragon 8 Gen 2 device, using the Qwen2 0.5B model. I've noticed that your repository has multiple branches; which branch should I use for testing?

Hi @FranzKafkaYu, you can use the dev-refactoring branch, but as I said earlier, mulmat still has issues and is still being worked on.

@FranzKafkaYu

@chraac Sorry to bother you. I'm also deploying an LLM with llama.cpp on a Snapdragon 8 Gen 2 device, using the Qwen2 0.5B model. I've noticed that your repository has multiple branches; which branch should I use for testing?

Hi @FranzKafkaYu, you can use the dev-refactoring branch, but as I said earlier, mulmat still has issues and is still being worked on.

I had time to test this branch today, but it fails to compile. Build command:

 mkdir build-android && cd build-android && cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=latest -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod -DGGML_QNN=ON -DGGML_QNN_SDK_PATH=/home/franzkafka/Desktop/qnn/qairt/2.22.6.240515  .. && make -j4  

Relevant error log:

-- Using latest available ANDROID_PLATFORM: 33.
-- The C compiler identification is Clang 14.0.7
-- The CXX compiler identification is Clang 14.0.7
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/franzkafka/Desktop/ndk/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/franzkafka/Desktop/ndk/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1") 
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE  
-- Found OpenMP_C: -fopenmp=libomp (found version "5.0") 
-- Found OpenMP_CXX: -fopenmp=libomp (found version "5.0") 
-- Found OpenMP: TRUE (found version "5.0")  
-- OpenMP found
-- Using llamafile
QNN_SDK_PATH: /home/franzkafka/Desktop/qnn/qairt/2.22.6.240515
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Configuring done
-- Generating done
-- Build files have been written to: /home/franzkafka/Desktop/llama/llama.cpp/build-android
[  0%] Generating build details from Git
[  1%] Building C object examples/gguf-hash/CMakeFiles/xxhash.dir/deps/xxhash/xxhash.c.o
[  2%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[  3%] Building C object examples/gguf-hash/CMakeFiles/sha256.dir/deps/sha256/sha256.c.o
-- Found Git: /usr/bin/git (found version "2.34.1") 
[  4%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml.c:2065:5: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
    GGML_F16_VEC_REDUCE(sumf, sum);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml.c:1166:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
    #define GGML_F16_VEC_REDUCE         GGML_F32Cx4_REDUCE
                                        ^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml.c:1156:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
    #define GGML_F32Cx4_REDUCE       GGML_F32x4_REDUCE
                                     ^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml.c:1086:11: note: expanded from macro 'GGML_F32x4_REDUCE'
    res = GGML_F32x4_REDUCE_ONE(x[0]);         \
        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml.c:1071:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
#define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
                                 ^~~~~~~~~~~~~
[  4%] Built target build_info
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml.c:2113:9: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
        GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml.c:1166:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
    #define GGML_F16_VEC_REDUCE         GGML_F32Cx4_REDUCE
                                        ^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml.c:1156:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
    #define GGML_F32Cx4_REDUCE       GGML_F32x4_REDUCE
                                     ^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml.c:1086:11: note: expanded from macro 'GGML_F32x4_REDUCE'
    res = GGML_F32x4_REDUCE_ONE(x[0]);         \
        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml.c:1071:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
#define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
                                 ^~~~~~~~~~~~~
[  4%] Building C object examples/gguf-hash/CMakeFiles/sha1.dir/deps/sha1/sha1.c.o
[  4%] Built target sha256
[  4%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-alloc.c.o
[  4%] Built target sha1
[  5%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-backend.c.o
[  6%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-quants.c.o
[  6%] Building CXX object ggml/src/CMakeFiles/ggml.dir/llamafile/sgemm.cpp.o
[  6%] Built target xxhash
[  7%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-qnn/backend-ops.cpp.o
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:2:
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.hpp:5:
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend.hpp:11:
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/graph.hpp:13:
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/op-config.hpp:9:
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/qnn-lib.hpp:254:47: error: no member named 'variant_npos' in namespace 'std'
        if (_backend_name.find("Htp") != std::variant_npos) {
                                         ~~~~~^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/qnn-lib.hpp:361:47: error: no member named 'variant_npos' in namespace 'std'
        if (_backend_name.find("Htp") != std::variant_npos) {
                                         ~~~~~^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/qnn-lib.hpp:412:47: error: no member named 'variant_npos' in namespace 'std'
        if (_backend_name.find("Htp") != std::variant_npos) {
                                         ~~~~~^
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:2:
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.hpp:5:
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend.hpp:11:
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/graph.hpp:13:
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/op-config.hpp:11:
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/tensor.hpp:193:41: error: implicit instantiation of undefined template 'std::array<unsigned int, 4>'
    std::array<uint32_t, GGML_MAX_DIMS> _dimensions = {};
                                        ^
/home/franzkafka/Desktop/ndk/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__tuple:219:64: note: template is declared here
template <class _Tp, size_t _Size> struct _LIBCPP_TEMPLATE_VIS array;
                                                               ^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:246:1: error: static_assert failed due to requirement 'sizeof (kGgmlOpToQnnOp) / sizeof (kGgmlOpToQnnOp[0]) == (GGML_OP_COUNT + GGML_UNARY_OP_COUNT)' "GGML_OP_COUNT does not match the size of the kGgmlOpToQnnOp table"
static_assert(sizeof(kGgmlOpToQnnOp) / sizeof(kGgmlOpToQnnOp[0]) == (GGML_OP_COUNT + GGML_UNARY_OP_COUNT),
^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:248:1: error: static_assert failed due to requirement 'kGgmlOpToQnnOp[GGML_UNARY_OP_GELU + kGgmlUnaryOpStart] != nullptr' "GGML_UNARY_OP_GELU does not correspond to QNN_OP_GELU"
static_assert(kGgmlOpToQnnOp[GGML_UNARY_OP_GELU + kGgmlUnaryOpStart] != nullptr,
^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:294:23: error: no matching function for call to 'get_qnn_graph_from_cache'
    auto *graph_ptr = get_qnn_graph_from_cache<2, 1>(ctx, _GgmlOp, { src0, src1 }, { dst });
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:252:22: note: candidate function template not viable: cannot convert initializer list argument to 'const std::array<ggml_tensor *, 2UL>'
qnn::ggml_qnn_graph *get_qnn_graph_from_cache(ggml_backend_qnn_context *ctx, size_t op,
                     ^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:317:23: error: no matching function for call to 'get_qnn_graph_from_cache'
    auto *graph_ptr = get_qnn_graph_from_cache<1, 1>(ctx, _GgmlOp, { src }, { dst });
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:252:22: note: candidate function template not viable: cannot convert initializer list argument to 'const std::array<ggml_tensor *, 1UL>'
qnn::ggml_qnn_graph *get_qnn_graph_from_cache(ggml_backend_qnn_context *ctx, size_t op,
                     ^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:431:1: error: static_assert failed due to requirement 'sizeof (kQnnUnaryOpsTable) / sizeof (kQnnUnaryOpsTable[0]) == (GGML_OP_COUNT + GGML_UNARY_OP_COUNT)' "GGML_OP_COUNT does not match the size of the kQnnUnaryOpsTable table"
static_assert(sizeof(kQnnUnaryOpsTable) / sizeof(kQnnUnaryOpsTable[0]) == (GGML_OP_COUNT + GGML_UNARY_OP_COUNT),
^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:519:1: error: static_assert failed due to requirement 'sizeof (kQnnBinaryOpsTable) / sizeof (kQnnBinaryOpsTable[0]) == GGML_OP_COUNT' "GGML_OP_COUNT does not match the size of the kQnnBinaryOpsTable table"
static_assert(sizeof(kQnnBinaryOpsTable) / sizeof(kQnnBinaryOpsTable[0]) == GGML_OP_COUNT,
^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:2:
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.hpp:5:
In file included from /home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend.hpp:4:
/home/franzkafka/Desktop/ndk/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/memory:3927:24: error: incompatible pointer types assigning to 'std::__shared_weak_count *' from 'std::__shared_ptr_emplace<qnn::ggml_qnn_tensor, std::allocator<qnn::ggml_qnn_tensor>> *'
        __r.__cntrl_ = __cntrl;
                       ^~~~~~~
/home/franzkafka/Desktop/ndk/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/memory:4444:29: note: in instantiation of function template specialization 'std::shared_ptr<qnn::ggml_qnn_tensor>::__create_with_control_block<qnn::ggml_qnn_tensor, std::__shared_ptr_emplace<qnn::ggml_qnn_tensor, std::allocator<qnn::ggml_qnn_tensor>>>' requested here
    return shared_ptr<_Tp>::__create_with_control_block(__ptr, __hold2.release());
                            ^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/graph.hpp:108:22: note: in instantiation of function template specialization 'std::make_shared<qnn::ggml_qnn_tensor, std::basic_string<char>, const QNNBackend &, void *&, std::shared_ptr<qnn::qnn_instance> &>' requested here
                std::make_shared<ggml_qnn_tensor>(std::string(buffer), _device, _graph_handle, _qnn_instance);
                     ^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:312:5: error: static_assert failed due to requirement 'kGgmlOpToQnnOp[86UL] != nullptr' "GGML_OP does not have a corresponding QNN_OP"
    static_assert(kGgmlOpToQnnOp[_GgmlOp] != nullptr, "GGML_OP does not have a corresponding QNN_OP");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:424:5: note: in instantiation of function template specialization '(anonymous namespace)::qnn_unary_op_impl<86UL>' requested here
    qnn_unary_op_impl<GGML_UNARY_OP_GELU + kGgmlUnaryOpStart>, // GGML_UNARY_OP_GELU
    ^
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:289:5: error: static_assert failed due to requirement 'kGgmlOpToQnnOp[(ggml_op)25U] != nullptr' "GGML_OP does not have a corresponding QNN_OP"
    static_assert(kGgmlOpToQnnOp[_GgmlOp] != nullptr, "GGML_OP does not have a corresponding QNN_OP");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/franzkafka/Desktop/llama/llama.cpp/ggml/src/ggml-qnn/backend-ops.cpp:459:5: note: in instantiation of function template specialization '(anonymous namespace)::qnn_binary_op_impl<GGML_OP_MUL_MAT>' requested here
    qnn_binary_op_impl<GGML_OP_MUL_MAT>, // GGML_OP_MUL_MAT
    ^
13 errors generated.
make[2]: *** [ggml/src/CMakeFiles/ggml.dir/build.make:146: ggml/src/CMakeFiles/ggml.dir/ggml-qnn/backend-ops.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
2 warnings generated.
make[1]: *** [CMakeFiles/Makefile2:1603: ggml/src/CMakeFiles/ggml.dir/all] Error 2
make: *** [Makefile:146: all] Error 2  

I'm not sure where exactly the problem is; maybe the QNN SDK version is wrong?

PS: Could you enable the issue section of your repository? We could discuss such problems there to avoid disturbing other developers.

@chraac
Author

chraac commented Sep 18, 2024

@chraac Sorry to bother you. I'm also deploying an LLM with llama.cpp on a Snapdragon 8 Gen 2 device, using the Qwen2 0.5B model. I've noticed that your repository has multiple branches; which branch should I use for testing?

Hi @FranzKafkaYu, you can use the dev-refactoring branch, but as I said earlier, mulmat still has issues and is still being worked on.

I had time to test this branch today, but it fails to compile. Build command:

 mkdir build-android && cd build-android && cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=latest -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod -DGGML_QNN=ON -DGGML_QNN_SDK_PATH=/home/franzkafka/Desktop/qnn/qairt/2.22.6.240515  .. && make -j4

Relevant error log:

...
I'm not sure where exactly the problem is; maybe the QNN SDK version is wrong?

PS: Could you enable the issue section of your repository? We could discuss such problems there to avoid disturbing other developers.

Hi @FranzKafkaYu, sorry for the late reply. The issue section is now open, and your problem has also been fixed. That was a static assert, deliberately designed so that adding or removing ops cannot silently break the indices of the internal op tables; the detailed design can be discussed on my fork.
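For readers hitting the same errors, a minimal sketch of the guard described above (my own illustration, not the fork's exact tables): the op-to-QNN-op table is sized against the op count, so adding or removing an op without updating the table fails at compile time instead of silently shifting array indices.

// hypothetical op enum and table, for illustration only
enum example_op { EXAMPLE_OP_ADD, EXAMPLE_OP_MUL_MAT, EXAMPLE_OP_COUNT };

constexpr const char *kExampleOpTable[] = {
    "Add",    // EXAMPLE_OP_ADD
    "MatMul", // EXAMPLE_OP_MUL_MAT
};

static_assert(sizeof(kExampleOpTable) / sizeof(kExampleOpTable[0]) == EXAMPLE_OP_COUNT,
              "op table size must match the op count");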

@jorge-abarca

Hello, @chraac and @zhouwg. I wanted to thank you both for your work on this feature, just know that there are others like me that are following this closely. @AndreasKunar has mentioned this effort on his Performance of llama.cpp on Snapdragon X Elite/Plus discussion and the Support for Snapdragon X Elite NPU & GPU issue open in the ollama repo.

There is a bit of interest for those of us who want to use llama.cpp and ollama with Snapdragon X Elite, we are rooting for you!

As I was trying to see if there was anything I could do to give you a hand, I noticed that you seemed to be struggling a bit with a few things that might not be documented such as if the tensor should be freed up or if the SDK managed those resources internally, along with questions related to synchronization operations that might have made you consider waiting for technical support from Qualcomm.

How about engaging @yeonseok-zeticai? As he mentioned in the previously closed PR, he worked at Qualcomm until early this year, he has quite a bit of experience with the Qualcomm AI SDK, and he is interested in getting these things done. (Thank you @yeonseok-zeticai!)

It also appears that Andreas might have more time in October to also take a look at this. Would you like to coordinate any efforts? I know how to program in C++ but I am not as familiar with llama.cpp or ollama as I would like to be; however, I can do my best to learn and aid, too, in any way possible.

@chraac
Author

chraac commented Sep 26, 2024

Hi @jorge-abarca ,

Hello, @chraac and @zhouwg. I wanted to thank you both for your work on this feature, just know that there are others like me that are following this closely. @AndreasKunar has mentioned this effort on his Performance of llama.cpp on Snapdragon X Elite/Plus discussion and the Support for Snapdragon X Elite NPU & GPU issue open in the ollama repo.

There is a bit of interest for those of us who want to use llama.cpp and ollama with Snapdragon X Elite, we are rooting for you!

I'd like to start by thanking everyone for their attention to this project!
While this PR is currently inactive, I'm continuing to work on the refactoring in my own fork: chraac:dev-refactoring. If anyone is interested, please feel free to take a look and provide feedback!

As I was trying to see if there was anything I could do to give you a hand, I noticed that you seemed to be struggling a bit with a few things that might not be documented such as if the tensor should be freed up or if the SDK managed those resources internally, along with questions related to synchronization operations that might have made you consider waiting for technical support from Qualcomm.

We've made significant progress since my last comment. Here's our current status:

  1. The ADD operator is now functional, and the test-backend-ops runs without errors for this operation.
  2. We're currently working on implementing the MUL_MAT operation to pass the test-backend-ops. I've created a PR in my fork addressing this. Your feedback would be greatly appreciated!

How about engaging @yeonseok-zeticai? As he mentioned in the previously closed PR, he worked at Qualcomm until early this year, he has quite a bit of experience with the Qualcomm AI SDK, and he is interested in getting these things done. (Thank you @yeonseok-zeticai!)

Any assistance would be greatly appreciated! Please direct your comments and contributions to my fork: chraac:dev-refactoring

It also appears that Andreas might have more time in October to also take a look at this. Would you like to coordinate any efforts? I know how to program in C++ but I am not as familiar with llama.cpp or ollama as I would like to be; however, I can do my best to learn and aid, too, in any way possible.

I reviewed the issue, and I'm delighted to hear that someone is interested in contributing to my fork. I'd be happy to discuss this further; please feel free to raise issues and submit pull requests (PRs) on my fork. Your input is welcome and appreciated. Thank you!

@Pateo-sunnyhuang

Pateo-sunnyhuang commented Sep 29, 2024

@chraac Hello, I verified the dev-refactoring branch, built with the following command:
mkdir build-android && cd build-android && cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=latest -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod -DGGML_QNN=ON -DGGML_QNN_SDK_PATH=$QNN_SDK_ROOT .. && make -j4
After building I pushed it to the device and ran it with llama-cli, but it still uses the CPU. Is that because MUL_MAT is not implemented yet?
Also, if the QNN .so libraries are not copied to /data/local/tmp, it still runs without reporting an error.

@chraac
Author

chraac commented Sep 29, 2024

@chraac Hello, I verified the dev-refactoring branch, built with the following command: mkdir build-android && cd build-android && cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=latest -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod -DGGML_QNN=ON -DGGML_QNN_SDK_PATH=$QNN_SDK_ROOT .. && make -j4 After building I pushed it to the device and ran it with llama-cli, but it still uses the CPU. Is that because MUL_MAT is not implemented yet? Also, if the QNN .so libraries are not copied to /data/local/tmp, it still runs without reporting an error.

Hi @Pateo-sunnyhuang, thanks for your interest. mat_mul support isn't finished yet (work is still ongoing here), so it is expected that inference still runs on the CPU. Also, the QNN .so libraries are loaded dynamically via dlopen; if they aren't copied over, the QNN backend fails to initialize and should fall back to the CPU backend.
PS: Android's dynamic loader is maintained by Google; as far as I remember it doesn't search LD_LIBRARY_PATH for shared libraries, though I'm not sure whether that has changed.
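To illustrate the dlopen part, a simplified sketch of my own (not the backend's actual loader; the path and library name are examples):

#include <dlfcn.h>
#include <cstdio>
#include <string>

static void *load_qnn_lib(const std::string &search_path, const std::string &lib_name) {
    std::string full_path = search_path + "/" + lib_name; // e.g. /data/local/tmp/libQnnHtp.so
    void *handle = dlopen(full_path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        fprintf(stderr, "failed to load %s: %s\n", full_path.c_str(), dlerror());
        return nullptr; // the caller then falls back to the CPU backend
    }
    return handle;
}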

@Pateo-sunnyhuang

Pateo-sunnyhuang commented Oct 9, 2024

Hi @Pateo-sunnyhuang, thanks for your interest. mat_mul support isn't finished yet (work is still ongoing here), so it is expected that inference still runs on the CPU. Also, the QNN .so libraries are loaded dynamically via dlopen; if they aren't copied over, the QNN backend fails to initialize and should fall back to the CPU backend. PS: Android's dynamic loader is maintained by Google; as far as I remember it doesn't search LD_LIBRARY_PATH for shared libraries, though I'm not sure whether that has changed.

Thanks for the reply. I added some logging to the inference process and found two problems.
1. Inference runs on the CPU because ggml_backend_qnn_init is not called in llama_new_context_with_model.
When running the qwen2.5-1.5b model, qnn model->n_gpu_layers=0 and model->main_gpu=0, so the condition if (model->n_gpu_layers > 0) is never satisfied. When I removed that condition, the program crashed; part of the console log is below:
`
.........................................................................
llama_new_context_with_model: n_ctx = 32768
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: qnn model->n_gpu_layers=0, model->main_gpu=0
ggml-qnn:[ggml_backend_qnn_init, 390]: ggml_backend_qnn_init(0, (null))
ggml-qnn:[ggml_backend_qnn_init, 393]: extend_lib_search_path is nullptr, will use /data/local/tmp/ as default
ggml-qnn:[ggml_backend_qnn_init, 429]: QNN-CPU backend setenv successfully

ggml-qnn:[load_system, 751]: find a valid qnn system interface

ggml-qnn:[qnn_system_interface, 10]: initialize qnn system successfully

ggml-qnn:[load_backend, 766]: lib_path:/data/local/tmp/libQnnCpu.so

ggml-qnn:[load_backend, 788]: num_providers=1

ggml-qnn:[load_backend, 801]: QNN_API_VERSION_MAJOR=2, major=2, QNN_API_VERSION_MINOR=14, minor=14,

ggml-qnn:[load_backend, 814]: find a valid qnn interface

ggml-qnn:[qnn_init, 248]: device property is not supported

ggml-qnn:[qnn_init, 299]: create QNN device successfully

ggml-qnn:[ggml_backend_qnn_init, 449]: qnn device name QNN-CPU
llama_kv_cache_init: CPU KV buffer size = 896.00 MiB
llama_new_context_with_model: KV self size = 896.00 MiB, K (f16): 448.00 MiB, V (f16): 448.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.58 MiB
Segmentation fault
`
2. I want to run inference on the NPU rather than the CPU/GPU.
But as the log above shows, the device parameter passed to ggml_backend_qnn_init is 0, not 2. How can I use the NPU?

Could you give some direction or guidance for troubleshooting these two problems?
Also, the code in ggml_backend_registry_init doesn't seem to be reached; what is the purpose of the QNN code there?

@scguang301

Hi @Pateo-sunnyhuang, thanks for your interest. mat_mul support isn't finished yet (work is still ongoing here), so it is expected that inference still runs on the CPU. Also, the QNN .so libraries are loaded dynamically via dlopen; if they aren't copied over, the QNN backend fails to initialize and should fall back to the CPU backend. PS: Android's dynamic loader is maintained by Google; as far as I remember it doesn't search LD_LIBRARY_PATH for shared libraries, though I'm not sure whether that has changed.

Thanks for the reply. I added some logging to the inference process and found two problems. 1. Inference runs on the CPU because ggml_backend_qnn_init is not called in llama_new_context_with_model. When running the qwen2.5-1.5b model, qnn model->n_gpu_layers=0 and model->main_gpu=0, so the condition if (model->n_gpu_layers > 0) is never satisfied. When I removed that condition, the program crashed; the console log above ends with a segmentation fault. 2. I want to run inference on the NPU rather than the CPU/GPU, but as the log shows, the device parameter passed to ggml_backend_qnn_init is 0, not 2. How can I use the NPU?

Could you give some direction or guidance for troubleshooting these two problems? Also, the code in ggml_backend_registry_init doesn't seem to be reached; what is the purpose of the QNN code there?

llama-cli has a parameter for selecting the device id: -mg, which can be set to 2.
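For example, a run that offloads layers and selects device 2 might look like this (the model path is illustrative):

llama-cli -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf -ngl 99 -mg 2 -p "hello"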

@chraac
Author

chraac commented Oct 12, 2024

Thanks for the reply. I added some logging to the inference process and found two problems. 1. Inference runs on the CPU because ggml_backend_qnn_init is not called in llama_new_context_with_model. When running the qwen2.5-1.5b model, qnn model->n_gpu_layers=0 and model->main_gpu=0, so the condition if (model->n_gpu_layers > 0) is never satisfied. When I removed that condition, the program crashed; the console log above ends with a segmentation fault. 2. I want to run inference on the NPU rather than the CPU/GPU, but as the log shows, the device parameter passed to ggml_backend_qnn_init is 0, not 2. How can I use the NPU?

Could you give some direction or guidance for troubleshooting these two problems? Also, the code in ggml_backend_registry_init doesn't seem to be reached; what is the purpose of the QNN code there?

Hi! Sorry for the slow reply, I've been busy recently.

Regarding the first point, the upstream backend registry has been under heavy refactoring lately, so this area keeps changing and I'm adapting to the new interfaces; for details see this upstream project: https://github.com/users/ggerganov/projects/12

On the second point, my main focus right now is getting mat_mul to pass the test-backend-ops test. That test, like llama-cli, creates the backend through the registry, so the usage should be the same as for test-backend-ops; see @scguang301's comment for details.

Also, if you want to follow the progress of mat_mul support, see this PR on my fork: chraac#2
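For reference, a typical invocation of that test looks like the following; the backend name passed to -b is an assumption and depends on how the QNN backend registers its devices (the log earlier in this thread shows "QNN-CPU" as one such name):

./test-backend-ops test -o MUL_MAT -b QNN-CPU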

zhouwg pushed a commit that referenced this pull request Jan 29, 2025
* vulkan : do not use tensor->extra

This patch allows using the Vulkan backend with the RPC backend as
tensor->extra is no longer used.

Ref: ggerganov#8536

* Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (#2)

---------

Co-authored-by: 0cc4m <[email protected]>
@myan-o

myan-o commented Feb 6, 2025

It's been a while. I'm looking forward to seeing it finished. How far along are you now?

@chraac
Author

chraac commented Feb 6, 2025

It's been a while. I'm looking forward to seeing it finished. How far along are you now?

Hey, this PR is outdated and far behind the current progress. For the latest updates and refactoring, have a look at my fork: https://github.com/chraac/llama.cpp

@zhouwg
Owner

zhouwg commented Feb 6, 2025

It's been a while. I'm looking forward to seeing it finished. How far along are you now?

Hey, this PR is outdated and far behind the current progress. For the latest updates and refactoring, have a look at my fork: https://github.com/chraac/llama.cpp

I downloaded your https://github.com/chraac/llama.cpp but the build failed. Which QNN SDK are you using?

cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=latest -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod -DGGML_QNN=ON -DGGML_QNN_SDK_PATH=/opt/qcom/aistack/qnn/2.20.0.240223  .. && make -j4
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/utils.cpp:334:14: error: use of undeclared identifier 'QNN_GRAPH_ERROR_INVALID_CONTEXT'
        case QNN_GRAPH_ERROR_INVALID_CONTEXT:
             ^
1 error generated.
make[2]: *** [ggml/src/ggml-qnn/CMakeFiles/ggml-qnn.dir/build.make:174: ggml/src/ggml-qnn/CMakeFiles/ggml-qnn.dir/utils.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:88:9: error: use of undeclared identifier 'QNN_OP_RMS_NORM'; did you mean 'GGML_OP_RMS_NORM'?
        QNN_OP_RMS_NORM,               // qnn_op_name
        ^~~~~~~~~~~~~~~
        GGML_OP_RMS_NORM
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/../include/ggml.h:454:9: note: 'GGML_OP_RMS_NORM' declared here
        GGML_OP_RMS_NORM,
        ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:91:9: error: use of undeclared identifier 'QNN_OP_RMS_NORM_PARAM_EPSILON'
        QNN_OP_RMS_NORM_PARAM_EPSILON, // qnn_param_name
        ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:192:15: error: static assertion expression is not an integral constant expression
static_assert(kOpCaps[GGML_OP_NONE].calc_dims_func == nullptr, "GGML_OP_NONE should not have calc_dims_func function");
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:192:15: note: initializer of 'kOpCaps' is unknown
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:32:31: note: declared here
constexpr const qnn_op_caps_t kOpCaps[] = {
                              ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:193:15: error: static assertion expression is not an integral constant expression
static_assert(kOpCaps[GGML_OP_ADD].calc_dims_func == element_wise_op_dims,
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:193:15: note: initializer of 'kOpCaps' is unknown
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:32:31: note: declared here
constexpr const qnn_op_caps_t kOpCaps[] = {
                              ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:195:15: error: static assertion expression is not an integral constant expression
static_assert(kOpCaps[GGML_OP_MUL_MAT].calc_dims_func == mat_mul_op_dims,
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:195:15: note: initializer of 'kOpCaps' is unknown
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:32:31: note: declared here
constexpr const qnn_op_caps_t kOpCaps[] = {
                              ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:197:15: error: static assertion expression is not an integral constant expression
static_assert(kOpCaps[GGML_OP_LOG].calc_dims_func == element_wise_op_dims,
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:197:15: note: initializer of 'kOpCaps' is unknown
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:32:31: note: declared here
constexpr const qnn_op_caps_t kOpCaps[] = {
                              ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:199:15: error: static assertion expression is not an integral constant expression
static_assert(kOpCaps[GGML_OP_COUNT + GGML_UNARY_OP_GELU].input_param_count == 1,
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:199:15: note: initializer of 'kOpCaps' is unknown
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:32:31: note: declared here
constexpr const qnn_op_caps_t kOpCaps[] = {
                              ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:201:15: error: no matching function for call to 'size'
static_assert(std::size(kOpCaps) == (GGML_OP_COUNT + GGML_UNARY_OP_COUNT),
              ^~~~~~~~~
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:28:16: note: candidate template ignored: substitution failure [with _Cont = qnn_op_caps_t[]]: member reference base type 'const (anonymous namespace)::qnn_op_caps_t[]' is not a structure or union
constexpr auto size(const _Cont& __c)
               ^
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:35:18: note: candidate template ignored: could not match 'const _Tp[_Sz]' against 'const const qnn_op_caps_t[]'
constexpr size_t size(const _Tp (&)[_Sz]) noexcept { return _Sz; }
                 ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:215:25: error: no matching function for call to 'size'
    static_assert(_op < std::size(kOpCaps));
                        ^~~~~~~~~
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:28:16: note: candidate template ignored: substitution failure [with _Cont = qnn_op_caps_t[]]: member reference base type 'const (anonymous namespace)::qnn_op_caps_t[]' is not a structure or union
constexpr auto size(const _Cont& __c)
               ^
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:35:18: note: candidate template ignored: could not match 'const _Tp[_Sz]' against 'const const qnn_op_caps_t[]'
constexpr size_t size(const _Tp (&)[_Sz]) noexcept { return _Sz; }
                 ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:232:25: error: no matching function for call to 'size'
    static_assert(_op < std::size(kOpCaps));
                        ^~~~~~~~~
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:28:16: note: candidate template ignored: substitution failure [with _Cont = qnn_op_caps_t[]]: member reference base type 'const (anonymous namespace)::qnn_op_caps_t[]' is not a structure or union
constexpr auto size(const _Cont& __c)
               ^
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:35:18: note: candidate template ignored: could not match 'const _Tp[_Sz]' against 'const const qnn_op_caps_t[]'
constexpr size_t size(const _Tp (&)[_Sz]) noexcept { return _Sz; }
                 ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:380:28: error: no matching function for call to 'size'
    GGML_ASSERT(op_index < std::size(kOpCaps));
                           ^~~~~~~~~
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/../include/ggml.h:267:30: note: expanded from macro 'GGML_ASSERT'
#define GGML_ASSERT(x) if (!(x)) GGML_ABORT("GGML_ASSERT(%s) failed", #x)
                             ^
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:28:16: note: candidate template ignored: substitution failure [with _Cont = qnn_op_caps_t[]]: member reference base type 'const (anonymous namespace)::qnn_op_caps_t[]' is not a structure or union
constexpr auto size(const _Cont& __c)
               ^
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:35:18: note: candidate template ignored: could not match 'const _Tp[_Sz]' against 'const const qnn_op_caps_t[]'
constexpr size_t size(const _Tp (&)[_Sz]) noexcept { return _Sz; }
                 ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:388:28: error: no matching function for call to 'size'
    GGML_ASSERT(op_index < std::size(kOpCaps));
                           ^~~~~~~~~
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/../include/ggml.h:267:30: note: expanded from macro 'GGML_ASSERT'
#define GGML_ASSERT(x) if (!(x)) GGML_ABORT("GGML_ASSERT(%s) failed", #x)
                             ^
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:28:16: note: candidate template ignored: substitution failure [with _Cont = qnn_op_caps_t[]]: member reference base type 'const (anonymous namespace)::qnn_op_caps_t[]' is not a structure or union
constexpr auto size(const _Cont& __c)
               ^
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:35:18: note: candidate template ignored: could not match 'const _Tp[_Sz]' against 'const const qnn_op_caps_t[]'
constexpr size_t size(const _Tp (&)[_Sz]) noexcept { return _Sz; }
                 ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:395:28: error: no matching function for call to 'size'
    GGML_ASSERT(op_index < std::size(kOpCaps));
                           ^~~~~~~~~
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/../include/ggml.h:267:30: note: expanded from macro 'GGML_ASSERT'
#define GGML_ASSERT(x) if (!(x)) GGML_ABORT("GGML_ASSERT(%s) failed", #x)
                             ^
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:28:16: note: candidate template ignored: substitution failure [with _Cont = qnn_op_caps_t[]]: member reference base type 'const (anonymous namespace)::qnn_op_caps_t[]' is not a structure or union
constexpr auto size(const _Cont& __c)
               ^
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:35:18: note: candidate template ignored: could not match 'const _Tp[_Sz]' against 'const const qnn_op_caps_t[]'
constexpr size_t size(const _Tp (&)[_Sz]) noexcept { return _Sz; }
                 ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:402:28: error: no matching function for call to 'size'
    GGML_ASSERT(op_index < std::size(kOpCaps));
                           ^~~~~~~~~
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/../include/ggml.h:267:30: note: expanded from macro 'GGML_ASSERT'
#define GGML_ASSERT(x) if (!(x)) GGML_ABORT("GGML_ASSERT(%s) failed", #x)
                             ^
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:28:16: note: candidate template ignored: substitution failure [with _Cont = qnn_op_caps_t[]]: member reference base type 'const (anonymous namespace)::qnn_op_caps_t[]' is not a structure or union
constexpr auto size(const _Cont& __c)
               ^
/home/weiguo/cdeos/kantv/prebuilts/toolchain/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/__iterator/size.h:35:18: note: candidate template ignored: could not match 'const _Tp[_Sz]' against 'const const qnn_op_caps_t[]'
constexpr size_t size(const _Tp (&)[_Sz]) noexcept { return _Sz; }
                 ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:234:21: error: constexpr variable 'op_caps' must be initialized by a constant expression
    constexpr auto &op_caps = kOpCaps[_op];
                    ^         ~~~~~~~~~~~~
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:271:5: note: in instantiation of function template specialization '(anonymous namespace)::op_constructor_with_type_param<23UL, float, qnn::ggml_qnn_rmsnorm_op_config>' requested here
    op_constructor_with_type_param<GGML_OP_RMS_NORM, float, qnn::ggml_qnn_rmsnorm_op_config>, // GGML_OP_RMS_NORM
    ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:234:31: note: indexing of array without known bound is not allowed in a constant expression
    constexpr auto &op_caps = kOpCaps[_op];
                              ^
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:235:19: error: static assertion expression is not an integral constant expression
    static_assert(op_caps.qnn_op_name != nullptr);
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:235:19: note: initializer of 'kOpCaps' is unknown
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-caps.cpp:32:31: note: declared here
constexpr const qnn_op_caps_t kOpCaps[] = {
                              ^
16 errors generated.
make[2]: *** [ggml/src/ggml-qnn/CMakeFiles/ggml-qnn.dir/build.make:132: ggml/src/ggml-qnn/CMakeFiles/ggml-qnn.dir/op-config-caps.cpp.o] Error 1
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/op-config-impl.cpp:192:22: error: use of undeclared identifier 'QNN_OP_RMS_NORM_PARAM_AXES'
    add_tensor_param(QNN_OP_RMS_NORM_PARAM_AXES, {1}, 1, reinterpret_cast<const uint8_t *>(kAxes), QNN_DATATYPE_UINT_32,
                     ^
1 error generated.
make[2]: *** [ggml/src/ggml-qnn/CMakeFiles/ggml-qnn.dir/build.make:146: ggml/src/ggml-qnn/CMakeFiles/ggml-qnn.dir/op-config-impl.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1786: ggml/src/ggml-qnn/CMakeFiles/ggml-qnn.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

@chraac
Copy link
Author

chraac commented Feb 6, 2025

It's been a while. I'm looking forward to seeing it finished. How far along are you now?

Hey, this PR is outdated and far behind the current progress; for the latest updates and refactoring, have a look at my fork: https://github.com/chraac/llama.cpp

I downloaded your https://github.com/chraac/llama.cpp but the build failed. Which QNN SDK are you using?

cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=latest -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod -DGGML_QNN=ON -DGGML_QNN_SDK_PATH=/opt/qcom/aistack/qnn/2.20.0.240223 .. && make -j4
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/utils.cpp:334:14: error: use of undeclared identifier 'QNN_GRAPH_ERROR_INVALID_CONTEXT'
        case QNN_GRAPH_ERROR_INVALID_CONTEXT:
             ^
1 error generated.

Hi, judging from this error log you should use QNN SDK 2.29.0.241129 or above, or use my builder here:
https://github.com/chraac/llama-cpp-qnn-builder

@zhouwg
Copy link
Owner

zhouwg commented Feb 6, 2025

It's been a while. I'm looking forward to seeing it finished. How far along are you now?

Hey, this PR is outdated and far behind the current progress; for the latest updates and refactoring, have a look at my fork: https://github.com/chraac/llama.cpp

I downloaded your https://github.com/chraac/llama.cpp but the build failed. Which QNN SDK are you using?

cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=latest -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod -DGGML_QNN=ON -DGGML_QNN_SDK_PATH=/opt/qcom/aistack/qnn/2.20.0.240223 .. && make -j4
/home/weiguo/llama.cpp-dev-refactoring/ggml/src/ggml-qnn/utils.cpp:334:14: error: use of undeclared identifier 'QNN_GRAPH_ERROR_INVALID_CONTEXT'
        case QNN_GRAPH_ERROR_INVALID_CONTEXT:
             ^
1 error generated.
[… rest of the quoted build log omitted; it is identical to the log in the comment above …]

Hi, judging from this error log you should use QNN SDK 2.29.0.241129 or above, or use my builder here: https://github.com/chraac/llama-cpp-qnn-builder

/opt/qcom/aistack/qairt/2.31.0.250130 works. Thanks.
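
For reference, a minimal sketch of the configure step against that newer SDK path (the NDK location and the build directory are assumptions that will differ per machine; the flags simply mirror the command quoted earlier in this thread):

# Configure llama.cpp with the QNN backend against a newer QAIRT SDK, then build.
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
      -DANDROID_ABI=arm64-v8a \
      -DANDROID_PLATFORM=latest \
      -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod \
      -DGGML_QNN=ON \
      -DGGML_QNN_SDK_PATH=/opt/qcom/aistack/qairt/2.31.0.250130 \
      .. && make -j4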

@zhouwg
Copy link
Owner

zhouwg commented Feb 6, 2025

Congratulations on what you did with the ggml-qnn backend in https://github.com/chraac/llama.cpp; I'd like to see your PR approved in upstream llama.cpp. Of course, at the same time I hope Qualcomm can submit an official ggml-qnn backend PR to upstream llama.cpp.

You seem to have absorbed/borrowed all the ggml-qnn backend code from project kantv, much as DeepSeek-R1 and OpenAI-o1 did, but I'd mostly like to continue executing on my roadmap (put everything in one single source file and make it work as expected before refining or reconstructing the code) before succeeding at the final mission. So your PR will not be merged in this forked repo.

Thanks for your understanding.

BTW, at this time I have to say that you may have already brought some trouble to me in the locked ggml-qnn backend PR upstream (I had to respond to some of your meaningless comments in that PR, otherwise the busy owner of upstream llama.cpp might have misunderstood me, although I had hoped to get some meaningful help/guidance from domain experts at that time), even though I think you were trying to help that PR. Of course, I hope this is just my personal misunderstanding.

@zhouwg zhouwg closed this Feb 6, 2025
@chraac
Copy link
Author

chraac commented Feb 6, 2025

Congratulations on what you did with the ggml-qnn backend in https://github.com/chraac/llama.cpp; I'd like to see your PR approved in upstream llama.cpp. Of course, at the same time I hope Qualcomm can submit an official ggml-qnn backend PR to upstream llama.cpp.

You seem to have absorbed/borrowed all the ggml-qnn backend code from project kantv, much as DeepSeek-R1 and OpenAI-o1 did, but I'd mostly like to continue executing on my roadmap (put everything in one single source file before refining the code) before succeeding at the final mission. So your PR will not be merged in this forked repo.

Thanks for your understanding.

BTW, at this time I have to say that you may have already brought some trouble to me in the locked ggml-qnn backend PR upstream (I had to respond to some of your meaningless comments in that PR, otherwise the busy owner of upstream llama.cpp might have misunderstood me), even though I think you were trying to help that PR. Of course, I hope this is just my personal misunderstanding.

Hi, first I want to say sorry for not continuing with this PR on your fork: I realized we'd need to completely redo some parts, plus it's missing some key features (like transposing inputs before matmul) that we need to pass the basic test-backend-ops tests. That's why I ended up making my own fork from your branch.
Since then I've moved pretty far ahead and my version is now really different from yours, so I think we can go ahead and close this PR.
Also, I saw you made your own unit tests for the add/mul ops - just wanted to mention that these are already covered by test-backend-ops upstream, so we probably don't need them.
Quick update on my fork: we've passed the test-backend-ops tests using the QNN backend. I'm now focusing on making it run faster, which will probably need some extra work.
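
As a side note, a rough sketch of how those ops are typically exercised with the upstream test harness (run from a llama.cpp build directory; the binary path, the test/perf modes, the -o op filter, and the op names reflect recent upstream builds and may vary by version, so treat them as assumptions rather than definitive commands):

# Correctness checks for the add/mul-mat ops discussed above.
./bin/test-backend-ops test -o ADD
./bin/test-backend-ops test -o MUL_MAT
# Performance comparison of the same op against the CPU reference.
./bin/test-backend-ops perf -o MUL_MAT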

my fork: https://github.com/chraac/llama.cpp
builder: https://github.com/chraac/llama-cpp-qnn-builder

@zhouwg
Copy link
Owner

zhouwg commented Feb 6, 2025

Congratulations on being "pretty far ahead"; I think you should have made that decision last year, before I was blocked upstream.

Sincerely, I hope you can submit a standalone PR to upstream llama.cpp so I can learn something from the related discussion.

@chraac
Copy link
Author

chraac commented Feb 6, 2025

Congratulations on being "pretty far ahead"; I think you should have made that decision last year, before I was blocked upstream.

Sincerely, I hope you can submit a standalone PR to upstream llama.cpp so I can learn something from the related discussion.

Thanks!
I will create a PR upstream once the refactoring is finished; right now I'm focusing on boosting the performance.

@zhouwg
Copy link
Owner

zhouwg commented Feb 6, 2025

Congratulations on what you did with the ggml-qnn backend in https://github.com/chraac/llama.cpp; I'd like to see your PR approved in upstream llama.cpp. Of course, at the same time I hope Qualcomm can submit an official ggml-qnn backend PR to upstream llama.cpp.
You seem to have absorbed/borrowed all the ggml-qnn backend code from project kantv, much as DeepSeek-R1 and OpenAI-o1 did, but I'd mostly like to continue executing on my roadmap (put everything in one single source file before refining the code) before succeeding at the final mission. So your PR will not be merged in this forked repo.
Thanks for your understanding.
BTW, at this time I have to say that you may have already brought some trouble to me in the locked ggml-qnn backend PR upstream (I had to respond to some of your meaningless comments in that PR, otherwise the busy owner of upstream llama.cpp might have misunderstood me), even though I think you were trying to help that PR. Of course, I hope this is just my personal misunderstanding.

Hi, first I want to say sorry for not continuing with this PR on your fork: I realized we'd need to completely redo some parts, plus it's missing some key features (like transposing inputs before matmul) that we need to pass the basic test-backend-ops tests. That's why I ended up making my own fork from your branch. Since then I've moved pretty far ahead and my version is now really different from yours, so I think we can go ahead and close this PR. Also, I saw you made your own unit tests for the add/mul ops - just wanted to mention that these are already covered by test-backend-ops upstream, so we probably don't need them. Quick update on my fork: we've passed the test-backend-ops tests using the QNN backend. I'm now focusing on making it run faster, which will probably need some extra work.

my fork: https://github.com/chraac/llama.cpp builder: https://github.com/chraac/llama-cpp-qnn-builder

Also, I saw you made your own unit tests for the add/mul ops - just wanted to mention that these are already covered by test-backend-ops upstream, so we probably don't need them.

Certainly I know test-backend-ops.cpp from the author of the backend subsystem in upstream llama.cpp, after I successfully finished a clean-room implementation of real-time AI subtitles for English online TV via the great/exceptional whisper.cpp from 03/05/2024 to 03/16/2024, in less than two weeks.

What you learnt/borrowed from project kantv was all done/reverse-engineered with this simple self-made command-line qnn-ut program for Android phones. Please don't thank me again, because I learnt it from Qualcomm's source code.

Your behavior seems similar to DeepSeek-R1 (they learnt/borrowed from OpenAI-o1) and OpenAI-o1 (they learnt from Google's great paper "Attention Is All You Need"). I personally think your behavior in the upstream PR was not good manners for the open-source community. In other words, I won't file technical comments on a specific PR unless I'm a real domain expert, because doing so brings massive trouble to the PR's author and wastes the time and effort of the project.

@zhouwg
Copy link
Owner

zhouwg commented Feb 6, 2025

Congratulations on being "pretty far ahead"; I think you should have made that decision last year, before I was blocked upstream.
Sincerely, I hope you can submit a standalone PR to upstream llama.cpp so I can learn something from the related discussion.

Thanks! I will create a PR upstream once the refactoring is finished; right now I'm focusing on boosting the performance.

Best wishes for your PR in upstream llama.cpp; then I can learn something meaningful from the related discussion.

After carefully checking your implementation on 02/07/2025, I think it can be considered a code reconstruction of my previous implementation, so I will continue to execute on zhouwg/kantv#246 (put everything in one single source file and make it work pretty well before refining or reconstructing the code) before succeeding at the final mission.

At the same time, I think such behavior is completely inappropriate (learning from my open-source project or from my PR in upstream llama.cpp is greatly welcomed, but please don't bring trouble to me in the upstream PR, waste my time there, and ultimately cause me to be blocked in the upstream llama.cpp community, which I already mentioned in ggerganov#6869 (comment)).

Finally, I think I'm an open-minded programmer; I hope to see your standalone PR in the upstream llama.cpp community, and I wish it the best.

Repository owner locked as off-topic and limited conversation to collaborators Feb 6, 2025
Repository owner deleted a comment from chraac Feb 7, 2025
Repository owner deleted a comment from chraac Feb 7, 2025
Repository owner deleted a comment from chraac Feb 7, 2025
Repository owner deleted a comment from chraac Feb 7, 2025
Repository owner deleted a comment from chraac Feb 7, 2025
Repository owner deleted a comment from chraac Feb 7, 2025