ggml: Add run-time detection of neon, i8mm and sve #9331

eddnjjn · 2024-09-06T05:55:14Z

This patch adds run-time detection of the Arm instruction set features Arm® Neon™, i8mm and sve for Linux and Apple build targets. Run-time detection is enabled for aarch64 builds and performed in ggml_init. The results are stored in a global struct instance for later use by the ggml_cpu_has_* functions.
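For readers without the diff at hand, the following is a minimal sketch of how such detection could look. It is not the code from this patch. The struct, the `*_sketch` function names and the helper `sysctl_flag_sketch` are hypothetical. It assumes `getauxval` with the `HWCAP_*` bits on Linux and the `hw.optional.AdvSIMD` and `hw.optional.arm.FEAT_I8MM` sysctl keys on Apple platforms. On any other target every flag simply stays false.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP, AT_HWCAP2
#include <asm/hwcap.h>  // HWCAP_ASIMD, HWCAP_SVE, HWCAP2_I8MM
#elif defined(__aarch64__) && defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h> // sysctlbyname
#endif

// Hypothetical container for the detected features (name is an assumption).
struct arm_features_sketch {
    bool has_neon;
    bool has_i8mm;
    bool has_sve;
};

// Single global instance, filled once at start-up and read afterwards.
static struct arm_features_sketch g_arm_features_sketch = { false, false, false };

#if defined(__aarch64__) && defined(__APPLE__)
// Reads an integer sysctl key; a missing key is treated as "feature absent".
static bool sysctl_flag_sketch(const char * name) {
    int    value = 0;
    size_t size  = sizeof(value);
    if (sysctlbyname(name, &value, &size, NULL, 0) != 0) {
        return false;
    }
    return value != 0;
}
#endif

static void init_arm_features_sketch(void) {
#if defined(__aarch64__) && defined(__linux__)
    // The kernel reports CPU features through the auxiliary vector.
    unsigned long hwcap = getauxval(AT_HWCAP);
    g_arm_features_sketch.has_neon = (hwcap & HWCAP_ASIMD) != 0;
#if defined(HWCAP_SVE)
    g_arm_features_sketch.has_sve = (hwcap & HWCAP_SVE) != 0;
#endif
#if defined(AT_HWCAP2) && defined(HWCAP2_I8MM)
    unsigned long hwcap2 = getauxval(AT_HWCAP2);
    g_arm_features_sketch.has_i8mm = (hwcap2 & HWCAP2_I8MM) != 0;
#endif
#elif defined(__aarch64__) && defined(__APPLE__)
    g_arm_features_sketch.has_neon = sysctl_flag_sketch("hw.optional.AdvSIMD");
    g_arm_features_sketch.has_i8mm = sysctl_flag_sketch("hw.optional.arm.FEAT_I8MM");
    g_arm_features_sketch.has_sve  = false; // this sketch queries no SVE key on Apple
#endif
    // Other targets: nothing detected, all flags keep their default (false).
}

// Accessors in the spirit of the ggml_cpu_has_* functions (names are hypothetical).
static int cpu_has_neon_sketch(void) { return g_arm_features_sketch.has_neon; }
static int cpu_has_i8mm_sketch(void) { return g_arm_features_sketch.has_i8mm; }
static int cpu_has_sve_sketch(void)  { return g_arm_features_sketch.has_sve;  }
```

The accessors only read the cached struct, so the operating-system query is paid once at initialisation rather than on every call.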

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

ggerganov · 2024-09-09T07:35:27Z

ggml/src/ggml.c

+#if defined(__aarch64__)
+ ggml_init_aarch64_features();
 #endif


For clarity, this call should be within the is_first_call section above.


eddnjjn · 2024-09-18T06:36:22Z

Thanks for the review @ggerganov. I've rebased the patch and addressed your comment by moving the invocation of ggml_init_aarch64_features to the is_first_call section. I also updated ggml_init_aarch64_features so that it no longer checks for first invocation itself, since this is already done in ggml_init.

Please let me know if you have additional comments.
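The arrangement described in this reply can be pictured roughly as below. This is only a sketch under assumptions: `library_init_sketch`, `detect_features_sketch` and the call counter are hypothetical stand-ins for ggml_init and ggml_init_aarch64_features, and the sketch does no locking, so it is not thread-safe on its own.

```c
#include <stdbool.h>

// Counts how often detection ran; exists only to make the behaviour observable.
static int g_detect_calls_sketch = 0;

// Stand-in for the feature detection routine. It carries no "first call"
// guard of its own because the caller already provides one.
static void detect_features_sketch(void) {
    g_detect_calls_sketch++;
}

// Stand-in for the library init function, which may be called many times.
static void library_init_sketch(void) {
    static bool is_first_call = true;

    if (is_first_call) {
        // One-time setup belongs here, including feature detection.
        detect_features_sketch();
        is_first_call = false;
    }

    // Work that has to happen on every call would follow here.
}
```

Keeping the guard in one place means the detection routine stays a plain function, and a reader sees all one-time work grouped in the same block.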

ggerganov · 2024-09-19T08:38:08Z

ggml/src/ggml.c

+#if defined(__aarch64__)
+ ggml_init_aarch64_features();
+#endif
+


This looks incorrect because ARM NEON presence is now associated with __aarch64__, but this is not always the case AFAIK. For example, here we have support for __ARM_NEON && !__aarch64__, such as Raspberry Pi:

llama.cpp/ggml/src/ggml-cpu-impl.h

Lines 132 to 167 in 64c6af3

#if defined(__ARM_NEON)

// if YCM cannot find <arm_neon.h>, make a symbolic link to it, for example:
//
// $ ln -sfn /Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/include/arm_neon.h ./src/
//
#include <arm_neon.h>

#ifdef _MSC_VER

typedef uint16_t ggml_fp16_internal_t;

#define ggml_vld1q_u32(w,x,y,z) { ((w) + ((uint64_t)(x) << 32)), ((y) + ((uint64_t)(z) << 32)) }

#else

typedef __fp16 ggml_fp16_internal_t;

#define ggml_vld1q_u32(w,x,y,z) { (w), (x), (y), (z) }

#endif // _MSC_VER

#if !defined(__aarch64__)

// 32-bit ARM compatibility

// vaddlvq_s16
// vpaddq_s16
// vpaddq_s32
// vaddvq_s32
// vaddvq_f32
// vmaxvq_f32
// vcvtnq_s32_f32
// vzip1_u8
// vzip2_u8
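One possible way to address this concern, sketched under assumptions and not taken from the patch, is to use the run-time result only where run-time detection exists and to fall back to the compile-time __ARM_NEON macro on 32-bit Arm. The enum, the variable `g_runtime_neon_sketch` and both function names are hypothetical.

```c
#include <stdbool.h>

// Where the NEON answer comes from on the current build target.
enum neon_source_sketch {
    NEON_SOURCE_NONE,         // not an Arm build with NEON
    NEON_SOURCE_COMPILE_TIME, // 32-bit Arm built with NEON enabled
    NEON_SOURCE_RUN_TIME      // aarch64, answer comes from run-time detection
};

// Assumed to be filled in by a run-time detection step on aarch64.
static bool g_runtime_neon_sketch = false;

static enum neon_source_sketch neon_source_sketch(void) {
#if defined(__aarch64__)
    return NEON_SOURCE_RUN_TIME;
#elif defined(__ARM_NEON)
    // Covers __ARM_NEON && !__aarch64__, e.g. a 32-bit Raspberry Pi build.
    return NEON_SOURCE_COMPILE_TIME;
#else
    return NEON_SOURCE_NONE;
#endif
}

static int has_neon_any_arm_sketch(void) {
    switch (neon_source_sketch()) {
        case NEON_SOURCE_RUN_TIME:     return g_runtime_neon_sketch ? 1 : 0;
        case NEON_SOURCE_COMPILE_TIME: return 1; // the compiler was told NEON is available
        default:                       return 0;
    }
}
```

With this layering a 32-bit NEON build keeps reporting NEON support even though no run-time detection runs for it.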

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Sep 6, 2024

ggerganov approved these changes Sep 9, 2024


ggml: Added run-time detection of neon, i8mm and sve

aab436c

Adds run-time detection of the Arm instruction set features neon, i8mm and sve for Linux and Apple build targets.

eddnjjn force-pushed the cpu-runtime-feature-detection branch from 8324367 to aab436c on September 18, 2024 06:12

ggerganov reviewed Sep 19, 2024

