Move +dotprod AArch64 intrinsics behind compiler options by MDevereau · Pull Request #2862 · google/highway

MDevereau · 2026-02-11T10:23:03Z

Move +dotprod AArch64 intrinsics behind compiler options. Also remove their reliance on NEON_BF16 to be enabled on other NEON targets.

johnplatts · 2026-02-11T23:07:48Z

@MDevereau and @jan-wassenberg - Should DotProd and I8MM be enabled by default for the NEON_BF16 target on Apple OS's as all Apple Silicon AArch64 CPU's that support the AArch64 BF16 extension (Apple A15 and later and Apple M2 and later) are known to have support for both DotProd and I8MM according to https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64Processors.td?

All Apple Silicon Macs (but not the iPhone X/XR/XS or earlier or 8th generation iPad or earlier) have support for the AArch64 DotProd extension, including Apple Silicon Macs with the Apple M1 CPU, according to https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64Processors.td.

Should HWY_NEON be the default baseline target for Apple OS's as all Apple Silicon CPU's support the AArch64 AES extension (even the Apple A7 CPU found on the first Apple devices with AArch64 CPU's such as the iPhone 5S, iPad Mini 2, and original iPad Air) according to https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64Processors.td (but there are some non-Apple Silicon AArch64 CPU's out there that do not support the AArch64 AES extension)?

jan-wassenberg · 2026-02-12T17:36:56Z

hwy/ops/arm_neon-inl.h

 // ------------------------------ SumOfMulQuadAccumulate
-
-#if HWY_TARGET == HWY_NEON_BF16
+#ifdef __ARM_FEATURE_DOTPROD


Unfortunately, we are rarely able to add flags to builds.
Hence we'd like to detect what's available at runtime and then dispatch to various targets.
We can certainly discuss what features map to which targets, and there's perhaps also space for another target.

What's the main driver for you here: do we want to add dot to NEON, or remove it from NEON_BF16?

Depending on the answer: would it work if we defined(feature) || TARGET==NEON_BF16?

We want to remove its dependance on NEON_BF16.

We have seen issues with certain HWCAPS being reported as unavailable in certain circumstances, which can result in the entire target of NEON_BF16 being denied because of the absence of i8mm. There are also cores such as the Apple A14 which (according to LLVM) have dotprod and aes, but not bf16 and i8mm which contributes to the problem of some cores being gated out of their full instruction set.

defined(feature) || TARGET==NEON_BF16 sounds like it would work.

Rather than create another target based flag combinations, would it make sense to have a target for specific cores? That way, a baseline NEON target could have all features optional for customisation, but a neoverse-v2 core target could still use runtime detection to enable its full instruction set.

Got it. I believe John has already fixed the issue of the A14.

It sounds like defined(feature) || TARGET==NEON_BF16 will help but not be a full solution for what you want (allow bf16 without i8mm). We'd at least have to add another target.

We're open to suggestions on what combination of features to group into a target. On x86 in practice we also have avx2=Haswell, avx3=Skylake, avx3_DL=Icelake, and an explicit SPR and Zen4+ target.
The Arm bitfield has 3 free spots below HWY_SVE2_128, and arguably 2 more which we could reclaim by rearranging/compacting gaps around HWY_NEON_BF16. The last time we did such a reordering, I do not recall any reported issues.

Just bear in mind that a lower bit value means better, and the lowest one reported as supported will be used. This leads to an interesting question: if we have i8mm\BF16, and BF16\i8mm, which is better?

jan-wassenberg · 2026-02-12T17:42:52Z

@MDevereau and @jan-wassenberg - Should DotProd and I8MM be enabled by default for the NEON_BF16 target on Apple OS's as all Apple Silicon AArch64 CPU's that support the AArch64 BF16 extension (Apple A15 and later and Apple M2 and later) are known to have support for both DotProd and I8MM according to https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64Processors.td?

Thanks for checking that. Makes sense to me. I'm happy to enable as much as we can without requiring build flags, which as mentioned are problematic in our setting.

All Apple Silicon Macs (but not the iPhone X/XR/XS or earlier or 8th generation iPad or earlier) have support for the AArch64 DotProd extension, including Apple Silicon Macs with the Apple M1 CPU, according to https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64Processors.td.

Also sounds good to me.

Should HWY_NEON be the default baseline target for Apple OS's as all Apple Silicon CPU's support the AArch64 AES extension (even the Apple A7 CPU found on the first Apple devices with AArch64 CPU's such as the iPhone 5S, iPad Mini 2, and original iPad Air) according to https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64Processors.td (but there are some non-Apple Silicon AArch64 CPU's out there that do not support the AArch64 AES extension)?

This as well :)

igorginzburg · 2026-02-23T17:59:32Z

A neon target with i8mm would be very helpful for scann, https://github.com/google-research/google-research/tree/master/scann. Either NEON_BF16 or a new target.

i8mm is a big win there, but we don't have any easy way to set build flags (-march +i8mm).

jan-wassenberg · 2026-02-25T16:42:14Z

ACK, thanks. @MDevereau any thoughts on which targets we should add?

MDevereau · 2026-02-26T12:08:40Z

II don't have any particular targets in mind, at least not for now. This PR was mainly about making +dotprod intrinsics available from both NEON and NEON_BF16 individually for testing different combinations, and to streamline it to be similar to how SVE incorporates its bf16 instructions. If this isn't worth the trade-off for additional flags and performance then I'll close it with no further asks.

jan-wassenberg · 2026-02-26T17:23:02Z

Got it, thanks. We're happy to enable this change/your PR in modified form, if you change the if(feature) to defined(feature) || TARGET==NEON_BF16 as mentioned.

Move +dotprod AArch64 intrinsics behind compiler options

a85f781

Move +dotprod AArch64 intrinsics behind compiler options. Also remove their reliance on NEON_BF16 to be enabled on other NEON targets.

jan-wassenberg reviewed Feb 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move +dotprod AArch64 intrinsics behind compiler options#2862

Move +dotprod AArch64 intrinsics behind compiler options#2862
MDevereau wants to merge 1 commit intogoogle:masterfrom
MDevereau:dotprod

MDevereau commented Feb 11, 2026

Uh oh!

johnplatts commented Feb 11, 2026

Uh oh!

jan-wassenberg Feb 12, 2026

Uh oh!

MDevereau Feb 23, 2026

Uh oh!

jan-wassenberg Feb 23, 2026

Uh oh!

jan-wassenberg commented Feb 12, 2026

Uh oh!

igorginzburg commented Feb 23, 2026

Uh oh!

jan-wassenberg commented Feb 25, 2026

Uh oh!

MDevereau commented Feb 26, 2026

Uh oh!

jan-wassenberg commented Feb 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

MDevereau commented Feb 11, 2026

Uh oh!

johnplatts commented Feb 11, 2026

Uh oh!

jan-wassenberg Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

MDevereau Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

jan-wassenberg Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

jan-wassenberg commented Feb 12, 2026

Uh oh!

igorginzburg commented Feb 23, 2026

Uh oh!

jan-wassenberg commented Feb 25, 2026

Uh oh!

MDevereau commented Feb 26, 2026

Uh oh!

jan-wassenberg commented Feb 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants