Updated the get mask utility for ~10% performance gain on apple silicon by lsawade · Pull Request #1668 · PrincetonUniversity/SPECFEMPP

lsawade · 2026-02-25T20:14:04Z

Description

The mask is created for every chunk/GLL-point combination, but for ~99% of chunks every lane of the mask is true!

Problem

Every masked SIMD kernel call constructed its mask via a per-lane lambda, even for the ~99% of chunks that are full and trivially all-true:

```cpp
// runs on every chunk: opaque to the optimizer, can't be constant-folded
mask_type mask([&](std::size_t lane) { return int(lane) < number_elements; });
```

This pattern was hot at 13 call sites across field I/O, Jacobians, and boundary conditions.

Fix

Added get_mask<simd_type>() to point::index and point::assembly_index to fast-path the common case:

```cpp
if (number_elements >= simd_type::size())
    return mask_type(true);  // single vmov.i32 on NEON; branch is ~always taken
return mask_type([&](std::size_t lane) { return int(lane) < number_elements; });
```

All 13 call sites were updated to use it. Partial-chunk behavior is unchanged.

Result


| Metric | Result |
| --- | --- |
| Runtime (Apple M3, 2D fluid-solid benchmark) | ~10% faster |
| Correctness impact | None |


Checklist

Please make sure to check the developer documentation in the SPECFEM docs.

I ran the code through pre-commit to check style
The documentation builds without warnings/errors
I have added labels to the PR (see right hand side of the PR page)
My code passes all the integration tests
I have added sufficient unittests to test my changes
I have added/updated documentation for the changes I am proposing
I have updated CMakeLists to ensure my code builds
My code builds across all platforms

Updated the get mask utility for 10% performance gain

f428da4

lsawade requested review from Rohit-Kakodkar and icui and removed request for Rohit-Kakodkar February 25, 2026 20:14

lsawade changed the title from "Updated the get mask utility for ~10% performance gain" to "Updated the get mask utility for ~10% performance gain on apple silicon" Feb 25, 2026

lsawade requested a review from Rohit-Kakodkar February 27, 2026 12:30

Rohit-Kakodkar approved these changes Feb 27, 2026

lsawade wants to merge 1 commit into devel from simd-all-true-mask


2 participants
