Description
A recent experiment by @KarsKnook showed the PETSc ASM backend performing much better than TinyASM. These runs were in parallel on a node shared with other jobs, so the exact numbers are not necessarily reliable (particularly the scatter and SF calls), but the concerning thing is that TinyASM wasn't any faster in the second solve, whereas PETSc was. This implies either that TinyASM isn't caching something it could cache, or that the non-cacheable computations are much less efficient in TinyASM.
flamegraph_lungs_simple.speedscope.json
flamegraph_lungs_simple_tinyasm.speedscope.json
Two potential places where TinyASM is losing out are:
- In PCSetup, the patch inverse is calculated with `getrf` and `getri`, then in PCApply the patch solution is calculated with a matvec. It may be more efficient to instead only calculate the factorisation in PCSetup with `getrf`, then in PCApply use `getrs` to do the factorised solve (see the sketch after this list).

  Line 99 in 7e2ec2a: `if (dof) PetscCall(mymatinvert(&dof, vv, piv.data(), &info, fwork.data()));`
- In PCApply there is a branch on the patch size: a handcoded matvec loop for patches with size < 6, or BLAS `gemv` for larger patches. This branch is taken separately for every single patch because each patch can be a different size (see the second sketch after this list).
  - Branching in a tight loop should generally be avoided.
  - For many problems almost all of the patches will have nearly identical sizes, so picking BLAS/handcoded could be a preprocessing step based on the average patch size, lifting the branch outside the tight loop.
  - The handcoded loop still has a runtime extent, so the compiler probably won't unroll it anyway.
  - It's unclear whether this was benchmarked to check that picking handcoded over BLAS actually makes a performance difference for small matrices.
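For the first point, here is a minimal sketch of the factor-once/solve-many pattern, assuming a recent PETSc; the struct, function names, and per-patch storage layout are illustrative, not TinyASM's actual code:

```cpp
// Hypothetical sketch, not TinyASM's actual code: keep only the LU factors
// from PCSetUp (getrf) and reuse them in PCApply with getrs, instead of
// forming the explicit inverse (getri) and applying it with a matvec.
#include <petscsys.h>
#include <petscblaslapack.h>
#include <vector>

struct PatchFactors {
  PetscBLASInt              n;    // number of dofs in the patch
  std::vector<PetscScalar>  lu;   // n x n LU factors, column-major
  std::vector<PetscBLASInt> piv;  // pivot indices from getrf
};

// PCSetUp-time work: LU-factorise the dense patch matrix in place.
static PetscErrorCode FactorPatch(PatchFactors &p)
{
  PetscBLASInt info;

  PetscFunctionBegin;
  if (p.n) {
    PetscCallBLAS("LAPACKgetrf", LAPACKgetrf_(&p.n, &p.n, p.lu.data(), &p.n, p.piv.data(), &info));
    PetscCheck(info == 0, PETSC_COMM_SELF, PETSC_ERR_LIB, "getrf failed on patch");
  }
  PetscFunctionReturn(PETSC_SUCCESS);
}

// PCApply-time work: solve A x = b with the stored factors; b is overwritten
// with the solution, so no explicit inverse (and no matvec) is needed.
static PetscErrorCode SolvePatch(PatchFactors &p, PetscScalar *b)
{
  PetscBLASInt info, nrhs = 1;

  PetscFunctionBegin;
  if (p.n) {
    PetscCallBLAS("LAPACKgetrs", LAPACKgetrs_("N", &p.n, &nrhs, p.lu.data(), &p.n, p.piv.data(), b, &p.n, &info));
    PetscCheck(info == 0, PETSC_COMM_SELF, PETSC_ERR_LIB, "getrs failed on patch");
  }
  PetscFunctionReturn(PETSC_SUCCESS);
}
```

The two triangular solves in `getrs` cost the same order of work as the matvec against the explicit inverse, so PCApply should be no worse, while the `getri` call (and its workspace) drops out of PCSetup entirely.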
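For the second point, here is a sketch of hoisting the BLAS/handcoded decision out of the per-patch loop. The >= 6 cutoff mirrors the current size < 6 test, and deciding from the mean patch size during setup is an assumption that would need benchmarking:

```cpp
// Hypothetical sketch: decide once, before the patch sweep, whether to use
// BLAS gemv or the handcoded matvec, instead of branching per patch.
#include <petscsys.h>
#include <petscblaslapack.h>
#include <numeric>
#include <vector>

struct Patch {
  PetscBLASInt             n;    // number of dofs in the patch
  std::vector<PetscScalar> mat;  // n x n dense patch operator, column-major
  std::vector<PetscScalar> x, y; // local input/output vectors
};

static void ApplyPatches(std::vector<Patch> &patches)
{
  if (patches.empty()) return;

  // Preprocessing decision (could live in PCSetUp): one branch per sweep
  // rather than one per patch.
  const double avg = std::accumulate(patches.begin(), patches.end(), 0.0,
                     [](double s, const Patch &p) { return s + p.n; }) / patches.size();
  const bool useBLAS = avg >= 6.0;

  if (useBLAS) {
    PetscScalar  one = 1.0, zero = 0.0;
    PetscBLASInt ione = 1;
    for (auto &p : patches) {
      if (!p.n) continue;
      BLASgemv_("N", &p.n, &p.n, &one, p.mat.data(), &p.n, p.x.data(), &ione, &zero, p.y.data(), &ione);
    }
  } else {
    for (auto &p : patches) {
      // Handcoded matvec.  The extent is still a runtime value, so the
      // compiler is unlikely to unroll it, but at least the BLAS/handcoded
      // branch is no longer inside the tight loop.
      for (PetscBLASInt i = 0; i < p.n; i++) {
        PetscScalar s = 0.0;
        for (PetscBLASInt j = 0; j < p.n; j++) s += p.mat[i + j * p.n] * p.x[j];
        p.y[i] = s;
      }
    }
  }
}
```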