Description
Have the authors of this repo considered using FP32 for the majority of the computations instead of FP64? FP64 is the de facto standard in scientific computing, but I have always been able to get by with FP32, provided appropriate care is taken. There are often opportunities to compute bulk integrals in FP32 and then accumulate into FP64, or to perform only the critical parts of a program in FP64.
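As a minimal sketch of the accumulation pattern described above (random filler data stands in for the integral contributions; none of this reflects xTB's actual code):

```python
import numpy as np

# Hypothetical sketch: do the bulk arithmetic in FP32, but accumulate the
# per-batch partial results into an FP64 sum, so rounding error does not
# grow with the number of batches.
rng = np.random.default_rng(0)
contributions = rng.standard_normal((1000, 256))   # FP64 "integral" batches

total = np.float64(0.0)                            # FP64 accumulator
for batch in contributions:
    # Bulk work in FP32 (where the extra vector throughput would come from)...
    partial = np.sum(batch.astype(np.float32), dtype=np.float32)
    # ...critical accumulation in FP64.
    total += np.float64(partial)

reference = contributions.sum()                    # all-FP64 reference
rel_err = abs(total - reference) / np.abs(contributions).sum()
print(rel_err)   # small, FP32-limited error relative to the data's magnitude
```

The point of the pattern is that only the reduction variable needs FP64; the per-element work, which dominates the operation count, stays in FP32.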
I believe this suggestion is worth taking seriously: the repo's history contains multiple commits reporting gains of roughly 2% to 30% in various places from small code changes. Mixed precision is an important part of the design space that deserves consideration. I estimate a 1.5x improvement in whole-program execution time on vector ALUs where FP32 throughput is 2x that of FP64 (I don't know how Fortran vectorizes the code; I typically work in languages with explicit vector types like SIMD8<Float32>).
As context, I analyzed the bottlenecks contributing to the latency of one singlepoint with GFN2-xTB. I found that linear algebra kernels from external libraries consume 70% of the time (provided the integrals are properly parallelized with OpenMP). I am attempting to link xTB to a custom linear algebra library that casts the FP64 inputs to FP32 and performs the eigendecomposition in FP32 at higher speed. I expect issues, but with careful analysis the bugs or threshold violations can be fixed and the program made to work.
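A minimal sketch of that wrapper strategy, using NumPy in place of the custom library (`eigh_fp32` is an illustrative name, not the actual interface):

```python
import numpy as np

# Hypothetical wrapper: accept FP64 input, run the symmetric
# eigendecomposition in FP32 (ssyevd-level precision), return FP64 outputs.
def eigh_fp32(a):
    w, v = np.linalg.eigh(a.astype(np.float32))   # FP32 eigensolver
    return w.astype(np.float64), v.astype(np.float64)

rng = np.random.default_rng(1)
m = rng.standard_normal((100, 100))
h = (m + m.T) / 2                                  # symmetric FP64 test matrix

w32, _ = eigh_fp32(h)
w64, _ = np.linalg.eigh(h)                         # FP64 reference
err = np.max(np.abs(w32 - w64)) / np.max(np.abs(w64))
print(err)   # FP32-level relative error in the eigenvalues
```

Comparing the FP32 eigenvalues against an FP64 reference like this is also how one would detect the threshold violations mentioned above before they propagate into the SCF iterations.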
However, I cannot rewrite the entire xTB codebase from scratch just to port the integrals to FP32. I expect that once dsyevd_ is replaced with the much faster ssyevd_, the integrals may become a non-negligible contributor to latency. In any case, every component of the program should be considered for latency reduction.