You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When profiling with perf, a huge amount of time (40-60% of the entire transform) seems to be spent in the very first "narrow SIMD" pass, where the stride isn't large enough to fill an entire SIMD vector. Right now there is an optimization for radix-4 on AVX, but even with that the performance is underwhelming.
The text was updated successfully, but these errors were encountered:
The prime factor algorithm is the name for a specific algorithm where no twiddle factors are necessary, because the inputs are reordered based on some number theory formulas. (Worth looking into btw - it could be faster because you significantly reduce the number of floating point operations, or slower because you have to do re-indexing).
A more accurate name for the algorithm in fourier might be mixed radix or cooley-tukey.
Right, I knew I had seen that name somewhere. I was looking for a more generic name since the distinction between Cooley-Tukey and Stockham autosort generally doesn't matter, I definitely meant mixed-radix! Thanks
calebzulawski
changed the title
Squeeze extra performance out of prime-factor algorithm
Squeeze extra performance out of mixed-radix algorithm
Jan 22, 2020
The PFA algorithm does interest me, though I wonder if the effort/complexity is worth it compared to the existing Bluestein's algorithm implementation.
When profiling with
perf
, a huge amount of time (40-60% of the entire transform) seems to be spent in the very first "narrow SIMD" pass, where the stride isn't large enough to fill an entire SIMD vector. Right now there is an optimization for radix-4 on AVX, but even with that the performance is underwhelming.The text was updated successfully, but these errors were encountered: