The `/cpu/self/opt/*` backends should implement their own version of diagonal/full assembly that assembles element by element. Most of the pieces already exist in the code, but they are spread out.
Current:
```
Assemble QFunction
for (elem in l-vec) Assemble Operator element
```
New:
```
for (elem in l-vec) {
  Assemble QFunction element
  Assemble Operator element
}
```
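As a rough illustration of the proposed loop, here is a minimal, self-contained C sketch of element-by-element diagonal assembly for a toy 1D mass operator (linear elements, 2-point Gauss quadrature). None of these names are libCEED API: `assemble_qfunction_element`, `assemble_operator_element`, and `assemble_diagonal` are hypothetical stand-ins for the steps in the loop above. The point is that only the current element's quadrature data (Q values) is live at any time, instead of the QFunction assembled over the whole L-vector.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical toy problem (not libCEED API): 1D mass operator on [0, 1]
 * with linear elements (P = 2 nodes) and 2-point Gauss quadrature (Q = 2). */
#define P 2
#define Q 2

/* "Assemble QFunction element": q_data[q] = w_q * detJ for ONE element only. */
static void assemble_qfunction_element(double h, double q_data[Q]) {
  const double w[Q] = {1.0, 1.0}; /* Gauss weights on reference [-1, 1] */
  const double detJ = h / 2.0;    /* maps [-1, 1] onto an element of length h */
  for (int q = 0; q < Q; q++) q_data[q] = w[q] * detJ;
}

/* "Assemble Operator element": dense A_e = B^T D B from the element's q_data. */
static void assemble_operator_element(const double B[Q][P], const double q_data[Q],
                                      double A_e[P][P]) {
  for (int i = 0; i < P; i++)
    for (int j = 0; j < P; j++) {
      A_e[i][j] = 0.0;
      for (int q = 0; q < Q; q++) A_e[i][j] += B[q][i] * q_data[q] * B[q][j];
    }
}

/* Diagonal assembly, element by element; diag must hold num_elem + 1 entries. */
void assemble_diagonal(int num_elem, double *diag) {
  assert(num_elem > 0);
  const double h = 1.0 / num_elem;
  const double g = 1.0 / sqrt(3.0); /* Gauss points at -g and +g */
  /* Linear Lagrange basis values B[q][p] at the Gauss points */
  const double B[Q][P] = {{(1.0 + g) / 2.0, (1.0 - g) / 2.0},
                          {(1.0 - g) / 2.0, (1.0 + g) / 2.0}};
  for (int n = 0; n <= num_elem; n++) diag[n] = 0.0;
  for (int e = 0; e < num_elem; e++) {
    double q_data[Q], A_e[P][P]; /* per-element buffers only */
    assemble_qfunction_element(h, q_data);
    assemble_operator_element(B, q_data, A_e);
    /* Toy element restriction: element e owns global nodes e and e + 1 */
    for (int i = 0; i < P; i++) diag[e + i] += A_e[i][i];
  }
}
```

For `num_elem = 4` (h = 0.25), this gives h/3 at the two boundary nodes and 2h/3 at the interior nodes, matching the exact 1D linear mass matrix. Full assembly would follow the same loop, scattering every entry of `A_e` rather than only its diagonal.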
This is very similar to our approach for operator application, except that we would probably want to keep the block size set to 1 for simplicity. Then we can set `/cpu/self/opt/serial` as the operator fallback for `/cpu/self/opt/blocked`.
This would hopefully significantly decrease the assembly memory footprint (and speed things up) for the Opt, AVX, and XSMM backends.
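To make the memory argument concrete, here is a back-of-the-envelope sketch. It assumes the assembled QFunction stores an `n_in` x `n_out` block of doubles at each quadrature point; the function and parameter names are hypothetical, and the real layout depends on the operator's active fields.

```c
#include <stddef.h>

/* Hypothetical estimate: bytes of QFunction data held when one element is assembled at a time. */
size_t qf_bytes_per_element(size_t num_qpts, size_t n_in, size_t n_out) {
  return num_qpts * n_in * n_out * sizeof(double);
}

/* Hypothetical estimate: bytes held when the QFunction is assembled up front for every element. */
size_t qf_bytes_all_elements(size_t num_elem, size_t num_qpts, size_t n_in, size_t n_out) {
  return num_elem * qf_bytes_per_element(num_qpts, n_in, n_out);
}
```

For example, with 8-byte doubles, 1000 elements with 27 quadrature points and a 3 x 3 block per point would need 1,944,000 bytes when assembled up front, versus 1,944 bytes per element in the element-by-element loop.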