I'm benchmarking the library, hoping to use its FP16 implementation on a modern AVX512 CPU, but the standard FP32 BLAS GEMM is giving embarrassingly higher performance. Is this normal?
For context, here are some measurements I took with different matrix sizes:
Note that I'm using the provided FP16Benchmark and comparing the reported GFLOPS for each size.
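For reference, here is a minimal sketch of how such GFLOPS figures are typically derived, independent of the library's own FP16Benchmark (the `gemm_gflops` helper and the NumPy-based timing are my own illustration, not the library's code; the 2·n³ flop count is the standard convention for an n×n GEMM). Note that NumPy's float16 matmul is usually emulated via conversion on CPUs without native FP16 arithmetic, which is one common reason an FP16 benchmark trails FP32:

```python
import time
import numpy as np

def gemm_gflops(dtype, n, iters=5):
    """Time an n x n GEMM and return throughput in GFLOPS.

    An n x n GEMM performs roughly 2*n**3 floating-point
    operations (n**3 multiplies plus n**3 additions).
    """
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    a @ b  # warm-up so one-time setup cost is excluded
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = (time.perf_counter() - start) / iters
    return 2 * n**3 / elapsed / 1e9

# Compare FP32 and FP16 throughput at a couple of sizes.
for n in (256, 512):
    print(f"n={n}: fp32 {gemm_gflops(np.float32, n):.1f} GFLOPS, "
          f"fp16 {gemm_gflops(np.float16, n):.1f} GFLOPS")
```

On most x86 machines this prints a much higher FP32 number, mirroring the gap described above.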