FP16 slower than BLAS_FP32 on Xeon Gold 6252 #2668

lucasreis1 · 2024-06-03T21:57:17Z

Hey!

I'm benchmarking the library and want to use its FP16 implementation on a modern AVX512 CPU, but I'm seeing significantly better performance from the standard FP32 BLAS GEMM. Is this normal?

For context, here are some measurements I took with different matrix sizes:

Note that I'm using the provided FP16Benchmark and comparing the output GFlops for each size.
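
For anyone reproducing the comparison, here is a minimal sketch of how a GFlops figure for a GEMM is usually derived. It assumes the common convention of 2·m·n·k floating-point operations per multiply; the helper name `gemm_gflops` is hypothetical and is not taken from FP16Benchmark, which may count or time things differently.

```python
def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return GFLOP/s for one (m x k) by (k x n) multiply that took `seconds`.

    Assumption: 2*m*n*k operations per GEMM (one multiply and one add
    per inner-product term). This is a convention, not the library's code.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2.0 * m * n * k
    return flops / seconds / 1e9


def speedup(gflops_a: float, gflops_b: float) -> float:
    """Ratio of two throughput figures; a value above 1 means `a` is faster."""
    return gflops_a / gflops_b
```

Comparing two runs this way only makes sense when both use the same m, n, k and the same timing method.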
