I benchmarked the sgemm (cblas_sgemm) implementation of NVPL BLAS against that of OpenBLAS and I noticed perf regressions on matrices of certain sizes. (Data and Code below)
Setup configuration:
Arm Neoverse V2 machine with 16 cores, all of which were utilized during the benchmark.
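All 16 cores were in use, so the results can depend on how each library was threaded during the run. The sketch below records that information alongside the numbers. It is not part of the original benchmark. `describe_environment` is a hypothetical helper, and the two variables it reports are assumed to be the relevant thread controls: `OMP_NUM_THREADS` for OpenMP runtimes and `OPENBLAS_NUM_THREADS` for OpenBLAS.

```python
import os
import platform

import numpy


def describe_environment():
    """Collect basic machine facts and BLAS-related thread settings (hypothetical helper)."""
    thread_vars = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS")
    return {
        "machine": platform.machine(),  # e.g. 'aarch64' on a Linux Arm server
        "logical_cpus": os.cpu_count(),
        "numpy_version": numpy.__version__,
        # None means the variable is unset and the library uses its own default
        "thread_env": {name: os.environ.get(name) for name in thread_vars},
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
    # Prints the BLAS/LAPACK build configuration numpy was compiled against
    numpy.show_config()
```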
```python
#!/usr/bin/python
import os
import sys
import time

import numpy
from numpy.random import randn


def run_sgemm(l):
    # Sweep M, N and K over the powers of two from 2 to 64
    M = 2
    while M <= 64:
        N = 2
        while N <= 64:
            K = 2
            while K <= 64:
                A = randn(M, K).astype('float32')
                B = randn(K, N).astype('float32')
                # numpy.dot on 2-D float32 arrays is handled by the BLAS numpy is linked against
                start = time.perf_counter()
                for i in range(0, l):
                    ref = numpy.dot(A, B)
                end = time.perf_counter()
                timediff = (end - start)
                # An (M x K) by (K x N) product costs 2*M*N*K flops, not 2*N^3
                mflops = (2 * M * N * K) * l / timediff
                mflops *= 1e-6
                size = "%dx%d, %dx%d" % (M, K, K, N)
                print("%14s :\t%20f MFlops\t%20f ms" % (size, mflops, timediff * 1e3))
                K *= 2  # Multiplicative increment for K
            N *= 2  # Multiplicative increment for N
        M *= 2  # Multiplicative increment for M


if __name__ == "__main__":
    N = 2
    NMAX = 16
    NINC = 2
    LOOPS = 1000

    # Optional positional arguments: N NMAX NINC LOOPS
    # (only LOOPS affects run_sgemm; the M/N/K sweep above is fixed at 2..64)
    z = 0
    for arg in sys.argv:
        if z == 1:
            N = int(arg)
        elif z == 2:
            NMAX = int(arg)
        elif z == 3:
            NINC = int(arg)
        elif z == 4:
            LOOPS = int(arg)
        z = z + 1

    # The loop count can also be overridden through the environment
    if 'OPENBLAS_LOOPS' in os.environ:
        p = os.environ['OPENBLAS_LOOPS']
        if p:
            LOOPS = int(p)

    print("From: %d To: %d Step=%d Loops=%d" % (N, NMAX, NINC, LOOPS))
    print("\tSIZE\t\t\tMFlops\t\t\t\t\tTime (ms)")
    run_sgemm(LOOPS)
```
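The MFLOPS column is only meaningful if the flop count matches the shape. An (M x K) by (K x N) SGEMM performs M*N*K multiply-adds, i.e. `2*M*N*K` floating-point operations, so an expression such as `2*N**3` is valid only when M = N = K. A minimal sketch of the rate computation, using a hypothetical helper `gemm_mflops`:

```python
def gemm_mflops(m, n, k, loops, seconds):
    """MFLOPS for `loops` products of an (m x k) by (k x n) matrix taking `seconds` in total."""
    flops_per_call = 2 * m * n * k  # one multiply and one add per (i, j, p) index triple
    return flops_per_call * loops / seconds * 1e-6


# A 2x64 by 64x2 product costs 2*2*2*64 = 512 flops per call,
# whereas the square-only formula with N = 2 would count just 2*2**3 = 16
print(gemm_mflops(2, 2, 64, loops=1000, seconds=0.001))
```

Note that the argument order follows the BLAS convention (m, n, k), while the benchmark prints sizes as `MxK, KxN`.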
I'd like to know whether these numbers are expected, or whether there are configuration options or flags that can be set to tune NVPL BLAS performance.
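One setting that is often worth checking for matrices this small is the thread count, since splitting a 64x64 GEMM across 16 cores may cost more in synchronization than it saves. The sketch below reruns the benchmark in a fresh interpreter for each thread count. It assumes the script is saved as `sgemm.py` (hypothetical file name) and that the library honours `OMP_NUM_THREADS` (the standard OpenMP variable) or `OPENBLAS_NUM_THREADS` (OpenBLAS); whether NVPL BLAS reads any additional library-specific variable is not covered here.

```python
import os
import subprocess
import sys


def thread_sweep_envs(thread_counts):
    """Build one environment per thread count (assumes OpenMP/OpenBLAS-style variables)."""
    envs = []
    for n in thread_counts:
        env = dict(os.environ)
        env["OMP_NUM_THREADS"] = str(n)
        env["OPENBLAS_NUM_THREADS"] = str(n)
        envs.append(env)
    return envs


def run_thread_sweep(script="sgemm.py", thread_counts=(1, 2, 4, 8, 16), loops=1000):
    """Run the (hypothetically named) benchmark script once per thread count."""
    for n, env in zip(thread_counts, thread_sweep_envs(thread_counts)):
        # Thread variables are typically read when the BLAS library initialises,
        # so each run uses a new interpreter instead of changing os.environ in place
        env["OPENBLAS_LOOPS"] = str(loops)  # read by the benchmark script
        print(f"=== {n} thread(s) ===", flush=True)
        subprocess.run([sys.executable, script], env=env, check=True)
```

Calling `run_thread_sweep()` from the directory containing the script prints one full table per thread count, which can then be compared shape by shape.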
Note that for other matrix-size configurations NVPL performed better, and it also had a lower median latency overall.
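The benchmark script reports only the total time over all loops, so a per-call median has to be measured separately. A minimal sketch, assuming `time.perf_counter` is fine-grained enough for these sizes (for 2x2 products the Python call overhead is likely to dominate); `median_latency_us` is a hypothetical helper name:

```python
import statistics
import time

import numpy
from numpy.random import randn


def median_latency_us(m, n, k, loops=1000):
    """Median wall-clock time, in microseconds, of one (m x k) by (k x n) float32 product."""
    a = randn(m, k).astype(numpy.float32)
    b = randn(k, n).astype(numpy.float32)
    numpy.dot(a, b)  # warm-up call so one-time initialisation is not timed
    samples = []
    for _ in range(loops):
        start = time.perf_counter()
        numpy.dot(a, b)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples) * 1e6


if __name__ == "__main__":
    for size in (2, 8, 64):
        print(f"{size}x{size}: {median_latency_us(size, size, size):.2f} us")
```

The median is less sensitive than the mean to occasional slow calls caused by scheduling noise.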
Perf comparison between OpenBLAS and NVPL BLAS
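A per-shape comparison like this can be derived from the two benchmark logs. Assuming both benchmarks print one line per shape in the format used by the Python script (`SIZE : <rate> MFlops <time> ms`), the sketch below lists the shapes where the second run is slower. `parse_results` and `compare` are hypothetical helpers.

```python
import re

# Matches lines such as "    2x4, 4x8 :\t  12.500000 MFlops\t  3.000000 ms"
LINE_RE = re.compile(r"^\s*(\d+x\d+, \d+x\d+)\s*:\s*([0-9.]+)\s+MFlops\s+([0-9.]+)\s+ms")


def parse_results(lines):
    """Map each 'MxK, KxN' size string to its MFLOPS rate; non-matching lines are skipped."""
    rates = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            rates[match.group(1)] = float(match.group(2))
    return rates


def compare(baseline_lines, candidate_lines, threshold=1.0):
    """Return (size, candidate/baseline rate ratio) for shapes whose ratio is below threshold."""
    base = parse_results(baseline_lines)
    cand = parse_results(candidate_lines)
    slower = []
    for size in sorted(base.keys() & cand.keys()):
        ratio = cand[size] / base[size]
        if ratio < threshold:
            slower.append((size, ratio))
    return slower
```

For example, opening `openblas.log` and `nvpl.log` (hypothetical file names) and passing the file objects to `compare` would list the regressed shapes along with how much of the baseline rate remains.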
Code for sgemm on NVPL BLAS
A modified version of `sgemm.c`
Code for sgemm on OpenBLAS
A modified version of `sgemm.py`