Releases · ProjectPhysX/OpenCL-Benchmark

16 Nov 11:28

ProjectPhysX

v1.6

1ece450

OpenCL-Benchmark v1.6 (Latest)

- automatically use zero-copy buffers on CPUs/iGPUs to reduce the memory footprint
- bandwidth kernels now write non-zero data to avoid hardware optimizations for zero-initialized buffers
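The zero-copy idea can be sketched as follows. This is a hypothetical illustration, not code from OpenCL-Benchmark or OpenCL-Wrapper: `DeviceInfo`, `use_zero_copy` and `alloc_zero_copy_host` are invented names, and in a real program the two flags would be queried with `clGetDeviceInfo` (`CL_DEVICE_TYPE`, and `CL_DEVICE_HOST_UNIFIED_MEMORY` on OpenCL 1.x). On devices that share physical memory with the host, a buffer created with `CL_MEM_USE_HOST_PTR` over suitably aligned host memory can avoid keeping a second, device-side copy. The 4096-byte alignment follows commonly cited guidance (e.g. Intel's OpenCL optimization guides) and is an assumption for other vendors.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical summary of the device properties the decision needs.
// In a real program these would come from clGetDeviceInfo.
struct DeviceInfo {
    bool is_cpu;              // CL_DEVICE_TYPE == CL_DEVICE_TYPE_CPU
    bool host_unified_memory; // e.g. integrated GPUs sharing system RAM
};

// Use zero-copy buffers only where device and host share memory.
bool use_zero_copy(const DeviceInfo& d) {
    return d.is_cpu || d.host_unified_memory;
}

// Allocate host memory intended to be wrapped with CL_MEM_USE_HOST_PTR.
// Assumption: 4096-byte (page) alignment; std::aligned_alloc requires the
// size to be a multiple of the alignment, so it is rounded up here.
// Release the memory with std::free.
void* alloc_zero_copy_host(std::size_t bytes) {
    constexpr std::size_t alignment = 4096;
    const std::size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    return std::aligned_alloc(alignment, rounded);
}
```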

18 Aug 09:37

ProjectPhysX

v1.5

7b264f9

OpenCL-Benchmark v1.5

- enabled benchmarking of FP16 vector arithmetic on Nvidia Pascal and newer GPUs with Nvidia driver 520 or newer
- removed the wait() call at the end of the benchmark on Linux

Example output on an NVIDIA GeForce RTX 2080 Ti (driver 525.89.02, Linux), where FP16 compute was previously reported as not supported:

```diff
 |----------------.------------------------------------------------------------|
 | Device ID      | 9                                                          |
 | Device Name    | NVIDIA GeForce RTX 2080 Ti                                 |
 | Device Vendor  | NVIDIA Corporation                                         |
 | Device Driver  | 525.89.02 (Linux)                                          |
 | OpenCL Version | OpenCL C 1.2                                               |
 | Compute Units  | 68 at 1545 MHz (4352 cores, 13.448 TFLOPs/s)               |
 | Memory, Cache  | 11011 MB, 2176 KB global / 48 KB local                     |
 | Buffer Limits  | 2752 MB global, 64 KB constant                             |
 |----------------'------------------------------------------------------------|
 | Info: OpenCL C code successfully compiled.                                  |
 | FP64  compute                                         0.517 TFLOPs/s (1/24) |
 | FP32  compute                                        16.597 TFLOPs/s ( 1x ) |
-| FP16  compute                                          not supported        |
+| FP16  compute                                        33.054 TFLOPs/s ( 2x ) |
 | INT64 compute                                         3.563  TIOPs/s (1/4 ) |
 | INT32 compute                                        16.385  TIOPs/s ( 1x ) |
 | INT16 compute                                        13.286  TIOPs/s ( 1x ) |
 | INT8  compute                                        10.502  TIOPs/s (2/3 ) |
 | Memory Bandwidth ( coalesced read      )                        532.76 GB/s |
 | Memory Bandwidth ( coalesced      write)                        548.88 GB/s |
 | Memory Bandwidth (misaligned read      )                        534.43 GB/s |
 | Memory Bandwidth (misaligned      write)                        157.78 GB/s |
 | PCIe   Bandwidth (send                 )                         12.86 GB/s |
 | PCIe   Bandwidth (   receive           )                         12.99 GB/s |
 | PCIe   Bandwidth (        bidirectional)            (Gen4 x16)    6.30 GB/s |
 |-----------------------------------------------------------------------------|
```
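The "13.448 TFLOPs/s" next to the core count in this printout is consistent with a theoretical FP32 peak of cores × 2 FLOPs (one fused multiply-add) × clock: 4352 × 2 × 1545 MHz ≈ 13.448 TFLOPs/s. The sketch below reproduces that arithmetic; `peak_tflops` is a hypothetical helper, not the benchmark's own function, and 64 cores per compute unit is derived here from 4352 / 68. The measured FP32 value (16.597 TFLOPs/s) is higher than this estimate, presumably because the card boosts above the reported 1545 MHz under load.

```cpp
#include <cstdio>

// Theoretical peak throughput, assuming each core retires one fused
// multiply-add (counted as 2 floating-point operations) per clock cycle.
double peak_tflops(int compute_units, int cores_per_cu, double clock_mhz) {
    const double cores = static_cast<double>(compute_units) * cores_per_cu;
    return cores * 2.0 * clock_mhz * 1e6 / 1e12; // FLOPs/s -> TFLOPs/s
}

// Usage: 68 CUs x 64 cores at 1545 MHz prints "13.448 TFLOPs/s".
void print_rtx_2080_ti_estimate() {
    std::printf("%.3f TFLOPs/s\n", peak_tflops(68, 64, 1545.0));
}
```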

03 Aug 06:32

ProjectPhysX

v1.4

c7e8987

OpenCL-Benchmark v1.4

- updated OpenCL-Wrapper
- if no OpenCL devices are available, installation instructions for the GPU driver and OpenCL runtime are now printed to the console

02 May 20:05

ProjectPhysX

v1.3

677d52f

OpenCL-Benchmark v1.3

- workaround for an Nvidia driver bug: enqueueFillBuffer is broken for large buffers on Nvidia GPUs
- fixed slow numeric drift issues
- fixed terrible performance on ARM GPUs by macro-replacing fused multiply-add (fma) with a*b+c
- added automatic OS detection in make.sh
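One way to apply such a macro replacement from the host side is sketched below. `patch_kernel_source` is a hypothetical name, the vendor check is an assumption, and OpenCL-Benchmark's actual mechanism may differ. One plausible reason for the slowdown: OpenCL C requires the built-in `fma()` to be correctly rounded, so a GPU without fast hardware support for it may take a slower path, whereas `a*b+c` typically leaves the compiler free to emit whatever multiply-add the hardware executes fastest, at the cost of possibly different rounding.

```cpp
#include <string>

// Hypothetical helper: on ARM devices, prepend a function-like macro so that
// every fma(a, b, c) call in the OpenCL C source is compiled as a plain
// multiply-add instead of the correctly rounded built-in fma().
std::string patch_kernel_source(const std::string& source, const std::string& vendor) {
    // Assumption: ARM devices report a vendor string containing "ARM"
    // (case-sensitive match).
    const bool is_arm = vendor.find("ARM") != std::string::npos;
    if (!is_arm) {
        return source; // other vendors keep the built-in fma()
    }
    return "#define fma(a, b, c) ((a) * (b) + (c))\n" + source;
}
```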

07 Dec 19:11

ProjectPhysX

v1.2

0296687

OpenCL-Benchmark v1.2

- corrected the TFLOPs/s estimate for the Intel Data Center GPU Max Series
- made the correction of wrong memory reporting on Intel Arc more robust
- made CPU/GPU buffer initialization significantly faster with std::fill and enqueueFillBuffer
- added operating system info to the OpenCL device driver version printout
- fixed a bug in the print_message() function in utilities.hpp