Releases · ROCm/rocBLAS

03 Dec 19:49

rocm-ci

rocm-6.3.0

8ebd6c1

rocBLAS 4.3.0 for ROCm 6.3.0 (Latest)

Added

Level 3 and EX functions now have an additional ILP64 API for both C and Fortran (_64 name suffix) with int64_t function arguments

Changed

amdclang is used as the default compiler instead of hipcc
Internal performance scripts use amd-smi instead of the deprecated rocm-smi

Optimized

Improved performance of Level 2 gbmv
Improved performance of Level 2 gemv for float and double precision for problem sizes where TransA == N, m == n, and m % 128 == 0, as measured on a gfx942 GPU

Resolved issues

Fixed stbsv_strided_batched_64 Fortran binding

Upcoming changes

rocblas_Xgemm_kernel_name APIs are deprecated

06 Nov 19:55

rocm-ci

rocm-6.2.4

3171316

rocBLAS 4.2.4 for ROCm 6.2.4

Additions

gfx1151 support

27 Sep 16:01

rocm-ci

rocm-6.2.2

c6de034

rocBLAS 4.2.1 for ROCm 6.2.2

rocBLAS code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.

20 Sep 19:58

rocm-ci

rocm-6.2.1

c6de034

rocBLAS 4.2.1 for ROCm 6.2.1

Removals

Removed the Device_Memory_Allocation.pdf link from the documentation

Fixes

Fixed the error/warning message issued during a rocblas_set_stream() call

02 Aug 16:15

rocm-ci

rocm-6.2.0

54f305c

rocBLAS 4.2.0 for ROCm 6.2.0

Additions

Level 2 functions and Level 3 trsm have an additional ILP64 API for both C and Fortran (_64 name suffix) with int64_t function arguments
Cache flush timing for gemm_batched_ex, gemm_strided_batched_ex, and axpy
Benchmark class for common timing code
An environment variable, ROCBLAS_DEFAULT_ATOMICS_MODE, to set the default atomics mode during creation of a rocblas_handle
Extended dot_ex to support single-precision (f32_r) input and double-precision (f64_r) output and compute types

Optimizations

Improved performance of Level 1 dot_batched and dot_strided_batched for all precisions. Performance improved 6x for larger problem sizes, as measured on an MI210 GPU

Changes

Linux AOCL dependency updated to release 4.2 gcc build
Windows vcpkg dependencies updated to release 2024.02.14
Increased the default device workspace from 32 MiB to 128 MiB for gfx9xx architectures with xx >= 40

Deprecations

rocblas_gemm_ex3, rocblas_gemm_batched_ex3, and rocblas_gemm_strided_batched_ex3 are deprecated and will be removed in the next major release of rocBLAS. For future 8-bit float usage, refer to hipBLASLt: https://github.com/ROCm/hipBLASLt

04 Jun 16:53

rocm-ci

rocm-6.1.2

8443539

rocBLAS 4.1.2 for ROCm 6.1.2

Fixes

Fixed BF16 TT get_solutions

Optimizations

Tuned gfx942 BBS TN and TT

08 May 18:00

rocm-ci

rocm-6.1.1

5b85f2d

rocBLAS 4.1.0 for ROCm 6.1.1

rocBLAS code for ROCm 6.1.1 did not change. The library was rebuilt for the updated ROCm 6.1.1 stack.

16 Apr 19:10

rocm-ci

rocm-6.1.0

cefa4a9

rocBLAS 4.1.0 for ROCm 6.1.0

Additions

Level 1 and Level 1 Extension functions have an additional ILP64 API for both C and Fortran (_64 name suffix) with int64_t function arguments.
Cache flush timing for gemm_ex.

Changes

Some Level 2 function argument names changed from 'm' to 'n' to match Legacy BLAS; there was no change in implementation.
Standardized the use of non-blocking streams for copying results from device to host.

Fixes

Fixed host-pointer mode reductions for non-blocking streams.

31 Jan 20:12

rocm-ci

rocm-6.0.2

88df972

rocBLAS 4.0.0 for ROCm 6.0.2

rocBLAS code for ROCm 6.0.2 did not change. The library was rebuilt for the updated ROCm 6.0.2 stack.

15 Dec 18:30

rocm-ci

rocm-6.0.0

88df972

rocBLAS 4.0.0 for ROCm 6.0.0

Added

Added beta APIs rocblas_gemm_batched_ex3 and rocblas_gemm_strided_batched_ex3
Added input/output type f16_r/bf16_r and execution type f32_r support for Level 2 gemv_batched and gemv_strided_batched
Added rocblas_status_excluded_from_build, returned when calling functions that require Tensile from a rocBLAS build without Tensile
Added a system for asynchronous kernel launches that sets a failure rocblas_status based on a hipPeekAtLastError discrepancy

Optimized

Improved trsm performance for small sizes (m < 32 && n < 32)

Deprecated

In a future release, atomic operations will be disabled by default so that results are repeatable. Atomic operations can always be enabled or disabled using the rocblas_set_atomics_mode function. Enabling atomic operations can improve performance.

Removed

The rocblas_gemm_ext2 API function is removed
The in-place trmm API from Legacy BLAS is removed. It is replaced by an API that supports both in-place and out-of-place trmm
int8x4 support is removed. int8 support is unchanged
The #define __STDC_WANT_IEC_60559_TYPES_EXT__ has been removed from rocblas-types.h. Users who want ISO/IEC TS 18661-3:2015 functionality must define __STDC_WANT_IEC_60559_TYPES_EXT__ before including float.h, math.h, and rocblas.h
The default build removes device code for the gfx803 architecture from the fat binary

Fixed

Made offset calculations for rocBLAS functions 64-bit safe, fixing potential overflow with very large leading dimensions or increments in:
- Level 2: gbmv, gemv, hbmv, sbmv, spmv, tbmv, tpmv, tbsv, tpsv
Lazy loading to support heterogeneous architecture setups and to load the appropriate Tensile library files based on the device's architecture
Guard against no-op kernel launches that could result in a hipGetLastError error

Changed

Reduced the default verbosity of rocblas-test. To see all tests, set the environment variable GTEST_LISTENER=PASS_LINE_IN_LOG
