
Releases: ROCm/rocFFT

rocFFT 1.0.31 for ROCm 6.3.0

03 Dec 19:49
3806d68

Added

  • rocfft-test now includes a --smoketest option.

  • Support for the gfx1151, gfx1200, and gfx1201 architectures.

  • Implemented experimental APIs to allow computing FFTs on data
    distributed across multiple MPI ranks. These APIs are enabled with the
    ROCFFT_MPI_ENABLE CMake option, which defaults to OFF.

    When ROCFFT_MPI_ENABLE is ON:

    • rocfft_plan_description_set_comm can be called to provide an
      MPI communicator to a plan description, which can then be passed
      to rocfft_plan_create. Each rank calls
      rocfft_field_add_brick to specify the layout of data bricks on
      that rank.

    • An MPI library with ROCm acceleration enabled is required at
      build time and at runtime.
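To illustrate what each rank's brick describes, the sketch below computes the index bounds that one rank would own under a simple slab decomposition, where the field is split along its first dimension. This is plain Python and not rocFFT API. The helper name `slab_brick`, the choice of split dimension, and the half-open [lower, upper) bounds convention are all assumptions made for this example.

```python
def slab_brick(lengths, nranks, rank):
    """Return (lower, upper) half-open index bounds of the brick owned by
    `rank` when a field of shape `lengths` is split along dimension 0.

    Hypothetical helper for illustration; not part of the rocFFT API.
    """
    n = lengths[0]
    base, extra = divmod(n, nranks)
    # The first `extra` ranks each receive one additional slab.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    # The brick spans the full extent of every other dimension.
    lower = [start] + [0] * (len(lengths) - 1)
    upper = [stop] + list(lengths[1:])
    return lower, upper
```

For example, `slab_brick([10, 64, 64], 3, 1)` gives `([4, 0, 0], [7, 64, 64])`. The 10 planes are split 4/3/3 across three ranks, with the remainder going to the lowest-numbered ranks.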

Changed

  • Compilation uses amdclang++ instead of hipcc.
  • CLI11 replaces Boost Program Options as the command line parser for clients and samples.
  • Building with the address sanitizer option now sets xnack+ on relevant GPU
    architectures and adds address-sanitizer support to runtime-compiled
    kernels.

rocFFT 1.0.30 for ROCm 6.2.4

06 Nov 19:55
7b4fa44

Added

  • Support for the gfx1151 architecture.

Optimized

  • Implemented 1D kernels for factorizable sizes > 1024 and < 2048.
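In this context, "factorizable" presumably means a length whose prime factors all belong to the set of radices that the library's generated kernels handle. These notes do not list rocFFT's actual radices. The sketch below assumes the prime set {2, 3, 5, 7, 11, 13, 17} purely for illustration.

```python
# Assumed prime radices, for illustration only; rocFFT's real radix list
# is not stated in these release notes.
ASSUMED_PRIMES = (2, 3, 5, 7, 11, 13, 17)


def is_factorizable(n, primes=ASSUMED_PRIMES):
    """True if n > 1 has no prime factor outside `primes`."""
    if n < 2:
        return False
    for p in primes:
        # Divide out every power of p.
        while n % p == 0:
            n //= p
    return n == 1


def factorizable_between(lo, hi, primes=ASSUMED_PRIMES):
    """Sizes strictly between lo and hi that factor entirely over `primes`."""
    return [n for n in range(lo + 1, hi) if is_factorizable(n, primes)]
```

With this assumed set, `factorizable_between(1024, 2048)` starts at 1029 (3 * 343). A prime length such as 1031 is excluded.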

Resolved issues

  • Fixed plan creation failure on some even-length real-complex transforms that use Bluestein's algorithm.
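Bluestein's algorithm rewrites a length-N DFT as a convolution that can be evaluated with power-of-two FFTs. FFT libraries commonly use it for lengths with large prime factors. The following is a minimal pure-Python sketch of the textbook algorithm, not rocFFT's implementation, and it is checked against a direct O(N²) DFT.

```python
import cmath


def fft_pow2(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft_pow2(x[0::2], inverse)
    odd = fft_pow2(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    """Forward DFT of arbitrary length via Bluestein's chirp-z algorithm."""
    n = len(x)
    # Chirp w[k] = exp(-i*pi*k^2/n), using jk = (j^2 + k^2 - (k-j)^2) / 2.
    # k^2 is reduced mod 2n because the chirp has period 2n in k^2.
    w = [cmath.exp(-1j * cmath.pi * (k * k % (2 * n)) / n) for k in range(n)]
    # A circular convolution of length m >= 2n - 1 equals the linear one.
    m = 1
    while m < 2 * n - 1:
        m *= 2
    a = [x[k] * w[k] for k in range(n)] + [0j] * (m - n)
    b = [0j] * m
    b[0] = w[0].conjugate()
    for k in range(1, n):
        b[k] = b[m - k] = w[k].conjugate()
    # Convolve a and b with power-of-two FFTs, then undo the chirp.
    fa, fb = fft_pow2(a), fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], inverse=True)
    return [w[k] * conv[k] / m for k in range(n)]


def naive_dft(x):
    """Reference O(n^2) forward DFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

An even-length real input such as `[1.0, 2.0, 0.5, -1.0, 3.0, 0.0]` can be passed directly. Here it is simply treated as complex data with zero imaginary parts.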

rocFFT 1.0.29 for ROCm 6.2.2

27 Sep 16:01
65aaf84

rocFFT code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.

rocFFT 1.0.29 for ROCm 6.2.1

20 Sep 19:58
65aaf84

Optimizations

  • Implemented 1D kernels for factorizable sizes < 1024.

rocFFT 1.0.28 for ROCm 6.2.0

02 Aug 16:15
7a8c475

Optimizations

  • Implemented multi-device transform for 3D pencil decomposition. Contiguous dimensions on input and output bricks
    are transformed locally, with global transposes to make remaining dimensions contiguous.
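The scheme relies on the separability of the multidimensional DFT: a 3D transform equals 1D transforms applied along each axis in turn. The single-process sketch below imitates the description above. It only ever transforms the contiguous (innermost) axis and uses a transpose to make the next axis contiguous. The in-memory transposes stand in for the global, cross-device transposes, and none of the helper names are rocFFT API.

```python
import cmath
import itertools


def dft(v):
    """Direct O(n^2) forward DFT of a 1D sequence."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def transform_last_axis(a):
    """Apply a 1D DFT to every innermost (contiguous) row of a 3D nested list."""
    return [[dft(row) for row in plane] for plane in a]


def rotate_axes(a):
    """Cyclic transpose: out[k][i][j] = a[i][j][k], so the old middle axis
    becomes the new contiguous (last) axis."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    return [[[a[i][j][k] for j in range(n1)] for i in range(n0)]
            for k in range(n2)]


def pencil_fft3d(a):
    """3D DFT as three passes of contiguous 1D DFTs separated by transposes."""
    for _ in range(3):
        a = transform_last_axis(a)  # transform the currently contiguous axis
        a = rotate_axes(a)          # make the next axis contiguous
    return a  # three rotations restore the original axis order


def direct_dft3d(a):
    """Reference 3D DFT computed straight from the definition."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    out = [[[0j] * n2 for _ in range(n1)] for _ in range(n0)]
    for u, v, w in itertools.product(range(n0), range(n1), range(n2)):
        out[u][v][w] = sum(
            a[i][j][k] * cmath.exp(-2j * cmath.pi * (u * i / n0 + v * j / n1 + w * k / n2))
            for i, j, k in itertools.product(range(n0), range(n1), range(n2)))
    return out
```

In a real multi-device run, each pass would operate on pencils distributed across devices. This sketch only shows that the sequence of local transforms and transposes produces the full 3D transform.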

Changes

  • Randomly generated accuracy tests are now disabled by default; these can be enabled using
    the --nrand option (which defaults to 0).

rocFFT 1.0.27 for ROCm 6.1.2

04 Jun 16:53
30044d1

rocFFT code for ROCm 6.1.2 did not change. The library was rebuilt for the updated ROCm 6.1.2 stack.

rocFFT 1.0.27 for ROCm 6.1.1

08 May 18:00
30044d1

Fixes

  • Fixed kernel launch failure on execute of very large odd-length real-complex transforms.

Additions

  • Enabled multi-GPU testing on systems without direct GPU interconnects.

rocFFT 1.0.26 for ROCm 6.1.0

16 Apr 19:10
30044d1

Changes

  • Multi-device FFTs now allow a batch size greater than 1.
  • Multi-device real-complex FFTs are now supported.
  • rocFFT now statically links libstdc++ when only std::experimental::filesystem is available (to guard
    against ABI incompatibilities with newer libstdc++ libraries that include std::filesystem)

rocFFT 1.0.25 for ROCm 6.0.2

31 Jan 20:13
0ec78f1

rocFFT code for ROCm 6.0.2 did not change. The library was rebuilt for the updated ROCm 6.0.2 stack.

rocFFT 1.0.25 for ROCm 6.0.0

15 Dec 18:30
b9926b5

Added

  • Implemented experimental APIs to allow computing FFTs on data distributed across multiple devices in a single process.

    rocfft_field is a new type that can be added to a plan description to describe the layout of FFT input or output. rocfft_field_add_brick can be called one or more times to describe a brick decomposition of an FFT field, where each brick can be assigned a different device.

    These interfaces are still experimental and subject to change. We are interested in hearing feedback on them. Questions and concerns may be raised by opening issues on the rocFFT issue tracker.

    Note that at this time, multi-device FFTs have several limitations:

    • Real-complex (forward or inverse) FFTs are not currently supported.
    • Planar format fields are not currently supported.
    • Batch (i.e., number_of_transforms provided to rocfft_plan_create) must be 1.
    • The FFT input is gathered to the current device at execute time, so all of the FFT data must fit on that device.

    We expect these limitations to be removed in future releases.
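In concrete terms, a field is the full index space of the FFT input or output, and each brick is a box inside that space, given by lower and upper bounds and assigned to one device. The Python sketch below checks that a set of bricks partitions a field exactly, which is presumably what a valid brick decomposition needs. It models bricks with a hypothetical `Brick` record that uses half-open bounds. This record is not the actual rocfft_brick type, and `split_2d` is likewise a made-up helper.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Brick:
    """Hypothetical stand-in for a brick: half-open index bounds plus a device."""
    lower: Tuple[int, ...]
    upper: Tuple[int, ...]
    device: int


def tiles_field(lengths: Tuple[int, ...], bricks: List[Brick]) -> bool:
    """True if the bricks cover every index of the field exactly once
    and no brick extends outside the field."""
    counts: Dict[Tuple[int, ...], int] = {}
    for b in bricks:
        ranges = [range(lo, hi) for lo, hi in zip(b.lower, b.upper)]
        for idx in product(*ranges):
            counts[idx] = counts.get(idx, 0) + 1
    inside = all(0 <= i < n for idx in counts for i, n in zip(idx, lengths))
    exactly_once = all(c == 1 for c in counts.values())
    return inside and exactly_once and len(counts) == math.prod(lengths)


def split_2d(n0: int, n1: int) -> List[Brick]:
    """Split an n0 x n1 field into four quadrant bricks on devices 0-3."""
    h0, h1 = n0 // 2, n1 // 2
    cuts0, cuts1 = [(0, h0), (h0, n0)], [(0, h1), (h1, n1)]
    return [Brick((a0, a1), (b0, b1), dev)
            for dev, ((a0, b0), (a1, b1)) in enumerate(product(cuts0, cuts1))]
```

For example, `tiles_field((6, 5), split_2d(6, 5))` is true. Dropping a quadrant or adding an overlapping brick makes it false.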

Optimizations

  • Improved performance of some small 2D/3D real FFTs supported by the 2D_SINGLE kernel. gfx90a receives additional
    optimization through offline tuning.
  • Removed an extra kernel launch from even-length real-complex FFTs that use callbacks.
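A common way to compute an even-length real-to-complex FFT has three steps. First, pack the N real samples into N/2 complex values. Next, run a complex FFT of length N/2. Finally, unpack the result in a post-processing pass. This note assumes, without confirmation, that rocFFT's even-length path follows this scheme. The pure-Python sketch below shows the textbook technique, with a direct DFT standing in for the half-length FFT.

```python
import cmath


def dft(v):
    """Direct O(n^2) forward DFT (stands in for any complex FFT)."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def rfft_even(x):
    """Real-to-complex DFT of even length N via one complex DFT of length N/2.

    Returns the N/2 + 1 non-redundant outputs; the rest follow from
    Hermitian symmetry, X[N - k] = conj(X[k]).
    """
    n = len(x)
    assert n % 2 == 0, "length must be even"
    m = n // 2
    # Pack even samples into the real part and odd samples into the imaginary part.
    z = [complex(x[2 * j], x[2 * j + 1]) for j in range(m)]
    zf = dft(z)
    out = []
    for k in range(m + 1):
        zk = zf[k % m]
        zc = zf[(m - k) % m].conjugate()
        even = (zk + zc) / 2   # DFT of x[0], x[2], ...
        odd = (zk - zc) / 2j   # DFT of x[1], x[3], ...
        # Post-processing: combine the halves with a twiddle factor.
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out
```

The final loop is the extra unpacking pass. It touches every output once, so a library can either run it as a separate kernel or fuse it into neighboring work.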

Changed

  • Kernels in the solution map are now built into the library kernel cache.

  • Real forward transforms (real-to-complex) no longer overwrite input. rocFFT may still overwrite real inverse (complex-to-real) input, as this allows for faster performance.

  • rocfft-rider and dyna-rocfft-rider have been renamed to rocfft-bench and dyna-rocfft-bench, controlled by the
    BUILD_CLIENTS_BENCH CMake option. Links for the old file names are installed, and the old
    BUILD_CLIENTS_RIDER CMake option is accepted for compatibility, but both will be removed in a future release.

  • Binaries in debug builds no longer have a "-d" suffix.

Fixed

  • rocFFT now correctly handles load callbacks that convert data from a smaller data type (e.g., 16-bit integers to 32-bit floats).
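rocFFT load callbacks are device functions supplied by the application, so the following is only a host-side analogy in Python and does not use the callback API. The transform fetches each input element through a load function. That function reads a stored 16-bit integer and returns a float, so the stored type is smaller than the compute type. The function names and the `scale` entry in `callback_data` are hypothetical.

```python
import cmath
import struct


def load_int16_as_float32(buffer, offset, callback_data):
    """Hypothetical load callback: read the 16-bit signed integer at element
    `offset` of a raw byte buffer and return it as a float32-precision value,
    scaled by callback_data["scale"]."""
    # The offset counts elements of the stored type: 2 bytes each.
    (raw,) = struct.unpack_from("<h", buffer, 2 * offset)
    value = raw * callback_data["scale"]
    # Round-trip through a 4-byte float to mimic float32 precision.
    return struct.unpack("<f", struct.pack("<f", value))[0]


def dft_with_load_callback(buffer, n, load, callback_data):
    """Direct DFT whose input elements are fetched through `load`, so the
    stored type (int16) can differ from the compute type (float)."""
    x = [load(buffer, j, callback_data) for j in range(n)]
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

For instance, with `buf = struct.pack("<4h", 100, -200, 300, 0)` and a scale of 0.5, the loaded values are 50, -100, 150, and 0, so the zero-frequency output is 100.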