This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Update CHANGELOG.md and README.md for 1.16.0.
alliepiper committed Feb 8, 2022
1 parent 3691562 commit 6155a90
Showing 2 changed files with 120 additions and 2 deletions.
119 changes: 118 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,121 @@
# CUB 1.15.0
# CUB 1.16.0

## Summary

CUB 1.16.0 is a major release providing several improvements to the device-scope
algorithms. `DeviceRadixSort` now supports large (64-bit indexed) input data. A
new `UniqueByKey` algorithm has been added to `DeviceSelect`.
`DeviceAdjacentDifference` provides new `SubtractLeft` and `SubtractRight`
functionality.

This release also deprecates several obsolete APIs, including type traits
and `BlockAdjacentDifference` algorithms. Many bugfixes and documentation
updates are also included.

### 64-bit Offsets in `DeviceRadixSort` Public APIs

Users frequently want to process large datasets using CUB’s device-scope
algorithms, but the current public APIs limit input data sizes to those that can
be indexed by a 32-bit integer. Beginning with this release, CUB is updating
these APIs to support 64-bit offsets, as discussed in NVIDIA/cub#212.

The device-scope algorithms will be updated with 64-bit offset support
incrementally, starting with the `cub::DeviceRadixSort` family of algorithms.
Thanks to @canonizer for contributing this functionality.
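
As a rough illustration of the size limit described above, the following host-only sketch checks whether an item count can still be indexed by a 32-bit offset. The helper `fits_in_int32` is hypothetical and not part of CUB, and the sketch assumes the 32-bit offset type is a signed `int`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a CUB API): reports whether `num_items` can be
// indexed by a signed 32-bit integer. Assumes the 32-bit offset is signed.
constexpr bool fits_in_int32(std::int64_t num_items)
{
  return num_items >= 0 &&
         num_items <= std::numeric_limits<std::int32_t>::max();
}

// 2^31 - 1 items is the largest count a signed 32-bit offset can express.
static_assert(fits_in_int32(2147483647LL), "largest 32-bit item count");

// Three billion keys exceed that limit and need a 64-bit offset type.
static_assert(!fits_in_int32(3000000000LL), "requires 64-bit offsets");
```

An input that fails this check is the kind of dataset the 64-bit `DeviceRadixSort` offsets are intended to cover.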

### New `DeviceSelect::UniqueByKey` Algorithm

`cub::DeviceSelect` now provides a `UniqueByKey` algorithm, which has been
ported from Thrust. Thanks to @zasdfgbnm for this contribution.
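
The behavior can be pictured with a sequential host-side sketch, assuming the same semantics as `thrust::unique_by_key`: for each run of consecutive equal keys, only the first key and its paired value are kept. The function `unique_by_key_reference` is a hypothetical name used for illustration; it is not the CUB API and does not run on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential reference sketch (not CUB code): keeps the first key/value pair
// from each run of consecutive equal keys.
template <typename Key, typename Value>
std::pair<std::vector<Key>, std::vector<Value>>
unique_by_key_reference(const std::vector<Key>& keys,
                        const std::vector<Value>& values)
{
  assert(keys.size() == values.size());

  std::vector<Key> out_keys;
  std::vector<Value> out_values;
  for (std::size_t i = 0; i < keys.size(); ++i)
  {
    // A new run starts at the first item or whenever the key changes.
    if (i == 0 || !(keys[i] == keys[i - 1]))
    {
      out_keys.push_back(keys[i]);
      out_values.push_back(values[i]);
    }
  }
  return std::make_pair(std::move(out_keys), std::move(out_values));
}
```

For keys `{1, 1, 2, 2, 2, 1}` and values `{'a', 'b', 'c', 'd', 'e', 'f'}`, this sketch yields keys `{1, 2, 1}` and values `{'a', 'c', 'f'}`. The trailing `1` survives because only adjacent duplicates are collapsed.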

### New `DeviceAdjacentDifference` Algorithms

The new `cub::DeviceAdjacentDifference` interface, also ported from Thrust,
provides `SubtractLeft` and `SubtractRight` algorithms as CUB kernels.
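
A sequential host-side sketch of the two operations follows. It assumes plain subtraction as the difference operator and assumes the boundary item (the first for `SubtractLeft`, the last for `SubtractRight`) is copied unchanged, which matches our reading of the CUB documentation. The function names are hypothetical and are not the CUB API.

```cpp
#include <cstddef>
#include <vector>

// Reference sketch (not CUB code): out[0] = in[0], and
// out[i] = in[i] - in[i - 1] for every i > 0.
std::vector<int> subtract_left_reference(const std::vector<int>& in)
{
  std::vector<int> out(in);
  for (std::size_t i = 1; i < in.size(); ++i)
  {
    out[i] = in[i] - in[i - 1];
  }
  return out;
}

// Reference sketch (not CUB code): out[i] = in[i] - in[i + 1] for every item
// except the last, which is copied unchanged.
std::vector<int> subtract_right_reference(const std::vector<int>& in)
{
  std::vector<int> out(in);
  for (std::size_t i = 0; i + 1 < in.size(); ++i)
  {
    out[i] = in[i] - in[i + 1];
  }
  return out;
}
```

For the input `{1, 2, 4, 7}`, the left variant produces `{1, 1, 2, 3}` and the right variant produces `{-1, -2, -3, 7}`.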

## Deprecation Notices

### Synchronous CUDA Dynamic Parallelism Support

**A future version of CUB will change the `debug_synchronous` behavior of
device-scope algorithms when invoked via CUDA Dynamic Parallelism (CDP).**

This will only affect calls to CUB device-scope algorithms launched from
device-side code with `debug_synchronous = true`. Such invocations will continue
to print extra debugging information, but they will no longer synchronize after
kernel launches.

### Deprecated Traits

CUB previously provided a variety of metaprogramming type traits in order to
support C++03. Since C++14 is now required, these traits have been deprecated in
favor of their STL equivalents, as shown below:

| Deprecated CUB Trait | Replacement STL Trait |
|-----------------------|-----------------------|
| cub::If | std::conditional |
| cub::Equals | std::is_same |
| cub::IsPointer | std::is_pointer |
| cub::IsVolatile | std::is_volatile |
| cub::RemoveQualifiers | std::remove_cv |
| cub::EnableIf | std::enable_if |

CUB now uses the STL traits internally, resulting in a ~6% improvement in
compile time.
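
A migration might look like the following sketch, which uses only the `<type_traits>` replacements from the table above. The names `OffsetT` and `is_null` are hypothetical and exist only for illustration; each comment names the deprecated CUB trait that the standard trait stands in for.

```cpp
#include <type_traits>

// std::conditional (replaces cub::If): select a type at compile time.
template <bool UseWide>
using OffsetT = typename std::conditional<UseWide, long long, int>::type;

// std::is_same (replaces cub::Equals): compare two types.
static_assert(std::is_same<OffsetT<true>, long long>::value, "wide offset");
static_assert(std::is_same<OffsetT<false>, int>::value, "narrow offset");

// std::is_pointer (replaces cub::IsPointer).
static_assert(std::is_pointer<int*>::value, "int* is a pointer");

// std::is_volatile (replaces cub::IsVolatile).
static_assert(std::is_volatile<volatile int>::value, "volatile-qualified");

// std::remove_cv (replaces cub::RemoveQualifiers): strip const and volatile.
static_assert(
  std::is_same<std::remove_cv<const volatile int>::type, int>::value,
  "qualifiers removed");

// std::enable_if (replaces cub::EnableIf): this overload only participates in
// overload resolution when T is a pointer type.
template <typename T>
constexpr typename std::enable_if<std::is_pointer<T>::value, bool>::type
is_null(T ptr)
{
  return ptr == nullptr;
}
```

All of these checks are evaluated at compile time, so the snippet compiling under C++14 is itself the verification.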

### Misnamed `cub::BlockAdjacentDifference` APIs

The algorithms in `cub::BlockAdjacentDifference` have been deprecated, as their
names did not clearly describe their intent. The `FlagHeads` method is
now `SubtractLeft`, and `FlagTails` has been replaced by `SubtractRight`.

## Breaking Changes

- NVIDIA/cub#331: Deprecate the misnamed `BlockAdjacentDifference::FlagHeads`
and `FlagTails` methods. Use the new `SubtractLeft` and `SubtractRight`
methods instead.
- NVIDIA/cub#364: Deprecate some obsolete type traits. These should be replaced
by the equivalent traits in `<type_traits>` as described above.

## New Features

- NVIDIA/cub#331: Port the `thrust::adjacent_difference` kernel and expose it
as `cub::DeviceAdjacentDifference`.
- NVIDIA/cub#405: Port the `thrust::unique_by_key` kernel and expose it
as `cub::DeviceSelect::UniqueByKey`. Thanks to @zasdfgbnm for this
contribution.

## Enhancements

- NVIDIA/cub#340: Allow 64-bit offsets in `DeviceRadixSort` public APIs. Thanks
to @canonizer for this contribution.
- NVIDIA/cub#400: Significantly reduce `DeviceMergeSort` compilation time.
- NVIDIA/cub#415: Support user-defined `CMAKE_INSTALL_INCLUDEDIR` values in
Thrust’s CMake install rules. Thanks to @robertmaynard for this contribution.

## Bug Fixes

- NVIDIA/cub#381: Fix shared memory alignment in `dyn_smem` example.
- NVIDIA/cub#393: Fix some collisions with the `min`/`max` macros defined
in `windows.h`.
- NVIDIA/cub#404: Fix bad cast in `util_device`.
- NVIDIA/cub#410: Fix CDP issues in `DeviceSegmentedSort`.
- NVIDIA/cub#411: Ensure that the `nv_exec_check_disable` pragma is only used on
nvcc.
- NVIDIA/cub#418: Fix `-Wsizeof-array-div` warning on gcc 11. Thanks to
@robertmaynard for this contribution.
- NVIDIA/cub#420: Fix new uninitialized variable warning in `DiscardIterator` on
gcc 10.
- NVIDIA/cub#423: Fix some collisions with the `small` macro defined
in `windows.h`.
- NVIDIA/cub#426: Fix some issues with version handling in CUB’s CMake packages.
- NVIDIA/cub#430: Remove documentation for `DeviceSpmv` parameters that are
absent from public APIs.
- NVIDIA/cub#432: Remove incorrect `DeviceScan` documentation that claimed
run-to-run deterministic results for floating-point addition.

# CUB 1.15.0 (NVIDIA HPC SDK 22.1, CUDA Toolkit 11.6)

## Summary

3 changes: 2 additions & 1 deletion README.md
Expand Up @@ -100,7 +100,8 @@ See the [changelog](CHANGELOG.md) for details about specific releases.
| CUB Release | Included In |
| ------------------------- | --------------------------------------- |
| 1.15.0 | TBD |
| 1.16.0 | TBD |
| 1.15.0 | NVIDIA HPC SDK 22.1 & CUDA Toolkit 11.6 |
| 1.14.0 | NVIDIA HPC SDK 21.9 |
| 1.13.1 | CUDA Toolkit 11.5 |
| 1.13.0 | NVIDIA HPC SDK 21.7 |