This repository was archived by the owner on Mar 21, 2024. It is now read-only.
List of individual changes:
- Fixed test errors
- OffsetT == unsigned long long for the 64-bit case
- using std::{is_same,conditional}
- using "portion" consistently for 2^28-2^30-sized chunks of the input array
- HasEnoughMemory() takes overwrite into account.
- moved checking for enough memory earlier.
- added a CTA_SYNC() to the histogram kernel
- disabled tests with NumItemsT != int for segmented sort
- testing with 4.5 billion items
- tests for different NumItemsT
- NumItemsT for all device sorting functions
- wrapped ChooseOffsetT into namespace detail
- fixed typos
- templatized the type of num_items in 2 methods of DeviceRadixSort
- tuned radix sort with 64-bit OffsetT for V100
- tuned for 64-bit OffsetT for A100
- separate tuning parameters for 64-bit OffsetT
- improved downsweep policy for GP100
- option for 64-bit num_items with 32-bit shared memory histogram counters.
- introduced PartOffsetT into Onesweep kernel.
  - OffsetT is now only used for offsets into the whole array
    (e.g. bin counts or global read/write offsets)
  - PartOffsetT is used for offsets that do not exceed a single part
    (e.g. decoupled look-back, block index, number of items inside a part)
  - this fixes problems when OffsetT is unsigned, and also contributes
    towards supporting 64-bit num_items
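
To illustrate how the offset types above might relate, here is a minimal sketch. It is not the code from this commit. It assumes the offset type is chosen from the width of NumItemsT via std::conditional, and that a portion holds 2^30 items (the list only gives the range 2^28-2^30). The namespace `sketch`, the constant `kPortionSize`, the 4-byte threshold, and the helpers `NumPortions` and `LastPortionSize` are hypothetical names introduced for this example.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {

// Hypothetical selector: 32-bit offsets when NumItemsT is at most 4 bytes wide,
// unsigned long long for the 64-bit case.
template <typename NumItemsT>
struct ChooseOffsetT {
  static_assert(std::is_integral<NumItemsT>::value, "NumItemsT must be integral");
  using Type = typename std::conditional<sizeof(NumItemsT) <= 4,
                                         std::uint32_t,
                                         unsigned long long>::type;
};

// Assumed portion size: 2^30 items, so any offset inside one portion fits in 32 bits.
constexpr unsigned long long kPortionSize = 1ull << 30;

// Number of portions needed to cover num_items (rounded up).
constexpr unsigned long long NumPortions(unsigned long long num_items) {
  return (num_items + kPortionSize - 1) / kPortionSize;
}

// Number of items in the last portion; a 32-bit part offset is enough to index it.
constexpr std::uint32_t LastPortionSize(unsigned long long num_items) {
  return static_cast<std::uint32_t>(
      num_items - (NumPortions(num_items) - 1) * kPortionSize);
}

// 32-bit num_items keeps 32-bit offsets; 64-bit num_items gets unsigned long long.
static_assert(std::is_same<ChooseOffsetT<int>::Type, std::uint32_t>::value, "");
static_assert(std::is_same<ChooseOffsetT<long long>::Type, unsigned long long>::value, "");

}  // namespace sketch
```

Under these assumptions, 4.5 billion items split into five portions, and every offset within a portion stays below 2^32 even though offsets into the whole array need 64 bits.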