Implement `cudax::async_mdarray` #3095

miscco · 2024-12-09T16:43:28Z

This is a derivation of the current mdarray proposal.

To cater to our heterogeneous use cases we require an environment to be passed to the constructor of the mdspan.

This replaces the container template argument in the current proposal. In contrast to cudax::async_vector,
cudax::async_mdarray does not offer any APIs that change its size.

cudax/include/cuda/experimental/__container/async_mdarray.cuh

miscco · 2024-12-09T18:37:34Z

cudax/test/containers/async_mdarray/constructor.cu

+  cudax::stream stream{};
+  Env env{Resource{}, stream};
+
+  SECTION("Construction with explicit size")


@pciolkosz : We should also consider CTAD and a make_mdarray function

github-actions · 2024-12-09T18:38:51Z

🟨 CI finished in 1h 53m: Pass: 98%/168 | Total: 1d 15h | Avg: 14m 03s | Max: 1h 31m | Hits: 10%/22034

🟨 cudax: Pass: 88%/26 | Total: 2h 32m | Avg: 5m 51s | Max: 19m 20s

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  86%/22  | Total:  2h 19m | Avg:  6m 20s | Max: 19m 20s
  🟩 arm64              Pass: 100%/4   | Total: 12m 57s | Avg:  3m 14s | Max:  3m 18s
🔍 jobs: Build 🔍
  🔍 Build              Pass:  87%/24  | Total:  1h 53m | Avg:  4m 44s | Max: 11m 23s
  🟩 Test               Pass: 100%/2   | Total: 38m 37s | Avg: 19m 18s | Max: 19m 20s
🟨 ctk
  🟨 12.0               Pass:  33%/3   | Total: 17m 53s | Avg:  5m 57s | Max: 11m 23s
  🟩 12.5               Pass: 100%/2   | Total: 18m 53s | Avg:  9m 26s | Max:  9m 52s
  🟨 12.6               Pass:  95%/21  | Total:  1h 55m | Avg:  5m 30s | Max: 19m 20s
🟨 cudacxx
  🟨 nvcc12.0           Pass:  33%/3   | Total: 17m 53s | Avg:  5m 57s | Max: 11m 23s
  🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 53s | Avg:  9m 26s | Max:  9m 52s
  🟨 nvcc12.6           Pass:  95%/21  | Total:  1h 55m | Avg:  5m 30s | Max: 19m 20s
🟨 cxx
  🟩 Clang9             Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
  🟩 Clang10            Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
  🟩 Clang11            Pass: 100%/1   | Total:  3m 48s | Avg:  3m 48s | Max:  3m 48s
  🟩 Clang12            Pass: 100%/1   | Total:  3m 30s | Avg:  3m 30s | Max:  3m 30s
  🟩 Clang13            Pass: 100%/1   | Total:  3m 46s | Avg:  3m 46s | Max:  3m 46s
  🟩 Clang14            Pass: 100%/1   | Total:  3m 42s | Avg:  3m 42s | Max:  3m 42s
  🟩 Clang15            Pass: 100%/1   | Total:  3m 35s | Avg:  3m 35s | Max:  3m 35s
  🟩 Clang16            Pass: 100%/1   | Total:  4m 03s | Avg:  4m 03s | Max:  4m 03s
  🟩 Clang17            Pass: 100%/1   | Total:  4m 03s | Avg:  4m 03s | Max:  4m 03s
  🟩 Clang18            Pass: 100%/4   | Total: 29m 54s | Avg:  7m 28s | Max: 19m 17s
  🟥 GCC9               Pass:   0%/1   | Total:  3m 07s | Avg:  3m 07s | Max:  3m 07s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 39s | Avg:  3m 39s | Max:  3m 39s
  🟩 GCC11              Pass: 100%/1   | Total:  4m 26s | Avg:  4m 26s | Max:  4m 26s
  🟩 GCC12              Pass: 100%/2   | Total: 23m 17s | Avg: 11m 38s | Max: 19m 20s
  🟩 GCC13              Pass: 100%/4   | Total: 12m 47s | Avg:  3m 11s | Max:  3m 18s
  🟥 MSVC14.36          Pass:   0%/1   | Total: 11m 23s | Avg: 11m 23s | Max: 11m 23s
  🟥 MSVC14.39          Pass:   0%/1   | Total: 11m 21s | Avg: 11m 21s | Max: 11m 21s
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 53s | Avg:  9m 26s | Max:  9m 52s
🟨 cxx_family
  🟩 Clang              Pass: 100%/13  | Total:  1h 03m | Avg:  4m 53s | Max: 19m 17s
  🟨 GCC                Pass:  88%/9   | Total: 47m 16s | Avg:  5m 15s | Max: 19m 20s
  🟥 MSVC               Pass:   0%/2   | Total: 22m 44s | Avg: 11m 22s | Max: 11m 23s
  🟩 NVHPC              Pass: 100%/2   | Total: 18m 53s | Avg:  9m 26s | Max:  9m 52s
🟨 cudacxx_family
  🟨 nvcc               Pass:  88%/26  | Total:  2h 32m | Avg:  5m 51s | Max: 19m 20s
🟨 gpu
  🟨 v100               Pass:  88%/26  | Total:  2h 32m | Avg:  5m 51s | Max: 19m 20s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  3m 08s | Avg:  3m 08s | Max:  3m 08s
  🟩 90a                Pass: 100%/1   | Total:  3m 05s | Avg:  3m 05s | Max:  3m 05s
🟨 std
  🟨 17                 Pass:  83%/6   | Total: 25m 12s | Avg:  4m 12s | Max:  9m 01s
  🟨 20                 Pass:  90%/20  | Total:  2h 07m | Avg:  6m 21s | Max: 19m 20s

🟩 libcudacxx: Pass: 100%/48 | Total: 9h 18m | Avg: 11m 38s | Max: 41m 05s | Hits: 3%/9746

🟩 cpu
  🟩 amd64              Pass: 100%/46  | Total:  9h 11m | Avg: 11m 59s | Max: 41m 05s | Hits:   3%/9746  
  🟩 arm64              Pass: 100%/2   | Total:  7m 12s | Avg:  3m 36s | Max:  3m 50s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total: 53m 54s | Avg:  7m 42s | Max: 35m 31s | Hits:   3%/2213  
  🟩 12.5               Pass: 100%/2   | Total:  1h 04m | Avg: 32m 04s | Max: 32m 27s
  🟩 12.6               Pass: 100%/39  | Total:  7h 20m | Avg: 11m 17s | Max: 41m 05s | Hits:   3%/7533  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 03m | Avg: 15m 54s | Max: 20m 14s
  🟩 nvcc11.1           Pass: 100%/7   | Total: 53m 54s | Avg:  7m 42s | Max: 35m 31s | Hits:   3%/2213  
  🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 04m | Avg: 32m 04s | Max: 32m 27s
  🟩 nvcc12.6           Pass: 100%/35  | Total:  6h 16m | Avg: 10m 46s | Max: 41m 05s | Hits:   3%/7533  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 03m | Avg: 15m 54s | Max: 20m 14s
  🟩 nvcc               Pass: 100%/44  | Total:  8h 14m | Avg: 11m 14s | Max: 41m 05s | Hits:   3%/9746  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 15m 42s | Avg:  3m 55s | Max:  4m 48s
  🟩 Clang10            Pass: 100%/1   | Total:  4m 40s | Avg:  4m 40s | Max:  4m 40s
  🟩 Clang11            Pass: 100%/1   | Total:  4m 04s | Avg:  4m 04s | Max:  4m 04s
  🟩 Clang12            Pass: 100%/1   | Total:  3m 58s | Avg:  3m 58s | Max:  3m 58s
  🟩 Clang13            Pass: 100%/1   | Total:  4m 10s | Avg:  4m 10s | Max:  4m 10s
  🟩 Clang14            Pass: 100%/1   | Total:  4m 13s | Avg:  4m 13s | Max:  4m 13s
  🟩 Clang15            Pass: 100%/1   | Total:  4m 17s | Avg:  4m 17s | Max:  4m 17s
  🟩 Clang16            Pass: 100%/1   | Total:  4m 17s | Avg:  4m 17s | Max:  4m 17s
  🟩 Clang17            Pass: 100%/1   | Total:  4m 22s | Avg:  4m 22s | Max:  4m 22s
  🟩 Clang18            Pass: 100%/8   | Total:  1h 40m | Avg: 12m 32s | Max: 24m 26s
  🟩 GCC6               Pass: 100%/2   | Total:  5m 42s | Avg:  2m 51s | Max:  2m 52s
  🟩 GCC7               Pass: 100%/2   | Total:  6m 32s | Avg:  3m 16s | Max:  3m 24s
  🟩 GCC8               Pass: 100%/1   | Total:  3m 24s | Avg:  3m 24s | Max:  3m 24s
  🟩 GCC9               Pass: 100%/3   | Total:  9m 18s | Avg:  3m 06s | Max:  3m 38s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 39s | Avg:  3m 39s | Max:  3m 39s
  🟩 GCC11              Pass: 100%/1   | Total:  3m 34s | Avg:  3m 34s | Max:  3m 34s
  🟩 GCC12              Pass: 100%/1   | Total:  4m 07s | Avg:  4m 07s | Max:  4m 07s
  🟩 GCC13              Pass: 100%/10  | Total:  2h 17m | Avg: 13m 45s | Max: 29m 55s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 24m 05s | Avg: 24m 05s | Max: 24m 05s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 35m 31s | Avg: 35m 31s | Max: 35m 31s | Hits:   3%/2213  
  🟩 MSVC14.29          Pass: 100%/1   | Total: 35m 54s | Avg: 35m 54s | Max: 35m 54s | Hits:   3%/2462  
  🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 15m | Avg: 37m 31s | Max: 41m 05s | Hits:   3%/5071  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 04m | Avg: 32m 04s | Max: 32m 27s
🟩 cxx_family
  🟩 Clang              Pass: 100%/20  | Total:  2h 30m | Avg:  7m 30s | Max: 24m 26s
  🟩 GCC                Pass: 100%/21  | Total:  2h 53m | Avg:  8m 16s | Max: 29m 55s
  🟩 Intel              Pass: 100%/1   | Total: 24m 05s | Avg: 24m 05s | Max: 24m 05s
  🟩 MSVC               Pass: 100%/4   | Total:  2h 26m | Avg: 36m 37s | Max: 41m 05s | Hits:   3%/9746  
  🟩 NVHPC              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 04s | Max: 32m 27s
🟩 gpu
  🟩 v100               Pass: 100%/48  | Total:  9h 18m | Avg: 11m 38s | Max: 41m 05s | Hits:   3%/9746  
🟩 jobs
  🟩 Build              Pass: 100%/41  | Total:  6h 52m | Avg: 10m 03s | Max: 41m 05s | Hits:   3%/9746  
  🟩 NVRTC              Pass: 100%/4   | Total:  1h 35m | Avg: 23m 55s | Max: 29m 55s
  🟩 Test               Pass: 100%/2   | Total: 48m 41s | Avg: 24m 20s | Max: 24m 26s
  🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 02s | Avg:  2m 02s | Max:  2m 02s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total: 11m 47s | Avg: 11m 47s | Max: 11m 47s
  🟩 90a                Pass: 100%/2   | Total: 16m 10s | Avg:  8m 05s | Max: 12m 13s
🟩 std
  🟩 11                 Pass: 100%/6   | Total: 32m 37s | Avg:  5m 26s | Max: 17m 04s
  🟩 14                 Pass: 100%/5   | Total:  1h 14m | Avg: 14m 56s | Max: 35m 31s | Hits:   3%/2213  
  🟩 17                 Pass: 100%/13  | Total:  3h 23m | Avg: 15m 37s | Max: 35m 54s | Hits:   3%/4924  
  🟩 20                 Pass: 100%/23  | Total:  4h 06m | Avg: 10m 42s | Max: 41m 05s | Hits:   3%/2609

🟩 thrust: Pass: 100%/46 | Total: 13h 22m | Avg: 17m 27s | Max: 1h 31m | Hits: 20%/9260

🟩 cmake_options
  🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 33m 21s | Avg: 16m 40s | Max: 26m 08s
🟩 cpu
  🟩 amd64              Pass: 100%/44  | Total: 13h 12m | Avg: 18m 01s | Max:  1h 31m | Hits:  20%/9260  
  🟩 arm64              Pass: 100%/2   | Total:  9m 54s | Avg:  4m 57s | Max:  5m 22s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  1h 41m | Avg: 14m 27s | Max:  1h 13m | Hits:   0%/1852  
  🟩 12.5               Pass: 100%/2   | Total:  2h 41m | Avg:  1h 20m | Max:  1h 31m
  🟩 12.6               Pass: 100%/37  | Total:  9h 00m | Avg: 14m 35s | Max:  1h 11m | Hits:  25%/7408  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  5m 03s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  1h 41m | Avg: 14m 27s | Max:  1h 13m | Hits:   0%/1852  
  🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 41m | Avg:  1h 20m | Max:  1h 31m
  🟩 nvcc12.6           Pass: 100%/35  | Total:  8h 50m | Avg: 15m 09s | Max:  1h 11m | Hits:  25%/7408  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  5m 03s
  🟩 nvcc               Pass: 100%/44  | Total: 13h 12m | Avg: 18m 01s | Max:  1h 31m | Hits:  20%/9260  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 20m 24s | Avg:  5m 06s | Max:  6m 33s
  🟩 Clang10            Pass: 100%/1   | Total:  6m 26s | Avg:  6m 26s | Max:  6m 26s
  🟩 Clang11            Pass: 100%/1   | Total:  5m 04s | Avg:  5m 04s | Max:  5m 04s
  🟩 Clang12            Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s
  🟩 Clang13            Pass: 100%/1   | Total:  5m 07s | Avg:  5m 07s | Max:  5m 07s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 40s | Avg:  5m 40s | Max:  5m 40s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 18s | Avg:  5m 18s | Max:  5m 18s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 53s | Avg:  5m 53s | Max:  5m 53s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 14s | Avg:  5m 14s | Max:  5m 14s
  🟩 Clang18            Pass: 100%/7   | Total: 46m 42s | Avg:  6m 40s | Max: 14m 05s
  🟩 GCC6               Pass: 100%/2   | Total:  9m 52s | Avg:  4m 56s | Max:  5m 13s
  🟩 GCC7               Pass: 100%/2   | Total:  9m 57s | Avg:  4m 58s | Max:  5m 07s
  🟩 GCC8               Pass: 100%/1   | Total:  5m 23s | Avg:  5m 23s | Max:  5m 23s
  🟩 GCC9               Pass: 100%/3   | Total: 14m 21s | Avg:  4m 47s | Max:  5m 24s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s
  🟩 GCC12              Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s
  🟩 GCC13              Pass: 100%/8   | Total:  1h 51m | Avg: 13m 57s | Max: 40m 49s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 54m 35s | Avg: 54m 35s | Max: 54m 35s
  🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 13m | Avg:  1h 13m | Max:  1h 13m | Hits:   0%/1852  
  🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 11m | Avg:  1h 11m | Max:  1h 11m | Hits:   0%/1852  
  🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 42m | Avg: 54m 15s | Max:  1h 10m | Hits:  33%/5556  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 41m | Avg:  1h 20m | Max:  1h 31m
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total:  1h 51m | Avg:  5m 51s | Max: 14m 05s
  🟩 GCC                Pass: 100%/19  | Total:  2h 47m | Avg:  8m 49s | Max: 40m 49s
  🟩 Intel              Pass: 100%/1   | Total: 54m 35s | Avg: 54m 35s | Max: 54m 35s
  🟩 MSVC               Pass: 100%/5   | Total:  5h 07m | Avg:  1h 01m | Max:  1h 13m | Hits:  20%/9260  
  🟩 NVHPC              Pass: 100%/2   | Total:  2h 41m | Avg:  1h 20m | Max:  1h 31m
🟩 gpu
  🟩 v100               Pass: 100%/46  | Total: 13h 22m | Avg: 17m 27s | Max:  1h 31m | Hits:  20%/9260  
🟩 jobs
  🟩 Build              Pass: 100%/40  | Total: 11h 50m | Avg: 17m 45s | Max:  1h 31m | Hits:   0%/7408  
  🟩 TestCPU            Pass: 100%/3   | Total: 38m 17s | Avg: 12m 45s | Max: 22m 32s | Hits:  99%/1852  
  🟩 TestGPU            Pass: 100%/3   | Total: 53m 58s | Avg: 17m 59s | Max: 26m 08s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 24s | Avg:  4m 24s | Max:  4m 24s
🟩 std
  🟩 11                 Pass: 100%/5   | Total: 23m 00s | Avg:  4m 36s | Max:  5m 16s
  🟩 14                 Pass: 100%/4   | Total:  1h 30m | Avg: 22m 41s | Max:  1h 13m | Hits:   0%/1852  
  🟩 17                 Pass: 100%/12  | Total:  6h 04m | Avg: 30m 23s | Max:  1h 31m | Hits:   0%/3704  
  🟩 20                 Pass: 100%/23  | Total:  4h 50m | Avg: 12m 38s | Max:  1h 10m | Hits:  49%/3704

🟩 cub: Pass: 100%/45 | Total: 13h 28m | Avg: 17m 57s | Max: 1h 23m | Hits: 0%/3028

🟩 cpu
  🟩 amd64              Pass: 100%/43  | Total: 13h 17m | Avg: 18m 32s | Max:  1h 23m | Hits:   0%/3028  
  🟩 arm64              Pass: 100%/2   | Total: 10m 26s | Avg:  5m 13s | Max:  5m 43s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  1h 27m | Avg: 12m 33s | Max:  1h 01m | Hits:   0%/757   
  🟩 12.5               Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 02m
  🟩 12.6               Pass: 100%/36  | Total:  9h 54m | Avg: 16m 30s | Max:  1h 23m | Hits:   0%/2271  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 33s | Avg:  4m 16s | Max:  4m 18s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  1h 27m | Avg: 12m 33s | Max:  1h 01m | Hits:   0%/757   
  🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 02m
  🟩 nvcc12.6           Pass: 100%/34  | Total:  9h 45m | Avg: 17m 13s | Max:  1h 23m | Hits:   0%/2271  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 33s | Avg:  4m 16s | Max:  4m 18s
  🟩 nvcc               Pass: 100%/43  | Total: 13h 19m | Avg: 18m 35s | Max:  1h 23m | Hits:   0%/3028  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 21m 02s | Avg:  5m 15s | Max:  6m 02s
  🟩 Clang10            Pass: 100%/1   | Total:  6m 29s | Avg:  6m 29s | Max:  6m 29s
  🟩 Clang11            Pass: 100%/1   | Total:  5m 34s | Avg:  5m 34s | Max:  5m 34s
  🟩 Clang12            Pass: 100%/1   | Total:  5m 06s | Avg:  5m 06s | Max:  5m 06s
  🟩 Clang13            Pass: 100%/1   | Total:  5m 04s | Avg:  5m 04s | Max:  5m 04s
  🟩 Clang14            Pass: 100%/1   | Total:  6m 07s | Avg:  6m 07s | Max:  6m 07s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 44s | Avg:  5m 44s | Max:  5m 44s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 32s | Avg:  5m 32s | Max:  5m 32s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 50s | Avg:  5m 50s | Max:  5m 50s
  🟩 Clang18            Pass: 100%/7   | Total:  1h 02m | Avg:  8m 56s | Max: 20m 29s
  🟩 GCC6               Pass: 100%/2   | Total:  8m 38s | Avg:  4m 19s | Max:  4m 35s
  🟩 GCC7               Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  5m 30s
  🟩 GCC8               Pass: 100%/1   | Total:  5m 31s | Avg:  5m 31s | Max:  5m 31s
  🟩 GCC9               Pass: 100%/3   | Total: 14m 02s | Avg:  4m 40s | Max:  5m 43s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 12s | Avg:  5m 12s | Max:  5m 12s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
  🟩 GCC12              Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
  🟩 GCC13              Pass: 100%/8   | Total:  2h 52m | Avg: 21m 37s | Max:  1h 23m
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 59m 54s | Avg: 59m 54s | Max: 59m 54s
  🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m | Hits:   0%/757   
  🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 05m | Avg:  1h 05m | Max:  1h 05m | Hits:   0%/757   
  🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 13m | Hits:   0%/1514  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 02m
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total:  2h 09m | Avg:  6m 47s | Max: 20m 29s
  🟩 GCC                Pass: 100%/19  | Total:  3h 47m | Avg: 11m 59s | Max:  1h 23m
  🟩 Intel              Pass: 100%/1   | Total: 59m 54s | Avg: 59m 54s | Max: 59m 54s
  🟩 MSVC               Pass: 100%/4   | Total:  4h 25m | Avg:  1h 06m | Max:  1h 13m | Hits:   0%/3028  
  🟩 NVHPC              Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 02m
🟩 gpu
  🟩 v100               Pass: 100%/45  | Total: 13h 28m | Avg: 17m 57s | Max:  1h 23m | Hits:   0%/3028  
🟩 jobs
  🟩 Build              Pass: 100%/39  | Total: 10h 19m | Avg: 15m 53s | Max:  1h 13m | Hits:   0%/3028  
  🟩 DeviceLaunch       Pass: 100%/1   | Total:  1h 23m | Avg:  1h 23m | Max:  1h 23m
  🟩 GraphCapture       Pass: 100%/1   | Total: 23m 42s | Avg: 23m 42s | Max: 23m 42s
  🟩 HostLaunch         Pass: 100%/2   | Total: 35m 50s | Avg: 17m 55s | Max: 17m 59s
  🟩 TestGPU            Pass: 100%/2   | Total: 44m 58s | Avg: 22m 29s | Max: 24m 29s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 12s | Avg:  4m 12s | Max:  4m 12s
🟩 std
  🟩 11                 Pass: 100%/5   | Total: 23m 08s | Avg:  4m 37s | Max:  6m 02s
  🟩 14                 Pass: 100%/4   | Total:  1h 17m | Avg: 19m 28s | Max:  1h 01m | Hits:   0%/757   
  🟩 17                 Pass: 100%/12  | Total:  5h 05m | Avg: 25m 26s | Max:  1h 13m | Hits:   0%/1514  
  🟩 20                 Pass: 100%/24  | Total:  6h 41m | Avg: 16m 44s | Max:  1h 23m | Hits:   0%/757

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 11m 16s | Avg: 5m 38s | Max: 9m 08s

🟩 cpu
  🟩 amd64              Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  9m 08s
🟩 ctk
  🟩 12.6               Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  9m 08s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  9m 08s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  9m 08s
🟩 cxx
  🟩 GCC13              Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  9m 08s
🟩 cxx_family
  🟩 GCC                Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  9m 08s
🟩 gpu
  🟩 v100               Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  9m 08s
🟩 jobs
  🟩 Build              Pass: 100%/1   | Total:  2m 08s | Avg:  2m 08s | Max:  2m 08s
  🟩 Test               Pass: 100%/1   | Total:  9m 08s | Avg:  9m 08s | Max:  9m 08s

🟩 python: Pass: 100%/1 | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
🟩 ctk
  🟩 12.6               Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 28m 19s | Avg: 28m 19s | Max: 28m 19s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
+/-	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust
+/-	CUDA Experimental
+/-	python
+/-	CCCL C Parallel Library
+/-	Catch2Helper

🏃‍ Runner counts (total jobs: 168)

#	Runner
124	`linux-amd64-cpu16`
19	`linux-amd64-gpu-v100-latest-1`
15	`windows-amd64-cpu16`
10	`linux-arm64-cpu16`

mhoemmen · 2024-12-09T19:12:49Z

cudax/include/cuda/experimental/__container/heterogeneous_iterator.cuh

+//!
+//! @endrst
+//! @tparam _Tp The underlying type of the elements the \c heterogeneous_iterator points at.
+//! @tparam _IsConst Boolean, if false the \c heterogeneous_iterator allows mutating the element pointed to.


Should we consider making this an enum class instead of a bool, so that users can say (e.g.,) heterogeneous_iterator<T, access::read_write, Properties...>? This would help make code more self-documenting and avoid possible confusion with other template parameters in Properties....
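A minimal sketch of the enum-class idea, for illustration only. The names `access` and `tagged_iterator` are hypothetical stand-ins and are not part of cudax; the real iterator would also carry the `Properties...` pack.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical names: `access` and `tagged_iterator` are illustrations only,
// not the cudax API.
enum class access
{
  read_only,
  read_write
};

template <class T, access Access>
class tagged_iterator
{
public:
  // read_only yields `const T&`, read_write yields `T&`.
  using reference = std::conditional_t<Access == access::read_only, const T&, T&>;
  using pointer   = std::conditional_t<Access == access::read_only, const T*, T*>;

  constexpr explicit tagged_iterator(pointer ptr) noexcept
      : ptr_(ptr)
  {}

  constexpr reference operator*() const noexcept
  {
    assert(ptr_ != nullptr); // dereferencing a null iterator is a bug
    return *ptr_;
  }

private:
  pointer ptr_;
};

// The access mode is visible at the point of use instead of a bare `true`/`false`.
static_assert(std::is_same_v<tagged_iterator<int, access::read_only>::reference, const int&>);
static_assert(std::is_same_v<tagged_iterator<int, access::read_write>::reference, int&>);
```

With this shape, `tagged_iterator<int, access::read_write>` reads unambiguously at the call site, whereas a trailing `false` could be mistaken for one of the property arguments.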

mhoemmen · 2024-12-09T19:27:06Z

cudax/include/cuda/experimental/__container/async_mdarray.cuh

+                "mdspan's Extents template parameter must be a specialization of _CUDA_VSTD::extents.");
+
+  // At least one of the properties must signal an execution space
+  static_assert(_CUDA_VMR::__contains_execution_space_property<_Properties...>,


Just a suggestion: Kokkos::View also needs to extract properties from a parameter pack. It takes the approach of inheriting from a "ViewTraits" class that extracts out the needed properties and does compile-time error checking. Taking this approach in mdarray would have two advantages.

It would process the parameter pack all in one place, which might improve build times.

It might also make the class definition shorter and easier to read.

We can definitely move that out later, but for the PoC I would keep it as is
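For reference, the traits-class approach suggested above could look roughly like the following sketch. The tag types `host_accessible` and `device_accessible`, as well as `array_traits` and `toy_array`, are local stand-ins chosen for illustration; they are not the actual `cuda::mr` properties or the Kokkos `ViewTraits` implementation.

```cpp
#include <type_traits>

// Stand-in property tags; hypothetical, not the cuda::mr properties themselves.
struct host_accessible
{};
struct device_accessible
{};

// Processes the property pack once and exposes the results as constants.
template <class... Properties>
struct array_traits
{
  static constexpr bool is_host_accessible   = (std::is_same_v<Properties, host_accessible> || ...);
  static constexpr bool is_device_accessible = (std::is_same_v<Properties, device_accessible> || ...);
  static constexpr bool is_host_only         = is_host_accessible && !is_device_accessible;

  // All compile-time error checking lives here instead of in the container.
  static_assert(is_host_accessible || is_device_accessible,
                "At least one of the properties must signal an execution space");
};

// The container inherits the traits, so its own definition stays short.
template <class T, class... Properties>
class toy_array : private array_traits<Properties...>
{
  using traits = array_traits<Properties...>;

public:
  static constexpr bool host_only() noexcept
  {
    return traits::is_host_only;
  }
};
```

The pack is expanded in one place, and the container only refers to the named constants.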

mhoemmen · 2024-12-09T19:27:50Z

cudax/include/cuda/experimental/__container/async_mdarray.cuh

+      ? (__is_host_only ? cudaMemcpyHostToHost : cudaMemcpyHostToDevice)
+      : (__is_host_only ? cudaMemcpyDeviceToHost : cudaMemcpyDeviceToDevice);
+
+  //! @brief Helper to return an async_resource_ref to the currently used resource. Used to grow the async_mdarray


Regarding "[u]sed to grow the async_mdarray," are you planning to make async_mdarray resizable after construction?

This is a remnant from copying the code from vector.

Currently, if we want to be able to assign one mdarray to another, we need the ability to reallocate storage, because they might have different sizes / memory resources.

So I would say we do not want to make it resizable, but we do want to be able to assign.

mhoemmen · 2024-12-09T19:32:59Z

cudax/include/cuda/experimental/__container/async_mdarray.cuh

+  {
+    if (__other.size() != 0)
+    {
+      this->__copy_same(__other.__unwrapped_begin(), __other.__unwrapped_end(), __unwrapped_begin());


If the layout is not exhaustive, then this will copy elements that do not belong to the input. If the layout is not unique, then this may copy elements multiple times. While creating an mdarray with a nonexhaustive layout is a bit weird, nothing here prevents users from doing that.

In general, mdarray (as defined in P1684) cannot be implemented without an mdspan copy algorithm.
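To make the concern concrete, here is a small host-only sketch of an element-wise copy that goes through the layout mapping instead of copying the raw storage range. `padded_mapping` and `copy_elements` are hypothetical helpers written for this illustration; they do not use `std::mdspan` or any cudax API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major 2-D mapping with a row pitch >= cols.
// When pitch > cols the layout is nonexhaustive: the padding slots belong to no element.
struct padded_mapping
{
  std::size_t rows;
  std::size_t cols;
  std::size_t pitch;

  // Offset of element (i, j) in the underlying storage.
  std::size_t operator()(std::size_t i, std::size_t j) const noexcept
  {
    return i * pitch + j;
  }

  // Number of elements (product of the extents).
  std::size_t size() const noexcept
  {
    return rows * cols;
  }

  // Number of storage slots spanned, including interior padding.
  std::size_t required_span_size() const noexcept
  {
    return size() == 0 ? 0 : (rows - 1) * pitch + cols;
  }
};

// Copies element by element through both mappings, so padding is never read or written.
template <class T>
void copy_elements(const T* src, padded_mapping src_map, T* dst, padded_mapping dst_map)
{
  assert(src_map.rows == dst_map.rows && src_map.cols == dst_map.cols);
  for (std::size_t i = 0; i < src_map.rows; ++i)
  {
    for (std::size_t j = 0; j < src_map.cols; ++j)
    {
      dst[dst_map(i, j)] = src[src_map(i, j)];
    }
  }
}
```

For a 2 x 3 array with pitch 4, `size()` is 6 while `required_span_size()` is 7, so a flat copy over the storage range would also touch the padding slot.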

mhoemmen · 2024-12-09T19:35:51Z

cudax/include/cuda/experimental/__container/async_mdarray.cuh

+
+  //! @brief Constructs an empty async_mdarray using an environment
+  //! @param __env The environment providing the needed information
+  //! @note No memory is allocated.


If "[n]o memory is allocated," then does this mean that the extents are all zero? If so, then what would happen if all extents are compile-time constants, e.g., extents<int, 3, 4>?

So currently all constructors are taken from the one dimensional vector case.

In case we have fully static dimensions we should SFINAE that constructor away
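A rough sketch of what constraining that constructor away could look like. `toy_extents`, `toy_env`, and `toy_mdarray` are hypothetical stand-ins for the real extents, environment, and container types, and `dyn` stands in for `dynamic_extent`.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in for dynamic_extent (hypothetical name).
inline constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// Minimal stand-in for an extents type; it only counts the dynamic extents.
template <std::size_t... Es>
struct toy_extents
{
  static constexpr std::size_t rank_dynamic() noexcept
  {
    return (std::size_t{0} + ... + (Es == dyn ? 1 : 0));
  }
};

// Hypothetical environment (would carry the resource and the stream).
struct toy_env
{};

template <class T, class Extents>
class toy_mdarray
{
public:
  // "Empty" construction only participates in overload resolution
  // when at least one extent is dynamic.
  template <class E = Extents, std::enable_if_t<(E::rank_dynamic() != 0), int> = 0>
  explicit toy_mdarray(const toy_env&) noexcept
  {}
};

static_assert(std::is_constructible_v<toy_mdarray<int, toy_extents<dyn, 4>>, const toy_env&>);
static_assert(!std::is_constructible_v<toy_mdarray<int, toy_extents<3, 4>>, const toy_env&>);
```

With fully static extents such as `toy_extents<3, 4>`, the environment-only constructor simply does not exist, so the "no memory is allocated" question cannot arise for that case.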

mhoemmen · 2024-12-09T19:36:56Z

cudax/include/cuda/experimental/__container/async_mdarray.cuh

+  //! @param __mr The memory resource to allocate the async_mdarray with.
+  //! @param __size The size of the async_mdarray. Defaults to zero
+  //! @note If `__size == 0` then no memory is allocated.
+  _CCCL_HIDE_FROM_ABI explicit async_mdarray(const __env_t& __env, const size_type __size)


What does it mean to initialize a multidimensional array with a single integer size?

Note that layouts don't have to be exhaustive or unique, so we may have required_span_size() (the actual required allocation size) not equal to size() (the product of the extents).

Btw, I'm guessing that this is a left-over constructor from implementing an "async_vector."

This is a constructor that only initializes a one-dimensional mdspan with dynamic extents.

Normally there wouldn't be a separate constructor for the rank-1 case. The constructor would just take a constrained pack of index types, like this.

template<class... OtherIndexTypes>
  requires(
    (std::is_convertible_v<OtherIndexTypes, index_type> && ...) &&
    (std::is_nothrow_constructible_v<index_type, OtherIndexTypes> && ...) &&
    std::is_constructible_v<extents_type, OtherIndexTypes...> &&
    std::is_constructible_v<mapping_type, extents_type>
  )
explicit constexpr async_mdarray(const __env_t& __env, OtherIndexTypes... exts);

mhoemmen · 2024-12-09T19:47:46Z

cudax/include/cuda/experimental/__container/async_mdarray.cuh

+  //! @param __mr The memory resource to allocate the async_mdarray with.
+  //! @param __ilist The initializer_list being copied into the async_mdarray.
+  //! @note If `__ilist.size() == 0` then no memory is allocated
+  _CCCL_HIDE_FROM_ABI async_mdarray(const __env_t& __env, _CUDA_VSTD::initializer_list<_Tp> __ilist)


If we want construction from an initializer_list, should we consider construction from a rank()-nested initializer_list, so that

a. users would not need to guess what a flat initializer_list<T> means for a multidimensional array (e.g., does its order depend on the layout?);

b. users could express multidimensional arrays' initial values directly in C++ code; and

c. we could provide a deduction guide that deduces the rank automatically?

Please see the relevant section of P3308R0, "Consider adding construction from nested initializer list," for details.

I am a bit doubtful that we want that before it has made its way into the standard.

Also, I am sceptical whether this is possible with dynamic extents, due to potential conflicts with list initialization of the stored type.

On the other hand direct-list initialization like here is super useful on its own, so I would definitely want to keep that in
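As a point of reference for the discussion, construction from a nested initializer list for the rank-2 case might be sketched as follows. `toy_matrix` is a hypothetical host-only container with row-major storage; it is not the P3308R0 wording and ignores environments, layouts, and deduction guides.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical rank-2 container; row-major storage in a std::vector for brevity.
template <class T>
class toy_matrix
{
public:
  // Each inner list is one row; all rows must have the same length.
  toy_matrix(std::initializer_list<std::initializer_list<T>> rows)
      : rows_(rows.size())
      , cols_(rows.size() == 0 ? 0 : rows.begin()->size())
  {
    data_.reserve(rows_ * cols_);
    for (const auto& row : rows)
    {
      assert(row.size() == cols_); // ragged input is a precondition violation
      data_.insert(data_.end(), row.begin(), row.end());
    }
  }

  // Extent 0 is the number of rows, extent 1 the number of columns.
  std::size_t extent(std::size_t r) const noexcept
  {
    return r == 0 ? rows_ : cols_;
  }

  const T& operator()(std::size_t i, std::size_t j) const noexcept
  {
    return data_[i * cols_ + j];
  }

private:
  std::size_t rows_;
  std::size_t cols_;
  std::vector<T> data_;
};
```

Here `toy_matrix<int> m{{1, 2, 3}, {4, 5, 6}};` yields a 2 x 3 array whose extents come from the nesting. The concern about list initialization of the stored type still applies: if `T` is itself constructible from a braced list, the braces may be read as either a row or an element.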

mhoemmen · 2024-12-09T20:06:46Z

cudax/include/cuda/experimental/__container/async_mdarray.cuh

+
+  //! @}
+
+  //! @addtogroup iterators


This iterator range does not coincide with the set of elements of the mdarray if the layout mapping is nonexhaustive.

There's no single definition of an iterator range if the layout mapping is nonunique. It depends on whether users want to do read-only access or read-and-write access.

The P1684 mdarray design does not include iterators at all. Therefore, given this and the complexities of defining an iterator range for general layouts, should we consider just not providing begin and end?

mhoemmen · 2024-12-09T20:07:29Z

cudax/include/cuda/experimental/__container/async_mdarray.cuh

+  //! @{
+  //! @brief Returns a reference to the \p __n 'th element of the async_mdarray
+  //! @param __n The index of the element we want to access
+  _CCCL_NODISCARD _CCCL_HIDE_FROM_ABI reference operator[](const size_type __n) noexcept


This looks left over from an async_vector implementation.

mhoemmen · 2024-12-09T20:13:54Z

cudax/include/cuda/experimental/__container/async_mdarray.cuh

+  //! @brief Replaces the stored stream
+  //! @param __new_stream the new stream
+  //! @note Always synchronizes with the old stream
+  _CCCL_HIDE_FROM_ABI constexpr void change_stream(::cuda::stream_ref __new_stream)


Changing the stream after construction feels like changing the allocator after construction -- it's not something that one can do to vector, and it makes reasoning about code that takes async_mdarray& difficult.

Should we consider an alternate design, that makes it easier to move from this async_mdarray into a new async_mdarray with a different stream?

I believe that is different than changing the allocator. The stream is just something that tells the container where future work happens.

1. (Re)allocation and future work are coupled through copy assignment.

2. std::vector has the invariant that it always has the same allocator throughout its lifetime.

3. If (re)allocation and future work are coupled, then should we also consider adding the invariant that a container always has the same allocator and stream throughout its lifetime?

4. Move construction is relatively cheap and can preserve existing allocations and values.

5. Therefore, should we consider a design in which users can "change" the allocator and/or stream by move-constructing a new container with the desired allocator and/or stream? This would change the allocator and/or stream for future operations; the current allocation wouldn't necessarily change.

Here's an example of (5).

env e{allocator, stream}; // whatever the syntax may be
async_mdarray<float, std::dims<2, int>> A{e, num_rows, num_cols};
// ... code ...
async_mdarray<float, std::dims<2, int>> B = /* from somewhere */;
// ... more code ...
env e2{new_allocator, new_stream}; // either or both may change
async_mdarray<float, std::dims<2, int>> A2{e2, std::move(A)};
// Assignment from B uses A2's new allocator and new stream.
A2 = B;

Yes, that is true, but I do not see any issue here.

That is not correct. There is a whole machinery in allocator_traits that manages how allocators behave during assignment and construction. If you assign a vector whose allocator is not always equal, you may need to reallocate.

I still don't see how future work and reallocation are coupled. We can happily change streams as long as it is ensured that the streams are properly synchronized.

We currently only allow the user to change the stream (and the execution policy). As long as all work is in proper stream order, there is no reason to put additional constraints in.

In general we really want the ability to change the stream of a container, because we never know where we are getting it from. It can realistically happen that we get 3 containers allocated and prepared on different streams and we want to launch an algorithm that takes all of them. In that case we do not only want to synchronize but also ensure that the stream that is internally used is the right one.

Otherwise every operation on the container would need to take a stream as a required argument and we really do not want that
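The use case described above (several containers prepared on different streams, then moved onto one stream) can be sketched with mock types. `mock_stream` and `toy_container` are hypothetical and only count synchronizations; no CUDA API is called, and the real `change_stream` may differ in detail.

```cpp
#include <cassert>

// Mock stream that only counts synchronizations; stands in for a CUDA stream.
struct mock_stream
{
  int sync_count = 0;

  void sync() noexcept
  {
    ++sync_count;
  }
};

// Hypothetical container that remembers the stream its future work is ordered on.
class toy_container
{
public:
  explicit toy_container(mock_stream& s) noexcept
      : stream_(&s)
  {}

  // Mirrors the documented note: always synchronize with the old stream first,
  // then order all future work on the new stream.
  void change_stream(mock_stream& new_stream) noexcept
  {
    assert(stream_ != nullptr);
    stream_->sync();
    stream_ = &new_stream;
  }

  mock_stream& stream() const noexcept
  {
    return *stream_;
  }

private:
  mock_stream* stream_;
};
```

Calling `change_stream(target)` on three containers that were built on three different streams synchronizes each old stream once and leaves all three containers pointing at `target`, so a subsequent algorithm does not need a stream argument per container.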

mhoemmen · 2024-12-09T20:18:53Z

cudax/include/cuda/experimental/__container/async_mdarray.cuh

+  //! @brief Replaces the currently used execution policy
+  //! @param __new_policy the new policy
+  _CCCL_HIDE_FROM_ABI constexpr void set_execution_policy(__policy_t __new_policy) noexcept
+  {
+    __policy_ = __new_policy;
+  }
+


Changing the execution policy or stream after construction feels like changing the allocator after construction -- it's not something that one can do to vector, and it makes reasoning about code that takes async_mdarray& difficult.

Should we consider an alternate design, that makes it easier to move from this async_mdarray into a new async_mdarray with a different execution policy?

Suggested change

//! @brief Replaces the currently used execution policy
//! @param __new_policy the new policy
_CCCL_HIDE_FROM_ABI constexpr void set_execution_policy(__policy_t __new_policy) noexcept
{
  __policy_ = __new_policy;
}

I do not follow you here.

The execution policy determines where future operations on the mdarray happen. In what regard would that be problematic?

I wrote this above; I'll just repeat it here for clarity.

1. (Re)allocation and future work are coupled through copy assignment.
2. std::vector has the invariant that it always has the same allocator throughout its lifetime.
3. If (re)allocation and future work are coupled, then should we also consider adding the invariant that a container always has the same allocator and stream throughout its lifetime?
4. Move construction is relatively cheap and can preserve existing allocations and values.
5. Therefore, should we consider a design in which users can "change" the allocator and/or stream by move-constructing a new container with the desired allocator and/or stream? This would change the allocator and/or stream for future operations; the current allocation wouldn't necessarily change.

Here's an example of (5).

env e{allocator, stream}; // whatever the syntax may be
async_mdarray<float, std::dims<2>> A{e, num_rows, num_cols};
// ... code ...
async_mdarray<float, std::dims<2>> B = /* from somewhere */;
// ... more code ...
env e2{new_allocator, new_stream}; // either or both may change
async_mdarray<float, std::dims<2>> A2{e2, std::move(A)};
// Assignment from B uses A2's new allocator and new stream.
A2 = B;

This would change the allocator and/or stream for future operations; the current allocation wouldn't necessarily change.

This would be like changing the scheduler in std::execution: it affects things that happen after the change. I find this to be a reasonable interpretation of the name "async_mdarray."

The execution policy determines where future operations on the mdarray happen. In what regard would that be problematic?

One feature of std::vector is that I can't change its allocator after construction. In async_mdarray, copy assignment couples (re)allocation and asynchronous execution (copying or fill). In a hypothetical async_vector, both copy assignment and resizing would couple allocation and asynchronous execution. (Re)allocation also might depend on the stream, so users might reasonably expect that if they could change the stream, they could also change the allocator.
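The move-construction design discussed in this thread can be sketched on the host as follows. This is only an illustration under stated assumptions: `mock_env` and `mock_mdarray` are hypothetical names, the "resource" and "stream" are plain integers, and memory comes from `new[]` rather than from a memory resource. The point is the invariant: the environment is fixed at construction, and "changing" it means move-constructing a new object that takes over the existing storage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical environment: ids stand in for a memory resource and a stream.
struct mock_env
{
  int resource_id;
  int stream_id;
};

// Hypothetical container whose environment never changes after construction.
class mock_mdarray
{
  const mock_env env_;
  std::unique_ptr<float[]> data_;
  std::size_t size_;
  int allocated_by_; // resource that produced the current allocation

public:
  mock_mdarray(mock_env env, std::size_t n)
      : env_(env), data_(new float[n]()), size_(n), allocated_by_(env.resource_id)
  {}

  // "Rebinding" move construction: keep the allocation and values, adopt a new
  // environment for all future operations.
  mock_mdarray(mock_env new_env, mock_mdarray&& other) noexcept
      : env_(new_env)
      , data_(std::move(other.data_))
      , size_(std::exchange(other.size_, 0))
      , allocated_by_(other.allocated_by_)
  {}

  // Copy assignment uses this object's environment if it has to reallocate.
  mock_mdarray& operator=(const mock_mdarray& other)
  {
    if (this != &other)
    {
      if (size_ != other.size_)
      {
        data_.reset(new float[other.size_]);
        size_         = other.size_;
        allocated_by_ = env_.resource_id;
      }
      std::copy(other.data_.get(), other.data_.get() + size_, data_.get());
    }
    return *this;
  }

  const float* data() const { return data_.get(); }
  std::size_t size() const { return size_; }
  int allocated_by() const { return allocated_by_; }
  const mock_env& env() const { return env_; }
};
```

One thing the mock glosses over: a real implementation would presumably have to keep the original resource reachable so that the adopted allocation can be freed correctly later.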

cudax/include/cuda/experimental/__container/async_mdarray.cuh

pciolkosz · 2024-12-10T03:06:08Z

Seeing the above comments from Mark, I was wondering if it wouldn't be better to start from his mdarray prototype posted on Slack and add some constructors / whatever we think should be carried over from vector. I think I agree with the above comments that some APIs that were fine for vector become tricky for mdarray. I think we should start with the smallest usable API set and then consider what would be good to add, instead of starting with a larger set carried over from vector and figuring out how to make it work for mdarray.

cudax/test/containers/async_mdarray/constructor.cu

miscco · 2024-12-09T18:44:57Z

cudax/test/containers/async_mdarray/constructor.cu

+      CHECK(vec.data() == nullptr);
+    }
+
+    { // from env and size, no allocation


We also want a constructor that takes an mdspan

miscco · 2024-12-09T18:46:00Z

cudax/test/containers/async_mdarray/conversion.cu

+  using MatchingResource = typename extract_properties<TestType>::matching_resource;
+  Env matching_env{MatchingResource{resource}, stream};
+
+  SECTION("cudax::async_mdarray construction with matching async_mdarray")


We also want explicit conversion to mdspan

miscco · 2024-12-10T07:16:21Z

cudax/include/cuda/experimental/__container/async_mdarray.cuh

+  //! @brief Replaces the stored stream
+  //! @param __new_stream the new stream
+  //! @note Always synchronizes with the old stream
+  _CCCL_HIDE_FROM_ABI constexpr void change_stream(::cuda::stream_ref __new_stream)


I believe that is different than changing the allocator. The stream is just something that tells the container where future work happens.


github-actions · 2024-12-12T15:13:00Z

🟨 CI finished in 1h 44m: Pass: 98%/168 | Total: 1d 18h | Avg: 15m 00s | Max: 1h 36m | Hits: 13%/22094

🟨 cudax: Pass: 88%/26 | Total: 2h 43m | Avg: 6m 17s | Max: 18m 01s

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  86%/22  | Total:  2h 28m | Avg:  6m 44s | Max: 18m 01s
  🟩 arm64              Pass: 100%/4   | Total: 15m 09s | Avg:  3m 47s | Max:  3m 53s
🔍 jobs: Build 🔍
  🔍 Build              Pass:  87%/24  | Total:  2h 07m | Avg:  5m 18s | Max: 12m 24s
  🟩 Test               Pass: 100%/2   | Total: 36m 00s | Avg: 18m 00s | Max: 18m 01s
🟨 ctk
  🟨 12.0               Pass:  33%/3   | Total: 20m 12s | Avg:  6m 44s | Max: 12m 24s
  🟩 12.5               Pass: 100%/2   | Total: 18m 13s | Avg:  9m 06s | Max:  9m 16s
  🟨 12.6               Pass:  95%/21  | Total:  2h 05m | Avg:  5m 57s | Max: 18m 01s
🟨 cudacxx
  🟨 nvcc12.0           Pass:  33%/3   | Total: 20m 12s | Avg:  6m 44s | Max: 12m 24s
  🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 13s | Avg:  9m 06s | Max:  9m 16s
  🟨 nvcc12.6           Pass:  95%/21  | Total:  2h 05m | Avg:  5m 57s | Max: 18m 01s
🟨 cxx
  🟩 Clang9             Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s
  🟩 Clang10            Pass: 100%/1   | Total:  4m 58s | Avg:  4m 58s | Max:  4m 58s
  🟩 Clang11            Pass: 100%/1   | Total:  4m 25s | Avg:  4m 25s | Max:  4m 25s
  🟩 Clang12            Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s
  🟩 Clang13            Pass: 100%/1   | Total:  4m 35s | Avg:  4m 35s | Max:  4m 35s
  🟩 Clang14            Pass: 100%/1   | Total:  4m 22s | Avg:  4m 22s | Max:  4m 22s
  🟩 Clang15            Pass: 100%/1   | Total:  4m 25s | Avg:  4m 25s | Max:  4m 25s
  🟩 Clang16            Pass: 100%/1   | Total:  4m 49s | Avg:  4m 49s | Max:  4m 49s
  🟩 Clang17            Pass: 100%/1   | Total:  4m 52s | Avg:  4m 52s | Max:  4m 52s
  🟩 Clang18            Pass: 100%/4   | Total: 29m 55s | Avg:  7m 28s | Max: 18m 01s
  🟥 GCC9               Pass:   0%/1   | Total:  3m 43s | Avg:  3m 43s | Max:  3m 43s
  🟩 GCC10              Pass: 100%/1   | Total:  4m 29s | Avg:  4m 29s | Max:  4m 29s
  🟩 GCC11              Pass: 100%/1   | Total:  4m 28s | Avg:  4m 28s | Max:  4m 28s
  🟩 GCC12              Pass: 100%/2   | Total: 22m 31s | Avg: 11m 15s | Max: 17m 59s
  🟩 GCC13              Pass: 100%/4   | Total: 15m 17s | Avg:  3m 49s | Max:  3m 53s
  🟥 MSVC14.36          Pass:   0%/1   | Total: 12m 24s | Avg: 12m 24s | Max: 12m 24s
  🟥 MSVC14.39          Pass:   0%/1   | Total: 12m 01s | Avg: 12m 01s | Max: 12m 01s
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 13s | Avg:  9m 06s | Max:  9m 16s
🟨 cxx_family
  🟩 Clang              Pass: 100%/13  | Total:  1h 10m | Avg:  5m 25s | Max: 18m 01s
  🟨 GCC                Pass:  88%/9   | Total: 50m 28s | Avg:  5m 36s | Max: 17m 59s
  🟥 MSVC               Pass:   0%/2   | Total: 24m 25s | Avg: 12m 12s | Max: 12m 24s
  🟩 NVHPC              Pass: 100%/2   | Total: 18m 13s | Avg:  9m 06s | Max:  9m 16s
🟨 cudacxx_family
  🟨 nvcc               Pass:  88%/26  | Total:  2h 43m | Avg:  6m 17s | Max: 18m 01s
🟨 gpu
  🟨 v100               Pass:  88%/26  | Total:  2h 43m | Avg:  6m 17s | Max: 18m 01s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  3m 41s | Avg:  3m 41s | Max:  3m 41s
  🟩 90a                Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
🟨 std
  🟨 17                 Pass:  83%/6   | Total: 28m 05s | Avg:  4m 40s | Max:  8m 57s
  🟨 20                 Pass:  90%/20  | Total:  2h 15m | Avg:  6m 46s | Max: 18m 01s

🟩 libcudacxx: Pass: 100%/48 | Total: 11h 22m | Avg: 14m 13s | Max: 38m 49s | Hits: 11%/9770

🟩 cpu
  🟩 amd64              Pass: 100%/46  | Total: 10h 58m | Avg: 14m 18s | Max: 38m 49s | Hits:  11%/9770  
  🟩 arm64              Pass: 100%/2   | Total: 24m 35s | Avg: 12m 17s | Max: 20m 55s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  1h 09m | Avg:  9m 56s | Max: 31m 42s | Hits:  12%/2228  
  🟩 12.5               Pass: 100%/2   | Total:  1h 08m | Avg: 34m 08s | Max: 36m 22s
  🟩 12.6               Pass: 100%/39  | Total:  9h 04m | Avg: 13m 58s | Max: 38m 49s | Hits:  11%/7542  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 05m | Avg: 16m 22s | Max: 21m 45s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  1h 09m | Avg:  9m 56s | Max: 31m 42s | Hits:  12%/2228  
  🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 08m | Avg: 34m 08s | Max: 36m 22s
  🟩 nvcc12.6           Pass: 100%/35  | Total:  7h 59m | Avg: 13m 41s | Max: 38m 49s | Hits:  11%/7542  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 05m | Avg: 16m 22s | Max: 21m 45s
  🟩 nvcc               Pass: 100%/44  | Total: 10h 17m | Avg: 14m 01s | Max: 38m 49s | Hits:  11%/9770  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 14m 42s | Avg:  3m 40s | Max:  4m 32s
  🟩 Clang10            Pass: 100%/1   | Total:  4m 35s | Avg:  4m 35s | Max:  4m 35s
  🟩 Clang11            Pass: 100%/1   | Total:  3m 51s | Avg:  3m 51s | Max:  3m 51s
  🟩 Clang12            Pass: 100%/1   | Total:  4m 22s | Avg:  4m 22s | Max:  4m 22s
  🟩 Clang13            Pass: 100%/1   | Total: 19m 31s | Avg: 19m 31s | Max: 19m 31s
  🟩 Clang14            Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s
  🟩 Clang15            Pass: 100%/1   | Total:  4m 20s | Avg:  4m 20s | Max:  4m 20s
  🟩 Clang16            Pass: 100%/1   | Total:  4m 11s | Avg:  4m 11s | Max:  4m 11s
  🟩 Clang17            Pass: 100%/1   | Total:  4m 33s | Avg:  4m 33s | Max:  4m 33s
  🟩 Clang18            Pass: 100%/8   | Total:  1h 56m | Avg: 14m 32s | Max: 24m 03s
  🟩 GCC6               Pass: 100%/2   | Total:  5m 35s | Avg:  2m 47s | Max:  2m 49s
  🟩 GCC7               Pass: 100%/2   | Total:  6m 23s | Avg:  3m 11s | Max:  3m 17s
  🟩 GCC8               Pass: 100%/1   | Total:  3m 55s | Avg:  3m 55s | Max:  3m 55s
  🟩 GCC9               Pass: 100%/3   | Total: 45m 29s | Avg: 15m 09s | Max: 22m 58s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 42s | Avg:  3m 42s | Max:  3m 42s
  🟩 GCC11              Pass: 100%/1   | Total: 20m 13s | Avg: 20m 13s | Max: 20m 13s
  🟩 GCC12              Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s
  🟩 GCC13              Pass: 100%/10  | Total:  2h 58m | Avg: 17m 53s | Max: 28m 54s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 23m 34s | Avg: 23m 34s | Max: 23m 34s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 31m 42s | Avg: 31m 42s | Max: 31m 42s | Hits:  12%/2228  
  🟩 MSVC14.29          Pass: 100%/1   | Total: 34m 34s | Avg: 34m 34s | Max: 34m 34s | Hits:  11%/2465  
  🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 15m | Avg: 37m 56s | Max: 38m 49s | Hits:  10%/5077  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 08m | Avg: 34m 08s | Max: 36m 22s
🟩 cxx_family
  🟩 Clang              Pass: 100%/20  | Total:  3h 00m | Avg:  9m 01s | Max: 24m 03s
  🟩 GCC                Pass: 100%/21  | Total:  4h 28m | Avg: 12m 46s | Max: 28m 54s
  🟩 Intel              Pass: 100%/1   | Total: 23m 34s | Avg: 23m 34s | Max: 23m 34s
  🟩 MSVC               Pass: 100%/4   | Total:  2h 22m | Avg: 35m 32s | Max: 38m 49s | Hits:  11%/9770  
  🟩 NVHPC              Pass: 100%/2   | Total:  1h 08m | Avg: 34m 08s | Max: 36m 22s
🟩 gpu
  🟩 v100               Pass: 100%/48  | Total: 11h 22m | Avg: 14m 13s | Max: 38m 49s | Hits:  11%/9770  
🟩 jobs
  🟩 Build              Pass: 100%/41  | Total:  8h 50m | Avg: 12m 55s | Max: 38m 49s | Hits:  11%/9770  
  🟩 NVRTC              Pass: 100%/4   | Total:  1h 45m | Avg: 26m 24s | Max: 28m 54s
  🟩 Test               Pass: 100%/2   | Total: 45m 03s | Avg: 22m 31s | Max: 24m 03s
  🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 56s | Avg:  1m 56s | Max:  1m 56s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total: 11m 40s | Avg: 11m 40s | Max: 11m 40s
  🟩 90a                Pass: 100%/2   | Total: 15m 13s | Avg:  7m 36s | Max: 11m 35s
🟩 std
  🟩 11                 Pass: 100%/6   | Total: 58m 07s | Avg:  9m 41s | Max: 22m 58s
  🟩 14                 Pass: 100%/5   | Total:  1h 08m | Avg: 13m 37s | Max: 31m 42s | Hits:  12%/2228  
  🟩 17                 Pass: 100%/13  | Total:  4h 11m | Avg: 19m 22s | Max: 37m 03s | Hits:  11%/4930  
  🟩 20                 Pass: 100%/23  | Total:  5h 02m | Avg: 13m 09s | Max: 38m 49s | Hits:  10%/2612

🟩 thrust: Pass: 100%/46 | Total: 12h 31m | Avg: 16m 20s | Max: 1h 16m | Hits: 20%/9260

🟩 cmake_options
  🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 18m 47s | Avg:  9m 23s | Max: 12m 22s
🟩 cpu
  🟩 amd64              Pass: 100%/44  | Total: 12h 22m | Avg: 16m 51s | Max:  1h 16m | Hits:  20%/9260  
  🟩 arm64              Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  5m 01s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  1h 36m | Avg: 13m 43s | Max:  1h 09m | Hits:   0%/1852  
  🟩 12.5               Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 14m
  🟩 12.6               Pass: 100%/37  | Total:  8h 32m | Avg: 13m 51s | Max:  1h 16m | Hits:  25%/7408  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 44s | Avg:  5m 22s | Max:  5m 26s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  1h 36m | Avg: 13m 43s | Max:  1h 09m | Hits:   0%/1852  
  🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 14m
  🟩 nvcc12.6           Pass: 100%/35  | Total:  8h 21m | Avg: 14m 20s | Max:  1h 16m | Hits:  25%/7408  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 44s | Avg:  5m 22s | Max:  5m 26s
  🟩 nvcc               Pass: 100%/44  | Total: 12h 20m | Avg: 16m 50s | Max:  1h 16m | Hits:  20%/9260  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 20m 59s | Avg:  5m 14s | Max:  6m 10s
  🟩 Clang10            Pass: 100%/1   | Total:  6m 29s | Avg:  6m 29s | Max:  6m 29s
  🟩 Clang11            Pass: 100%/1   | Total:  5m 10s | Avg:  5m 10s | Max:  5m 10s
  🟩 Clang12            Pass: 100%/1   | Total:  5m 03s | Avg:  5m 03s | Max:  5m 03s
  🟩 Clang13            Pass: 100%/1   | Total:  5m 11s | Avg:  5m 11s | Max:  5m 11s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 43s | Avg:  5m 43s | Max:  5m 43s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 13s | Avg:  5m 13s | Max:  5m 13s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
  🟩 Clang18            Pass: 100%/7   | Total: 47m 37s | Avg:  6m 48s | Max: 13m 55s
  🟩 GCC6               Pass: 100%/2   | Total:  8m 43s | Avg:  4m 21s | Max:  4m 36s
  🟩 GCC7               Pass: 100%/2   | Total: 10m 30s | Avg:  5m 15s | Max:  5m 43s
  🟩 GCC8               Pass: 100%/1   | Total:  5m 19s | Avg:  5m 19s | Max:  5m 19s
  🟩 GCC9               Pass: 100%/3   | Total: 14m 54s | Avg:  4m 58s | Max:  6m 01s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
  🟩 GCC12              Pass: 100%/1   | Total:  6m 21s | Avg:  6m 21s | Max:  6m 21s
  🟩 GCC13              Pass: 100%/8   | Total:  1h 16m | Avg:  9m 31s | Max: 27m 48s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 55m 24s | Avg: 55m 24s | Max: 55m 24s
  🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m | Hits:   0%/1852  
  🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m | Hits:   0%/1852  
  🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 53m | Avg: 57m 44s | Max:  1h 16m | Hits:  33%/5556  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 14m
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total:  1h 53m | Avg:  5m 57s | Max: 13m 55s
  🟩 GCC                Pass: 100%/19  | Total:  2h 13m | Avg:  7m 02s | Max: 27m 48s
  🟩 Intel              Pass: 100%/1   | Total: 55m 24s | Avg: 55m 24s | Max: 55m 24s
  🟩 MSVC               Pass: 100%/5   | Total:  5h 06m | Avg:  1h 01m | Max:  1h 16m | Hits:  20%/9260  
  🟩 NVHPC              Pass: 100%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 14m
🟩 gpu
  🟩 v100               Pass: 100%/46  | Total: 12h 31m | Avg: 16m 20s | Max:  1h 16m | Hits:  20%/9260  
🟩 jobs
  🟩 Build              Pass: 100%/40  | Total: 10h 58m | Avg: 16m 28s | Max:  1h 16m | Hits:   0%/7408  
  🟩 TestCPU            Pass: 100%/3   | Total: 38m 40s | Avg: 12m 53s | Max: 23m 25s | Hits:  99%/1852  
  🟩 TestGPU            Pass: 100%/3   | Total: 54m 05s | Avg: 18m 01s | Max: 27m 48s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 23s | Avg:  4m 23s | Max:  4m 23s
🟩 std
  🟩 11                 Pass: 100%/5   | Total: 22m 54s | Avg:  4m 34s | Max:  5m 49s
  🟩 14                 Pass: 100%/4   | Total:  1h 25m | Avg: 21m 28s | Max:  1h 09m | Hits:   0%/1852  
  🟩 17                 Pass: 100%/12  | Total:  5h 14m | Avg: 26m 13s | Max:  1h 16m | Hits:   0%/3704  
  🟩 20                 Pass: 100%/23  | Total:  5h 09m | Avg: 13m 26s | Max:  1h 13m | Hits:  49%/3704

🟩 cub: Pass: 100%/45 | Total: 14h 35m | Avg: 19m 26s | Max: 1h 36m | Hits: 3%/3064

🟩 cpu
  🟩 amd64              Pass: 100%/43  | Total: 14h 25m | Avg: 20m 07s | Max:  1h 36m | Hits:   3%/3064  
  🟩 arm64              Pass: 100%/2   | Total:  9m 34s | Avg:  4m 47s | Max:  4m 57s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  1h 29m | Avg: 12m 45s | Max:  1h 03m | Hits:   3%/766   
  🟩 12.5               Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 16m
  🟩 12.6               Pass: 100%/36  | Total: 10h 42m | Avg: 17m 51s | Max:  1h 36m | Hits:   3%/2298  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 01s | Avg:  4m 30s | Max:  4m 36s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  1h 29m | Avg: 12m 45s | Max:  1h 03m | Hits:   3%/766   
  🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 16m
  🟩 nvcc12.6           Pass: 100%/34  | Total: 10h 33m | Avg: 18m 38s | Max:  1h 36m | Hits:   3%/2298  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 01s | Avg:  4m 30s | Max:  4m 36s
  🟩 nvcc               Pass: 100%/43  | Total: 14h 26m | Avg: 20m 08s | Max:  1h 36m | Hits:   3%/3064  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 21m 24s | Avg:  5m 21s | Max:  6m 10s
  🟩 Clang10            Pass: 100%/1   | Total:  6m 35s | Avg:  6m 35s | Max:  6m 35s
  🟩 Clang11            Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s
  🟩 Clang12            Pass: 100%/1   | Total:  5m 29s | Avg:  5m 29s | Max:  5m 29s
  🟩 Clang13            Pass: 100%/1   | Total:  5m 10s | Avg:  5m 10s | Max:  5m 10s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 30s | Avg:  5m 30s | Max:  5m 30s
  🟩 Clang18            Pass: 100%/7   | Total:  1h 40m | Avg: 14m 18s | Max: 41m 53s
  🟩 GCC6               Pass: 100%/2   | Total:  8m 28s | Avg:  4m 14s | Max:  4m 19s
  🟩 GCC7               Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 32s
  🟩 GCC8               Pass: 100%/1   | Total:  5m 15s | Avg:  5m 15s | Max:  5m 15s
  🟩 GCC9               Pass: 100%/3   | Total: 13m 53s | Avg:  4m 37s | Max:  5m 22s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 46s | Avg:  5m 46s | Max:  5m 46s
  🟩 GCC12              Pass: 100%/1   | Total:  5m 53s | Avg:  5m 53s | Max:  5m 53s
  🟩 GCC13              Pass: 100%/8   | Total:  3h 10m | Avg: 23m 50s | Max:  1h 36m
  🟩 Intel2023.2.0      Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
  🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m | Hits:   3%/766   
  🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 04m | Avg:  1h 04m | Max:  1h 04m | Hits:   3%/766   
  🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 07m | Hits:   3%/1532  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 16m
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total:  2h 46m | Avg:  8m 44s | Max: 41m 53s
  🟩 GCC                Pass: 100%/19  | Total:  4h 06m | Avg: 12m 57s | Max:  1h 36m
  🟩 Intel              Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
  🟩 MSVC               Pass: 100%/4   | Total:  4h 18m | Avg:  1h 04m | Max:  1h 07m | Hits:   3%/3064  
  🟩 NVHPC              Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 16m
🟩 gpu
  🟩 v100               Pass: 100%/45  | Total: 14h 35m | Avg: 19m 26s | Max:  1h 36m | Hits:   3%/3064  
🟩 jobs
  🟩 Build              Pass: 100%/39  | Total: 10h 30m | Avg: 16m 09s | Max:  1h 16m | Hits:   3%/3064  
  🟩 DeviceLaunch       Pass: 100%/1   | Total: 34m 00s | Avg: 34m 00s | Max: 34m 00s
  🟩 GraphCapture       Pass: 100%/1   | Total: 17m 47s | Avg: 17m 47s | Max: 17m 47s
  🟩 HostLaunch         Pass: 100%/2   | Total: 54m 54s | Avg: 27m 27s | Max: 33m 56s
  🟩 TestGPU            Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 36m
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 35s | Avg:  4m 35s | Max:  4m 35s
🟩 std
  🟩 11                 Pass: 100%/5   | Total: 24m 00s | Avg:  4m 48s | Max:  6m 05s
  🟩 14                 Pass: 100%/4   | Total:  1h 19m | Avg: 19m 48s | Max:  1h 03m | Hits:   3%/766   
  🟩 17                 Pass: 100%/12  | Total:  5h 07m | Avg: 25m 35s | Max:  1h 16m | Hits:   3%/1532  
  🟩 20                 Pass: 100%/24  | Total:  7h 44m | Avg: 19m 22s | Max:  1h 36m | Hits:   3%/766

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 15s | Avg: 4m 37s | Max: 7m 08s

🟩 cpu
  🟩 amd64              Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 08s
🟩 ctk
  🟩 12.6               Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 08s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 08s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 08s
🟩 cxx
  🟩 GCC13              Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 08s
🟩 cxx_family
  🟩 GCC                Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 08s
🟩 gpu
  🟩 v100               Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  7m 08s
🟩 jobs
  🟩 Build              Pass: 100%/1   | Total:  2m 07s | Avg:  2m 07s | Max:  2m 07s
  🟩 Test               Pass: 100%/1   | Total:  7m 08s | Avg:  7m 08s | Max:  7m 08s

🟩 python: Pass: 100%/1 | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
🟩 ctk
  🟩 12.6               Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
+/-	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust
+/-	CUDA Experimental
+/-	python
+/-	CCCL C Parallel Library
+/-	Catch2Helper

🏃‍ Runner counts (total jobs: 168)

#	Runner
124	`linux-amd64-cpu16`
19	`linux-amd64-gpu-v100-latest-1`
15	`windows-amd64-cpu16`
10	`linux-arm64-cpu16`

github-actions · 2024-12-12T16:59:09Z

🟨 CI finished in 1h 17m: Pass: 98%/168 | Total: 1d 00h | Avg: 8m 47s | Max: 36m 38s | Hits: 91%/22428

🟨 libcudacxx: Pass: 97%/48 | Total: 8h 39m | Avg: 10m 49s | Max: 36m 38s | Hits: 80%/9770

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  97%/46  | Total:  8h 32m | Avg: 11m 08s | Max: 36m 38s | Hits:  80%/9770  
  🟩 arm64              Pass: 100%/2   | Total:  7m 03s | Avg:  3m 31s | Max:  3m 48s
🔍 ctk: 12.6 🔍
  🟩 11.1               Pass: 100%/7   | Total: 50m 40s | Avg:  7m 14s | Max: 20m 18s | Hits:  98%/2228  
  🟩 12.5               Pass: 100%/2   | Total: 38m 34s | Avg: 19m 17s | Max: 30m 19s
  🔍 12.6               Pass:  97%/39  | Total:  7h 10m | Avg: 11m 01s | Max: 36m 38s | Hits:  75%/7542  
🔍 cudacxx: nvcc12.6 🔍
  🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 03m | Avg: 15m 49s | Max: 20m 48s
  🟩 nvcc11.1           Pass: 100%/7   | Total: 50m 40s | Avg:  7m 14s | Max: 20m 18s | Hits:  98%/2228  
  🟩 nvcc12.5           Pass: 100%/2   | Total: 38m 34s | Avg: 19m 17s | Max: 30m 19s
  🔍 nvcc12.6           Pass:  97%/35  | Total:  6h 06m | Avg: 10m 28s | Max: 36m 38s | Hits:  75%/7542  
🔍 cudacxx_family: nvcc 🔍
  🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 03m | Avg: 15m 49s | Max: 20m 48s
  🔍 nvcc               Pass:  97%/44  | Total:  7h 36m | Avg: 10m 22s | Max: 36m 38s | Hits:  80%/9770  
🔍 cxx: GCC13 🔍
  🟩 Clang9             Pass: 100%/4   | Total: 15m 12s | Avg:  3m 48s | Max:  4m 38s
  🟩 Clang10            Pass: 100%/1   | Total:  5m 14s | Avg:  5m 14s | Max:  5m 14s
  🟩 Clang11            Pass: 100%/1   | Total:  3m 54s | Avg:  3m 54s | Max:  3m 54s
  🟩 Clang12            Pass: 100%/1   | Total: 18m 23s | Avg: 18m 23s | Max: 18m 23s
  🟩 Clang13            Pass: 100%/1   | Total:  4m 29s | Avg:  4m 29s | Max:  4m 29s
  🟩 Clang14            Pass: 100%/1   | Total:  4m 18s | Avg:  4m 18s | Max:  4m 18s
  🟩 Clang15            Pass: 100%/1   | Total:  4m 37s | Avg:  4m 37s | Max:  4m 37s
  🟩 Clang16            Pass: 100%/1   | Total:  4m 08s | Avg:  4m 08s | Max:  4m 08s
  🟩 Clang17            Pass: 100%/1   | Total: 11m 18s | Avg: 11m 18s | Max: 11m 18s
  🟩 Clang18            Pass: 100%/8   | Total:  1h 37m | Avg: 12m 11s | Max: 21m 58s
  🟩 GCC6               Pass: 100%/2   | Total:  5m 06s | Avg:  2m 33s | Max:  2m 35s
  🟩 GCC7               Pass: 100%/2   | Total:  6m 42s | Avg:  3m 21s | Max:  3m 37s
  🟩 GCC8               Pass: 100%/1   | Total:  3m 54s | Avg:  3m 54s | Max:  3m 54s
  🟩 GCC9               Pass: 100%/3   | Total: 22m 22s | Avg:  7m 27s | Max: 16m 13s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 56s | Avg:  3m 56s | Max:  3m 56s
  🟩 GCC11              Pass: 100%/1   | Total:  3m 35s | Avg:  3m 35s | Max:  3m 35s
  🟩 GCC12              Pass: 100%/1   | Total:  4m 13s | Avg:  4m 13s | Max:  4m 13s
  🔍 GCC13              Pass:  90%/10  | Total:  2h 34m | Avg: 15m 28s | Max: 33m 37s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 23m 15s | Avg: 23m 15s | Max: 23m 15s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 20m 18s | Avg: 20m 18s | Max: 20m 18s | Hits:  98%/2228  
  🟩 MSVC14.29          Pass: 100%/1   | Total: 13m 26s | Avg: 13m 26s | Max: 13m 26s | Hits:  99%/2465  
  🟩 MSVC14.39          Pass: 100%/2   | Total: 50m 12s | Avg: 25m 06s | Max: 36m 38s | Hits:  63%/5077  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 38m 34s | Avg: 19m 17s | Max: 30m 19s
🔍 cxx_family: GCC 🔍
  🟩 Clang              Pass: 100%/20  | Total:  2h 49m | Avg:  8m 27s | Max: 21m 58s
  🔍 GCC                Pass:  95%/21  | Total:  3h 24m | Avg:  9m 44s | Max: 33m 37s
  🟩 Intel              Pass: 100%/1   | Total: 23m 15s | Avg: 23m 15s | Max: 23m 15s
  🟩 MSVC               Pass: 100%/4   | Total:  1h 23m | Avg: 20m 59s | Max: 36m 38s | Hits:  80%/9770  
  🟩 NVHPC              Pass: 100%/2   | Total: 38m 34s | Avg: 19m 17s | Max: 30m 19s
🔍 jobs: NVRTC 🔍
  🟩 Build              Pass: 100%/41  | Total:  5h 57m | Avg:  8m 43s | Max: 36m 38s | Hits:  80%/9770  
  🔍 NVRTC              Pass:  75%/4   | Total:  2h 00m | Avg: 30m 05s | Max: 33m 37s
  🟩 Test               Pass: 100%/2   | Total: 39m 21s | Avg: 19m 40s | Max: 21m 58s
  🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 10s | Avg:  2m 10s | Max:  2m 10s
🔍 std: 14 🔍
  🟩 11                 Pass: 100%/6   | Total: 44m 08s | Avg:  7m 21s | Max: 29m 07s
  🔍 14                 Pass:  80%/5   | Total:  1h 02m | Avg: 12m 35s | Max: 31m 48s | Hits:  98%/2228  
  🟩 17                 Pass: 100%/13  | Total:  2h 30m | Avg: 11m 32s | Max: 33m 37s | Hits:  99%/4930  
  🟩 20                 Pass: 100%/23  | Total:  4h 20m | Avg: 11m 18s | Max: 36m 38s | Hits:  30%/2612  
🟨 gpu
  🟨 v100               Pass:  97%/48  | Total:  8h 39m | Avg: 10m 49s | Max: 36m 38s | Hits:  80%/9770  
🟩 sm
  🟩 90                 Pass: 100%/1   | Total: 12m 10s | Avg: 12m 10s | Max: 12m 10s
  🟩 90a                Pass: 100%/2   | Total: 16m 46s | Avg:  8m 23s | Max: 13m 09s

🟨 cudax: Pass: 96%/26 | Total: 2h 28m | Avg: 5m 42s | Max: 25m 25s | Hits: 86%/334

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  95%/22  | Total:  2h 14m | Avg:  6m 07s | Max: 25m 25s | Hits:  86%/334   
  🟩 arm64              Pass: 100%/4   | Total: 13m 29s | Avg:  3m 22s | Max:  3m 28s
🔍 ctk: 12.0 🔍
  🔍 12.0               Pass:  66%/3   | Total: 16m 50s | Avg:  5m 36s | Max: 10m 04s | Hits:  86%/167   
  🟩 12.5               Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 16s
  🟩 12.6               Pass: 100%/21  | Total:  1h 59m | Avg:  5m 40s | Max: 25m 25s | Hits:  86%/167   
🔍 cudacxx: nvcc12.0 🔍
  🔍 nvcc12.0           Pass:  66%/3   | Total: 16m 50s | Avg:  5m 36s | Max: 10m 04s | Hits:  86%/167   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 16s
  🟩 nvcc12.6           Pass: 100%/21  | Total:  1h 59m | Avg:  5m 40s | Max: 25m 25s | Hits:  86%/167   
🚨 cxx: GCC9 🚨
  🟩 Clang9             Pass: 100%/1   | Total:  3m 24s | Avg:  3m 24s | Max:  3m 24s
  🟩 Clang10            Pass: 100%/1   | Total:  3m 59s | Avg:  3m 59s | Max:  3m 59s
  🟩 Clang11            Pass: 100%/1   | Total:  3m 34s | Avg:  3m 34s | Max:  3m 34s
  🟩 Clang12            Pass: 100%/1   | Total:  3m 54s | Avg:  3m 54s | Max:  3m 54s
  🟩 Clang13            Pass: 100%/1   | Total:  3m 29s | Avg:  3m 29s | Max:  3m 29s
  🟩 Clang14            Pass: 100%/1   | Total:  3m 43s | Avg:  3m 43s | Max:  3m 43s
  🟩 Clang15            Pass: 100%/1   | Total:  3m 44s | Avg:  3m 44s | Max:  3m 44s
  🟩 Clang16            Pass: 100%/1   | Total:  3m 36s | Avg:  3m 36s | Max:  3m 36s
  🟩 Clang17            Pass: 100%/1   | Total:  3m 38s | Avg:  3m 38s | Max:  3m 38s
  🟩 Clang18            Pass: 100%/4   | Total: 29m 00s | Avg:  7m 15s | Max: 18m 38s
  🔥 GCC9               Pass:   0%/1   | Total:  3m 22s | Avg:  3m 22s | Max:  3m 22s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 30s | Avg:  3m 30s | Max:  3m 30s
  🟩 GCC11              Pass: 100%/1   | Total:  3m 55s | Avg:  3m 55s | Max:  3m 55s
  🟩 GCC12              Pass: 100%/2   | Total: 29m 27s | Avg: 14m 43s | Max: 25m 25s
  🟩 GCC13              Pass: 100%/4   | Total: 13m 10s | Avg:  3m 17s | Max:  3m 28s
  🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 04s | Avg: 10m 04s | Max: 10m 04s | Hits:  86%/167   
  🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 31s | Avg: 10m 31s | Max: 10m 31s | Hits:  86%/167   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 16s
🔍 cxx_family: GCC 🔍
  🟩 Clang              Pass: 100%/13  | Total:  1h 02m | Avg:  4m 46s | Max: 18m 38s
  🔍 GCC                Pass:  88%/9   | Total: 53m 24s | Avg:  5m 56s | Max: 25m 25s
  🟩 MSVC               Pass: 100%/2   | Total: 20m 35s | Avg: 10m 17s | Max: 10m 31s | Hits:  86%/334   
  🟩 NVHPC              Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 16s
🔍 jobs: Build 🔍
  🔍 Build              Pass:  95%/24  | Total:  1h 44m | Avg:  4m 20s | Max: 10m 31s | Hits:  86%/334   
  🟩 Test               Pass: 100%/2   | Total: 44m 03s | Avg: 22m 01s | Max: 25m 25s
🔍 std: 17 🔍
  🔍 17                 Pass:  83%/6   | Total: 22m 51s | Avg:  3m 48s | Max:  6m 16s
  🟩 20                 Pass: 100%/20  | Total:  2h 05m | Avg:  6m 16s | Max: 25m 25s | Hits:  86%/334   
🟨 cudacxx_family
  🟨 nvcc               Pass:  96%/26  | Total:  2h 28m | Avg:  5m 42s | Max: 25m 25s | Hits:  86%/334   
🟨 gpu
  🟨 v100               Pass:  96%/26  | Total:  2h 28m | Avg:  5m 42s | Max: 25m 25s | Hits:  86%/334   
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  3m 09s | Avg:  3m 09s | Max:  3m 09s
  🟩 90a                Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s

🟩 thrust: Pass: 100%/46 | Total: 6h 24m | Avg: 8m 21s | Max: 36m 35s | Hits: 99%/9260

🟩 cmake_options
  🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 19m 16s | Avg:  9m 38s | Max: 13m 14s
🟩 cpu
  🟩 amd64              Pass: 100%/44  | Total:  6h 14m | Avg:  8m 31s | Max: 36m 35s | Hits:  99%/9260  
  🟩 arm64              Pass: 100%/2   | Total:  9m 41s | Avg:  4m 50s | Max:  5m 15s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total: 43m 33s | Avg:  6m 13s | Max: 17m 27s | Hits:  99%/1852  
  🟩 12.5               Pass: 100%/2   | Total: 29m 37s | Avg: 14m 48s | Max: 15m 26s
  🟩 12.6               Pass: 100%/37  | Total:  5h 11m | Avg:  8m 24s | Max: 36m 35s | Hits:  99%/7408  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 21s | Avg:  5m 10s | Max:  5m 15s
  🟩 nvcc11.1           Pass: 100%/7   | Total: 43m 33s | Avg:  6m 13s | Max: 17m 27s | Hits:  99%/1852  
  🟩 nvcc12.5           Pass: 100%/2   | Total: 29m 37s | Avg: 14m 48s | Max: 15m 26s
  🟩 nvcc12.6           Pass: 100%/35  | Total:  5h 01m | Avg:  8m 36s | Max: 36m 35s | Hits:  99%/7408  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 21s | Avg:  5m 10s | Max:  5m 15s
  🟩 nvcc               Pass: 100%/44  | Total:  6h 14m | Avg:  8m 30s | Max: 36m 35s | Hits:  99%/9260  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 19m 54s | Avg:  4m 58s | Max:  5m 50s
  🟩 Clang10            Pass: 100%/1   | Total:  6m 42s | Avg:  6m 42s | Max:  6m 42s
  🟩 Clang11            Pass: 100%/1   | Total:  5m 12s | Avg:  5m 12s | Max:  5m 12s
  🟩 Clang12            Pass: 100%/1   | Total:  5m 08s | Avg:  5m 08s | Max:  5m 08s
  🟩 Clang13            Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 30s | Avg:  5m 30s | Max:  5m 30s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 11s | Avg:  5m 11s | Max:  5m 11s
  🟩 Clang18            Pass: 100%/7   | Total: 48m 20s | Avg:  6m 54s | Max: 14m 49s
  🟩 GCC6               Pass: 100%/2   | Total:  8m 40s | Avg:  4m 20s | Max:  4m 33s
  🟩 GCC7               Pass: 100%/2   | Total: 10m 12s | Avg:  5m 06s | Max:  5m 19s
  🟩 GCC8               Pass: 100%/1   | Total:  5m 17s | Avg:  5m 17s | Max:  5m 17s
  🟩 GCC9               Pass: 100%/3   | Total: 14m 04s | Avg:  4m 41s | Max:  5m 19s
  🟩 GCC10              Pass: 100%/1   | Total: 36m 35s | Avg: 36m 35s | Max: 36m 35s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 40s | Avg:  5m 40s | Max:  5m 40s
  🟩 GCC12              Pass: 100%/1   | Total:  5m 41s | Avg:  5m 41s | Max:  5m 41s
  🟩 GCC13              Pass: 100%/8   | Total:  1h 03m | Avg:  7m 53s | Max: 13m 36s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total:  6m 52s | Avg:  6m 52s | Max:  6m 52s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 17m 27s | Avg: 17m 27s | Max: 17m 27s | Hits:  99%/1852  
  🟩 MSVC14.29          Pass: 100%/1   | Total: 14m 59s | Avg: 14m 59s | Max: 14m 59s | Hits:  99%/1852  
  🟩 MSVC14.39          Pass: 100%/3   | Total: 54m 10s | Avg: 18m 03s | Max: 21m 25s | Hits:  99%/5556  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 29m 37s | Avg: 14m 48s | Max: 15m 26s
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total:  1h 52m | Avg:  5m 54s | Max: 14m 49s
  🟩 GCC                Pass: 100%/19  | Total:  2h 29m | Avg:  7m 51s | Max: 36m 35s
  🟩 Intel              Pass: 100%/1   | Total:  6m 52s | Avg:  6m 52s | Max:  6m 52s
  🟩 MSVC               Pass: 100%/5   | Total:  1h 26m | Avg: 17m 19s | Max: 21m 25s | Hits:  99%/9260  
  🟩 NVHPC              Pass: 100%/2   | Total: 29m 37s | Avg: 14m 48s | Max: 15m 26s
🟩 gpu
  🟩 v100               Pass: 100%/46  | Total:  6h 24m | Avg:  8m 21s | Max: 36m 35s | Hits:  99%/9260  
🟩 jobs
  🟩 Build              Pass: 100%/40  | Total:  5h 05m | Avg:  7m 38s | Max: 36m 35s | Hits:  99%/7408  
  🟩 TestCPU            Pass: 100%/3   | Total: 36m 54s | Avg: 12m 18s | Max: 21m 25s | Hits:  99%/1852  
  🟩 TestGPU            Pass: 100%/3   | Total: 41m 39s | Avg: 13m 53s | Max: 14m 49s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 55s | Avg:  4m 55s | Max:  4m 55s
🟩 std
  🟩 11                 Pass: 100%/5   | Total: 22m 14s | Avg:  4m 26s | Max:  5m 23s
  🟩 14                 Pass: 100%/4   | Total: 33m 09s | Avg:  8m 17s | Max: 17m 27s | Hits:  99%/1852  
  🟩 17                 Pass: 100%/12  | Total:  1h 37m | Avg:  8m 06s | Max: 16m 08s | Hits:  99%/3704  
  🟩 20                 Pass: 100%/23  | Total:  3h 32m | Avg:  9m 14s | Max: 36m 35s | Hits:  99%/3704

🟩 cub: Pass: 100%/45 | Total: 6h 27m | Avg: 8m 36s | Max: 30m 50s | Hits: 99%/3064

🟩 cpu
  🟩 amd64              Pass: 100%/43  | Total:  6h 17m | Avg:  8m 46s | Max: 30m 50s | Hits:  99%/3064  
  🟩 arm64              Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  5m 01s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total: 41m 18s | Avg:  5m 54s | Max: 15m 00s | Hits:  99%/766   
  🟩 12.5               Pass: 100%/2   | Total: 19m 47s | Avg:  9m 53s | Max: 10m 28s
  🟩 12.6               Pass: 100%/36  | Total:  5h 26m | Avg:  9m 03s | Max: 30m 50s | Hits:  99%/2298  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 47s | Avg:  4m 23s | Max:  4m 37s
  🟩 nvcc11.1           Pass: 100%/7   | Total: 41m 18s | Avg:  5m 54s | Max: 15m 00s | Hits:  99%/766   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 47s | Avg:  9m 53s | Max: 10m 28s
  🟩 nvcc12.6           Pass: 100%/34  | Total:  5h 17m | Avg:  9m 19s | Max: 30m 50s | Hits:  99%/2298  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 47s | Avg:  4m 23s | Max:  4m 37s
  🟩 nvcc               Pass: 100%/43  | Total:  6h 18m | Avg:  8m 47s | Max: 30m 50s | Hits:  99%/3064  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 21m 58s | Avg:  5m 29s | Max:  6m 38s
  🟩 Clang10            Pass: 100%/1   | Total:  7m 13s | Avg:  7m 13s | Max:  7m 13s
  🟩 Clang11            Pass: 100%/1   | Total:  5m 27s | Avg:  5m 27s | Max:  5m 27s
  🟩 Clang12            Pass: 100%/1   | Total:  5m 09s | Avg:  5m 09s | Max:  5m 09s
  🟩 Clang13            Pass: 100%/1   | Total:  5m 19s | Avg:  5m 19s | Max:  5m 19s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 29s | Avg:  5m 29s | Max:  5m 29s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s
  🟩 Clang18            Pass: 100%/7   | Total:  1h 12m | Avg: 10m 22s | Max: 30m 50s
  🟩 GCC6               Pass: 100%/2   | Total:  8m 22s | Avg:  4m 11s | Max:  4m 16s
  🟩 GCC7               Pass: 100%/2   | Total: 10m 27s | Avg:  5m 13s | Max:  5m 19s
  🟩 GCC8               Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
  🟩 GCC9               Pass: 100%/3   | Total: 14m 29s | Avg:  4m 49s | Max:  5m 42s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 24s | Avg:  5m 24s | Max:  5m 24s
  🟩 GCC12              Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
  🟩 GCC13              Pass: 100%/8   | Total:  1h 50m | Avg: 13m 51s | Max: 30m 49s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total:  6m 39s | Avg:  6m 39s | Max:  6m 39s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 15m 00s | Avg: 15m 00s | Max: 15m 00s | Hits:  99%/766   
  🟩 MSVC14.29          Pass: 100%/1   | Total: 13m 09s | Avg: 13m 09s | Max: 13m 09s | Hits:  99%/766   
  🟩 MSVC14.39          Pass: 100%/2   | Total: 26m 16s | Avg: 13m 08s | Max: 13m 31s | Hits:  99%/1532  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 47s | Avg:  9m 53s | Max: 10m 28s
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total:  2h 19m | Avg:  7m 21s | Max: 30m 50s
  🟩 GCC                Pass: 100%/19  | Total:  2h 46m | Avg:  8m 46s | Max: 30m 49s
  🟩 Intel              Pass: 100%/1   | Total:  6m 39s | Avg:  6m 39s | Max:  6m 39s
  🟩 MSVC               Pass: 100%/4   | Total: 54m 25s | Avg: 13m 36s | Max: 15m 00s | Hits:  99%/3064  
  🟩 NVHPC              Pass: 100%/2   | Total: 19m 47s | Avg:  9m 53s | Max: 10m 28s
🟩 gpu
  🟩 v100               Pass: 100%/45  | Total:  6h 27m | Avg:  8m 36s | Max: 30m 50s | Hits:  99%/3064  
🟩 jobs
  🟩 Build              Pass: 100%/39  | Total:  4h 09m | Avg:  6m 23s | Max: 15m 00s | Hits:  99%/3064  
  🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 09s | Avg: 19m 09s | Max: 19m 09s
  🟩 GraphCapture       Pass: 100%/1   | Total: 18m 46s | Avg: 18m 46s | Max: 18m 46s
  🟩 HostLaunch         Pass: 100%/2   | Total: 38m 32s | Avg: 19m 16s | Max: 21m 36s
  🟩 TestGPU            Pass: 100%/2   | Total:  1h 01m | Avg: 30m 49s | Max: 30m 50s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 07s | Avg:  4m 07s | Max:  4m 07s
🟩 std
  🟩 11                 Pass: 100%/5   | Total: 23m 57s | Avg:  4m 47s | Max:  6m 11s
  🟩 14                 Pass: 100%/4   | Total: 31m 13s | Avg:  7m 48s | Max: 15m 00s | Hits:  99%/766   
  🟩 17                 Pass: 100%/12  | Total:  1h 26m | Avg:  7m 12s | Max: 13m 09s | Hits:  99%/1532  
  🟩 20                 Pass: 100%/24  | Total:  4h 05m | Avg: 10m 13s | Max: 30m 50s | Hits:  99%/766

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 58s | Avg: 4m 59s | Max: 7m 52s

🟩 cpu
  🟩 amd64              Pass: 100%/2   | Total:  9m 58s | Avg:  4m 59s | Max:  7m 52s
🟩 ctk
  🟩 12.6               Pass: 100%/2   | Total:  9m 58s | Avg:  4m 59s | Max:  7m 52s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 58s | Avg:  4m 59s | Max:  7m 52s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/2   | Total:  9m 58s | Avg:  4m 59s | Max:  7m 52s
🟩 cxx
  🟩 GCC13              Pass: 100%/2   | Total:  9m 58s | Avg:  4m 59s | Max:  7m 52s
🟩 cxx_family
  🟩 GCC                Pass: 100%/2   | Total:  9m 58s | Avg:  4m 59s | Max:  7m 52s
🟩 gpu
  🟩 v100               Pass: 100%/2   | Total:  9m 58s | Avg:  4m 59s | Max:  7m 52s
🟩 jobs
  🟩 Build              Pass: 100%/1   | Total:  2m 06s | Avg:  2m 06s | Max:  2m 06s
  🟩 Test               Pass: 100%/1   | Total:  7m 52s | Avg:  7m 52s | Max:  7m 52s

🟩 python: Pass: 100%/1 | Total: 28m 55s | Avg: 28m 55s | Max: 28m 55s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 28m 55s | Avg: 28m 55s | Max: 28m 55s
🟩 ctk
  🟩 12.6               Pass: 100%/1   | Total: 28m 55s | Avg: 28m 55s | Max: 28m 55s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/1   | Total: 28m 55s | Avg: 28m 55s | Max: 28m 55s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 28m 55s | Avg: 28m 55s | Max: 28m 55s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 28m 55s | Avg: 28m 55s | Max: 28m 55s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 28m 55s | Avg: 28m 55s | Max: 28m 55s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 28m 55s | Avg: 28m 55s | Max: 28m 55s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 28m 55s | Avg: 28m 55s | Max: 28m 55s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
+/-	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust
+/-	CUDA Experimental
+/-	python
+/-	CCCL C Parallel Library
+/-	Catch2Helper

🏃‍ Runner counts (total jobs: 168)

#	Runner
124	`linux-amd64-cpu16`
19	`linux-amd64-gpu-v100-latest-1`
15	`windows-amd64-cpu16`
10	`linux-arm64-cpu16`

miscco requested review from a team as code owners December 9, 2024 16:43

miscco requested review from alliepiper, gonidelis, wmaxey, ericniebler and pciolkosz December 9, 2024 16:43

miscco commented Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

miscco commented Dec 9, 2024

mhoemmen reviewed Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

mhoemmen reviewed Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

mhoemmen reviewed Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

mhoemmen reviewed Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

mhoemmen reviewed Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

mhoemmen reviewed Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

mhoemmen reviewed Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

mhoemmen reviewed Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

mhoemmen reviewed Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

mhoemmen reviewed Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

mhoemmen reviewed Dec 9, 2024

cudax/include/cuda/experimental/__container/async_mdarray.cuh Outdated

miscco commented Dec 10, 2024

miscco force-pushed the mdarray_cudax branch from 48595bb to cf41313 December 10, 2024 15:52

miscco added 7 commits December 12, 2024 14:26

5eab35b Make cudax depend on thrust

4f1221a Implement cudax::heterogeneous_iterator

4e8a71b Address some review comments

568956e Implement conversion to mdspan

51963bc Fix equality of mapping

91b059a Move towards constructors that take an extent

miscco force-pushed the mdarray_cudax branch from cf41313 to 91b059a December 12, 2024 13:26

miscco added 2 commits December 12, 2024 16:38

630d2a4 Implement all constructors through mapping ones

578e4d4 Drop the assignment of initializer_list




github-actions bot commented Dec 12, 2024

🟨 cudax: Pass: 88%/26 | Total: 2h 43m | Avg: 6m 17s | Max: 18m 01s

🟩 libcudacxx: Pass: 100%/48 | Total: 11h 22m | Avg: 14m 13s | Max: 38m 49s | Hits: 11%/9770

🟩 thrust: Pass: 100%/46 | Total: 12h 31m | Avg: 16m 20s | Max: 1h 16m | Hits: 20%/9260

🟩 cub: Pass: 100%/45 | Total: 14h 35m | Avg: 19m 26s | Max: 1h 36m | Hits: 3%/3064

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 15s | Avg: 4m 37s | Max: 7m 08s

🟩 python: Pass: 100%/1 | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s


