Ensure cuda::std::bit_cast works with vector types #3183

Open
wants to merge 2 commits into
base: main

miscco
Collaborator

@miscco miscco commented Dec 17, 2024

We want to make sure that bit_cast works with suitable vector or aggregate types, such as:

  • cuda::std::array
  • C arrays
  • CUDA vector types like float2

Unfortunately, we cannot use it with the extended floating-point vector types, because they are neither trivially copyable nor trivially copy assignable.
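
As a rough illustration of the cases listed above, the following host-only sketch mimics a bit_cast with std::memcpy. It is not the cuda::std::bit_cast implementation: `bit_cast_sketch` and `float2_like` are hypothetical names, the latter standing in for the CUDA float2 type so that the example compiles without the CUDA toolkit.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the CUDA float2 vector type (two packed floats).
struct float2_like {
  float x;
  float y;
};

// Hypothetical memcpy-based sketch of a bit_cast; not the libcu++ implementation.
// It requires equal sizes and trivially copyable source and destination types.
template <class To, class From>
To bit_cast_sketch(const From& from) noexcept {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
  static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Round-trips a float2-like value through std::array and a C array.
inline bool round_trip_ok() {
  const float2_like in{1.0f, -2.5f};

  // Vector type -> std::array of the same size.
  const auto as_array = bit_cast_sketch<std::array<float, 2>>(in);

  // C array as the source: it binds to `const From&` without decaying to a pointer.
  const float raw[2] = {as_array[0], as_array[1]};
  const auto out = bit_cast_sketch<float2_like>(raw);

  return out.x == in.x && out.y == in.y;
}
```

Calling `round_trip_ok()` exercises all three kinds of types from the list. A type that is not trivially copyable would be rejected by the static_assert at compile time, which is the same kind of restriction the description mentions for the extended floating-point vector types.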

@miscco miscco requested review from a team as code owners December 17, 2024 11:27
@miscco miscco force-pushed the extent_bit_cast_array branch from 25dce73 to 0f52938 December 17, 2024 11:52
Contributor

🟨 CI finished in 1h 40m: Pass: 87%/170 | Total: 2d 14h | Avg: 22m 07s | Max: 1h 06m | Hits: 62%/20274
  • 🟨 libcudacxx: Pass: 56%/48 | Total: 13h 19m | Avg: 16m 38s | Max: 1h 03m | Hits: 51%/7578

    🔍 jobs: Build 🔍
      🔍 Build              Pass:  48%/41  | Total: 10h 22m | Avg: 15m 10s | Max: 32m 56s | Hits:  51%/7578  
      🟩 NVRTC              Pass: 100%/4   | Total:  1h 27m | Avg: 21m 55s | Max: 27m 21s
      🟩 Test               Pass: 100%/2   | Total:  1h 27m | Avg: 43m 34s | Max:  1h 03m
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 59s | Avg:  1m 59s | Max:  1m 59s
    🟨 ctk
      🟥 11.1               Pass:   0%/7   | Total:  1h 53m | Avg: 16m 16s | Max: 28m 47s
      🟩 12.5               Pass: 100%/2   | Total:  1h 03m | Avg: 31m 59s | Max: 32m 16s
      🟨 12.6               Pass:  64%/39  | Total: 10h 21m | Avg: 15m 55s | Max:  1h 03m | Hits:  51%/7578  
    🟨 cudacxx
      🟥 ClangCUDA18        Pass:   0%/4   | Total:  1h 04m | Avg: 16m 05s | Max: 20m 57s
      🟥 nvcc11.1           Pass:   0%/7   | Total:  1h 53m | Avg: 16m 16s | Max: 28m 47s
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 03m | Avg: 31m 59s | Max: 32m 16s
      🟨 nvcc12.6           Pass:  71%/35  | Total:  9h 16m | Avg: 15m 54s | Max:  1h 03m | Hits:  51%/7578  
    🟨 cxx
      🟥 Clang9             Pass:   0%/4   | Total:  1h 00m | Avg: 15m 02s | Max: 23m 08s
      🟩 Clang10            Pass: 100%/1   | Total:  4m 42s | Avg:  4m 42s | Max:  4m 42s
      🟩 Clang11            Pass: 100%/1   | Total: 22m 08s | Avg: 22m 08s | Max: 22m 08s
      🟩 Clang12            Pass: 100%/1   | Total:  6m 51s | Avg:  6m 51s | Max:  6m 51s
      🟩 Clang13            Pass: 100%/1   | Total: 10m 29s | Avg: 10m 29s | Max: 10m 29s
      🟩 Clang14            Pass: 100%/1   | Total: 19m 57s | Avg: 19m 57s | Max: 19m 57s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 33s | Avg:  4m 33s | Max:  4m 33s
      🟩 Clang16            Pass: 100%/1   | Total: 21m 36s | Avg: 21m 36s | Max: 21m 36s
      🟩 Clang17            Pass: 100%/1   | Total: 20m 55s | Avg: 20m 55s | Max: 20m 55s
      🟨 Clang18            Pass:  37%/8   | Total:  2h 08m | Avg: 16m 03s | Max: 23m 38s
      🟥 GCC6               Pass:   0%/2   | Total: 30m 21s | Avg: 15m 10s | Max: 22m 22s
      🟥 GCC7               Pass:   0%/2   | Total: 18m 27s | Avg:  9m 13s | Max: 15m 11s
      🟥 GCC8               Pass:   0%/1   | Total: 20m 22s | Avg: 20m 22s | Max: 20m 22s
      🟥 GCC9               Pass:   0%/3   | Total: 31m 38s | Avg: 10m 32s | Max: 18m 07s
      🟥 GCC10              Pass:   0%/1   | Total:  6m 54s | Avg:  6m 54s | Max:  6m 54s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 57s | Avg:  3m 57s | Max:  3m 57s
      🟩 GCC12              Pass: 100%/1   | Total: 22m 32s | Avg: 22m 32s | Max: 22m 32s
      🟨 GCC13              Pass:  90%/10  | Total:  3h 04m | Avg: 18m 26s | Max:  1h 03m
      🟥 Intel2023.2.0      Pass:   0%/1   | Total:  5m 42s | Avg:  5m 42s | Max:  5m 42s
      🟥 MSVC14.16          Pass:   0%/1   | Total: 28m 47s | Avg: 28m 47s | Max: 28m 47s
      🟩 MSVC14.29          Pass: 100%/1   | Total: 32m 56s | Avg: 32m 56s | Max: 32m 56s | Hits:  31%/2477  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 49m 17s | Avg: 24m 38s | Max: 31m 45s | Hits:  60%/5101  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 03m | Avg: 31m 59s | Max: 32m 16s
    🟨 cxx_family
      🟨 Clang              Pass:  55%/20  | Total:  4h 59m | Avg: 14m 59s | Max: 23m 38s
      🟨 GCC                Pass:  52%/21  | Total:  5h 18m | Avg: 15m 10s | Max:  1h 03m
      🟥 Intel              Pass:   0%/1   | Total:  5m 42s | Avg:  5m 42s | Max:  5m 42s
      🟨 MSVC               Pass:  75%/4   | Total:  1h 51m | Avg: 27m 45s | Max: 32m 56s | Hits:  51%/7578  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 03m | Avg: 31m 59s | Max: 32m 16s
    🟨 gpu
      🟨 v100               Pass:  56%/48  | Total: 13h 19m | Avg: 16m 38s | Max:  1h 03m | Hits:  51%/7578  
    🟨 cpu
      🟨 amd64              Pass:  58%/46  | Total: 12h 48m | Avg: 16m 42s | Max:  1h 03m | Hits:  51%/7578  
      🟥 arm64              Pass:   0%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 20m 23s
    🟨 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/4   | Total:  1h 04m | Avg: 16m 05s | Max: 20m 57s
      🟨 nvcc               Pass:  61%/44  | Total: 12h 14m | Avg: 16m 42s | Max:  1h 03m | Hits:  51%/7578  
    🟨 sm
      🟥 90                 Pass:   0%/1   | Total: 12m 51s | Avg: 12m 51s | Max: 12m 51s
      🟨 90a                Pass:  50%/2   | Total: 16m 46s | Avg:  8m 23s | Max: 12m 42s
    🟨 std
      🟨 11                 Pass:  16%/6   | Total:  1h 37m | Avg: 16m 10s | Max: 23m 08s
      🟨 14                 Pass:  20%/5   | Total:  1h 30m | Avg: 18m 03s | Max: 28m 47s
      🟨 17                 Pass:  53%/13  | Total:  3h 32m | Avg: 16m 19s | Max: 32m 56s | Hits:  31%/4954  
      🟨 20                 Pass:  73%/23  | Total:  6h 37m | Avg: 17m 17s | Max:  1h 03m | Hits:  88%/2624  
    
  • 🟩 cub: Pass: 100%/47 | Total: 1d 05h | Avg: 37m 09s | Max: 1h 06m | Hits: 73%/3124

    🟩 cpu
      🟩 amd64              Pass: 100%/45  | Total:  1d 03h | Avg: 36m 57s | Max:  1h 06m | Hits:  73%/3124  
      🟩 arm64              Pass: 100%/2   | Total:  1h 23m | Avg: 41m 53s | Max: 42m 16s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  4h 10m | Avg: 35m 48s | Max: 54m 17s | Hits:  74%/781   
      🟩 12.5               Pass: 100%/2   | Total:  1h 43m | Avg: 51m 54s | Max: 54m 28s
      🟩 12.6               Pass: 100%/38  | Total: 23h 12m | Avg: 36m 38s | Max:  1h 06m | Hits:  73%/2343  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 56m | Avg: 58m 25s | Max:  1h 06m
      🟩 nvcc11.1           Pass: 100%/7   | Total:  4h 10m | Avg: 35m 48s | Max: 54m 17s | Hits:  74%/781   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 43m | Avg: 51m 54s | Max: 54m 28s
      🟩 nvcc12.6           Pass: 100%/36  | Total: 21h 15m | Avg: 35m 25s | Max: 53m 17s | Hits:  73%/2343  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 56m | Avg: 58m 25s | Max:  1h 06m
      🟩 nvcc               Pass: 100%/45  | Total:  1d 03h | Avg: 36m 13s | Max: 54m 28s | Hits:  73%/3124  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  2h 28m | Avg: 37m 13s | Max: 41m 49s
      🟩 Clang10            Pass: 100%/1   | Total: 41m 41s | Avg: 41m 41s | Max: 41m 41s
      🟩 Clang11            Pass: 100%/1   | Total: 39m 13s | Avg: 39m 13s | Max: 39m 13s
      🟩 Clang12            Pass: 100%/1   | Total: 38m 13s | Avg: 38m 13s | Max: 38m 13s
      🟩 Clang13            Pass: 100%/1   | Total: 38m 09s | Avg: 38m 09s | Max: 38m 09s
      🟩 Clang14            Pass: 100%/1   | Total: 39m 40s | Avg: 39m 40s | Max: 39m 40s
      🟩 Clang15            Pass: 100%/1   | Total: 37m 54s | Avg: 37m 54s | Max: 37m 54s
      🟩 Clang16            Pass: 100%/1   | Total: 40m 31s | Avg: 40m 31s | Max: 40m 31s
      🟩 Clang17            Pass: 100%/1   | Total: 38m 38s | Avg: 38m 38s | Max: 38m 38s
      🟩 Clang18            Pass: 100%/7   | Total:  4h 37m | Avg: 39m 40s | Max:  1h 06m
      🟩 GCC6               Pass: 100%/2   | Total:  1h 06m | Avg: 33m 04s | Max: 33m 34s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 17m | Avg: 38m 43s | Max: 39m 43s
      🟩 GCC8               Pass: 100%/1   | Total: 39m 15s | Avg: 39m 15s | Max: 39m 15s
      🟩 GCC9               Pass: 100%/3   | Total:  1h 44m | Avg: 34m 52s | Max: 41m 13s
      🟩 GCC10              Pass: 100%/1   | Total: 38m 23s | Avg: 38m 23s | Max: 38m 23s
      🟩 GCC11              Pass: 100%/1   | Total: 40m 16s | Avg: 40m 16s | Max: 40m 16s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 12m | Avg: 24m 00s | Max: 40m 53s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 31m | Avg: 26m 28s | Max: 43m 45s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 42m 18s | Avg: 42m 18s | Max: 42m 18s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 54m 17s | Avg: 54m 17s | Max: 54m 17s | Hits:  74%/781   
      🟩 MSVC14.29          Pass: 100%/1   | Total: 51m 23s | Avg: 51m 23s | Max: 51m 23s | Hits:  74%/781   
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 44m | Avg: 52m 11s | Max: 53m 17s | Hits:  72%/1562  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 43m | Avg: 51m 54s | Max: 54m 28s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 12h 20m | Avg: 38m 58s | Max:  1h 06m
      🟩 GCC                Pass: 100%/21  | Total: 10h 49m | Avg: 30m 56s | Max: 43m 45s
      🟩 Intel              Pass: 100%/1   | Total: 42m 18s | Avg: 42m 18s | Max: 42m 18s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 30m | Avg: 52m 30s | Max: 54m 17s | Hits:  73%/3124  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 43m | Avg: 51m 54s | Max: 54m 28s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 31m 07s | Avg: 15m 33s | Max: 16m 02s
      🟩 v100               Pass: 100%/45  | Total:  1d 04h | Avg: 38m 07s | Max:  1h 06m | Hits:  73%/3124  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  1d 02h | Avg: 40m 21s | Max:  1h 06m | Hits:  73%/3124  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 16m 04s | Avg: 16m 04s | Max: 16m 04s
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 12s | Avg: 17m 12s | Max: 17m 12s
      🟩 HostLaunch         Pass: 100%/3   | Total: 54m 24s | Avg: 18m 08s | Max: 19m 18s
      🟩 TestGPU            Pass: 100%/2   | Total: 44m 35s | Avg: 22m 17s | Max: 24m 41s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 31m 07s | Avg: 15m 33s | Max: 16m 02s
      🟩 90a                Pass: 100%/1   | Total: 14m 43s | Avg: 14m 43s | Max: 14m 43s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  3h 01m | Avg: 36m 15s | Max: 41m 49s
      🟩 14                 Pass: 100%/4   | Total:  2h 44m | Avg: 41m 12s | Max: 54m 17s | Hits:  74%/781   
      🟩 17                 Pass: 100%/12  | Total:  8h 25m | Avg: 42m 08s | Max: 51m 23s | Hits:  74%/1562  
      🟩 20                 Pass: 100%/26  | Total: 14h 54m | Avg: 34m 24s | Max:  1h 06m | Hits:  71%/781   
    
  • 🟩 thrust: Pass: 100%/46 | Total: 17h 35m | Avg: 22m 56s | Max: 1h 04m | Hits: 67%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 30m 00s | Avg: 15m 00s | Max: 18m 45s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total: 16h 56m | Avg: 23m 06s | Max:  1h 04m | Hits:  67%/9260  
      🟩 arm64              Pass: 100%/2   | Total: 38m 27s | Avg: 19m 13s | Max: 23m 06s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  2h 03m | Avg: 17m 36s | Max: 56m 10s | Hits:  59%/1852  
      🟩 12.5               Pass: 100%/2   | Total:  1h 57m | Avg: 58m 51s | Max:  1h 01m
      🟩 12.6               Pass: 100%/37  | Total: 13h 34m | Avg: 22m 00s | Max:  1h 04m | Hits:  69%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 26m 10s | Avg: 13m 05s | Max: 13m 27s
      🟩 nvcc11.1           Pass: 100%/7   | Total:  2h 03m | Avg: 17m 36s | Max: 56m 10s | Hits:  59%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 57m | Avg: 58m 51s | Max:  1h 01m
      🟩 nvcc12.6           Pass: 100%/35  | Total: 13h 08m | Avg: 22m 30s | Max:  1h 04m | Hits:  69%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 26m 10s | Avg: 13m 05s | Max: 13m 27s
      🟩 nvcc               Pass: 100%/44  | Total: 17h 08m | Avg: 23m 23s | Max:  1h 04m | Hits:  67%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 55m 04s | Avg: 13m 46s | Max: 22m 51s
      🟩 Clang10            Pass: 100%/1   | Total: 23m 44s | Avg: 23m 44s | Max: 23m 44s
      🟩 Clang11            Pass: 100%/1   | Total: 17m 00s | Avg: 17m 00s | Max: 17m 00s
      🟩 Clang12            Pass: 100%/1   | Total: 16m 34s | Avg: 16m 34s | Max: 16m 34s
      🟩 Clang13            Pass: 100%/1   | Total: 17m 37s | Avg: 17m 37s | Max: 17m 37s
      🟩 Clang14            Pass: 100%/1   | Total: 22m 55s | Avg: 22m 55s | Max: 22m 55s
      🟩 Clang15            Pass: 100%/1   | Total: 21m 00s | Avg: 21m 00s | Max: 21m 00s
      🟩 Clang16            Pass: 100%/1   | Total: 18m 39s | Avg: 18m 39s | Max: 18m 39s
      🟩 Clang17            Pass: 100%/1   | Total: 25m 39s | Avg: 25m 39s | Max: 25m 39s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 45m | Avg: 15m 02s | Max: 20m 05s
      🟩 GCC6               Pass: 100%/2   | Total: 21m 11s | Avg: 10m 35s | Max: 15m 04s
      🟩 GCC7               Pass: 100%/2   | Total: 26m 20s | Avg: 13m 10s | Max: 15m 24s
      🟩 GCC8               Pass: 100%/1   | Total: 19m 39s | Avg: 19m 39s | Max: 19m 39s
      🟩 GCC9               Pass: 100%/3   | Total: 49m 38s | Avg: 16m 32s | Max: 22m 43s
      🟩 GCC10              Pass: 100%/1   | Total: 21m 34s | Avg: 21m 34s | Max: 21m 34s
      🟩 GCC11              Pass: 100%/1   | Total: 23m 43s | Avg: 23m 43s | Max: 23m 43s
      🟩 GCC12              Pass: 100%/1   | Total: 26m 40s | Avg: 26m 40s | Max: 26m 40s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 10m | Avg: 16m 15s | Max: 25m 03s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 36m 25s | Avg: 36m 25s | Max: 36m 25s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 56m 10s | Avg: 56m 10s | Max: 56m 10s | Hits:  59%/1852  
      🟩 MSVC14.29          Pass: 100%/1   | Total: 53m 23s | Avg: 53m 23s | Max: 53m 23s | Hits:  60%/1852  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 29m | Avg: 49m 43s | Max:  1h 04m | Hits:  72%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 51s | Max:  1h 01m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  5h 23m | Avg: 17m 01s | Max: 25m 39s
      🟩 GCC                Pass: 100%/19  | Total:  5h 18m | Avg: 16m 46s | Max: 26m 40s
      🟩 Intel              Pass: 100%/1   | Total: 36m 25s | Avg: 36m 25s | Max: 36m 25s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 18m | Avg: 51m 44s | Max:  1h 04m | Hits:  67%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 57m | Avg: 58m 51s | Max:  1h 01m
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total: 17h 35m | Avg: 22m 56s | Max:  1h 04m | Hits:  67%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total: 16h 16m | Avg: 24m 24s | Max:  1h 04m | Hits:  59%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 37m 32s | Avg: 12m 30s | Max: 22m 30s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 41m 16s | Avg: 13m 45s | Max: 17m 40s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  8m 54s | Avg:  8m 54s | Max:  8m 54s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 44m 10s | Avg:  8m 50s | Max: 13m 16s
      🟩 14                 Pass: 100%/4   | Total:  1h 49m | Avg: 27m 22s | Max: 56m 10s | Hits:  59%/1852  
      🟩 17                 Pass: 100%/12  | Total:  6h 05m | Avg: 30m 28s | Max:  1h 04m | Hits:  59%/3704  
      🟩 20                 Pass: 100%/23  | Total:  8h 25m | Avg: 21m 59s | Max:  1h 01m | Hits:  79%/3704  
    
  • 🟩 cudax: Pass: 100%/26 | Total: 2h 06m | Avg: 4m 52s | Max: 23m 35s | Hits: 92%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/22  | Total:  1h 56m | Avg:  5m 16s | Max: 23m 35s | Hits:  92%/312   
      🟩 arm64              Pass: 100%/4   | Total: 10m 34s | Avg:  2m 38s | Max:  2m 43s
    🟩 ctk
      🟩 12.0               Pass: 100%/3   | Total: 14m 25s | Avg:  4m 48s | Max:  8m 27s | Hits:  92%/156   
      🟩 12.5               Pass: 100%/2   | Total:  9m 53s | Avg:  4m 56s | Max:  5m 01s
      🟩 12.6               Pass: 100%/21  | Total:  1h 42m | Avg:  4m 52s | Max: 23m 35s | Hits:  92%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/3   | Total: 14m 25s | Avg:  4m 48s | Max:  8m 27s | Hits:  92%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  9m 53s | Avg:  4m 56s | Max:  5m 01s
      🟩 nvcc12.6           Pass: 100%/21  | Total:  1h 42m | Avg:  4m 52s | Max: 23m 35s | Hits:  92%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/26  | Total:  2h 06m | Avg:  4m 52s | Max: 23m 35s | Hits:  92%/312   
    🟩 cxx
      🟩 Clang9             Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
      🟩 Clang10            Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
      🟩 Clang11            Pass: 100%/1   | Total:  3m 03s | Avg:  3m 03s | Max:  3m 03s
      🟩 Clang12            Pass: 100%/1   | Total:  2m 57s | Avg:  2m 57s | Max:  2m 57s
      🟩 Clang13            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 Clang14            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 18s | Avg:  3m 18s | Max:  3m 18s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 06s | Avg:  3m 06s | Max:  3m 06s
      🟩 Clang18            Pass: 100%/4   | Total: 31m 58s | Avg:  7m 59s | Max: 23m 35s
      🟩 GCC9               Pass: 100%/1   | Total:  2m 46s | Avg:  2m 46s | Max:  2m 46s
      🟩 GCC10              Pass: 100%/1   | Total:  2m 58s | Avg:  2m 58s | Max:  2m 58s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 04s | Avg:  3m 04s | Max:  3m 04s
      🟩 GCC12              Pass: 100%/2   | Total: 19m 42s | Avg:  9m 51s | Max: 16m 37s
      🟩 GCC13              Pass: 100%/4   | Total: 10m 44s | Avg:  2m 41s | Max:  2m 58s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 27s | Avg:  8m 27s | Max:  8m 27s | Hits:  92%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  8m 23s | Avg:  8m 23s | Max:  8m 23s | Hits:  92%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  9m 53s | Avg:  4m 56s | Max:  5m 01s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/13  | Total:  1h 00m | Avg:  4m 39s | Max: 23m 35s
      🟩 GCC                Pass: 100%/9   | Total: 39m 14s | Avg:  4m 21s | Max: 16m 37s
      🟩 MSVC               Pass: 100%/2   | Total: 16m 50s | Avg:  8m 25s | Max:  8m 27s | Hits:  92%/312   
      🟩 NVHPC              Pass: 100%/2   | Total:  9m 53s | Avg:  4m 56s | Max:  5m 01s
    🟩 gpu
      🟩 v100               Pass: 100%/26  | Total:  2h 06m | Avg:  4m 52s | Max: 23m 35s | Hits:  92%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/24  | Total:  1h 26m | Avg:  3m 35s | Max:  8m 27s | Hits:  92%/312   
      🟩 Test               Pass: 100%/2   | Total: 40m 12s | Avg: 20m 06s | Max: 23m 35s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 58s | Avg:  2m 58s | Max:  2m 58s
      🟩 90a                Pass: 100%/1   | Total:  2m 36s | Avg:  2m 36s | Max:  2m 36s
    🟩 std
      🟩 17                 Pass: 100%/6   | Total: 19m 13s | Avg:  3m 12s | Max:  5m 01s
      🟩 20                 Pass: 100%/20  | Total:  1h 47m | Avg:  5m 22s | Max: 23m 35s | Hits:  92%/312   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 52s | Avg: 4m 56s | Max: 7m 50s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 52s | Avg:  4m 56s | Max:  7m 50s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 52s | Avg:  4m 56s | Max:  7m 50s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 52s | Avg:  4m 56s | Max:  7m 50s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 52s | Avg:  4m 56s | Max:  7m 50s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 52s | Avg:  4m 56s | Max:  7m 50s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 52s | Avg:  4m 56s | Max:  7m 50s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 52s | Avg:  4m 56s | Max:  7m 50s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 02s | Avg:  2m 02s | Max:  2m 02s
      🟩 Test               Pass: 100%/1   | Total:  7m 50s | Avg:  7m 50s | Max:  7m 50s
    
  • 🟩 python: Pass: 100%/1 | Total: 25m 16s | Avg: 25m 16s | Max: 25m 16s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 25m 16s | Avg: 25m 16s | Max: 25m 16s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 25m 16s | Avg: 25m 16s | Max: 25m 16s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 25m 16s | Avg: 25m 16s | Max: 25m 16s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 25m 16s | Avg: 25m 16s | Max: 25m 16s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 25m 16s | Avg: 25m 16s | Max: 25m 16s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 25m 16s | Avg: 25m 16s | Max: 25m 16s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 25m 16s | Avg: 25m 16s | Max: 25m 16s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 25m 16s | Avg: 25m 16s | Max: 25m 16s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 170)

# Runner
125 linux-amd64-cpu16
19 linux-amd64-gpu-v100-latest-1
15 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@fbusato
Contributor

fbusato commented Dec 17, 2024

Great to see this extension for bit_cast!
AFAIK, CUDA vector types like float2 should already be trivially copyable. Please also consider std::pair.

@miscco
Collaborator Author

miscco commented Dec 17, 2024

> Great to see this extension for bit_cast! AFAIK, CUDA vector types like float2 should already be trivially copyable. Please also consider std::pair.

They are; I was talking about __half2 and __nv_bfloat162.

The issue with std::pair is that it is neither trivially copyable nor trivially default constructible.
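
The constraint behind this can be sketched with the standard type traits. The example below is a minimal host-only sketch: `bit_castable` is a hypothetical helper, not part of libcu++, and `pair_like` is a hypothetical type that mimics only the relevant property, namely a user-provided default constructor and copy assignment operator. Whether std::pair itself satisfies the traits depends on the standard library implementation, so the sketch reports that result rather than asserting it.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical helper: the usual preconditions for a bit_cast between two types.
template <class To, class From>
constexpr bool bit_castable() {
  return sizeof(To) == sizeof(From)
      && std::is_trivially_copyable<To>::value
      && std::is_trivially_copyable<From>::value;
}

// Plain aggregate: trivially copyable and trivially default constructible.
struct plain_pair {
  std::int32_t first;
  std::int32_t second;
};

// Hypothetical pair-like type with user-provided special members.
// The user-provided copy assignment makes it non-trivially copyable, and the
// user-provided default constructor makes it non-trivially default constructible.
struct pair_like {
  std::int32_t first;
  std::int32_t second;
  pair_like() : first(0), second(0) {}
  pair_like(const pair_like&) = default;
  pair_like& operator=(const pair_like& other) {
    first  = other.first;
    second = other.second;
    return *this;
  }
};

// Reports what the host standard library says about std::pair; the result is
// implementation-dependent, so it is returned rather than asserted.
constexpr bool std_pair_is_bit_castable() {
  return bit_castable<std::uint64_t, std::pair<std::int32_t, std::int32_t>>();
}
```

Here `bit_castable<std::uint64_t, plain_pair>()` holds, while the same query for `pair_like` does not, even though both types have the same size and members.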

@fbusato
Contributor

fbusato commented Dec 17, 2024

I guess now you can replace the unsafe_bitcast function used in CUB
https://github.com/NVIDIA/cccl/blob/main/cub/cub/thread/thread_reduce.cuh#L179

@bernhardmgruber
Contributor

> I guess now you can replace the unsafe_bitcast function used in CUB https://github.com/NVIDIA/cccl/blob/main/cub/cub/thread/thread_reduce.cuh#L179

Please open an issue for that!

Labels
None yet
Projects
Status: In Review
3 participants