
Please allow using AdaptiveCpp (acpp) as a replacement for icpx #958

Open
zboszor opened this issue Nov 27, 2024 · 10 comments

@zboszor

zboszor commented Nov 27, 2024

hipBLAS and other optional libraries in chipStar use the git submodule H4I-MKLShim, which relies on the Intel icpx SYCL compiler.

However, Yocto, with its cross-compiler framework, cannot easily use icpx, even with meta-intel present in the build.

Please allow AdaptiveCpp to be used for compiling the SYCL sources, as it may be more accessible.

@pvelesko
Collaborator

@zboszor What about actual MKL? If that's not present in Yocto, then there is no point in enabling AdaptiveCpp.

@zboszor
Author

zboszor commented Nov 28, 2024

MKL is present in meta-intel, as is the Intel Compute Runtime (a.k.a. OpenCL). AdaptiveCpp can also use OpenCL, just like chipStar.

@zboszor
Author

zboszor commented Nov 28, 2024

FWIW, I have already created a Yocto recipe for chipStar with only a few minimal patches needed. The samples do run with both Intel OpenCL and Level0.

@zboszor
Author

zboszor commented Nov 29, 2024

I somewhat lied. Most of the samples do run, but a few segfault, such as fp16_half2_math and fp16_math. Is it worth reporting them, or are these known issues? I can run them through gdb if needed.

@pvelesko
Collaborator

Please dump the traces here along with platform information.
I'll work on AdaptiveCpp support today.

@zboszor
Author

zboszor commented Nov 29, 2024

lscpu

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   12
  On-line CPU(s) list:    0-11
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         Intel(R) Corporation
  Model name:             Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
    BIOS Model name:      Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz To Be Filled By O.E.M. CPU @ 3.1GHz
    BIOS CPU family:      198
    CPU family:           6
    Model:                158
    Thread(s) per core:   2
    Core(s) per socket:   6
    Socket(s):            1
    Stepping:             10
    CPU(s) scaling MHz:   18%
    CPU max MHz:          4600.0000
    CPU min MHz:          800.0000
    BogoMIPS:             6399.96
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_per
                          fmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic
                           movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase
                           tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_
                          epp vnmi md_clear flush_l1d arch_capabilities
Virtualization features:  
  Virtualization:         VT-x
Caches (sum of all):      
  L1d:                    192 KiB (6 instances)
  L1i:                    192 KiB (6 instances)
  L2:                     1.5 MiB (6 instances)
  L3:                     12 MiB (1 instance)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-11
Vulnerabilities:          
  Gather data sampling:   Mitigation; Microcode
  Itlb multihit:          KVM: Mitigation: VMX disabled
  L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                    Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT vulnerable
  Reg file data sampling: Not affected
  Retbleed:               Mitigation; IBRS
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; IBRS; IBPB conditional; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Mitigation; Microcode
  Tsx async abort:        Mitigation; TSX disabled

lspci

00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1c.6 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #7 (rev f0)
00:1f.0 ISA bridge: Intel Corporation H310 Chipset LPC/eSPI Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
01:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8822BE 802.11a/b/g/n/ac WiFi adapter

Output of fp16_half2_math

[root@localhost chip_spv_samples]# ./fp16_half2_math
CHIP info [TID 2554] [1732873171.179529349] : CHIP_PLATFORM=0
CHIP info [TID 2554] [1732873171.179582884] : CHIP_DEVICE_TYPE=gpu
CHIP info [TID 2554] [1732873171.179589354] : CHIP_DEVICE=0
CHIP info [TID 2554] [1732873171.179613632] : CHIP_BE=opencl
CHIP info [TID 2554] [1732873171.179620129] : CHIP_DUMP_SPIRV=off
CHIP info [TID 2554] [1732873171.179626729] : CHIP_JIT_FLAGS_OVERRIDE=
CHIP info [TID 2554] [1732873171.179633126] : CHIP_L0_COLLECT_EVENTS_TIMEOUT=0
CHIP info [TID 2554] [1732873171.179640503] : CHIP_L0_EVENT_TIMEOUT=0
CHIP info [TID 2554] [1732873171.179647408] : CHIP_SKIP_UNINIT=off
CHIP info [TID 2554] [1732873171.179653663] : CHIP_LAZY_JIT=on
CHIP info [TID 2554] [1732873171.179672833] : CHIP_OCL_DISABLE_QUEUE_PROFILING=off
CHIP info [TID 2554] [1732873171.179679604] : CHIP_OCL_USE_ALLOC_STRATEGY=off
CHIP info [TID 2554] [1732873171.179701060] : CHIP_MODULE_CACHE_DIR=/root/.cache/chipStar
CHIP info [TID 2554] [1732873171.211346954] : OpenCL Devices of type gpu with SPIR-V_1 support:
Intel(R) UHD Graphics 630  is supported.

CHIP info [TID 2554] [1732873171.215136096] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2554] [1732873171.215170432] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2554] [1732873171.215198277] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2554] [1732873171.215227768] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2554] [1732873171.215237756] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2554] [1732873171.215244688] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2554] [1732873171.215251927] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2554] [1732873171.215259405] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2554] [1732873171.215268269] : JIT Link flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2554] [1732873171.217844867] : Program LOG for device #0:Intel(R) UHD Graphics 630:
IGC: Internal Compiler Error: Segmentation violation

CHIP error [TID 2554] [1732873171.217870563] : hipErrorNotInitialized (Device library link step failed.) in /usr/src/debug/chipstar/1.2.1/src/backend/OpenCL/CHIPBackendOpenCL.cc:1184:compile

CHIP error [TID 2554] [1732873171.218710906] : Caught Error: hipErrorNotInitialized
HIP Runtime Error: hipErrorNotInitialized
Test sub failed at : 0 || x[i]: 98 sub y[i]: -28 || GPU: 0 CPU: 126
Test sub failed at : 0 || x[i]: 98 sub y[i]: -28 || GPU: 0 CPU: 126
Test sub failed at : 1 || x[i]: -41 sub y[i]: -55 || GPU: 0 CPU: 14
Test sub failed at : 1 || x[i]: -41 sub y[i]: -55 || GPU: 0 CPU: 14
Test sub failed at : 2 || x[i]: -1 sub y[i]: 68 || GPU: 0 CPU: -69
Test sub failed at : 2 || x[i]: -1 sub y[i]: 68 || GPU: 0 CPU: -69
Test sub failed at : 3 || x[i]: -12 sub y[i]: 78 || GPU: 0 CPU: -90
Test sub failed at : 3 || x[i]: -12 sub y[i]: 78 || GPU: 0 CPU: -90
Test sub errors: 127916
CHIP info [TID 2554] [1732873171.220731604] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2554] [1732873171.220740470] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2554] [1732873171.220747789] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2554] [1732873171.220767190] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2554] [1732873171.220773872] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2554] [1732873171.220779898] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2554] [1732873171.220803858] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2554] [1732873171.220827636] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2554] [1732873171.220850036] : JIT Link flags:  -cl-kernel-arg-info -cl-std=CL3.0
Segmentation fault (core dumped)

backtrace:

[root@localhost chip_spv_samples]# gdb ./fp16_half2_math
GNU gdb (GDB) 15.1
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-oe-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./fp16_half2_math...
(No debugging symbols found in ./fp16_half2_math)
(gdb) run
Starting program: /usr/bin/chip_spv_samples/fp16_half2_math 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
CHIP info [TID 2595] [1732873369.050304188] : CHIP_PLATFORM=0
CHIP info [TID 2595] [1732873369.050350797] : CHIP_DEVICE_TYPE=gpu
CHIP info [TID 2595] [1732873369.050356493] : CHIP_DEVICE=0
CHIP info [TID 2595] [1732873369.050361974] : CHIP_BE=opencl
CHIP info [TID 2595] [1732873369.050367204] : CHIP_DUMP_SPIRV=off
CHIP info [TID 2595] [1732873369.050372440] : CHIP_JIT_FLAGS_OVERRIDE=
CHIP info [TID 2595] [1732873369.050377649] : CHIP_L0_COLLECT_EVENTS_TIMEOUT=0
CHIP info [TID 2595] [1732873369.050386399] : CHIP_L0_EVENT_TIMEOUT=0
CHIP info [TID 2595] [1732873369.050391675] : CHIP_SKIP_UNINIT=off
CHIP info [TID 2595] [1732873369.050398458] : CHIP_LAZY_JIT=on
CHIP info [TID 2595] [1732873369.050404499] : CHIP_OCL_DISABLE_QUEUE_PROFILING=off
CHIP info [TID 2595] [1732873369.050410748] : CHIP_OCL_USE_ALLOC_STRATEGY=off
CHIP info [TID 2595] [1732873369.050418339] : CHIP_MODULE_CACHE_DIR=/root/.cache/chipStar
[New Thread 0x7fffe967a6c0 (LWP 2598)]
CHIP info [TID 2595] [1732873369.303598903] : OpenCL Devices of type gpu with SPIR-V_1 support:
Intel(R) UHD Graphics 630  is supported.

CHIP info [TID 2595] [1732873369.306880636] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2595] [1732873369.306899636] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2595] [1732873369.306907792] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2595] [1732873369.306913914] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2595] [1732873369.306921572] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2595] [1732873369.306929420] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2595] [1732873369.306937322] : JIT flags:  -cl-kernel-arg-info -cl-std=CL3.0
CHIP info [TID 2595] [1732873369.306944548] : Program LOG for device #0:Intel(R) UHD Graphics 630:


CHIP info [TID 2595] [1732873369.306954010] : JIT Link flags:  -cl-kernel-arg-info -cl-std=CL3.0

Thread 1 "fp16_half2_math" received signal SIGSEGV, Segmentation fault.
0x00007fffdd1324aa in ?? () from /usr/lib/libLLVM-15.so
(gdb) bt
#0  0x00007fffdd1324aa in ?? () from /usr/lib/libLLVM-15.so
#1  0x00007fffdd1496ba in llvm::BitcodeWriter::writeModule(llvm::Module const&, bool, llvm::ModuleSummaryIndex const*, bool, std::array<unsigned int, 5ul>*) () from /usr/lib/libLLVM-15.so
#2  0x00007fffdd14b2a1 in llvm::WriteBitcodeToFile(llvm::Module const&, llvm::raw_ostream&, bool, llvm::ModuleSummaryIndex const*, bool, std::array<unsigned int, 5ul>*) () from /usr/lib/libLLVM-15.so
#3  0x00007fffd50332c6 in ?? () from /usr/lib/libigc.so.1
#4  0x00007fffd50ad241 in IGC::IgcOclTranslationCtx<0ul>::Impl::Translate(unsigned long, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, unsigned int, void*) const () from /usr/lib/libigc.so.1
#5  0x00007fffd50af033 in IGC::IgcOclTranslationCtx<2ul>::TranslateImpl(unsigned long, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, unsigned int, void*) () from /usr/lib/libigc.so.1
#6  0x00007ffff701a184 in ?? () from /usr/lib/intel-opencl/libigdrcl.so
#7  0x00007ffff6f97eda in ?? () from /usr/lib/intel-opencl/libigdrcl.so
#8  0x00007ffff7862dc7 in clLinkProgram () from /usr/lib/libOpenCL.so.1
#9  0x00007ffff7f07faf in cl::linkProgram (inputPrograms=..., options=0x5555557687e0 " -cl-kernel-arg-info -cl-std=CL3.0", notifyFptr=0x0, data=0x0, err=<optimized out>) at include/CL/opencl.hpp:6976
#10 CHIPModuleOpenCL::compile (this=0x5555556dcc60, ChipDev=0x555555733510) at /usr/src/debug/chipstar/1.2.1/src/backend/OpenCL/CHIPBackendOpenCL.cc:1175
#11 0x00007ffff7f2259f in CHIPDeviceOpenCL::compile (this=0x555555733510, SrcMod=...) at src/backend/OpenCL/CHIPBackendOpenCL.hh:334
#12 0x00007ffff7e62cac in chipstar::Device::getOrCreateModule (this=0x555555733510, SrcMod=...) at /usr/src/debug/chipstar/1.2.1/src/CHIPBackend.cc:1051
#13 0x00007ffff7e6210a in chipstar::Device::getOrCreateModule (this=0x555555733510, Ptr=...) at /usr/src/debug/chipstar/1.2.1/src/CHIPBackend.cc:986
#14 0x00007ffff7ee02ee in chipstar::Device::prepareDeviceVariables (this=0x555555733510, Ptr=...) at src/CHIPBackend.cc:946
#15 hipLaunchKernelInternal (HostFunction=0x55555555fc38, GridDim=..., BlockDim=..., Args=0x7fffffffe230, SharedMem=0, Stream=0x0) at /usr/src/debug/chipstar/1.2.1/src/CHIPBindings.cc:5599
#16 0x00007ffff7ee001d in hipLaunchKernel (HostFunction=0x55555555fc38, GridDim=..., BlockDim=..., Args=0x7fffffffe230, SharedMem=<optimized out>, Stream=<optimized out>)
    at /usr/src/debug/chipstar/1.2.1/src/CHIPBindings.cc:5618
#17 0x00005555555576ee in ?? ()
#18 0x00007ffff796afdb in __libc_start_call_main (main=main@entry=0x555555557400, argc=argc@entry=1, argv=argv@entry=0x7fffffffe398)
    at /usr/src/debug/glibc/2.40+git/sysdeps/nptl/libc_start_call_main.h:58
#19 0x00007ffff796b099 in __libc_start_main_impl (main=0x555555557400, argc=1, argv=0x7fffffffe398, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe388)
    at /usr/src/debug/glibc/2.40+git/csu/libc-start.c:360
#20 0x0000555555556255 in ?? ()
(gdb) quit
A debugging session is active.

	Inferior 1 [process 2595] will be killed.

Quit anyway? (y or n) y

The crash happens inside the Intel OpenCL driver (libigdrcl.so, calling into libigc.so.1) and ends up in libLLVM-15.so.

I will try to rebuild it with LLVM 14, as LLVM 15 support in the Intel Graphics Compiler is considered beta quality.

@zboszor
Author

zboszor commented Dec 3, 2024

I fixed the IGC / Compute Runtime build on my side with Yocto. It now uses LLVM 14 and an updated version of the SPIRV-LLVM-Translator from the LLVM 14-compatible branch.

zboszor/meta-clang-revival#3

The previously crashing fp16-related tests now work and report "passed". I tried running every HIP and CUDA sample, and they all work.

@pvelesko
Collaborator

pvelesko commented Dec 3, 2024

@zboszor I ran into issues compiling AdaptiveCpp; I will try again today.

@pvelesko
Collaborator

pvelesko commented Dec 4, 2024

@zboszor Switching to acpp for compiling the MKLShim layer is not as trivial as I expected, because all of the CMake export targets from MKL (which we do not control) expect icpx.
This will take some more time, but I'll try to look into it more next week.

# We use the SYCL version of MKL, so ensure that our compiler
# knows how to compile SYCL.
# TODO this *only* supports the Intel C++ compiler.
find_package(IntelSYCL REQUIRED)
if(IntelSYCL_FOUND)
    # Ensure that when we look for MKL, we import the
    # support for using its SYCL version.
    # Despite the name of the variable, we are only
    # after the SYCL version, not DPC++ support.
    set(DPCPP_COMPILER ON)
endif()
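
For illustration only, here is a rough sketch of how the check above might be made compiler-selectable. It is not what H4I-MKLShim does today. The cache variable `H4I_SYCL_COMPILER` is a hypothetical name, and the AdaptiveCpp branch assumes AdaptiveCpp's CMake package is installed and discoverable. It also does not address the harder part mentioned above: MKL's own exported targets still expect icpx.

```cmake
# Hypothetical option (not part of H4I-MKLShim): pick the SYCL toolchain.
set(H4I_SYCL_COMPILER "icpx" CACHE STRING "SYCL compiler to use: icpx or acpp")
set_property(CACHE H4I_SYCL_COMPILER PROPERTY STRINGS icpx acpp)

if(H4I_SYCL_COMPILER STREQUAL "icpx")
    # Current behaviour: require the Intel SYCL compiler.
    find_package(IntelSYCL REQUIRED)
    # Ask MKL's CMake config to import its SYCL targets.
    set(DPCPP_COMPILER ON)
elseif(H4I_SYCL_COMPILER STREQUAL "acpp")
    # AdaptiveCpp ships a CMake package; SYCL sources are then attached
    # to a target with add_sycl_to_target(TARGET <tgt> SOURCES <files>).
    find_package(AdaptiveCpp CONFIG REQUIRED)
    # Assumption: the MKL import targets would still need separate
    # handling, since they are written with icpx in mind.
else()
    message(FATAL_ERROR "Unsupported H4I_SYCL_COMPILER: ${H4I_SYCL_COMPILER}")
endif()
```

With such an option, a Yocto recipe could pass `-DH4I_SYCL_COMPILER=acpp` at configure time, while the default would keep the existing icpx path unchanged.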

@zboszor
Author

zboszor commented Dec 4, 2024

Take your time. Thank you very much for looking into it.
