Building for many CUDA archs leads to linker errors #1734

Open
lahwaacz opened this issue Nov 27, 2024 · 3 comments

Comments

@lahwaacz
Contributor

While building a package for Arch Linux, I found that enabling all CUDA architectures (-DGINKGO_CUDA_ARCHITECTURES="All") leads to this error on the final link:

FAILED: lib/libginkgo_cuda.so.1.9.0
: && /usr/bin/c++ -fPIC -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection         -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/ginkgo-hpc-git/src=/usr/src/debug/ginkgo-hpc-git -flto=auto  -Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now          -Wl,-z,pack-relative-relocs -flto=auto   -Wl,--dependency-file,cuda/CMakeFiles/ginkgo_cuda.dir/link.d -shared -Wl,-soname,libginkgo_cuda.so.1.9.0 -o lib/libginkgo_cuda.so.1.9.0 devices/cuda/CMakeFiles/ginkgo_cuda_device.dir/executor.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/device.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/exception.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/executor.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/memory.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/nvtx.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/scoped_device_id.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/stream.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/timer.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/version.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.0.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.3.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.5.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.6.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.7.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.8.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.9.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.10.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.11.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.12.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.13.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.14.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.15.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.16.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.17.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.18.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.19.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.20.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/fbcsr_kernels.instantiate.0.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/fbcsr_kernels.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/fbcsr_kernels.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/fft_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/preconditioner/batch_jacobi_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.0.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.3.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.5.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.6.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.7.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.8.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.9.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.10.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.0.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.1.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.0.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.3.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.5.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.6.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.0.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.1.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/lower_trs_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/upper_trs_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/base/device_matrix_data_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/base/index_set_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/components/absolute_array_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/components/fill_array_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/components/format_conversion_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/components/precision_conversion_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/components/reduce_array_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/distributed/partition_helpers_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/distributed/partition_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/coo_kernels.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/csr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/ell_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/hybrid_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/permutation_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/scaled_permutation_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/sellp_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/sparsity_csr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/diagonal_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/multigrid/pgm_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/preconditioner/jacobi_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/bicg_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/bicgstab_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/cg_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/cgs_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/common_gmres_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/fcg_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/gcr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/gmres_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/ir_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/dense_kernels.instantiate.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/base/batch_multi_vector_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/base/device_matrix_data_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/base/index_set_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/components/prefix_sum_kernels.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/distributed/index_map_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/distributed/matrix_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/distributed/partition_helpers_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/distributed/partition_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/distributed/vector_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/cholesky_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/factorization_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/ic_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/ilu_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/lu_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ic_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ict_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilu_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_approx_filter_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_filter_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_select_common.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_select_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_spgeam_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_sweep_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/batch_csr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/batch_dense_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/batch_ell_kernels.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/coo_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/dense_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/diagonal_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/ell_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/sellp_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/sparsity_csr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/multigrid/pgm_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/isai_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/sor_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/reorder/rcm_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/solver/cb_gmres_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/solver/idr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/solver/multigrid_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/stop/criterion_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/stop/residual_norm_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.1.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.8.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.8.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.8.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.13.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.13.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.13.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.16.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.16.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.16.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.32.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.32.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.32.cpp.o -L/opt/cuda/targets/x86_64-linux/lib/stubs   -L/opt/cuda/targets/x86_64-linux/lib   -L/usr/lib/gcc/x86_64-pc-linux-gnu/13.3.1 -Wl,-rpath,/opt/cuda/targets/x86_64-linux/lib:/build/ginkgo-hpc-git/src/build-cuda/lib:  /opt/cuda/targets/x86_64-linux/lib/libcudart.so  /opt/cuda/targets/x86_64-linux/lib/libcublas.so  /opt/cuda/targets/x86_64-linux/lib/libcusparse.so  /opt/cuda/targets/x86_64-linux/lib/libcurand.so  /opt/cuda/targets/x86_64-linux/lib/libcufft.so  lib/libginkgo_device.so.1.9.0  -ldl  -ldl  /usr/lib/librt.a  /opt/cuda/targets/x86_64-linux/lib/libcublasLt.so  /opt/cuda/targets/x86_64-linux/lib/libculibos.a  /opt/cuda/targets/x86_64-linux/lib/libnvJitLink.so  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl && :
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../lib/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/tmp/cc6UKHTo.ltrans0.ltrans.o: in function `std::_Function_handler<void (cublasContext*), gko::CudaExecutor::init_handles()::{lambda(cublasContext*)#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)':
/usr/include/c++/14.2.1/bits/std_function.h:274:(.text+0xfb): relocation truncated to fit: R_X86_64_PC32 against `.data.rel.ro'
/tmp/cc6UKHTo.ltrans0.ltrans.o: in function `std::_Function_handler<void (cusparseContext*), gko::CudaExecutor::init_handles()::{lambda(cusparseContext*)#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)':
/usr/include/c++/14.2.1/bits/std_function.h:274:(.text+0x13b): relocation truncated to fit: R_X86_64_PC32 against `.data.rel.ro'
/tmp/cc6UKHTo.ltrans0.ltrans.o: in function `nvtxEtiGetModuleFunctionTable_v3':
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:401:(.text+0x243): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:404:(.text+0x273): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:424:(.text+0x283): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:412:(.text+0x293): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:416:(.text+0x2a3): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:420:(.text+0x2b3): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/tmp/cc6UKHTo.ltrans0.ltrans.o: in function `nvtxGetExportTable_v3':
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:443:(.text+0x2d7): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/tmp/cc6UKHTo.ltrans0.ltrans.o: in function `gko::CudaExecutor::get_master()':
/usr/include/c++/14.2.1/ext/atomicity.h:52:(.text+0x332): additional relocation overflows omitted from the output
lib/libginkgo_cuda.so.1.9.0: PC-relative offset overflow in PLT entry for `_ZN3gko7kernels4cuda10run_kernelI17__nv_dl_wrapper_tI11__nv_dl_tagIPFvSt10shared_ptrIKNS_12CudaExecutorEEPKlPKNS_6matrix5DenseIfEEPSD_EXadL_ZNS1_5dense12symm_permuteIflEEvS8_PKT0_PKNSC_IT_EEPSP_EELj1EEJEEJRSF_RSA_RSG_EEEvS8_SO_NS_3dimILm2EmEEDpOT0_'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

For 1.8.0 we worked around it by omitting a few architectures:

# In general, we want to list all real archs (sm_XX) and the latest virtual arch (compute_XX) for future PTX compatibility.
# Valid values can be discovered from nvcc --help
# Compiling Ginkgo for all real architectures triggers linker limits (2 GB binary size). So let's omit 52, 53, 62, 72 from the list.
local _cuda_archs="50;60;61;70;75;80;86;87;89;90;90a;90a-virtual"

cmake -DCMAKE_CUDA_ARCHITECTURES="$_cuda_archs" ...
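
A note on the list above: in CMake's CUDA_ARCHITECTURES syntax, a bare number such as 80 requests both real (SASS) and virtual (PTX) code for that architecture, while the -real and -virtual suffixes request only one of the two. The following is a minimal sketch of a helper that builds a list matching the intent stated in the comment (all real archs plus only the latest virtual arch). The name make_cuda_arch_list is hypothetical, and whether this shrinks the library enough to avoid the linker error is an untested assumption.

```shell
# Hypothetical helper (not part of Ginkgo or the PKGBUILD):
# turn real arch names such as sm_80 into a CMake CUDA architecture list
# where every arch is "-real" and only the last one is also "-virtual".
make_cuda_arch_list() (
    out=""
    last=""
    for a in "$@"; do
        a="${a#sm_}"                    # strip an optional "sm_" prefix
        out="${out:+$out;}${a}-real"    # append with ";" as separator
        last="$a"
    done
    # Add PTX only for the newest architecture, for forward compatibility.
    [ -n "$last" ] && out="$out;${last}-virtual"
    printf '%s\n' "$out"
)

# Possible usage (assumes the list is passed through to nvcc unchanged):
#   cmake -DCMAKE_CUDA_ARCHITECTURES="$(make_cuda_arch_list sm_80 sm_90)" ...
```

The arguments are expected in ascending order, since the last one is treated as the newest architecture.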

However, building the develop branch with the same reduced list now fails again with the same error. Any ideas? Maybe libginkgo_cuda.so could be split into several smaller libraries?

@yhmtsai
Member

yhmtsai commented Nov 27, 2024

Does it still happen if you reduce the set of architectures further?

@upsj
Member

upsj commented Nov 27, 2024

If you remove all references to the NVTX library (including the header #include) from cuda/base/nvtx.cpp by emptying all functions, does the issue still appear?

@lahwaacz
Contributor Author

Does it still happen if you reduce the set of architectures further?

Building for just one architecture works, but that does not help. The intention is to build a general binary package that can be used efficiently on any GPU architecture. Also, I've found a reduced set of archs that works for Ginkgo 1.8.0 but will lead to the same error on the next release (currently develop branch), and it is not practical to reduce architectures again and again for new releases.

If you remove all references to the NVTX library (including the header #include) from cuda/base/nvtx.cpp by emptying all functions, does the issue still appear?

It is not a problem with one specific library. I just built it on a different system (without any code changes), and different symbols appear in the output:

/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../lib/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in lib/libginkgo_cuda.so.1.9.0
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x16): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x33): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x3a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in lib/libginkgo_cuda.so.1.9.0
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x57): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_registerTMCloneTable'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x76): relocation truncated to fit: R_X86_64_PC32 against `.bss'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x81): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /usr/lib/libc.so.6
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x8e): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x94): additional relocation overflows omitted from the output
lib/libginkgo_cuda.so.1.9.0: PC-relative offset overflow in PLT entry for `_ZN3gko7kernels4cuda10run_kernelI17__nv_dl_wrapper_tI11__nv_dl_tagIPFvSt10shared_ptrIKNS_12CudaExecutorEEPKlPKNS_6matrix5DenseIfEEPSD_EXadL_ZNS1_5dense12symm_permuteIflEEvS8_PKT0_PKNSC_IT_EEPSP_EELj1EEJEEJRSF_RSA_RSG_EEEvS8_SO_NS_3dimILm2EmEEDpOT0_'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
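
Both logs are consistent with the size explanation: R_X86_64_PC32, R_X86_64_GOTPCREL and R_X86_64_REX_GOTPCRELX all encode a signed 32-bit PC-relative offset, so they cannot reach a target more than 2 GiB away. A rough way to see how close a build is to that limit is to sum the sizes of the object files that go into the link. The sketch below assumes that object file size is a usable proxy (it overestimates when debug info is included), and total_size_bytes is a hypothetical helper name.

```shell
# Hypothetical helper: print the combined size in bytes of the given files.
total_size_bytes() (
    total=0
    for f in "$@"; do
        sz=$(wc -c < "$f") || exit 1    # size of one file in bytes
        total=$((total + sz))
    done
    printf '%s\n' "$total"
)

# Possible usage from the build directory (path taken from the log above):
#   total=$(total_size_bytes $(find cuda/CMakeFiles/ginkgo_cuda.dir -name '*.o'))
#   [ "$total" -gt 2147483648 ] && echo "inputs exceed 2 GiB: $total bytes"
```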
