[BUG] fail to download rapids_cpm_generate_pinned_versions nvcomp version 4.0.1.0 from developer site #16772

Closed
pxLi opened this issue Sep 9, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@pxLi
Member

pxLi commented Sep 9, 2024

Describe the bug
The https://github.com/NVIDIA/spark-rapids-jni build relies on https://github.com/NVIDIA/spark-rapids-jni/blob/branch-24.10/thirdparty/cudf-pins/add_dependency_pins.cmake#L26-L27, which pins the dependency versions generated from the cudf build. I think the nvcomp project version in its CMake config file does not match the package provided on the developer site (4.0.1.0 vs 4.0.1); the build reports target/libcudf-install/lib64/cmake/nvcomp/nvcomp-config.cmake (found version "4.0.1.0").

    include(${rapids-cmake-dir}/cpm/package_override.cmake)
    rapids_cpm_package_override(${CMAKE_CURRENT_FUNCTION_LIST_DIR}/versions.json)

This caused the pinned version to become 4.0.1.0 (https://github.com/NVIDIA/spark-rapids-jni/blob/branch-24.10/thirdparty/cudf-pins/versions.json#L131), which is generated by

    include(${rapids-cmake-dir}/cpm/generate_pinned_versions.cmake)
    rapids_cpm_generate_pinned_versions(OUTPUT ${CMAKE_CURRENT_FUNCTION_LIST_DIR}/versions.json)

when we update the cudf submodule ref and pin its dependency versions:

      {
        "11" : "11.x",
        "12" : "12.x"
      },
      "version" : "4.0.1.0"
    },
13:32:10  [INFO]      [exec] -- Downloading...
13:32:10  [INFO]      [exec]    dst='/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-867-cuda11/target/libcudf/cmake-build/_deps/nvcomp_proprietary_binary-subbuild/nvcomp_proprietary_binary-populate-prefix/src/nvcomp-linux-x86_64-4.0.1.0-cuda11.x.tar.gz'
13:32:10  [INFO]      [exec]    timeout='none'
13:32:10  [INFO]      [exec]    inactivity timeout='none'
13:32:10  [INFO]      [exec] -- Using src='https://developer.download.nvidia.com/compute/nvcomp/4.0.1.0/local_installers/nvcomp-linux-x86_64-4.0.1.0-cuda11.x.tar.gz'
13:32:10  [INFO]      [exec] -- [download 0% complete]
13:32:10  [INFO]      [exec] CMake Error at nvcomp_proprietary_binary-subbuild/nvcomp_proprietary_binary-populate-prefix/src/nvcomp_proprietary_binary-populate-stamp/download-nvcomp_proprietary_binary-populate.cmake:170 (message):
13:32:10  [INFO]      [exec]   Each download failed!
13:32:10  [INFO]      [exec] 
13:32:10  [INFO]      [exec]     error: downloading 'https://developer.download.nvidia.com/compute/nvcomp/4.0.1.0/local_installers/nvcomp-linux-x86_64-4.0.1.0-cuda11.x.tar.gz' failed

The 4.0.1 package (with no trailing .0) does exist on the developer site.
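To make the mismatch concrete, here is a minimal Python sketch that rebuilds the download URL from a pinned version string, following the URL shape visible in the log above. The helper name `nvcomp_url` and its parameters are hypothetical and not part of rapids-cmake. The sketch only formats strings and does not contact the server.

```python
# Hypothetical helper: mirrors the URL shape seen in the failing download log.
BASE = "https://developer.download.nvidia.com/compute/nvcomp"


def nvcomp_url(version: str, cuda: str = "11.x", arch: str = "x86_64") -> str:
    """Format the tarball URL for a given nvcomp version string."""
    return (
        f"{BASE}/{version}/local_installers/"
        f"nvcomp-linux-{arch}-{version}-cuda{cuda}.tar.gz"
    )


# Version written into versions.json by the pin generation (download fails).
pinned_url = nvcomp_url("4.0.1.0")
# Version that, per this report, exists on the developer site.
published_url = nvcomp_url("4.0.1")

print(pinned_url)
print(published_url)
```

The two URLs differ in both the directory component and the file name, so a four-component pin cannot resolve to the three-component package.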


@pxLi pxLi added the bug Something isn't working label Sep 9, 2024
@pxLi
Member Author

pxLi commented Sep 9, 2024

The generated version should either be available for download or match the one at rapids-cmake (4.0.1, with no trailing 0). cc @vuule, can you help take a look? Thanks.
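That expectation could be expressed as a small check run after regenerating the pins. The Python sketch below is only an illustration: `trim_trailing_zero` and `check_pin` are hypothetical names, and dropping a fourth `.0` component is an assumed stopgap, not the fix adopted upstream or in NVIDIA/spark-rapids-jni#2388.

```python
def trim_trailing_zero(version: str) -> str:
    """Drop a fourth version component when it is exactly '0'."""
    parts = version.split(".")
    if len(parts) == 4 and parts[3] == "0":
        return ".".join(parts[:3])
    return version


def check_pin(pinned: str, expected: str) -> bool:
    """Return True when the pinned version matches the expected one exactly."""
    return pinned == expected


pinned = "4.0.1.0"   # value generated into versions.json in this report
expected = "4.0.1"   # value this issue says rapids-cmake uses

# Raw comparison: the generated pin does not match.
print(check_pin(pinned, expected))
# Comparison after trimming the fourth component (assumed stopgap).
print(check_pin(trim_trailing_zero(pinned), expected))
```

Versions with three components, or with a non-zero fourth component, pass through unchanged, so the trim only affects strings shaped like the one reported here.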

@pxLi pxLi changed the title from "[BUG] nvcomp 4.0.1.0 failed to download 404" to "[BUG] fail to download rapids_cpm_generate_pinned_versions nvcomp version 4.0.1.0 from developer site" Sep 9, 2024
@bdice
Contributor

bdice commented Oct 15, 2024

This is an nvcomp issue, and is not easily addressable within cudf. This was reported internally and a fix has been marked as complete. Future nvcomp releases should be fine. Closing this, as it has a workaround in NVIDIA/spark-rapids-jni#2388 and an upstream fix.

@bdice bdice closed this as completed Oct 15, 2024