link_nvidia_host_libraries reports success even on failure #699

Open
Flamefire opened this issue Sep 12, 2024 · 0 comments

The script is missing some error checking. When I run it, the output is:

+ /cvmfs/software.eessi.io/versions/2023.06/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
Found ldconfig in the following locations:
- /sbin/ldconfig
- /usr/sbin/ldconfig
Using first version
Found NVIDIA GPU driver version 560.35.03
Found host CUDA version 12.6
The host GPU driver libraries (v560.35.03) have already been linked! (based on /cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/driver_version.txt)
ln: failed to create symbolic link 'lib/latest': Permission denied
Host NVIDIA GPU drivers linked successfully for EESSI

So the `ln` call fails with a permission error, but success is still reported, and nothing indicates which folder the failure occurred in.

I suggest printing more status output about the locations used, at least on failure, and checking the result of each command that is run, with a clear error message when one fails.

One strategy would be something like:

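# Color codes used below (define them here if the script does not already)
RED='\033[0;31m'
NC='\033[0m'

# Print the message with the line number of the failing call, then abort
function handleError {
    echo -e "${RED}ERROR in Line ${BASH_LINENO[0]}: $*${NC}" >&2
    exit 1
}

ln -s foo bar || handleError "Creating this in $that failed"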
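
To make the idea a bit more concrete, here is a rough sketch of how the linking step could name the directory involved when it fails. This is only an illustration: `handle_error`, `make_link` and `link_drivers` are hypothetical names, not functions from the actual EESSI script, and the directory layout is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; none of these names come from link_nvidia_host_libraries.sh.

RED='\033[0;31m'
NC='\033[0m'

# Print an error with the line number of the failing call, then abort.
handle_error() {
    echo -e "${RED}ERROR in line ${BASH_LINENO[0]}: $*${NC}" >&2
    exit 1
}

# Create (or replace) a symlink and report the directory involved on failure.
make_link() {
    local target="$1" link_name="$2"
    ln -sfn "$target" "$link_name" \
        || handle_error "Creating symlink '$link_name' -> '$target' in '$PWD' failed"
}

# The body runs in a subshell, so the 'exit 1' in handle_error becomes a
# non-zero return status of link_drivers instead of ending the caller.
link_drivers() (
    cd "$1" || handle_error "Cannot enter directory '$1'"
    make_link "$2" latest
)
```

A caller would then print the success message only when the status is zero, for example `link_drivers "$host_dir" "$driver_version" && echo "Host NVIDIA GPU drivers linked successfully for EESSI"`, where `host_dir` and `driver_version` are again placeholder variables.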