New cuda installer tool (#35)
* WIP: Maybe 80% there.

Signed-off-by: Maciej Strzelczyk <[email protected]>

* The functionality seems complete.

* Docstrings, license headers and reformat

* Updating the tool and tests.

Signed-off-by: Maciej Strzelczyk <[email protected]>

* Fixing test problems and updating the old script.

* Final fixes to the READMEs

---------

Signed-off-by: Maciej Strzelczyk <[email protected]>
m-strzelczyk authored May 20, 2024
1 parent 7d1f09e commit a800bd5
Showing 16 changed files with 1,225 additions and 101 deletions.
55 changes: 27 additions & 28 deletions linux/README.md
@@ -1,48 +1,47 @@
# Installation for Linux.
# Installation for Linux

In the `install_gpu_driver.py` you can find a script that automates installation
of newer GPU drivers for NVIDIA GPU drivers available for Google Compute Engine
instances.
The recommended way to install NVIDIA GPU drivers and CUDA Toolkit for Google Cloud Compute Engine
instances is through the cuda_installer tool. Look for the newest version in the
[releases](https://github.com/GoogleCloudPlatform/compute-gpu-installation/releases)
section of this repository.

The script support the following operating systems:
The `install_gpu_driver.py` script is still available to not break existing setups,
but is considered deprecated and should not be used anymore.

* CentOS: versions 7
* CentOS Stream: version 8
* Debian: versions 10 and 11
* RHEL: versions 7 and 8
* Rocky: version 8
* Ubuntu: version 20 and 21
The tool supports the following operating systems (x86_64/amd64 architecture):

Note: Just because an operating system is not supported by this script, doesn't
mean that it's impossible to install NVIDIA drivers on it. You should check and
try instructions on
[NVIDIAs website](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
to discover other ways of installing drivers.
* Debian: versions 10, 11 and 12
* RHEL: versions 8 and 9
* Rocky: versions 8 and 9
* Ubuntu: versions 20, 22 and 24

Note: Just because an operating system is not listed as supported by this tool,
it doesn't mean that it's impossible to install NVIDIA drivers on it. You should check and
try instructions on [NVIDIA's website](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) to discover other ways of installing drivers.

## Requirements

The system on which you want to run the script needs to meet the following
requirements:

* Python interpreter in version 3.6 installed (by default available in all
supported OSes except CentOS 7 and RHEL 7).
* Access to Internet (the script needs to download the driver).
* (optional) At least one GPU unit attached.
* Python interpreter in version 3.6 or newer installed.
* Access to Internet (the script needs to download the driver and CUDA toolkit).
* At least one GPU unit attached.

## Running the script
## Running the tool

The `install_gpu_driver.py` script needs to be executed with root privileges
(for example `sudo python3 install_gpu_driver.py`).
The `cuda_installer.pyz` script needs to be executed with root privileges and one
of its commands (for example `sudo python3 cuda_installer.pyz install_driver`).

Note: On some systems the script might trigger system reboot, it
needs to be restarted after the reboot is done.
Note: During the installation the script will trigger system reboots. After a
reboot, the script needs to be started again to continue the installation process.

After the installation, you should restart your system to make sure everything
is initialized properly and working.
After a successful installation, the tool will restart your system once more to make
sure everything is initialized properly and working system-wide.

## Script output

The installation script logs its outputs to `/opt/google/gpu-installer/` folder.
The installation tool logs its outputs to the `/opt/google/cuda-installer/` folder.
If you are facing any problems with the installation, this should be the first
place to check for any errors. When asking for support, you will be asked to
provide the log files from this folder.
76 changes: 76 additions & 0 deletions linux/cuda_installer/__main__.py
@@ -0,0 +1,76 @@
#!/usr/bin/env python3
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os
import sys

import config
from logger import logger
# Need to import all the subpackages here, or the program fails for Python 3.6
from os_installers import get_installer, debian, ubuntu, rhel, rocky


# Mentioning the packages from import above, so automatic import cleanups don't remove them
del debian
del ubuntu
del rhel
del rocky


def parse_args():
    parser = argparse.ArgumentParser(
        description="Manage GPU drivers and CUDA toolkit installation."
    )
    parser.add_argument(
        "command",
        choices=[
            "install_driver",
            "install_cuda",
            "verify_driver",
            "verify_cuda",
            "uninstall_driver",
        ],
        help="Install GPU driver or CUDA Toolkit.",
    )

    return parser.parse_args()


if __name__ == "__main__":
    if os.geteuid() != 0:
        print("This script needs to be run with root privileges!")
        sys.exit(1)
    args = parse_args()
    logger.info(f"Switching to working directory: {config.INSTALLER_DIR}")
    os.chdir(config.INSTALLER_DIR)
    installer = get_installer()

    if args.command == "install_driver":
        installer.install_driver()
    elif args.command == "verify_driver":
        if installer.verify_driver(verbose=True):
            sys.exit(0)
        else:
            sys.exit(1)
    elif args.command == "uninstall_driver":
        installer.uninstall_driver()
    elif args.command == "install_cuda":
        installer.install_cuda()
    elif args.command == "verify_cuda":
        if installer.verify_cuda():
            sys.exit(0)
        else:
            sys.exit(1)
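The command handling in `__main__.py` can be exercised without root access or a GPU. The sketch below is not part of the commit: it reuses the five command names from the diff, but `FakeInstaller` and `run_command` are hypothetical names introduced only for illustration, and the table-driven dispatch is an alternative to the `if`/`elif` chain rather than the author's implementation. It shows how the `verify_*` commands might map a boolean result to an exit code.

```python
import argparse

# Command names copied from the `choices` list in __main__.py.
COMMANDS = [
    "install_driver",
    "install_cuda",
    "verify_driver",
    "verify_cuda",
    "uninstall_driver",
]


class FakeInstaller:
    """Hypothetical stand-in for the object returned by get_installer()."""

    def __init__(self, driver_ok: bool, cuda_ok: bool):
        self.driver_ok = driver_ok
        self.cuda_ok = cuda_ok
        self.calls = []

    def install_driver(self):
        self.calls.append("install_driver")

    def install_cuda(self):
        self.calls.append("install_cuda")

    def uninstall_driver(self):
        self.calls.append("uninstall_driver")

    def verify_driver(self, verbose: bool = False) -> bool:
        self.calls.append("verify_driver")
        return self.driver_ok

    def verify_cuda(self) -> bool:
        self.calls.append("verify_cuda")
        return self.cuda_ok


def run_command(installer, command: str) -> int:
    """Run one command and return the exit code the process would use."""
    handlers = {
        "install_driver": installer.install_driver,
        "install_cuda": installer.install_cuda,
        "uninstall_driver": installer.uninstall_driver,
        "verify_driver": lambda: installer.verify_driver(verbose=True),
        "verify_cuda": installer.verify_cuda,
    }
    result = handlers[command]()
    # Only the verify_* commands report success or failure via the exit code.
    if command.startswith("verify_"):
        return 0 if result else 1
    return 0


parser = argparse.ArgumentParser(
    description="Manage GPU drivers and CUDA toolkit installation."
)
parser.add_argument("command", choices=COMMANDS)

args = parser.parse_args(["verify_driver"])
exit_code = run_command(FakeInstaller(driver_ok=False, cuda_ok=True), args.command)
print(exit_code)
```

Because `command` is a required positional argument restricted by `choices`, argparse rejects a missing or unknown command before any installer code runs.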
49 changes: 49 additions & 0 deletions linux/cuda_installer/config.py
@@ -0,0 +1,49 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pathlib

INSTALLER_DIR = pathlib.Path("/opt/google/cuda-installer/")
try:
    INSTALLER_DIR.mkdir(parents=True, exist_ok=True)
except PermissionError:
    pass


K80_DRIVER_VERSION = "470.239.06"
K80_DEVICE_CODE = "10de:102d"
K80_DRIVER_URL = f"https://us.download.nvidia.com/tesla/{K80_DRIVER_VERSION}/NVIDIA-Linux-x86_64-{K80_DRIVER_VERSION}.run"
K80_DRIVER_SHA256_SUM = (
    "7d74caac140a0432d79ebe8e4330dc796f39ba7dd40b3fcd61df760181bf9ccc"
)

CUDA_TOOLKIT_URL = "https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run"
CUDA_TOOLKIT_SHA256_SUM = (
    "367d2299b3a4588ab487a6d27276ca5d9ead6e394904f18bccb9e12433b9c4fb"
)

CUDA_SAMPLES_TARGZ = (
    "https://github.com/NVIDIA/cuda-samples/archive/refs/tags/v12.4.1.tar.gz"
)
CUDA_SAMPLES_SHA256_SUM = (
    "01bb311cc8f802a0d243700e4abe6a2d402132c9d97ecf2c64f3fbb1006c304c"
)

CUDA_PROFILE_FILENAME = pathlib.Path("/etc/profile.d/google_cuda_install.sh")
CUDA_BIN_FOLDER = "/usr/local/cuda-12.4/bin"
CUDA_LIB_FOLDER = "/usr/local/cuda-12.4/lib64"

NVIDIA_PERSISTANCED_INSTALLER = (
    "/usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2"
)
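`config.py` pairs each download URL with a SHA-256 sum, but the files shown in this diff do not include the code that consumes those sums. The sketch below is therefore an assumption about how such a check could look, not the tool's actual download logic; `sha256_of` and `checksum_matches` are hypothetical helpers, and a temporary file stands in for a downloaded installer.

```python
import hashlib
import pathlib
import tempfile


def sha256_of(path: pathlib.Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large installers never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: pathlib.Path, expected_hex: str) -> bool:
    """Compare the file's SHA-256 digest with the expected hex string."""
    return sha256_of(path) == expected_hex.lower()


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a downloaded file; the real tool would fetch e.g. CUDA_TOOLKIT_URL.
    sample = pathlib.Path(tmp) / "sample.run"
    payload = b"pretend installer bytes"
    expected = hashlib.sha256(payload).hexdigest()

    sample.write_bytes(payload)
    intact_ok = checksum_matches(sample, expected)

    # Simulate a corrupted or tampered download.
    sample.write_bytes(payload + b"!")
    tampered_ok = checksum_matches(sample, expected)

print(intact_ok, tampered_ok)
```

Pinning the expected digest next to the URL, as the constants in `config.py` do, means a changed or corrupted upstream file can be detected before it is executed as root.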
44 changes: 44 additions & 0 deletions linux/cuda_installer/decorators.py
@@ -0,0 +1,44 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pathlib
from datetime import datetime

from config import INSTALLER_DIR
from logger import logger


def checkpoint_decorator(file_name: str, skip_message: str):
    from os_installers import RebootRequired

    def decorator(func):
        def wrapper(*args, **kwargs):
            if pathlib.Path(INSTALLER_DIR / file_name).exists():
                logger.info(skip_message)
                return
            try:
                func(*args, **kwargs)
            except RebootRequired:
                reboot_required = True
            else:
                reboot_required = False
            with pathlib.Path(INSTALLER_DIR / file_name).open(mode="w") as flag:
                flag.write(str(datetime.now()))
                flag.flush()
            if reboot_required:
                raise RebootRequired

        return wrapper

    return decorator
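The checkpoint decorator is what lets the tool resume after a reboot: a completed step leaves a flag file behind, and a later run skips any step whose flag exists. The sketch below reproduces that logic in a self-contained form so it can run outside the installer. It is an illustration under assumptions: `make_checkpoint_decorator`, the local `RebootRequired` class, the step names and the temporary directory are all hypothetical stand-ins for `INSTALLER_DIR`, `os_installers.RebootRequired` and the real installer steps, and `print` replaces the logger.

```python
import pathlib
import tempfile
from datetime import datetime


class RebootRequired(Exception):
    """Local stand-in for os_installers.RebootRequired."""


def make_checkpoint_decorator(installer_dir: pathlib.Path):
    """Build a checkpoint decorator bound to a directory (assumed helper)."""

    def checkpoint_decorator(file_name: str, skip_message: str):
        def decorator(func):
            def wrapper(*args, **kwargs):
                flag_path = installer_dir / file_name
                if flag_path.exists():
                    print(skip_message)
                    return
                try:
                    func(*args, **kwargs)
                except RebootRequired:
                    reboot_required = True
                else:
                    reboot_required = False
                # The flag is written even when a reboot is requested, so the
                # step is not repeated on the next run.
                flag_path.write_text(str(datetime.now()))
                if reboot_required:
                    raise RebootRequired

            return wrapper

        return decorator

    return checkpoint_decorator


workdir = pathlib.Path(tempfile.mkdtemp())
checkpoint = make_checkpoint_decorator(workdir)
calls = []


@checkpoint("prerequisites_installed", "Prerequisites already installed, skipping.")
def install_prerequisites():
    calls.append("prerequisites")


@checkpoint("kernel_updated", "Kernel already updated, skipping.")
def update_kernel():
    calls.append("kernel")
    raise RebootRequired


install_prerequisites()
install_prerequisites()  # Second call is skipped because the flag file exists.

try:
    update_kernel()
except RebootRequired:
    reboot_requested = True
else:
    reboot_requested = False

update_kernel()  # After the "reboot", the step is skipped instead of re-run.
```

Note that any exception other than `RebootRequired` propagates before the flag file is written, so a failed step is retried on the next run.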
40 changes: 40 additions & 0 deletions linux/cuda_installer/logger.py
@@ -0,0 +1,40 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import logging
import logging.handlers
import sys

from config import INSTALLER_DIR


logger = logging.getLogger("GoogleCUDAInstaller")
_file_handler = logging.FileHandler(INSTALLER_DIR / "installer.log", mode="a")
_file_handler.level = logging.DEBUG
logger.addHandler(_file_handler)
_sys_handler = logging.handlers.SysLogHandler(
    "/dev/log", facility=logging.handlers.SysLogHandler.LOG_LOCAL0
)
_sys_handler.ident = "[GoogleCUDAInstaller] "
_sys_handler.level = logging.INFO
logger.addHandler(_sys_handler)
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.level = logging.INFO
logger.addHandler(stdout_handler)
logger.setLevel(logging.DEBUG)

formatter = logging.Formatter("[%(asctime)s] %(levelname)s - %(message)s")
_file_handler.setFormatter(formatter)

__all__ = ["logger"]
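The logger above is set to `DEBUG` while each handler applies its own threshold, so the log file receives more detail than the console. The sketch below illustrates that split without needing `/dev/log` or `/opt/google/cuda-installer/`. The logger name `DemoInstaller`, the temporary directory and the in-memory stream are assumptions made for the example, and the syslog handler is left out.

```python
import io
import logging
import pathlib
import tempfile

log_file = pathlib.Path(tempfile.mkdtemp()) / "installer.log"

demo_logger = logging.getLogger("DemoInstaller")
demo_logger.setLevel(logging.DEBUG)  # Let handlers decide what to keep.
demo_logger.propagate = False  # Keep the demo output away from the root logger.

# File handler: records everything from DEBUG upwards, with timestamps.
file_handler = logging.FileHandler(log_file, mode="a")
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(
    logging.Formatter("[%(asctime)s] %(levelname)s - %(message)s")
)
demo_logger.addHandler(file_handler)

# Stream handler: an in-memory buffer stands in for sys.stdout, INFO and above only.
console = io.StringIO()
stream_handler = logging.StreamHandler(console)
stream_handler.setLevel(logging.INFO)
demo_logger.addHandler(stream_handler)

demo_logger.debug("downloading driver")
demo_logger.info("driver installed")

console_output = console.getvalue()
file_output = log_file.read_text()
```

Here `console_output` contains only the `INFO` message, while `file_output` contains both lines, which matches the README's advice to check the log folder first when troubleshooting.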