Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* WIP: Maybe 80% there.
  Signed-off-by: Maciej Strzelczyk <[email protected]>
* The functionality seems complete.
* Docstrings, license headers and reformat
* Updating the tool and tests.
  Signed-off-by: Maciej Strzelczyk <[email protected]>
* Fixing test problems and updating the old script.
* Final fixes to the READMEs

---------

Signed-off-by: Maciej Strzelczyk <[email protected]>
1 parent 7d1f09e, commit a800bd5
Showing 16 changed files with 1,225 additions and 101 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,48 +1,47 @@ | ||
# Installation for Linux. | ||
# Installation for Linux | ||
|
||
In the `install_gpu_driver.py` you can find a script that automates installation | ||
of newer GPU drivers for NVIDIA GPU drivers available for Google Compute Engine | ||
instances. | ||
The recommended way to install NVIDIA GPU drivers and CUDA Toolkit for Google Cloud Compute Engine | ||
instances is through the cuda_installer tool. Look for the newest version in the | ||
[releases](https://github.com/GoogleCloudPlatform/compute-gpu-installation/releases) | ||
section of this repository. | ||
|
||
The script support the following operating systems: | ||
The `install_gpu_driver.py` script is still available to not break existing setups, | ||
but is considered deprecated and should not be used anymore. | ||
|
||
* CentOS: versions 7 | ||
* CentOS Stream: version 8 | ||
* Debian: versions 10 and 11 | ||
* RHEL: versions 7 and 8 | ||
* Rocky: version 8 | ||
* Ubuntu: version 20 and 21 | ||
The tool supports the following operating systems (x86_64/amd64 architecture): | ||
|
||
Note: Just because an operating system is not supported by this script, doesn't | ||
mean that it's impossible to install NVIDIA drivers on it. You should check and | ||
try instructions on | ||
[NVIDIAs website](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) | ||
to discover other ways of installing drivers. | ||
* Debian: versions 10, 11 and 12 | ||
* RHEL: versions 8 and 9 | ||
* Rocky: versions 8 and 9 | ||
* Ubuntu: versions 20, 22 and 24 | ||
|
||
Note: Just because an operating system is not listed as supported by this tool, | ||
it doesn't mean that it's impossible to install NVIDIA drivers on it. You should check and | ||
try instructions on [NVIDIA's website](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) to discover other ways of installing drivers. | ||
|
||
## Requirements | ||
|
||
The system on which you want to run the script needs to meet the following | ||
requirements: | ||
|
||
* Python interpreter in version 3.6 installed (by default available in all | ||
supported OSes except CentOS 7 and RHEL 7). | ||
* Access to Internet (the script needs to download the driver). | ||
* (optional) At least one GPU unit attached. | ||
* Python interpreter in version 3.6 or newer installed. | ||
* Access to Internet (the script needs to download the driver and CUDA toolkit). | ||
* At least one GPU unit attached. | ||
|
||
## Running the script | ||
## Running the tool | ||
|
||
The `install_gpu_driver.py` script needs to be executed with root privileges | ||
(for example `sudo python3 install_gpu_driver.py`). | ||
The `cuda_installer.pyz` script needs to be executed with root privileges | ||
(for example `sudo python3 cuda_installer.pyz`). | ||
|
||
Note: On some systems the script might trigger system reboot, it | ||
needs to be restarted after the reboot is done. | ||
Note: During the installation the script will trigger system reboots. After a | ||
reboot, the script needs to be started again to continue the installation process. | ||
|
||
After the installation, you should restart your system to make sure everything | ||
is initialized properly and working. | ||
After successful installation, the tool will restart your system once more to make | ||
sure everything is initialized properly and working system-wide. | ||
|
||
## Script output | ||
|
||
The installation script logs its outputs to `/opt/google/gpu-installer/` folder. | ||
The installation tool logs its outputs to `/opt/google/cuda-installer/` folder. | ||
If you are facing any problems with the installation, this should be the first | ||
place to check for any errors. When asking for support, you will be asked to | ||
provide the log files from this folder. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
#!/usr/bin/env python3 | ||
# Copyright 2024 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import argparse | ||
import os | ||
import sys | ||
|
||
import config | ||
from logger import logger | ||
# Need to import all the subpackages here, or the program fails for Python 3.6 | ||
from os_installers import get_installer, debian, ubuntu, rhel, rocky | ||
|
||
|
||
# Mentioning the packages from import above, so automatic import cleanups don't remove them | ||
del debian | ||
del ubuntu | ||
del rhel | ||
del rocky | ||
|
||
|
||
def parse_args(): | ||
    parser = argparse.ArgumentParser( | ||
        description="Manage GPU drivers and CUDA toolkit installation." | ||
    ) | ||
    parser.add_argument( | ||
        "command", | ||
        choices=[ | ||
            "install_driver", | ||
            "install_cuda", | ||
            "verify_driver", | ||
            "verify_cuda", | ||
            "uninstall_driver", | ||
        ], | ||
        help="Install GPU driver or CUDA Toolkit.", | ||
    ) | ||
|
||
    return parser.parse_args() | ||
|
||
|
||
if __name__ == "__main__": | ||
    if os.geteuid() != 0: | ||
        print("This script needs to be run with root privileges!") | ||
        sys.exit(1) | ||
    args = parse_args() | ||
    logger.info(f"Switching to working directory: {config.INSTALLER_DIR}") | ||
    os.chdir(config.INSTALLER_DIR) | ||
    installer = get_installer() | ||
|
||
    if args.command == "install_driver": | ||
        installer.install_driver() | ||
    elif args.command == "verify_driver": | ||
        if installer.verify_driver(verbose=True): | ||
            sys.exit(0) | ||
        else: | ||
            sys.exit(1) | ||
    elif args.command == "uninstall_driver": | ||
        installer.uninstall_driver() | ||
    elif args.command == "install_cuda": | ||
        installer.install_cuda() | ||
    elif args.command == "verify_cuda": | ||
        if installer.verify_cuda(): | ||
            sys.exit(0) | ||
        else: | ||
            sys.exit(1) |
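
The entry point above maps each positional `command` to an installer method and turns the boolean result of the `verify_*` calls into a process exit code. A minimal sketch of that dispatch pattern follows. `FakeInstaller` and `run` are hypothetical names used only for illustration; they stand in for the object returned by `get_installer()` and for the `__main__` block, and only a subset of the commands is modeled.

```python
import argparse


class FakeInstaller:
    """Hypothetical stand-in for the object returned by get_installer()."""

    def install_driver(self):
        print("installing driver")

    def verify_driver(self, verbose=False):
        return True  # pretend the driver is present

    def verify_cuda(self):
        return False  # pretend the CUDA toolkit is missing


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Dispatch sketch.")
    parser.add_argument(
        "command", choices=["install_driver", "verify_driver", "verify_cuda"]
    )
    return parser.parse_args(argv)


def run(argv, installer):
    """Return the exit code the process would use for the given arguments."""
    args = parse_args(argv)
    if args.command == "install_driver":
        installer.install_driver()
        return 0
    if args.command == "verify_driver":
        return 0 if installer.verify_driver(verbose=True) else 1
    if args.command == "verify_cuda":
        return 0 if installer.verify_cuda() else 1
    return 2  # not reached: argparse rejects unknown commands
```

Returning the exit code instead of calling `sys.exit` inside each branch keeps the dispatch testable; a caller would finish with `sys.exit(run(sys.argv[1:], installer))`.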
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Copyright 2024 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import pathlib | ||
|
||
INSTALLER_DIR = pathlib.Path("/opt/google/cuda-installer/") | ||
try: | ||
    INSTALLER_DIR.mkdir(parents=True, exist_ok=True) | ||
except PermissionError: | ||
    pass | ||
|
||
|
||
K80_DRIVER_VERSION = "470.239.06" | ||
K80_DEVICE_CODE = "10de:102d" | ||
K80_DRIVER_URL = f"https://us.download.nvidia.com/tesla/{K80_DRIVER_VERSION}/NVIDIA-Linux-x86_64-{K80_DRIVER_VERSION}.run" | ||
K80_DRIVER_SHA256_SUM = ( | ||
    "7d74caac140a0432d79ebe8e4330dc796f39ba7dd40b3fcd61df760181bf9ccc" | ||
) | ||
|
||
CUDA_TOOLKIT_URL = "https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run" | ||
CUDA_TOOLKIT_SHA256_SUM = ( | ||
    "367d2299b3a4588ab487a6d27276ca5d9ead6e394904f18bccb9e12433b9c4fb" | ||
) | ||
|
||
CUDA_SAMPLES_TARGZ = ( | ||
    "https://github.com/NVIDIA/cuda-samples/archive/refs/tags/v12.4.1.tar.gz" | ||
) | ||
CUDA_SAMPLES_SHA256_SUM = ( | ||
    "01bb311cc8f802a0d243700e4abe6a2d402132c9d97ecf2c64f3fbb1006c304c" | ||
) | ||
|
||
CUDA_PROFILE_FILENAME = pathlib.Path("/etc/profile.d/google_cuda_install.sh") | ||
CUDA_BIN_FOLDER = "/usr/local/cuda-12.4/bin" | ||
CUDA_LIB_FOLDER = "/usr/local/cuda-12.4/lib64" | ||
|
||
NVIDIA_PERSISTANCED_INSTALLER = ( | ||
    "/usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2" | ||
) |
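
The config pairs each download URL with a SHA-256 sum. The code that consumes these constants is not part of the files shown here, so the following is only a sketch of how such a checksum could be checked with the standard `hashlib` module. `sha256_of` and `verify_checksum` are hypothetical helper names, not functions from this commit.

```python
import hashlib
import pathlib
import tempfile


def sha256_of(path: pathlib.Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large installers are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: pathlib.Path, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.lower()
```

With the constants above, a call might look like `verify_checksum(downloaded_file, CUDA_TOOLKIT_SHA256_SUM)`, where `downloaded_file` is whatever local path the download was saved to.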
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# Copyright 2024 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import pathlib | ||
from datetime import datetime | ||
|
||
from config import INSTALLER_DIR | ||
from logger import logger | ||
|
||
|
||
def checkpoint_decorator(file_name: str, skip_message: str): | ||
    from os_installers import RebootRequired | ||
|
||
    def decorator(func): | ||
        def wrapper(*args, **kwargs): | ||
            if pathlib.Path(INSTALLER_DIR / file_name).exists(): | ||
                logger.info(skip_message) | ||
                return | ||
            try: | ||
                func(*args, **kwargs) | ||
            except RebootRequired: | ||
                reboot_required = True | ||
            else: | ||
                reboot_required = False | ||
            with pathlib.Path(INSTALLER_DIR / file_name).open(mode="w") as flag: | ||
                flag.write(str(datetime.now())) | ||
                flag.flush() | ||
            if reboot_required: | ||
                raise RebootRequired | ||
|
||
        return wrapper | ||
|
||
    return decorator |
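
The decorator above is what lets the tool be restarted after a reboot: a step writes a flag file once it has run, even when it ends by requesting a reboot, and is skipped on the next invocation. The sketch below reproduces that behavior in a self-contained form. It makes two assumptions: a local `RebootRequired` class stands in for `os_installers.RebootRequired`, and the flag directory is passed in as a temporary directory instead of `INSTALLER_DIR`.

```python
import pathlib
import tempfile
from datetime import datetime


class RebootRequired(Exception):
    """Stand-in for os_installers.RebootRequired (assumption for this sketch)."""


def checkpoint(flag_dir: pathlib.Path, file_name: str):
    def decorator(func):
        def wrapper(*args, **kwargs):
            flag = flag_dir / file_name
            if flag.exists():
                return  # step already completed in an earlier run
            try:
                func(*args, **kwargs)
            except RebootRequired:
                # The step finished but needs a reboot: record it, then re-raise.
                flag.write_text(str(datetime.now()))
                raise
            flag.write_text(str(datetime.now()))

        return wrapper

    return decorator


calls = []
workdir = pathlib.Path(tempfile.mkdtemp())


@checkpoint(workdir, "step_done")
def install_step():
    calls.append("install")
    raise RebootRequired


# Simulate two runs of the tool: the second run skips the finished step.
for _ in range(2):
    try:
        install_step()
    except RebootRequired:
        calls.append("reboot")
```

After both iterations `calls` holds one `"install"` and one `"reboot"` entry, because the second call finds the flag file and returns early.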
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
# Copyright 2024 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import logging | ||
import logging.handlers | ||
import sys | ||
|
||
from config import INSTALLER_DIR | ||
|
||
|
||
logger = logging.getLogger("GoogleCUDAInstaller") | ||
_file_handler = logging.FileHandler(INSTALLER_DIR / "installer.log", mode="a") | ||
_file_handler.level = logging.DEBUG | ||
logger.addHandler(_file_handler) | ||
_sys_handler = logging.handlers.SysLogHandler( | ||
    "/dev/log", facility=logging.handlers.SysLogHandler.LOG_LOCAL0 | ||
) | ||
_sys_handler.ident = "[GoogleCUDAInstaller] " | ||
_sys_handler.level = logging.INFO | ||
logger.addHandler(_sys_handler) | ||
stdout_handler = logging.StreamHandler(sys.stdout) | ||
stdout_handler.level = logging.INFO | ||
logger.addHandler(stdout_handler) | ||
logger.setLevel(logging.DEBUG) | ||
|
||
formatter = logging.Formatter("[%(asctime)s] %(levelname)s - %(message)s") | ||
_file_handler.setFormatter(formatter) | ||
|
||
__all__ = ["logger"] |
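
The logger module sets the logger itself to `DEBUG` and lets each handler filter by its own level, so the log file receives debug messages while syslog and stdout only see `INFO` and above. A small sketch of that interaction, using in-memory `StringIO` streams in place of the real file and stdout handlers; the logger name and the messages are made up for the example.

```python
import io
import logging

debug_sink = io.StringIO()  # plays the role of installer.log
info_sink = io.StringIO()  # plays the role of stdout

demo_logger = logging.getLogger("HandlerLevelDemo")
demo_logger.setLevel(logging.DEBUG)  # let every record reach the handlers
demo_logger.propagate = False  # keep the demo isolated from the root logger

debug_handler = logging.StreamHandler(debug_sink)
debug_handler.setLevel(logging.DEBUG)
debug_handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
demo_logger.addHandler(debug_handler)

info_handler = logging.StreamHandler(info_sink)
info_handler.setLevel(logging.INFO)  # drops DEBUG records
demo_logger.addHandler(info_handler)

demo_logger.debug("downloading driver")
demo_logger.info("driver installed")
```

Here `debug_sink` ends up with both records in the formatted style, while `info_sink` contains only the `INFO` message with the default message-only format, mirroring how only `_file_handler` gets a formatter in the module above.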