[ubuntu-precompiled] enable offline installation of driver packages #222

Open: wants to merge 1 commit into main


tariq1890 (Contributor) commented Feb 11, 2025

This PR introduces a new method of installing the NVIDIA precompiled driver packages.

Motivation

We recently discovered that our offline installs weren't truly "offline": at run time, the driver container was still downloading packages from external sources. The root cause was the multiple apt-get update calls, which erased the packages previously downloaded by the apt-get install -y --no-install-recommends --download-only <packages> command.

Summary of changes:

i) Download each package and its dependencies using the following commands:

apt-get download $PACKAGE
apt-get download $(apt-cache depends --recurse --no-recommends --no-suggests --no-conflicts --no-breaks --no-replaces --no-enhances $PACKAGE | grep "^\w" | sort -u)
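The dependency filtering in step i) can be sketched as two small shell functions. This is a hypothetical illustration, not the PR's actual script: the names filter_real_packages and download_with_deps are invented here. The filter uses the POSIX class [[:alnum:]] instead of GNU grep's \w; for this output the two behave the same in practice, because Debian package names must start with a lowercase letter or digit.

```shell
# Hypothetical helpers illustrating step i); not the PR's actual script.

# Reduce "apt-cache depends --recurse" output to concrete package names.
# Package entries start in column 0, dependency edges such as "  Depends: libc6"
# are indented, and virtual packages are printed as "<name>", so keeping only
# lines that begin with an alphanumeric character leaves real packages.
filter_real_packages() {
  grep '^[[:alnum:]]' | LC_ALL=C sort -u
}

# Download a package plus its recursive dependency closure into the current
# directory. Assumes apt-get update has already populated the package lists
# and that the configured sources are reachable.
download_with_deps() {
  local pkg="$1"
  apt-get download "$pkg"
  # Word splitting of the command substitution is intentional: one argument
  # per package name.
  # shellcheck disable=SC2046
  apt-get download $(apt-cache depends --recurse --no-recommends --no-suggests \
    --no-conflicts --no-breaks --no-replaces --no-enhances "$pkg" \
    | filter_real_packages)
}
```

Run from an empty working directory, download_with_deps "$PACKAGE" should leave the .deb files for the package and its dependency closure there, ready to be moved in step ii).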

ii) Move the downloaded packages to a permanent location and gzip them.
iii) Create a local apt package source that points to the new directory containing the downloaded packages.
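Steps ii) and iii) could look roughly like the following. This sketch assumes that "gzip" refers to building a compressed Packages index with dpkg-scanpackages (from the dpkg-dev package), which is one common way to turn a directory of .deb files into a flat apt repository; it is not necessarily what the PR does. The function names, the directory arguments, and the sources.list.d file name are hypothetical.

```shell
# Hypothetical sketch of steps ii) and iii); names and paths are assumptions.

# Print the sources.list entry for a flat, unsigned local repository in DIR.
# "./" marks a flat repository (no dists/ tree); [trusted=yes] is needed
# because the directory has no signed Release file.
local_source_line() {
  printf 'deb [trusted=yes] file:%s ./\n' "$1"
}

# Move downloaded .deb files from SRC_DIR (default: current directory) into
# DIR, build a gzipped Packages index there, and register DIR as an apt source.
make_local_repo() {
  local dir="$1" src_dir="${2:-.}"
  mkdir -p "$dir"
  mv "$src_dir"/*.deb "$dir"/
  (cd "$dir" && dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz)
  local_source_line "$dir" > /etc/apt/sources.list.d/local-driver-pkgs.list
}
```

For example, make_local_repo /opt/driver-debs (a hypothetical path) followed by apt-get update should let a later apt-get install resolve the driver packages from that directory.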

NOTE: In this PR, I also move away from installing the large metapackage nvidia-driver-${DRIVER_BRANCH}-server and then purging the unneeded packages. Instead, we install only the packages we actually need, so there is no bloat to worry about.

@@ -48,28 +48,21 @@ RUN if [ -n "${CVE_UPDATES}" ]; then \
rm -rf /var/lib/apt/lists/*; \
fi

# update pkg cache and install pkgs for userspace driver libs
RUN apt-get update && apt-get install -y --download-only --no-install-recommends nvidia-driver-${DRIVER_BRANCH}-server \
Contributor

Where else was apt-get remove or apt-get clean run after this step, causing the packages to be removed? apt-get update by itself doesn't remove downloaded packages.

Contributor Author

When I downloaded packages using apt-get install --download-only and then ran apt-get update, I noticed that the downloaded packages in /var/cache/apt/archives were wiped out.
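One possible explanation for this behavior, not confirmed in this thread, is an APT hook in the base image. For example, the official Debian and Ubuntu container images ship /etc/apt/apt.conf.d/docker-clean, which adds APT::Update::Post-Invoke and DPkg::Post-Invoke hooks that run rm -f on /var/cache/apt/archives/*.deb. The hypothetical helper below filters the effective APT configuration for such hooks.

```shell
# Hypothetical diagnostic: print APT Post-Invoke hook settings that touch
# /var/cache/apt/archives. Reads "apt-config dump" output on stdin; the exit
# status is non-zero (from grep) when no such hook is found.
find_archive_cleaning_hooks() {
  grep -E 'Post-Invoke.*/var/cache/apt/archives'
}
```

Usage: apt-config dump | find_archive_cleaning_hooks. Any printed line names a hook that would clear a --download-only cache after apt-get update or after dpkg runs.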

nvidia-kernel-source-${DRIVER_BRANCH}-server \
xserver-xorg-video-nvidia-${DRIVER_BRANCH}-server
# Install necessary driver userspace packages
apt-get install -y --no-install-recommends \
Contributor

These will only install utilities such as nvidia-smi, nvidia-persistenced, and nvidia-cuda-mps-control, but not the other required libraries, such as the encoder/decoder, GL, and CUDA libraries. There are many other userspace packages that we need. Where are those installed as dependencies?

Contributor Author

If the encoder/decoder, GL, and CUDA libraries are required, I can install them; I wasn't aware that they were needed.

I validated the current container with MPS and with tests such as gpu-burn, and they worked fine.

Contributor Author

I've added the other packages. Please check again.
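To check whether an explicit install list covers everything the metapackage would have pulled in, the two lists can be compared. The helpers below are a hypothetical sketch: direct_depends lists only the metapackage's direct Depends/PreDepends entries (not the full closure), and missing_from_install_list prints names that appear in the first file but not in the second.

```shell
# Hypothetical helpers for comparing a metapackage with an explicit list.

# Print the direct Depends/PreDepends of a package, one name per line.
# Requires populated apt package lists.
direct_depends() {
  apt-cache depends --no-recommends --no-suggests --no-conflicts --no-breaks \
    --no-replaces --no-enhances "$1" | awk '/Depends:/ {print $2}'
}

# Print packages listed in META_DEPS_FILE but absent from INSTALL_LIST_FILE.
# Both files hold one package name per line.
missing_from_install_list() {
  local meta_deps="$1" install_list="$2" a b
  a="$(mktemp)"; b="$(mktemp)"
  LC_ALL=C sort -u "$meta_deps" > "$a"
  LC_ALL=C sort -u "$install_list" > "$b"
  # comm -23 keeps lines that occur only in the first sorted file.
  LC_ALL=C comm -23 "$a" "$b"
  rm -f "$a" "$b"
}
```

For example, direct_depends "nvidia-driver-${DRIVER_BRANCH}-server" > meta-deps.txt followed by missing_from_install_list meta-deps.txt install-list.txt, where install-list.txt is a hypothetical file naming the packages passed to apt-get install, would list any userspace packages the explicit list leaves out.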

tariq1890 force-pushed the offline-install branch 2 times, most recently from 1148aa7 to 5617fe0 on February 12, 2025 20:17