Please add Graphviz so Karpathy's neural networks zero to hero tutorial will work #471

Open
jeremiahbuckley opened this issue Mar 18, 2024 · 7 comments

Comments

@jeremiahbuckley

Why you need this feature:
This YouTube series ( https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ ) has 1.5M views, and the author, Andrej Karpathy, is a very influential educator on AI/ML topics. There is an accompanying GitHub repository here: https://github.com/karpathy/nn-zero-to-hero/tree/master which has 10K stars.

When running the micrograd notebooks in this repo ( https://github.com/karpathy/nn-zero-to-hero/tree/master/lectures/micrograd ), they fail on the draw_dot(param) command, because that calls out to a binary outside of the Jupyter notebook to render the graph. In order to really use graphviz you need to be able to run [yum install graphviz].*

I have tried including graphviz as a pip requirement. That makes most of graphviz usable, but it doesn't make the draw_dot() functionality usable because of this outside-of-notebook dependency.

This is using the pytorch notebook.

*It is impossible to use yum install graphviz in a ubi container because the package is not available as part of ubi-9-appstream-rpms; it is only available as part of rhel-9-for-x86_64-appstream-rpms.
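
For anyone reproducing this, here is a minimal sketch of how one might confirm from inside the notebook that the external binary is what is missing. It assumes the renderer the graphviz Python package shells out to is the `dot` executable on PATH; the helper name `find_dot_binary` is made up for illustration.

```python
import shutil


def find_dot_binary():
    """Return the path to the Graphviz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = find_dot_binary()
if dot_path is None:
    # The pip package alone is not enough: rendering needs the system binary.
    print("Graphviz 'dot' binary not found on PATH; draw_dot() cannot render.")
else:
    print(f"Graphviz 'dot' binary found at {dot_path}")
```

If this reports that `dot` is missing while `import graphviz` still succeeds, the failure is most likely the missing system package rather than the Python dependency.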

Describe the solution you'd like:

  1. Easiest, have graphviz already installed as part of a notebook image.
  2. Maybe best: at least allow graphviz to be installed, so that the end user could create a forked container like this and use it for graphviz-requiring workflows:
    FROM quay.io/modh/cuda-notebooks:cuda-jupyter-tensorflow-ubi9-python-3.9-2023b-20240209-0cf5af6
    USER 0
    RUN yum install -y graphviz
    USER 1001

2.a. Maybe all that is needed is to make graphviz available as part of the ubi-9-appstream-rpms repository.

Anything else you would like to add:

@jeremiahbuckley
Author

By the way, the rest of the notebook works if draw_dot() is commented out. So, this is a last-piece-of-the-puzzle type change.
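
Rather than commenting the call out by hand, a notebook could guard it. This is only a sketch under the assumption that the missing piece is the `dot` executable; `render_if_possible` is a hypothetical helper, not part of micrograd or graphviz.

```python
import shutil


def render_if_possible(draw_fn, *args, **kwargs):
    """Call draw_fn only when the Graphviz `dot` binary is on PATH.

    Returns draw_fn's result, or None when rendering is skipped.
    """
    if shutil.which("dot") is None:
        print("Skipping graph rendering: 'dot' binary not found.")
        return None
    return draw_fn(*args, **kwargs)


# In the micrograd notebook this would replace a bare draw_dot(L) call:
# render_if_possible(draw_dot, L)
```

The rest of the notebook then runs unchanged on images without Graphviz, and starts rendering again as soon as the binary is available.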

@jiridanek
Member

My favorite strategy for addressing this would be to adjust permissions in the container so that a regular user can run dnf and install what's needed. I am trying to find out whether this is likely to work mostly fine, or whether there are nasty cases where it breaks (say, package install scripts that require UID 0, where having GID 0 with write access everywhere does not resolve it).

In my view, there is no harm in having the workbench user be the complete master of their container, as long as they don't have capabilities or a UID of 0.

@jiridanek
Member

jiridanek commented Nov 27, 2024

Red Hat intends to support user namespaces eventually. I think this is already present in the latest Kubernetes in some fashion, so it's coming to OpenShift too: https://access.redhat.com/solutions/6977863

If you can't access access.redhat.com, it says that the RFE is tracked in https://issues.redhat.com/browse/RFE-3254

In Kubernetes, the feature entered Beta in 1.30, https://kubernetes.io/docs/concepts/workloads/pods/user-namespaces/

In OpenShift, it is a Tech Preview in 4.17 (meaning it's currently unusable in production instances).

@cgruver

cgruver commented Nov 27, 2024

I would recommend against running dnf inside of a running container image. It breaks the immutability pattern of containers. It can also lead to a bad user experience because the results of the dnf install will not be persistent except in the ephemeral copy-on-write space of the container.

If on a restart of the Pod, it is scheduled on a different cluster node, then the user will have to run dnf install again. If an image pull strategy of Always is used, this will have the same effect.

The ideal solution is to build a new image which contains the correct dependencies and use that.

Is there a reason that the base image here can't be refactored to include the graphviz dependencies?

@jiridanek
Member

jiridanek commented Nov 27, 2024

> It can also lead to a bad user experience because the results of the dnf install will not be persistent except in the ephemeral copy-on-write space of the container.

We have a similar situation with pip. Python packages added with pip install also don't persist across pod restarts (in our images, we don't have the Python venv on the mounted volume), but pip install is what we do in various quickstarts to get packages in quickly. The first cell in the Jupyter Notebooks that our docs offer as part of tutorials tends to be a !pip install .... I am aware of the disadvantage, but I am also attracted by the ad-hoc convenience.
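
As a rough sketch of that first-cell pattern, the cell can be written so that it only reinstalls what is missing after a restart. The helper names `missing_modules` and `ensure_installed` are hypothetical, and the approach assumes pip is usable through the notebook's own interpreter.

```python
import importlib.util
import subprocess
import sys


def missing_modules(module_names):
    """Return the top-level module names that cannot currently be imported."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def ensure_installed(requirements):
    """Install any missing packages; `requirements` maps import name -> pip package name."""
    to_install = [requirements[name] for name in missing_modules(requirements)]
    if to_install:
        # Use the running interpreter's pip so packages land in the active environment.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *to_install])
    return to_install


# Example first cell (commented out so nothing is installed implicitly):
# ensure_installed({"graphviz": "graphviz"})
```

This keeps the ad-hoc convenience while making the cell cheap to re-run. It only covers Python packages; OS-level dependencies such as the Graphviz binaries are untouched.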

There's an idea floating around to persist these modified images on the fly, even

> The ideal solution is to build a new image which contains the correct dependencies and use that.

We're aware, we have Jiras to make this user friendly, https://issues.redhat.com/browse/RHOAIENG-3272 (BYON means build-your-own-notebook, lol)

edit: forgot we have this wizard, also ;P

https://github.com/opendatahub-io-contrib/workbench-images/blob/main/interactive-image-builder.sh

It's a less fancy version of the wip design above.

> I would recommend against running dnf inside of a running container image. It breaks the immutability pattern of containers. It can also lead to a bad user experience because the results of the dnf install will not be persistent except in the ephemeral copy-on-write space of the container.
>
> If on a restart of the Pod, it is scheduled on a different cluster node, then the user will have to run dnf install again. If an image pull strategy of Always is used, this will have the same effect.
>
> The ideal solution is to build a new image which contains the correct dependencies and use that.
>
> Is there a reason that the base image here can't be refactored to include the graphviz dependencies?

The general n+1 problem (if we added everything anyone wants, we'd be adding one more thing each time until eventually we add everything). Also the UBI problem: some packages we like aren't in RHEL/UBI.

@cgruver

cgruver commented Nov 27, 2024

> We have a similar situation with pip. Python packages added with pip install also don't persist across pod restarts (in our images, we don't have the Python venv on the mounted volume), but pip install is what we do in various quickstarts to get packages in quickly. The first cell in the Jupyter Notebooks that our docs offer as part of tutorials tends to be a !pip install .... I am aware of the disadvantage, but I am also attracted by the ad-hoc convenience.

OpenShift Dev Spaces (upstream Eclipse Che) addresses your persistence issue for pip install by backing configurable portions of the workspace with a PVC. That might be an option to consider as well for user-scoped packages and dependencies... it doesn't solve OS-level packages, though.

> The general n+1 problem (if we added everything anyone wants, we'd be adding one more thing each time until eventually we add everything). Also the ubi problem, that some packages we like aren't in RHEL/UBI.

Agreed... The devfile.io Universal Developer Image is a victim of that... massive image.

A few of us built a prototype with CeKit to solve the composable image issue for Dev Spaces:

https://github.com/redhat-cop/devspaces-images

@jiridanek
Member

jiridanek commented Nov 27, 2024

> persistence issue for pip install by backing configurable portions of the workspace with a PVC

We are deliberately making pip install non-persistent, by mounting the PVC in such a way that the virtualenv is not on it. This avoids the nightmares with broken workbenches that cannot be returned to a sane state by restarting. It's not a bug, it's a feature; at least that's what the business people told us about it :shrug:
