Docker support (Tested on RTX 3080) + ignoring reproducible files / cache #50

Open · peterwilli wants to merge 3 commits into main from docker-support

Conversation

@peterwilli commented Apr 15, 2022

Intro

I added support for Docker and made sure cache and build files no longer end up in the repository, making it easier for others to work on this.

The Docker image builds Python from source with optimized compilation flags for maximum possible speed. It replicates the Conda environment, gives others easier access, and simplifies deploying this to servers!
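As a sketch of what "Python from source with optimized compilation" typically looks like in a Dockerfile (the exact version and flags in the PR may differ; 3.8.5 and the /opt/python-3.8.5 prefix are taken from the PATH line shown later in the diff):

```dockerfile
# Illustrative sketch, not the literal PR Dockerfile: build CPython 3.8.5
# from source with profile-guided optimization and link-time optimization.
RUN wget https://www.python.org/ftp/python/3.8.5/Python-3.8.5.tgz \
    && tar xzf Python-3.8.5.tgz \
    && cd Python-3.8.5 \
    && ./configure --prefix=/opt/python-3.8.5 --enable-optimizations --with-lto \
    && make -j"$(nproc)" \
    && make install
ENV PATH="/opt/python-3.8.5/bin:${PATH}"
```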

I added a section to the README explaining how to install the software with Docker.

It also offers a fix for #49 (that issue won't appear on Docker).

Tested on:

  • Tuxedo Stellaris 15 (AMD Ryzen 9 5900HX - 32GB RAM - RTX 3080 16GB)

Sample output for the prompt "A large blue whale on a freight ship, vector art" (image attached).

@peterwilli force-pushed the docker-support branch 2 times, most recently from ee9e5bf to c65bf38 (April 15, 2022 12:21)

Dockerfile (Outdated)

RUN python3 -m pip install pip==20.3
RUN pip3 install torch==1.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
RUN pip3 install numpy==1.19.2 torchvision==0.11.2 albumentations==0.4.3 opencv-python==4.1.2.30 pudb==2019.2 imageio==2.9.0 imageio-ffmpeg==0.4.2 pytorch-lightning==1.6.1 omegaconf==2.1.1 test-tube>=0.7.5 streamlit>=0.73.1 einops==0.3.0 torch-fidelity==0.3.0 transformers==4.3.1 -e "git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers" -e "git+https://github.com/openai/CLIP.git@main#egg=clip"
Review comment:

Just curious, why not put these in a requirements file so it's easier to read?

@peterwilli (Author) replied:

We can do that for everything except RUN python3 -m pip install pip==20.3: the goal was to stay as close to the Conda environment as possible, so we don't run into issues that users installing with Conda wouldn't have.
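A sketch of the reviewer's suggestion (the file name and split are hypothetical, not from the PR; the pins would be the ones in the long RUN line above):

```dockerfile
# Hypothetical restructuring: keep the pip pin as its own step,
# move the long dependency list into a requirements.txt file.
RUN python3 -m pip install pip==20.3
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt
```

One reason to keep the pins inline anyway, as the author notes, is that a single RUN line is easier to diff against the Conda environment file it mirrors.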

@srelbo commented Apr 15, 2022

Thank you @peterwilli this is very useful.

Dockerfile (Outdated)
ENV PATH="/opt/python-3.8.5/bin:${PATH}"

RUN python3 -m pip install pip==20.3
RUN pip3 install torch==1.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
Review comment:

^ This will probably not work on older GPUs. Perhaps we should state that in the documentation somewhere?

@peterwilli (Author) replied:

I'm thinking of using build args for this, but I don't know the exact configuration for each GPU. I used an RTX 3080. I think we should expose the choice of Torch version as a build arg and let people choose themselves, with a default that works for most GPUs.

That said, I don't think a GPU with less than 12 GB VRAM works anyway, which narrows down the older GPUs to support, depending on how old we're talking about!
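A minimal sketch of the build-arg idea (the argument names are hypothetical, not from the PR; the defaults match the versions pinned earlier in this diff):

```dockerfile
# Hypothetical build args; defaults match the cu113 wheels used in this PR.
ARG TORCH_VERSION=1.10.1+cu113
ARG TORCH_WHEEL_INDEX=https://download.pytorch.org/whl/cu113/torch_stable.html

RUN pip3 install torch==${TORCH_VERSION} -f ${TORCH_WHEEL_INDEX}
```

A user on an older GPU could then override both at build time, e.g. with docker build --build-arg TORCH_VERSION=... --build-arg TORCH_WHEEL_INDEX=... pointing at a CUDA build matching their driver.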

@peterwilli (Author) replied:

I wonder: is there a better way I'm not aware of to pick the right Torch version for the GPU of the machine running the image build? Currently all my Dockerfiles are configured for my GPU, which makes them less portable.
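One possible approach, sketched here as an assumption rather than anything in the PR: query the local GPU's compute capability on the build host and map it to a wheel index. This assumes nvidia-smi is available and that the driver is new enough to support the compute_cap query field; the build-arg name is hypothetical.

```shell
# Detect the local GPU's compute capability (newer drivers only).
cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)
echo "Detected compute capability: ${cap}"

# Illustrative, non-exhaustive mapping from architecture to CUDA wheel index.
case "${cap}" in
  8.*) index=https://download.pytorch.org/whl/cu113/torch_stable.html ;;  # Ampere (e.g. RTX 3080)
  7.*) index=https://download.pytorch.org/whl/cu111/torch_stable.html ;;  # Volta/Turing
  *)   echo "Unmapped GPU, falling back to default"
       index=https://download.pytorch.org/whl/cu113/torch_stable.html ;;
esac

# Assumes the Dockerfile declares a matching (hypothetical) build arg.
docker build --build-arg TORCH_WHEEL_INDEX="${index}" .
```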

@peterwilli (Author), replying to the above:

Thank you, it's useful for me too. I'll work with you to resolve the comments.

@peterwilli commented Jun 5, 2022

Hey @srelbo! I uploaded a new commit, sorry for the long wait. The Dockerfile now uses Conda rather than trying hard to replicate the same environment. The only exception is the workaround for #49, which can safely be deleted once that issue is fixed.

In addition, we now use the pre-made nvcr.io images as a base. These are officially made by Nvidia, so they should be trustworthy and widely supported. I had good experiences with them in my work at LAION medical (https://github.com/LAION-AI/medical), so I think this is the best we can get with Docker.

The README is slightly adjusted for the new command structure: you now have to call python explicitly, which I did so that, for debugging or other reasons, one can bash into the Docker container should they want to.
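A rough sketch of the structure described in this comment (the base image tag and environment file name are assumptions, not taken from the actual commit):

```dockerfile
# Illustrative only: the nvcr.io tag below is an assumption; the PR may
# pin a different image. Assumes the base image ships Conda.
FROM nvcr.io/nvidia/pytorch:22.05-py3

# Recreate the project's Conda environment instead of pinning packages by hand.
COPY environment.yaml /tmp/environment.yaml
RUN conda env create -f /tmp/environment.yaml
```

With no fixed entrypoint, the same image supports both running the model (docker run --gpus all <image> python scripts/...) and interactive debugging (docker run --gpus all -it <image> bash), which matches the README change described above.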

@srelbo commented Jun 5, 2022

That's awesome! Thank you so much @peterwilli! We will merge your commit into our fork of this branch, but I think it should go into mainline too, so everyone can benefit from your work.

@rromb can you please review and merge?

Also @oguzelibol FYI

@Utopiah commented Aug 10, 2022

FWIW, I just tried it, following the instructions as well as I could. The image built, but running it with the suggested parameters I get ModuleNotFoundError: No module named 'kornia'.

PS: if it helps, this is Ubuntu 22.04 with nvidia-docker2, and nvidia-smi works.
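A common workaround sketch for this kind of missing-module error (an assumption on my part; whether kornia actually belongs in the image's dependency list is for the maintainers to confirm):

```dockerfile
# Temporary workaround sketch: add the missing dependency to the image
# and rebuild. Unpinned on purpose, since the correct version for this
# repo is not known here.
RUN pip install kornia
```

For a one-off check without rebuilding, one could also bash into the running container (docker run -it <image> bash) and pip install kornia there, though that fix is lost when the container is removed.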
