Familiarize with Intel DSS environment #145
Comments
Thank you for reporting us your feedback! The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6002.
|
To proceed with the spec for the DSS Intel integration, we need to answer the following questions:
I went through the older PoC guide for setting up Intel support on MicroK8s, and I also went through the new spec which we received.

How to install the GPU operator for Intel hardware on MicroK8s?
In order to install the Intel GPU plugin on MicroK8s we need
NOTE: The script to generate the YAMLs is here.

How to get Jupyter-backed images for PyTorch and TensorFlow?
Currently we should be using the
Both of the images come with Jupyter preinstalled. Keep in mind that there is a setting on the pod's side that we need to apply in order to run it correctly in DSS.

How to support multiple containers on one Intel GPU device?
There is a setting in the intel-gpu-plugin which enables sharing the GPU across multiple containers. Without it, only one container can get the device. To assign the GPU to a container we need this setting. Here is the discussion about the setting.

How to support Intel iGPU and dGPU devices at the same time?
This one may not be supported. Please check the discussion.

How to support Intel and NVIDIA workloads at the same time?
The initial tests in this doc show that it is possible without problems.

Where are we going to develop the DSS Intel support?
Waiting for access to machines with an iGPU and a dGPU. After that I will rerun all the tests from this doc.

How are we going to run CI for Intel support?
This might be a challenge: we might need a way to access, on demand, an instance with an iGPU and a dGPU for CI testing. |
Today I got access to the Dell device lab and successfully executed the test cases from this spec. Namely:
The process to get access to the lab
Note: read more about the procedure here. |
Changes needed for dss Intel support
Right now the dss status command outputs this information
We need to add one more row about the Intel status. The correct way to get the Intel device info is under discussion here. The first idea is to check for Intel GPU labels on the Kubernetes node.
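A minimal sketch of the label-based check mentioned above, written as plain Python over a mapping of node labels so it does not depend on any particular Kubernetes client. The label key `intel.feature.node.kubernetes.io/gpu` is an assumption (it is the kind of label produced by the Node Feature Discovery rules that accompany the Intel device plugins) and is not confirmed in this thread; the function names and the wording of the status row are hypothetical as well.

```python
# Assumption: label key set by the NFD rules that accompany the Intel device
# plugins. This thread does not confirm which label DSS should rely on.
INTEL_GPU_LABEL = "intel.feature.node.kubernetes.io/gpu"


def node_has_intel_gpu(labels: dict) -> bool:
    """Return True if the given node labels advertise an Intel GPU."""
    return str(labels.get(INTEL_GPU_LABEL, "")).lower() == "true"


def intel_status_row(nodes: dict) -> str:
    """Build a hypothetical extra row for the `dss status` output.

    `nodes` maps a node name to that node's label dictionary.
    """
    enabled = any(node_has_intel_gpu(labels) for labels in nodes.values())
    return f"Intel GPU acceleration: {'Enabled' if enabled else 'Disabled'}"


if __name__ == "__main__":
    example_nodes = {
        "node-a": {"kubernetes.io/hostname": "node-a"},
        "node-b": {INTEL_GPU_LABEL: "true"},
    }
    print(intel_status_row(example_nodes))
```

Keeping the check as a pure function over labels makes it easy to unit test without a cluster; the labels themselves would come from whatever Kubernetes client DSS already uses.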
After discussing with the team we decided to drop the Because of this the Because we are using We also need to add
|
As part of this task we have opened the following issues:

When designing the spec we need to align on the following open problems:

How are we going to recommend installation of the Intel device plugin?
According to this spec, we need to instruct the user to build the manifests from the upstream repository, as MicroK8s has problems with remote URLs for its customization feature. The aforementioned spec recommends keeping the built manifests in the DSS repository. This is not an ideal solution, as DSS should not be responsible for installing the device plugin.

Should we be specific about the Intel GPU versions which we support with DSS?
As DSS is not responsible for setting up the plugin, it should not care about the versions of the underlying Intel GPUs. The user should handle installing the correct plugin for their GPU device.

How to support Intel iGPU and dGPU devices at the same time?
This one may not be supported. Please check the discussion. |
@misohu Your exploration of the Intel DSS environment has been very thorough, and you have specified clearly defined tasks for achieving the integration. Great job! The only thing that I find missing is a clear determination of whether Intel iGPU and dGPU devices will be supported simultaneously, before we proceed with the spec. |
Thanks @mvlassis. The thing is that devices with both Intel iGPUs and dGPUs will be supported; we just cannot specify in the resources section whether the workload should be deployed to the iGPU or the dGPU. |
@misohu If that is the case, we should add a note/warning in the DSS documentation for that specific use case. |
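To illustrate the limitation discussed above, here is a small sketch of the resources section a workload might carry. It assumes the Intel GPU plugin advertises devices under the extended resource name `gpu.intel.com/i915`; the helper name `intel_gpu_resources` is hypothetical. Under that assumption the request names only a generic Intel GPU resource, so there is no field in it that distinguishes an iGPU from a dGPU.

```python
# Assumption: extended resource name advertised by the Intel GPU plugin for
# devices using the i915 driver. Not confirmed in this thread.
INTEL_GPU_RESOURCE = "gpu.intel.com/i915"


def intel_gpu_resources(count: int = 1) -> dict:
    """Return a container `resources` section requesting Intel GPU devices.

    The request carries only a device count; nothing here selects between
    an integrated and a discrete GPU.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return {"limits": {INTEL_GPU_RESOURCE: str(count)}}


if __name__ == "__main__":
    # Hypothetical container spec fragment for a notebook pod.
    container = {
        "name": "notebook",
        "resources": intel_gpu_resources(1),
    }
    print(container)
```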
Why it needs to get done
In order to be able to tackle #144, we'll first need to spend some time familiarizing ourselves with the Intel DSS environment.
What needs to get done
Interact with Intel DSS environment and document instructions for it.
When is the task considered done
We have familiarized ourselves with the Intel DSS environment and documented how to interact with it.