Add tests for DSS on NVIDIA GPUs and only CPUs (New) #1609

Merged
merged 44 commits into from
Dec 3, 2024
Commits
c500cea
add jobs to DSS validation for setup and test on NVIDIA GPUs
motjuste Nov 18, 2024
14979c9
fix cuda test for tensorflow and give more time for things to settle
motjuste Nov 18, 2024
c8a4afb
fix dependency of nvidia_gpu_addon/enable job
motjuste Nov 18, 2024
6786061
fix wrong dependency for cuda jobs and make validation more reliable
motjuste Nov 19, 2024
50601d1
fix shebang to use control instead of remote in launcher script
motjuste Nov 19, 2024
45d427c
fix flaky gpu addon rollout checking in better order and more sleep
motjuste Nov 19, 2024
9da8078
make the GPU checking into resources to control GPU tests are run
motjuste Nov 19, 2024
37a9b64
remove flaky mlflow deployed test
motjuste Nov 19, 2024
e180c9a
update other dss test-plans to use the GPU as resources
motjuste Nov 19, 2024
ac91262
reduce max_attempts for retry to 2
motjuste Nov 19, 2024
6d046e0
add cpu-only tests for dss
motjuste Nov 19, 2024
d722754
rename validate script to not contain intel and bump snap's version
motjuste Nov 19, 2024
a741596
refactor testflinger job file builder to unify into one re-usable one
motjuste Nov 20, 2024
18b591e
add nvidia dgx as target machine for DSS testflinger jobs
motjuste Nov 20, 2024
2d43581
allow other workflow jobs in matrix to continue running if one fails
motjuste Nov 20, 2024
8d290ae
add notebook removal tests and rename cases to be consistent
motjuste Nov 25, 2024
eb4c09e
skip installing intel gpu plugin if it is already there
motjuste Nov 25, 2024
b6202e1
remove unused itex- and ipex-only test plans
motjuste Nov 25, 2024
98359c5
rename check_dss.sh to check_dss for pseudo-fluent usage
motjuste Nov 25, 2024
2971237
refactor remove notebook test to accept multiple arguments
motjuste Nov 25, 2024
b73a281
extract out notebook creation to reused function
motjuste Nov 25, 2024
d13de27
disable intel gpu capacity tests temporarily
motjuste Nov 26, 2024
99bb957
rename test case for dss to be more fluid
motjuste Nov 25, 2024
adfe2cd
refactor checking dss status into reusable function
motjuste Nov 25, 2024
89bfdca
add missing usage string for dss create notebook function
motjuste Nov 25, 2024
12fca77
use pushd popd instead of cd-ing to HOME in check dss
motjuste Nov 25, 2024
6e051f3
rename check_cuda.sh to check_cuda to have a pseudo-fluent usage
motjuste Nov 26, 2024
8e1f358
refactor cuda notebook tests to reusable script
motjuste Nov 27, 2024
ae0178c
refactor out the notebook tests for cpu
motjuste Nov 27, 2024
70c8673
refactor out itex tests to common notebook script
motjuste Nov 27, 2024
b8b3551
refactor out ipex tests to common notebook script
motjuste Nov 27, 2024
87d1526
reformat long requires clauses to multi-line ones
motjuste Nov 27, 2024
42ab519
drop .sh extension from check_intel script
motjuste Nov 27, 2024
ec75b21
fix failing intel gpu verification tests
motjuste Nov 27, 2024
e468f73
reduce sleep time in steps while enabling nvidia gpu addon
motjuste Nov 27, 2024
2cf48d0
fix help string for check_notebook
motjuste Nov 27, 2024
44a504a
refactor install-deps script allowing customization of microk8s and k…
motjuste Nov 27, 2024
5962863
add customized microk8s channels to github workflow for dss
motjuste Nov 27, 2024
896d18d
fix default dss_snap_channel to latest/stable instead of non-existent…
motjuste Nov 28, 2024
c10b121
add .sh extension back to the test runner scripts
motjuste Dec 2, 2024
3068584
use graphics_card resource for checking GPU instead of own
motjuste Dec 2, 2024
4481c28
change to detecting GPU based on vendor
motjuste Dec 2, 2024
8d7703c
fix mention of default channel for DSS in the README
motjuste Dec 3, 2024
1dc30bd
remove unnecessary dss integration tests script (coming later)
motjuste Dec 3, 2024
rename check_dss.sh to check_dss for pseudo-fluent usage
motjuste committed Dec 3, 2024
commit 98359c54687dc32ed526798300e7731cf306de2e
@@ -7,7 +7,7 @@ requires:
executable.name == 'microk8s'
_summary: Check that the DSS environment initializes
estimated_duration: 2m
-command: check_dss.sh dss_can_be_initialized
+command: check_dss dss_can_be_initialized

id: dss/namespace
category_id: dss-regress
@@ -17,7 +17,7 @@ requires: executable.name == 'microk8s'
depends: dss/initialize
_summary: Check that the dss namespace is deployed
estimated_duration: 5s
-command: check_dss.sh dss_namespace_is_deployed
+command: check_dss dss_namespace_is_deployed

id: dss/status_mlflow
category_id: dss-regress
@@ -27,7 +27,7 @@ requires: executable.name == 'dss'
depends: dss/namespace
_summary: Check that the dss mlflow is deployed
estimated_duration: 5s
-command: check_dss.sh mlflow_status_is_ready
+command: check_dss mlflow_status_is_ready

id: dss/create_pytorch_cpu_notebook
category_id: dss-regress
@@ -37,7 +37,7 @@ requires: executable.name == 'dss'
depends: dss/initialize
_summary: Check that an PyTorch CPU notebook can be successfully created
estimated_duration: 3m
-command: check_dss.sh can_create_pytorch_cpu_notebook
+command: check_dss can_create_pytorch_cpu_notebook

id: cpu/pytorch_can_use_cpu
category_id: dss-regress
@@ -57,7 +57,7 @@ requires: executable.name == 'dss'
depends: dss/create_pytorch_cpu_notebook
_summary: Check that the PyTorch CPU notebook can be removed
estimated_duration: 1m
-command: check_dss.sh can_remove_notebook "pytorch-cpu"
+command: check_dss can_remove_notebook "pytorch-cpu"

id: dss/create_tensorflow_cpu_notebook
category_id: dss-regress
@@ -67,7 +67,7 @@ requires: executable.name == 'dss'
depends: dss/initialize
_summary: Check that an Tensorflow CPU notebook can be successfully created
estimated_duration: 3m
-command: check_dss.sh can_create_tensorflow_cpu_notebook
+command: check_dss can_create_tensorflow_cpu_notebook

id: cpu/tensorflow_can_use_cpu
category_id: dss-regress
@@ -87,7 +87,7 @@ requires: executable.name == 'dss'
depends: dss/create_tensorflow_cpu_notebook
_summary: Check that the Tensorflow CPU notebook can be removed
estimated_duration: 1m
-command: check_dss.sh can_remove_notebook "tensorflow-cpu"
+command: check_dss can_remove_notebook "tensorflow-cpu"

id: intel_gpu_plugin/install
category_id: dss-regress
@@ -177,7 +177,7 @@ requires: executable.name == 'dss'
depends: intel_gpu_plugin/node_gpu_allocatable
_summary: Check that dss status reports that Intel GPU acceleration is enabled
estimated_duration: 5s
-command: check_dss.sh intel_gpu_acceleration_is_enabled
+command: check_dss intel_gpu_acceleration_is_enabled

id: dss/create_tensorflow_intel_notebook
category_id: dss-regress
@@ -187,7 +187,7 @@ requires: executable.name == 'dss'
depends: dss/status_intel_gpu
_summary: Check that an ITEX 2.15 notebook can be successfully created
estimated_duration: 3m
-command: check_dss.sh can_create_itex_215_notebook
+command: check_dss can_create_itex_215_notebook

id: itex/itex_2.15_import
category_id: dss-regress
@@ -217,7 +217,7 @@ requires: executable.name == 'dss'
depends: dss/create_tensorflow_intel_notebook
_summary: Check that the Tensorflow Intel notebook can be removed
estimated_duration: 1m
-command: check_dss.sh can_remove_notebook "itex-215-notebook"
+command: check_dss can_remove_notebook "itex-215-notebook"

id: dss/create_pytorch_intel_notebook
category_id: dss-regress
@@ -227,7 +227,7 @@ requires: executable.name == 'dss'
depends: dss/status_intel_gpu
_summary: Check that an IPEX 2.1.20 notebook can be successfully created
estimated_duration: 3m
-command: check_dss.sh can_create_ipex_2120_notebook
+command: check_dss can_create_ipex_2120_notebook

id: ipex/ipex_2.1.20_import
category_id: dss-regress
@@ -257,7 +257,7 @@ requires: executable.name == 'dss'
depends: dss/create_pytorch_intel_notebook
_summary: Check that the PyTorch Intel notebook can be removed
estimated_duration: 1m
-command: check_dss.sh can_remove_notebook "ipex-2120-notebook"
+command: check_dss can_remove_notebook "ipex-2120-notebook"

id: nvidia_gpu_addon/enable
category_id: dss-regress
@@ -287,7 +287,7 @@ requires: executable.name == 'dss'
depends: nvidia_gpu_addon/validations_succeed
_summary: Check that dss status reports that NVIDIA GPU acceleration is enabled
estimated_duration: 5s
-command: check_dss.sh nvidia_gpu_acceleration_is_enabled
+command: check_dss nvidia_gpu_acceleration_is_enabled

id: dss/create_pytorch_cuda_notebook
category_id: dss-regress
@@ -297,7 +297,7 @@ requires: executable.name == 'dss'
depends: dss/status_nvidia_gpu
_summary: Check that an PyTorch CUDA notebook can be successfully created
estimated_duration: 3m
-command: check_dss.sh can_create_pytorch_cuda_notebook
+command: check_dss can_create_pytorch_cuda_notebook

id: cuda/pytorch_can_use_cuda
category_id: dss-regress
@@ -317,7 +317,7 @@ requires: executable.name == 'dss'
depends: dss/create_pytorch_cuda_notebook
_summary: Check that the PyTorch CUDA notebook can be removed
estimated_duration: 1m
-command: check_dss.sh can_remove_notebook "pytorch-cuda"
+command: check_dss can_remove_notebook "pytorch-cuda"

id: dss/create_tensorflow_cuda_notebook
category_id: dss-regress
@@ -327,7 +327,7 @@ requires: executable.name == 'dss'
depends: dss/status_nvidia_gpu
_summary: Check that an Tensorflow CUDA notebook can be successfully created
estimated_duration: 3m
-command: check_dss.sh can_create_tensorflow_cuda_notebook
+command: check_dss can_create_tensorflow_cuda_notebook

id: cuda/tensorflow_can_use_cuda
category_id: dss-regress
@@ -347,4 +347,4 @@ requires: executable.name == 'dss'
depends: dss/create_tensorflow_cuda_notebook
_summary: Check that the Tensorflow CUDA notebook can be removed
estimated_duration: 1m
-command: check_dss.sh can_remove_notebook "tensorflow-cuda"
+command: check_dss can_remove_notebook "tensorflow-cuda"
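
The script behind these `command:` lines is not part of this commit, so the following is only a minimal sketch of how an extension-less dispatcher of this shape could work: the first argument names a check and any remaining arguments are passed through to it. The function `run_check` and both check bodies are hypothetical placeholders, not the actual `check_dss` implementation; only the check names are taken from the jobs above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "check_dss"-style dispatcher script.
# The check bodies below are placeholders, not the real DSS checks.

# Placeholder check: a real one would inspect the DSS environment.
dss_namespace_is_deployed() {
    echo "checking that the dss namespace is deployed"
}

# Placeholder check that takes one argument (the notebook name).
can_remove_notebook() {
    if [ "$#" -ne 1 ]; then
        echo "usage: can_remove_notebook <notebook-name>" >&2
        return 2
    fi
    echo "checking that notebook '$1' can be removed"
}

# Map the first argument to a known check and pass the rest through.
run_check() {
    local check="${1:-}"
    if [ "$#" -gt 0 ]; then
        shift
    fi
    case "$check" in
        dss_namespace_is_deployed | can_remove_notebook)
            "$check" "$@"
            ;;
        *)
            echo "usage: check_dss <check_name> [args...]" >&2
            return 2
            ;;
    esac
}

# Dispatch only when arguments are given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    run_check "$@"
fi
```

Saved as an executable named `check_dss` on the `PATH`, a call such as `check_dss can_remove_notebook "pytorch-cpu"` reads close to a plain sentence, which is presumably what the commit message means by pseudo-fluent usage.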