
add testing for example folder #287 (Draft)

baptistecolle wants to merge 41 commits into main
Conversation

baptistecolle (Collaborator)

No description provided.

baptistecolle added the examples label ([CI] Requires and enables running all tests on the "examples" folder) on Oct 9, 2024
IlyasMoutawwakil (Member)

I prefer these examples to be distributed across the same CIs that already exist, for the simple reason that these CIs already cover the entirety of device+backend combinations.

@@ -25,5 +25,4 @@ backend:
device: cpu
no_weights: false
export: true
- torch_dtype: bfloat16
baptistecolle (Collaborator, Author) commented Oct 14, 2024

Removed because BF16 weight prepack needs the CPU to support avx_ne_convert, or avx512bw, avx512vl and avx512dq. We are running the tests on ubuntu-latest, which presumably means a fairly old CPU, so it is not the best machine for testing IPEX IMO. Do you have an idea or preference for a more modern Intel CPU to use for the CI? @IlyasMoutawwakil

IlyasMoutawwakil (Member)

I prefer testing on a more advanced CPU machine, but minimal in terms of size / vCPUs.
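To make the instruction-set constraint above concrete, here is a minimal sketch (not part of this PR) of how a test could probe the runner's CPU before opting into bfloat16. The helper name is hypothetical, the check is Linux-only since it reads /proc/cpuinfo, and the flag names are taken from the IPEX requirement quoted above.

# Hypothetical helper, not from this PR: detect whether the runner CPU has the
# instruction sets that IPEX BF16 weight prepacking needs.
from pathlib import Path

# Either avx_ne_convert alone, or the avx512bw + avx512vl + avx512dq trio, is enough.
BF16_PREPACK_FLAG_SETS = ({"avx_ne_convert"}, {"avx512bw", "avx512vl", "avx512dq"})


def cpu_supports_bf16_prepack() -> bool:
    cpuinfo = Path("/proc/cpuinfo").read_text()
    flag_lines = [line for line in cpuinfo.splitlines() if line.startswith("flags")]
    flags = set(flag_lines[0].split(":", 1)[1].split()) if flag_lines else set()
    return any(required <= flags for required in BF16_PREPACK_FLAG_SETS)

A config or test could then keep torch_dtype: bfloat16 only when this returns True, instead of dropping it for every CPU runner.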

# run: |
# pytest tests/test_cli.py -s -k "cli and cpu and openvino"

- name: Run tests from example folder
baptistecolle (Collaborator, Author)

This takes forever because the ubuntu-latest CPU is not the best fit for this CI, I think. Maybe we should also update the machine that runs those tests.

IlyasMoutawwakil (Member)

What about this: let's only run the example tests if "example" is in the tags.
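One way that gate could look, purely as an illustrative sketch (the PR_LABELS variable and the workflow wiring are assumptions, not something this PR defines): the workflow exports the PR labels and the example tests skip themselves unless the "examples" label is present.

# Illustrative sketch only: PR_LABELS is a hypothetical env var the workflow would
# have to export, e.g. from the pull_request event payload.
import os

import pytest

requires_examples_label = pytest.mark.skipif(
    "examples" not in os.environ.get("PR_LABELS", "").split(","),
    reason="example tests only run when the 'examples' label is set on the PR",
)


@requires_examples_label
def test_example_configs():
    ...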

baptistecolle (Collaborator, Author) commented Oct 14, 2024

I now use the CLI workflows to also run the different examples; however, I believe this should lead to some refactoring. I think it would make sense to rename the CLI tests: for example, CLI CUDA Pytorch Tests would become CUDA Pytorch Tests and test_cli_cuda_pytorch.yaml would become test_cuda_pytorch.yaml. Wdyt? @IlyasMoutawwakil

baptistecolle (Collaborator, Author) commented Oct 14, 2024

Also, for your information, there are a lot of device+backend combinations that do not have an example config:

  • cpu+llama_cpp -> the example is on a mps device
  • cpu+pytorch
  • cuda+onnxruntime
  • cuda+torchort

Comment on lines 81 to 85
# Those tests are not run on the CI/CD pipeline as they are currently broken
UNTESTED_YAML_CONFIGS = [
"energy_star.yaml",
"trt_llama.yaml",
]
baptistecolle (Collaborator, Author)

I am unable to run energy_star and TensorRT with the current optimum-benchmark, so they are not part of the CI/CD for now. There are currently #261 about energy_star and #285 about tensor_rt; we could re-test those after those PRs are merged.

IlyasMoutawwakil (Member)

Energy star is under work, but TRT should work fine.
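For context, a minimal sketch of how the example configs could be parametrized while the broken ones stay excluded until #261 and #285 land; the directory layout and the launch step are assumptions, not the PR's actual test file.

# Sketch only: repo layout and the launch step are assumed, not taken from this PR.
from pathlib import Path

import pytest

EXAMPLES_DIR = Path(__file__).parents[1] / "examples"  # assumed location of the examples folder
UNTESTED_YAML_CONFIGS = ["energy_star.yaml", "trt_llama.yaml"]  # currently broken, see #261 / #285


@pytest.mark.parametrize("config", sorted(EXAMPLES_DIR.glob("*.yaml")), ids=lambda p: p.name)
def test_example_config(config):
    if config.name in UNTESTED_YAML_CONFIGS:
        pytest.skip(f"{config.name} is excluded until #261 / #285 are merged")
    # the real test would launch optimum-benchmark with this example config here
    ...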

baptistecolle added the cli_cuda_pytorch_single_gpu and cli_rocm_pytorch_single_gpu labels and removed the examples label on Oct 14, 2024
baptistecolle added the examples label and removed the cli_cuda_pytorch_single_gpu label on Oct 15, 2024
@@ -7,7 +7,7 @@
from tqdm import tqdm
IlyasMoutawwakil (Member)

Better not to change anything in energy star; the energy star PR will take care of it, otherwise we will have conflicts.

@@ -24,7 +24,7 @@ class TRTLLMConfig(BackendConfig):
world_size: int = 1
gpus_per_node: int = 1

- max_prompt_length: int = 128
+ max_prompt_length: int = 256
IlyasMoutawwakil (Member)

Any specific reason?
