[GPU] llama compile regression test for sharktank exported mlir #19306
base: main
Conversation
@rsuderman fyi
Thanks for the test!
Please add more context to the PR title and description.
```python
###############################################################################


# This allows using as part of a matrix strat that includes cpu
def test_pass_cpu():
```
@ScottTodd any thoughts on this? It gives me mildly gross vibes, but it enables us to add a test here later if desired and means we don't have to branch on the matrix strategy for this suite of tests. Open to alternatives.
Given the existing test setup, I'd rather use `@pytest.mark.skip("reason")` instead of "passing" as a no-op: https://docs.pytest.org/en/stable/how-to/skipping.html
So:

```python
@pytest.mark.skip("llama tests not implemented for CPU yet")
def test_compile_llama_cpu():
    pass
```
A better method is used in these older (stale, currently not run) tests, which add marks for the platform and then run using filters:
iree/experimental/regression_suite/tests/pregenerated/test_llama2.py, lines 122 to 125 at 5708d42:

```python
@pytest.mark.presubmit
@pytest.mark.unstable_linalg
@pytest.mark.plat_rdna3_vulkan
def test_step_rdna3_vulkan_stripped(llama2_7b_f16qi4_stripped_rdna3_vulkan_vmfb):
```
iree/experimental/regression_suite/tests/pregenerated/test_llama2.py, lines 145 to 148 at 5708d42:

```python
@pytest.mark.presubmit
@pytest.mark.unstable_linalg
@pytest.mark.plat_host_cpu
def test_step_host_cpu_stripped(llama2_7b_f16qi4_stripped_host_cpu_vmfb):
```
iree/.github/workflows/pkgci_regression_test.yml, lines 174 to 181 at 5708d42:

```yaml
# TODO(#17344): regenerate .mlirbc files, test plat_rdna3_rocm on rocm
# # In-tree tests
# - name: Run experimental/regression_suite tests
#   run: |
#     source ${VENV_DIR}/bin/activate
#     pytest \
#       -rA -s -m "plat_host_cpu and presubmit" \
#       experimental/regression_suite
```
That lets new platforms be added in a structured way, without just relying on test case names following a convention.
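For illustration, a minimal sketch of how that marker-based pattern could be applied to the new llama tests. The marker names, fixture name, and test bodies here are assumptions for the sketch, not code from this PR:

```python
# conftest.py (sketch): register platform markers so they pass --strict-markers
# and can be selected with `pytest -m` filters.
def pytest_configure(config):
    config.addinivalue_line("markers", "plat_mi300_rocm: runs on the mi300 ROCm platform")
    config.addinivalue_line("markers", "plat_host_cpu: runs on the host CPU")


# test_llama.py (sketch): mark each test with its platform instead of encoding
# the platform in the test name or adding a no-op placeholder test.
import pytest


@pytest.mark.plat_mi300_rocm
def test_compile_llama_rocm(llama_mlir):
    ...  # compile the fetched MLIR for the ROCm target


@pytest.mark.plat_host_cpu
@pytest.mark.skip("llama tests not implemented for CPU yet")
def test_compile_llama_cpu():
    pass
```

The CI matrix entry for each platform then selects tests with a filter such as `pytest -m "plat_mi300_rocm"`, so adding a platform means adding a mark and a filter rather than another naming convention.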
Mostly LGTM but would like the generation process to be documented somewhere.
```yaml
ROCM_CHIP: ${{ matrix.rocm-chip }}
# Note: mi250 benchmark times are more lenient than mi300 (allowing about
```
Please add a newline between groups in this file
Suggested change:

```yaml
ROCM_CHIP: ${{ matrix.rocm-chip }}

# Note: mi250 benchmark times are more lenient than mi300 (allowing about
```
```python
llama_mlir = fetch_source_fixture(
    "https://sharkpublic.blob.core.windows.net/sharkpublic/halo-models/llm-dev/llama3_8b/8b_f16_decomposed_11_22.mlir",
    group="llama_fp16_8b",
)
```
Can you add a comment here indicating how this was generated? What versions of the packages were used, what commands were run, etc.
Ideally link to some documentation in https://github.com/nod-ai/shark-ai/tree/main/docs
See the earlier comment on #14915 (comment)
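As a sketch of the kind of provenance note being requested (the wording and details below are placeholders inferred from the file name and PR title, not the actual generation record):

```python
# Provenance (placeholder text): llama3 8B f16, decomposed variant, exported to
# MLIR with sharktank on 2024-11-22. Record the sharktank package version and
# the exact export command here, ideally linking to the docs in
# https://github.com/nod-ai/shark-ai/tree/main/docs.
llama_mlir = fetch_source_fixture(
    "https://sharkpublic.blob.core.windows.net/sharkpublic/halo-models/llm-dev/llama3_8b/8b_f16_decomposed_11_22.mlir",
    group="llama_fp16_8b",
)
```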
We ultimately want all of these coming from common places like that, which we update with a generation script. However, as long as we're in this phase where this one is a bit hand-massaged, I'm OK with Dan taking responsibility for keeping it updated.
;)
(the tests that PR updated are also linked in my other comment... and have been outdated + disabled for months. IDK how to update them since that isn't documented)
Pulls MLIR from a public Azure file and compiles it. Next steps include adding bin files to run with real inputs.
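For context, a rough standalone sketch of what such a compile-only regression check can look like. The compiler flags, default chip, and download approach are assumptions for illustration, not the exact code in this PR; the suite itself goes through `fetch_source_fixture` and shared helpers rather than raw downloads:

```python
# Hypothetical sketch: fetch the exported MLIR and check that iree-compile can
# build it for a ROCm/HIP target. Flag names and the default gfx target are
# assumptions; the real test uses the suite's fixtures instead.
import os
import subprocess
import urllib.request

MLIR_URL = (
    "https://sharkpublic.blob.core.windows.net/sharkpublic/"
    "halo-models/llm-dev/llama3_8b/8b_f16_decomposed_11_22.mlir"
)


def test_compile_llama_rocm(tmp_path):
    mlir_path = tmp_path / "llama3_8b_f16.mlir"
    urllib.request.urlretrieve(MLIR_URL, mlir_path)

    vmfb_path = tmp_path / "llama3_8b_f16.vmfb"
    subprocess.run(
        [
            "iree-compile",
            str(mlir_path),
            "--iree-hal-target-backends=rocm",
            # The chip comes from the CI matrix (ROCM_CHIP), e.g. gfx942 for mi300.
            f"--iree-hip-target={os.environ.get('ROCM_CHIP', 'gfx942')}",
            "-o",
            str(vmfb_path),
        ],
        check=True,  # a compile failure fails the test, i.e. a regression
    )
    assert vmfb_path.exists()
```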