[GPU] llama compile regression test for sharktank exported mlir #19306
base: main
Conversation
@rsuderman fyi
Thanks for the test!
Please add more context to the PR title and description.
```python
###############################################################################


# This allows using as part of a matrix strat that includes cpu
def test_pass_cpu():
```
@ScottTodd any thoughts on this? It gives me mildly gross vibes, but it enables us to add a test here later if desired and means we don't have to branch on the matrix strategy for this suite of tests. Open to alternatives.
Given the existing test setup, I'd rather use `@pytest.mark.skip("reason")` instead of "passing" as a no-op: https://docs.pytest.org/en/stable/how-to/skipping.html
So:

```python
@pytest.mark.skip("llama tests not implemented for CPU yet")
def test_compile_llama_cpu():
    pass
```
A better method is used in these older (stale, currently not run) tests, which add marks for the platform and then run using filters:
iree/experimental/regression_suite/tests/pregenerated/test_llama2.py, lines 122 to 125 at 5708d42:

```python
@pytest.mark.presubmit
@pytest.mark.unstable_linalg
@pytest.mark.plat_rdna3_vulkan
def test_step_rdna3_vulkan_stripped(llama2_7b_f16qi4_stripped_rdna3_vulkan_vmfb):
```
iree/experimental/regression_suite/tests/pregenerated/test_llama2.py, lines 145 to 148 at 5708d42:

```python
@pytest.mark.presubmit
@pytest.mark.unstable_linalg
@pytest.mark.plat_host_cpu
def test_step_host_cpu_stripped(llama2_7b_f16qi4_stripped_host_cpu_vmfb):
```
iree/.github/workflows/pkgci_regression_test.yml, lines 174 to 181 at 5708d42:

```yaml
# TODO(#17344): regenerate .mlirbc files, test plat_rdna3_rocm on rocm
# # In-tree tests
# - name: Run experimental/regression_suite tests
#   run: |
#     source ${VENV_DIR}/bin/activate
#     pytest \
#       -rA -s -m "plat_host_cpu and presubmit" \
#       experimental/regression_suite
```
That lets new platforms be added in a structured way, without just relying on test case names following a convention.
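For illustration, a minimal sketch of how that marker-based pattern could be applied to the new llama tests. The marker names, fixture name, and test bodies here are assumptions for the sketch, not code from this PR:

```python
# conftest.py (sketch): register platform markers so they pass --strict-markers
# and can be selected with `pytest -m` filters.
def pytest_configure(config):
    config.addinivalue_line("markers", "plat_mi300_rocm: runs on the mi300 ROCm platform")
    config.addinivalue_line("markers", "plat_host_cpu: runs on the host CPU")


# test_llama.py (sketch): mark each test with its platform instead of encoding
# the platform in the test name or adding a no-op placeholder test.
import pytest


@pytest.mark.plat_mi300_rocm
def test_compile_llama_rocm(llama_mlir):
    ...  # compile the fetched MLIR for the ROCm target


@pytest.mark.plat_host_cpu
@pytest.mark.skip("llama tests not implemented for CPU yet")
def test_compile_llama_cpu():
    pass
```

The CI matrix entry for each platform then selects tests with a filter such as `pytest -m "plat_mi300_rocm"`, so adding a platform means adding a mark and a filter rather than another naming convention.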
Mostly LGTM but would like the generation process to be documented somewhere.
```yaml
ROCM_CHIP: ${{ matrix.rocm-chip }}
# Note: mi250 benchmark times are more lenient than mi300 (allowing about
```
Please add a newline between groups in this file
Suggested change:

```yaml
ROCM_CHIP: ${{ matrix.rocm-chip }}

# Note: mi250 benchmark times are more lenient than mi300 (allowing about
```
```python
llama_mlir = fetch_source_fixture(
    "https://sharkpublic.blob.core.windows.net/sharkpublic/halo-models/llm-dev/llama3_8b/8b_f16_decomposed_11_22.mlir",
    group="llama_fp16_8b",
)
```
Can you add a comment here indicating how this was generated? What versions of the packages were used, what commands were run, etc.
Ideally link to some documentation in https://github.com/nod-ai/shark-ai/tree/main/docs
See the earlier comment on #14915 (comment)
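As a sketch of the kind of provenance note being requested (the wording and details below are placeholders inferred from the file name and PR title, not the actual generation record):

```python
# Provenance (placeholder text): llama3 8B f16, decomposed variant, exported to
# MLIR with sharktank on 2024-11-22. Record the sharktank package version and
# the exact export command here, ideally linking to the docs in
# https://github.com/nod-ai/shark-ai/tree/main/docs.
llama_mlir = fetch_source_fixture(
    "https://sharkpublic.blob.core.windows.net/sharkpublic/halo-models/llm-dev/llama3_8b/8b_f16_decomposed_11_22.mlir",
    group="llama_fp16_8b",
)
```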
We ultimately want all of these coming from common places like that, which we update with a generation script. However, as long as we're in this phase where this one is a bit hand-massaged, I'm OK with Dan taking responsibility for keeping it updated.
;)
(the tests that PR updated are also linked in my other comment... and have been outdated + disabled for months. IDK how to update them since that isn't documented)
Pulls MLIR from a public Azure file and compiles it. Next steps include adding bin files to run with real inputs.
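For context, a rough standalone sketch of what such a compile-only regression check can look like. The compiler flags, default chip, and download approach are assumptions for illustration, not the exact code in this PR; the suite itself goes through `fetch_source_fixture` and shared helpers rather than raw downloads:

```python
# Hypothetical sketch: fetch the exported MLIR and check that iree-compile can
# build it for a ROCm/HIP target. Flag names and the default gfx target are
# assumptions; the real test uses the suite's fixtures instead.
import os
import subprocess
import urllib.request

MLIR_URL = (
    "https://sharkpublic.blob.core.windows.net/sharkpublic/"
    "halo-models/llm-dev/llama3_8b/8b_f16_decomposed_11_22.mlir"
)


def test_compile_llama_rocm(tmp_path):
    mlir_path = tmp_path / "llama3_8b_f16.mlir"
    urllib.request.urlretrieve(MLIR_URL, mlir_path)

    vmfb_path = tmp_path / "llama3_8b_f16.vmfb"
    subprocess.run(
        [
            "iree-compile",
            str(mlir_path),
            "--iree-hal-target-backends=rocm",
            # The chip comes from the CI matrix (ROCM_CHIP), e.g. gfx942 for mi300.
            f"--iree-hip-target={os.environ.get('ROCM_CHIP', 'gfx942')}",
            "-o",
            str(vmfb_path),
        ],
        check=True,  # a compile failure fails the test, i.e. a regression
    )
    assert vmfb_path.exists()
```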