[CI] add a big GPU marker to run memory-intensive tests separately on CI #9691

sayakpaul · 2024-10-16T07:45:35Z

What does this PR do?

I have only applied the newly introduced marker to a handful of tests so far. We may need to change the slices depending on the CI machine and infra. @a-r-r-o-w, perhaps we should consider marking the Cog tests similarly as well?

@DN6 would love to get your thoughts on the design.
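For readers unfamiliar with the idea, here is a minimal sketch of how a memory-gated test decorator could look. The names `require_big_gpu`, `total_gpu_memory_gb`, and `BIG_GPU_MEMORY_GB`, as well as the 40 GB default, are assumptions made for illustration and are not necessarily what this PR adds to `testing_utils.py`.

```python
import os
import unittest

# Hypothetical threshold in GB; overridable through an environment variable.
BIG_GPU_MEMORY_GB = float(os.getenv("BIG_GPU_MEMORY_GB", "40"))


def total_gpu_memory_gb():
    """Return the total memory of CUDA device 0 in GB, or 0.0 if unavailable."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(0).total_memory / (1024**3)


def require_big_gpu(test_case, memory_fn=total_gpu_memory_gb):
    """Skip `test_case` unless the GPU has at least BIG_GPU_MEMORY_GB of memory."""
    return unittest.skipUnless(
        memory_fn() >= BIG_GPU_MEMORY_GB,
        f"test requires a GPU with at least {BIG_GPU_MEMORY_GB} GB of memory",
    )(test_case)
```

A test class decorated with `@require_big_gpu` would then be skipped on smaller runners. Pairing the decorator with a registered pytest marker would let CI select these tests with `pytest -m <marker>` on the larger machine; the marker name itself is left open here.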

sayakpaul · 2024-10-16T07:45:59Z

.github/workflows/nightly_tests.yml

@@ -2,6 +2,7 @@ name: Nightly and release tests on main/release branch

 on:
   workflow_dispatch:
+  pull_request:


This is temporary.

sayakpaul · 2024-10-16T07:46:13Z

.github/workflows/nightly_tests.yml

@@ -18,6 +19,7 @@ env:

 jobs:
   setup_torch_cuda_pipeline_matrix:
+    if: github.event_name == 'schedule'


HuggingFaceDocBuilderDev · 2024-10-16T07:54:43Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

a-r-r-o-w · 2024-10-16T11:09:27Z

> should consider marking the Cog tests similarly as well?

With model CPU offload and VAE tiling, memory usage should be < 16 GB, and I think we documented it here. Are we seeing Cog test failures due to memory? I see that they are passing here.
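As a rough illustration of the two memory savers mentioned above, the helper below turns them on for an already-loaded pipeline. `enable_low_memory_mode` is a hypothetical helper name. The sketch assumes the pipeline exposes `enable_model_cpu_offload()` and that its `vae` exposes `enable_tiling()`, as the CogVideoX pipelines in diffusers do. The < 16 GB figure comes from the comment above and is not verified here.

```python
def enable_low_memory_mode(pipe):
    """Enable model CPU offload and VAE tiling on a loaded pipeline (hypothetical helper)."""
    # Keep sub-models on the CPU and move each one to the GPU only when it is needed.
    pipe.enable_model_cpu_offload()
    # Let the VAE work tile by tile instead of on the full-resolution input at once.
    pipe.vae.enable_tiling()
    return pipe
```

Both options trade some speed for a lower peak memory footprint, which is why a test that enables them may not need the big-GPU marker.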

sayakpaul · 2024-10-16T11:10:14Z

Ah okay then. No issues.

.github/workflows/nightly_tests.yml

DN6

Nice work! 👍🏽

sayakpaul · 2024-10-16T13:38:06Z

@DN6 is it okay if I modify the failing tests to account for the machine change?

src/diffusers/utils/testing_utils.py

.github/workflows/nightly_tests.yml

sayakpaul · 2024-10-17T10:54:24Z

@DN6 can you give this a look? I think the test failures should go away once the CI Bot has access to Flux.

Once approved I will revert the changes which I have denoted as temporary (like this).

sayakpaul · 2024-10-18T06:51:59Z

tests/pipelines/controlnet_flux/test_controlnet_flux_img2img.py

-
-@slow
-@require_torch_gpu
-class FluxControlNetImg2ImgPipelineSlowTests(unittest.TestCase):


I don't think this test was written correctly: it doesn't pass the controlnet module to the pipeline, and it uses dummy inputs, which I think should be avoided in an integration test. LMK if you think otherwise.

sayakpaul · 2024-10-18T07:07:52Z

@DN6 regarding https://github.com/huggingface/diffusers/actions/runs/11398910357/job/31716739483?pr=9691#step:7:67, my hunch is that there's some kind of leakage happening which is causing the worker to crash. When I SSH'd into the runner and manually ran the test, it passed.
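If the crash does come from memory that earlier tests leave behind, one common mitigation is to free cached memory around every test. The sketch below is an assumption about how that could be done, not a confirmed fix for the failing job. `free_memory` and `CleanMemoryTestCase` are hypothetical names.

```python
import gc
import unittest


def free_memory():
    """Run the garbage collector and release cached CUDA memory when torch is present."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


class CleanMemoryTestCase(unittest.TestCase):
    """Base class that frees memory before and after each test."""

    def setUp(self):
        super().setUp()
        free_memory()

    def tearDown(self):
        super().tearDown()
        free_memory()
```

Slow tests could inherit from such a base class so that one test's leftover allocations are less likely to affect the next test on the same worker.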

sayakpaul added 3 commits October 16, 2024 12:36

add a marker for big gpu tests

32e23d8

update

da92ca0

trigger on PRs temporarily.

219a3cc

sayakpaul requested a review from DN6 October 16, 2024 07:45

sayakpaul commented Oct 16, 2024

sayakpaul added 2 commits October 16, 2024 13:18

onnx

c679563

fix

a0bae4b

sayakpaul added 4 commits October 16, 2024 13:34

total memory

95f396e

fixes

02f0aa3

reduce memory threshold.

9441016

bigger gpu

15d1127

sayakpaul added 3 commits October 16, 2024 16:40

Merge branch 'main' into big-model-marker

6c82fd4

empty

676b8a5

g6e

3b50732

DN6 reviewed Oct 16, 2024

.github/workflows/nightly_tests.yml

sayakpaul commented Oct 16, 2024

.github/workflows/nightly_tests.yml

Apply suggestions from code review

9ef5435

DN6 approved these changes Oct 16, 2024

DN6 reviewed Oct 16, 2024

src/diffusers/utils/testing_utils.py

DN6 reviewed Oct 16, 2024

.github/workflows/nightly_tests.yml

sayakpaul added 5 commits October 17, 2024 10:58

address comments.

4ff06b4

fix

46cab82

fix

2b25688

fix

b0568da

fix

928dd73

sayakpaul added 13 commits October 17, 2024 12:28

fix

9020d8f

okay

2732720

further reduce.

f265f7d

updates

1755305

remove

fcb57ae

updates

6f477ac

updates

ff47576

updates

1ad8c64

updates

605a21d

fixes

9e1cacb

fixes

0704d9a

updates.

c9fd1ab

Merge branch 'main' into big-model-marker

f8086f6

sayakpaul requested a review from DN6 October 17, 2024 10:53

sayakpaul added 2 commits October 18, 2024 07:52

Merge branch 'main' into big-model-marker

e31b0bd

fix

cf280ba

sayakpaul commented Oct 18, 2024

a-r-r-o-w mentioned this pull request Oct 18, 2024

[CI] pin max torch version to fix CI errors #9709

Merged

a-r-r-o-w and others added 2 commits October 20, 2024 01:51

Merge branch 'main' into big-model-marker

5b9c771

Merge branch 'main' into big-model-marker

0e07597
