
Remove graph breaks for torch.compile() in padding free branch in DataCollatorForCompletionOnlyLM #2158

Open · Abhishek-TAMU wants to merge 9 commits into main

Conversation


@Abhishek-TAMU Abhishek-TAMU commented Oct 3, 2024

What does this PR do?

This PR adds `cu_seq_lens_q`, `cu_seq_lens_k`, `max_length_k`, and `max_length_q` to the batch produced by `DataCollatorForCompletionOnlyLM`. Together with a companion PR in transformers (link to be added), this removes the graph breaks that `torch.compile()` hits during padding-free tuning, allowing maximum performance to be obtained.
Specifically, these parameters should be generated in the collator (this PR's change), outside the transformers forward loop, because computing them incurs a cpu-gpu sync that is unavoidable. Otherwise that sync happens inside the attention call, which causes graph breaks; the companion transformers PR removes that call so that no graph breaks remain when the `torch_compile` flag is enabled in the training arguments used with `SFTTrainer`.
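A minimal sketch of the collator-side computation, assuming the padding-free layout in which the batch is a single packed row whose `position_ids` reset to 0 at every sequence boundary; `_flash_attn_kwargs` is a hypothetical helper name for illustration, not the PR's actual code:

```python
import torch


def _flash_attn_kwargs(position_ids: torch.Tensor) -> dict:
    """Derive FlashAttention varlen metadata from packed position_ids.

    Expects a padding-free batch of shape (1, total_tokens) whose
    position_ids restart at 0 at each sequence boundary,
    e.g. [0, 1, 2, 0, 1, 0, 1, 2, 3] for three packed sequences.
    """
    pos = position_ids.flatten()
    # Sequence starts are wherever the position counter resets to 0;
    # appending total_tokens turns them into cumulative lengths:
    # [0, len_0, len_0 + len_1, ..., total_tokens].
    starts = (pos == 0).nonzero(as_tuple=True)[0]
    cu_seq_lens = torch.cat(
        [starts, torch.tensor([pos.numel()], dtype=starts.dtype)]
    ).to(torch.int32)
    # This .max() is the sync that moves out of the model's forward:
    # in the collator it runs on CPU tensors, so no cpu-gpu sync is
    # left inside the compiled graph.
    max_length = int((cu_seq_lens[1:] - cu_seq_lens[:-1]).max())
    # Queries and keys share the same layout in self-attention.
    return {
        "cu_seq_lens_q": cu_seq_lens,
        "cu_seq_lens_k": cu_seq_lens,
        "max_length_q": max_length,
        "max_length_k": max_length,
    }
```

The collator would then fold these into the returned batch, e.g. `batch.update(_flash_attn_kwargs(batch["position_ids"]))`, so they reach the model as keyword arguments.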

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Abhishek-TAMU Abhishek-TAMU changed the title Add Sequence Lengths to Batch in DataCollatorForCompletionOnlyLM Remove graph breaks for torch.compile() in padding free branch in DataCollatorForCompletionOnlyLM Oct 3, 2024
@Abhishek-TAMU Abhishek-TAMU marked this pull request as ready for review October 3, 2024 15:42
@Abhishek-TAMU (Author)

CC: @kashif @qgallouedec

@kashif kashif added the ✨ enhancement and 🏋 SFT labels Oct 6, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec (Member) commented Oct 10, 2024

Hi, thanks for the PR.
Can you provide a link to the transformers PR? Is it huggingface/transformers#33932?

@qgallouedec (Member)

Could you provide a simple test that:

  1. Confirms the current failure (the graph breaks that occur without this change).
  2. Verifies that this addition resolves it.

It might also be helpful to add a few comments, as these lines are unclear without context.
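A minimal sketch of the collator-level half of such a test, assuming `DataCollatorForCompletionOnlyLM` accepts a `padding_free` flag (as the padding-free branch suggests); the model name and templates below are placeholders:

```python
from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM


def test_padding_free_collator_emits_flash_attn_kwargs():
    # "gpt2" and the templates are placeholders for illustration.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token
    collator = DataCollatorForCompletionOnlyLM(
        response_template="### Answer:",
        tokenizer=tokenizer,
        padding_free=True,
    )
    examples = [
        tokenizer("### Question: 2+2\n### Answer: 4"),
        tokenizer("### Question: capital of France\n### Answer: Paris"),
    ]
    batch = collator(examples)
    # The new keys must be present in the collated batch...
    for key in ("cu_seq_lens_q", "cu_seq_lens_k", "max_length_q", "max_length_k"):
        assert key in batch
    # ...and cu_seq_lens must span the packed row: starting at 0 and
    # ending at the total number of tokens.
    assert batch["cu_seq_lens_q"][0] == 0
    assert batch["cu_seq_lens_q"][-1] == batch["input_ids"].numel()
```

Checking that the graph breaks are actually gone would additionally need the companion transformers change, e.g. by compiling a small model on such a batch and inspecting the graph-break count reported by `torch._dynamo.explain`.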

@qgallouedec qgallouedec added the 🐛 bug label and removed the ✨ enhancement label Oct 10, 2024