Enable users to use their own loss functions + deal with prefetching for grad accum #34198

Merged: 20 commits merged into main from muellerzr-fix-loss-calc on Oct 17, 2024

Conversation

muellerzr (Contributor) commented Oct 16, 2024

What does this PR do?

In conjunction with #34191, this PR solves the other half of what's needed:

  1. Letting users pass in their own loss functions directly to the Trainer via compute_loss (a usage sketch follows this list)
  2. Prefetching the first gradient_accumulation_steps' worth of data at each complete step and recording how many items were seen (num_items_in_batch), which can be passed to a loss function if it accepts num_items_seen (name TBD)
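
A minimal sketch of what this could look like from a user's perspective. The loss-function signature (outputs, labels, num_items_in_batch) follows this PR's description, and the Trainer keyword argument name changed during review (compute_loss → compute_loss_fn), so treat the names below as illustrative rather than the final API:

import torch.nn.functional as F


def per_token_causal_lm_loss(outputs, labels, num_items_in_batch=None):
    """Hypothetical user-supplied loss; signature per this PR's description."""
    # Standard causal-LM shift: position t predicts token t+1.
    logits = outputs.logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
        reduction="sum",
    )
    # Normalize over the whole gradient-accumulation window rather than per
    # micro-batch, which is the behavior this PR makes possible.
    if num_items_in_batch is not None:
        loss = loss / num_items_in_batch
    return loss


# Passed to the Trainer via the new keyword argument (name per this PR's commits;
# `model` and `train_dataset` are placeholders defined elsewhere):
# trainer = Trainer(
#     model=model,
#     args=TrainingArguments(output_dir="out", gradient_accumulation_steps=4),
#     train_dataset=train_dataset,
#     compute_loss_fn=per_token_causal_lm_loss,
# )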

Some feedback we need to coordinate on:

  • Should it be called num_items_in_batch and then passed through to the loss functions as such, or is there a better name we can think of?

Fixes huggingface/trl#2175

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@LysandreJik @ArthurZucker

muellerzr marked this pull request as ready for review on October 16, 2024 at 17:29
ArthurZucker (Collaborator) left a comment:

LGTM, IMO a regression test on the grad norms could be fairly nice!

Comment on lines 2463 to 2472
self.state.num_input_tokens_seen += (
    torch.sum(
        self.accelerator.gather(
            torch.tensor(
                inputs[main_input_name].numel(), device=self.args.device, dtype=torch.int64
            )
        )
    )
    .cpu()
    .item()
Collaborator:
let's make this more readable!

Contributor Author (muellerzr):
clean did this one 🫠

Collaborator:
you can split in 3-4 lines 🎐
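
One way the suggestion could look, purely as a readability sketch with the same behavior as the snippet above (names mirror the surrounding Trainer code; the refactor that was actually merged may differ):

# Count tokens in this micro-batch, gather across processes, then accumulate.
num_tokens = inputs[main_input_name].numel()
num_tokens = torch.tensor(num_tokens, device=self.args.device, dtype=torch.int64)
num_tokens = self.accelerator.gather(num_tokens).sum()
self.state.num_input_tokens_seen += num_tokens.cpu().item()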

Comment on lines 3644 to 3645
if (self.label_smoother is not None or self.compute_loss is not None) and "labels" in inputs:
    labels = inputs.pop("labels")
Collaborator:
mmmm if people don't pass a loss, we won't use the model's default?

Contributor Author (muellerzr):
We will, it stays in inputs and gets passed to the model's forward()
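
A simplified sketch of the control flow under discussion (names follow the quoted snippet; the real Trainer code has more branches):

# Labels are only popped when the loss will be computed outside the model
# (label smoothing or a user-supplied loss); otherwise they stay in `inputs`.
if (self.label_smoother is not None or self.compute_loss is not None) and "labels" in inputs:
    labels = inputs.pop("labels")
else:
    labels = None

outputs = model(**inputs)
if labels is None:
    # "labels" stayed in inputs, so forward() computed the model's default loss
    loss = outputs["loss"]
else:
    # custom / label-smoothed loss path (simplified)
    loss = self.compute_loss(outputs, labels)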

muellerzr (Contributor Author) commented Oct 17, 2024

A bit more context: full fine-tuning does NOT SEEM TO BE IMPACTED BY THIS (when padding). I am looking into how this directly affects TRL; however, things are not as bad as they may seem.

(Below is an example CausalLM result comparing grad accum 4, bs 8 vs bs 32 both before and after this fix)

[Image: loss curves for the CausalLM comparison described above]

# For now we don't support object detection
try:
    num_items_in_batch = sum(
        [data_batch["labels"][..., 1:].ne(-100).sum().item() for data_batch in batch_samples]
Member:
I already quickly discussed this with Zach, so this is a more general question to other reviewers:

Would this line work for all the different task types we support? Specifically, can we always skip the first item in the sequence, i.e. is the [..., 1:] part valid?

Contributor:
For causal, auto-regressive models it works, but it won't work for other ones.
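
A small self-contained illustration of why the [..., 1:] shift matches causal-LM token counting (labels are shifted by one for next-token prediction, so position 0 never contributes to the loss), and why the same shift would miscount for tasks whose labels are not shifted:

import torch

# -100 is the ignore index; only non-ignored positions after the shift count.
labels = torch.tensor([
    [-100, 42, 43, -100],  # 2 supervised tokens after dropping position 0
    [-100, 7, 8, 9],       # 3 supervised tokens after dropping position 0
])
num_items_in_batch = labels[..., 1:].ne(-100).sum().item()
print(num_items_in_batch)  # 5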

muellerzr changed the title from "[DRAFT] Enable users to use their own loss functions + deal with prefetching for grad accum" to "Enable users to use their own loss functions + deal with prefetching for grad accum" on Oct 17, 2024
danielhanchen (Contributor) left a comment:

Just a denominator change in the test case

ArthurZucker (Collaborator) left a comment:

Feel free to merge!

muellerzr merged commit 6ba31a8 into main on Oct 17, 2024 (25 of 26 checks passed)
muellerzr deleted the muellerzr-fix-loss-calc branch on October 17, 2024 at 21:01
NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Oct 21, 2024
…for grad accum (huggingface#34198)

* bookmark

* Bookmark

* Bookmark

* Actually implement

* Pass in kwarg explicitly

* Adjust for if we do or don't have labels

* Bookmark fix for od

* bookmark

* Fin

* closer

* Negate accelerate grad accum div

* Fixup not training long enough

* Add in compute_loss to take full model output

* Document

* compute_loss -> compute_loss_fn

* Add a test

* Refactor

* Refactor

* Uncomment tests

* Update tests/trainer/test_trainer.py

Co-authored-by: Daniel Han <[email protected]>

---------

Co-authored-by: Daniel Han <[email protected]>
stevhliu pushed a commit to stevhliu/transformers that referenced this pull request Oct 21, 2024
…for grad accum (huggingface#34198)

(same commit message as above)
Development

Successfully merging this pull request may close these issues.

Gradient accumulation yields worse results than the equivalent batch size
6 participants