
Add Transformers benchmarks. #97

Draft: wants to merge 13 commits into main
Conversation

Vaibhavs10 (Owner)
Closes: #96

(Preliminary script for now)

Vaibhavs10 marked this pull request as draft on December 1, 2023 at 18:17.

Vaibhavs10 (Owner, Author)

Maybe I should test for different chunk lengths too?

Vaibhavs10 (Owner, Author)

Steps to build the environment:

python -m venv benchmark
source benchmark/bin/activate
pip install --no-cache-dir torch transformers optimum accelerate wheel flash-attn --no-build-isolation

Vaibhavs10 (Owner, Author)

Cool, I think we're done here! 🚀

You should be able to run this script via python script/benchmark.py.

Here's what the output looks like:

Running Model: openai/whisper-large-v3
Flash Attention: True
Batch Size: 1
Total time: 7.989270925521851
Total memory: 3608.0

All the results are saved to a benchmark.json file.

Next step: Let this run on an A100 and a T4 (I'll probably only get to this tomorrow!)
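
For context, here is a minimal sketch of how the timing and memory numbers above could be produced with the Transformers pipeline. The model name, audio path, and flash-attention setting below are illustrative and not necessarily exactly what script/benchmark.py does:

# Minimal sketch, not the exact benchmark.py: time one pipeline call and report peak GPU memory.
import time
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
    # requires flash-attn; on older transformers versions the flag was use_flash_attention_2=True
    model_kwargs={"attn_implementation": "flash_attention_2"},
)

torch.cuda.reset_peak_memory_stats()
start = time.time()
outputs = pipe("sample.wav", chunk_length_s=30, batch_size=1, return_timestamps=True)  # "sample.wav" is a placeholder
total_time = time.time() - start

max_mem_mb = torch.cuda.max_memory_reserved() / (1024 * 1024)
print(f"Total time: {total_time}")
print(f"Total memory: {max_mem_mb}")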

sanchit-gandhi (Contributor) left a comment


Feel free to have a look at what we used for Distil-Whisper benchmarking as an example of evaluating over a dataset and computing the WER: https://github.com/huggingface/distil-whisper/blob/914dcdf3919552d5a3826a9d5db99b059ddcc16e/training/flax/run_speed_pt.py#L595

=> It's a bit messy, but it largely follows the same structure as the short-form script. You can just pull out the logic we use for the pipeline here (rather than the original Whisper or model + processor).
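
As a rough, hedged sketch of what that pipeline-based evaluation could look like (the dataset, split, and column names are placeholders; the linked run_speed_pt.py remains the authoritative reference):

# Rough sketch: evaluate the pipeline over a dataset and compute WER (placeholders throughout).
from datasets import load_dataset
from evaluate import load
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3", device="cuda:0")
dataset = load_dataset("librispeech_asr", "clean", split="validation[:16]")  # placeholder dataset/split

predictions, references = [], []
for sample in dataset:
    out = pipe(sample["audio"], chunk_length_s=30, return_timestamps=True)
    predictions.append(out["text"])
    references.append(sample["text"])  # reference column name depends on the dataset

wer = load("wer").compute(predictions=predictions, references=references)
print(f"WER: {100 * wer:.2f}%")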

start = time.time()
outputs = pipe(
    file_name,
    chunk_length_s=30,
sanchit-gandhi (Contributor)

The configuration is a bit different for Whisper vs. Distil-Whisper (see the sketch after this list):

  • Whisper: chunk length 30 s, with timestamps
  • Distil-Whisper: chunk length 15 s, without timestamps
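
A short sketch of the two call patterns, reusing pipe, file_name, and batch_size from the snippet above (the exact kwargs in this PR may differ):

# Whisper: 30 s chunks, with timestamps
whisper_outputs = pipe(file_name, chunk_length_s=30, batch_size=batch_size, return_timestamps=True)

# Distil-Whisper: 15 s chunks, without timestamps
distil_outputs = pipe(file_name, chunk_length_s=15, batch_size=batch_size, return_timestamps=False)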

    chunk_length_s=30,
    batch_size=batch_size,
    return_timestamps=True,
)
sanchit-gandhi (Contributor)

We should also force the language/task for multilingual Whisper/Distil-Whisper checkpoints.
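
One way to do that with the pipeline, sketched with example values (generate_kwargs is forwarded to model.generate; "en"/"transcribe" are just illustrative choices, and pipe/file_name/batch_size are reused from the snippet above):

# Sketch: pin language and task for multilingual checkpoints (example values).
outputs = pipe(
    file_name,
    chunk_length_s=30,
    batch_size=batch_size,
    return_timestamps=True,
    generate_kwargs={"language": "en", "task": "transcribe"},
)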


max_mem = torch.cuda.max_memory_reserved()  # peak GPU memory reserved by the caching allocator, in bytes
max_mem_mb = max_mem / (1024 * 1024)  # convert bytes to MB
print(f"Total memory: {max_mem_mb}")
sanchit-gandhi (Contributor)

As mentioned offline, "relative memory" is probably the best estimate we can give for memory, if we decide to provide one.
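
Purely as an illustration of one possible reading of "relative memory" (peak memory during inference minus the memory already reserved after model load), a sketch reusing pipe, file_name, and batch_size from above:

# Illustrative only: report inference memory relative to a post-model-load baseline.
torch.cuda.reset_peak_memory_stats()
baseline_mem = torch.cuda.memory_reserved()  # memory held right after loading the model
outputs = pipe(file_name, chunk_length_s=30, batch_size=batch_size, return_timestamps=True)
relative_mem_mb = (torch.cuda.max_memory_reserved() - baseline_mem) / (1024 * 1024)
print(f"Relative memory: {relative_mem_mb}")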
