Add support for huggingface ASR datasets #749
base: main
Conversation
del inputs
del targets
del outputs
torch.cuda.empty_cache()
What is the reason for this explicit empty_cache() call?
I noticed that the data from previous batches was not freed, which led to the evaluator crashing with an out-of-memory error after a few batches.
Quoting #740 (comment)
I noticed that running the evaluation with max_samples > 20 on a 15GB VRAM setup was causing a CUDA out-of-memory error. To address this, I made improvements to better manage and free up memory. With these updates, you can now run the evaluation on the entire dataset, provided the batch_size fits within the available memory. I successfully tested this on the whole test split (approximately 2939 examples) using whisper-tiny and wav2vec2_asr_base_10h with max_num_elements=10.
...
Edit: max_num_elements can also be increased significantly by defining max_audio_len, as some audio samples were quite long and the whole batch gets padded to the longest sequence, causing some batches to require a lot more memory than expected.
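For context, the pattern being discussed is roughly the following: drop the per-batch references and empty the CUDA cache between batches so peak memory stays bounded. A minimal standalone sketch (the model, batch iterable, and function names are placeholders, not the PR's actual code):

import torch


def evaluate(model, batches, device="cuda"):
    """Run inference batch by batch while keeping GPU memory bounded."""
    results = []

    with torch.inference_mode():
        for inputs, targets in batches:
            inputs = inputs.to(device)

            outputs = model(inputs)

            # Move what we need to the CPU before dropping GPU references.
            results.append(outputs.detach().cpu())

            # Drop the per-batch tensors so their CUDA memory can be reused...
            del inputs, targets, outputs
            # ...and return cached blocks to the allocator so the next batch
            # (which may be padded to a longer sequence) does not OOM.
            torch.cuda.empty_cache()

    return results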
@@ -82,97 +125,133 @@ class AsrEvalConfig:
    """The data type of the model."""


asr_eval_presets = ConfigRegistry[AsrEvalConfig]()


@dataclass
class EvalSeqBatch:
The Seq2SeqBatch class expects targets to be tensors, which is useful for training. However, during evaluation, encoding the target only to decode it again without using the tensors seems redundant.
Is there a better approach? Another fairseq2 data structure to use?
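For illustration, here is a minimal sketch of what an evaluation-only batch could look like, keeping the raw reference strings instead of target tensors. The field names and types are assumptions for the sketch, not the PR's actual definition:

from dataclasses import dataclass
from typing import Optional

from torch import Tensor


@dataclass
class EvalSeqBatch:
    """A hypothetical evaluation batch: padded audio plus raw reference text."""

    source_seqs: Tensor
    """The padded source (audio) sequences."""

    source_padding_mask: Optional[Tensor]
    """The padding mask of ``source_seqs``, or ``None`` if unpadded."""

    ref_texts: list[str]
    """The reference transcriptions, kept as strings since evaluation only
    compares decoded hypotheses against them (e.g. for WER)."""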
What does this PR do? Please describe:
This PR builds upon #740, incorporating its refactoring changes and adding new commits that enable support for defining other datasets via configuration.
Example:
This evaluates a wav2vec2 model on the google/fleurs dataset by simply overriding the relevant configs.
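The exact override is not preserved here, but for reference this is roughly what such a configuration points at under the hood via the Hugging Face datasets library (a standalone sketch, not the PR's code; the dataset name "google/fleurs", config "en_us", split "test", and the "audio"/"transcription" columns are real HF identifiers):

from datasets import load_dataset

# Load the Google FLEURS English test split from the Hugging Face Hub.
fleurs = load_dataset("google/fleurs", "en_us", split="test")

# Each example exposes an "audio" dict and a "transcription" string, i.e.
# exactly the (audio, reference text) pair an ASR evaluation recipe needs.
sample = fleurs[0]
print(sample["audio"]["sampling_rate"])
print(sample["transcription"])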
Does your PR introduce any breaking changes? If yes, please list them:
None.
Check list: