Make the Parallelizer class work for inference #715

michaelbenayoun · 2024-10-10T14:17:52Z

What does this PR do?

As per title.

JingyaHuang · 2024-10-22T15:12:47Z

optimum/neuron/distributed/encoder_decoder_models.py

@@ -111,6 +111,7 @@ def transform(
 sequence_parallel_enabled: bool = False,
 device: Optional[torch.device] = None,
 should_parallelize_layer_predicate_func: Optional[Callable[[torch.nn.Module], bool]] = None,
+ **parallel_layer_specific_kwargs,


In my experience with T5, most of the arguments in parallel_layer_specific_kwargs that get passed to the transform() functions of T5's parallel modules raised errors, e.g.:

TypeError: transform() got an unexpected keyword argument 'skip_linear_weight_load'
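For illustration, here is a minimal sketch of how this kind of error arises and one possible way to guard against it. It is not the code from this PR: `strict_transform` and `filter_kwargs_for` are hypothetical names, and the sketch only assumes that some `transform()` implementations have a fixed signature while the caller forwards every layer-specific keyword argument to all of them.

```python
import inspect
from typing import Any, Callable, Dict


def filter_kwargs_for(func: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep only the keyword arguments that `func` accepts."""
    params = inspect.signature(func).parameters
    # If func declares **kwargs itself, everything can be forwarded unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in keyword_kinds
    }


# Hypothetical stand-in for a parallel layer whose transform() has a fixed signature.
def strict_transform(layer: str, sequence_parallel_enabled: bool = False) -> str:
    return f"{layer}:sp={sequence_parallel_enabled}"


parallel_layer_specific_kwargs = {
    "sequence_parallel_enabled": True,
    "skip_linear_weight_load": True,  # not accepted by strict_transform
}

# Forwarding everything blindly reproduces the kind of error reported above.
error_message = ""
try:
    strict_transform("attention", **parallel_layer_specific_kwargs)
except TypeError as exc:
    error_message = str(exc)

# Filtering first forwards only what the target signature declares.
accepted = filter_kwargs_for(strict_transform, parallel_layer_specific_kwargs)
result = strict_transform("attention", **accepted)
```

The alternative is to have every `transform()` accept and ignore `**kwargs`, which is simpler but hides misspelled argument names; filtering at the call site keeps the individual signatures strict.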

Fix small issue

dc4b0fd

JingyaHuang reviewed Oct 22, 2024
