Changelog for DilatedAttention with ParallelWrapper:
1. Added ParallelWrapper Class
- Introduced a ParallelWrapper class to simplify the usage of data parallelism.
- The ParallelWrapper class:
  - Takes a neural network model as input.
  - Allows the user to specify a device ("cuda" or "cpu").
  - Contains a flag, use_data_parallel, to enable or disable data parallelism.
  - Checks if multiple GPUs are available and applies nn.DataParallel to the model accordingly.
  - Redirects attribute accesses to the internal model for seamless usage.
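The behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the constructor signature, the CPU fallback, and the unwrapping of nn.DataParallel inside __getattr__ are assumptions made for the example, and nn.Linear is a stand-in for the DilatedAttention model.

```python
import torch
import torch.nn as nn


class ParallelWrapper:
    """Illustrative sketch: optionally wraps a model in nn.DataParallel."""

    def __init__(self, model, device="cuda", use_data_parallel=True):
        # Assumption: fall back to CPU if CUDA was requested but is unavailable.
        if str(device).startswith("cuda") and not torch.cuda.is_available():
            device = "cpu"
        self.device = device
        self.use_data_parallel = use_data_parallel
        self.model = model.to(device)
        # Apply nn.DataParallel only when enabled and more than one GPU is visible.
        if (
            use_data_parallel
            and str(device).startswith("cuda")
            and torch.cuda.device_count() > 1
        ):
            self.model = nn.DataParallel(self.model)

    def __call__(self, *args, **kwargs):
        # Forward the call to the (possibly wrapped) model.
        return self.model(*args, **kwargs)

    def __getattr__(self, name):
        # Invoked only when normal lookup fails; guard against recursion
        # in case "model" itself has not been set yet.
        if name == "model":
            raise AttributeError(name)
        inner = self.model
        if isinstance(inner, nn.DataParallel):
            inner = inner.module  # reach the original model's attributes
        return getattr(inner, name)


# Stand-in model for illustration only (not DilatedAttention).
wrapped = ParallelWrapper(nn.Linear(4, 2), device="cpu", use_data_parallel=False)
output = wrapped(torch.zeros(3, 4))
```

Because __getattr__ is only consulted when ordinary lookup fails, attributes set on the wrapper itself (device, use_data_parallel, model) are returned directly, while anything else, such as in_features on the stand-in layer, is looked up on the wrapped model.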
2. Modified Usage of DilatedAttention Model
- Wrapped the DilatedAttention model using the ParallelWrapper class.
- Enabled the model to be run on multiple GPUs if available.
3. Device Assignment
- Explicitly defined a device and used it to specify where the DilatedAttention model should be loaded.
- The device defaults to GPU (cuda:0) if CUDA is available; otherwise, it defaults to CPU.
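The device selection described here is commonly written as shown below. This is a sketch under stated assumptions: nn.Linear is a hypothetical placeholder for the DilatedAttention model, and the tensor shapes are arbitrary.

```python
import torch
import torch.nn as nn

# Prefer the first GPU when CUDA is available; otherwise use the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Placeholder for the DilatedAttention model (hypothetical stand-in).
model = nn.Linear(8, 8).to(device)

# Inputs must live on the same device as the model's parameters.
x = torch.randn(2, 8, device=device)
y = model(x)
```

Creating the input tensor directly on the chosen device avoids a device-mismatch error when the model has been moved to the GPU.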
4. Example Usage
- Provided an example of how to initialize and use the ParallelWrapper with the DilatedAttention model.
Summary:
The key addition is the ParallelWrapper class, which makes data parallelism easy to configure for the provided DilatedAttention model. The model can now run across multiple GPUs, when they are available, without any significant change to the existing workflow, and the user can enable or disable data parallelism with a single flag.