Release 0.4.8 · kyegomez/LongNet

Changelog for DilatedAttention with ParallelWrapper:

1. Added `ParallelWrapper` Class

Introduced a `ParallelWrapper` class to simplify the use of data parallelism.
The `ParallelWrapper` class:
- Takes a neural network model as input.
- Lets the user specify a device (`"cuda"` or `"cpu"`).
- Exposes a `use_data_parallel` flag to enable or disable data parallelism.
- When the flag is set and more than one GPU is available, wraps the model in `nn.DataParallel`.
- Redirects attribute access to the internal model, so the wrapper can be used like the model itself.
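
The behaviour listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the class as shipped in the release: the constructor signature, the `__call__` method, the unwrapping of `nn.DataParallel` inside `__getattr__`, and the extra check that the device is a CUDA device are all assumptions, and `nn.Linear` is only a stand-in for `DilatedAttention`.

```python
import torch
import torch.nn as nn


class ParallelWrapper:
    """Illustrative sketch of a data-parallel wrapper (not the library source)."""

    def __init__(self, model: nn.Module, device: str = "cuda",
                 use_data_parallel: bool = True):
        self.device = device
        self.use_data_parallel = use_data_parallel
        self.model = model.to(device)
        # Wrap only when requested, when targeting CUDA, and when more than
        # one GPU is visible. The CUDA check is an assumption added here so
        # that a CPU model is never handed to nn.DataParallel.
        if (use_data_parallel and str(device).startswith("cuda")
                and torch.cuda.device_count() > 1):
            self.model = nn.DataParallel(self.model)

    def __call__(self, *args, **kwargs):
        # Forward calls to the (possibly data-parallel) model.
        return self.model(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup on the wrapper fails.
        if name == "model":
            raise AttributeError(name)  # avoid recursion before __init__ runs
        inner = self.model
        if isinstance(inner, nn.DataParallel):
            inner = inner.module  # DataParallel keeps the original model here
        return getattr(inner, name)


# Stand-in module; in the release this would be the DilatedAttention model.
layer = nn.Linear(8, 4)
wrapped = ParallelWrapper(layer, device="cpu", use_data_parallel=False)

out = wrapped(torch.randn(2, 8))
print(out.shape)            # torch.Size([2, 4])
print(wrapped.in_features)  # 8, resolved on the inner model
```

Keeping the wrapper a plain class rather than an `nn.Module` subclass keeps the attribute redirection simple, at the cost that the wrapper itself is not a module.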

2. Modified Usage of `DilatedAttention` Model

Wrapped the `DilatedAttention` model in the `ParallelWrapper` class, so the model can run on multiple GPUs when they are available.

3. Device Assignment

Explicitly defined a device and used it to specify where the `DilatedAttention` model is loaded.
The device defaults to the first GPU (`cuda:0`) if CUDA is available and to the CPU otherwise.
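
A common way to express this default in PyTorch is sketched below. The variable names are illustrative, and `nn.Linear` is again only a stand-in for the `DilatedAttention` model, whose constructor arguments are not given in these notes.

```python
import torch
import torch.nn as nn

# Prefer the first GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in module (hypothetical placeholder for the DilatedAttention model).
model = nn.Linear(16, 16).to(device)

# Inputs must live on the same device as the model's parameters.
x = torch.randn(4, 16, device=device)
y = model(x)
print(y.device)
```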

4. Example Usage

Provided an example of how to initialize and use the ParallelWrapper with the DilatedAttention model.

Summary:

The key addition is the `ParallelWrapper` class, which makes data parallelism with the `DilatedAttention` model easy to configure. The model can now scale across multiple GPUs without significant changes to the existing workflow, and data parallelism can be enabled or disabled with a single flag (`use_data_parallel`).
