DropGrad is a regularization method for neural networks that works by randomly (and independently) setting gradient values to zero before an optimization step. Like Dropout, it has a single parameter, `drop_rate`, the probability of setting each parameter gradient to zero. To de-bias the remaining gradient values, they are divided by `1.0 - drop_rate`, so the expected value of each gradient entry is unchanged.
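To illustrate the idea, here is a minimal sketch of the per-gradient operation (not the package's internal implementation):

```python
import torch

drop_rate = 0.1
grad = torch.randn(4, 4)                              # a parameter's gradient
mask = (torch.rand_like(grad) >= drop_rate).float()   # keep each entry with probability 1 - drop_rate
dropped_grad = grad * mask / (1.0 - drop_rate)        # zero the dropped entries, re-scale the rest to de-bias
```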
- Features
- What's New in Version 0.3.5?
- Directory Structure
- Installation
- Usage
- Examples
- Testing
- Analysis
- Windows CUDA Setup
- Contributing
- License
- Star History
- Simple and easy-to-use gradient regularization technique
- Compatible with various optimizers and learning rate schedulers
- Supports per-parameter drop rates for fine-grained control
- Implements drop rate schedulers for dynamic regularization
- Provides an option to apply "full" update drop for further regularization
- Utilizes mixed-precision training for improved performance and memory efficiency (CUDA devices only)
- Cross-platform compatibility: Works seamlessly on macOS, Windows, and Linux
- Added support for the Lion optimizer in the ViT experiments
- Implemented gradient clipping to prevent gradient explosion and improve training stability
- Enhanced data augmentation techniques for better model generalization
- Improved error handling and user interruption handling during training
- Updated test suite to cover various aspects of DropGrad, including initialization, optimization step, drop rate scheduling, and saving of loss values
- Code refactoring and documentation enhancements for better readability and maintainability
| Description | Quick Access |
|---|---|
| The `examples` directory contains sample code demonstrating various use cases of DropGrad, including basic usage, integration with learning rate schedulers, applying full update drop, and training a Vision Transformer (ViT) on the CIFAR-10 dataset under different regularization scenarios. | └── `examples`<br>&emsp;├── `basic_usage.py`<br>&emsp;├── `lr_scheduler_integration.py`<br>&emsp;├── `full_update_drop.py`<br>&emsp;└── `vit_experiments`<br>&emsp;&emsp;├── `vit_model.py`<br>&emsp;&emsp;├── `train.py`<br>&emsp;&emsp;├── `visualize.py`<br>&emsp;&emsp;├── `mathematical_analysis.py`<br>&emsp;&emsp;├── `benchmark_visualizations.py`<br>&emsp;&emsp;└── `*.pth` |
| The `docs` directory contains detailed documentation and analysis of the DropGrad method, as well as instructions for setting up CUDA on Windows for PyTorch and DropGrad. | └── `docs`<br>&emsp;├── `analysis.md`<br>&emsp;└── `windows_cuda_setup.md` |
| The `dropgrad` directory contains the core implementation of the DropGrad optimizer and drop rate schedulers. | └── `dropgrad`<br>&emsp;├── `__init__.py`<br>&emsp;├── `dropgrad_opt.py`<br>&emsp;└── `dropgrad_scheduler.py` |
| The `tests` directory contains the test suite for DropGrad, ensuring the correctness of the implementation. The tests cover the functionality of the DropGrad optimizer and the drop rate schedulers. | └── `tests`<br>&emsp;├── `__init__.py`<br>&emsp;├── `test_dropgrad.py`<br>&emsp;├── `test_dropgrad_optimizer.py`<br>&emsp;└── `test_dropgrad_scheduler.py` |
| This section highlights the key files related to project configuration, requirements, and licensing. | ├── `.gitignore`<br>├── `LICENSE`<br>├── `pyproject.toml`<br>├── `README.md`<br>└── `requirements.txt` |
- Python >= 3.7
- PyTorch >= 1.12.0
- torchvision >= 0.13.0
- torchaudio >= 0.12.0
- matplotlib
- scipy
To install DropGrad using pip, run the following command:
```bash
pip install dropgrad
```
To install DropGrad from source, follow these steps:
```bash
git clone https://github.com/dingo-actual/dropgrad.git
cd dropgrad
pip install -r requirements.txt
pip install .
```
To use DropGrad in your neural network optimization, simply import the `DropGrad` class and wrap your optimizer:

```python
import torch

from dropgrad import DropGrad

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
optimizer = DropGrad(optimizer, drop_rate=0.1)
```
During training, call `.step()` on the wrapped optimizer to apply DropGrad, and then call `.zero_grad()` to reset the gradients:

```python
optimizer.step()
optimizer.zero_grad()
```
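For context, here is a minimal training-loop sketch showing where those calls fit; `model`, `loss_fn`, and `dataloader` are assumed to be defined elsewhere and are not part of the DropGrad API:

```python
for inputs, targets in dataloader:
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
    loss.backward()        # compute gradients as usual
    optimizer.step()       # DropGrad drops/re-scales gradients, then steps the wrapped optimizer
    optimizer.zero_grad()  # reset gradients before the next batch
```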
DropGrad supports drop rate schedulers to dynamically adjust the drop rate during training. The package provides several built-in schedulers, including `LinearDropRateScheduler`, `CosineAnnealingDropRateScheduler`, and `StepDropRateScheduler`. To use a drop rate scheduler, pass an instance of a scheduler to the `DropGrad` constructor:

```python
from dropgrad import DropGrad, LinearDropRateScheduler

scheduler = LinearDropRateScheduler(initial_drop_rate=0.1, final_drop_rate=0.0, num_steps=1000)
optimizer = DropGrad(optimizer, drop_rate_scheduler=scheduler)
```
DropGrad provides an option to apply "full" update drop by interrupting the `.step()` method. To enable this feature, pass `full_update_drop=True` to the `DropGrad` constructor:

```python
optimizer = DropGrad(optimizer, drop_rate=0.1, full_update_drop=True)
```
DropGrad allows specifying different drop rates for individual parameters or parameter groups, enabling fine-grained control over the regularization applied to different parts of the model. To vary drop rates per parameter, pass a dictionary mapping parameters to drop rates:

```python
params = {
    'encoder': 0.1,
    'decoder': 0.2
}

optimizer = DropGrad(optimizer, params=params)
```
The `examples` directory contains sample code demonstrating various use cases of DropGrad, including basic usage, integration with learning rate schedulers, applying full update drop, and training a Vision Transformer (ViT) on the CIFAR-10 dataset under different regularization scenarios.
DropGrad includes a test suite to ensure the correctness of the implementation. The tests cover the functionality of the `DropGrad` optimizer and the drop rate schedulers. To run the tests, use the following command:

```bash
pytest tests/
```
For a detailed analysis of the DropGrad method, including its theoretical foundations, advantages, and empirical results, please refer to the `docs/analysis.md` file.
For instructions on setting up CUDA on Windows for PyTorch and DropGrad, please refer to the `docs/windows_cuda_setup.md` file.
Contributions to DropGrad are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.
DropGrad is released under the MIT License. See the `LICENSE` file for more details.