Feature Description
Add support for distributed training and generation to handle datasets that exceed single-GPU/node memory limits.
Motivation
Currently, Synthcity plugins are limited to single-GPU training, which restricts the size of datasets that can be processed. Large healthcare, financial, or scientific datasets often require distributed computing resources.
Proposed Solution
Implement a distributed training and generation wrapper that supports:
- Multi-GPU training using PyTorch DDP (DistributedDataParallel), as sketched in the training example after this list
- Distributed data loading with automatic sharding
- Gradient accumulation for memory-constrained environments
- Distributed generation with load balancing, as sketched in the Ray-based example after this list
- Integration with Ray or Dask for cluster orchestration
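As a rough illustration of the first three points, the sketch below wraps a plugin's underlying `torch.nn.Module` in DDP, shards the data with `DistributedSampler`, and steps the optimizer only every `accumulation_steps` batches. The `train_distributed` function, the placeholder MSE objective, and the `accumulation_steps` parameter are assumptions for this sketch, not part of the current Synthcity API; a real plugin would supply its own model and training objective.

```python
# Sketch: DDP training loop with sharded data loading and gradient accumulation.
# Launch with: torchrun --nproc_per_node=<num_gpus> train_distributed.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def train_distributed(
    model: torch.nn.Module,
    dataset: TensorDataset,
    epochs: int = 10,
    batch_size: int = 256,
    accumulation_steps: int = 4,
) -> None:
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = model.to(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    # DistributedSampler gives each rank a disjoint shard of the dataset.
    sampler = DistributedSampler(dataset, shuffle=True)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

    optimizer = torch.optim.Adam(ddp_model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()  # placeholder objective for the sketch

    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        optimizer.zero_grad()
        for step, (x, y) in enumerate(loader):
            x, y = x.to(local_rank), y.to(local_rank)
            loss = loss_fn(ddp_model(x), y) / accumulation_steps
            loss.backward()
            # Gradient accumulation: step only every `accumulation_steps`
            # batches to keep per-batch memory low. A production version
            # would also use ddp_model.no_sync() on the skipped steps to
            # avoid redundant gradient all-reduces.
            if (step + 1) % accumulation_steps == 0:
                optimizer.step()
                optimizer.zero_grad()

    dist.destroy_process_group()
```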
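For the last two points, one option is a thin Ray layer that ships the fitted plugin to each worker and splits the requested row count evenly, i.e. simple static load balancing. The pickle round-trip, the per-worker seeding, and the assumption that `generate(count=...)` returns a data loader exposing `.dataframe()` are sketch-level choices; Dask's `client.submit` could play the same orchestration role.

```python
# Sketch: static load balancing of generation across a Ray cluster.
import pickle

import pandas as pd
import ray
import torch

ray.init()  # or ray.init(address="auto") to attach to an existing cluster


@ray.remote(num_gpus=1)
def generate_shard(plugin_bytes: bytes, n_rows: int, seed: int) -> pd.DataFrame:
    # Different seed per shard so workers do not produce identical samples.
    torch.manual_seed(seed)
    # Assumed serialization path: a fitted plugin shipped as pickled bytes.
    plugin = pickle.loads(plugin_bytes)
    # Assumes generate() returns a data loader with a .dataframe() accessor.
    return plugin.generate(count=n_rows).dataframe()


def generate_distributed(plugin_bytes: bytes, total_rows: int, n_workers: int) -> pd.DataFrame:
    # Split the requested row count as evenly as possible across workers.
    base, remainder = divmod(total_rows, n_workers)
    shard_sizes = [base + (1 if i < remainder else 0) for i in range(n_workers)]
    futures = [
        generate_shard.remote(plugin_bytes, n, seed=i)
        for i, n in enumerate(shard_sizes)
        if n > 0
    ]
    return pd.concat(ray.get(futures), ignore_index=True)
```

A pull-based variant with a pool of long-lived Ray actors would give dynamic load balancing when GPUs are heterogeneous, at the cost of more orchestration code.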