[1.0.0] CTGAN Optimization #77
Labels
difficulty-hard
documentation
Improvements or additions to documentation
enhancement
New feature or request
Problem
When a large amount of real data is used to train a CTGAN model, the current implementation does not work well.
Because the entire dataset (a DataFrame) is loaded into memory during training, memory consumption becomes very high, which does not scale to large inputs.
Proposed Solution
Fortunately, as part of this refactoring, sdgx provides the new DataLoader and NDArrayLoader components (still under development).
We can use these new data-related components to modify the data transformer, the data sampler, and the CTGAN model.
Instead of loading all the data into memory at once, the data will be loaded in chunks (by rows or by columns) as needed and then used to train the model.
This should reduce peak memory consumption and allow larger datasets to be processed.
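To illustrate the chunked approach described above, here is a minimal sketch using only the Python standard library. It does not use the sdgx DataLoader or NDArrayLoader APIs; `iter_chunks` and `fit_min_max` are hypothetical helpers, and the min/max computation is only a stand-in for whatever statistics the data transformer would need to fit.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, Tuple


def iter_chunks(handle, chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most chunk_size rows from a CSV file handle."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def fit_min_max(handle, column: str, chunk_size: int = 2) -> Tuple[float, float]:
    """Compute a column's min and max while holding one chunk in memory at a time."""
    lo, hi = float("inf"), float("-inf")
    for chunk in iter_chunks(handle, chunk_size):
        values = [float(row[column]) for row in chunk]
        lo, hi = min(lo, min(values)), max(hi, max(values))
    return lo, hi


# Tiny in-memory stand-in for a large CSV file on disk (illustrative data only).
demo = io.StringIO("age,income\n23,100\n45,250\n31,180\n60,90\n52,300\n")
age_range = fit_min_max(demo, "age")
```

With this pattern, peak memory is bounded by `chunk_size` rows rather than by the size of the whole file. The same idea would need to carry over to the sampler and the training loop, which would draw batches from chunks instead of from a full DataFrame.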
Additional context
TBD