Documentation for cpu offloading #2520
base: main
…m low precision training Signed-off-by: Pawel Gadzinski <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
… add GPU checks
Changes:
- Remove optimizer code from all recipe examples (keep only forward/backward)
- Fix Format imports (use Format.E4M3 instead of string 'E4M3')
- Fix params_dtype for PyTorch examples (add params_dtype=torch.bfloat16)
- Add GPU capability assertions before START blocks for blockwise/mxfp8/nvfp4
- Fix JAX imports (Float8CurrentScaling from common.recipe, NVFP4BlockScaling)
- Add global_shard_guard for TransformerLayer examples in JAX
- Fix fused_layers_jax.py return tuple unpacking
- Update memory_usage JAX examples with dynamic GPU measurement
- Remove memory_usage_3_jax (JAX doesn't support FP8 weight storage)
- Update performance_considerations.rst for JAX differences
- Delete unused .out files and fp8_autocast_jax.py
Signed-off-by: Pawel Gadzinski <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Pawel Gadzinski <[email protected]>
- Restructure sections: move example to 'CPU Offloading in Transformer Engine', create separate 'Default Offloading Scheduling' section
- Add intro paragraphs explaining when each mode is enabled
- Clarify scheduling algorithm with event-based offloading details
- Document ManualOffloadSynchronizer methods with accurate stream synchronization behavior
- Add use case for manual mode (pipeline parallelism, custom scheduling)
- Improve Caveats section with PyTorch hooks link and clearer explanations
- Use documentation-style language throughout
- Fix grammatical issues and trailing whitespace
Signed-off-by: Pawel Gadzinski <[email protected]>
for more information, see https://pre-commit.ci
Greptile Overview
Greptile Summary
This PR adds comprehensive documentation for CPU offloading, a memory optimization technique that moves activation tensors between GPU and CPU memory to reduce GPU memory usage during training.
Key additions:
The documentation is well-structured, technically accurate, and provides practical guidance for users implementing CPU offloading in their training pipelines.
Confidence Score: 5/5
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant Model
    participant Context as cpu_offload_context
    participant Sync as sync_function
    participant GPU
    participant CPU
    participant Stream as offload_stream
    User->>Model: Initialize layers & get_cpu_offload_context()
    Model->>User: Returns (context, sync_function)
    Note over User,CPU: Forward Pass
    loop For each layer
        User->>Context: with cpu_offload_context:
        Context->>GPU: Capture tensors saved for backward
        User->>Model: layer.forward(x)
        Model->>GPU: Compute forward pass
        GPU-->>Context: Store activations
        Context->>Stream: Queue async GPU→CPU copy
        Stream->>CPU: Transfer activations (async)
        User->>Sync: sync_function(x)
        Sync->>GPU: Return output tensor
    end
    Note over User,CPU: Backward Pass
    User->>GPU: loss.backward()
    loop For each layer (reverse order)
        GPU->>Stream: Check if activation ready
        Stream->>GPU: Wait for CPU→GPU reload
        CPU->>GPU: Transfer activations (async)
        GPU->>GPU: Compute layer backward
    end
```
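The ordering in the diagram above can be traced with a toy, standard-library-only simulation. This is a sketch under stated assumptions, not Transformer Engine code: `ToyOffloader`, its methods, and `run` are hypothetical names invented for illustration, the "GPU" and "CPU" are plain dictionaries, each layer is a scalar multiply, and the asynchronous offload stream is modeled as a simple queue that is drained when the sync step is called. It only shows that activations saved in the forward pass are offloaded layer by layer and reloaded in reverse order during the backward pass.

```python
from contextlib import contextmanager


class ToyOffloader:
    """Hypothetical stand-in for the (context, sync_function) pair in the diagram."""

    def __init__(self):
        self.gpu = {}       # layer index -> activation still "on the GPU"
        self.cpu = {}       # layer index -> activation "offloaded to the CPU"
        self.pending = []   # copies queued on the simulated offload stream
        self.events = []    # record of offload/reload order
        self._current = None

    @contextmanager
    def context(self, layer_idx):
        # While the context is active, saved activations belong to this layer.
        self._current = layer_idx
        try:
            yield
        finally:
            self._current = None

    def save_for_backward(self, activation):
        # Capture the tensor saved for backward and queue its GPU->CPU copy.
        idx = self._current
        self.gpu[idx] = activation
        self.pending.append(idx)

    def sync(self, output):
        # Simplification: drain every queued copy, then hand the output back.
        for idx in self.pending:
            self.cpu[idx] = self.gpu.pop(idx)
            self.events.append(("offload", idx))
        self.pending.clear()
        return output

    def reload(self, idx):
        # CPU->GPU reload of one layer's activation just before its backward.
        self.events.append(("reload", idx))
        return self.cpu.pop(idx)


def run(weights, x):
    off = ToyOffloader()
    # Forward pass: each layer computes w * x and saves its input.
    for i, w in enumerate(weights):
        with off.context(i):
            off.save_for_backward(x)
            x = w * x
        x = off.sync(x)
    out = x
    # Backward pass: visit layers in reverse, reloading each saved input.
    grad = 1.0
    weight_grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        saved = off.reload(i)
        weight_grads[i] = grad * saved  # d(out)/d(w_i)
        grad = grad * weights[i]        # gradient passed to the previous layer
    return out, weight_grads, off.events


out, weight_grads, events = run([2.0, 3.0, 4.0], 1.0)
```

In the real feature the copies are asynchronous and overlap with compute; the toy drains the queue eagerly only to keep the ordering easy to inspect through `events`.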
51 files reviewed, no comments
Description
Adds documentation for CPU offloading. This PR must be merged after #2343. Please review only the file containing the CPU offloading doc: features -> other optimizations -> cpu offloading in the docs menu.
Fixes # (issue)
Type of change
Checklist: