Description
Is your feature request related to a problem? Please describe.
When integrating TE FP8 training with frameworks built on Hugging Face Trainer (such as LLaMA-Factory), there is no documentation showing the correct integration pattern. This led to bugs in LLaMA-Factory where:
- FP8 training silently fell back to BF16
- When FP8 was activated, it used torchao backend instead of TE
Without official documentation on best practices for framework integration, the integration was implemented incorrectly, leading to a poor user experience and to TE being blamed for performance issues that are actually integration bugs.
Related issue: hiyouga/LLaMA-Factory#9500
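As an illustration of the kind of guidance that would help, here is a minimal, hedged sketch of a sanity check a Trainer-based framework could run to catch both failure modes above early. The callback name and placement are assumptions (not an existing TE or LLaMA-Factory API); it relies on Accelerate's current behavior of swapping `nn.Linear` for `transformer_engine.pytorch.Linear` when its TE FP8 backend engages.

```python
# Illustrative sanity check (not an official TE API): fail loudly if the
# Accelerate/TE FP8 path did not engage, instead of silently training in BF16.
import transformer_engine.pytorch as te
from accelerate.state import AcceleratorState
from transformers import TrainerCallback


class AssertTEFP8Callback(TrainerCallback):
    """Hypothetical callback a Trainer-based framework could register."""

    def on_train_begin(self, args, state, control, model=None, **kwargs):
        mp = AcceleratorState().mixed_precision
        if mp != "fp8":
            raise RuntimeError(
                f"Expected mixed_precision='fp8' but got '{mp}'; "
                "training would silently fall back to BF16/FP16."
            )
        # Accelerate's TE backend replaces nn.Linear with te.Linear during
        # prepare(); if none are present, a different FP8 backend (e.g.
        # torchao) or no conversion at all was used.
        if model is not None and not any(
            isinstance(m, te.Linear) for m in model.modules()
        ):
            raise RuntimeError(
                "mixed_precision is 'fp8' but no transformer_engine Linear "
                "layers were found in the model."
            )
```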
Describe the solution you'd like
Add official documentation/examples in the TE repository showing how to integrate TE with frameworks built on Hugging Face Trainer. Specifically:
- Integration Guide Page covering:
  - Architecture: TE → HF Accelerate → HF Trainer
  - Why config files are recommended over programmatic setup
  - Common pitfalls when wrapping Trainer
- Framework-Specific Examples:
  - `examples/llamafactory/` - LLaMA-Factory integration
  - `examples/axolotl/` - Axolotl integration example
Describe alternatives you've considered
N/A
Additional context
Key Findings:
- When properly configured, TE delivers the expected 1.3-1.5x speedup on H100/B200
- The `accelerate launch --config_file` approach is more reliable than programmatic config
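To make the config-file pattern concrete, below is a minimal sketch (the model id, dataset, and file names are placeholders, not taken from this issue). The training script contains no FP8-specific code at all; precision is controlled entirely by the accelerate config file (e.g. `mixed_precision: fp8` with a TE backend in its `fp8_config` section, generated via `accelerate config`; check the exact field names on your Accelerate version), and the script is started with `accelerate launch --config_file fp8_te.yaml train.py`.

```python
# train.py — hypothetical minimal script for the config-file pattern.
# There is deliberately no FP8 code here: precision is controlled entirely by
# the accelerate config file passed at launch time, e.g.
#   accelerate launch --config_file fp8_te.yaml train.py
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Tiny slice of a public dataset, just to keep the sketch self-contained.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

With this layout, comparing BF16 and FP8 runs is a matter of swapping the config file passed to `accelerate launch`, which is part of why the config-file route is easier to keep correct than per-framework programmatic setup.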