
Examples: Using TE with LLaMA-Factory #2509

@sbhavani

Description


Is your feature request related to a problem? Please describe.

When integrating TE FP8 training into frameworks built on the Hugging Face Trainer (such as LLaMA-Factory), there is no documentation showing the correct integration pattern. This led to bugs in LLaMA-Factory where:

  1. FP8 training silently fell back to BF16
  2. When FP8 was activated, it used torchao backend instead of TE

Without official documentation showing best practices for framework integration, the integration was implemented incorrectly, leading to a poor user experience and to TE being blamed for performance issues that are actually integration bugs.

Related issue: hiyouga/LLaMA-Factory#9500
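
A simple post-`prepare` sanity check would have caught the silent fallback early. The sketch below assumes that Accelerate's TE backend swaps nn.Linear modules for transformer_engine.pytorch.Linear during accelerator.prepare(); that behavior (and the helper name count_te_linear) is an assumption to verify against the Accelerate version in use, not a documented guarantee.

```python
# Minimal sanity check (sketch): verify that TE FP8 is actually active after
# accelerator.prepare(), instead of silently falling back to BF16 or torchao.
# Assumption: Accelerate's TE backend replaces nn.Linear with te.Linear.
import torch.nn as nn
import transformer_engine.pytorch as te


def count_te_linear(model: nn.Module) -> int:
    """Count modules that were converted to Transformer Engine Linear layers."""
    return sum(isinstance(m, te.Linear) for m in model.modules())


# In the framework's training setup, after `model = accelerator.prepare(model)`:
# if count_te_linear(model) == 0:
#     raise RuntimeError("No TE modules found; FP8 with the TE backend is not active.")
```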

Describe the solution you'd like

Add official documentation/examples in the TE repository showing how to integrate TE with frameworks built on Hugging Face Trainer. Specifically:

  1. Integration Guide Page covering:
  • Architecture: TE → HF Accelerate → HF Trainer
  • Why config files are recommended over programmatic setup (a programmatic sketch for contrast follows this list)
  • Common pitfalls when wrapping Trainer
  2. Framework-Specific Examples:
  • examples/llamafactory/ - LLaMA-Factory integration example
  • examples/axolotl/ - Axolotl integration example
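
To make the config-file-vs-programmatic comparison concrete, here is a minimal programmatic sketch the guide could contrast against the recommended config-file route. It assumes Accelerate's FP8RecipeKwargs handler with a backend argument; the exact handler name and arguments vary across Accelerate releases (newer versions split it into per-backend classes), so treat this as illustrative rather than definitive.

```python
# Programmatic FP8-with-TE setup via Accelerate (sketch; handler and argument
# names are assumptions based on recent Accelerate releases and may differ).
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

# Explicitly select the Transformer Engine backend rather than torchao/MS-AMP.
fp8_handler = FP8RecipeKwargs(backend="TE")
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[fp8_handler])

# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```

Frameworks that wrap HF Trainer do not construct this Accelerator themselves, which is why passing a config file to accelerate launch --config_file (see Key Findings below) is the more reliable path for them.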

Describe alternatives you've considered

N/A

Additional context

Key Findings:

  • When properly configured, TE delivers the expected 1.3-1.5x speedup on H100/B200
  • The accelerate launch --config_file approach is more reliable than programmatic config
