Description
Is your feature request related to a problem? Please describe.
When integrating TE FP8 training with frameworks built on Hugging Face Trainer (such as LLaMA-Factory), there is no documentation showing the correct integration pattern. This led to bugs in LLaMA-Factory where:
- FP8 training silently fell back to BF16
- When FP8 was activated, it used torchao backend instead of TE
Without official documentation on best practices for framework integration, the integration was implemented incorrectly, leading to a poor user experience and to TE being blamed for performance issues that are actually integration bugs.
Related issue: hiyouga/LLaMA-Factory#9500
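As an illustration of the kind of guidance that would help, here is a minimal, hedged sketch of a sanity check a Trainer-based framework could run to catch both failure modes above early. The callback name and placement are assumptions (not an existing TE or LLaMA-Factory API); it relies on Accelerate's current behavior of swapping `nn.Linear` for `transformer_engine.pytorch.Linear` when its TE FP8 backend engages.

```python
# Illustrative sanity check (not an official TE API): fail loudly if the
# Accelerate/TE FP8 path did not engage, instead of silently training in BF16.
import transformer_engine.pytorch as te
from accelerate.state import AcceleratorState
from transformers import TrainerCallback


class AssertTEFP8Callback(TrainerCallback):
    """Hypothetical callback a Trainer-based framework could register."""

    def on_train_begin(self, args, state, control, model=None, **kwargs):
        mp = AcceleratorState().mixed_precision
        if mp != "fp8":
            raise RuntimeError(
                f"Expected mixed_precision='fp8' but got '{mp}'; "
                "training would silently fall back to BF16/FP16."
            )
        # Accelerate's TE backend replaces nn.Linear with te.Linear during
        # prepare(); if none are present, a different FP8 backend (e.g.
        # torchao) or no conversion at all was used.
        if model is not None and not any(
            isinstance(m, te.Linear) for m in model.modules()
        ):
            raise RuntimeError(
                "mixed_precision is 'fp8' but no transformer_engine Linear "
                "layers were found in the model."
            )
```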
Describe the solution you'd like
Add official documentation/examples in the TE repository showing how to integrate TE with frameworks built on Hugging Face Trainer. Specifically:
- Integration Guide Page covering:
  - Architecture: TE → HF Accelerate → HF Trainer
  - Why config files are recommended over programmatic setup
  - Common pitfalls when wrapping Trainer
- Framework-Specific Examples:
  - `examples/llamafactory/` - LLaMA-Factory integration
  - `examples/axolotl/` - Axolotl integration example
Describe alternatives you've considered
N/A
Additional context
Key Findings:
- When properly configured, TE delivers the expected 1.3-1.5x speedup on H100/B200
- The `accelerate launch --config_file` approach is more reliable than programmatic config
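To make the config-file pattern concrete, below is a minimal sketch (the model id, dataset, and file names are placeholders, not taken from this issue). The training script contains no FP8-specific code at all; precision is controlled entirely by the accelerate config file (e.g. `mixed_precision: fp8` with a TE backend in its `fp8_config` section, generated via `accelerate config`; check the exact field names on your Accelerate version), and the script is started with `accelerate launch --config_file fp8_te.yaml train.py`.

```python
# train.py — hypothetical minimal script for the config-file pattern.
# There is deliberately no FP8 code here: precision is controlled entirely by
# the accelerate config file passed at launch time, e.g.
#   accelerate launch --config_file fp8_te.yaml train.py
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Tiny slice of a public dataset, just to keep the sketch self-contained.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

With this layout, comparing BF16 and FP8 runs is a matter of swapping the config file passed to `accelerate launch`, which is part of why the config-file route is easier to keep correct than per-framework programmatic setup.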