[Pretraining] - Moving model averaging #85

Open
odashi opened this issue Nov 21, 2024 · 0 comments
Labels
pretrain Experiment of model pretrain

odashi commented Nov 21, 2024

Overview

For each trained pretraining model, build an averaged model by taking the mean of the most recent $N$ checkpoints.
The resulting model is evaluated as usual under #60.
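
One way to write the averaging step down, assuming a uniform (unweighted) mean. The issue does not state the weighting, and $\theta_t$ is a hypothetical symbol introduced here for the parameters of the $t$-th saved checkpoint:

$$\bar{\theta}_t^{(N)} = \frac{1}{N} \sum_{i=0}^{N-1} \theta_{t-i}$$

Under this reading, $N = 1$ recovers the latest checkpoint unchanged.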

Details

$N$ is tried starting from 2 and increased from there.
Experiments are run on the 1.8B, 3.7B, 7.2B, and 13B models, and on 172B as well if possible.
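
The sweep over $N$ could look roughly like the sketch below. It is not the project's actual code: the function name `average_last_n` is hypothetical, the mean is assumed to be uniform, and plain Python lists stand in for the tensors of a real checkpoint so that the example runs without any framework.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a real checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def average_last_n(checkpoints: Sequence[StateDict], n: int) -> StateDict:
    """Return the uniform mean of the last `n` checkpoints, parameter by parameter.

    `checkpoints` is assumed to be ordered from oldest to newest.
    """
    if n < 1 or n > len(checkpoints):
        raise ValueError(f"n must be in [1, {len(checkpoints)}], got {n}")
    window = checkpoints[-n:]
    keys = window[0].keys()
    for ckpt in window:
        if ckpt.keys() != keys:
            raise ValueError("checkpoints must share the same parameter names")
    averaged: StateDict = {}
    for name in keys:
        # Group the i-th value of this parameter across all checkpoints in the window.
        columns = zip(*(ckpt[name] for ckpt in window))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


if __name__ == "__main__":
    # Toy checkpoints in training order (made-up values, for illustration only).
    ckpts = [
        {"w": [0.0, 2.0], "b": [1.0]},
        {"w": [2.0, 4.0], "b": [3.0]},
        {"w": [4.0, 6.0], "b": [5.0]},
    ]
    # Try N = 2, 3, ... as described above.
    for n in range(2, len(ckpts) + 1):
        print(n, average_last_n(ckpts, n))
```

With real checkpoints the same loop would run over tensors instead of lists, and accumulating the sum in a higher precision than the stored weights is a common precaution when the checkpoints are saved in a reduced-precision format.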

Resources

  • Compute
    • Cluster: Sakura (Ishikari)
    • Node type: gpu-small (H100x8)
    • Number of nodes: 1
  • Code
  • Input data:
    • LLM-jp-3 checkpoints (1.8B, 3.7B, 7.2B, 13B)
  • Output data:
    • Location:
      • sakura:/data/experiments/0085_averaging
    • Breakdown:
      • sakura: 200 TB (including buffer capacity)
  • Start date: 2024-11-21
  • Planned end date: 2024-12-15 (including buffer period)
@odashi odashi added the pretrain Experiment of model pretrain label Nov 21, 2024
@odashi odashi self-assigned this Nov 21, 2024