[Pretraining] - MoE 8x13B #94

Open
Taishi-N324 opened this issue Dec 9, 2024 · 0 comments
Assignees
Taishi-N324

Labels
pretrain Experiment of model pretrain

Taishi-N324 (Member) commented Dec 9, 2024

Overview

Apply Drop-Upcycling (r=0.5) to llm-jp/llm-jp-3-13b, the released checkpoint of LLM-jp-3 13B, to construct an 8x13B MoE checkpoint, then train it on 2.1T tokens of data.
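For reference, Drop-Upcycling initializes each MoE expert by copying the dense model's FFN weights and re-initializing a randomly selected fraction r of the intermediate dimensions; with r=0.5, half of each expert's FFN dims are re-initialized. A minimal PyTorch sketch for a single layer, assuming a SwiGLU-style FFN and using the dense weights' empirical std as a stand-in for the original init distribution (both assumptions of this sketch, not details from this issue):

```python
import torch

def drop_upcycle_ffn(w_gate, w_up, w_down, num_experts=8, r=0.5, seed=0):
    """Build num_experts expert FFNs from one dense FFN via Drop-Upcycling.

    Shapes follow a SwiGLU FFN: w_gate and w_up are (d_ff, d_model),
    w_down is (d_model, d_ff). Each expert starts as a copy of the dense
    weights, then a fraction r of the d_ff intermediate dimensions is
    re-initialized, chosen independently per expert.
    """
    gen = torch.Generator().manual_seed(seed)
    d_ff, d_model = w_gate.shape
    n_drop = int(r * d_ff)
    experts = []
    for _ in range(num_experts):
        eg, eu, ed = w_gate.clone(), w_up.clone(), w_down.clone()
        drop = torch.randperm(d_ff, generator=gen)[:n_drop]
        # Re-initialize the selected intermediate dims. As a stand-in for
        # the original init distribution (an assumption of this sketch),
        # sample from a normal scaled by the dense weights' empirical std.
        eg[drop] = torch.randn(n_drop, d_model, generator=gen) * w_gate.std()
        eu[drop] = torch.randn(n_drop, d_model, generator=gen) * w_up.std()
        ed[:, drop] = torch.randn(d_model, n_drop, generator=gen) * w_down.std()
        experts.append((eg, eu, ed))
    return experts
```

Because the dropped dims differ per expert, the experts diverge from each other at initialization while still retaining half of the dense model's FFN features.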

Details

Model card PR: https://github.com/llm-jp/model-cards/pull/29

Resources

  • Compute (a rough wall-clock estimate from these figures follows this list)
    • Cluster: Sakura (Ishikari)
    • Node type: gpu-small (H100x8)
    • Number of nodes: 32
  • Code
  • Input data:
    • LLM-jp v3.1 corpus: sakura:/data/llm-jp-corpus/v3.{0,1}.0
  • Output data:
    • Destinations:
      • `sakura:/data/experiments/0094_v3-8x13b-exp1`
      • `sakura:/home/shared/experiments/0094_v3-8x13b-exp1`
    • Breakdown:
      • checkpoint: FIXME TB (including buffer capacity)
  • W&B logs:
  • Start date: 2024-12-13
  • Planned end date: 2024-MM-DD (including buffer period)
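Given the node count and token budget above, the wall-clock time can be sketched with the standard 6·N·T FLOPs rule. Everything here besides the token and node counts is an assumption of this sketch (active-parameter count, routing, per-GPU utilization), not a figure from this issue:

```python
# Back-of-envelope wall-clock estimate. Assumptions: top-2 routing over
# 8 experts giving roughly 22e9 active parameters per token, ~40%
# utilization of the H100's 989 TFLOP/s dense BF16 peak.
active_params = 22e9           # assumed active parameter count
tokens = 2.1e12                # token budget from the Overview
num_gpus = 32 * 8              # node count from the Resources list
flops_per_gpu = 0.40 * 989e12  # assumed sustained throughput per H100

total_flops = 6 * active_params * tokens       # standard 6*N*T estimate
days = total_flops / (num_gpus * flops_per_gpu) / 86400
print(f"~{days:.0f} days")                     # ~32 days, about a month
```

Under these assumptions the run lands on the order of a month, which is consistent with a mid-December start date and a buffer period before the planned end.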
Taishi-N324 added the pretrain (Experiment of model pretrain) label on Dec 9, 2024
Taishi-N324 self-assigned this on Dec 11, 2024