[sakura]Add v3 172B exp2 scripts #13

Merged
merged 11 commits into from
Aug 24, 2024

Conversation

Contributor

@k141303 k141303 commented Aug 7, 2024

@k141303 k141303 requested review from odashi, Taka008 and YumaTsuta August 7, 2024 10:04

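It is assumed that the environment has already been installed to `/data/experiments/{exp-id}/environment` using the v3-megatron-sakura installer.
For `{exp-id}`, specify the ID assigned at registration; however, to preserve the experiment results, do not specify the ID of this experiment itself.
It is also assumed that earlier checkpoints are saved in `/data/experiments/{exp-id}/checkpoints`.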
Member

Please change this to /data/home/shared.
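
The directory layout assumed in the README excerpt above (an `environment` and a `checkpoints` directory under an experiment ID) could be checked before submitting a job. The following is only a minimal sketch: the function name `check_experiment_dirs` and its arguments are hypothetical and are not part of this PR, and the root directory is passed in as a parameter because the review asks for the path to be changed.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): verify that the directories
# the README assumes are present for a given experiment ID.
# Usage: check_experiment_dirs <experiments-root> <exp-id>
check_experiment_dirs() {
  local root="$1" exp_id="$2" missing=0 sub
  for sub in environment checkpoints; do
    if [ ! -d "${root}/${exp_id}/${sub}" ]; then
      # Report each missing directory on stderr.
      echo "missing: ${root}/${exp_id}/${sub}" >&2
      missing=1
    fi
  done
  # Returns 0 when both directories exist, 1 otherwise.
  return "$missing"
}
```

For example, `check_experiment_dirs /data/experiments <exp-id> || exit 1` at the top of a submission wrapper would stop early when either directory is absent.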

@@ -0,0 +1,39 @@
#!/bin/bash
#SBATCH --job-name=llama-2-172b-exp2
Member

For Slurm administration purposes, I would like the job name to be prefixed with the experiment number.

Suggested change
- #SBATCH --job-name=llama-2-172b-exp2
+ #SBATCH --job-name=9_llama-2-172b-exp2
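
Because the experiment number in the prefix may change, one option is to build the job name at submission time instead of editing the script. This is a sketch only: `prefixed_job_name` is a hypothetical helper, and `train.sh` in the comment is a placeholder file name, not a file from this PR. It relies on the documented `sbatch` behavior that options given on the command line take precedence over `#SBATCH` directives in the script.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): join an experiment number
# and a base job name with an underscore, following the reviewer's
# "<experiment-number>_<name>" convention.
prefixed_job_name() {
  local exp_number="$1" base_name="$2"
  printf '%s_%s\n' "$exp_number" "$base_name"
}

# Possible use at submission time (train.sh is a placeholder name):
#   sbatch --job-name="$(prefixed_job_name 9 llama-2-172b-exp2)" train.sh
```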

Contributor Author

k141303 commented Aug 8, 2024

For now I have entered experiment number 14. (It may change to 9 later.)

Contributor Author

k141303 commented Aug 8, 2024

Change CHECKPOINT_LOAD_DIR to a path under /home before merging.

@k141303 k141303 requested a review from odashi August 14, 2024 08:30
@k141303 k141303 requested a review from jnszk August 14, 2024 08:39
Contributor Author

k141303 commented Aug 14, 2024

The slow-training problem in the experiment 14 environment has not been resolved, so we have decided to run in the experiment 9 environment instead.

@k141303 k141303 requested a review from odashi August 24, 2024 14:12
@k141303 k141303 merged commit c79ff5c into main Aug 24, 2024
@k141303 k141303 deleted the v3-172b-exp2-sakura branch August 24, 2024 15:05
2 participants