# Make sure you are in directory ./deepanalyze/ms-swift/
swift sft \
--model "${MODEL_SINGLE_ABILITY_PATH}" \
--train_type "lora" \
--lora_rank 32 \
--lora_alpha 64 \
--dataset \
"${DATA_DIR}/interation/data_pipeline_3601.json#10" \
"${DATA_DIR}/interation/data_preparation_3311.json#10" \
"${DATA_DIR}/interation/data_cleaning_1616.json#10" \
"${DATA_DIR}/interation/data_analysis_3936.json#10" \
"${DATA_DIR}/interation/data_insight_1062.json#10" \
"${DATA_DIR}/interation/research_database_818.json#10" \
"${DATA_DIR}/interation/research_xlsx_848.json#10" \
"${DATA_DIR}/interation/research_other_3505.json#10" \
"${DATA_DIR}/interation/research_data_preparation_488.json#10" \
"${DATA_DIR}/interation/research_data_analysis_1339.json#10" \
"${DATA_DIR}/interation/research_data_insight_1351.json#10" \
"${DATA_DIR}/interation/research_report_generation_4327.json#10" \
--torch_dtype "bfloat16" \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 4 \
--learning_rate 1e-5 \
--gradient_accumulation_steps 32 \
--packing true \
--eval_steps 1 \
--save_steps 5 \
--logging_steps 1 \
--max_length 32768 \
--warmup_ratio 0.05 \
--dataloader_num_workers 8 \
--dataset_num_proc 8 \
--save_total_limit 1 \
--response_prefix "" \
--save_only_model false \
--output_dir "${MODEL_MULTI_ABILITY_PATH}" \
--deepspeed "zero3_offload" \
--use_liger_kernel true \
--attn_impl "flash_attn" \
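--model_type "deepseek_r1_distill"

As the title says: did the author run into anything similar during second-stage training? Right at the start of training, grad_norm is already extremely large, and from step 3 onward the loss drops to 0.0: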
{'loss': 0.85817486, 'grad_norm': 9764998217728.0, 'learning_rate': 1e-05, 'memory(GiB)': 18.41, 'train_speed(iter/s)': 0.014675, 'epoch': 0.17, 'global_step/max_steps': '1/18', 'percentage': '5.56%', 'elapsed_time': '58s', 'remaining_time': '16m 41s'}
Train: 6%|██▍ | 1/18 [00:5
Train: 11%|██▋ | 2/18 [01:40<13:03, 48.96s/it]
{'loss': 0.80747825, 'grad_norm': 1.0, 'learning_rate': 9.91e-06, 'memory(GiB)': 20.31, 'train_speed(iter/s)': 0.01816, 'epoch': 0.33, 'global_step/max_steps': '2/18', 'percentage': '11.11%', 'elapsed_time': '1m 40s', 'remaining_time': '13m 27s'}
{'loss': 0.0, 'grad_norm': 1.0, 'learning_rate': 9.66e-06, 'memory(GiB)': 20.31, 'train_speed(iter/s)': 0.019781, 'epoch': 0.5, 'global_step/max_steps': '3/18', 'percentage': '16.67%', 'elapsed_time': '2m 22s', 'remaining_time': '11m 52s'}
{'loss': 0.0, 'grad_norm': 1.0, 'learning_rate': 9.25e-06, 'memory(GiB)': 20.31, 'train_speed(iter/s)': 0.020808, 'epoch': 0.67, 'global_step/max_steps': '4/18', 'percentage': '22.22%', 'elapsed_time': '3m 3s', 'remaining_time': '10m 40s'}
Train: 22%|█████▎ | 4/18 [03:03<10:10, 43.59s/it]
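
To make the symptom easier to spot in longer runs, here is a minimal sketch that scans log text in the format above and flags steps whose `grad_norm` is non-finite or very large, or whose loss is exactly 0.0 or non-finite. The function name `flag_suspicious_steps` and the `grad_norm_limit` threshold of 1e3 are illustrative assumptions, not part of ms-swift; the sample entries are abridged from the log above to the fields the check needs.

```python
import math
import re

# Matches one logged step; assumes the field order shown in the log above
# ('loss', then 'grad_norm', later 'global_step/max_steps').
ENTRY_RE = re.compile(
    r"'loss':\s*(?P<loss>[^,]+),\s*'grad_norm':\s*(?P<grad_norm>[^,]+),"
    r".*?'global_step/max_steps':\s*'(?P<step>\d+)/\d+'"
)


def flag_suspicious_steps(log_text, grad_norm_limit=1e3):
    """Return (step, field) pairs for entries that look unhealthy.

    grad_norm_limit is an arbitrary illustrative threshold (assumption).
    """
    flagged = []
    for match in ENTRY_RE.finditer(log_text):
        step = int(match.group("step"))
        loss = float(match.group("loss"))            # float() also parses 'nan'/'inf'
        grad_norm = float(match.group("grad_norm"))
        if not math.isfinite(grad_norm) or grad_norm > grad_norm_limit:
            flagged.append((step, "grad_norm"))
        if not math.isfinite(loss) or loss == 0.0:
            flagged.append((step, "loss"))
    return flagged


# Entries abridged from the log in this issue (only the fields used here).
SAMPLE_LOG = """
{'loss': 0.85817486, 'grad_norm': 9764998217728.0, 'learning_rate': 1e-05, 'global_step/max_steps': '1/18'}
Train: 11%| | 2/18 {'loss': 0.80747825, 'grad_norm': 1.0, 'learning_rate': 9.91e-06, 'global_step/max_steps': '2/18'}
{'loss': 0.0, 'grad_norm': 1.0, 'learning_rate': 9.66e-06, 'global_step/max_steps': '3/18'}
{'loss': 0.0, 'grad_norm': 1.0, 'learning_rate': 9.25e-06, 'global_step/max_steps': '4/18'}
"""

if __name__ == "__main__":
    for step, field in flag_suspicious_steps(SAMPLE_LOG):
        print(f"step {step}: suspicious {field}")
```

On the abridged sample this flags step 1 for `grad_norm` and steps 3 and 4 for `loss`, which matches what is visible by eye in the log. Using a regular expression instead of parsing each dict means progress-bar text fused onto the same line does not get in the way.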