Self Checks

- [x] This template is only for bug reports. For questions, please visit Discussions.
- [x] I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please submit issues in English, otherwise they will be closed. Thank you! :)
- [x] Please do not modify this template and fill in all required fields.
Cloud or Self Hosted
Self Hosted (Source)
Environment Details
Ubuntu 22.04, torch==2.4.1, Gradio 5.9.0
Steps to Reproduce
I am fine-tuning fish-speech 1.4 on a new language (Panjabi) without LoRA.

1. First I checked out the release tag: `git checkout tags/v1.4.3`.
2. Then I started training with the config below (see "YAML config to train").
3. This created `step*.ckpt` files in the `{result}` dir, which I converted to `.pth` using `tools/extract_model.py`.
4. I replaced the `model.pth` from fish-speech 1.4 with the new checkpoint.
5. When running inference in the Gradio UI, I get the following error: `The expanded size of the tensor (4096) must match the existing size (4170) at non-singleton dimension 1. Target sizes: [9, 4096]. Tensor sizes: [9, 4170]`

NOTE: The loss also started increasing after a certain number of iterations; I am not sure why. The logs are on wandb: https://wandb.ai/kdcyberdude/fish-speech/workspace?nw=nwuserkdcyberdude
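To see which side of the mismatch comes from the checkpoint, one quick check is to dump the tensor shapes stored in the extracted `.pth`. This is only a minimal sketch: it assumes the file written by `tools/extract_model.py` is a plain state dict (possibly wrapped under a `state_dict` key), and the path is illustrative.

```python
import torch

# Illustrative path: wherever the .pth produced by tools/extract_model.py was copied to
ckpt_path = "checkpoints/fish-speech-1.4/model.pth"

obj = torch.load(ckpt_path, map_location="cpu")
# Some checkpoints wrap the weights under a "state_dict" key; fall back to the object itself
state_dict = obj.get("state_dict", obj) if isinstance(obj, dict) else obj

# Print every tensor whose shape touches the sizes reported in the error,
# so they can be compared against max_length: 4096 from the training config
for name, tensor in state_dict.items():
    if torch.is_tensor(tensor) and any(d in (4096, 4170) for d in tensor.shape):
        print(name, tuple(tensor.shape))
```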
YAML config to train:

```yaml
defaults:
  - base
  - _self_

paths:
  run_dir: results/${project}
  ckpt_dir: ${paths.run_dir}/ft_checkpoints2

hydra:
  run:
    dir: ${paths.run_dir}

project: text2semantic_finetune_dual_ar
max_length: 4096
pretrained_ckpt_path: checkpoints/fish-speech-1.4

# Lightning Trainer
trainer:
  accumulate_grad_batches: 2
  gradient_clip_val: 1.0
  gradient_clip_algorithm: "norm"
  max_steps: 10000
  precision: bf16-true
  limit_val_batches: 10
  val_check_interval: 500
  benchmark: true

# Dataset Configuration
tokenizer:
  _target_: transformers.AutoTokenizer.from_pretrained
  pretrained_model_name_or_path: ${pretrained_ckpt_path}

# Dataset Configuration
train_dataset:
  _target_: fish_speech.datasets.semantic.AutoTextSemanticInstructionDataset
  proto_files:
    - data/protos
  tokenizer: ${tokenizer}
  causal: true
  max_length: ${max_length}
  use_speaker: false
  interactive_prob: 0.7

val_dataset:
  _target_: fish_speech.datasets.semantic.AutoTextSemanticInstructionDataset
  proto_files:
    - data/protos
  tokenizer: ${tokenizer}
  causal: true
  max_length: ${max_length}
  use_speaker: false
  interactive_prob: 0.7

data:
  _target_: fish_speech.datasets.semantic.SemanticDataModule
  train_dataset: ${train_dataset}
  val_dataset: ${val_dataset}
  num_workers: 8
  batch_size: 20
  tokenizer: ${tokenizer}
  max_length: ${max_length}

# Model Configuration
model:
  _target_: fish_speech.models.text2semantic.lit_module.TextToSemantic
  model:
    _target_: fish_speech.models.text2semantic.llama.BaseTransformer.from_pretrained
    path: ${pretrained_ckpt_path}
    load_weights: true
    max_length: ${max_length}
    lora_config: null

  optimizer:
    _target_: torch.optim.AdamW
    _partial_: true
    lr: 2e-5
    weight_decay: 0
    betas: [0.9, 0.95]
    eps: 1e-5

  lr_scheduler:
    _target_: torch.optim.lr_scheduler.LambdaLR
    _partial_: true
    lr_lambda:
      _target_: fish_speech.scheduler.get_constant_schedule_with_warmup_lr_lambda
      _partial_: true
      num_warmup_steps: 2500

# Callbacks
callbacks:
  model_checkpoint:
    every_n_train_steps: ${trainer.val_check_interval}
    dirpath: ${paths.ckpt_dir}
    filename: "step_{step:09d}"
    save_last: true # additionally always save an exact copy of the last checkpoint to a file last.ckpt
    save_top_k: 5 # save 5 latest checkpoints
    monitor: step # use step to monitor checkpoints
    mode: max # save the latest checkpoint with the highest global_step
    every_n_epochs: null # don't save checkpoints by epoch end
    auto_insert_metric_name: false

  model_summary:
    _target_: lightning.pytorch.callbacks.ModelSummary
    max_depth: 2 # the maximum depth of layer nesting that the summary will include

  learning_rate_monitor:
    _target_: lightning.pytorch.callbacks.LearningRateMonitor
    logging_interval: step
    log_momentum: false

  grad_norm_monitor:
    _target_: fish_speech.callbacks.GradNormMonitor
    norm_type: 2
    logging_interval: step

# Logger
logger:
  tensorboard:
    _target_: lightning.pytorch.loggers.tensorboard.TensorBoardLogger
    save_dir: "${paths.run_dir}/tensorboard/"
    name: null
    log_graph: false
    default_hp_metric: true
    prefix: ""

  wandb:
    _target_: lightning.pytorch.loggers.wandb.WandbLogger
    # name: "" # name of the run (normally generated by wandb)
    save_dir: "${paths.run_dir}"
    offline: False
    id: null # pass correct id to resume experiment!
    anonymous: null # enable anonymous logging
    project: "fish-speech"
    log_model: False # upload lightning ckpts
    prefix: "" # a string to put at the beginning of metric keys
    # entity: "" # set to name of your wandb team
    group: ""
    tags: ["vq", "hq", "finetune"]
    job_type: ""
```

### ✔️ Expected Behavior

Inference should work on the fine-tuned checkpoint. Loss should not diverge.

### ❌ Actual Behavior

Inference error.
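For reference, the Hydra interpolations in the config above can be resolved offline to confirm the effective values the run used (e.g. `max_length` and the checkpoint directory); a minimal sketch with OmegaConf, where the file path is an assumption:

```python
from omegaconf import OmegaConf

# Assumed location of the finetune config shown above
cfg = OmegaConf.load("fish_speech/configs/text2semantic_finetune.yaml")

# Accessing a key resolves ${...} interpolations defined in this file
print("max_length:", cfg.max_length)                    # 4096
print("model max_length:", cfg.model.model.max_length)  # ${max_length} -> 4096
print("ckpt_dir:", cfg.paths.ckpt_dir)                  # results/text2semantic_finetune_dual_ar/ft_checkpoints2
```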