
generator loss curve #66

Open
demour96 opened this issue Dec 25, 2024 · 1 comment

demour96 commented Dec 25, 2024

I trained a MaskGIT (ImageBert) model on the ImageNet-1k dataset to reproduce your work, but after 22 training epochs (out of 199 in total), I found that the MLM loss curve decreases very slowly and the accuracy stays below 0.01. Could you please provide some information about how the training loss and accuracy evolved in your runs? If possible, please also share any problems that can occur during the MLM task. Looking forward to your reply, thanks.

The training command line is:

```bash
WANDB_MODE=offline accelerate launch --num_processes=8 --main_process_port=10086 --same_network \
    scripts/train_maskgit.py config=configs/training/generator/maskgit_original.yaml \
    experiment.project="titok_generation" \
    experiment.name="titok_l32_maskgit_original" \
    experiment.output_dir="titok_l32_maskgit_original"
```
and the config file looks like this:

```yaml
experiment:
    project: "titok_generation"
    name: "titok_l32_maskgit"
    max_train_examples: 1_281_167
    save_every: 50_000
    eval_every: 50_000
    generate_every: 5_000
    log_every: 50
    log_grad_norm_every: 1_000
    resume: True
    tokenizer_checkpoint: "xxx/tokenizer_titok_l32.bin"

model:
    vq_model:
        codebook_size: 4096
        token_size: 12
        use_l2_norm: True
        commitment_cost: 0.25
        # vit arch
        vit_enc_model_size: "large"
        vit_dec_model_size: "large"
        vit_enc_patch_size: 16
        vit_dec_patch_size: 16
        num_latent_tokens: 32
        finetune_decoder: True
    generator:
        model_type: "ViT"
        hidden_size: 768
        num_hidden_layers: 24
        num_attention_heads: 16
        intermediate_size: 3072
        dropout: 0.1
        attn_drop: 0.1
        num_steps: 8
        class_label_dropout: 0.1
        image_seq_len: ${model.vq_model.num_latent_tokens}
        condition_num_classes: 1000

        # sampling hyper-params on the flight
        randomize_temperature: 1.0
        guidance_scale: 4.5
        guidance_decay: "constant"

losses:
    label_smoothing: 0.1
    loss_weight_unmasked_token: 0.1

dataset:
    params:
        train_shards_path_or_url: "xxx/imagenet-wds/imagenet-train-{000000..000320}.tar" # "imagenet_sharded/train/imagenet-train-{0000..0252}.tar"
        eval_shards_path_or_url: "xxx/imagenet-val-{000000..000049}.tar"
        num_workers_per_gpu: 12
    preprocessing:
        resize_shorter_edge: 256
        crop_size: 256
        random_crop: False
        random_flip: True

optimizer:
    name: adamw
    params:
        learning_rate: 2e-4
        beta1: 0.9
        beta2: 0.96
        weight_decay: 0.03

lr_scheduler:
    scheduler: "cosine"
    params:
        learning_rate: ${optimizer.params.learning_rate}
        warmup_steps: 10_000
        end_lr: 1e-5

training:
    gradient_accumulation_steps: 1
    per_gpu_batch_size: 96 # 32 GPU, total batch size 2048
    mixed_precision: "bf16"
    enable_tf32: True
    enable_wandb: True
    use_ema: True
    seed: 42
    max_train_steps: 500_000
    max_grad_norm: 1.0
```

cornettoyu (Collaborator) commented

Hi,

It seems that you are using a total batch size of 96 * 8 = 768, while our results were obtained with a total batch size of 2048. I would suggest sticking to our original setting if you aim to reproduce the results. As for the loss curve and accuracy, I think it is reasonable to see the loss decrease slowly, mainly due to the high masking ratio in the MaskGIT strategy; it should be fine as long as you can see meaningful images being generated along the way.

Best,
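To make the batch-size arithmetic concrete: with `accelerate`, the effective batch size is `per_gpu_batch_size * num_processes * gradient_accumulation_steps`, so the launch above yields 96 * 8 * 1 = 768. Below is a minimal sketch of one way to reach the original 2048 on 8 GPUs by editing the `training` section of the config; the split between per-GPU batch and accumulation steps is illustrative (it depends on available GPU memory), not an official setting from the authors.

```yaml
training:
    # 128 per GPU * 8 processes * 2 accumulation steps = 2048 effective batch size
    gradient_accumulation_steps: 2
    per_gpu_batch_size: 128
    # alternatively, if memory is tight:
    # per_gpu_batch_size: 64 with gradient_accumulation_steps: 4 also gives 2048
```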
