
generator loss curve #66

Open
demour96 opened this issue Dec 25, 2024 · 1 comment

demour96 commented Dec 25, 2024

I trained a MaskGIT (ImageBert) model on the ImageNet-1k dataset to reproduce your work, but after 22 training epochs (out of 199 in total), I found that the MLM loss curve decreases very slowly and the accuracy stays below 0.01. Could you please provide some information about how the training loss and accuracy evolved in your runs? If possible, please also share any problems that can occur during the MLM task. Looking forward to your reply, thanks.

The training command line is:

```bash
WANDB_MODE=offline accelerate launch --num_processes=8 --main_process_port=10086 --same_network \
    scripts/train_maskgit.py config=configs/training/generator/maskgit_original.yaml \
    experiment.project="titok_generation" \
    experiment.name="titok_l32_maskgit_original" \
    experiment.output_dir="titok_l32_maskgit_original"
```
and the config file looks like this:

```yaml
experiment:
    project: "titok_generation"
    name: "titok_l32_maskgit"
    max_train_examples: 1_281_167
    save_every: 50_000
    eval_every: 50_000
    generate_every: 5_000
    log_every: 50
    log_grad_norm_every: 1_000
    resume: True
    tokenizer_checkpoint: "xxx/tokenizer_titok_l32.bin"

model:
    vq_model:
        codebook_size: 4096
        token_size: 12
        use_l2_norm: True
        commitment_cost: 0.25
        # vit arch
        vit_enc_model_size: "large"
        vit_dec_model_size: "large"
        vit_enc_patch_size: 16
        vit_dec_patch_size: 16
        num_latent_tokens: 32
        finetune_decoder: True
    generator:
        model_type: "ViT"
        hidden_size: 768
        num_hidden_layers: 24
        num_attention_heads: 16
        intermediate_size: 3072
        dropout: 0.1
        attn_drop: 0.1
        num_steps: 8
        class_label_dropout: 0.1
        image_seq_len: ${model.vq_model.num_latent_tokens}
        condition_num_classes: 1000

        # sampling hyper-params on the flight
        randomize_temperature: 1.0
        guidance_scale: 4.5
        guidance_decay: "constant"

losses:
    label_smoothing: 0.1
    loss_weight_unmasked_token: 0.1

dataset:
    params:
        train_shards_path_or_url: "xxx/imagenet-wds/imagenet-train-{000000..000320}.tar" # "imagenet_sharded/train/imagenet-train-{0000..0252}.tar"
        eval_shards_path_or_url: "xxx/imagenet-val-{000000..000049}.tar"
        num_workers_per_gpu: 12
    preprocessing:
        resize_shorter_edge: 256
        crop_size: 256
        random_crop: False
        random_flip: True

optimizer:
    name: adamw
    params:
        learning_rate: 2e-4
        beta1: 0.9
        beta2: 0.96
        weight_decay: 0.03

lr_scheduler:
    scheduler: "cosine"
    params:
        learning_rate: ${optimizer.params.learning_rate}
        warmup_steps: 10_000
        end_lr: 1e-5

training:
    gradient_accumulation_steps: 1
    per_gpu_batch_size: 96 # 32 GPU, total batch size 2048
    mixed_precision: "bf16"
    enable_tf32: True
    enable_wandb: True
    use_ema: True
    seed: 42
    max_train_steps: 500_000
    max_grad_norm: 1.0
```

cornettoyu (Collaborator) commented

Hi,

It seems that you are using a total batch size of 96 * 8 = 768, while our results were obtained with a total batch size of 2048. I would suggest sticking to our original setting if you aim to reproduce the results. As for the loss curve and accuracy, I think it is reasonable to see the loss decrease slowly, mainly due to the high masking ratio in the MaskGIT strategy; it should be fine as long as you can see meaningful images being generated along the way.

Best,
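To make the batch-size arithmetic concrete: with `accelerate`, the effective batch size is `per_gpu_batch_size * num_processes * gradient_accumulation_steps`, so the launch above yields 96 * 8 * 1 = 768. Below is a minimal sketch of one way to reach the original 2048 on 8 GPUs by editing the `training` section of the config; the split between per-GPU batch and accumulation steps is illustrative (it depends on available GPU memory), not an official setting from the authors.

```yaml
training:
    # 128 per GPU * 8 processes * 2 accumulation steps = 2048 effective batch size
    gradient_accumulation_steps: 2
    per_gpu_batch_size: 128
    # alternatively, if memory is tight:
    # per_gpu_batch_size: 64 with gradient_accumulation_steps: 4 also gives 2048
```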
