
About the details of learning rate #7

Open
hongxin001 opened this issue Nov 2, 2020 · 1 comment

@hongxin001

There is a sentence in the appendix: "With batch normalization, we effectively cancel the learning rate of Meta-Weight-Net, and it works well with a fixed learning rate. "

I'm not sure what this means. Could you explain it in more detail? Does it mean we don't need to tune the learning rate of the meta network because of BN?
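To make the question concrete, here is my reading of the quoted sentence as a toy example (my own sketch, not code from this repository): batch normalization makes a layer's output invariant to the scale of the weights feeding into it, which is the usual argument for why BN loosens the dependence on the nominal learning rate.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy layer: Linear followed by BatchNorm1d (illustrative only).
lin = nn.Linear(10, 5, bias=False)
bn = nn.BatchNorm1d(5, affine=False)  # affine off to isolate the normalization itself
bn.train()

x = torch.randn(32, 10)
out1 = bn(lin(x))

# Scale the linear weights by a positive constant: the batch mean and std scale
# by the same factor, so the normalized output is essentially unchanged.
with torch.no_grad():
    lin.weight.mul_(10.0)
out2 = bn(lin(x))

print(torch.allclose(out1, out2, atol=1e-4))  # True (up to BN's eps)
```

Is this the effect the appendix refers to when it says the learning rate of Meta-Weight-Net is "cancelled"?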

@shanshuo

@xjtushujun Thanks for sharing the code of your nice work. I have the same questions as @hongxin001 about this part:

  1. Why can batch normalization cancel the learning rate of MW-Net?
  2. In the old version, you normalize the weights by their sum:
    w_v = w_new / norm_v
    while in the stable version of MW-Net you don't:
    l_f_meta = torch.sum(cost_v * v_lambda) / len(cost_v)
    Why is the new version more stable, and how does it avoid the output weights becoming all zero? (The two schemes are sketched side by side below.)
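For reference, here is a minimal toy comparison of the two weighting schemes as I understand them from the snippets above (tensor names `cost_v`, `v_lambda`, `w_v`, `norm_v` follow the quoted code; the values and shapes are my own assumptions):

```python
import torch

torch.manual_seed(0)

# Toy tensors, purely illustrative.
cost_v = torch.rand(8, 1)      # per-sample training losses
v_lambda = torch.rand(8, 1)    # per-sample weights from the meta network (sigmoid outputs)

# Old scheme (as quoted): normalize the weights so they sum to 1 before weighting.
# If the meta net drives every weight toward 0, norm_v -> 0 and the division
# becomes unstable, which is what I suspect the "all zeros" question is about.
norm_v = torch.sum(v_lambda)
w_v = v_lambda / norm_v if norm_v > 0 else v_lambda
l_old = torch.sum(cost_v * w_v)

# Stable scheme (as quoted): a plain mean of the weighted losses, with no division
# by the weight sum, so there is no denominator that can collapse to 0.
l_new = torch.sum(cost_v * v_lambda) / len(cost_v)

print(l_old.item(), l_new.item())
```

Is avoiding that division by the (possibly near-zero) weight sum the reason the new version is called the stable one?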
