-
One thing I noticed that I don't know what it means is the …. I don't believe that the optimizer should be one of the first things to swap out; I could be wrong, but the sentiment in ML seems to be that Adam or AdamW are the way to go. Instead, you could play around with the …; I also don't understand the range of …. The metrics have a high variance, so what value for …? Lastly, you can try training for more iterations with the best hyperparameters. I've trained one model for 300k steps, and it still improved somewhat in the last half.
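For example, swapping in AdamW in PyTorch is a one-liner. A minimal sketch; the model and the lr/weight_decay values below are placeholders, not tuned recommendations:

```python
# A minimal sketch, assuming a PyTorch model; the lr and weight_decay
# values are common starting points, not tuned recommendations.
import torch

model = torch.nn.Linear(512, 512)  # stand-in for your actual model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
```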
-
If you have enough GPU resources, you can use a hyperparameter optimization algorithm such as random search or Bayesian model-based optimization. There are Python packages for this, such as hyperopt; more info here.
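For example, a minimal sketch of such a search with hyperopt's TPE (Bayesian) algorithm; `train_and_eval` is a hypothetical stand-in for your training run, and the search space is illustrative:

```python
# A minimal hyperopt sketch. train_and_eval() is hypothetical: it should
# train a model with the given params and return the validation BLEU.
from hyperopt import Trials, fmin, hp, tpe

space = {
    "lr": hp.loguniform("lr", -12, -6),             # roughly 6e-6 to 2.5e-3
    "dropout": hp.uniform("dropout", 0.0, 0.5),
    "optimizer": hp.choice("optimizer", ["adam", "adamw"]),
}

def objective(params):
    val_bleu = train_and_eval(**params)  # hypothetical training routine
    return -val_bleu                     # hyperopt minimizes, so negate

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
```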
-
Not much; I am trying to construct a new model, only about 0.89 BLEU-4 for now, and still searching for better hyperparameters.
Yuhang Tao wrote on Sunday, May 29, 2022 at 22:38:
… No, how about you?
Best wishes,
Yuhang Tao
On Sun, May 29, 2022 at 5:46 PM, rainyl wrote:
> So, have you achieved higher scores?
-
This discussion is starting to feel like an open-ended question with no definitive answer.
Recap
Here is some related information I have found: two issues about BLEU, #27 (comment) and #44 (comment), and one discussion about the LR schedule and optimizer.
Experiment
I have added 0.35 million Wikipedia samples to the training set, bringing the total to 0.5 million. After that, I created a wandb sweep.
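The sweep config looked roughly like the following sketch; the parameter names, ranges, project name, and the `train()` stub are illustrative assumptions, not my exact settings:

```python
# A rough sketch of a wandb sweep config; names and ranges are
# illustrative assumptions, not the exact settings from my sweep.
import wandb

sweep_config = {
    "method": "bayes",  # wandb also supports "random" and "grid"
    "metric": {"name": "val_bleu", "goal": "maximize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-3},
        "lr_schedule": {"values": ["constant", "cosine", "step"]},
        "optimizer": {"values": ["adam", "adamw"]},
        "num_encoder_layers": {"values": [4, 6, 8]},
        "num_decoder_layers": {"values": [4, 6, 8]},
    },
}

def train():
    run = wandb.init()
    cfg = run.config
    # ... build the model from cfg (e.g. cfg.num_encoder_layers), train it,
    # then report the metric the sweep optimizes:
    wandb.log({"val_bleu": 0.0})  # replace with your real validation BLEU

sweep_id = wandb.sweep(sweep_config, project="my-project")  # hypothetical project
wandb.agent(sweep_id, function=train, count=30)
```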
[wandb sweep panels: date & BLEU; hyperparameter influence; fully connected layer style; token accuracy; val_bleu]
I have added more data and tried different hyperparameters, including the LR schedule and the optimizer, and even the number of encoder and decoder layers (see the "fully connected layer style" panel above for more detail), but nothing gets better. What can I do to achieve higher performance?
Five days ago, I reckoned that more data and hyperparameter searching would work, but now the results are frustrating.