-
Notifications
You must be signed in to change notification settings - Fork 64
/
log-50k.txt
360 lines (360 loc) · 30.7 KB
/
log-50k.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
====================================================================================================
- data : /root/autodl-tmp/data/wikitext-103/
- dataset : wt103
- n_layer : 16
- n_head : 10
- d_head : 41
- d_embed : 410
- d_model : 410
- d_inner : 2100
- dropout : 0.1
- dropatt : 0.0
- init : normal
- emb_init : normal
- init_range : 0.1
- emb_init_range : 0.01
- init_std : 0.02
- proj_init_std : 0.01
- optim : adan
- lr : 0.0015
- wd : 0.02
- mom : 0.0
- scheduler : cosine
- warmup_step : 5000
- decay_rate : 0.5
- lr_min : 1e-06
- clip : 0.25
- clip_nonemb : False
- max_step : 50000
- batch_size : 60
- batch_chunk : 1
- tgt_len : 150
- eval_tgt_len : 150
- ext_len : 0
- mem_len : 150
- not_tied : False
- seed : 1111
- cuda : True
- adaptive : True
- div_val : 1
- pre_lnorm : False
- varlen : False
- multi_gpu : True
- log_interval : 200
- eval_interval : 4000
- work_dir : /root/autodl-tmp/-wt103/20220809-222534
- restart : False
- restart_dir :
- debug : False
- same_length : False
- attn_type : 0
- clamp_len : -1
- eta_min : 0.0
- gpu0_bsz : 4
- max_eval_steps : -1
- sample_softmax : -1
- patience : 0
- finetune_v2 : False
- finetune_v3 : False
- fp16 : False
- static_loss_scale : 1
- dynamic_loss_scale : False
- opt_betas : [0.9, 0.9, 0.999]
- tied : True
- n_token : 267735
- n_all_param : 151107538
- n_nonemb_param : 41066400
====================================================================================================
#params = 151107538
#non emb params = 41066400
| epoch 1 step 200 | 200 batches | lr 6e-05 | ms/batch 731.01 | loss 8.99 | ppl 7986.754
| epoch 1 step 400 | 400 batches | lr 0.00012 | ms/batch 671.04 | loss 6.94 | ppl 1033.129
| epoch 1 step 600 | 600 batches | lr 0.00018 | ms/batch 674.05 | loss 6.40 | ppl 599.798
| epoch 1 step 800 | 800 batches | lr 0.00024 | ms/batch 672.64 | loss 6.11 | ppl 452.258
| epoch 1 step 1000 | 1000 batches | lr 0.0003 | ms/batch 672.77 | loss 5.85 | ppl 348.893
| epoch 1 step 1200 | 1200 batches | lr 0.00036 | ms/batch 673.66 | loss 5.65 | ppl 285.037
| epoch 1 step 1400 | 1400 batches | lr 0.00042 | ms/batch 674.81 | loss 5.48 | ppl 240.623
| epoch 1 step 1600 | 1600 batches | lr 0.00048 | ms/batch 671.81 | loss 5.33 | ppl 206.955
| epoch 1 step 1800 | 1800 batches | lr 0.00054 | ms/batch 673.69 | loss 5.21 | ppl 182.225
| epoch 1 step 2000 | 2000 batches | lr 0.0006 | ms/batch 670.74 | loss 5.09 | ppl 162.138
| epoch 1 step 2200 | 2200 batches | lr 0.00066 | ms/batch 672.15 | loss 4.98 | ppl 145.111
| epoch 1 step 2400 | 2400 batches | lr 0.00072 | ms/batch 670.57 | loss 4.89 | ppl 133.331
| epoch 1 step 2600 | 2600 batches | lr 0.00078 | ms/batch 672.95 | loss 4.80 | ppl 121.355
| epoch 1 step 2800 | 2800 batches | lr 0.00084 | ms/batch 671.53 | loss 4.72 | ppl 112.435
| epoch 1 step 3000 | 3000 batches | lr 0.0009 | ms/batch 667.80 | loss 4.67 | ppl 107.032
| epoch 1 step 3200 | 3200 batches | lr 0.00096 | ms/batch 670.42 | loss 4.61 | ppl 100.273
| epoch 1 step 3400 | 3400 batches | lr 0.00102 | ms/batch 673.73 | loss 4.56 | ppl 95.679
| epoch 1 step 3600 | 3600 batches | lr 0.00108 | ms/batch 670.60 | loss 4.48 | ppl 88.439
| epoch 1 step 3800 | 3800 batches | lr 0.00114 | ms/batch 672.03 | loss 4.51 | ppl 90.996
| epoch 1 step 4000 | 4000 batches | lr 0.0012 | ms/batch 660.71 | loss 4.47 | ppl 87.228
----------------------------------------------------------------------------------------------------
| Eval 1 at step 4000 | time: 2706.60s | valid loss 4.43 | valid ppl 83.560
----------------------------------------------------------------------------------------------------
| epoch 1 step 4200 | 4200 batches | lr 0.00126 | ms/batch 741.78 | loss 4.42 | ppl 83.146
| epoch 1 step 4400 | 4400 batches | lr 0.00132 | ms/batch 671.50 | loss 4.40 | ppl 81.572
| epoch 1 step 4600 | 4600 batches | lr 0.00138 | ms/batch 669.10 | loss 4.38 | ppl 79.989
| epoch 1 step 4800 | 4800 batches | lr 0.00144 | ms/batch 671.50 | loss 4.33 | ppl 76.228
| epoch 1 step 5000 | 5000 batches | lr 0.0015 | ms/batch 669.83 | loss 4.37 | ppl 79.175
| epoch 1 step 5200 | 5200 batches | lr 0.0015 | ms/batch 669.53 | loss 4.32 | ppl 74.879
| epoch 1 step 5400 | 5400 batches | lr 0.00149 | ms/batch 668.42 | loss 4.26 | ppl 70.961
| epoch 1 step 5600 | 5600 batches | lr 0.00149 | ms/batch 669.68 | loss 4.28 | ppl 72.426
| epoch 1 step 5800 | 5800 batches | lr 0.00149 | ms/batch 668.33 | loss 4.28 | ppl 71.883
| epoch 1 step 6000 | 6000 batches | lr 0.00148 | ms/batch 669.96 | loss 4.23 | ppl 68.809
| epoch 1 step 6200 | 6200 batches | lr 0.00148 | ms/batch 671.62 | loss 4.20 | ppl 66.917
| epoch 1 step 6400 | 6400 batches | lr 0.00148 | ms/batch 670.80 | loss 4.23 | ppl 68.826
| epoch 1 step 6600 | 6600 batches | lr 0.00147 | ms/batch 671.47 | loss 4.17 | ppl 64.485
| epoch 1 step 6800 | 6800 batches | lr 0.00147 | ms/batch 671.88 | loss 4.16 | ppl 64.148
| epoch 1 step 7000 | 7000 batches | lr 0.00146 | ms/batch 669.08 | loss 4.16 | ppl 64.382
| epoch 1 step 7200 | 7200 batches | lr 0.00146 | ms/batch 669.37 | loss 4.12 | ppl 61.310
| epoch 1 step 7400 | 7400 batches | lr 0.00146 | ms/batch 669.99 | loss 4.11 | ppl 61.000
| epoch 1 step 7600 | 7600 batches | lr 0.00145 | ms/batch 669.12 | loss 4.09 | ppl 59.732
| epoch 1 step 7800 | 7800 batches | lr 0.00145 | ms/batch 671.55 | loss 4.11 | ppl 60.794
| epoch 1 step 8000 | 8000 batches | lr 0.00144 | ms/batch 659.11 | loss 4.10 | ppl 60.478
----------------------------------------------------------------------------------------------------
| Eval 2 at step 8000 | time: 2687.58s | valid loss 4.01 | valid ppl 55.175
----------------------------------------------------------------------------------------------------
| epoch 1 step 8200 | 8200 batches | lr 0.00144 | ms/batch 742.68 | loss 4.08 | ppl 58.932
| epoch 1 step 8400 | 8400 batches | lr 0.00143 | ms/batch 669.52 | loss 4.09 | ppl 59.603
| epoch 1 step 8600 | 8600 batches | lr 0.00143 | ms/batch 670.69 | loss 4.07 | ppl 58.419
| epoch 1 step 8800 | 8800 batches | lr 0.00142 | ms/batch 670.29 | loss 4.08 | ppl 58.862
| epoch 1 step 9000 | 9000 batches | lr 0.00142 | ms/batch 671.07 | loss 4.04 | ppl 57.075
| epoch 1 step 9200 | 9200 batches | lr 0.00141 | ms/batch 670.31 | loss 4.03 | ppl 56.375
| epoch 1 step 9400 | 9400 batches | lr 0.00141 | ms/batch 668.76 | loss 4.04 | ppl 56.654
| epoch 1 step 9600 | 9600 batches | lr 0.0014 | ms/batch 668.70 | loss 4.05 | ppl 57.438
| epoch 1 step 9800 | 9800 batches | lr 0.0014 | ms/batch 669.90 | loss 4.01 | ppl 54.931
| epoch 1 step 10000 | 10000 batches | lr 0.00139 | ms/batch 671.54 | loss 4.02 | ppl 55.691
| epoch 1 step 10200 | 10200 batches | lr 0.00138 | ms/batch 668.10 | loss 3.98 | ppl 53.731
| epoch 1 step 10400 | 10400 batches | lr 0.00138 | ms/batch 668.55 | loss 3.98 | ppl 53.647
| epoch 1 step 10600 | 10600 batches | lr 0.00137 | ms/batch 670.24 | loss 4.00 | ppl 54.823
| epoch 1 step 10800 | 10800 batches | lr 0.00137 | ms/batch 669.67 | loss 3.96 | ppl 52.449
| epoch 1 step 11000 | 11000 batches | lr 0.00136 | ms/batch 668.12 | loss 4.00 | ppl 54.511
| epoch 1 step 11200 | 11200 batches | lr 0.00135 | ms/batch 669.36 | loss 3.98 | ppl 53.348
| epoch 1 step 11400 | 11400 batches | lr 0.00135 | ms/batch 667.23 | loss 3.97 | ppl 53.053
| epoch 2 step 11600 | 130 batches | lr 0.00134 | ms/batch 671.47 | loss 3.95 | ppl 51.832
| epoch 2 step 11800 | 330 batches | lr 0.00134 | ms/batch 670.28 | loss 3.92 | ppl 50.430
| epoch 2 step 12000 | 530 batches | lr 0.00133 | ms/batch 658.97 | loss 3.94 | ppl 51.495
----------------------------------------------------------------------------------------------------
| Eval 3 at step 12000 | time: 2685.36s | valid loss 3.83 | valid ppl 46.199
----------------------------------------------------------------------------------------------------
| epoch 2 step 12200 | 730 batches | lr 0.00132 | ms/batch 741.77 | loss 3.91 | ppl 50.018
| epoch 2 step 12400 | 930 batches | lr 0.00132 | ms/batch 669.29 | loss 3.91 | ppl 50.118
| epoch 2 step 12600 | 1130 batches | lr 0.00131 | ms/batch 670.23 | loss 3.94 | ppl 51.393
| epoch 2 step 12800 | 1330 batches | lr 0.0013 | ms/batch 670.21 | loss 3.91 | ppl 49.684
| epoch 2 step 13000 | 1530 batches | lr 0.00129 | ms/batch 669.82 | loss 3.90 | ppl 49.205
| epoch 2 step 13200 | 1730 batches | lr 0.00129 | ms/batch 668.80 | loss 3.89 | ppl 48.946
| epoch 2 step 13400 | 1930 batches | lr 0.00128 | ms/batch 669.89 | loss 3.90 | ppl 49.160
| epoch 2 step 13600 | 2130 batches | lr 0.00127 | ms/batch 670.73 | loss 3.91 | ppl 50.134
| epoch 2 step 13800 | 2330 batches | lr 0.00127 | ms/batch 669.47 | loss 3.89 | ppl 48.907
| epoch 2 step 14000 | 2530 batches | lr 0.00126 | ms/batch 670.64 | loss 3.88 | ppl 48.187
| epoch 2 step 14200 | 2730 batches | lr 0.00125 | ms/batch 669.45 | loss 3.85 | ppl 47.194
| epoch 2 step 14400 | 2930 batches | lr 0.00124 | ms/batch 670.69 | loss 3.84 | ppl 46.316
| epoch 2 step 14600 | 3130 batches | lr 0.00124 | ms/batch 668.19 | loss 3.84 | ppl 46.742
| epoch 2 step 14800 | 3330 batches | lr 0.00123 | ms/batch 668.82 | loss 3.85 | ppl 46.832
| epoch 2 step 15000 | 3530 batches | lr 0.00122 | ms/batch 669.99 | loss 3.81 | ppl 45.024
| epoch 2 step 15200 | 3730 batches | lr 0.00121 | ms/batch 668.58 | loss 3.83 | ppl 46.255
| epoch 2 step 15400 | 3930 batches | lr 0.0012 | ms/batch 670.31 | loss 3.82 | ppl 45.787
| epoch 2 step 15600 | 4130 batches | lr 0.0012 | ms/batch 667.87 | loss 3.81 | ppl 45.203
| epoch 2 step 15800 | 4330 batches | lr 0.00119 | ms/batch 669.87 | loss 3.82 | ppl 45.456
| epoch 2 step 16000 | 4530 batches | lr 0.00118 | ms/batch 656.97 | loss 3.82 | ppl 45.455
----------------------------------------------------------------------------------------------------
| Eval 4 at step 16000 | time: 2684.61s | valid loss 3.70 | valid ppl 40.554
----------------------------------------------------------------------------------------------------
| epoch 2 step 16200 | 4730 batches | lr 0.00117 | ms/batch 743.72 | loss 3.77 | ppl 43.325
| epoch 2 step 16400 | 4930 batches | lr 0.00116 | ms/batch 669.07 | loss 3.79 | ppl 44.198
| epoch 2 step 16600 | 5130 batches | lr 0.00116 | ms/batch 670.76 | loss 3.78 | ppl 43.728
| epoch 2 step 16800 | 5330 batches | lr 0.00115 | ms/batch 673.39 | loss 3.77 | ppl 43.271
| epoch 2 step 17000 | 5530 batches | lr 0.00114 | ms/batch 668.77 | loss 3.75 | ppl 42.620
| epoch 2 step 17200 | 5730 batches | lr 0.00113 | ms/batch 668.81 | loss 3.77 | ppl 43.340
| epoch 2 step 17400 | 5930 batches | lr 0.00112 | ms/batch 671.39 | loss 3.75 | ppl 42.598
| epoch 2 step 17600 | 6130 batches | lr 0.00111 | ms/batch 670.80 | loss 3.74 | ppl 42.211
| epoch 2 step 17800 | 6330 batches | lr 0.0011 | ms/batch 670.83 | loss 3.77 | ppl 43.377
| epoch 2 step 18000 | 6530 batches | lr 0.0011 | ms/batch 670.94 | loss 3.71 | ppl 40.882
| epoch 2 step 18200 | 6730 batches | lr 0.00109 | ms/batch 671.71 | loss 3.71 | ppl 41.009
| epoch 2 step 18400 | 6930 batches | lr 0.00108 | ms/batch 671.77 | loss 3.73 | ppl 41.510
| epoch 2 step 18600 | 7130 batches | lr 0.00107 | ms/batch 672.45 | loss 3.70 | ppl 40.538
| epoch 2 step 18800 | 7330 batches | lr 0.00106 | ms/batch 676.93 | loss 3.68 | ppl 39.664
| epoch 2 step 19000 | 7530 batches | lr 0.00105 | ms/batch 673.81 | loss 3.70 | ppl 40.567
| epoch 2 step 19200 | 7730 batches | lr 0.00104 | ms/batch 673.02 | loss 3.70 | ppl 40.493
| epoch 2 step 19400 | 7930 batches | lr 0.00103 | ms/batch 671.76 | loss 3.69 | ppl 40.199
| epoch 2 step 19600 | 8130 batches | lr 0.00102 | ms/batch 672.49 | loss 3.70 | ppl 40.628
| epoch 2 step 19800 | 8330 batches | lr 0.00102 | ms/batch 675.15 | loss 3.69 | ppl 40.150
| epoch 2 step 20000 | 8530 batches | lr 0.00101 | ms/batch 662.59 | loss 3.68 | ppl 39.675
----------------------------------------------------------------------------------------------------
| Eval 5 at step 20000 | time: 2694.60s | valid loss 3.60 | valid ppl 36.520
----------------------------------------------------------------------------------------------------
| epoch 2 step 20200 | 8730 batches | lr 0.000997 | ms/batch 743.34 | loss 3.70 | ppl 40.281
| epoch 2 step 20400 | 8930 batches | lr 0.000988 | ms/batch 672.38 | loss 3.69 | ppl 40.101
| epoch 2 step 20600 | 9130 batches | lr 0.000978 | ms/batch 671.32 | loss 3.68 | ppl 39.723
| epoch 2 step 20800 | 9330 batches | lr 0.000969 | ms/batch 670.29 | loss 3.67 | ppl 39.195
| epoch 2 step 21000 | 9530 batches | lr 0.00096 | ms/batch 673.92 | loss 3.71 | ppl 40.874
| epoch 2 step 21200 | 9730 batches | lr 0.00095 | ms/batch 673.78 | loss 3.66 | ppl 38.777
| epoch 2 step 21400 | 9930 batches | lr 0.000941 | ms/batch 671.65 | loss 3.67 | ppl 39.193
| epoch 2 step 21600 | 10130 batches | lr 0.000932 | ms/batch 671.55 | loss 3.65 | ppl 38.482
| epoch 2 step 21800 | 10330 batches | lr 0.000922 | ms/batch 671.69 | loss 3.66 | ppl 38.807
| epoch 2 step 22000 | 10530 batches | lr 0.000913 | ms/batch 671.36 | loss 3.67 | ppl 39.367
| epoch 2 step 22200 | 10730 batches | lr 0.000903 | ms/batch 672.87 | loss 3.63 | ppl 37.849
| epoch 2 step 22400 | 10930 batches | lr 0.000894 | ms/batch 674.08 | loss 3.63 | ppl 37.837
| epoch 2 step 22600 | 11130 batches | lr 0.000884 | ms/batch 671.07 | loss 3.68 | ppl 39.497
| epoch 2 step 22800 | 11330 batches | lr 0.000875 | ms/batch 671.94 | loss 3.64 | ppl 38.144
| epoch 3 step 23000 | 60 batches | lr 0.000865 | ms/batch 672.34 | loss 3.65 | ppl 38.332
| epoch 3 step 23200 | 260 batches | lr 0.000855 | ms/batch 674.27 | loss 3.60 | ppl 36.501
| epoch 3 step 23400 | 460 batches | lr 0.000846 | ms/batch 674.42 | loss 3.64 | ppl 37.995
| epoch 3 step 23600 | 660 batches | lr 0.000836 | ms/batch 672.56 | loss 3.60 | ppl 36.540
| epoch 3 step 23800 | 860 batches | lr 0.000827 | ms/batch 673.12 | loss 3.63 | ppl 37.738
| epoch 3 step 24000 | 1060 batches | lr 0.000817 | ms/batch 664.65 | loss 3.62 | ppl 37.164
----------------------------------------------------------------------------------------------------
| Eval 6 at step 24000 | time: 2697.80s | valid loss 3.52 | valid ppl 33.726
----------------------------------------------------------------------------------------------------
| epoch 3 step 24200 | 1260 batches | lr 0.000807 | ms/batch 740.67 | loss 3.60 | ppl 36.765
| epoch 3 step 24400 | 1460 batches | lr 0.000798 | ms/batch 674.30 | loss 3.60 | ppl 36.720
| epoch 3 step 24600 | 1660 batches | lr 0.000788 | ms/batch 672.55 | loss 3.59 | ppl 36.339
| epoch 3 step 24800 | 1860 batches | lr 0.000778 | ms/batch 671.83 | loss 3.60 | ppl 36.487
| epoch 3 step 25000 | 2060 batches | lr 0.000769 | ms/batch 671.74 | loss 3.63 | ppl 37.859
| epoch 3 step 25200 | 2260 batches | lr 0.000759 | ms/batch 672.23 | loss 3.61 | ppl 36.807
| epoch 3 step 25400 | 2460 batches | lr 0.000749 | ms/batch 671.61 | loss 3.59 | ppl 36.224
| epoch 3 step 25600 | 2660 batches | lr 0.00074 | ms/batch 674.02 | loss 3.59 | ppl 36.343
| epoch 3 step 25800 | 2860 batches | lr 0.00073 | ms/batch 671.84 | loss 3.53 | ppl 34.173
| epoch 3 step 26000 | 3060 batches | lr 0.00072 | ms/batch 672.60 | loss 3.58 | ppl 35.903
| epoch 3 step 26200 | 3260 batches | lr 0.000711 | ms/batch 673.04 | loss 3.58 | ppl 35.696
| epoch 3 step 26400 | 3460 batches | lr 0.000701 | ms/batch 673.00 | loss 3.54 | ppl 34.395
| epoch 3 step 26600 | 3660 batches | lr 0.000692 | ms/batch 673.81 | loss 3.55 | ppl 34.771
| epoch 3 step 26800 | 3860 batches | lr 0.000682 | ms/batch 672.00 | loss 3.55 | ppl 34.852
| epoch 3 step 27000 | 4060 batches | lr 0.000672 | ms/batch 673.44 | loss 3.56 | ppl 35.128
| epoch 3 step 27200 | 4260 batches | lr 0.000663 | ms/batch 671.63 | loss 3.54 | ppl 34.582
| epoch 3 step 27400 | 4460 batches | lr 0.000653 | ms/batch 672.23 | loss 3.55 | ppl 34.678
| epoch 3 step 27600 | 4660 batches | lr 0.000644 | ms/batch 671.70 | loss 3.53 | ppl 34.204
| epoch 3 step 27800 | 4860 batches | lr 0.000634 | ms/batch 670.97 | loss 3.52 | ppl 33.707
| epoch 3 step 28000 | 5060 batches | lr 0.000625 | ms/batch 663.55 | loss 3.53 | ppl 34.105
----------------------------------------------------------------------------------------------------
| Eval 7 at step 28000 | time: 2697.22s | valid loss 3.44 | valid ppl 31.229
----------------------------------------------------------------------------------------------------
| epoch 3 step 28200 | 5260 batches | lr 0.000615 | ms/batch 738.31 | loss 3.51 | ppl 33.439
| epoch 3 step 28400 | 5460 batches | lr 0.000606 | ms/batch 670.03 | loss 3.49 | ppl 32.676
| epoch 3 step 28600 | 5660 batches | lr 0.000596 | ms/batch 673.65 | loss 3.53 | ppl 34.273
| epoch 3 step 28800 | 5860 batches | lr 0.000587 | ms/batch 670.70 | loss 3.50 | ppl 33.257
| epoch 3 step 29000 | 6060 batches | lr 0.000577 | ms/batch 672.88 | loss 3.50 | ppl 33.035
| epoch 3 step 29200 | 6260 batches | lr 0.000568 | ms/batch 671.74 | loss 3.50 | ppl 33.001
| epoch 3 step 29400 | 6460 batches | lr 0.000559 | ms/batch 670.97 | loss 3.50 | ppl 33.162
| epoch 3 step 29600 | 6660 batches | lr 0.00055 | ms/batch 671.14 | loss 3.45 | ppl 31.426
| epoch 3 step 29800 | 6860 batches | lr 0.00054 | ms/batch 672.59 | loss 3.48 | ppl 32.386
| epoch 3 step 30000 | 7060 batches | lr 0.000531 | ms/batch 671.72 | loss 3.47 | ppl 32.047
| epoch 3 step 30200 | 7260 batches | lr 0.000522 | ms/batch 669.64 | loss 3.44 | ppl 31.093
| epoch 3 step 30400 | 7460 batches | lr 0.000513 | ms/batch 674.88 | loss 3.46 | ppl 31.766
| epoch 3 step 30600 | 7660 batches | lr 0.000504 | ms/batch 673.98 | loss 3.44 | ppl 31.226
| epoch 3 step 30800 | 7860 batches | lr 0.000495 | ms/batch 672.05 | loss 3.45 | ppl 31.633
| epoch 3 step 31000 | 8060 batches | lr 0.000486 | ms/batch 675.06 | loss 3.46 | ppl 31.822
| epoch 3 step 31200 | 8260 batches | lr 0.000477 | ms/batch 675.76 | loss 3.45 | ppl 31.384
| epoch 3 step 31400 | 8460 batches | lr 0.000468 | ms/batch 674.16 | loss 3.46 | ppl 31.680
| epoch 3 step 31600 | 8660 batches | lr 0.000459 | ms/batch 673.56 | loss 3.45 | ppl 31.480
| epoch 3 step 31800 | 8860 batches | lr 0.00045 | ms/batch 671.05 | loss 3.45 | ppl 31.470
| epoch 3 step 32000 | 9060 batches | lr 0.000441 | ms/batch 662.55 | loss 3.45 | ppl 31.454
----------------------------------------------------------------------------------------------------
| Eval 8 at step 32000 | time: 2696.71s | valid loss 3.37 | valid ppl 29.048
----------------------------------------------------------------------------------------------------
| epoch 3 step 32200 | 9260 batches | lr 0.000433 | ms/batch 741.24 | loss 3.43 | ppl 30.924
| epoch 3 step 32400 | 9460 batches | lr 0.000424 | ms/batch 672.63 | loss 3.45 | ppl 31.583
| epoch 3 step 32600 | 9660 batches | lr 0.000415 | ms/batch 672.60 | loss 3.45 | ppl 31.560
| epoch 3 step 32800 | 9860 batches | lr 0.000407 | ms/batch 671.88 | loss 3.41 | ppl 30.145
| epoch 3 step 33000 | 10060 batches | lr 0.000398 | ms/batch 672.49 | loss 3.45 | ppl 31.582
| epoch 3 step 33200 | 10260 batches | lr 0.00039 | ms/batch 671.16 | loss 3.40 | ppl 29.971
| epoch 3 step 33400 | 10460 batches | lr 0.000382 | ms/batch 671.28 | loss 3.43 | ppl 30.997
| epoch 3 step 33600 | 10660 batches | lr 0.000373 | ms/batch 672.12 | loss 3.44 | ppl 31.166
| epoch 3 step 33800 | 10860 batches | lr 0.000365 | ms/batch 671.60 | loss 3.39 | ppl 29.578
| epoch 3 step 34000 | 11060 batches | lr 0.000357 | ms/batch 672.62 | loss 3.43 | ppl 30.954
| epoch 3 step 34200 | 11260 batches | lr 0.000349 | ms/batch 671.84 | loss 3.44 | ppl 31.123
| epoch 3 step 34400 | 11460 batches | lr 0.000341 | ms/batch 673.17 | loss 3.41 | ppl 30.185
| epoch 4 step 34600 | 190 batches | lr 0.000333 | ms/batch 670.84 | loss 3.39 | ppl 29.520
| epoch 4 step 34800 | 390 batches | lr 0.000325 | ms/batch 673.47 | loss 3.39 | ppl 29.798
| epoch 4 step 35000 | 590 batches | lr 0.000317 | ms/batch 672.91 | loss 3.38 | ppl 29.482
| epoch 4 step 35200 | 790 batches | lr 0.000309 | ms/batch 671.06 | loss 3.40 | ppl 29.950
| epoch 4 step 35400 | 990 batches | lr 0.000301 | ms/batch 673.00 | loss 3.38 | ppl 29.249
| epoch 4 step 35600 | 1190 batches | lr 0.000294 | ms/batch 673.68 | loss 3.39 | ppl 29.768
| epoch 4 step 35800 | 1390 batches | lr 0.000286 | ms/batch 671.24 | loss 3.38 | ppl 29.479
| epoch 4 step 36000 | 1590 batches | lr 0.000279 | ms/batch 660.61 | loss 3.37 | ppl 29.048
----------------------------------------------------------------------------------------------------
| Eval 9 at step 36000 | time: 2695.59s | valid loss 3.32 | valid ppl 27.645
----------------------------------------------------------------------------------------------------
| epoch 4 step 36200 | 1790 batches | lr 0.000271 | ms/batch 738.61 | loss 3.38 | ppl 29.267
| epoch 4 step 36400 | 1990 batches | lr 0.000264 | ms/batch 671.84 | loss 3.41 | ppl 30.128
| epoch 4 step 36600 | 2190 batches | lr 0.000257 | ms/batch 670.16 | loss 3.39 | ppl 29.614
| epoch 4 step 36800 | 2390 batches | lr 0.00025 | ms/batch 672.50 | loss 3.39 | ppl 29.549
| epoch 4 step 37000 | 2590 batches | lr 0.000242 | ms/batch 674.54 | loss 3.36 | ppl 28.867
| epoch 4 step 37200 | 2790 batches | lr 0.000235 | ms/batch 672.19 | loss 3.34 | ppl 28.314
| epoch 4 step 37400 | 2990 batches | lr 0.000229 | ms/batch 670.71 | loss 3.36 | ppl 28.677
| epoch 4 step 37600 | 3190 batches | lr 0.000222 | ms/batch 668.95 | loss 3.36 | ppl 28.682
| epoch 4 step 37800 | 3390 batches | lr 0.000215 | ms/batch 672.94 | loss 3.36 | ppl 28.683
| epoch 4 step 38000 | 3590 batches | lr 0.000208 | ms/batch 672.33 | loss 3.33 | ppl 27.802
| epoch 4 step 38200 | 3790 batches | lr 0.000202 | ms/batch 673.11 | loss 3.34 | ppl 28.335
| epoch 4 step 38400 | 3990 batches | lr 0.000195 | ms/batch 670.77 | loss 3.36 | ppl 28.747
| epoch 4 step 38600 | 4190 batches | lr 0.000189 | ms/batch 671.42 | loss 3.34 | ppl 28.160
| epoch 4 step 38800 | 4390 batches | lr 0.000183 | ms/batch 674.42 | loss 3.34 | ppl 28.212
| epoch 4 step 39000 | 4590 batches | lr 0.000176 | ms/batch 671.51 | loss 3.35 | ppl 28.619
| epoch 4 step 39200 | 4790 batches | lr 0.00017 | ms/batch 673.38 | loss 3.30 | ppl 27.241
| epoch 4 step 39400 | 4990 batches | lr 0.000164 | ms/batch 671.09 | loss 3.35 | ppl 28.548
| epoch 4 step 39600 | 5190 batches | lr 0.000158 | ms/batch 673.71 | loss 3.31 | ppl 27.271
| epoch 4 step 39800 | 5390 batches | lr 0.000153 | ms/batch 671.79 | loss 3.29 | ppl 26.839
| epoch 4 step 40000 | 5590 batches | lr 0.000147 | ms/batch 663.99 | loss 3.31 | ppl 27.419
----------------------------------------------------------------------------------------------------
| Eval 10 at step 40000 | time: 2695.51s | valid loss 3.28 | valid ppl 26.473
----------------------------------------------------------------------------------------------------
| epoch 4 step 40200 | 5790 batches | lr 0.000141 | ms/batch 737.94 | loss 3.33 | ppl 27.939
| epoch 4 step 40400 | 5990 batches | lr 0.000136 | ms/batch 674.02 | loss 3.30 | ppl 27.155
| epoch 4 step 40600 | 6190 batches | lr 0.00013 | ms/batch 671.99 | loss 3.30 | ppl 27.222
| epoch 4 step 40800 | 6390 batches | lr 0.000125 | ms/batch 674.33 | loss 3.33 | ppl 27.819
| epoch 4 step 41000 | 6590 batches | lr 0.00012 | ms/batch 672.00 | loss 3.26 | ppl 26.092
| epoch 4 step 41200 | 6790 batches | lr 0.000115 | ms/batch 670.91 | loss 3.29 | ppl 26.772
| epoch 4 step 41400 | 6990 batches | lr 0.00011 | ms/batch 670.93 | loss 3.30 | ppl 27.098
| epoch 4 step 41600 | 7190 batches | lr 0.000105 | ms/batch 672.93 | loss 3.25 | ppl 25.775
| epoch 4 step 41800 | 7390 batches | lr 9.98e-05 | ms/batch 673.77 | loss 3.28 | ppl 26.457
| epoch 4 step 42000 | 7590 batches | lr 9.51e-05 | ms/batch 672.27 | loss 3.25 | ppl 25.813
| epoch 4 step 42200 | 7790 batches | lr 9.05e-05 | ms/batch 671.48 | loss 3.28 | ppl 26.654
| epoch 4 step 42400 | 7990 batches | lr 8.6e-05 | ms/batch 671.27 | loss 3.28 | ppl 26.600
| epoch 4 step 42600 | 8190 batches | lr 8.16e-05 | ms/batch 673.39 | loss 3.27 | ppl 26.227
| epoch 4 step 42800 | 8390 batches | lr 7.73e-05 | ms/batch 673.21 | loss 3.29 | ppl 26.959
| epoch 4 step 43000 | 8590 batches | lr 7.32e-05 | ms/batch 675.70 | loss 3.27 | ppl 26.299
| epoch 4 step 43200 | 8790 batches | lr 6.91e-05 | ms/batch 673.58 | loss 3.29 | ppl 26.749
| epoch 4 step 43400 | 8990 batches | lr 6.52e-05 | ms/batch 673.15 | loss 3.28 | ppl 26.451
| epoch 4 step 43600 | 9190 batches | lr 6.13e-05 | ms/batch 671.88 | loss 3.26 | ppl 26.136
| epoch 4 step 43800 | 9390 batches | lr 5.76e-05 | ms/batch 673.32 | loss 3.28 | ppl 26.443
| epoch 4 step 44000 | 9590 batches | lr 5.4e-05 | ms/batch 662.94 | loss 3.29 | ppl 26.910
----------------------------------------------------------------------------------------------------
| Eval 11 at step 44000 | time: 2697.59s | valid loss 3.25 | valid ppl 25.763
----------------------------------------------------------------------------------------------------
| epoch 4 step 44200 | 9790 batches | lr 5.05e-05 | ms/batch 740.81 | loss 3.27 | ppl 26.191
| epoch 4 step 44400 | 9990 batches | lr 4.71e-05 | ms/batch 672.14 | loss 3.26 | ppl 26.166
| epoch 4 step 44600 | 10190 batches | lr 4.38e-05 | ms/batch 670.84 | loss 3.26 | ppl 26.037
| epoch 4 step 44800 | 10390 batches | lr 4.07e-05 | ms/batch 672.90 | loss 3.26 | ppl 26.088
| epoch 4 step 45000 | 10590 batches | lr 3.76e-05 | ms/batch 673.66 | loss 3.29 | ppl 26.884
| epoch 4 step 45200 | 10790 batches | lr 3.47e-05 | ms/batch 672.88 | loss 3.24 | ppl 25.586
| epoch 4 step 45400 | 10990 batches | lr 3.19e-05 | ms/batch 671.20 | loss 3.28 | ppl 26.487
| epoch 4 step 45600 | 11190 batches | lr 2.92e-05 | ms/batch 674.06 | loss 3.28 | ppl 26.688
| epoch 4 step 45800 | 11390 batches | lr 2.66e-05 | ms/batch 670.83 | loss 3.28 | ppl 26.449
| epoch 5 step 46000 | 120 batches | lr 2.41e-05 | ms/batch 671.63 | loss 3.26 | ppl 26.029
| epoch 5 step 46200 | 320 batches | lr 2.18e-05 | ms/batch 675.05 | loss 3.24 | ppl 25.647
| epoch 5 step 46400 | 520 batches | lr 1.96e-05 | ms/batch 671.64 | loss 3.28 | ppl 26.462
| epoch 5 step 46600 | 720 batches | lr 1.75e-05 | ms/batch 674.85 | loss 3.24 | ppl 25.535
| epoch 5 step 46800 | 920 batches | lr 1.55e-05 | ms/batch 672.46 | loss 3.24 | ppl 25.522
| epoch 5 step 47000 | 1120 batches | lr 1.36e-05 | ms/batch 672.98 | loss 3.28 | ppl 26.567
| epoch 5 step 47200 | 1320 batches | lr 1.19e-05 | ms/batch 669.86 | loss 3.24 | ppl 25.624
| epoch 5 step 47400 | 1520 batches | lr 1.02e-05 | ms/batch 673.34 | loss 3.25 | ppl 25.746
| epoch 5 step 47600 | 1720 batches | lr 8.72e-06 | ms/batch 673.91 | loss 3.24 | ppl 25.514
| epoch 5 step 47800 | 1920 batches | lr 7.33e-06 | ms/batch 672.36 | loss 3.27 | ppl 26.267
| epoch 5 step 48000 | 2120 batches | lr 6.06e-06 | ms/batch 663.53 | loss 3.29 | ppl 26.743
----------------------------------------------------------------------------------------------------
| Eval 12 at step 48000 | time: 2697.55s | valid loss 3.24 | valid ppl 25.471
----------------------------------------------------------------------------------------------------
| epoch 5 step 48200 | 2320 batches | lr 4.91e-06 | ms/batch 739.34 | loss 3.27 | ppl 26.196
| epoch 5 step 48400 | 2520 batches | lr 3.88e-06 | ms/batch 674.08 | loss 3.25 | ppl 25.864
| epoch 5 step 48600 | 2720 batches | lr 2.97e-06 | ms/batch 672.56 | loss 3.24 | ppl 25.526
| epoch 5 step 48800 | 2920 batches | lr 2.18e-06 | ms/batch 672.85 | loss 3.23 | ppl 25.302
| epoch 5 step 49000 | 3120 batches | lr 1.52e-06 | ms/batch 673.40 | loss 3.25 | ppl 25.757
| epoch 5 step 49200 | 3320 batches | lr 9.71e-07 | ms/batch 672.09 | loss 3.27 | ppl 26.197
| epoch 5 step 49400 | 3520 batches | lr 5.46e-07 | ms/batch 670.25 | loss 3.23 | ppl 25.175
| epoch 5 step 49600 | 3720 batches | lr 2.43e-07 | ms/batch 673.34 | loss 3.25 | ppl 25.791
| epoch 5 step 49800 | 3920 batches | lr 6.07e-08 | ms/batch 670.68 | loss 3.25 | ppl 25.720
| epoch 5 step 50000 | 4120 batches | lr 0 | ms/batch 475.96 | loss 3.25 | ppl 25.749
----------------------------------------------------------------------------------------------------
End of training
====================================================================================================
| End of training | test loss 3.27 | test ppl 26.217
====================================================================================================