Hyperparameter Optimization
This page is for those wishing to optimize the hyperparameters of the different example scripts in dp.

Hyper-parameter optimization of the convolutionneuralnetwork.lua script for training the Google Street View House Numbers (SVHN) dataset. Base command:

```bash
th examples/convolutionneuralnetwork.lua --dataset Svhn --learningRate 0.1 --maxNormPeriod 1 --accUpdate --cuda --maxOutNorm 1 --batchSize 32
```

The following table contains different variations of the above command; the Train, Valid and Test columns report classification accuracy.
| Hyper-parameters | Epoch | Train | Valid | Test |
|---|---|---|---|---|
| `--activation ReLU --hiddenSize '{3000,2000}' --dropout --channelSize '{32,64}' --lecunlcn --normalInit` | 17 | 0.9356 | 0.9263 | 0.9208 |
| `--activation ReLU --hiddenSize '{3000,2000}' --dropout --channelSize '{32,48,64}' --padding '{2,2,2}' --normalInit --lecunlcn` | 10 | 0.9168 | 0.9211 | 0.9182 |
| `--activation ReLU --hiddenSize '{2000}' --dropout --channelSize '{32,64}' --lecunlcn --normalInit` | 8 | 0.8858 | 0.9160 | 0.9038 |
| `--activation ReLU --hiddenSize '{1000}' --channelSize '{32,64}' --lecunlcn --normalInit` | 27 | 0.9954 | 0.9135 | 0.9018 |
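One way to generate the rows of such a table is to script the sweep around the base command. A minimal sketch in Lua (nothing dp-specific; it just shells out to the example script with each variation taken from the table above, and the same pattern applies to the deepinception.lua and recurrentlanguagemodel.lua runs below):

```lua
-- Sketch: run the base command once per hyper-parameter variation.
-- Assumes it is executed from the dp repository root.
local base = "th examples/convolutionneuralnetwork.lua --dataset Svhn --learningRate 0.1 "
          .. "--maxNormPeriod 1 --accUpdate --cuda --maxOutNorm 1 --batchSize 32"

local variants = {
  "--activation ReLU --hiddenSize '{3000,2000}' --dropout --channelSize '{32,64}' --lecunlcn --normalInit",
  "--activation ReLU --hiddenSize '{2000}' --dropout --channelSize '{32,64}' --lecunlcn --normalInit",
}

for _, v in ipairs(variants) do
  local cmd = base .. " " .. v
  print("Running: " .. cmd)
  os.execute(cmd) -- each run trains to completion before the next starts
end
```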
Hyper-parameter optimization of the deepinception.lua script for training the Google Street View House Numbers (SVHN) dataset. Base command:

```bash
th examples/deepinception.lua --accUpdate --progress --cuda --batchSize 64 --learningRate 0.1 --activation ReLU
```
The following table contains different variations of the above command. As before, Train, Valid and Test report classification accuracy.

| Hyper-parameters | Epoch | Train | Valid | Test |
|---|---|---|---|---|
| `--hiddenSize '{4000,4000,4000}' --batchNorm` | 47 | 0.9999 | 0.9803 | 0.9717 |
| `--hiddenSize '{4000,4000,4000}' --lecunlcn --dropout` | 49 | 0.9707 | 0.9752 | 0.9629 |
Hyper-parameter optimization of the recurrentlanguagemodel.lua script. Best result so far; the key seems to be `--forceForget`:
```bash
th examples/recurrentlanguagemodel.lua --batchSize 64 \
  --trainEpochSize 200000000 --validEpochSize -1 --softmaxtree \
  --hiddenSize 500 --maxOutNorm 2 --useDevice 1 --rho 5 --cuda \
  --maxTries 100 --maxWait 1 --learningRate 2 --decayFactor 0.7 \
  --forceForget
```
```
==> epoch # 62 for optimizer
==> epoch size = 200000000 examples
==> batch duration = 0.15673168840051 ms
==> epoch duration = 31346.337680101 s
==> example speed = 6380.3306798089 examples/s
==> batch speed = 99.698441071279 batches/s
localhost:1420735570:1:optimizer:loss avgError 5.0304999091405
localhost:1420735570:1:validator:loss avgError 4.8598861868066
localhost:1420735570:1:tester:loss avgError 4.8469628958766
localhost:1420735570:1:optimizer:perplexity perplexity = 153.00948441942
localhost:1420735570:1:validator:perplexity perplexity = 129.00951828653
localhost:1420735570:1:tester:perplexity perplexity = 127.35301752411
SaveToFile: saving to /var/lib/torch/save/localhost:1420735570:1.dat
```
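Each perplexity line is just the exponential of the corresponding NLL avgError reported above it, which makes for an easy sanity check in the Torch REPL:

```lua
-- perplexity = exp(avgError); e.g. the optimizer's NLL above:
print(math.exp(5.0304999091405)) -- 153.00948..., the reported perplexity
```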
The following run was really unstable (it ran for 60 epochs):
```bash
th examples/recurrentlanguagemodel.lua --batchSize 64 --trainEpochSize 200000000 \
  --validEpochSize -1 --softmaxtree --hiddenSize 500 --maxOutNorm 2 --useDevice 2 \
  --rho 5 --cuda --maxTries 100 --progress --maxWait 1 --learningRate 2 \
  --decayFactor 0.7 --xpPath '/home/nicholas14/save/rhea:1418341272:1.dat'
```
```
==> epoch # 12 for optimizer
[================================ 200000000/200000000 =======================>] ETA: 0ms | Step: 0ms
==> epoch size = 200000000 examples
==> batch duration = 0.13946574515462 ms
==> epoch duration = 27893.149030924 s
==> example speed = 7170.2194606378 examples/s
==> batch speed = 112.04055865242 batches/s
[================================ 7937041/7937041 ===========================>] ETA: 0ms | Step: 0ms
rhea:1418341272:1:optimizer:loss avgError 5.5206494449397
rhea:1418341272:1:validator:loss avgError 5.5606322623734
rhea:1418341272:1:tester:loss avgError 5.5568762899011
rhea:1418341272:1:optimizer:perplexity perplexity = 249.79721405813
rhea:1418341272:1:validator:perplexity perplexity = 259.9871644699
rhea:1418341272:1:tester:perplexity perplexity = 259.01249140542
```
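The `--xpPath` flag resumes training from a serialized experiment, like the .dat file written by SaveToFile in the previous run. A minimal sketch for inspecting such a checkpoint offline, assuming it is an ordinary torch-serialized object (the exact contents depend on the dp version):

```lua
require 'dp'
-- Load the experiment saved by SaveToFile (path taken from the log above).
local xp = torch.load('/var/lib/torch/save/localhost:1420735570:1.dat')
print(torch.type(xp)) -- inspect the serialized experiment before resuming via --xpPath
```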