Testing of different hyperparameters from https://github.com/adityavinodk/wood_chopper_minecraft
-------------------------------------------------
Default
-------------------------------------------------
weightsB.npy
Default parameters
epsilon=0.8, alpha=0.1, gamma=0.9
training
30 missions at 0.33 minutes (20sec)
logs cut: 10
testing
5 missions at 0.5 minutes (30sec)
logs cut: 2
1 final reward: -360
2 final reward: -120*
3 final reward: -240
4 final reward: -20*
5 final reward: -280
notes:
* means it cut a log in testing
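For reference, the three hyperparameters above drive a standard epsilon-greedy Q-learning update. Below is a minimal sketch with linear function approximation (the weights*.npy files suggest a per-action weight matrix); all names here are illustrative, not the repo's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.8):
    # With probability epsilon explore (random action), else exploit (greedy).
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def q_update(weights, features, action, reward, next_features,
             alpha=0.1, gamma=0.9):
    # Linear Q-learning: Q(s, a) = weights[a] . features(s).
    q_sa = weights[action] @ features
    # TD target discounts the best next-state value by gamma.
    td_target = reward + gamma * np.max(weights @ next_features)
    # alpha scales the step toward the TD target.
    weights[action] += alpha * (td_target - q_sa) * features
    return weights
```

epsilon controls how often the agent ignores the learned weights, alpha how far each update moves them, and gamma how much future reward counts.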
-------------------------------------------------
Lower Exploration Rate
-------------------------------------------------
weightsC.npy
lower epsilon parameter:
epsilon=0.4, alpha=0.1, gamma=0.9
training
30 missions at 0.33 minutes (20sec)
logs cut: 8
testing
5 missions at 0.5 minutes (30sec)
logs cut: 2
1 final reward: 0
2 final reward: 980*
3 final reward: 0
4 final reward: 420*
5 final reward: 0
notes:
Focuses on exploiting learned strategies, consulting the Q-network more often
moves less, and often repeats the same movements/policy
could experiment with epsilon decay
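The epsilon-decay idea mentioned above can be as simple as an exponential per-mission schedule (an illustrative sketch, not code from the repo):

```python
def decayed_epsilon(mission, start=0.8, floor=0.05, decay=0.9):
    # Explore heavily in early missions, then settle into exploitation.
    return max(floor, start * decay ** mission)
```

With decay=0.9, epsilon would fall from 0.8 to the 0.05 floor by roughly mission 27 (0.8 * 0.9^27 ≈ 0.047), which fits the 30-mission training runs used here.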
-------------------------------------------------
Higher Learning Rate
-------------------------------------------------
weightsD.npy
higher alpha parameter:
epsilon=0.8, alpha=0.4, gamma=0.9
training
30 missions at 0.33 minutes (20sec)
logs cut: 20
testing
5 missions at 0.5 minutes (30sec)
logs cut: 4
1 final reward: -720*
2 final reward: -620*
3 final reward: -620*
4 final reward: -720
5 final reward: -680*
notes:
Makes learning faster but can lead to instability or overshooting optimal Q-values
Able to cut trees in succession
Learned that moving left to right to scan for trees provides a higher reward
High number of logs cut
Tends to drift left, cutting trees along the way; more training required
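The instability note above is the usual trade-off: each update moves Q a fraction alpha of the way toward the TD target, so a larger alpha converges faster, but pushed far enough (past 1 in the tabular case, or the equivalent effective step with feature vectors) it overshoots and oscillates. A toy tabular illustration:

```python
def td_step(q, target, alpha):
    # One TD step: move a fraction alpha of the way toward the target.
    return q + alpha * (target - q)

q = 0.0
trace = []
for _ in range(4):
    q = td_step(q, 10.0, 1.5)  # alpha > 1: overshoots, oscillates around 10
    trace.append(q)
# trace alternates above/below the target 10: 15.0, 7.5, 11.25, 9.375
```

alpha=0.4 is still well inside the stable range for a single tabular value, but with function approximation the same weights serve many states, so large steps can still thrash the policy.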
-------------------------------------------------
Lower Discount Factor
-------------------------------------------------
weightsE.npy
lower gamma parameter:
epsilon=0.8, alpha=0.1, gamma=0.5
training
30 missions at 0.33 minutes (20sec)
logs cut: 22
testing
5 missions at 0.5 minutes (30sec)
logs cut: 5
1 final reward: 1140*
2 final reward: 1220*
3 final reward: 1480*
4 final reward: 0
5 final reward: 1380*
notes:
Focuses more on immediate rewards, useful for short-term tasks.
cuts a lot of wood
environment space is small, short, and finite which is good for a lower discount factor
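The gamma intuition above is easy to check on a concrete reward sequence: gamma=0.5 discounts a reward three steps away to an eighth of its value, while gamma=0.9 keeps most of it (illustrative sketch):

```python
def discounted_return(rewards, gamma):
    # G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A log-cut reward of 100 arriving three steps in the future:
near_sighted = discounted_return([0, 0, 0, 100], gamma=0.5)  # 12.5
far_sighted = discounted_return([0, 0, 0, 100], gamma=0.9)   # 72.9
```

In a small, short, finite arena the nearest tree is rarely far away, so the near-sighted valuation loses little and learns faster, matching the results above.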
-------------------------------------------------
Hybrid of best results
-------------------------------------------------
weightsF.npy
hybrid parameters:
epsilon=0.7, alpha=0.3, gamma=0.5
training
30 missions at 0.33 minutes (20sec)
logs cut: 32
testing
5 missions at 0.5 minutes (30sec)
logs cut: 3
1 final reward: -680*
2 final reward: -680*
3 final reward: -780
4 final reward: -780
5 final reward: -680*
notes:
Hybrid of the best-performing agents: a higher learning rate, a lower discount factor, and a slightly lower exploration rate
Potentially use this agent with a decaying epsilon and a smaller learning rate to stabilize policy and produce a reliable amount of logs cut
learned to cut logs quickly, and can cut logs repeatedly
scans left to right and then moves into the center, tracing an L-shaped path
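The stabilization suggested in the notes above (decaying epsilon plus a smaller alpha) could be wired up as per-mission schedules, starting from the hybrid settings (an illustrative sketch, not the repo's code):

```python
def hybrid_schedule(mission, eps0=0.7, alpha0=0.3,
                    eps_decay=0.92, alpha_decay=0.97,
                    eps_min=0.05, alpha_min=0.05):
    # Anneal both exploration and step size as training progresses,
    # so late missions exploit a nearly frozen, stable policy.
    epsilon = max(eps_min, eps0 * eps_decay ** mission)
    alpha = max(alpha_min, alpha0 * alpha_decay ** mission)
    return epsilon, alpha
```

Annealing alpha alongside epsilon keeps the early fast learning of the hybrid while damping the late-training weight churn that likely caused the drop from 32 logs in training to 3 in testing.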
-------------------------------------------------
All parameters set to 0.0
-------------------------------------------------
weightsZero.npy
all parameters zeroed:
epsilon=0.0, alpha=0.0, gamma=0.0
training
30 missions at 0.33 minutes (20sec)
logs cut: 8
notes:
does nothing for some missions
if attack were not hardcoded to fire constantly, no logs would be cut at all
-------------------------------------------------
All parameters set to 1.0
-------------------------------------------------
weightsOne.npy
all parameters set to 1.0:
epsilon=1.0, alpha=1.0, gamma=1.0
training
30 missions at 0.33 minutes (20sec)
logs cut: 20
-------------------------------------------------
All parameters set to 0.4
-------------------------------------------------
weights04.npy
all parameters set to 0.4:
epsilon=0.4, alpha=0.4, gamma=0.4
training
30 missions at 0.33 minutes (20sec)
logs cut: 19
notes:
managed to get all logs cut in one mission