DeepMind applies some rules when sampling from the pool, so that transitions with a reward (or a terminal state) are more likely to get sampled. Adopting them might help learning speed.
index = torch.random(2, self.numEntries-self.recentMemSize)
if self.t[index+self.recentMemSize-1] == 0 then
    valid = true
end
if self.nonTermProb < 1 and self.t[index+self.recentMemSize] == 0 and
    torch.uniform() > self.nonTermProb then
    -- Discard non-terminal states with probability (1-nonTermProb).
    -- Note that this is the terminal flag for s_{t+1}.
    valid = false
end
if self.nonEventProb < 1 and self.t[index+self.recentMemSize] == 0 and
    self.r[index+self.recentMemSize-1] == 0 and
    torch.uniform() > self.nonTermProb then
    -- Discard non-terminal or non-reward states with
    -- probability (1-nonTermProb).
    valid = false
end
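For anyone who wants to try the idea outside of Torch, here is a minimal plain-Lua sketch of the same rejection rule. It is not DeepMind's TransitionTable: `sample_index`, the `terminal` and `reward` arrays, and the optional `rng` argument are hypothetical names made up for illustration, and a history length of 1 is assumed. Also note that the quoted snippet compares against `nonTermProb` in both checks; the sketch uses `nonEventProb` in the second one, which is only my guess at the intent.

```lua
-- Hypothetical stand-alone sketch; not DeepMind's TransitionTable.
-- terminal[i] is 1 if step i ended an episode, else 0.
-- reward[i] is the reward received at step i.
-- A history length of 1 is assumed.
local function sample_index(terminal, reward, nonTermProb, nonEventProb, rng)
  rng = rng or math.random
  local n = #terminal
  assert(n > 2, "need at least three stored steps")
  while true do
    -- Start at 2 (as in the quoted code) and stop at n - 1 so that
    -- index + 1 is always a stored step.
    local index = rng(2, n - 1)
    -- Never start a transition from a terminal state.
    local valid = terminal[index] == 0
    local nextIsTerminal = terminal[index + 1] ~= 0
    -- Keep non-terminal transitions only with probability nonTermProb.
    if valid and nonTermProb < 1 and not nextIsTerminal
        and rng() > nonTermProb then
      valid = false
    end
    -- Keep transitions that are neither terminal nor rewarded only with
    -- probability nonEventProb (assumed intent, see the note above).
    if valid and nonEventProb < 1 and not nextIsTerminal
        and reward[index] == 0 and rng() > nonEventProb then
      valid = false
    end
    if valid then return index end
  end
end

-- Usage: one rewarded step (index 5) among otherwise uneventful steps.
local terminal = {0, 0, 0, 0, 0, 0, 0, 0}
local reward   = {0, 0, 0, 0, 1, 0, 0, 0}
local counts = {}
for _ = 1, 10000 do
  local i = sample_index(terminal, reward, 1, 0.1)
  counts[i] = (counts[i] or 0) + 1
end
for i = 2, 7 do
  print(i, counts[i] or 0)
end
```

With `nonEventProb = 0.1`, the rewarded step should come up far more often than its neighbours. Be aware that the loop only terminates if some index can be accepted, so very small probabilities on a pool without rewards or terminals will make sampling slow.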