Implement optimization in sampling the pool #13

kailuowang · 2015-06-17T01:34:11Z

DeepMind's code applies some rules when sampling from the pool to make sure transitions with a reward get sampled: non-terminal and zero-reward transitions are discarded with some probability. This might speed up learning.

        index = torch.random(2, self.numEntries-self.recentMemSize)
        if self.t[index+self.recentMemSize-1] == 0 then
            valid = true
        end
        if self.nonTermProb < 1 and self.t[index+self.recentMemSize] == 0 and
            torch.uniform() > self.nonTermProb then
            -- Discard non-terminal states with probability (1-nonTermProb).
            -- Note that this is the terminal flag for s_{t+1}.
            valid = false
        end
        if self.nonEventProb < 1 and self.t[index+self.recentMemSize] == 0 and
            self.r[index+self.recentMemSize-1] == 0 and
            torch.uniform() > self.nonTermProb then
            -- Discard non-terminal or non-reward states with
            -- probability (1-nonTermProb).
            valid = false
        end
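
For reference, the same rejection-sampling idea can be sketched in Python. This is only an illustration, not the DeepMind implementation or this project's code: the function name `sample_start`, its parameters, the 0-based indexing, and the `max_tries` guard are all assumptions made for the example. It also compares against `non_event_prob` in the last test, as the name and guard suggest, whereas the quoted snippet compares against `nonTermProb` there.

```python
import random


def sample_start(terminal, reward, recent_mem_size=1,
                 non_term_prob=1.0, non_event_prob=1.0,
                 rng=random, max_tries=100000):
    """Pick a 0-based start index from the pool by rejection sampling.

    terminal[i] / reward[i] are the terminal flag and reward stored at slot i.
    All names here are hypothetical; they only mirror the quoted Lua fields.
    """
    n = len(terminal)
    if n - recent_mem_size < 2:
        raise ValueError("pool too small to sample from")
    for _ in range(max_tries):
        # Skip slot 0, like the Lua code starting at index 2 (1-based).
        start = rng.randint(1, n - recent_mem_size - 1)
        cur = start + recent_mem_size - 1   # last frame of s_t
        nxt = cur + 1                       # last frame of s_{t+1}
        if terminal[cur]:
            continue  # s_t is terminal, so it has no successor
        non_terminal = not terminal[nxt]
        # Discard non-terminal transitions with probability 1 - non_term_prob.
        if non_terminal and rng.random() > non_term_prob:
            continue
        # Discard non-terminal, zero-reward transitions with
        # probability 1 - non_event_prob.
        if non_terminal and reward[cur] == 0 and rng.random() > non_event_prob:
            continue
        return start
    raise RuntimeError("no valid transition found")
```

With both probabilities left at 1.0 this reduces to uniform sampling over transitions whose current state is not terminal; lowering `non_event_prob` shifts samples toward transitions that carry a reward or end an episode.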

kailuowang added the enhancement label Jun 17, 2015

kailuowang added this to the 0.2-ConvolutionDQN milestone Jun 17, 2015

kailuowang modified the milestones: 1.0 maturation, 0.2-ConvolutionDQN Dec 22, 2015
