Open
Description
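When updating the target distribution, if bj lands exactly on an atom so that m_u == m_l == bj, both interpolation weights (m_u - bj) and (bj - m_l) are 0, so the probability mass for that atom is dropped instead of being added: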
m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)
Easy fix is:
if m_u == m_l:
    m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j]
else:
    m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
    m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)
The same applies when the target is a delta function (terminal state):
.... else:
    m_prob[action[i]][i][int(m_u)] += 1
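For reference, here is a minimal standalone sketch of the projection for a single sample, with the guard applied to both the regular and the terminal case. It is not the repository's code: the function name `project_distribution` and its parameters are hypothetical, and it assumes evenly spaced atoms between `v_min` and `v_max`.

```python
import math
import numpy as np

def project_distribution(probs, reward, gamma, v_min, v_max, done):
    """Project a shifted categorical distribution back onto the fixed atoms.

    Hypothetical helper for illustration; assumes evenly spaced atoms.
    """
    num_atoms = len(probs)
    delta_z = (v_max - v_min) / (num_atoms - 1)
    support = v_min + delta_z * np.arange(num_atoms)
    target = np.zeros(num_atoms)

    if done:
        # Terminal state: the target is a delta function at the reward.
        shifted, masses = [reward], [1.0]
    else:
        shifted, masses = reward + gamma * support, probs

    for tz, p in zip(shifted, masses):
        tz = min(v_max, max(v_min, tz))
        bj = (tz - v_min) / delta_z
        m_l, m_u = math.floor(bj), math.ceil(bj)
        if m_l == m_u:
            # bj sits exactly on an atom: both interpolation weights
            # would be 0, so assign the full mass directly.
            target[m_l] += p
        else:
            target[m_l] += p * (m_u - bj)
            target[m_u] += p * (bj - m_l)
    return target
```

With `reward=0.0`, `gamma=1.0`, and atoms at integer spacing, every `bj` is an integer, so without the guard the whole target would come out as zeros; with it, the input distribution is returned unchanged.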
P4ckP4ck