Issue when m_u equals m_l #5

@tesslerc

Description

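When updating the target distribution, if bj lands exactly on an atom index then m_u == m_l == bj. Both interpolation weights, (m_u - bj) and (bj - m_l), are then 0, so the probability added for that atom is 0 and its mass is lost.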

m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)
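To make the failure concrete, here is a minimal sketch of just the two weights from the lines above. It assumes m_l and m_u are the floor and ceiling of bj, which the issue implies but does not show; `split_weights` is a hypothetical helper name, not something from the repository.

```python
import math

def split_weights(bj):
    """Return the (lower, upper) weights used in the update above.

    Assumption: m_l = floor(bj) and m_u = ceil(bj).
    """
    m_l, m_u = math.floor(bj), math.ceil(bj)
    return (m_u - bj), (bj - m_l)

# bj between two atoms: the weights sum to 1 (up to floating point).
print(split_weights(2.3))

# bj exactly on an atom: both weights are 0, so the mass is dropped.
print(split_weights(2.0))
```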

An easy fix is:

if m_u == m_l:
    m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j]
else:
    m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
    m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)

Same goes for when the target is a delta function (termination state):

.... else:
    m_prob[action[i]][i][int(m_u)] += 1
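Both cases can be covered by one guarded update. The following is only a sketch under stated assumptions, not the repository's code: `project_atom` is a hypothetical helper, m_l and m_u are assumed to be the floor and ceiling of bj, and a plain list stands in for one row of m_prob. Passing p = 1 corresponds to the delta-function (terminal) target.

```python
import math

def project_atom(m_prob_row, bj, p=1.0):
    """Add mass p at fractional atom index bj to m_prob_row, in place.

    Assumption: m_l = floor(bj) and m_u = ceil(bj).
    """
    m_l, m_u = math.floor(bj), math.ceil(bj)
    if m_l == m_u:
        # bj sits exactly on an atom: assign the whole mass to it.
        m_prob_row[int(m_l)] += p
    else:
        # Otherwise split the mass linearly between the two neighbours.
        m_prob_row[int(m_l)] += p * (m_u - bj)
        m_prob_row[int(m_u)] += p * (bj - m_l)
    return m_prob_row

# Terminal-state style update with bj on an atom: the mass is preserved.
print(project_atom([0.0] * 5, 2.0))

# Non-terminal update with bj between atoms 2 and 3.
print(project_atom([0.0] * 5, 2.25, p=0.4))
```

With the guard in place, the total mass added always equals p, whether or not bj coincides with an atom.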
