Issue when m_u equals m_l #5

When updating the target distribution, if bj is an integer then m_u == m_l == bj, so both (m_u - bj) and (bj - m_l) are 0 and nothing is added to m_prob. The probability mass for that atom is lost.

> m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
> m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)
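
To make the loss concrete, here is a minimal sketch with made-up numbers. The helper `split_weights` and the values of `bj` and `z` are assumptions for illustration only and are not taken from the repository.

```python
import math

def split_weights(bj):
    """Return the (lower, upper) interpolation weights used in the lines above."""
    m_l, m_u = math.floor(bj), math.ceil(bj)
    return m_u - bj, bj - m_l

z = 0.25  # hypothetical probability mass of a single atom

# bj falls between two bins: the weights sum to 1, so all of z is distributed.
w_l, w_u = split_weights(3.4)
kept_fractional = z * (w_l + w_u)

# bj lands exactly on a bin: floor == ceil, both weights are 0, so z is dropped.
w_l, w_u = split_weights(3.0)
kept_integer = z * (w_l + w_u)

print(kept_fractional, kept_integer)
```

In the second case the two update lines add nothing, so the projected distribution no longer sums to 1.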

An easy fix is:
> if m_u == m_l:
>   m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j]
> else:
>   m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
>   m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)

The same goes for when the target is a delta function (terminal state):
> .... else:
> m_prob[action[i]][i][int(m_u)] += 1
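
Both cases could be handled by one small helper. This is only a sketch: `add_projected_mass` is a hypothetical name, `m_prob_row` stands in for `m_prob[action[i]][i]`, and `mass` would be `z_[optimal_action_idxs[i]][i][j]` for a regular transition or 1 for a terminal one.

```python
import math

def add_projected_mass(m_prob_row, bj, mass):
    """Add `mass` at fractional bin position `bj` to the list `m_prob_row`."""
    m_l, m_u = math.floor(bj), math.ceil(bj)
    if m_u == m_l:
        # bj sits exactly on a bin: put all of the mass there.
        m_prob_row[int(m_l)] += mass
    else:
        # bj sits between two bins: split the mass by distance.
        m_prob_row[int(m_l)] += mass * (m_u - bj)
        m_prob_row[int(m_u)] += mass * (bj - m_l)

# Regular transition: two atoms, one on a bin and one between bins.
row = [0.0] * 5
add_projected_mass(row, 2.0, 0.5)   # integer position, lost by the original code
add_projected_mass(row, 3.25, 0.5)  # fractional position, split across bins 3 and 4

# Terminal transition: the target is a delta, so the full mass of 1 is projected.
terminal_row = [0.0] * 5
add_projected_mass(terminal_row, 4.0, 1.0)

print(row, sum(row))
print(terminal_row, sum(terminal_row))
```

With the equality check in place, each row sums to 1 whether or not bj is an integer.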
