Skip to content

Latest commit

 

History

History
35 lines (28 loc) · 1.61 KB

README.md

File metadata and controls

35 lines (28 loc) · 1.61 KB

MDP-value-and-policy-iteration-

A Markov Decision Process (MDP) implementation using value and policy iteration to calculate the optimal policy.

Inputs used are a 4x3 world, a 5x5 world and a 11x11 world. The inputs can be easily modified as desired.

4x3 world:

([[-0.04, -0.04, -0.04, +1],
[-0.04, None, -0.04, -1],
[-0.04, -0.04, -0.04, -0.04]],
terminals=[(3, 2), (3, 1)])

5x5 world:

([[-0.04, -0.04, -0.04, -0.04, +1],
[-0.04, -0.04, -0.04, -0.4, -1],
[-0.04, -0.04, None, -0.04, -0.04],
[-0.04, -0.04, -0.04, -0.04, -0.04],
[-0.04, -0.04, -0.04, -0.04, -0.04]],
terminals=[(4, 4), (4, 3)])

11x11 world:

([[-0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, +1],
[-0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -1],
[-0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04],
[-0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04],
[-0.04, -0.04, -0.04, -0.04, None, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04],
[-0.04, -0.04, -0.04, -0.04, None, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04],
[-0.04, -0.04, -0.04, -0.04, None, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04],
[-0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04],
[-0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04],
[-0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04],
[-0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04, -0.04]],
terminals=[(10, 10), (10, 9)])