I'm experimenting with policyIteration params to compare policyIteration and valueIteration. While tuning the params in my model, I've noticed that policyIteration sometimes finishes before it has found a policy that leads to the terminal state. On line 121 of PolicyUtils, there is a do/while loop whose condition is !env.isInTerminalState(). If my policy never reaches the terminal state, that loop never exits and the program hangs.
I feel like there should be a better way to follow the policy than looping until it reaches the terminal state.
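
To illustrate the kind of thing I mean, here is a minimal sketch of a rollout that stops either at a terminal state or after a fixed number of steps, so a policy that never reaches the goal cannot hang the caller. Everything in it is hypothetical and is not the library's API: `BoundedRollout`, `Result`, and `maxSteps` are made-up names, states are plain ints, and the policy and the environment transition are collapsed into a single deterministic `policyStep` function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

// Hypothetical stand-in for a policy rollout; NOT the library's Policy/Environment types.
final class BoundedRollout {

    /** Visited states plus a flag saying whether the rollout ended in a terminal state. */
    static final class Result {
        final List<Integer> states;
        final boolean reachedTerminal;

        Result(List<Integer> states, boolean reachedTerminal) {
            this.states = states;
            this.reachedTerminal = reachedTerminal;
        }
    }

    /**
     * Follows the policy from start until a terminal state is reached
     * or maxSteps transitions have been taken, whichever comes first.
     *
     * policyStep is an assumption: it maps a state to the next state,
     * i.e. "pick the policy's action and apply it" in one deterministic call.
     */
    static Result rollout(IntUnaryOperator policyStep, IntPredicate isTerminal,
                          int start, int maxSteps) {
        List<Integer> states = new ArrayList<>();
        int state = start;
        states.add(state);

        int steps = 0;
        // The step cap is what guarantees termination for a non-terminating policy.
        while (!isTerminal.test(state) && steps < maxSteps) {
            state = policyStep.applyAsInt(state);
            states.add(state);
            steps++;
        }
        return new Result(states, isTerminal.test(state));
    }
}
```

With something like this, the caller can check `reachedTerminal` to tell a finished episode apart from one that was cut off at `maxSteps`, and decide whether to run more planning iterations before following the policy again.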