Hi, I was looking into the same thing. I don't think there is a realistic scenario in which this zero reward is needed, since it would only occur if you call step on an already-terminated environment. The same behavior can be seen from line 212 onward in the gymnasium CartPole implementation.
CartPole has this code which only sets reward = 0 if the previous state was terminal:
gymnax/gymnax/environments/classic_control/cartpole.py
Lines 83 to 84 in aef77d5
Environment.step has this code which means a terminal state is never passed in to CartPole.step_env:
gymnax/gymnax/environments/environment.py
Lines 45 to 50 in aef77d5
Hence CartPole.step() always returns a reward of 1, which makes it entirely broken.
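To make the argument concrete, here is a minimal plain-Python sketch of the interaction described above. It is not the gymnax code: `ToyState`, `MAX_STEPS`, `is_terminal`, and `rollout` are hypothetical names invented for illustration, and the wrapper is assumed to auto-reset whenever an episode ends, which is how I read `Environment.step`.

```python
from dataclasses import dataclass

MAX_STEPS = 3  # hypothetical episode length for this toy environment


@dataclass(frozen=True)
class ToyState:
    t: int = 0  # number of steps taken in the current episode


def is_terminal(state):
    # Toy termination rule; stands in for the pole/cart bounds check.
    return state.t >= MAX_STEPS


def step_env(state):
    # Mirrors the described logic: the reward is zero only when the
    # *incoming* state was already terminal.
    prev_terminal = is_terminal(state)
    reward = 0.0 if prev_terminal else 1.0
    next_state = ToyState(t=state.t + 1)
    return next_state, reward, is_terminal(next_state)


def step(state):
    # Assumed wrapper behavior: auto-reset when the episode ends, so
    # step_env never receives a terminal state on the next call.
    next_state, reward, done = step_env(state)
    if done:
        next_state = ToyState()
    return next_state, reward, done


def rollout(num_steps):
    # Collect the rewards seen when stepping only through the wrapper.
    state = ToyState()
    rewards = []
    for _ in range(num_steps):
        state, reward, _ = step(state)
        rewards.append(reward)
    return rewards
```

Under these assumptions, every reward collected by `rollout` is 1.0, including on the steps where the episode terminates. The zero-reward branch is reachable only by calling `step_env` directly with a state that is already terminal.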