Auto reset in Environment.step() means CartPole always returns reward = 1 #82

garymm · 2024-08-03T03:36:19Z

CartPole has this code which only sets reward = 0 if the previous state was terminal:

gymnax/gymnax/environments/classic_control/cartpole.py

Lines 83 to 84 in aef77d5

    
    # Important: Reward is based on termination is previous step transition
    reward = 1.0 - prev_terminal

Environment.step has this code, which auto-resets the state as soon as an episode is done, so a terminal state is never passed in to CartPole.step_env:

gymnax/gymnax/environments/environment.py

Lines 45 to 50 in aef77d5

    
    obs_st, state_st, reward, done, info = self.step_env(key, state, action, params)
    obs_re, state_re = self.reset_env(key_reset, params)
    # Auto-reset environment based on termination
    state = jax.tree_map(
        lambda x, y: jax.lax.select(done, x, y), state_re, state_st
    )

Hence CartPole.step() always returns a reward of 1, never 0, which makes the reward signal entirely broken.
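To make the interaction concrete, here is a minimal pure-Python sketch of the two pieces above. It is not gymnax code: `ToyState`, `HORIZON`, `is_terminal`, and `rollout` are hypothetical stand-ins, and I am assuming that `prev_terminal` is computed from the state passed in to `step_env`. The toy episode terminates after a fixed number of steps instead of simulating the pole.

```python
from typing import List, NamedTuple, Tuple

HORIZON = 3  # hypothetical: the toy episode is terminal once t reaches this value


class ToyState(NamedTuple):
    t: int  # number of steps taken in the current episode


def is_terminal(state: ToyState) -> bool:
    return state.t >= HORIZON


def reset_env() -> ToyState:
    return ToyState(t=0)


def step_env(state: ToyState) -> Tuple[ToyState, float, bool]:
    # Assumption: prev_terminal is derived from the *incoming* state.
    prev_terminal = is_terminal(state)
    next_state = ToyState(t=state.t + 1)
    reward = 1.0 - float(prev_terminal)
    done = is_terminal(next_state)
    return next_state, reward, done


def step(state: ToyState) -> Tuple[ToyState, float, bool]:
    # Same shape as the quoted Environment.step: step, reset, then select.
    state_st, reward, done = step_env(state)
    state_re = reset_env()
    state = state_re if done else state_st  # auto-reset on termination
    return state, reward, done


def rollout(num_steps: int) -> List[float]:
    state = reset_env()
    rewards = []
    for _ in range(num_steps):
        state, reward, _ = step(state)
        rewards.append(reward)
    return rewards


if __name__ == "__main__":
    # Through the auto-resetting step(), every reward is 1.0.
    print(rollout(10))
    # Only a direct step_env() call on a terminal state yields 0.0.
    print(step_env(ToyState(t=HORIZON))[1])
```

In this sketch the state handed back by `step()` is never terminal, so the `1.0 - prev_terminal` branch that produces 0 is unreachable unless `step_env` is called directly.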

samuelstevens · 2024-08-28T12:26:38Z

I am experiencing the same issue.

Michael190502 · 2024-11-06T10:47:04Z

Hi, I was looking into the same thing, and I don't think there is a realistic scenario in which this zero reward is needed: it would only occur if you call step on an environment that has already terminated. The gymnasium CartPole implementation behaves the same way, see lines 212 and following.
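As a rough illustration of that point, the reward rule can be paraphrased like this. This is a sketch written from my reading of the behaviour, not a copy of the gymnasium source; `reward_after_step` is a hypothetical helper name, and `steps_beyond_terminated` is assumed to be `None` until the episode has terminated once.

```python
from typing import Optional, Tuple


def reward_after_step(
    terminated: bool, steps_beyond_terminated: Optional[int]
) -> Tuple[float, Optional[int]]:
    """Return (reward, updated steps_beyond_terminated) for one step."""
    if not terminated:
        # Normal step: the pole is still up.
        return 1.0, steps_beyond_terminated
    if steps_beyond_terminated is None:
        # The step that causes termination still earns the reward.
        return 1.0, 0
    # step() was called again on an already terminated episode.
    return 0.0, steps_beyond_terminated + 1
```

Under this reading, a caller who resets as soon as an episode terminates never sees the zero reward, which is exactly the situation auto-reset creates.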
