What is the expected behaviour of on-policy and off-policy algorithms when the action space itself changes between episodes? Does this lead to non-stationarity?
The action space is continuous, as in typical MuJoCo tasks (Ant, HalfCheetah, etc.), where the action represents torque.
Suppose in one episode the action range is [-1, 1].
In the next episode it is [-0.8, 1.2].
In the next episode it is [-0.6, 1.4].
...
In some future episode it is [0, 2].
The change in the action range is governed by some function, and the range is updated before the start of each episode. What is the expected behaviour of algorithms such as PPO, TRPO, DDPG, SAC, and TD3? Will they be able to handle this? The same question applies to MARL algorithms such as MAPPO, MADDPG, MATRPO, and MATD3.
Is this non-stationarity due to changing dynamics? Is there any invalid action range as such? The overall range can be bounded by fixed global low and high values, but the active range within those bounds will change across episodes.
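
To make the setup concrete, here is a minimal sketch of the kind of schedule I mean. Everything in it is an assumption for illustration: the names `episode_bounds` and `rescale_action`, the linear shift of 0.2 per episode, and the global limits are hypothetical, and the real schedule may be a different function. It also assumes the policy outputs a normalized action in [-1, 1] that is mapped linearly onto the current episode's range, which is only one possible way of exposing the range to the agent.

```python
import numpy as np

# Hypothetical global limits: the active range never leaves [GLOBAL_LOW, GLOBAL_HIGH].
GLOBAL_LOW, GLOBAL_HIGH = -1.0, 2.0
WIDTH = 2.0  # assumed: the active range keeps a constant width
STEP = 0.2   # assumed: shift applied before each new episode


def episode_bounds(episode):
    """Return (low, high) for a 0-indexed episode under the assumed linear schedule."""
    # Shift the lower bound upward each episode, stopping once high hits GLOBAL_HIGH.
    low = min(GLOBAL_LOW + STEP * episode, GLOBAL_HIGH - WIDTH)
    return low, low + WIDTH


def rescale_action(a_norm, low, high):
    """Map a normalized action in [-1, 1] linearly onto [low, high]."""
    a_norm = np.clip(np.asarray(a_norm, dtype=np.float64), -1.0, 1.0)
    return low + 0.5 * (a_norm + 1.0) * (high - low)


if __name__ == "__main__":
    for ep in (0, 1, 2, 5):
        low, high = episode_bounds(ep)
        torque = rescale_action(0.0, low, high)
        print(f"episode {ep}: range=[{low:.1f}, {high:.1f}], a_norm=0 -> torque={float(torque):.1f}")
```

Under this sketch the same normalized output corresponds to a different torque in each episode, which is the effect the question is about.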