[Question] Changing action space with time/episode #3284

Open
prinshul opened this issue Sep 3, 2024 · 1 comment

Comments

prinshul commented Sep 3, 2024

What is the expected behaviour of on-policy and off-policy algorithms when the action space itself changes between episodes? Does this lead to non-stationarity?

The action space is continuous. As in typical MuJoCo tasks (Ant, Cheetah, etc.), it represents torque.
Suppose that in one episode the action range is [-1, 1].

In the next episode it is [-0.8, 1.2].
In the episode after that it is [-0.6, 1.4].
...
In some later episode it is [0, 2].
...

The change in the action range is governed by some function and is applied before the beginning of each episode. What should be the expected behaviour of algorithms like PPO, TRPO, DDPG, SAC, and TD3? Will they be able to handle this? The same question applies to MARL algorithms like MAPPO, MADDPG, MATRPO, MATD3, etc.

Is this non-stationarity due to changing dynamics? Is there such a thing as an invalid action range here? We can bound the overall range by some fixed low and high values, but the valid range within those bounds will change over episodes.
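
To make the setup concrete, here is a minimal sketch of the kind of schedule described above. It is only an illustration under stated assumptions: the linear shift of 0.2 per episode is read off the example values, the range is assumed to stop moving once it reaches [0, 2], the overall bound is assumed to be [-1, 2], and the function names `episode_bounds` and `clip_to_episode_range` are hypothetical.

```python
# Hypothetical sketch of an episode-dependent action range.
# All constants and names below are assumptions made for illustration.

GLOBAL_LOW, GLOBAL_HIGH = -1.0, 2.0  # assumed fixed overall bound on the torque
SHIFT_PER_EPISODE = 0.2              # assumed linear shift, taken from the example values
LAST_SHIFT_EPISODE = 5               # assumed: the range stops moving once it reaches [0, 2]


def episode_bounds(episode: int) -> tuple[float, float]:
    """Return the (low, high) action range for a 0-indexed episode."""
    k = min(max(episode, 0), LAST_SHIFT_EPISODE)
    # Rounding only removes floating-point noise from the repeated 0.2 steps.
    low = round(-1.0 + SHIFT_PER_EPISODE * k, 10)
    high = round(1.0 + SHIFT_PER_EPISODE * k, 10)
    return low, high


def clip_to_episode_range(action: float, episode: int) -> float:
    """Clip a scalar action to the range that is valid in the given episode."""
    low, high = episode_bounds(episode)
    return min(max(action, low), high)


if __name__ == "__main__":
    for ep in range(7):
        print(ep, episode_bounds(ep))
```

With this schedule, an action such as -0.5 is valid in the first episode but falls outside the range in later episodes, even though it always lies within the fixed overall bound.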

w1463442883 commented Sep 3, 2024 via email
