[Question] Changing action space with time/episode #3284

prinshul · 2024-09-03T12:12:05Z

What is the expected behaviour of on-policy and off-policy algorithms when the action space itself changes from episode to episode? Does this lead to non-stationarity?

The action space is continuous, as in the typical MuJoCo tasks (Ant, HalfCheetah, etc.), where the actions represent torques.
Suppose that in one episode the action range is [-1, 1].

Next episode: [-0.8, 1.2]
Next episode: [-0.6, 1.4]
...
Some later episode: [0, 2]
...

The range changes according to some function, and it is updated before the beginning of each episode. How should algorithms like PPO, TRPO, DDPG, SAC, and TD3 be expected to behave? Will they be able to handle this? The same question applies to MARL algorithms such as MAPPO, MADDPG, MATRPO, MATD3, etc.
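A minimal sketch of the setup described above, for illustration only. It assumes the "some function" is a linear schedule that shifts both bounds by +0.2 per episode and saturates at [0, 2], which matches the listed values. `shifted_bounds` and `ShiftingBoundsEnv` are hypothetical names, not part of Gym/Gymnasium or any RL library, and the toy environment has no real dynamics.

```python
def shifted_bounds(episode: int, step: float = 0.2, max_shift: float = 1.0) -> tuple[float, float]:
    """Hypothetical schedule: return (low, high) for a given episode index.

    Both bounds start at [-1, 1] and move up by `step` per episode,
    stopping once the total shift reaches `max_shift` (i.e. at [0, 2]).
    """
    shift = min(step * episode, max_shift)
    return -1.0 + shift, 1.0 + shift


class ShiftingBoundsEnv:
    """Hypothetical toy environment: it only tracks the valid action range."""

    def __init__(self, schedule=shifted_bounds):
        self.schedule = schedule
        self.episode = -1  # becomes 0 on the first reset()
        self.low, self.high = schedule(0)

    def reset(self) -> tuple[float, float]:
        # The range is updated before each episode begins, as in the question.
        self.episode += 1
        self.low, self.high = self.schedule(self.episode)
        return self.low, self.high

    def is_valid(self, action: float) -> bool:
        """True if `action` lies inside the current episode's range."""
        return self.low <= action <= self.high


if __name__ == "__main__":
    env = ShiftingBoundsEnv()
    for _ in range(7):
        low, high = env.reset()
        # The same torque (1.1) is invalid in episode 0 but valid from episode 1 on.
        print(env.episode, round(low, 2), round(high, 2), env.is_valid(1.1))
```

Note that nothing in this sketch exposes `episode` or the current bounds to the agent; the admissible set simply changes underneath it.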

Is this non-stationarity due to changing dynamics? Is there an invalid action range as such? We could bound the overall range to a fixed low/high value, but the valid range would still change across episodes.
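The fixed-bound idea can be sketched in two ways, again under assumptions: (a) declare one fixed envelope covering every episode, here [-1, 2] (the union of the ranges listed above), and clip each action to the current episode's range; or (b) let the policy output a normalized action in [-1, 1] and rescale it linearly onto the current range. `clip_to_current`, `rescale_to_current`, and the `ENVELOPE_*` constants are hypothetical helpers for illustration, not library APIs.

```python
# Hypothetical fixed envelope: the union of all per-episode ranges in the example.
ENVELOPE_LOW, ENVELOPE_HIGH = -1.0, 2.0


def clip_to_current(action: float, low: float, high: float) -> float:
    """(a) Clip an action from the fixed envelope to this episode's [low, high]."""
    if not (ENVELOPE_LOW <= action <= ENVELOPE_HIGH):
        raise ValueError("action outside the declared envelope")
    return max(low, min(action, high))


def rescale_to_current(norm_action: float, low: float, high: float) -> float:
    """(b) Map a normalized action in [-1, 1] linearly onto [low, high]."""
    norm_action = max(-1.0, min(norm_action, 1.0))  # guard against overshoot
    return low + (norm_action + 1.0) * 0.5 * (high - low)


if __name__ == "__main__":
    # Same raw action 1.5: clipped in the [-1, 1] episode, kept in the [0, 2] one.
    print(clip_to_current(1.5, -1.0, 1.0))    # 1.0
    print(clip_to_current(1.5, 0.0, 2.0))     # 1.5
    # Same normalized action 0.0: a different torque in each episode.
    print(rescale_to_current(0.0, -1.0, 1.0))  # 0.0
    print(rescale_to_current(0.0, 0.0, 2.0))   # 1.0
```

With (a), the same action can be kept in one episode and clipped in another; with (b), the same normalized action maps to a different torque in each episode. In both cases the effect of a given policy output depends on which episode's range is active.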

w1463442883 · 2024-09-03T12:13:50Z

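This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email personally. I will get back to you as soon as possible after the vacation.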
