StickyAction wrapper can repeat the old action for more than 1 step #1240

sparisi · 2024-11-07T09:46:25Z

The StickyAction wrapper now takes an optional argument that lets the old action be repeated for more than 1 step (default is 1). The original behavior is unchanged.

Description

No fix, this is an extra feature. It increases the difficulty of sticky actions and the "non-Markovianity" of the environment (the more steps the action is repeated, the further into the past the agent has to look to predict the next state). It can be useful for RL on non-Markov decision processes.

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

pseudo-rnd-thoughts · 2024-11-07T15:08:50Z

@sparisi If I understand correctly, this will limit the number of repeated actions that can be taken.
If this code is paired with a frame skip, then this will allow the action to be repeated.
Maybe we should add that here.

sparisi · 2024-11-07T16:03:59Z

"@sparisi If I understand correctly, this will limit the number of repeat actions that can be taken. If this code is paired with a frame skip then this will allow the action to be repeated. Maybe we should add that here"

I wouldn't say it "limits" the repeats. There is no limit to the repeats per episode.
Instead of repeating the action for just 1 step, it can repeat it for more steps: once a repeat is triggered, the action is always repeated for n steps (not just 1).
Right now, at every step, there is a chance to repeat the past action. With this PR, you can also specify the duration of the repeats: if random() triggers the if, the same action is repeated for n steps, no matter what. After n steps, random() is called again and the agent either gets to do its own action or another repeat is triggered.

Example.

StickyAction(env, 0.5, 4) means that with 50% probability, the agent will execute the old action for 4 steps (not just 1, as it was before).
An example of a trajectory would be:

Agent does action 1
Agent does action 2
Agent wants to do action 0 but random() triggers the repeat ---- action 2 will be done
Action 2 (regardless of what the agent tries)
Action 2 again
Action 2 again ----- last of the 4 steps
Agent wants to do action 0 --- random() is checked again and a new repeat sequence may start or not

The original behavior (n = 1) is still possible. If n = 1 and the repeat is triggered, then self.is_repeating = True (used to remember to keep doing repeats) and self.last_action_repeats = 1, but then the next if is triggered and resets self.is_repeating = False. So, at the next step, the agent may or may not repeat again (just as in the original wrapper).
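
To make the trajectory above concrete, here is a minimal stand-alone sketch of the repeat logic. It is not the wrapper code from this PR: the class name `StickyActionFilter`, its arguments, and the `repeats_left` counter are hypothetical names chosen for illustration, and the sketch uses Python's `random` module rather than the environment's own random generator.

```python
import random


class StickyActionFilter:
    """Hypothetical stand-alone model of the repeat logic (not the wrapper itself)."""

    def __init__(self, repeat_probability, repeat_duration=1, rng=None):
        self.repeat_probability = repeat_probability
        self.repeat_duration = repeat_duration  # n: length of one repeat sequence
        self.rng = rng or random.Random()
        self.last_action = None
        self.repeats_left = 0  # forced repeats remaining in the current sequence

    def filter(self, action):
        """Return the action that is executed for the agent's requested action."""
        if self.last_action is not None:
            # random() is only consulted when no repeat sequence is running.
            if self.repeats_left == 0 and self.rng.random() < self.repeat_probability:
                self.repeats_left = self.repeat_duration
            if self.repeats_left > 0:
                # Inside a sequence: ignore the request and replay the old action.
                self.repeats_left -= 1
                action = self.last_action
        self.last_action = action
        return action
```

With `repeat_duration=1`, a triggered repeat uses up its single step immediately, so `random()` is consulted again on the very next step, which matches the original single-step behavior described above.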

pseudo-rnd-thoughts · 2024-11-08T14:22:41Z

Ahh, I understand better now, thanks for the description.
Originally, the implementation was just a single sticky action step where, with a given probability, the last action was taken instead.
In Atari, this is paired with FrameSkip, i.e., the agent takes X steps with a 0.25 probability of taking the last action taken.

ALE implements two frameskip options: deterministic (v5, with 4 frames) and stochastic (v0, randomly between 2 and 5).

This PR adds frameskip for a random value up to X, which is equivalent to the v0 style, randomly between 1 and X.
Am I understanding correctly?
If I am, I would change the implementation to the ALE-style implementation, with either a deterministic number of frames, int, or a stochastic number of frames with tuple[int, int].

sparisi · 2024-11-08T19:54:23Z

@pseudo-rnd-thoughts
I see.
I think the difference with FrameSkip is that the agent, when actions are repeated, does still try to execute an action. With FrameSkip, instead, the agent is not aware of the repeats.
The main reason I made this wrapper is to study how non-Markovianity affects classic RL.
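
The FrameSkip distinction above can be illustrated with two toy rollouts. This is only a sketch: both function names are hypothetical, and `overridden_steps` is a deterministic stand-in for the random trigger, used here so the difference is easy to see.

```python
def rollout_frame_skip(policy, num_decisions, skip):
    """Frame skip: the policy is queried once, then its action is held for `skip` steps."""
    executed = []
    for t in range(num_decisions):
        action = policy(t)
        executed.extend([action] * skip)
    return executed


def rollout_sticky(policy, num_steps, overridden_steps):
    """Sticky actions: the policy is queried at every step, but may be overridden."""
    requested, executed = [], []
    for t in range(num_steps):
        action = policy(t)
        requested.append(action)
        if t in overridden_steps and executed:
            # The agent still chose an action, but the previous one is replayed.
            action = executed[-1]
        executed.append(action)
    return requested, executed
```

In the sticky rollout, the `requested` and `executed` lists can differ, which is the part the agent has to reason about. In the frame-skip rollout there is no such mismatch, because the agent is simply not asked during the skipped steps.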

"This PR is added frameskip for a random value up to X"
Actually, the number of steps the action is repeated is deterministic. What's random is when the action starts being repeated (the randomness in StickyAction).

Now I have added the possibility to have stochastic repeats within a range. When the agent starts a series of repeats, the duration is randomly determined within the range passed as argument to the wrapper.
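
As a rough sketch of how an int-or-range duration argument could be resolved when a repeat sequence starts: the helper name `sample_repeat_duration` is hypothetical, and treating both bounds of the range as inclusive is an assumption of this sketch, not something stated in this thread.

```python
import random


def sample_repeat_duration(duration, rng):
    """Resolve a duration spec to a concrete number of steps.

    duration: an int (fixed length) or a (low, high) tuple (random length).
    Assumption: both bounds of the tuple are inclusive.
    """
    if isinstance(duration, int):
        if duration < 1:
            raise ValueError("duration must be at least 1")
        return duration
    low, high = duration
    if not 1 <= low <= high:
        raise ValueError("expected 1 <= low <= high")
    return rng.randint(low, high)  # random.Random.randint includes both ends
```

For example, `sample_repeat_duration(4, random.Random())` always gives 4, while `sample_repeat_duration((2, 5), random.Random())` draws a new length each time a sequence starts.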

pseudo-rnd-thoughts · 2024-11-13T13:10:01Z

@sparisi I came back to the PR after a couple of days and hopefully now better understand what you mean about the difference between this and frameskip.
I've updated the code to change the variable names and added a test that checks that the expected actions match the executed actions.
Could you check the code? Then I would be happy to merge.

sparisi · 2024-11-13T21:40:09Z

@pseudo-rnd-thoughts
Checked and tried on an environment, all good. Thanks!

sparisi added 3 commits November 7, 2024 01:43

sticky actions for more than 1 steps

407ddd1

Optional argument that allows to repeat the old action for more than 1 step.

Update test_sticky_action.py

744790e

test for repeated action for n steps

tests + docstring pre-commit

d219ee3

sticky actions can have random duration within a range

dbd47a3

sparisi and others added 3 commits November 8, 2024 12:56

pre-commit

d511fbb

Update stateful_action.py

9e3c807

Update test_sticky_action.py

af522ef

pseudo-rnd-thoughts approved these changes Nov 14, 2024

pseudo-rnd-thoughts merged commit ebe70a1 into Farama-Foundation:main Nov 14, 2024
13 checks passed
