StickyAction wrapper can repeat the old action for more than 1 step #1240

Merged
merged 7 commits into from
Nov 14, 2024

Conversation

sparisi
Contributor

@sparisi sparisi commented Nov 7, 2024

The StickyAction wrapper now takes an optional argument that allows the old action to be repeated for more than 1 step (default is 1). The original behavior is unchanged.

Description

This is not a fix but an extra feature. It increases the difficulty of sticky actions and the "non-Markovianity" of the environment (the more steps the action is repeated, the further into the past the agent must look to predict the next state). It can be useful for RL on non-Markov decision processes.

Type of change


  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Optional argument that allows to repeat the old action for more than 1 step.
test for repeated action for n steps
@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Nov 7, 2024

@sparisi If I understand correctly, this will limit the number of repeat actions that can be taken.
If this code is paired with a frame skip then this will allow the action to be repeated.
Maybe we should add that here

@sparisi
Contributor Author

sparisi commented Nov 7, 2024

"@sparisi If I understand correctly, this will limit the number of repeat actions that can be taken. If this code is paired with a frame skip then this will allow the action to be repeated. Maybe we should add that here"

I wouldn't say it "limits" the repeats. There is no limit to the repeats per episode.
Instead of repeating the action for just 1 step, it can now be repeated for more: if a repeat is triggered, the old action is always repeated for n steps (not just 1).
Currently, at every step, there is a chance of repeating the previous action. With this change, you can also specify the duration of the repeats: if random() triggers the if, the same action is repeated for n steps, no matter what. After n steps, random() is called again and the agent gets the chance either to execute its own action or to trigger another repeat.

Example.

StickyAction(env, 0.5, 4) means that with 50% probability, the agent will execute the old action for 4 steps (not just 1, as it was before).
An example of a trajectory would be:

  • Agent does action 1
  • Agent does action 2
  • Agent wants to do action 0 but random() triggers the repeat ---- action 2 will be done
  • Action 2 (regardless of what the agent tries)
  • Action 2 again
  • Action 2 again ----- last of the 4 steps
  • Agent wants to do action 0 --- random() is checked again and a new repeat sequence may start or not

The original behavior (n = 1) is still possible. If n = 1 and the repeat is triggered, then self.is_repeating = True (used to remember to keep repeating) and self.last_action_repeats = 1, but then the next if is triggered and resets self.is_repeating = False. So, at the next step, the agent may or may not repeat again (just as in the original wrapper).
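The logic described in this comment can be sketched as a small stand-alone class. This is a minimal model under stated assumptions, not the wrapper code from the PR: the class name `StickyRepeatFilter` and the method `filter` are hypothetical, the attribute names `is_repeating` and `last_action_repeats` are borrowed from the comment above (the merged code may use different names), and the very first action is assumed to pass through because there is no old action to repeat yet.

```python
import random


class StickyRepeatFilter:
    """Hypothetical model of the repeat logic; not the Gymnasium wrapper itself."""

    def __init__(self, prob, duration=1, rng=None):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("prob must be in [0, 1]")
        if duration < 1:
            raise ValueError("duration must be >= 1")
        self.prob = prob
        self.duration = duration
        # Any object with a .random() method works; defaults to the stdlib RNG.
        self.rng = rng if rng is not None else random.Random()
        self.last_action = None
        self.is_repeating = False
        self.last_action_repeats = 0

    def filter(self, action):
        """Return the action that is actually executed for the agent's `action`."""
        if self.last_action is not None:
            # random() is only consulted when no repeat series is in progress.
            if not self.is_repeating and self.rng.random() < self.prob:
                self.is_repeating = True
                self.last_action_repeats = 0
            if self.is_repeating:
                action = self.last_action  # override the agent's choice
                self.last_action_repeats += 1
                if self.last_action_repeats >= self.duration:
                    # With duration == 1 this resets immediately, which
                    # reproduces the original one-step sticky behavior.
                    self.is_repeating = False
        self.last_action = action
        return action
```

Calling `StickyRepeatFilter(0.5, 4).filter(a)` once per step would, under these assumptions, produce trajectories like the one listed above: once a repeat is triggered, the agent's next three choices are ignored as well.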

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Nov 8, 2024

Ahh, I understand better now, thanks for the description.
Originally, the implementation was just a single sticky action step where, given a probability, there was a chance of taking the last action.
In Atari, this is paired with FrameSkip, i.e., the agent takes X steps, each with a 0.25 probability of repeating the last action taken.

ALE implements two options for frameskip: deterministic (v5, with 4 frames) and stochastic (v0, randomly between 2 and 5).

This PR adds frameskip for a random value up to X, which is equivalent to the v0 style of choosing randomly between 1 and X.
Am I understanding correctly?
If I am, I would change the implementation to the ALE style, with either a deterministic number of frames (int) or a stochastic number of frames (tuple[int, int]).

@sparisi
Contributor Author

sparisi commented Nov 8, 2024

@pseudo-rnd-thoughts
I see.
I think the difference with FrameSkip is that the agent, when actions are repeated, does still try to execute an action. With FrameSkip, instead, the agent is not aware of the repeats.
The main reason I made this wrapper is to study how non-Markovianity affects classic RL.
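The distinction drawn here between FrameSkip and sticky repeats can be illustrated with two toy functions. This is only a sketch: `frame_skip` and `sticky` are hypothetical helpers, not Gymnasium or ALE APIs, and the `stuck` mask stands in for the random trigger. Each function returns how many decisions the agent makes and which actions reach the environment.

```python
from typing import List, Tuple


def frame_skip(choices: List[int], skip: int) -> Tuple[int, List[int]]:
    """Each chosen action is applied `skip` times inside a single agent step."""
    executed = [a for a in choices for _ in range(skip)]
    return len(choices), executed


def sticky(choices: List[int], stuck: List[bool]) -> Tuple[int, List[int]]:
    """The agent chooses at every step; stuck steps replay the last executed action."""
    executed: List[int] = []
    for action, is_stuck in zip(choices, stuck):
        if is_stuck and executed:
            action = executed[-1]  # the agent's choice is silently overridden
        executed.append(action)
    return len(choices), executed
```

With frame skip, two decisions and `skip=3` yield six environment-level actions that the agent never observes individually. With sticky repeats, the agent makes one decision per environment step, but some of those decisions are replaced by the old action.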

"This PR is added frameskip for a random value up to X"
Actually, the number of steps the action is repeated is deterministic. What's random is when the action starts being repeated (the randomness in StickyAction).

Now I have added the possibility to have stochastic repeats within a range. When the agent starts a series of repeats, the duration is randomly determined within the range passed as argument to the wrapper.
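A minimal sketch of how the duration of each repeat series could be drawn, accepting either a fixed int or a (low, high) range as discussed above. The helper name `sample_repeat_duration` is hypothetical, and treating both bounds as inclusive is an assumption; the wrapper in the PR may handle the upper bound differently.

```python
import random
from typing import Tuple, Union

DurationSpec = Union[int, Tuple[int, int]]


def sample_repeat_duration(spec: DurationSpec, rng: random.Random) -> int:
    """Return the number of steps the old action is repeated in one series."""
    if isinstance(spec, int):
        # Deterministic case: every repeat series lasts exactly `spec` steps.
        if spec < 1:
            raise ValueError("duration must be >= 1")
        return spec
    low, high = spec
    if not 1 <= low <= high:
        raise ValueError("range must satisfy 1 <= low <= high")
    # Stochastic case: drawn once when a series starts (bounds assumed inclusive).
    return rng.randint(low, high)
```

The draw would happen once at the moment a repeat is triggered, so the duration stays fixed for that series and is re-sampled only when the next series begins.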

@pseudo-rnd-thoughts
Member

@sparisi I came back to the PR after a couple of days and hopefully now better understand the difference you describe between this and frameskip.
I've updated the code to change the variable names and added a test that checks that the executed actions match the expected actions.
Could you check the code? Then I would be happy to merge.

@sparisi
Contributor Author

sparisi commented Nov 13, 2024

@pseudo-rnd-thoughts
Checked and tried on an environment, all good. Thanks!

@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit ebe70a1 into Farama-Foundation:main Nov 14, 2024
13 checks passed
2 participants