
The action-mapping in DeepSea-bsuite does not behave like the original DeepSea environment #77

Open
Pascal314 opened this issue May 30, 2024 · 0 comments

@Pascal314

In BSuite, DeepSea generates a random action mapping once and keeps it fixed across resets. The main purpose of the random action mapping is to ensure that a DQN agent cannot trivially solve the environment just by having a bias towards the action "right".
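For reference, here is a minimal JAX sketch of the BSuite-style behaviour described above, assuming the mapping stores one 0/1 entry per grid cell naming the action that moves the agent right in that cell. The helpers make_action_mapping and is_right are hypothetical illustrations, not Gymnax or BSuite functions.

```python
import jax
import jax.numpy as jnp


def make_action_mapping(key, size):
    # Hypothetical helper: one 0/1 entry per grid cell, the action that
    # moves the agent right in that cell.
    return jax.random.bernoulli(key, 0.5, (size, size)).astype(jnp.int32)


def is_right(action, mapping, row, column):
    # An action counts as "right" iff it matches the cell's mapping entry,
    # so a blanket bias towards action 1 no longer helps.
    return action == mapping[row, column]


# Drawn once, when the environment is built, and reused by every reset.
MAPPING = make_action_mapping(jax.random.PRNGKey(42), size=8)
```

Because the mapping is drawn from a fixed key at construction time, every episode sees the same mapping, which is what lets an agent learn it.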

Currently, Gymnax's DeepSea-bsuite implementation does one of the following:

  • Uses a deterministic action mapping if deterministic is True.
  • Randomly generates a new action mapping on every reset if sample_action_map is True and deterministic is False.
  • Uses the default action map, which is set to the deterministic mapping, if sample_action_map is False and deterministic is False.
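The three cases above can be sketched as a single selection function, which makes it visible that a random mapping is only reachable when deterministic is False. This mirrors the description in the list, not the actual Gymnax source: select_action_mapping is hypothetical, it would be called once per reset, and the all-ones "deterministic" mapping (action 1 moves right everywhere) is an assumption.

```python
import jax
import jax.numpy as jnp


def select_action_mapping(key, size, deterministic, sample_action_map,
                          default_mapping=None):
    # Assumption: the deterministic mapping makes action 1 "right" everywhere.
    deterministic_mapping = jnp.ones((size, size), dtype=jnp.int32)
    if default_mapping is None:
        # As described above, the default map is the deterministic one.
        default_mapping = deterministic_mapping
    if deterministic:
        # Case 1: deterministic transitions always imply a fixed mapping.
        return deterministic_mapping
    if sample_action_map:
        # Case 2: a fresh random mapping every time this is called (every reset).
        return jax.random.bernoulli(key, 0.5, (size, size)).astype(jnp.int32)
    # Case 3: stochastic transitions, but the (deterministic) default mapping.
    return default_mapping
```

Note that there is no combination of flags that yields a random mapping together with deterministic transitions, and none that yields a random mapping which survives across resets unless the caller reuses the same key.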

This poses a few problems:

  1. It is not possible to use a random mapping without making the transitions stochastic.
  2. Getting BSuite's default behaviour, i.e. a fixed random mapping, requires workarounds such as generating the mapping by hand and assigning it to env.action_mapping, or resetting the environment with a fixed key. Neither is ideal for general agent-environment loops.

I think problem 1 is just a bug: the deterministic environment parameter should probably only distinguish between BSuite's "DeepSea" and "DeepSea Stochastic" environments.

Problem 2 could perhaps be fixed by changing the default env.action_mapping to a fixed random mapping, or by adding action_mapping to env_state.
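One possible reading of the env_state option is sketched below: the mapping is sampled once by the caller and then carried inside the state, so reset reuses it instead of resampling, and each vmapped environment copy can hold its own fixed mapping. DeepSeaState, reset and step are hypothetical and not Gymnax's EnvState or API; the dynamics are simplified to deterministic moves (right on a match, left otherwise, clipped to the grid) with rewards omitted.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class DeepSeaState(NamedTuple):  # hypothetical, not Gymnax's EnvState
    row: jnp.ndarray
    column: jnp.ndarray
    action_mapping: jnp.ndarray  # carried in the state, not on the env object


def reset(mapping):
    # Start top-left and reuse the mapping handed in; nothing is resampled.
    return DeepSeaState(jnp.int32(0), jnp.int32(0), mapping)


def step(state, action):
    # Simplified deterministic dynamics: move right on a match, else left.
    size = state.action_mapping.shape[0]
    right = action == state.action_mapping[state.row, state.column]
    column = jnp.clip(jnp.where(right, state.column + 1, state.column - 1),
                      0, size - 1)
    return DeepSeaState(state.row + 1, column, state.action_mapping)


# Usage: sample once, then thread the same mapping through every reset.
mapping = jax.random.bernoulli(jax.random.PRNGKey(0), 0.5, (8, 8)).astype(jnp.int32)
state = reset(mapping)
state = step(state, 1)
state = reset(state.action_mapping)  # next episode, same mapping
```

With jax.vmap(reset) over a batch of mappings, every environment keeps its own mapping across episodes without relying on a fixed reset key.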

Finally, the randomize_actions environment parameter is currently unused, and it is unclear to me why the sample_action_map option exists. Surely regenerating the action mapping at the start of every episode makes the problem completely impossible to solve?
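The intuition behind this question can be checked with a small Monte Carlo sketch. It assumes simplified dynamics (a 0/1 mapping entry per cell, start at the top-left, one row down per step, and the treasure is reached only by moving right on every row) and compares a fixed policy that always takes action 1 under one shared mapping versus a mapping resampled every episode. success_rate is a hypothetical helper, not part of Gymnax.

```python
import jax
import jax.numpy as jnp


def success_rate(key, size=4, episodes=4000, resample=True):
    """Fraction of episodes in which 'always take action 1' reaches the treasure."""
    if resample:
        # A fresh mapping per episode, as with sample_action_map.
        maps = jax.random.bernoulli(key, 0.5, (episodes, size, size)).astype(jnp.int32)
    else:
        # One mapping shared by every episode, as in BSuite.
        one = jax.random.bernoulli(key, 0.5, (size, size)).astype(jnp.int32)
        maps = jnp.broadcast_to(one, (episodes, size, size))
    action = 1
    idx = jnp.arange(episodes)
    column = jnp.zeros(episodes, dtype=jnp.int32)
    all_right = jnp.ones(episodes, dtype=bool)
    for row in range(size):
        # Look up each episode's mapping entry at its current cell.
        right = maps[idx, row, column] == action
        all_right = all_right & right
        column = jnp.clip(jnp.where(right, column + 1, column - 1), 0, size - 1)
    return float(all_right.mean())
```

With a shared mapping the rate is exactly 0.0 or 1.0, so whatever the agent learns carries over between episodes. With resampling, every visited cell's entry is a fresh coin flip that the agent cannot observe before acting, and since each cell is visited at most once per episode, under these assumptions any policy reaches the treasure with probability about 2^-size (roughly 0.0625 for size 4).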
