In BSuite, DeepSea generates a random action mapping and keeps it fixed across resets. The main purpose of the random action mapping is to ensure that a DQN agent cannot trivially solve the environment simply by being biased towards the action "right".
Currently, Gymnax's DeepSea-bsuite implementation either:

- uses a deterministic action mapping if `deterministic` is `True`;
- randomly generates a new action mapping on every reset if `sample_action_map` is `True` and `deterministic` is `False`;
- uses a default action map, which is itself deterministic, when `sample_action_map` is `False` and `deterministic` is `False`.
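To make the three branches concrete, here is a minimal pure-Python sketch of the selection logic. This is a hypothetical simplification, not Gymnax's actual code: the real environment works with a size-by-size grid of per-cell flips, while the function name and flat-list representation here are illustrative only.

```python
import random


def make_action_mapping(size, deterministic, sample_action_map, key=None):
    """Illustrative sketch of the current branch structure (not Gymnax's code).

    A mapping entry of 1 means "action 1 moves right" in that cell; the real
    environment stores a size-by-size grid, flattened here for brevity.
    """
    if deterministic:
        # Branch 1: a fixed, fully deterministic mapping.
        return [1] * size
    if sample_action_map:
        # Branch 2: a freshly sampled mapping; in the real environment this is
        # drawn from the reset key, so it changes on every reset.
        rng = random.Random(key)
        return [rng.randint(0, 1) for _ in range(size)]
    # Branch 3: the default mapping, which is also deterministic, so there is
    # no branch that yields a "fixed but random" mapping.
    return [1] * size
```

Note that branches 1 and 3 return the same mapping, so no combination of the two flags reproduces BSuite's default behaviour of a random mapping that stays fixed across resets.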
This poses a few problems:

1. It is not possible to use a random mapping without also making the transitions stochastic.
2. Getting BSuite's default behaviour, i.e. a fixed random mapping, requires workarounds such as generating the mapping by hand and setting `env.action_mapping`, or resetting the environment with a fixed key, neither of which fits general agent-environment loops.
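For reference, the first workaround might look like the following sketch. The helper name is hypothetical, and the interaction with the environment is shown only in comments to avoid depending on the exact Gymnax API.

```python
import random


def sample_mapping(size, seed):
    """Draw a 0/1 action mapping once from a fixed seed (illustrative helper)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(size)]


# Generate the mapping by hand, once, before any episodes...
fixed_mapping = sample_mapping(8, seed=0)

# ...then re-assign it after constructing (or resetting) the environment, e.g.
#   obs, state = env.reset(key, env_params)
#   env.action_mapping = fixed_mapping  # overrides whatever the env produced
```

Because the seed is fixed, the mapping is random yet identical on every run, which is the behaviour the issue asks for but cannot currently be obtained through the environment parameters alone.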
I think problem 1 is simply a bug: the `deterministic` environment parameter should probably just distinguish between BSuite's "DeepSea" and "DeepSea Stochastic" environments.
Problem 2 could perhaps be fixed by changing the default `env.action_mapping`, or by adding the `action_mapping` to `env_state`.
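A rough sketch of the second suggestion: store the mapping in the environment state, so the first reset samples it and subsequent resets reuse it. The `EnvState` fields and `reset` signature below are hypothetical, not Gymnax's actual API.

```python
import random
from dataclasses import dataclass, field


@dataclass
class EnvState:
    row: int
    col: int
    action_mapping: list = field(default_factory=list)  # carried in the state


def reset(seed, size, prev_state=None):
    """Sample the mapping on the first reset only; reuse it on later resets."""
    if prev_state is None:
        rng = random.Random(seed)
        mapping = [rng.randint(0, 1) for _ in range(size)]
    else:
        mapping = prev_state.action_mapping
    return EnvState(row=0, col=0, action_mapping=mapping)
```

Carrying the mapping in `env_state` also fits the functional style of JAX environments, where all per-episode data lives in the state rather than on the environment object.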
Finally, the `randomize_actions` environment parameter is currently unused, and it is unclear to me why the `sample_action_map` option exists at all: surely regenerating the action mapping at the start of every episode makes the problem completely impossible to solve?