You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Due to this validation environments returning integer rewards will throw an exception, e.g. when I try to collect rollouts from an expert policy.
This seems a bit overzealous. Here are some things to consider:
gym did somewhat specify float (see), but it doesn't seem like this was enforced anywhere.
e.g. gym-minigrid can return 0 (as an integer) in some situations
gymnasium now specifies typing.SupportsFloat (see), which allows integers (and envs such as MiniGrid still use ints).
sb3 vec_envs are documented as returning float (see)
However, this is not enforced in either the VecEnvWrapper or make_vec_env, AFAICT. That means I can wrap a valid gymnasium env in a VecEnv and my venv will return integers.
The imitation doc does say the dtype of a reward in TrajectoryWithRew is float (see).
Solution
I suggest changing the validation to np.number or typing.SupportsFloat (exists since python3.5) or np.floating or np.integer and updating the doc appropriately. Granted, this is all a bit awkward, since there is no exact equivalent of typing.SupportsFloat in numpy.
If this is a desired change, I could open a PR on this.
Possible alternative solutions
If you want to support gym envs that return integer rewards but keep using floats internally, maybe we could convert integers to float at some point in the pipeline. This could for example be done in the RolloutInfoWrapper.
Currently, my workaround is a wrapper that converts all rewards to floats. Obviously not a big deal, but seems kind of unnecessary.
The text was updated successfully, but these errors were encountered:
Problem
Due to this validation environments returning integer rewards will throw an exception, e.g. when I try to collect rollouts from an expert policy.
This seems a bit overzealous. Here are some things to consider:
gym-minigrid
can return0
(as an integer) in some situationstyping.SupportsFloat
(see), which allows integers (and envs such as MiniGrid still use ints).Solution
I suggest changing the validation to
np.number
ortyping.SupportsFloat
(exists since python3.5) ornp.floating or np.integer
and updating the doc appropriately. Granted, this is all a bit awkward, since there is no exact equivalent of typing.SupportsFloat in numpy.If this is a desired change, I could open a PR on this.
Possible alternative solutions
If you want to support gym envs that return integer rewards but keep using floats internally, maybe we could convert integers to float at some point in the pipeline. This could for example be done in the RolloutInfoWrapper.
Currently, my workaround is a wrapper that converts all rewards to floats. Obviously not a big deal, but seems kind of unnecessary.
The text was updated successfully, but these errors were encountered: