Differentiate step function ? #26

dzako · 2022-07-12T19:45:53Z

Hello,
is it possible to return the derivative of the step reward function (with respect to the action), at least for the simplest envs like Pendulum and CartPole?
Best, Jacek

RobertTLange · 2022-08-24T10:31:11Z

Hi @dzako, thanks for your kind words and appreciation. You are right, for now obs and state are wrapped with a stop gradient operation. While I agree that this is a desirable feature for certain environments, there are two main considerations:

Not all environment step transitions are differentiable. E.g. respawning in the MinAtar implementations is essentially a step function for certain pixel activations. Therefore this can't be a general feature.
This can have subtle (or not so subtle) downstream effects. While one may want to differentiate through step transitions in the context of model-based RL or control/MPC/etc., this can also cause problems for standard model-free RL pipelines (using JAX grad) which assume that the environment is not "accessible".

I will see if it makes sense to add a stop_gradient option when calling gymnax.make. Let me know if you have ideas/opinions and what your particular use case could be.
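
To illustrate the two considerations above, here is a minimal JAX sketch. The functions `reward`, `reward_blocked`, and `respawn_like` are hypothetical toy examples made up for illustration, not gymnax code. The sketch shows that wrapping an output in `jax.lax.stop_gradient` yields a zero gradient, and that a piecewise-constant (step-like) transition also yields a zero gradient even without any stop gradient.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy reward (not taken from gymnax): penalize large actions.
def reward(action):
    return -(action ** 2)

def reward_blocked(action):
    # stop_gradient makes the wrapped value a constant for autodiff.
    return jax.lax.stop_gradient(reward(action))

def respawn_like(action):
    # Piecewise-constant output, similar in spirit to a respawn event:
    # the action only enters through a comparison.
    return jnp.where(action > 0.0, 1.0, 0.0)

a = jnp.array(0.5)
g_open = jax.grad(reward)(a)             # analytic value: -2 * a
g_blocked = jax.grad(reward_blocked)(a)  # gradient is cut
g_step = jax.grad(respawn_like)(a)       # no useful gradient signal
```

In the step-like case the gradient is zero almost everywhere, so removing the stop gradient alone would not make such a transition useful for gradient-based control.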

carlosgmartin · 2023-03-30T03:41:47Z

I think it makes sense to remove all stop_gradients from the environments themselves, so that RL algorithms downstream have the option to use those gradients if desired.

It seems to me that it is the downstream responsibility of an RL algorithm to impose a stop_gradient if it happens to require one.
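
A rough sketch of what this could look like, assuming an environment step that is left fully differentiable. Here `env_step`, `pathwise_objective`, and `blocked_objective` are hypothetical names invented for this example and do not correspond to the gymnax API. The caller that wants gradients uses the outputs directly, while the caller that does not want them applies `jax.lax.stop_gradient` on its own side.

```python
import jax
import jax.numpy as jnp

def env_step(state, action):
    # Hypothetical differentiable step with no stop_gradient inside.
    next_state = state + 0.1 * action
    reward = -(next_state ** 2)
    return next_state, reward

def pathwise_objective(theta, state):
    # Caller that wants to differentiate through the environment.
    action = theta * state  # toy deterministic linear policy
    _, reward = env_step(state, action)
    return reward

def blocked_objective(theta, state):
    # Caller that treats the environment as a black box and
    # cuts the gradient itself, downstream of the environment.
    action = theta * state
    _, reward = env_step(state, action)
    return jax.lax.stop_gradient(reward)

state = jnp.array(1.0)
theta = jnp.array(0.5)
g_path = jax.grad(pathwise_objective)(theta, state)
g_blocked = jax.grad(blocked_objective)(theta, state)
```

With this split, the environment code stays the same for both kinds of algorithm and only the caller decides whether gradients flow.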

dominikstrb · 2023-10-09T12:33:19Z

I just wanted to bump this issue, because I think it would be very useful to have the ability to differentiate through dynamics and observation function. This would allow us to use gymnax for the purpose of model-based control and for explicit modeling of partially observable environments.
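
As a sketch of the model-based control use case, the snippet below differentiates a multi-step rollout cost with respect to an open-loop action sequence. The dynamics are a hand-written, simplified pendulum-like system; the constants, the cost, and the function names `step` and `total_cost` are assumptions for illustration and are not the gymnax Pendulum implementation.

```python
import jax
import jax.numpy as jnp

# Assumed constants for a toy pendulum-like system.
DT, G, LENGTH = 0.05, 10.0, 1.0

def step(state, action):
    # One Euler step; theta is the angle measured from upright.
    theta, theta_dot = state
    accel = 3.0 * G / (2.0 * LENGTH) * jnp.sin(theta) + 3.0 * action
    theta_dot = theta_dot + DT * accel
    theta = theta + DT * theta_dot
    cost = theta ** 2 + 0.1 * theta_dot ** 2 + 0.001 * action ** 2
    return (theta, theta_dot), cost

def total_cost(actions, init_state):
    # Roll the dynamics forward with lax.scan and sum the per-step costs.
    _, costs = jax.lax.scan(step, init_state, actions)
    return costs.sum()

init_state = (jnp.array(0.3), jnp.array(0.0))
actions = jnp.zeros(10)

# Gradient of the rollout cost with respect to every action in the sequence.
grads = jax.grad(total_cost)(actions, init_state)
```

The resulting `grads` array has one entry per time step and could feed a gradient-based trajectory optimizer or an MPC loop.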

janakact · 2023-11-29T06:45:01Z

+1
Yeah. This would be a really nice feature. Does anyone know a library that offers a differentiable step function?

dominikstrb · 2023-11-29T16:01:51Z

@janakact

> Does anyone know a library that offers a differentiable step function?

Shameless self-plug: I have a package for non-linear inverse optimal control that makes use of differentiable step functions. However, the environments are custom partially observable stochastic environments and therefore do not completely correspond to standard environments from gym.

carlosgmartin · 2023-11-29T21:41:11Z

@janakact Some libraries you might want to look into:

Not sure which of them satisfy your criterion.
