For continuous action space problems, the PPO/A2C algorithms can be used to predict continuous actions, but I want to use softmax as the output activation function with net_arch=[256,256]. I have read and tested the tutorial post. When I test the code below, the action does not sum to one, so the softmax function does not seem to be applied. I found the action_net in `model.policy`, but I could not use `softmax` as its custom activation function.
How can I use `softmax` as a custom activation function for the action output layer?
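To make the requirement concrete, here is a minimal sketch of the property being asked for: an unconstrained action vector mapped to non-negative weights that sum to one. It uses only the standard library; the `softmax` helper and the `raw_action` values are hypothetical illustrations, not part of the library or of the code in this issue.

```python
import math


def softmax(raw_action):
    """Map an unconstrained action vector to positive weights that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing and does not
    # change the result, because softmax is invariant to constant shifts.
    shift = max(raw_action)
    exps = [math.exp(a - shift) for a in raw_action]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw policy output for a 3-dimensional continuous action space.
raw_action = [0.5, -1.2, 2.0]
weights = softmax(raw_action)

print(weights)       # every entry is positive
print(sum(weights))  # 1.0 up to floating-point rounding
```

Note that for continuous (`Box`) action spaces, PPO and A2C sample the action from a Gaussian whose mean comes from `action_net`, so a softmax placed on that layer alone would not guarantee that the sampled action sums to one. One possible workaround, stated here as an assumption rather than the recommended approach, is to apply a normalization like the one above to the sampled action before it is used, for example inside the environment's `step`.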