LSTM policies are broken for PPO1 and TRPO #140
Any hints as to what exactly the masks are used for? That would help a lot!
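On the masks question: in recurrent policies from the OpenAI baselines lineage, masks are typically the episode-done flags, and they are used to reset the recurrent state at episode boundaries. A mask of 1 at a step means a new episode starts there, and the carried-over cell and hidden states are multiplied by `1 - mask` so that memory from the previous episode does not leak into the new one. The following is a minimal NumPy sketch of that idea, not the library's code. The function `lstm_step`, the gate order and the weight shapes are hypothetical choices for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, mask, c, h, w_x, w_h, b):
    """One LSTM step for a batch of parallel environments (illustrative sketch).

    x:    (n_envs, n_in) input features at this step
    mask: (n_envs,) 1.0 where a new episode starts at this step, else 0.0
    c, h: (n_envs, n_hidden) cell and hidden state from the previous step
    """
    # Zero the carried-over state for environments whose episode just restarted.
    keep = (1.0 - mask)[:, None]
    c = c * keep
    h = h * keep
    # Gate pre-activations; the order (input, forget, output, candidate) is an assumption.
    z = x @ w_x + h @ w_h + b
    i, f, o, u = np.split(z, 4, axis=1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(u)
    h = sigmoid(o) * np.tanh(c)
    return c, h


rng = np.random.default_rng(0)
n_envs, n_in, n_hidden = 2, 3, 4
w_x = rng.normal(size=(n_in, 4 * n_hidden))
w_h = rng.normal(size=(n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
x = rng.normal(size=(n_envs, n_in))
c_prev = np.ones((n_envs, n_hidden))
h_prev = np.ones((n_envs, n_hidden))

# Env 0 continues its episode (mask 0); env 1 starts a new one (mask 1).
c1, h1 = lstm_step(x, np.array([0.0, 1.0]), c_prev, h_prev, w_x, w_h, b)

# Reference: running env 1 from an all-zero state gives the same result.
zeros = np.zeros((1, n_hidden))
c_ref, h_ref = lstm_step(x[1:], np.array([0.0]), zeros, zeros, w_x, w_h, b)
```

In this sketch, the environment whose mask is 1 behaves exactly as if its LSTM had been freshly initialized. The other environment keeps its memory.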
PPO2 works if we fix the issue with the number of minibatches.
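A common way to build minibatches for recurrent policies is to split over environments instead of shuffling individual timesteps. That way, each environment's rollout stays a contiguous sequence that the LSTM can be unrolled over. The following is a minimal NumPy sketch of that scheme. It assumes, without confirmation from this thread, that the PPO2 minibatch problem is this kind of divisibility constraint. `recurrent_minibatch_indices` is a hypothetical helper, not PPO2's code, and it assumes an environment-major flat batch layout.

```python
import numpy as np


def recurrent_minibatch_indices(n_envs, n_steps, n_minibatches, rng):
    """Yield flat-batch indices per minibatch, keeping whole env sequences together.

    Assumes the flat rollout batch is environment-major: index = env * n_steps + step.
    """
    if n_envs % n_minibatches != 0:
        raise ValueError(
            f"n_envs={n_envs} must be a multiple of n_minibatches={n_minibatches} "
            "when minibatches are split by environment"
        )
    envs_per_batch = n_envs // n_minibatches
    env_order = rng.permutation(n_envs)  # shuffle environments, not timesteps
    flat = np.arange(n_envs * n_steps).reshape(n_envs, n_steps)
    for start in range(0, n_envs, envs_per_batch):
        envs = env_order[start:start + envs_per_batch]
        # Each selected row is one environment's contiguous run of timesteps.
        yield flat[envs].ravel()


rng = np.random.default_rng(0)
batches = list(recurrent_minibatch_indices(n_envs=8, n_steps=5, n_minibatches=4, rng=rng))

# A single environment cannot be split into 4 environment-based minibatches.
try:
    list(recurrent_minibatch_indices(n_envs=1, n_steps=128, n_minibatches=4, rng=rng))
    single_env_ok = True
except ValueError:
    single_env_ok = False
```

The design choice here trades shuffling granularity for sequence integrity. Timesteps inside a minibatch are never reordered, so the recurrent state can be unrolled correctly. The cost is that the number of environments must cover the number of minibatches.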
ACKTR is fixed on the enhancements branch. However, I lost quite some time this month and didn't have time to finish the branch (I wanted to fix PPO1 and TRPO first). I might try again over the next few weeks.
I am also interested in the combination of PPO1 + CnnLstmPolicy :)
Any updates on this? TRPO still doesn't support MlpLstmPolicy :'(
@HareshMiriyala For now, we don't have time to fix that (even though it is on the roadmap). Currently, we are working on fixing GAIL + A2C, which will be merged into master soon. However, we appreciate contributions, especially to fix this kind of thing ;)
Hello,
We would appreciate a PR. However, I think only @erniejunior is somewhat familiar with the LSTM policy. I don't know much about that part of the code, which comes from the OpenAI code and is one of the most complex and least documented parts of the library.
See the `feature/fix_lstm` branch for a test that fails for the above-mentioned algorithms.

For PPO1 and TRPO, the cause seems to be that the batch size is not provided to the policy (`None` is passed). This then causes problems in the orthogonal initializer.

For PPO2, the assert in line 109 fails:

Since the PPO2 instance is created using `PPO2(policy, 'CartPole-v1')`, the default parameters of PPO2 seem to be broken somehow?

For ACKTR, the issue is somewhere in `get_factors` of `kfac.py`. I have no clue what that function does or what goes wrong there, but it complains about some nodes being shared among different computation ops.