
LSTM policies are broken for PPO1 and TRPO #140

Open
ernestum opened this issue Dec 20, 2018 · 10 comments
Labels: bug, help wanted

Comments

@ernestum (Collaborator)

See the feature/fix_lstm branch for a test which fails for the above-mentioned algorithms.

For PPO1 and TRPO, the cause seems to be that the batch size is not provided to the policy (None is passed), which then causes problems in the ortho-initializer.
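
For reference, here is a minimal numpy sketch of an orthogonal initializer in the spirit of the ortho_init helper (an illustration, not the library's exact code), showing why an undefined (None) dimension in the requested shape makes it fail:

import numpy as np

def ortho_init(shape, scale=1.0):
    # Sample a random matrix and orthogonalize it via SVD.
    # np.random.normal needs concrete integers, so a None batch
    # dimension leaking into `shape` raises a TypeError here.
    noise = np.random.normal(0.0, 1.0, shape)
    u, _, v = np.linalg.svd(noise, full_matrices=False)
    weights = u if u.shape == tuple(shape) else v
    return (scale * weights).astype(np.float32)

ortho_init((64, 256))     # works: returns a (64, 256) float32 matrix
# ortho_init((None, 256)) # fails: None cannot be interpreted as an integer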

For PPO2, the assert in line 109 fails:

assert self.n_envs % self.nminibatches == 0, "For recurrent policies, "\
                        "the number of environments run in parallel should be a multiple of nminibatches."

Since the PPO2 instance is created using PPO2(policy, 'CartPole-v1'), the default parameters of PPO2 seem to be broken somehow?
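
For context, creating the model from the 'CartPole-v1' string wraps a single environment, and with the default nminibatches (4 at the time of writing) the check n_envs % nminibatches == 0 cannot hold. A rough sketch of two setups that should satisfy the assert (assuming the usual MlpLstmPolicy import; a workaround sketch, not a tested fix for this issue):

import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpLstmPolicy
from stable_baselines.common.vec_env import DummyVecEnv

# Option 1: keep the single environment and use a single minibatch.
model = PPO2(MlpLstmPolicy, 'CartPole-v1', nminibatches=1)

# Option 2: run n_envs in parallel as a multiple of nminibatches.
env = DummyVecEnv([lambda: gym.make('CartPole-v1') for _ in range(4)])
model = PPO2(MlpLstmPolicy, env, nminibatches=4)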

For ACKTR, the issue is somewhere in get_factors in kfac.py. I have no clue what that function does or what goes wrong there, but it complains about shared nodes among different computation ops.

@ernestum added the bug and help wanted labels on Dec 20, 2018
@araffin (Collaborator) commented Dec 20, 2018

Related to #60 and #70.

@ernestum (Collaborator, Author)

Any hints on what exactly the masks are used for? That would help a lot!

@araffin (Collaborator) commented Dec 20, 2018

I did not look closely at the code, but I remember seeing masks and dones used interchangeably (see here for PPO2).
The masks are then used in the policies, in particular by the LSTM, to reset the hidden states at the start of new episodes.
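
A rough numpy sketch of that reset logic (an illustration of the idea, not the library's exact implementation): at each step the done flags, passed in as masks, zero out the recurrent state before the next LSTM update.

import numpy as np

def reset_lstm_states(cell_state, hidden_state, masks):
    # masks has shape (n_envs,): 1.0 where the previous step ended an
    # episode (done), 0.0 otherwise. Multiplying by (1 - mask) clears
    # the state for environments that just started a new episode.
    keep = (1.0 - masks)[:, None]
    return cell_state * keep, hidden_state * keep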

@ernestum changed the title from "LSTM policies are broken for ACKTR, PPO1, PPO2 and TRPO" to "LSTM policies are broken for ACKTR, PPO1 and TRPO" on Dec 20, 2018
@ernestum (Collaborator, Author)

PPO2 works if we fix the issue with the number of minibatches.

@hill-a (Owner) commented Dec 21, 2018

ACKTR is fixed on the enhancements branch; however, I lost quite some time this month and didn't have time to finish the branch (I wanted to fix PPO1 and TRPO first). I might try again over the next few weeks.

@RGring commented Dec 25, 2018

I am also interested in the combination of PPO1 + CnnLstmPolicy :)

@araffin changed the title from "LSTM policies are broken for ACKTR, PPO1 and TRPO" to "LSTM policies are broken for PPO1 and TRPO" on Feb 9, 2019
@HareshKarnan

Any updates on this? TRPO still doesn't support MlpLstmPolicy :'(

@araffin (Collaborator) commented Mar 6, 2019

@HareshMiriyala For now, we don't have time to fix that (even though it is on the roadmap). Currently, we are working on fixing GAIL + A2C; this will be merged into master soon.

However, we appreciate contributions, especially to fix that kind of thing ;)

@hejujie commented Dec 13, 2019

I also mentioned this problem last year in issue #60. It seems the problem is still not fixed; is there any plan?
I would like to fix the problem this month; is there any comment or advice? @araffin @hill-a @erniejunior

@araffin (Collaborator) commented Dec 15, 2019

It seems the problem is still not fixed; is there any plan?

Hello,
There is no plan for now; we are currently focusing on the migration to TF2 (cf. #366).

I would like to fix the problem this month; is there any comment or advice?

We would appreciate a PR; however, I think only @erniejunior is somewhat familiar with the LSTM policy (I don't know much about that part of the code, which comes from OpenAI's code and is one of the most complex and undocumented parts of the lib).
