- No longer using a custom policy and a custom PPO implementation.
- No longer using custom distributions, and therefore no longer clipping the standard deviation (std), which should be fine.
- There is no 'nminibatches' parameter in SB3 anymore, which might be relevant for the mirroring approaches (see the sketch below).
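  A minimal sketch (with assumed hyperparameter values) of how the old 'nminibatches' setting can be translated to SB3, which expects the minibatch size directly via 'batch_size':

  ```python
  from stable_baselines3 import PPO
  from stable_baselines3.common.env_util import make_vec_env

  # Assumed example values; SB2's nminibatches=4 corresponds to
  # batch_size = n_steps * n_envs / nminibatches in SB3.
  n_envs, n_steps, nminibatches = 8, 2048, 4
  env = make_vec_env("Pendulum-v1", n_envs=n_envs)
  model = PPO("MlpPolicy", env, n_steps=n_steps,
              batch_size=(n_steps * n_envs) // nminibatches)
  ```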
- Fool Python into thinking there is no CUDA device available:
  from os import environ
  environ["CUDA_VISIBLE_DEVICES"] = ""
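  A minimal sketch of how to verify that the GPU is actually hidden; note that the variable has to be set before torch initializes CUDA, so ideally before importing torch or SB3:

  ```python
  from os import environ
  environ["CUDA_VISIBLE_DEVICES"] = ""  # must happen before torch touches CUDA

  import torch
  print(torch.cuda.is_available())  # expected: False
  print(torch.cuda.device_count())  # expected: 0
  ```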
- To avoid a massive slow-down when using torch on the CPU, tune the number of threads torch uses:
  import torch
  torch.set_num_threads(1)
  a. INFO: We found that setting the number of threads to the number of parallel environments used to collect experience results in the highest training speed (see the sketch below).
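  A minimal sketch (with an assumed environment id and number of environments) combining the two points above: force CPU training and match the torch thread count to the number of parallel environments:

  ```python
  import torch
  from stable_baselines3 import PPO
  from stable_baselines3.common.env_util import make_vec_env

  n_envs = 4                     # assumed number of parallel envs
  torch.set_num_threads(n_envs)  # thread count = number of envs, per the note above

  vec_env = make_vec_env("Pendulum-v1", n_envs=n_envs)
  model = PPO("MlpPolicy", vec_env, device="cpu")  # force CPU explicitly
  model.learn(total_timesteps=10_000)
  ```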