V3 new backend: PyTorch? and the future of Stable Baselines #733

araffin · 2020-03-08T17:20:31Z

Version 3 is now online: https://github.com/DLR-RM/stable-baselines3

This issue summarizes the discussion between the maintainers (@hill-a, @erniejunior, @AdamGleave, @Miffyli and me) about the next backend and the future of stable baselines.

First, we recommend that everyone read the summary of design choices in #576.

Backend Choice

This is the biggest design choice for the next major version. In any case, we will drop tensorflow 1 for something else. The candidates are: pytorch, tensorflow 2, jax.

Maintainers' opinion

The majority of the maintainers would favor PyTorch as they already work with it and the rest don't have strong feelings as they will have to switch to a new framework anyway.

As a transition, here are the final results from the poll I created some weeks ago on twitter:
Number of views: 4500
Votes: 319 (quite a lot!)
Results:

PyTorch - 69.9%
Tensorflow 2 - 13.8%
Jax - 9.4%
Does not matter - 6.9%

Disclaimer: doing a poll on Twitter restricts the audience but it's a good start
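
As a rough sanity check on the poll figures above, the percentages can be converted back into approximate vote counts. This is only a sketch: the percentages are rounded, so the counts are estimates, and the variable names are illustrative.

```python
# Approximate vote counts from the rounded poll percentages (estimates only).
total_votes = 319
shares = {
    "PyTorch": 69.9,
    "Tensorflow 2": 13.8,
    "Jax": 9.4,
    "Does not matter": 6.9,
}

# Round to the nearest whole vote; the true counts may differ slightly.
estimated_votes = {
    name: round(total_votes * pct / 100) for name, pct in shares.items()
}

# Ratio between the first and second choice.
ratio = shares["PyTorch"] / shares["Tensorflow 2"]

for name, votes in estimated_votes.items():
    print(f"{name}: ~{votes} votes")
print(f"PyTorch / Tensorflow 2 ratio: {ratio:.1f}x")
```

Under these assumptions, PyTorch received roughly five times as many votes as Tensorflow 2.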

Tensorflow 2

Pros:

natural continuation from tf1 (although we don't plan to use the compat module), at least for our users
the eager mode is easy to use (especially numpy <-> tf conversion)
docs are better than tf1
native tensorboard support

Cons:

docs are better but remain messy (still three ways of writing the same thing, e.g. MSE loss)
tf.function can be tricky
early version
not sure that the tf1 community will follow, as it requires breaking changes anyway

Jax

Pros:

good design choices (e.g. to avoid side effects)
getting a lot of popularity recently
great potential
computation of higher order derivatives (e.g. for meta RL)

Cons:

early stage of development
the eco-system is not ready yet (e.g. only experimental version of neural net lib)
none of the maintainers has experience with it, so this would require more time

PyTorch

Pros:

the community/demand is growing
good documentation
good api
nice c++ frontend/ easy export
several companies switched to PyTorch (the Chainer team too)
I already have an internal (and working) pytorch version of Stable Baselines
the eco-system / api is now fairly stable/mature

Cons:

there are already a lot of RL libraries using pytorch
tensorboard would be an optional dependency (because it requires tf) even though pytorch now supports it

Side note: although the twitter poll is biased, the gap between first and second choice is striking.

Summary

As a summary, the first choice for the backend would be PyTorch for two main reasons:

community (most people use or want pytorch now)
2+ maintainers would favor it vs the rest being neutral

A second choice would be Jax because:

potential impact and growing community
almost equal popularity currently vs tf2

It seems that tensorflow 2 does not convince many people: it is effectively a completely new framework (compared to tf1, even if it shares the name) and is still fairly new. Compared to PyTorch, it seems to have the same features but with less maturity.

Future of Stable-Baselines

PyTorch version

I currently have an internal PyTorch version of Stable Baselines, codename "Torchy Baselines" (and its zoo), that I use for my research (RL for robotics). It already has a working version of A2C, PPO, SAC and TD3.

I dropped python 3.5 support in order to use f-strings, more typing and have no issues with dicts. Python 3.5 end of life is coming soon anyway.
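
For readers unfamiliar with what Python 3.6 adds, here is a minimal sketch of the three features mentioned above. The variable names are illustrative, and reading "no issues with dicts" as a reference to insertion ordering is an assumption.

```python
from typing import Dict, List

# Variable annotations (PEP 526) are available from Python 3.6.
algo_names: List[str] = ["A2C", "PPO", "SAC", "TD3"]

# On CPython 3.6+ dicts keep insertion order (a language guarantee from 3.7).
timesteps: Dict[str, int] = {}
for name in algo_names:
    timesteps[name] = 0

# f-strings (PEP 498) are also new in Python 3.6.
summary = ", ".join(f"{name}={steps}" for name, steps in timesteps.items())
print(f"Algorithms: {summary}")
```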

The other maintainers and I agree that this will be a good starting point, but with some conditions:

I will remove all "research-specific" code (it will be in a separate branch)
the license should be permissive (MIT if possible)

Release date

The plan is to release an early version (and its zoo) as soon as possible (in the next two months, so before the end of April).

New name

Because of the big changes and also because it will be released under the DLR-RM team, we will update the name of the library:
Stable-Baselines3 will be its new name (so we keep the Stable Baselines name while having a different package to show the huge internal change)

V2 support

The plan (as soon as the V3 is released) would be to do only bug fixes for v2 for 6 months. We will give more details on that later.

m-rph · 2020-03-24T18:28:11Z

Do you have any particular python version in mind? 3.7 introduced some quite useful typing features such as forward referencing.

araffin · 2020-03-24T18:39:08Z

It will be python 3.6+, as many users are relying on that version, even though 3.7+ has more typing features.
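
To illustrate the trade-off, here is a small sketch of how forward references can still be written on Python 3.6 by quoting the annotation. On 3.7+, `from __future__ import annotations` (PEP 563) makes the quotes unnecessary. The `Node` class is purely hypothetical and not part of the library.

```python
from typing import Optional


class Node:
    """Linked-list style node; the class is purely illustrative."""

    # "Node" is not bound yet while the class body runs, so on Python 3.6
    # the annotation has to be written as a string (a forward reference).
    def __init__(self, value: int, parent: "Optional[Node]" = None) -> None:
        self.value = value
        self.parent = parent

    def root(self) -> "Node":
        # Walk up the parent chain until a node without a parent is found.
        node = self
        while node.parent is not None:
            node = node.parent
        return node


top = Node(1)
child = Node(2, parent=top)
print(child.root() is top)
```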

crobarcro · 2020-04-27T11:23:43Z

Would it be possible to publish the pytorch based version you mention? Perhaps even privately?

In return we could test against the original stable-baselines which we are using for our project. The main reason we'd like to try stable-baselines is because we think trained policy export might be easier using pytorch, and we would urgently like to try this. Happy also in return to write a model export guide for the pytorch based version if that was of interest.

Miffyli · 2020-04-27T11:28:10Z

@crobarcro

@araffin will share this once some necessary stuff is done, should not be tooooo long from now (can not give an exact date).

Meanwhile, you could take a look at this discussion on exporting models to PyTorch. This should not be too difficult, but you have to be careful with layer differences in TF and PyTorch, as well as some "default-behaviour" included in stable-baselines (like normalization of image inputs).
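
To make the two pitfalls above concrete, here is a hedged, framework-free sketch. It assumes that a TensorFlow dense kernel is stored as (in_features, out_features) while a PyTorch linear weight is stored as (out_features, in_features), and that image observations are scaled to [0, 1] before reaching the network. The helper names are hypothetical and plain Python lists stand in for tensors.

```python
from typing import List

Matrix = List[List[float]]


def tf_dense_kernel_to_torch_weight(kernel: Matrix) -> Matrix:
    """Transpose an (in_features, out_features) kernel to (out_features, in_features)."""
    return [list(column) for column in zip(*kernel)]


def scale_image_observation(pixels: List[int], high: int = 255) -> List[float]:
    """Map uint8 pixel values to [0, 1], mimicking the assumed default scaling."""
    return [p / high for p in pixels]


# Hypothetical dense layer with 2 inputs and 3 outputs, in TensorFlow layout.
tf_kernel = [[1.0, 2.0, 3.0],
             [4.0, 5.0, 6.0]]

torch_weight = tf_dense_kernel_to_torch_weight(tf_kernel)
print(torch_weight)

# The converted network must see the same preprocessing as the original one.
print(scale_image_observation([0, 51, 255]))
```

Forgetting either step tends to give a model that loads without errors but produces different actions, so comparing outputs of both models on the same observation is a useful check.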

araffin · 2020-04-28T08:07:53Z

Would it be possible to publish the pytorch based version you mention? Perhaps even privately?

Publishing it is planned (the open source process is currently ongoing), but for legal reasons I'm not allowed to share it, even privately, for now.

crobarcro · 2020-04-28T12:28:54Z

@araffin thanks, understood. In the meantime, the discussion @Miffyli pointed me to, with your pytorch conversion, has actually got me a long way towards ONNX export via pytorch, which was my ultimate goal. I will report back with an example script once I have it streamlined.

jarlva · 2020-05-08T11:24:08Z

Hi! checking to see when it would be possible to kick the V3 pytorch tires?

araffin · 2020-05-08T11:29:08Z

@jarlva now?

https://github.com/DLR-RM/stable-baselines3

This is a beta version. I will write a "roadmap to v1.0" issue soon (I'm waiting for that before making a public announcement).

jarlva · 2020-05-08T11:32:03Z

Awesome Antonin!

araffin · 2021-03-02T11:29:08Z

Beta is over: https://github.com/DLR-RM/stable-baselines3/releases

araffin pinned this issue Mar 8, 2020

This was referenced Mar 8, 2020

Tensorflow 2.0 support? #366

Closed

V3.0 implementation design #576

Closed

araffin added the v3 Discussion about V3 label Mar 8, 2020

araffin mentioned this issue Mar 16, 2020

Multi Discrete for DQN #742

Closed

araffin mentioned this issue Mar 30, 2020

[Suggestion for V3] All RL algorithms should behave like current DDPG and automatically normalize input features #773

Closed

araffin mentioned this issue May 8, 2020

Roadmap to Stable-Baselines3 V1.0 DLR-RM/stable-baselines3#1

Closed

42 tasks

This was referenced Jun 9, 2020

[question]Saving Problem Stable-Baselines-Team/stable-baselines-tf2#2

Closed

MultiCategorial distribution Stable-Baselines-Team/stable-baselines-tf2#1

Closed

Comparison with tensorflow1/2 implementation DLR-RM/stable-baselines3#56

Closed

araffin mentioned this issue Aug 4, 2020

No one uses pyTorch / Torch in the Actual AI industry #966

Closed

Miffyli mentioned this issue Sep 27, 2020

Upgrade to Tensorflow 2 #1012

Open

This was referenced Oct 5, 2020

BadZipFile when running PPO2. araffin/rl-baselines-zoo#109

Closed

Library Conversion: Open AI Baselines tensorflow/tensorflow#25349

Closed

araffin mentioned this issue Jan 18, 2021

[question] EvalCallback using MPI #1069

Open

araffin closed this as completed Mar 2, 2021
