-
Notifications
You must be signed in to change notification settings - Fork 43
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
QoL improvements and more docs (#126)
This PR is (unfortunately) fairly massive, but is mostly renaming things, or removing things that aren't needed anymore. Considering all the renaming and breaking, bumping the version to 0.2.0 🎉 - Updated `README.md` - Added `docs/contributing.md` and `docs/getting_started.md` - Added (a link to) an introductory notebook, runnable on colab - Renamed `FlatRewards` into `ObjectProperties`, as well as related function names - Renamed `RewardScalar` into either `LogScalar` or `LinScalar`, as well as changed the function signature of functions accordingly - Removed all passing around of `rng` instances, replaced by `get_worker_rng` to remove any confusion around worker seeding - Similarly, `dev` should now be acquired via `get_worker_device` - `None`-proofed `cond_info` in models. There should now be a sensible default behavior. Eventually we should properly make sure the code support unconditional tasks - Made `cfg.git_hash` computational conditional on being in a git repo - Renamed generic functions and methods working with objects to `obj[s]` rather than `mol[s]` - Made `terminate` close log files (to avoid issues when a single process runs multiple trials) - Added a `read_all_results` function in `sqlite_log.py`. This is probably the fastest way to load log data - Fixed a number of `mypy` issues.
- Loading branch information
Showing
36 changed files
with
500 additions
and
433 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,34 +11,14 @@ GFlowNet-related training and environment code on graphs. | |
|
||
**Primer** | ||
|
||
[GFlowNet](https://yoshuabengio.org/2022/03/05/generative-flow-networks/), short for Generative Flow Network, is a novel generative modeling framework, particularly suited for discrete, combinatorial objects. Here in particular it is implemented for graph generation. | ||
GFlowNet [[1]](https://yoshuabengio.org/2022/03/05/generative-flow-networks/), [[2]](https://www.gflownet.org/), [[3]](https://github.com/zdhNarsil/Awesome-GFlowNets), short for Generative Flow Network, is a novel generative modeling framework, particularly suited for discrete, combinatorial objects. Here in particular it is implemented for graph generation. | ||
|
||
The idea behind GFN is to estimate flows in a (graph-theoretic) directed acyclic network*. The network represents all possible ways of constructing an object, and so knowing the flow gives us a policy which we can follow to sequentially construct objects. Such a sequence of partially constructed objects is a _trajectory_. *Perhaps confusingly, the _network_ in GFN refers to the state space, not a neural network architecture. | ||
The idea behind GFN is to estimate flows in a (graph-theoretic) directed acyclic network*. The network represents all possible ways of constructing objects, and so knowing the flow gives us a policy which we can follow to sequentially construct objects. Such a sequence of partially constructed objects is a _trajectory_. *Perhaps confusingly, the _network_ in GFN refers to the state space, not a neural network architecture. | ||
|
||
Here the objects we construct are themselves graphs (e.g. graphs of atoms), which are constructed node by node. To make policy predictions, we use a graph neural network. This GNN outputs per-node logits (e.g. add an atom to this atom, or add a bond between these two atoms), as well as per-graph logits (e.g. stop/"done constructing this object"). | ||
The main focus of this library (although it can do other things) is to construct graphs (e.g. graphs of atoms), which are constructed node by node. To make policy predictions, we use a graph neural network. This GNN outputs per-node logits (e.g. add an atom to this atom, or add a bond between these two atoms), as well as per-graph logits (e.g. stop/"done constructing this object"). | ||
|
||
The GNN model can be trained on a mix of existing data (offline) and self-generated data (online), the latter being obtained by querying the model sequentially to obtain trajectories. For offline data, we can easily generate trajectories since we know the end state. | ||
This library supports a variety of GFN algorithms (as well as some baselines), and supports training on a mix of existing data (offline) and self-generated data (online), the latter being obtained by querying the model sequentially to obtain trajectories. | ||
|
||
## Repo overview | ||
|
||
- [algo](src/gflownet/algo), contains GFlowNet algorithms implementations ([Trajectory Balance](https://arxiv.org/abs/2201.13259), [SubTB](https://arxiv.org/abs/2209.12782), [Flow Matching](https://arxiv.org/abs/2106.04399)), as well as some baselines. These implement how to sample trajectories from a model and compute the loss from trajectories. | ||
- [data](src/gflownet/data), contains dataset definitions, data loading and data sampling utilities. | ||
- [envs](src/gflownet/envs), contains environment classes; a graph-building environment base, and a molecular graph context class. The base environment is agnostic to what kind of graph is being made, and the context class specifies mappings from graphs to objects (e.g. molecules) and torch geometric Data. | ||
- [examples](docs/examples), contains simple example implementations of GFlowNet. | ||
- [models](src/gflownet/models), contains model definitions. | ||
- [tasks](src/gflownet/tasks), contains training code. | ||
- [qm9](src/gflownet/tasks/qm9/qm9.py), temperature-conditional molecule sampler based on QM9's HOMO-LUMO gap data as a reward. | ||
- [seh_frag](src/gflownet/tasks/seh_frag.py), reproducing Bengio et al. 2021, fragment-based molecule design targeting the sEH protein | ||
- [seh_frag_moo](src/gflownet/tasks/seh_frag_moo.py), same as the above, but with multi-objective optimization (incl. QED, SA, and molecule weight objectives). | ||
- [utils](src/gflownet/utils), contains utilities (multiprocessing, metrics, conditioning). | ||
- [`trainer.py`](src/gflownet/trainer.py), defines a general harness for training GFlowNet models. | ||
- [`online_trainer.py`](src/gflownet/online_trainer.py), defines a typical online-GFN training loop. | ||
|
||
See [implementation notes](docs/implementation_notes.md) for more. | ||
|
||
## Getting started | ||
|
||
A good place to get started is with the [sEH fragment-based MOO task](src/gflownet/tasks/seh_frag_moo.py). The file `seh_frag_moo.py` is runnable as-is (although you may want to change the default configuration in `main()`). | ||
|
||
## Installation | ||
|
||
|
@@ -62,6 +42,30 @@ pip install git+https://github.com/recursionpharma/[email protected] --find-l | |
|
||
If package dependencies seem not to work, you may need to install the exact frozen versions listed `requirements/`, i.e. `pip install -r requirements/main-3.10.txt`. | ||
|
||
## Getting started | ||
|
||
A good place to get started immediately is with the [sEH fragment-based MOO task](src/gflownet/tasks/seh_frag_moo.py). The file `seh_frag_moo.py` is runnable as-is (although you may want to change the default configuration in `main()`). | ||
|
||
For a gentler introduction to the library, see [Getting Started](docs/getting_started.md). For a more in-depth look at the library, see [Implementation Notes](docs/implementation_notes.md). | ||
|
||
## Repo overview | ||
|
||
- [algo](src/gflownet/algo), contains GFlowNet algorithms implementations ([Trajectory Balance](https://arxiv.org/abs/2201.13259), [SubTB](https://arxiv.org/abs/2209.12782), [Flow Matching](https://arxiv.org/abs/2106.04399)), as well as some baselines. These implement how to sample trajectories from a model and compute the loss from trajectories. | ||
- [data](src/gflownet/data), contains dataset definitions, data loading and data sampling utilities. | ||
- [envs](src/gflownet/envs), contains environment classes; the base environment is agnostic to what kind of graph is being made, and context classes specify mappings from graphs to objects (e.g. molecules) and torch geometric Data. | ||
- [examples](docs/examples), contains simple example implementations of GFlowNet. | ||
- [models](src/gflownet/models), contains model definitions. | ||
- [tasks](src/gflownet/tasks), contains training code. | ||
- [qm9](src/gflownet/tasks/qm9/qm9.py), temperature-conditional molecule sampler based on QM9's HOMO-LUMO gap data as a reward. | ||
- [seh_frag](src/gflownet/tasks/seh_frag.py), reproducing Bengio et al. 2021, fragment-based molecule design targeting the sEH protein | ||
- [seh_frag_moo](src/gflownet/tasks/seh_frag_moo.py), same as the above, but with multi-objective optimization (incl. QED, SA, and molecule weight objectives). | ||
- [utils](src/gflownet/utils), contains utilities (multiprocessing, metrics, conditioning). | ||
- [`trainer.py`](src/gflownet/trainer.py), defines a general harness for training GFlowNet models. | ||
- [`online_trainer.py`](src/gflownet/online_trainer.py), defines a typical online-GFN training loop. | ||
|
||
See [implementation notes](docs/implementation_notes.md) for more. | ||
|
||
|
||
## Developing & Contributing | ||
|
||
External contributions are welcome. | ||
|
@@ -73,3 +77,5 @@ pip install -e '.[dev]' --find-links https://data.pyg.org/whl/torch-2.1.2+cu121. | |
|
||
We use `tox` to run tests and linting, and `pre-commit` to run checks before committing. | ||
To ensure that these checks pass, simply run `tox -e style` and `tox run` to run linters and tests, respectively. | ||
|
||
For more information, see [Contributing](docs/contributing.md). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# Contributing | ||
|
||
Contributions to the repository are welcome, and we encourage you to open issues and pull requests. In general, it is recommended to fork this repository and open a pull request from your fork to the `trunk` branch. PRs are encouraged to be short and focused, and to include tests and documentation where appropriate. | ||
|
||
## Installation | ||
|
||
To install the developers dependencies run: | ||
``` | ||
pip install -e '.[dev]' --find-links https://data.pyg.org/whl/torch-2.1.2+cu121.html | ||
``` | ||
|
||
## Dependencies | ||
|
||
Dependencies are defined in `pyproject.toml`, and frozen versions that are known to work are provided in `requirements/`. | ||
|
||
To regenerate the frozen versions, run `./generate_requirements.sh <ENV-NAME>`. See comments within. | ||
|
||
## Linting and testing | ||
|
||
We use `tox` to run tests and linting, and `pre-commit` to run checks before committing. | ||
To ensure that these checks pass, simply run `tox -e style` and `tox run` to run linters and tests, respectively. | ||
|
||
`tox` itself runs many linters, but the most important ones are `black`, `ruff`, `isort`, and `mypy`. The full list | ||
of linting tools is found in `.pre-commit-config.yaml`, while `tox.ini` defines the environments under which these | ||
linters (as well as tests) are run. | ||
|
||
## Github Actions | ||
|
||
We use Github Actions to run tests and linting on every push and pull request. The configuration for these actions is found in `.github/workflows/`. | ||
|
||
The cascade of events is as follows: | ||
- For `build-and-test`, `tox -> testenv:py310 -> pytest` is run. | ||
- For `code-quality`, `tox -e style -> testenv:style -> pre-commit -> {isort, black, mypy, bandit, ruff, & others}`. This and the "others" are defined in `.pre-commit-config.yaml` and include things like checking for secrets and trailing whitespace. | ||
|
||
## Style Guide | ||
|
||
On top of `black`-as-a-style-guide, we generally adhere to the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html). | ||
Our docstrings follow the [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html) format, and we use type hints throughout the codebase. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Getting Started | ||
|
||
For an introduction to the library, see [this colab notebook](https://colab.research.google.com/drive/1wANyo6Y-ceYEto9-p50riCsGRb_6U6eH). | ||
|
||
For an introduction to using `wandb` to log experiments, see [this demo](../src/gflownet/hyperopt/wandb_demo). | ||
|
||
For more general introductions to GFlowNets, check out the following: | ||
- The 2023 [GFlowNet workshop](https://gflownet.org/) has several introductory talks and colab tutorials. | ||
- This high-level [GFlowNet colab tutorial](https://colab.research.google.com/drive/1fUMwgu2OhYpQagpzU5mhe9_Esib3Q2VR) (updated versions of which were written for the 2023 workshop, in particular for continuous GFNs). | ||
|
||
A good place to get started immediately is with the [sEH fragment-based MOO task](src/gflownet/tasks/seh_frag_moo.py). The file `seh_frag_moo.py` is runnable as-is (although you may want to change the default configuration in `main()`). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.