0.1.2 Release
0.1.2 Release Notes
The recent 0.1.2 release of TorchDrug brings new Colab tutorials, data structures, functions and datasets, along with a number of bug fixes. We are grateful to see growing interest and involvement from the community, especially on the retrosynthesis task. We welcome more contributions in the future!
- Colab Tutorials
- New Data Structures
- New Functions
- New Datasets
- Bug Fixes
Colab Tutorials
To familiarize users with the logic and capacity of TorchDrug, we have compiled a full set of Colab tutorials, ranging from basic usage to various drug discovery tasks. All the tutorials are fully interactive and may serve as boilerplate code for your own applications.
- Basic Usage and Pipeline shows the manipulation of data structures like `data.Graph` and `data.Molecule`, as well as the training and evaluation pipelines for property prediction models.
- Pretrained Molecular Representations demonstrates the steps for self-supervised pretraining of a molecular representation model and finetuning it on downstream tasks.
- De novo Molecule Design illustrates the routine of training generative models for molecule generation and finetuning them with reinforcement learning for property optimization. Two popular models, GCPN and GraphAF, are covered in the tutorial.
- Retrosynthesis shows how to use the state-of-the-art model, G2Gs, to predict a set of reactants for synthesizing a target molecule.
- Knowledge Graph Reasoning goes through the steps of training and evaluating models for knowledge graph completion, including both knowledge graph embeddings and neural inductive logic programming.
New Data Structures
- A new data structure `data.Dictionary` that stores key-value mappings of PyTorch tensors on either CPUs or GPUs. It uses O(n) memory, answers queries in O(1) time, and supports parallel evaluation over a batch of queries. This API is well suited to implementing sparse lookup tables or set operations in a PyTorch-idiomatic style.
- A new method `data.Graph.match` that efficiently retrieves all edges matching specific patterns on either CPUs or GPUs. Its cost scales linearly with the number of patterns plus the number of retrieved edges, regardless of the size of the graph. Typical uses include querying the existence of edges, generating random walks and extracting ego graphs.
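The pattern-matching behaviour described above can be illustrated with a small pure-Python sketch. It is only a reference for the semantics, not TorchDrug's implementation: the `EdgeIndex` class, the `(head, tail, relation)` edge layout, the use of `-1` as a wildcard and the `(index, num_match)` return shape are all assumptions made for this example. The sketch builds one hash table per wildcard mask, so after a one-time pass over the edges each query costs time proportional to the number of edges it returns.

```python
from collections import defaultdict

WILDCARD = -1  # assumed marker for "match anything" in a pattern


class EdgeIndex:
    """Hypothetical helper that answers edge-pattern queries via hash tables."""

    def __init__(self, edges):
        self.edges = list(edges)   # each edge is a (head, tail, relation) tuple
        self._tables = {}          # wildcard mask -> {key: [edge ids]}

    def _table(self, mask):
        # Build the hash table for this mask once: O(number of edges).
        if mask not in self._tables:
            table = defaultdict(list)
            for edge_id, edge in enumerate(self.edges):
                key = tuple(v for v, keep in zip(edge, mask) if keep)
                table[key].append(edge_id)
            self._tables[mask] = table
        return self._tables[mask]

    def match(self, patterns):
        """Return (flat list of matched edge ids, matches per pattern)."""
        index, num_match = [], []
        for pattern in patterns:
            mask = tuple(v != WILDCARD for v in pattern)
            key = tuple(v for v in pattern if v != WILDCARD)
            hits = self._table(mask).get(key, [])  # O(1) lookup + O(hits) copy
            index.extend(hits)
            num_match.append(len(hits))
        return index, num_match


graph = EdgeIndex([(0, 1, 0), (0, 2, 1), (1, 2, 0), (2, 0, 1)])
# All edges leaving node 0, all edges entering node 2, and any edge 1 -> 0.
index, num_match = graph.match([(0, -1, -1), (-1, 2, -1), (1, 0, -1)])
print(index)      # [0, 1, 1, 2]
print(num_match)  # [2, 2, 0]
```

A pattern with no wildcard acts as an edge-existence test (its count is 0 or 1 or more), while a pattern that fixes only the head node returns the outgoing edges needed for a random-walk step.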
New Functions
Batching irregular structures, such as graphs, sets or sequences with different sizes, is a common demand in drug discovery. Instead of clumsy padding-based implementations, TorchDrug provides a family of functions that efficiently manipulate batches of variadic-sized tensors without padding. The update contains the following new variadic functions.
- `variadic_arange` returns a 1-D tensor that contains integer intervals of variadic sizes.
- `variadic_softmax` computes softmax over categories with variadic sizes.
- `variadic_sort` sorts elements in sets with variadic sizes.
- `variadic_randperm` returns random permutations for sets with variadic sizes, where the `i`-th permutation contains integers from 0 to `size[i] - 1`.
- `variadic_sample` draws samples with replacement from sets with variadic sizes.
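To make the semantics of the functions listed above concrete, here is a pure-Python reference sketch. It operates on flat lists rather than tensors and loops over segments, so it shows what each operation computes, not how TorchDrug computes it. The argument names (`values`, `sizes`, `num_sample`, `rng`), the nested-list return of `variadic_sample` and the private helper `_split` are assumptions of this sketch and may differ from the library's signatures.

```python
import math
import random


def _split(values, sizes):
    """Hypothetical helper: cut a flat list into consecutive segments."""
    assert sum(sizes) == len(values)
    segments, start = [], 0
    for size in sizes:
        segments.append(values[start:start + size])
        start += size
    return segments


def variadic_arange(sizes):
    """Concatenate 0..size-1 for every size."""
    return [i for size in sizes for i in range(size)]


def variadic_softmax(values, sizes):
    """Softmax applied independently inside each segment."""
    out = []
    for seg in _split(values, sizes):
        if not seg:
            continue
        peak = max(seg)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in seg]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out


def variadic_sort(values, sizes, descending=False):
    """Sort each segment on its own, keeping segment order."""
    out = []
    for seg in _split(values, sizes):
        out.extend(sorted(seg, reverse=descending))
    return out


def variadic_randperm(sizes, rng=random):
    """The i-th permutation contains the integers 0..sizes[i]-1."""
    out = []
    for size in sizes:
        perm = list(range(size))
        rng.shuffle(perm)
        out.extend(perm)
    return out


def variadic_sample(values, sizes, num_sample, rng=random):
    """Draw num_sample items with replacement from each non-empty segment."""
    return [[rng.choice(seg) for _ in range(num_sample)]
            for seg in _split(values, sizes)]


sizes = [2, 3]  # two sets packed back to back, no padding
print(variadic_arange(sizes))                   # [0, 1, 0, 1, 2]
print(variadic_sort([3, 1, 2, 9, 5], sizes))    # [1, 3, 2, 5, 9]
```

The key design point is that a batch is represented by one flat sequence plus a list of sizes, so no element is ever wasted on padding; a tensor implementation would replace the Python loops with segment-wise vectorized operations.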
New Datasets
- PCQM4M: A large-scale molecule property prediction dataset, originally used in OGB-LSC (thanks to @OPAYA)
Bug Fixes
- Fix import of sascorer in plogp evaluation (#18, #31)
- Fix atoms with stereo bonds in retrosynthesis (#42, #43)
- Fix lazy construction for molecule datasets (#30, thanks to @DaShenZi721)
- Fix ChEMBLFiltered dataset (#36)
- Fix ZINC2m dataset (#33)
- Fix USPTO50k dataset (#32)
- Fix bugs in core.Configurable (#26)
- Fix/improve documentation (#16, #28, #41)
- Fix installation on macOS (#29)