0.1.2 Release

@KatarinaYuan KatarinaYuan released this 23 Oct 04:08

0.1.2 Release Notes

The 0.1.2 release of TorchDrug brings Colab tutorials, new data structures, functions, and datasets, as well as bug fixes. We are grateful to see growing interest and involvement from the community, especially on the retrosynthesis task. We welcome more contributions in the future!

  • Colab Tutorials
  • New Data Structures
  • New Functions
  • New Datasets
  • Bug Fixes

Colab Tutorials

To familiarize users with the logic and capacity of TorchDrug, we have compiled a full set of Colab tutorials, ranging from basic usage to different drug discovery tasks. All the tutorials are fully interactive and may serve as boilerplate code for your own applications.

  • Basic Usage and Pipeline shows the manipulation of data structures like data.Graph and data.Molecule, as well as the training and evaluation pipelines for property prediction models.
  • Pretrained Molecular Representations demonstrates the steps for self-supervised pretraining of a molecular representation model and finetuning it on downstream tasks.
  • De novo Molecule Design illustrates the routine of training generative models for molecule generation and finetuning them with reinforcement learning for property optimization. Two popular models, GCPN and GraphAF, are covered in the tutorial.
  • Retrosynthesis shows how to use the state-of-the-art model, G2Gs, to predict a set of reactants for synthesizing a target molecule.
  • Knowledge Graph Reasoning goes through the steps of training and evaluating models for knowledge graph completion, including both knowledge graph embeddings and neural inductive logic programming.

New Data Structures

  • A new data structure data.Dictionary that stores key-value mappings of PyTorch tensors on either CPUs or GPUs. It has O(n) memory consumption and O(1) query time, and supports parallelism over a batch of queries. This API provides a great opportunity for implementing sparse lookup tables or set operations in a PyTorchic style.
  • A new method data.Graph.match to efficiently retrieve all edges of specific patterns on either CPUs or GPUs. It scales linearly w.r.t. the number of patterns plus the number of retrieved edges, regardless of the size of the graph. Typical usage of this method includes querying the existence of edges, generating random walks or even extracting ego graphs.
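
As a rough illustration of the two ideas above, here is a plain-Python sketch of hash-based edge matching. It mimics the semantics only and does not call TorchDrug: the `EdgeIndex` class, its `match` method, the return format, and the use of `-1` as a wildcard are all assumptions made for this example, and a Python dict stands in for a tensor-based key-value store. Edges are hashed once under every wildcard mask, so each later pattern lookup costs time proportional to the number of matches rather than to the size of the graph.

```python
from collections import defaultdict
from itertools import product

WILDCARD = -1  # assumption for this sketch: -1 matches any node id


class EdgeIndex:
    """Hypothetical helper: hash-based index over (head, tail) edges."""

    def __init__(self, edges):
        self.edges = list(edges)
        # One hash table per wildcard mask, each built once in O(num_edges).
        self.tables = {}
        for mask in product((False, True), repeat=2):
            table = defaultdict(list)
            for i, edge in enumerate(self.edges):
                table[self._key(edge, mask)].append(i)
            self.tables[mask] = table

    @staticmethod
    def _key(edge, mask):
        # Replace masked positions by the wildcard so they hash identically.
        return tuple(WILDCARD if m else v for v, m in zip(edge, mask))

    def match(self, patterns):
        """Return (flat list of matched edge indices, matches per pattern)."""
        index, num_match = [], []
        for pattern in patterns:
            mask = tuple(v == WILDCARD for v in pattern)
            # .get avoids inserting empty entries for missing keys.
            hits = self.tables[mask].get(tuple(pattern), [])
            index.extend(hits)
            num_match.append(len(hits))
        return index, num_match


edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
graph_index = EdgeIndex(edges)
# Patterns: every edge leaving node 0, the exact edge 1 -> 2, a missing edge.
index, num_match = graph_index.match([(0, WILDCARD), (1, 2), (2, 1)])
```

Here `index` is `[0, 3, 1]` and `num_match` is `[2, 1, 0]`: a count of zero answers an edge-existence query, and the flat index plus per-pattern counts is the same variadic layout used by the functions in the next section.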

New Functions

Batching irregular structures, such as graphs, sets or sequences with different sizes, is a common demand in drug discovery. Instead of clumsy padding-based implementations, TorchDrug provides a family of functions that efficiently manipulate batches of variadic-sized tensors without padding. The update contains the following new variadic functions.

  • variadic_arange returns a 1-D tensor that contains integer intervals of variadic sizes.
  • variadic_softmax computes softmax over categories with variadic sizes.
  • variadic_sort sorts elements in sets with variadic sizes.
  • variadic_randperm returns random permutations for sets with variadic sizes, where the i-th permutation contains integers from 0 to size[i] - 1.
  • variadic_sample draws samples with replacement from sets with variadic sizes.
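
To make the variadic layout concrete, the sketch below spells out the intended semantics of three of these functions in plain Python. It is a reference for the behavior described above, not TorchDrug's implementation: the `_ref` function names and their signatures are hypothetical, and lists stand in for tensors. A batch is stored as one flat list of values plus a list of set sizes, so no padding is needed.

```python
import math


def split_by_size(values, sizes):
    """Yield one slice of the flat list per set, with no padding."""
    start = 0
    for size in sizes:
        yield values[start:start + size]
        start += size


def arange_ref(sizes):
    # Concatenation of [0, ..., size - 1] for every set.
    return [i for size in sizes for i in range(size)]


def softmax_ref(values, sizes):
    # Softmax normalized independently within each set.
    out = []
    for chunk in split_by_size(values, sizes):
        if not chunk:
            continue  # an empty set contributes nothing
        peak = max(chunk)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in chunk]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out


def sort_ref(values, sizes, descending=False):
    # Sort each set on its own; elements never cross set boundaries.
    out = []
    for chunk in split_by_size(values, sizes):
        out.extend(sorted(chunk, reverse=descending))
    return out


values = [3.0, 1.0, 2.0, 5.0, 4.0]
sizes = [3, 2]  # two sets: {3, 1, 2} and {5, 4}

positions = arange_ref(sizes)
probabilities = softmax_ref(values, sizes)
ascending = sort_ref(values, sizes)
```

With these inputs, `positions` is `[0, 1, 2, 0, 1]` and `ascending` is `[1.0, 2.0, 3.0, 4.0, 5.0]`, while the first three and the last two entries of `probabilities` each sum to 1. The Python loops are for readability only; the point of the library functions is to obtain the same results with batched tensor operations.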

New Datasets

  • PCQM4M: A large-scale molecule property prediction dataset, originally used in OGB-LSC (thanks to @OPAYA)

Bug Fixes

  • Fix import of sascorer in plogp evaluation (#18, #31)
  • Fix atoms with stereo bonds in retrosynthesis (#42, #43)
  • Fix lazy construction for molecule datasets (#30, thanks to @DaShenZi721)
  • Fix ChEMBLFiltered dataset (#36)
  • Fix ZINC2m dataset (#33)
  • Fix USPTO50k dataset (#32)
  • Fix bugs in core.Configurable (#26)
  • Fix/improve documentation (#16, #28, #41)
  • Fix installation on macOS (#29)