Releases: snap-stanford/ogb
Pandas 2.0 compatibility
Fix stuck import bug
ogbl-vessel and improved rank prediction
This release introduces the following two changes:
- ogbl-vessel dataset (described here) @jqmcginnis
- Improved rank calculation for link prediction #357 @mberr
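The release note does not say what changed in the rank calculation, so the following is only a generic sketch of how the rank of a positive edge among negative samples can be computed with explicit tie handling. It is not the code from #357, and the function names `rank_of_positive` and `mean_reciprocal_rank` are hypothetical.

```python
def rank_of_positive(pos_score, neg_scores):
    """Return (optimistic, pessimistic, realistic) 1-based ranks.

    Illustrative only: this is NOT the OGB evaluator implementation.
    """
    # Negatives scored strictly higher always push the positive down.
    higher = sum(1 for s in neg_scores if s > pos_score)
    # Negatives with an identical score are ties.
    ties = sum(1 for s in neg_scores if s == pos_score)

    optimistic = higher + 1            # ties resolved in favour of the positive
    pessimistic = higher + ties + 1    # ties resolved against the positive
    realistic = (optimistic + pessimistic) / 2  # average of the two
    return optimistic, pessimistic, realistic


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all evaluated positives."""
    return sum(1.0 / r for r in ranks) / len(ranks)
```

Reporting the average of the optimistic and pessimistic ranks keeps a model from gaining or losing rank purely because it assigns identical scores to many candidates.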
OGB-LSC dataset updates
We have included two updates:
- WikiKG90M --> WikiKG90Mv2
- PCQM4M --> PCQM4Mv2
Hosting LSC data on AWS
Thanks to the DGL Team, all the LSC data is now hosted on AWS. This significantly improves the download speed around the globe! The underlying data stays exactly the same.
Including datasets for KDD Cup 2021
This release includes the three large-scale datasets for OGB-LSC at KDD Cup 2021. Details of the datasets and the KDD Cup can be found here.
Fix download bug
Dataset downloads now use http instead of https.
Deprecate ogbg-code and update to ogbg-code2
This version makes a major change to ogbg-code. ogbg-code has been deprecated due to prediction target (i.e., method name) leakage in the input AST. ogbg-code2 has been introduced to fix the issue: the method name and its recursive definition in the AST are replaced with a special token _mask_.
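As a rough illustration of the masking idea, the sketch below replaces every node that carries the method name in a toy AST with the `_mask_` token. The nested-dict AST layout and the helper `mask_method_name` are assumptions made for this example; they do not reflect the actual ogbg-code2 data format or preprocessing code.

```python
MASK_TOKEN = "_mask_"


def mask_method_name(node, method_name):
    """Return a copy of a toy AST with each occurrence of method_name masked.

    `node` is a hypothetical dict of the form {"label": str, "children": [...]}.
    """
    label = MASK_TOKEN if node["label"] == method_name else node["label"]
    children = [mask_method_name(c, method_name) for c in node.get("children", [])]
    return {"label": label, "children": children}


# Toy AST of a recursive function: the name appears in the definition
# and again in the recursive call inside the body.
toy_ast = {
    "label": "FunctionDef",
    "children": [
        {"label": "factorial", "children": []},
        {"label": "Call", "children": [{"label": "factorial", "children": []}]},
    ],
}

masked = mask_method_name(toy_ast, "factorial")
```

Masking the recursive call as well as the definition matters: if only the definition were masked, the prediction target would still be readable from the function body.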
We sincerely thank Charles Sutton (@casutton) for finding the data leakage in our dataset.
Fix dataset bug, release new datasets
This release fixes the dataset bug in negative samples in ogbl-wikikg and ogbl-citation and releases new versions of them: ogbl-wikikg2 and ogbl-citation2. The old versions are deprecated.
1.2.3
This release enhances the OGB package in the following ways.
- Made ogbn-papers100M data loading more tractable by using compressed binary files #46
- Introduced DatasetSaver module for external contributors #1
- Made dataset object compatible with DGL v0.5 (not backward compatible for heterogeneous graph datasets).