Releases: snap-stanford/ogb
Fix evaluation metric of ogbg-molpcba
This release mainly changes the evaluation metric of ogbg-molpcba from PRC-AUC to Average Precision (AP). AP has been shown to be more appropriate for summarizing the non-convex Precision-Recall curve [1]. The leaderboard and our paper have been updated accordingly.
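Computing PRC-AUC with the trapezoidal rule linearly interpolates between points on the Precision-Recall curve, which Davis and Goadrich [1] argue is not valid in PR space. AP avoids this by summing precision at each recall step. The sketch below is an independent illustration, not the OGB Evaluator. It assumes binary labels, at least one positive, and distinct scores (no tie handling), and it shows that the two numbers can differ on the same predictions.

```python
def pr_points(y_true, y_score):
    """(recall, precision) after each rank cut-off, highest score first.
    Assumes binary labels, at least one positive, and distinct scores."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    n_pos = sum(y_true)
    tp, points = 0, []
    for rank, i in enumerate(order, start=1):
        tp += y_true[i]
        points.append((tp / n_pos, tp / rank))
    return points


def average_precision(y_true, y_score):
    # Step-wise sum: AP = sum_n (R_n - R_{n-1}) * P_n, no interpolation.
    ap, prev_r = 0.0, 0.0
    for r, p in pr_points(y_true, y_score):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def trapezoid_pr_auc(y_true, y_score):
    # Linear interpolation between consecutive PR points, starting at (0, 1).
    area, prev_r, prev_p = 0.0, 0.0, 1.0
    for r, p in pr_points(y_true, y_score):
        area += (r - prev_r) * (p + prev_p) / 2
        prev_r, prev_p = r, p
    return area


y_true = [1, 0, 0, 1]
y_score = [0.9, 0.8, 0.7, 0.6]
print(average_precision(y_true, y_score))  # 0.75
print(trapezoid_pr_auc(y_true, y_score))   # about 0.708
```

On this toy ranking the trapezoidal area comes out lower than AP because it interpolates straight from (recall 0.5, precision 1/3) to (recall 1.0, precision 0.5).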
This release also fixes an issue and adds a feature:
- Fixed an issue with saving large library-agnostic data objects. #48
- Added an automatic version check that notifies users when their installed package version is outdated.
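A version check of this kind boils down to comparing the installed version string against the latest published one. The sketch below shows only that comparison. The helper names are hypothetical, it is not OGB's implementation, and it assumes plain dotted numeric versions with no "rc" or "dev" suffixes.

```python
from itertools import zip_longest


def parse_version(v: str) -> tuple:
    # Assumes plain dotted numeric versions such as "1.2.0".
    return tuple(int(part) for part in v.split("."))


def is_outdated(installed: str, latest: str) -> bool:
    """Hypothetical helper: True if `installed` is older than `latest`."""
    a, b = parse_version(installed), parse_version(latest)
    # Pad the shorter version with zeros so "1.2" compares equal to "1.2.0".
    for x, y in zip_longest(a, b, fillvalue=0):
        if x != y:
            return x < y
    return False


print(is_outdated("1.1.1", "1.2.0"))   # True
print(is_outdated("1.10.0", "1.9.9"))  # False
```

Comparing integer tuples rather than raw strings matters: as strings, "1.10.0" would sort before "1.9.9".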
[1] Jesse Davis and Mark Goadrich. The relationship between precision-recall and ROC curves. In International Conference on Machine Learning (ICML), pp. 233–240, 2006.
Minor fix and update
This release fixes bugs in a dataset, evaluator, and data loader.
- Duplicate edges in ogbn-mag are removed. The updated dataset will be downloaded and processed automatically when you run your script for ogbn-mag. #40
- Evaluators for ogbl-collab and ogbl-ddi are updated. Specifically, ogbl-collab now uses Hits@50, and ogbl-ddi now uses Hits@20.
- A DGL data loader bug for ogbn-mag and ogbl-biokg is fixed. #36
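Hits@K scores each positive (true) edge against a shared pool of negative edges: a positive counts as a hit if it is ranked above the K-th highest-scoring negative. The sketch below is an independent pure-Python reimplementation under that assumption, not the package's Evaluator class; the function name is hypothetical and ties with the K-th negative count as misses.

```python
def hits_at_k(pos_scores, neg_scores, k):
    """Fraction of positive edges scored strictly higher than the k-th
    highest-scoring negative edge (ties count as misses)."""
    if len(neg_scores) < k:
        # With fewer than k negatives, every positive ranks within the top k.
        return 1.0
    kth_neg = sorted(neg_scores, reverse=True)[k - 1]
    hits = sum(1 for s in pos_scores if s > kth_neg)
    return hits / len(pos_scores)


pos = [0.9, 0.5, 0.2]
neg = [0.8, 0.6, 0.4, 0.1]
print(hits_at_k(pos, neg, 2))  # only 0.9 beats the 2nd-best negative (0.6)
print(hits_at_k(pos, neg, 3))  # 0.9 and 0.5 beat the 3rd-best negative (0.4)
```

In this setting, moving ogbl-collab to K = 50 and ogbl-ddi to K = 20 simply changes which negative score acts as the threshold.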
Second major release
This is the second major release of OGB, in which we have curated many more exciting graph datasets, including heterogeneous graphs and a web-scale graph with over 100 million nodes and over 1 billion edges.
First, note that the datasets released in version 1.1.1 are unchanged. Therefore, any experimental results obtained with 1.1.1 on the existing datasets are compatible with version 1.2.0.
This release additionally includes the 5 new datasets listed below.
- ogbn-papers100M: Web-scale gigantic paper citation network.
- ogbn-mag: Heterogeneous academic graph.
- ogbl-biokg: Heterogeneous biomedical knowledge graph.
- ogbl-ddi: Drug-drug interaction network.
- ogbg-code: Source code Abstract Syntax Trees.
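Each dataset name's prefix indicates its task family: ogbn- for node property prediction, ogbl- for link property prediction, and ogbg- for graph property prediction, each served by its own ogb submodule. The small routing helper below is hypothetical, for illustration only, and does not import OGB.

```python
# Hypothetical helper: route an OGB dataset name to its task family and the
# ogb submodule that provides its dataset classes.
TASK_MODULES = {
    "ogbn": ("node property prediction", "ogb.nodeproppred"),
    "ogbl": ("link property prediction", "ogb.linkproppred"),
    "ogbg": ("graph property prediction", "ogb.graphproppred"),
}


def task_for(name: str):
    prefix = name.split("-", 1)[0]
    if prefix not in TASK_MODULES:
        raise ValueError(f"unrecognized OGB dataset name: {name!r}")
    return TASK_MODULES[prefix]


for name in ["ogbn-papers100M", "ogbn-mag", "ogbl-biokg", "ogbl-ddi", "ogbg-code"]:
    task, module = task_for(name)
    print(f"{name}: {task} ({module})")
```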
Automatic dataset update
The OGB package can now automatically fetch datasets that have been updated.
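One way such a check can work is to record which dataset release was downloaded and compare it with the release the installed package expects. The sketch below assumes a hypothetical marker file named after the release version, written next to the data on download. It illustrates the idea only and is not OGB's implementation.

```python
import os


def marker_path(root: str, version: int) -> str:
    # Hypothetical naming scheme: one marker file per dataset release.
    return os.path.join(root, f"RELEASE_v{version}.txt")


def write_marker(root: str, version: int) -> None:
    """Record which release of the dataset was downloaded into `root`."""
    os.makedirs(root, exist_ok=True)
    with open(marker_path(root, version), "w"):
        pass


def needs_update(root: str, expected_version: int) -> bool:
    """True if a local copy exists but lacks the marker for `expected_version`.
    A missing directory means a fresh download, not an update."""
    return os.path.isdir(root) and not os.path.exists(
        marker_path(root, expected_version)
    )
```

When needs_update returns True, a loader could prompt the user, delete the stale folder, and download the new release.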
First major release
This is the first major release of OGB.
A number of changes have been made to the datasets, which are summarized below.
- Re-indexed all the nodes in the node/link datasets (the graphs remain essentially the same).
- Added a mapping/ directory to the dataset folder of every dataset. It contains information that maps node/edge/graph/label indices to real-world entities (e.g., from nodes in PPA to unique protein identifiers, and from molecular graphs to SMILES strings).
- Deleted the ogbn-proteins node features and put them in the species variable.
- Deleted the ogbl-reviews datasets.
- Added 4 datasets: ogbn-arxiv, ogbl-citation, ogbl-collab, ogbl-wikikg.
- Renamed ogbg-ppi to ogbg-ppa.
- Renamed ogbg-mol-hiv and ogbg-mol-pcba to ogbg-molhiv and ogbg-molpcba, respectively.
- Changed the evaluation metric of imbalanced molecule datasets (e.g., pcba) from ROC-AUC to PRC-AUC.
- Changed the get_split_edge() interface in LinkPropPredDataset. The downloaded dataset files have also changed accordingly.
- Added a num_classes attribute for multi-class classification datasets.
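Files in a mapping/ directory can be read with the standard library alone. The sketch below assumes a hypothetical gzipped two-column CSV with a header row (integer index, then identifier), similar in spirit to a node-to-protein mapping for PPA. The exact file names and column layout in each dataset folder may differ, so check the actual files before relying on this.

```python
import csv
import gzip


def load_index_mapping(path: str) -> dict:
    """Read a two-column CSV (optionally gzipped) that maps integer indices
    to real-world identifiers. The header-plus-two-columns layout is assumed."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return {int(idx): ident for idx, ident in reader}
```

For example, load_index_mapping("mapping/nodeidx2proteinid.csv.gz") would return a dict from node index to protein identifier, under the assumed file name and layout.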
1.0.1
Minor Changes
OGB datasets can now be imported more conveniently, e.g.:
from ogb.graphproppred import GraphPropPredDataset
from ogb.graphproppred import PygGraphPropPredDataset
from ogb.graphproppred import DglGraphPropPredDataset
Note that the PygGraphPropPredDataset and DglGraphPropPredDataset imports raise an ImportError if OGB cannot find an installation of PyG or DGL, respectively.
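Because the PyG and DGL variants depend on optional libraries, scripts that should run in several environments can guard each import separately. This is a minimal sketch of that pattern; pick_backend and its preference order are hypothetical choices for illustration, not part of OGB.

```python
# Guard each import separately: the PyG and DGL variants need optional backends.
try:
    from ogb.graphproppred import GraphPropPredDataset  # library-agnostic version
except ImportError:
    GraphPropPredDataset = None  # OGB itself is not installed

try:
    from ogb.graphproppred import PygGraphPropPredDataset  # needs PyTorch Geometric
except ImportError:
    PygGraphPropPredDataset = None

try:
    from ogb.graphproppred import DglGraphPropPredDataset  # needs DGL
except ImportError:
    DglGraphPropPredDataset = None


def pick_backend():
    """Hypothetical helper: name of the first available dataset class, or None."""
    candidates = [
        ("pyg", PygGraphPropPredDataset),
        ("dgl", DglGraphPropPredDataset),
        ("agnostic", GraphPropPredDataset),
    ]
    for name, cls in candidates:
        if cls is not None:
            return name
    return None


print(pick_backend())
```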