
add GLEM model, TAGDataset and example of GLEM #9662

Open · wants to merge 8 commits into base: master
Conversation

@ECMGit (Contributor) commented Sep 15, 2024

reopened #9591

Feature summary:

  • Add GLEM as a GNN & LLM co-training model to PyG
  • Adapt GLEM's LM to AutoModelForSequenceClassification from transformers
  • LoRA support
  • LM/LLM support
  • ogbn-products/ogbn-arxiv testing finished
  • TAGDataset can be used as a wrapper class for any node classification dataset in PyG, adding an LM tokenizer and associated raw text
  • External predictions as pseudo labels supported
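The wrapper idea behind TAGDataset can be sketched as follows. This is a minimal illustration only, not the actual PyG API: the class name, constructor, and tokenizer here are hypothetical.

```python
# Minimal sketch of the text-attributed-graph wrapper idea (hypothetical API,
# NOT the actual TAGDataset): pair each node of a node-classification dataset
# with its raw text and a tokenized form, so an LM and a GNN can be trained
# over the same node set.
class TextAttributedWrapper:
    def __init__(self, num_nodes, raw_texts, tokenize):
        # One text string per node is required.
        assert len(raw_texts) == num_nodes
        self.raw_texts = raw_texts
        # Pre-tokenize once so the LM side can batch token ids directly.
        self.token_ids = [tokenize(text) for text in raw_texts]

    def __getitem__(self, idx):
        # The LM consumes the tokens; the GNN consumes the node index/graph.
        return idx, self.token_ids[idx]
```

For example, wrapping a two-node toy dataset with a whitespace "tokenizer" (`str.split`) yields `wrapper[0] == (0, ['paper', 'about', 'gnns'])` for the text `'paper about gnns'`.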


codecov bot commented Sep 15, 2024

Codecov Report

Attention: Patch coverage is 11.93182% with 155 lines in your changes missing coverage. Please review.

Project coverage is 86.92%. Comparing base (ba3b906) to head (a22742c).
Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
torch_geometric/nn/models/glem.py 11.42% 155 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9662      +/-   ##
==========================================
- Coverage   87.54%   86.92%   -0.62%     
==========================================
  Files         482      483       +1     
  Lines       31414    31585     +171     
==========================================
- Hits        27501    27455      -46     
- Misses       3913     4130     +217     


@puririshi98 puririshi98 self-requested a review September 16, 2024 15:27
@puririshi98 (Contributor) left a comment


LGTM, just get CI green.

@puririshi98 puririshi98 marked this pull request as ready for review September 24, 2024 19:28
@puririshi98 (Contributor) commented
@rusty1s @akihironitta ready for your reviews

@akihironitta (Member) left a comment


Could we have type annotations all over the PR? Also, I'd suggest splitting this PR into smaller ones.

Comment on lines +28 to +30
# Add the parent directory to sys.path
parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
sys.path.append(parent_dir)

Why is this necessary?

Comment on lines +7 to +10

## Run GLEM for getting SOTA result on ogbn-products dataset

`python glem.py`

Suggested change
## Run GLEM for getting SOTA result on ogbn-products dataset
`python glem.py`

Comment on lines +73 to +76
ext_pred_path = download_google_url(
id='15sO2m7BeW7C1Upmdw3Cx1JS__6nxTAzY',
folder='/work/users/junhaos/glem_data/ogbn_products/ext_preds',
filename='giant_sagn_scr.pt', log=True)

Let's use a relative path for other people to use.
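One way to follow this suggestion can be sketched as below. The `data/ogbn_products/ext_preds` layout is an assumption for illustration, not part of the PR.

```python
import os.path as osp

# Sketch: derive the download folder from a dataset root supplied by the
# caller, instead of hard-coding a machine-specific absolute path.
# (The 'ogbn_products/ext_preds' layout is assumed for illustration.)
def ext_pred_folder(root='data'):
    return osp.join(root, 'ogbn_products', 'ext_preds')
```

The returned path stays relative to whatever root the user passes, so other people can run the example without editing the script.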

pretrain_augmented = True

seed_everything(42)
from ogb.nodeproppred import PygNodePropPredDataset

nit: Let's move the import statement at the start of the file.

examples/llm/glem.py (comment resolved)
Comment on lines +368 to +377
if em_phase == 'gnn':
gnn_test_acc = max(gnn_test_acc, final_test_acc)
model.gnn = model.gnn.to('cpu', non_blocking=True)
em_phase = 'lm'
else:
lm_test_acc = max(lm_test_acc, final_test_acc)
model.lm = model.lm.to('cpu', non_blocking=True)
em_phase = 'gnn'
torch.cuda.empty_cache()
print(f'Best GNN acc: {gnn_test_acc}, LM acc: {lm_test_acc}')

This is the same comment as #9467 (comment), but we shouldn't pick the best metric evaluated on the test set at the end of every EM step.
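A sketch of the alternative the reviewer is pointing at, under the assumption that each EM step logs a validation and a test accuracy: select the checkpoint per phase by validation accuracy, and report that checkpoint's test accuracy once, rather than taking the max over test accuracies.

```python
# Sketch (not the PR's code): pick the best EM step per phase by VALIDATION
# accuracy, then report that step's test accuracy -- instead of max-ing over
# test accuracies, which leaks test-set information into model selection.
def select_by_validation(history):
    """history: iterable of (phase, val_acc, test_acc) tuples, one per EM step."""
    best = {}
    for phase, val_acc, test_acc in history:
        if phase not in best or val_acc > best[phase][0]:
            best[phase] = (val_acc, test_acc)
    # Return the test accuracy associated with the best-validation step.
    return {phase: test_acc for phase, (_, test_acc) in best.items()}
```

Note how the reported test accuracy can be lower than the maximum test accuracy seen during training; that is the point of the fix.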

@akihironitta (Member) left a comment

I haven't had a look outside the example script yet, but this addition is exciting! 🚀

3 participants