-
Hi, I have created a custom dataset for quite a large graph. The graph contains 307,925 nodes with 22 attributes and 1,045,286 edges:

```python
from torch_geometric.data import Data

data = Data(x=x, edge_index=edges)
```

I initialized a DOMINANT detector and tried to fit it, but ran into a RuntimeError:

```python
detector = DOMINANT(hid_dim=64, num_layers=2)
detector.fit(data)
```

Error trace:

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[60], line 4
      1 detector = DOMINANT(hid_dim=64, num_layers=2) #379271222500
      2 # detector = DOMINANT(hid_dim=16, num_layers=3, epoch=100)
----> 4 detector.fit(data)

File c:\Users\rol\Desktop\master-thesis\.venv\lib\site-packages\pygod\detector\base.py:432, in DeepDetector.fit(self, data, label)
    429 def fit(self, data, label=None):
    431     self.num_nodes, self.in_dim = data.x.shape
--> 432     self.process_graph(data)
    433     if self.batch_size == 0:
    434         self.batch_size = data.x.shape[0]

File c:\Users\rol\Desktop\master-thesis\.venv\lib\site-packages\pygod\detector\dominant.py:139, in DOMINANT.process_graph(self, data)
    138 def process_graph(self, data):
--> 139     DOMINANTBase.process_graph(data)

File c:\Users\rol\Desktop\master-thesis\.venv\lib\site-packages\pygod\nn\dominant.py:132, in DOMINANTBase.process_graph(data)
    122 @staticmethod
    123 def process_graph(data):
    124     """
    125     Obtain the dense adjacency matrix of the graph.
    126
    (...)
...
---> 70 return src.new_zeros(size).scatter_add_(dim, index, src)
     72 if reduce == 'mean':
     73     count = src.new_zeros(dim_size)

RuntimeError: [enforce fail at alloc_cpu.cpp:80] data. DefaultCPUAllocator: not enough memory: you tried to allocate 379271222500 bytes.
```

The problem seems to be that `process_graph` converts the graph into a dense adjacency matrix. Is there an option to avoid converting the edge matrix into a dense adjacency matrix? Thanks for your support.
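As a sanity check on the reported allocation (an editor's note, not part of the original post): 379,271,222,500 bytes is exactly the size of a dense float32 adjacency matrix for 307,925 nodes, since such a matrix needs N × N × 4 bytes.

```python
# A dense float32 adjacency matrix for N nodes occupies N * N * 4 bytes.
n = 307_925
dense_bytes = n * n * 4
print(dense_bytes)        # 379271222500, the exact size in the error message
print(dense_bytes / 1e9)  # roughly 379 GB, far beyond typical RAM
```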
-
Thanks for your question. `DOMINANT` requires reconstructing the dense adjacency matrix, which makes it hard to apply to large graphs directly. One potential solution is to partition the large graph into small subgraphs with `torch_geometric.loader.ClusterLoader` and feed the small subgraphs into `DOMINANT` one by one.