Memory consumption of VoxelNet limits the number of muons and voxels that can be used #97

Open
GilesStrong opened this issue Mar 25, 2022 · 0 comments
Labels
  • enhancement: New feature or request
  • Functionality: Issue adds to the functionality of the package
  • medium priority: Should be fixed soon, but doesn't disastrously impact project

Comments

@GilesStrong
Owner

Problem

VoxelNet acts on tensors of shape (volumes, voxels, muons, features). As part of its graph construction, it expands these into (volumes, voxels, muons, muons, new features) before collapsing back to the original shape. Although the forward method loops over the volumes (so the actual shape is just (voxels, muons, features)), memory consumption is still very high, because the expanded tensor grows with the square of the number of muons and linearly with the number of voxels.
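To make the scaling concrete, here is a back-of-the-envelope sketch comparing the per-volume input tensor with the expanded pairwise tensor. It assumes dense float32 tensors and ignores the activations that autograd keeps for the backward pass; `tensor_bytes` and `voxelnet_memory` are hypothetical helpers, not part of the package.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor of `shape` (float32 by default)."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element


def voxelnet_memory(n_voxels, n_muons, n_features, n_new_features):
    """Return (input bytes, pairwise bytes) for a single volume.

    input:    (voxels, muons, features)
    pairwise: (voxels, muons, muons, new features)
    """
    inp = tensor_bytes((n_voxels, n_muons, n_features))
    pairwise = tensor_bytes((n_voxels, n_muons, n_muons, n_new_features))
    return inp, pairwise
```

With hypothetical sizes of 1000 voxels, 1000 muons, 8 input features and 16 new features, `voxelnet_memory(1000, 1000, 8, 16)` returns `(32_000_000, 64_000_000_000)`: about 32 MB for the input but 64 GB for the pairwise tensor. Doubling the number of muons quadruples the second figure.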

Potential solutions

Loop over voxels in the first part of the network

The first part of the network computes a muon representation per voxel, (voxels, muon representation), and each voxel's computation is independent of the other voxels. This means that the muon representations could be computed serially rather than in parallel, reducing memory consumption at the cost of processing time.
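A minimal sketch of the serial option, using NumPy arrays as stand-ins for PyTorch tensors. The layer inside `muon_reps_batched` (pairwise differences, a linear map, tanh and a mean) is a hypothetical placeholder for VoxelNet's real first stage; the point is only that, because voxels are independent, processing them in chunks gives the same result while the largest intermediate shrinks from (voxels, muons, muons, features) to (chunk_size, muons, muons, features).

```python
import numpy as np


def muon_reps_batched(x, w):
    """Hypothetical first stage applied to all voxels at once.

    x: (voxels, muons, features); w: (features, features).
    Peak intermediate: (voxels, muons, muons, features).
    Returns (voxels, features): one representation per voxel.
    """
    pairwise = np.tanh((x[:, :, None, :] - x[:, None, :, :]) @ w)
    return pairwise.mean(axis=(1, 2))


def muon_reps_serial(x, w, chunk_size=1):
    """Same computation, processing `chunk_size` voxels at a time.

    Peak intermediate: (chunk_size, muons, muons, features).
    """
    outputs = []
    for start in range(0, x.shape[0], chunk_size):
        # Each chunk only sees its own voxels, so results are unchanged
        outputs.append(muon_reps_batched(x[start:start + chunk_size], w))
    return np.concatenate(outputs, axis=0)
```

`chunk_size` trades memory for speed: `chunk_size=1` is fully serial, and `chunk_size=n_voxels` recovers the batched version. In PyTorch, autograd would still keep every chunk's intermediates alive for the backward pass, so during training the saving may be smaller unless this is combined with activation checkpointing (e.g. `torch.utils.checkpoint`).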

Compile parts of the network

PyTorch makes it "easy" to compile parts of the network as C++ and CUDA extensions. According to Jan Kieseler, this heavily reduces memory consumption and processing time, at the cost of development time and model flexibility. He has sent me some examples, and I have also gone through the official PyTorch tutorial on writing and compiling custom kernels. The main difficulty is that the backward pass, which computes the gradients, must also be written manually, and how well it is written can heavily affect performance: in my testing of PyTorch's examples, the compiled backward pass was actually slower, although the compiled forward pass was slightly quicker.

There are several parts of the GNN that are candidates for compilation:

  • When expanding to the (voxels, muons, muons, new features) tensor, what we actually want is (voxels, muons, k-nearest muons, new features), but we still have to compute the distances between all the muons. This kNN indexing could be compiled to reduce memory consumption.
  • (voxels, muons, k-nearest muons, new features) is then later collapsed by aggregating across the k-nearest muons into (voxels, muons, aggregate features). The whole kNN+aggregation could be compiled to save even more memory, at the cost of some model flexibility.
  • In the graph collapse stage where we convert (voxels, muons, aggregate features) to (voxels, muon representation), the muon features go through a self-attention step which internally computes a temporary (voxels, muons, muons) tensor. This could also be compiled to save memory.
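Before writing a custom kernel, the kNN-then-aggregate idea from the first two points can be prototyped in array code. In the sketch below (NumPy standing in for PyTorch), only the (voxels, muons, muons) matrix of scalar distances is built for all pairs; edge features are computed just for the k nearest neighbours, giving a (voxels, muons, k, new features) intermediate that is immediately max-aggregated. The EdgeConv-style edge function (concatenating x_i and x_j - x_i, then a linear map and ReLU), the `pos` coordinates and the max aggregation are all assumptions for illustration, not VoxelNet's actual layers.

```python
import numpy as np


def knn_aggregate(x, pos, w, k):
    """Max-aggregate edge features over each muon's k nearest neighbours.

    x:   (voxels, muons, features) muon features
    pos: (voxels, muons, d) coordinates defining the neighbourhoods (hypothetical)
    w:   (2 * features, new_features) edge-network weights (hypothetical)
    Returns an array of shape (voxels, muons, new_features).
    """
    # All-pairs squared distances: (voxels, muons, muons), one scalar per pair
    diff = pos[:, :, None, :] - pos[:, None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)

    # Indices of the k nearest muons for every muon (itself included, at distance 0)
    idx = np.argsort(dist2, axis=-1)[:, :, :k]  # (voxels, muons, k)

    # Gather neighbour features via broadcast fancy indexing: (voxels, muons, k, features)
    voxel_idx = np.arange(x.shape[0])[:, None, None]
    neigh = x[voxel_idx, idx]

    # Edge features [x_i, x_j - x_i], computed only for the k neighbours
    centre = x[:, :, None, :]  # (voxels, muons, 1, features)
    edge = np.concatenate([np.broadcast_to(centre, neigh.shape), neigh - centre], axis=-1)

    # Linear map + ReLU, then aggregate over the neighbour axis
    h = np.maximum(edge @ w, 0.0)  # (voxels, muons, k, new_features)
    return h.max(axis=2)  # (voxels, muons, new_features)
```

A compiled kernel could go further and avoid storing even the full distance matrix, for example by keeping a running top-k per muon.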
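For the self-attention step in the third point, a stopgap that needs no compiled code is to process the query muons in chunks, so that only a (voxels, chunk_size, muons) block of attention scores exists at once. Because the softmax is taken independently for each query row, the result matches full attention exactly. This sketch assumes single-head scaled dot-product attention on (voxels, muons, d) arrays, with NumPy standing in for PyTorch; the function names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_full(q, k, v):
    """Scaled dot-product attention; materialises (voxels, muons, muons) scores.

    q, k: (voxels, muons, d); v: (voxels, muons, d_v).
    """
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def attention_chunked(q, k, v, chunk_size):
    """Same result, but only a (voxels, chunk_size, muons) score block exists at a time."""
    outputs = []
    for start in range(0, q.shape[1], chunk_size):
        # Each query row's softmax only needs that row's scores, so chunks are independent
        outputs.append(attention_full(q[:, start:start + chunk_size], k, v))
    return np.concatenate(outputs, axis=1)
```

Memory-efficient attention kernels (such as FlashAttention) take this further by also tiling over the keys with an online softmax; newer PyTorch releases expose such kernels through `torch.nn.functional.scaled_dot_product_attention`.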
@GilesStrong GilesStrong added the enhancement, medium priority and Functionality labels Mar 25, 2022
@GilesStrong GilesStrong self-assigned this Mar 25, 2022