Perf: accelerate feature_to_block with torch_scatter #302
📝 Walkthrough

The changes refactor Hamiltonian feature-to-block conversion logic from explicit Python loops to vectorized scatter-based operations, with supporting precomputation methods added to `OrbitalMapper` for index mapping.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks: ❌ 1 failed (warning), ✅ 2 passed
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dptb/data/interfaces/ham_to_feature.py (1)
420-437: Critical bug in self-loop block storage: wrong index logic in the `atom_i == atom_j` case.

Lines 426-431 have inverted logic. When processing a self-loop edge (i, i, R_shift), the code checks if the reverse key (i, i, -R_shift) exists in blocks, then conditionally stores or accumulates, but always at the wrong index:
- If `r_index` doesn't exist: stores block at `blocks[block_index]` ✓
- If `r_index` exists: stores at `blocks[r_index]` instead of `blocks[block_index]` ✗

This means the block for (i, i, R_shift) is never properly stored; it either overwrites or gets incorrectly accumulated into the opposite R_shift entry. For Hermitian systems with periodic boundaries, self-loop blocks with different shifts corrupt each other.

Fix: Treat self-loops like other edges and always store/accumulate at the current edge's `block_index`:

    elif atom_i == atom_j:
        if blocks.get(block_index, None) is None:
            blocks[block_index] = block
        else:
            blocks[block_index] += block
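The store-or-accumulate rule proposed in this comment can be exercised in isolation. The following is only a sketch under stated assumptions: `store_block` is a hypothetical helper, nested lists stand in for tensors, and the tuple keys `(atom_i, atom_j, Rx, Ry, Rz)` are an illustrative stand-in for whatever key format the real `blocks` dictionary uses.

```python
def store_block(blocks, block_index, block):
    """Store or accumulate `block` under the edge's own key (hypothetical helper)."""
    if blocks.get(block_index) is None:
        # First time this key is seen: keep a copy of the block.
        blocks[block_index] = [row[:] for row in block]
    else:
        # Key already present: accumulate element-wise into the same entry.
        existing = blocks[block_index]
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                existing[r][c] += value
    return blocks


# Two self-loop edges on atom 0 with opposite cell shifts (assumed key layout).
blocks = {}
store_block(blocks, (0, 0, 1, 0, 0), [[1.0, 2.0], [3.0, 4.0]])
store_block(blocks, (0, 0, -1, 0, 0), [[10.0, 20.0], [30.0, 40.0]])
```

Because each edge writes only to its own key, the `+R_shift` and `-R_shift` entries stay separate, which is the behavior the comment asks for.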
🧹 Nitpick comments (4)
dptb/data/transforms.py (2)
848-905: Consider specifying device for precomputed index tensors.

The precomputed index tensors are created on CPU by default. When used in `feature_to_block`, they're moved to the target device on every call (lines 373-377 in ham_to_feature.py). For better performance, consider either:
- Creating these tensors on `self.device` during precomputation, or
- Caching the device-specific versions after first use

Option 1: Create on `self.device` during precomputation

     self._node_feature_to_block_indices[symbol] = {
    -    'src': torch.tensor(src_indices, dtype=torch.long),
    -    'dst': torch.tensor(dst_indices, dtype=torch.long),
    -    'dst_T': torch.tensor(dst_indices_T, dtype=torch.long),
    -    'is_diag': torch.tensor(is_diag, dtype=torch.bool),
    +    'src': torch.tensor(src_indices, dtype=torch.long, device=self.device),
    +    'dst': torch.tensor(dst_indices, dtype=torch.long, device=self.device),
    +    'dst_T': torch.tensor(dst_indices_T, dtype=torch.long, device=self.device),
    +    'is_diag': torch.tensor(is_diag, dtype=torch.bool, device=self.device),
         'norb': norb
     }
907-969: Consider specifying device for precomputed index tensors.

Similar to the node indices, the edge index tensors are created on CPU by default and moved to device on every call. Consider creating them on `self.device` during precomputation for better performance.

Proposed fix

     self._edge_feature_to_block_indices[bond_type] = {
    -    'src': torch.tensor(src_indices, dtype=torch.long),
    -    'dst': torch.tensor(dst_indices, dtype=torch.long),
    -    'scale': torch.tensor(scale_factors, dtype=torch.float32),
    +    'src': torch.tensor(src_indices, dtype=torch.long, device=self.device),
    +    'dst': torch.tensor(dst_indices, dtype=torch.long, device=self.device),
    +    'scale': torch.tensor(scale_factors, dtype=torch.float32, device=self.device),
         'norb_i': norb_i,
         'norb_j': norb_j
     }

dptb/data/interfaces/ham_to_feature.py (2)
362-365: Optimize atom symbol lookup to avoid repeated single-element untransform calls.

The current implementation calls `idp.untransform()` for each atom individually within a list comprehension. Since `untransform` supports batch operations, you can compute all symbols in one call and then convert to chemical symbols.

Proposed optimization

     # Pre-compute atom symbols for all atoms (vectorized lookup)
     atom_types = data[_keys.ATOM_TYPE_KEY]
    +atomic_numbers = idp.untransform(atom_types)
     atom_symbols = [
    -    ase.data.chemical_symbols[idp.untransform(atom_types[i].reshape(-1))]
    +    ase.data.chemical_symbols[int(atomic_numbers[i])]
         for i in range(len(atom_types))
     ]
373-377: Index device movement happens on every call.

The precomputed indices are moved to the target device on every `feature_to_block` call for each symbol/bond type. While the comment at line 372 says "only on first use per symbol," the implementation doesn't cache device-specific versions. This could impact performance when processing multiple batches.

Consider implementing device-specific caching or creating indices on the target device during precomputation (as suggested in the transforms.py review).
Also applies to: 408-412
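One way the device-specific caching suggested here could look is sketched below. Everything in it is hypothetical: `FakeTensor` is a stand-in that only counts how often `.to()` has to copy, and `IndexCache` is an illustrative wrapper, not part of `OrbitalMapper`. With real PyTorch tensors the same pattern would apply, since `Tensor.to` also returns the tensor unchanged when it already lives on the requested device.

```python
class FakeTensor:
    """Stand-in for a tensor; counts how often .to() has to copy (hypothetical)."""
    transfers = 0

    def __init__(self, values, device="cpu"):
        self.values = list(values)
        self.device = device

    def to(self, device):
        if device == self.device:
            return self  # already on the target device, nothing to copy
        FakeTensor.transfers += 1
        return FakeTensor(self.values, device)


class IndexCache:
    """Caches device-specific copies of precomputed index tensors."""

    def __init__(self, cpu_indices):
        self._cpu = cpu_indices   # {symbol: {name: tensor-like or int}}
        self._by_device = {}      # {(symbol, device): {name: tensor-like or int}}

    def get(self, symbol, device):
        key = (symbol, device)
        if key not in self._by_device:
            # Transfer once per (symbol, device); plain ints such as norb pass through.
            self._by_device[key] = {
                name: value.to(device) if hasattr(value, "to") else value
                for name, value in self._cpu[symbol].items()
            }
        return self._by_device[key]


cache = IndexCache(
    {"Si": {"src": FakeTensor([0, 1, 2]), "dst": FakeTensor([0, 1, 3]), "norb": 2}}
)
for _ in range(1000):  # e.g. one lookup per atom of the same symbol
    idx = cache.get("Si", "cuda:0")
```

In this sketch the thousand lookups trigger only two transfers (one each for `src` and `dst`), instead of two per atom.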
📒 Files selected for processing (2)
- dptb/data/interfaces/ham_to_feature.py
- dptb/data/transforms.py
🔇 Additional comments (5)
dptb/data/transforms.py (2)
845-846: LGTM!

The blank line improves readability by visually separating method definitions.

947-959: Clarify the scale factor logic and comment inconsistency.

The comment on line 929 states "0.5 for diagonal pairs," but line 947 checks `is_same_basis` (whether the basis pair is identical), not whether an element is on the matrix diagonal. These are not equivalent. Additionally, the edge version uses scale factors to handle symmetry, whereas the node version uses explicit transposed indices; the rationale for this design choice and the specific 0.5 factor for same basis pairs should be documented.

dptb/data/interfaces/ham_to_feature.py (3)

9-9: LGTM!

Correctly removed unused `anglrMId` import after refactoring to scatter-based operations. The angular momentum handling is now encapsulated in the precomputed index methods.

323-335: LGTM!

Clear and comprehensive docstring that explains the vectorized approach and performance benefits.

391-391: No issues found.

The `block_index` format is consistent with the auto-detection logic in `block_to_feature`. Line 391 writes 0-indexed blocks ([atom, atom, 0, 0, 0]), and the `start_id` logic at lines 50–58 correctly detects this format and retrieves blocks with matching indices.
Pull request overview
This PR optimizes the feature_to_block function by introducing pre-computed scatter indices for vectorized operations. The optimization replaces nested Python loops with efficient PyTorch scatter operations, improving performance when converting feature vectors to Hamiltonian/overlap block matrices.
Key changes:
- Added two new caching methods (`get_node_feature_to_block_indices` and `get_edge_feature_to_block_indices`) to pre-compute scatter index mappings
- Refactored `feature_to_block` to use vectorized scatter operations instead of nested loops
- Removed unused `anglrMId` import from `ham_to_feature.py`
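To make the idea of pre-computed scatter indices concrete, here is a minimal sketch for a single on-site block. It is not the PR's implementation: both function names are hypothetical, the feature vector is assumed to store the upper triangle of a real symmetric `norb x norb` block in row-major order, and plain lists stand in for tensors. Only the key names `src`, `dst`, `dst_T`, `is_diag`, and `norb` are taken from the diff.

```python
def precompute_onsite_indices(norb):
    """Build the index mapping once (assumed upper-triangle feature layout)."""
    src, dst, dst_T, is_diag = [], [], [], []
    k = 0
    for i in range(norb):
        for j in range(i, norb):
            src.append(k)                # position in the feature vector
            dst.append(i * norb + j)     # flat position of element (i, j)
            dst_T.append(j * norb + i)   # flat position of the transposed element
            is_diag.append(i == j)
            k += 1
    return {"src": src, "dst": dst, "dst_T": dst_T, "is_diag": is_diag, "norb": norb}


def feature_to_onsite_block(feature, idx):
    """Scatter feature entries into a flat block using the precomputed indices."""
    norb = idx["norb"]
    flat = [0.0] * (norb * norb)
    for s, d, d_T, diag in zip(idx["src"], idx["dst"], idx["dst_T"], idx["is_diag"]):
        flat[d] = feature[s]
        if not diag:
            flat[d_T] = feature[s]       # mirror off-diagonal entries
    return [flat[r * norb:(r + 1) * norb] for r in range(norb)]


idx = precompute_onsite_indices(3)
block = feature_to_onsite_block([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], idx)
```

With tensors, the per-element loop in `feature_to_onsite_block` would collapse into indexed assignments on a flattened block, which is where the speedup over nested Python loops would come from.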
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| dptb/data/transforms.py | Added get_node_feature_to_block_indices and get_edge_feature_to_block_indices methods to pre-compute and cache scatter indices for vectorized block matrix construction |
| dptb/data/interfaces/ham_to_feature.py | Refactored feature_to_block function to use pre-computed scatter indices with vectorized operations; removed unused import; added comprehensive docstring |
    # Move indices to correct device (only on first use per symbol)
    src_idx = idx_info['src'].to(device)
    dst_idx = idx_info['dst'].to(device)
    dst_idx_T = idx_info['dst_T'].to(device)
    is_diag = idx_info['is_diag'].to(device)
    norb = idx_info['norb']

Copilot AI, Dec 26, 2025
The indices are being transferred to the device on every iteration, even for atoms of the same symbol. This creates redundant device transfers. Consider caching the device-transferred indices per symbol to avoid repeated transfers for atoms of the same type. The same issue exists in the edge processing loop at lines 407-412.
    # Move indices to correct device
    src_idx = idx_info['src'].to(device)
    dst_idx = idx_info['dst'].to(device)
    scale = idx_info['scale'].to(device=device, dtype=dtype)
    norb_i = idx_info['norb_i']
    norb_j = idx_info['norb_j']

Copilot AI, Dec 26, 2025
The indices are being transferred to the device on every iteration, even for edges with the same bond type. This creates redundant device transfers. Consider caching the device-transferred indices per bond type to avoid repeated transfers.
    symbol_i = atom_symbols[atom_i]
    symbol_j = atom_symbols[atom_j]
    bond_type = f"{symbol_i}-{symbol_j}"

Copilot AI, Dec 26, 2025
Missing error handling for KeyError when bond_type is not found in edge_indices. If a bond type exists in the data but wasn't pre-computed (e.g., due to dynamic data), this will raise an unhelpful KeyError. Consider adding a check with a descriptive error message, or ensure the pre-computation covers all possible bond types from the data.
Suggested change:

    if bond_type not in edge_indices:
        available = ", ".join(sorted(map(str, edge_indices.keys())))
        msg = (
            f"Missing precomputed edge indices for bond type '{bond_type}'. "
            f"Encountered edge between atoms {atom_i} ({symbol_i}) and "
            f"{atom_j} ({symbol_j}) with cell shift {list(map(int, R_shift))}. "
            f"Available bond types in edge_indices: {available if available else 'none'}."
        )
        log.error(msg)
        raise KeyError(msg)
    block[slice_i, slice_j] = block_ij
    if slice_i != slice_j:
        block[slice_j, slice_i] = block_ij.T
    # Move indices to correct device (only on first use per symbol)

Copilot AI, Dec 26, 2025
The comment "only on first use per symbol" is misleading since the indices are actually transferred to the device on every iteration through the loop, not just the first time each symbol is encountered. Either the comment should be updated to reflect the actual behavior, or the code should be optimized to cache device-transferred indices per symbol as the comment suggests.
Suggested change:

    - # Move indices to correct device (only on first use per symbol)
    + # Move indices to the correct device for this atom
    self._edge_feature_to_block_indices[bond_type] = {
        'src': torch.tensor(src_indices, dtype=torch.long),
        'dst': torch.tensor(dst_indices, dtype=torch.long),
        'scale': torch.tensor(scale_factors, dtype=torch.float32),

Copilot AI, Dec 26, 2025
The scale tensor is created with dtype=torch.float32 hardcoded, but it should match the dtype of the edge features to avoid potential type conversion issues or precision mismatches. Consider using the same dtype as the feature vectors, or at least the default dtype from the configuration.
Suggested change:

    - 'scale': torch.tensor(scale_factors, dtype=torch.float32),
    + 'scale': torch.tensor(scale_factors, dtype=torch.get_default_dtype()),