

ccam80 commented Jan 2, 2026

PR description

This PR contains changes to the InlineWorker used by the InlineInlinables and InlineOverloads passes. The inline_ir method accepts a callee_ir FunctionIR object, which is deep-copied for safety before being mutated. The deepcopy is expensive for kernels that inline many large, nested device functions. Further information, including compile time results, is available in #688.

This PR contains two changes to the safety-copy portion of the inline_ir method and its call site in inline_function.

  1. The callee_ir object was copied once to create a mutable version, then that mutable copy was copied again to preserve the original. 4279abd saves a reference to the incoming callee_ir on entry, then copies this for the mutable copy, saving one copy operation. This affords a ~40% performance improvement for the same outcome.
+       # save a reference to the unmutated input callee_ir to return
+       callee_ir_original = callee_ir

        # Always copy the callee IR, it gets mutated
        def copy_ir(the_ir):
            kernel_copy = the_ir.copy()
            kernel_copy.blocks = {}
            for block_label, block in the_ir.blocks.items():
                new_block = copy.deepcopy(the_ir.blocks[block_label])
                kernel_copy.blocks[block_label] = new_block
            return kernel_copy

        callee_ir = copy_ir(callee_ir)

        # check that the contents of the callee IR is something that can be
        # inlined if a validator is present
        if self.validator is not None:
            self.validator(callee_ir)

-       # save an unmutated copy of the callee_ir to return
-       callee_ir_original = copy_ir(callee_ir)
  2. In the InlineInlinables pass, inline_ir is called from the inline_function entry point. inline_function runs all untyped passes on the function to be inlined, generating a new callee_ir object per inline. The preserved, unmutated callee_ir_original is returned but never consumed, so there is no need to preserve the callee_ir object when entering from inline_function. 893d8a8 adds a preserve_ir argument (default True) to inline_ir, and inline_function now calls it with preserve_ir=False.

inline_ir is also called from the InlineOverloads pass, which does keep and reuse callee_ir. That entry point (and any future one) falls back to the default preserve_ir=True, so the provided IR is not mutated.

        # Copy the IR if it should be preserved.
+       if preserve_ir:
            def copy_ir(the_ir):
                kernel_copy = the_ir.copy()
                kernel_copy.blocks = {}
                for block_label, block in the_ir.blocks.items():
                    new_block = copy.deepcopy(the_ir.blocks[block_label])
                    kernel_copy.blocks[block_label] = new_block
                return kernel_copy
            callee_ir = copy_ir(callee_ir)
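As a rough illustration only, the copy counts before and after both changes can be sketched with hypothetical toy stand-ins (`ToyIR`, `ToyBlock`, `copy_ir`, `mutate`, `inline_ir_before`, `inline_ir_after`). These are not Numba's `FunctionIR` or `ir.Block` classes, and the sketch counts block-level deep copies rather than measuring compile time.

```python
import copy


class ToyBlock:
    """Hypothetical stand-in for an IR block: a list of statements."""

    deepcopies = 0  # class-wide counter of deep copies

    def __init__(self, body):
        self.body = list(body)

    def __deepcopy__(self, memo):
        # Count every deep copy so the code paths can be compared.
        ToyBlock.deepcopies += 1
        return ToyBlock(copy.deepcopy(self.body, memo))


class ToyIR:
    """Hypothetical stand-in for FunctionIR: a dict of label -> block."""

    def __init__(self, blocks):
        self.blocks = blocks

    def copy(self):
        # Shallow copy: the new IR shares the same block objects.
        return ToyIR(dict(self.blocks))


def copy_ir(the_ir):
    # Shallow-copy the IR, then deep-copy each block (as in the diff).
    kernel_copy = the_ir.copy()
    kernel_copy.blocks = {}
    for block_label, block in the_ir.blocks.items():
        kernel_copy.blocks[block_label] = copy.deepcopy(block)
    return kernel_copy


def mutate(the_ir):
    # Stand-in for the rewrites performed while inlining.
    for block in the_ir.blocks.values():
        block.body.append("inlined")


def inline_ir_before(callee_ir):
    # Pre-PR order: copy to mutate, then copy the copy to preserve it.
    callee_ir = copy_ir(callee_ir)
    callee_ir_original = copy_ir(callee_ir)
    mutate(callee_ir)
    return callee_ir_original, callee_ir


def inline_ir_after(callee_ir, preserve_ir=True):
    # Post-PR order: keep a reference to the input, copy at most once.
    callee_ir_original = callee_ir
    if preserve_ir:
        callee_ir = copy_ir(callee_ir)
    mutate(callee_ir)
    return callee_ir_original, callee_ir


def make_ir():
    return ToyIR({0: ToyBlock(["a = 1"]), 1: ToyBlock(["return a"])})


for label, call in [
    ("before", inline_ir_before),
    ("after, preserve_ir=True", inline_ir_after),
    ("after, preserve_ir=False", lambda ir: inline_ir_after(ir, preserve_ir=False)),
]:
    ToyBlock.deepcopies = 0
    ir = make_ir()
    call(ir)
    print(f"{label}: {ToyBlock.deepcopies} block deep copies, "
          f"input mutated: {ir.blocks[0].body != ['a = 1']}")
# before: 4 block deep copies, input mutated: False
# after, preserve_ir=True: 2 block deep copies, input mutated: False
# after, preserve_ir=False: 0 block deep copies, input mutated: True
```

Note that with `preserve_ir=False` the returned "original" is the mutated object itself; in this sketch, as in the description above, that is only acceptable because the caller never consumes it.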

Tests

No tests have been added: apart from performance, I'm not sure what observable behaviour would change if this regressed. All tests in the numba-cuda suite pass, as do the IR-mutation regression tests in Numba's test_ir_inlining suite.

Copilot AI review requested due to automatic review settings January 2, 2026 23:12
copy-pr-bot bot commented Jan 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps bot commented Jan 2, 2026

Greptile Summary

  • Optimizes InlineWorker.inline_ir method to reduce compile time overhead by eliminating unnecessary deepcopy operations during function inlining
  • Adds preserve_ir parameter to conditionally control copying behavior based on whether the original IR needs to persist
  • Modifies inline_function to call inline_ir with preserve_ir=False since the preserved IR is never consumed in that context

Important Files Changed

Filename | Overview
numba_cuda/numba/cuda/core/inline_closurecall.py | Adds optional preserve_ir parameter to inline_ir method and reorganizes copying logic to avoid duplicate operations

Confidence score: 3/5

  • This PR has moderate risk due to changes in critical compilation infrastructure that could affect inlining correctness
  • Score reflects concerns about replacing copy.deepcopy() with Block.copy() which may not preserve all statement state and the lack of tests to verify correctness
  • Pay close attention to the copying logic changes and ensure thorough testing before merging

greptile-apps bot left a comment

Additional Comments (1)

  1. numba_cuda/numba/cuda/core/inline_closurecall.py, lines 357-365

    style: Commented-out code should be removed before merging or a clear decision made about its necessity. Are you planning to remove this commented code block or is it intentionally left for further testing?


1 file reviewed, 1 comment


Copilot AI left a comment

Pull request overview

This WIP PR aims to reduce copy overhead in CUDA inline passes by optimizing how the callee_ir object is copied during inlining operations. The changes focus on eliminating redundant copy operations and making copying conditional based on whether the original IR needs to be preserved.

Key Changes

  • Reordered logic to save a reference to the original callee_ir before copying, eliminating one redundant copy operation
  • Replaced copy.deepcopy() with Block.copy() for a lighter-weight copying approach
  • Added a preserve_ir parameter to make copying conditional, with preserve_ir=False when calling from inline_function where the callee_ir is freshly generated


greptile-apps bot commented Jan 4, 2026

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

ccam80 (Author) commented Jan 4, 2026

After finding some history on this issue in the core Numba repo, I have reverted the change that replaced the deepcopy() call with a shallow copy. The PR now passes the regression tests in Numba's test_ir_inlining.py, and I have updated the PR description to match the current behaviour.
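To illustrate why a shallow copy is not enough, here is a hypothetical sketch. The `ToyStmt` and `ToyBlock` classes and the `rename_targets` helper are made up for illustration and are not Numba code, and `shallow_copy` is only a generic analogue of copying a block's statement list. If the copied block still shares statement objects with the original, an in-place rewrite of those statements during inlining also changes the original IR, whereas a deep copy keeps them independent.

```python
import copy


class ToyStmt:
    """Hypothetical stand-in for an IR statement, e.g. an assignment."""

    def __init__(self, target, value):
        self.target = target
        self.value = value


class ToyBlock:
    """Hypothetical stand-in for an IR block holding a list of statements."""

    def __init__(self, body):
        self.body = body

    def shallow_copy(self):
        # New list object, but the same ToyStmt objects inside it.
        return ToyBlock(list(self.body))


def rename_targets(block, suffix):
    # In-place rewrite, similar in spirit to renaming variables when inlining.
    for stmt in block.body:
        stmt.target += suffix


original = ToyBlock([ToyStmt("x", 1), ToyStmt("y", 2)])
shallow = original.shallow_copy()
rename_targets(shallow, ".inlined")
print([s.target for s in original.body])  # ['x.inlined', 'y.inlined']

original = ToyBlock([ToyStmt("x", 1), ToyStmt("y", 2)])
deep = copy.deepcopy(original)
rename_targets(deep, ".inlined")
print([s.target for s in original.body])  # ['x', 'y']
```

The trade-off is the one this PR works around: the deep copy is what makes mutation safe, so the saving comes from doing it fewer times rather than making it shallower.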

ccam80 changed the title from "[WIP] perf: Reduce copy overhead in inline passes" to "perf: Reduce copy overhead in inline passes" on Jan 5, 2026