
Conversation

@Ubospica (Collaborator) commented Jan 11, 2026

This PR adds an NCU tool for profiling solutions. It can also be called by LLMs to build agents.

Signed-off-by: Ubospica <ubospica@gmail.com>

Summary by CodeRabbit

Release Notes

  • New Features

    • Integrated Nsight Compute (NCU) profiling support with configurable analysis options and filtering capabilities.
    • Added AI agent tool schema generation for seamless integration with language models in kernel development workflows.
    • New utilities for discovering profiling options and managing profiling workflows.
  • Dependencies

    • Added docstring-parser library support.


Signed-off-by: Ubospica <ubospica@gmail.com>
coderabbitai bot commented Jan 11, 2026

Warning

Rate limit exceeded

@Ubospica has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 1 minute and 22 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between c2ba5ef and e12b2e8.

📒 Files selected for processing (4)
  • docs/api/index.md
  • docs/api/rst/agents.md
  • flashinfer_bench/agents/ffi_prompt.py
  • flashinfer_bench/agents/ncu.py

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Introduces agent tools for LLM-based kernel development and debugging, including an Nsight Compute profiling workflow, solution runner script, and OpenAI/Anthropic-compatible tool schema generation. Adds module exports, input validation, command construction, subprocess execution, and type-aware schema generation with docstring parsing support.

Changes

  • Module Initialization — flashinfer_bench/agents/__init__.py: Adds a module docstring, converts to explicit relative imports, and exports new public functions (flashinfer_bench_list_ncu_options, flashinfer_bench_run_ncu, function_to_schema, get_all_tool_schemas) and constants (FFI_PROMPT, FFI_PROMPT_SIMPLE).
  • NCU Profiling Orchestration — flashinfer_bench/agents/ncu.py: Implements the Nsight Compute profiling workflow: validates inputs, generates JSON payloads, builds NCU commands, executes the subprocess with a timeout, and handles output truncation. Public functions: flashinfer_bench_run_ncu() and flashinfer_bench_list_ncu_options(). Introduces the VALID_PAGES constant.
  • Solution Runner Script — flashinfer_bench/agents/_solution_runner.py: CLI script invoked by the NCU profiler; loads Definition/Solution/Workload from JSON, builds an executable solution, generates/allocates tensors, and executes warmup and profiling runs within an NVTX range.
  • Schema Generation — flashinfer_bench/agents/schema.py: Generates OpenAI/Anthropic-compatible tool schemas from Python functions: converts type hints to JSON Schema (supports Optional, Union, List, Dict, Pydantic models), extracts docstring descriptions, and assembles parameter definitions. Public functions: function_to_schema() and get_all_tool_schemas().
  • Dependencies — pyproject.toml: Adds runtime dependency docstring-parser>=0.16.
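The schema generation described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: the demo function and the _TYPE_MAP table are hypothetical, and the real module additionally parses docstrings (via docstring-parser) and handles Pydantic models.

```python
import inspect
import typing

# Hypothetical mapping from Python annotations to JSON Schema type names.
_TYPE_MAP = {int: "integer", float: "number", str: "string", bool: "boolean"}

def function_to_schema(func) -> dict:
    """Convert a function's signature into an OpenAI-style tool schema."""
    hints = typing.get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name, str)
        if typing.get_origin(hint) is typing.Union:
            # Unwrap Optional[X] (i.e. Union[X, None]) to X.
            args = [a for a in typing.get_args(hint) if a is not type(None)]
            hint = args[0]
        properties[name] = {"type": _TYPE_MAP.get(hint, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

def demo(solution_name: str, max_lines: int = 200) -> str:
    """Profile a solution with NCU."""
    return solution_name

schema = function_to_schema(demo)
```

Parameters without defaults land in `required`, which is what lets an LLM know which arguments it must supply when calling the tool.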

Sequence Diagram(s)

sequenceDiagram
    participant Agent as LLM Agent
    participant Runner as flashinfer_bench_run_ncu()
    participant NCU as Nsight Compute
    participant Script as _solution_runner.py
    participant Device as GPU Device
    
    Agent->>Runner: Call with solution & workload
    Runner->>Runner: Validate inputs & pages
    Runner->>Runner: Create temp directory
    Runner->>Runner: Write JSON payloads
    Runner->>Runner: Build NCU command
    Runner->>NCU: Execute profiling command
    NCU->>Script: Invoke solution runner
    Script->>Script: Load Definition/Solution/Workload
    Script->>Script: Build executable solution
    Script->>Script: Generate input tensors
    Script->>Device: Warmup run (JIT compile)
    Script->>Device: NVTX profiling run
    Script->>Device: CUDA synchronize
    Script-->>NCU: Return results
    NCU-->>Runner: Profiling output
    Runner->>Runner: Truncate output (if needed)
    Runner-->>Agent: Return profiling report
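The "Build NCU command" step in the diagram might look roughly like this. This is a sketch under assumptions: the signature and the `runner_cmd` tail are simplified stand-ins for the PR's command builder, using the `--export`, `--page`, `--set`, `--section`, and `--kernel-name` flags discussed elsewhere in this review thread.

```python
from pathlib import Path
from typing import List, Optional

def build_ncu_command(
    ncu_path: str,
    metric_set: str,
    sections: Optional[List[str]],
    kernel_name: Optional[str],
    report_path: Path,
    runner_cmd: List[str],
) -> List[str]:
    """Assemble an NCU invocation that wraps a runner subprocess."""
    cmd = [ncu_path, "--export", str(report_path), "--page", "details",
           "--set", metric_set]
    for section in sections or []:
        cmd += ["--section", section]          # e.g. "MemoryWorkloadAnalysis"
    if kernel_name:
        cmd += ["--kernel-name", kernel_name]  # regex filter on kernel names
    return cmd + runner_cmd                    # profiled application comes last
```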

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • YiyanZhai

Poem

🐰 Hops of joy through profiling paths,
With schemas stitched from docstrings' craft,
NCU commands in tempfiles stored,
Solution runners on GPU soared! ✨
LLM agents now debug with glee. 🎯

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: The title 'Support NCU tool' directly matches the PR's primary objective of adding and exposing the NCU tool for profiling solutions.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 83.33%, which meets the required threshold of 80.00%.




@gemini-code-assist
Contributor

Summary of Changes

Hello @Ubospica, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates NVIDIA Nsight Compute (NCU) profiling capabilities into the flashinfer_bench framework. It enables users to profile the performance of defined solutions and workloads using NCU, facilitating performance analysis and optimization. The changes include a new agent tool for orchestrating NCU runs, a standalone runner script, and an environment variable for configuring the NCU executable path.

Highlights

  • NCU Tool Integration: Introduces a new agent tool to support NVIDIA Nsight Compute (NCU) profiling for solutions within the framework.
  • Standalone NCU Runner: Adds _ncu_runner.py, a new script designed to execute solutions in a controlled environment specifically for NCU profiling.
  • Programmatic Profiling API: Provides the run_ncu function in ncu.py for programmatically invoking NCU with various configurable options such as profiling sets, sections, kernel filters, and timeouts.
  • Configurable NCU Path: Adds get_fib_ncu_path in env.py to allow specifying the NCU executable path via an environment variable (FIB_NCU_PATH), defaulting to 'ncu' if not set.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for NVIDIA's Nsight Compute (NCU) profiler, allowing for detailed performance analysis of kernels. This is a significant addition, providing a new run_ncu tool within the flashinfer_bench.agents module. The implementation includes a dedicated runner script (_ncu_runner.py) that is invoked by the main ncu.py module, which constructs the NCU command and handles its execution.

My review focuses on improving the maintainability and robustness of the new NCU integration. I've suggested renaming a parameter that shadows a Python built-in and proposed a significant improvement to the error handling logic to ensure that profiling failures are easier to debug. Overall, the changes are well-structured and provide a valuable new capability.

    solution: Solution,
    workload: Workload,
    *,
    set: str = "detailed",

medium

The parameter set shadows the built-in set type in Python. It's a best practice to avoid using names of built-ins for variables. I suggest renaming it to ncu_set for clarity and to prevent potential issues.

This change should be applied consistently in the following places:

  • run_ncu signature (this line)
  • run_ncu body (lines 161, 196)
  • _build_ncu_command signature (line 50)
  • _build_ncu_command body (line 58)
Suggested change
set: str = "detailed",
ncu_set: str = "detailed",

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @flashinfer_bench/agents/ncu.py:
- Around line 16-45: VALID_SETS is missing the "source" entry causing incorrect
validation; update the VALID_SETS set in agents/ncu.py to include the string
"source" (i.e., add "source" to the VALID_SETS definition) and optionally add a
comment noting NCU version dependency or replace static validation with a
runtime query via `ncu --list-sets`/`ncu --list-sections` if you prefer dynamic
validation; ensure references to VALID_SETS elsewhere in the module remain
unchanged.
🧹 Nitpick comments (2)
flashinfer_bench/agents/ncu.py (2)

80-89: Consider removing leading newline in truncation message.

The truncation message on line 88 includes a leading \n, which may add an unintended blank line. Consider:

-    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
+    truncated.append(f"[Output truncated: {remaining} more lines, use max_lines=None to see all]")

214-217: Clarify error handling logic.

The error check at lines 214-217 tests for errors but then does nothing (pass). The comment "Keep error messages visible" suggests the intent is to return the output even on error, but this could be more explicit:

-    # Check for errors
-    if result.returncode != 0 and "==ERROR==" in output:
-        # Keep error messages visible
-        pass
+    # Note: Return output even on error to show NCU error messages to the user

Alternatively, if specific error handling is needed, implement it here.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6bb3ecc and ae742ee.

📒 Files selected for processing (4)
  • flashinfer_bench/agents/__init__.py
  • flashinfer_bench/agents/_ncu_runner.py
  • flashinfer_bench/agents/ncu.py
  • flashinfer_bench/env.py
🧰 Additional context used
🧬 Code graph analysis (2)
flashinfer_bench/agents/__init__.py (1)
flashinfer_bench/agents/ncu.py (1)
  • run_ncu (92-223)
flashinfer_bench/agents/ncu.py (2)
flashinfer_bench/data/trace_set.py (2)
  • TraceSet (24-478)
  • from_path (86-146)
flashinfer_bench/env.py (1)
  • get_fib_ncu_path (60-70)
🪛 Ruff (0.14.10)
flashinfer_bench/agents/ncu.py

162-162: Avoid specifying long messages outside the exception class

(TRY003)


168-168: Avoid specifying long messages outside the exception class

(TRY003)


173-176: Avoid specifying long messages outside the exception class

(TRY003)


205-205: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.11
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
🔇 Additional comments (4)
flashinfer_bench/env.py (1)

60-70: LGTM!

The function follows the established pattern in this file and correctly provides configurable NCU path resolution with a sensible default.

flashinfer_bench/agents/__init__.py (1)

1-5: LGTM!

The module correctly re-exports run_ncu for public use, following standard Python packaging conventions.

flashinfer_bench/agents/_ncu_runner.py (1)

14-51: LGTM!

The runner script correctly implements the NCU profiling workflow:

  • Properly validates input data using Pydantic models
  • Handles conditional safetensors loading
  • Performs JIT warmup before profiling
  • Uses appropriate CUDA synchronization

The lack of explicit error handling is acceptable for a subprocess runner where exceptions should propagate to the caller.
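A minimal sketch of the runner's entry path, assuming the payload layout implied above. `load_payload`, `build_solution`, and `make_inputs` are hypothetical names for illustration; the CUDA/NVTX phase is shown as comments because it needs torch and a GPU.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"definition", "solution", "workload"}

def load_payload(path: Path) -> dict:
    """Load the JSON payload that the NCU orchestrator writes for the runner."""
    data = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"payload missing keys: {sorted(missing)}")
    return data

# Profiling phase sketch (requires torch and a CUDA device):
#     runner = build_solution(payload["solution"])  # hypothetical helper
#     inputs = make_inputs(payload["workload"])     # hypothetical helper
#     runner(*inputs)                               # warmup: triggers JIT compile
#     torch.cuda.nvtx.range_push("fib_profile")
#     runner(*inputs)                               # the run NCU captures
#     torch.cuda.nvtx.range_pop()
#     torch.cuda.synchronize()
```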

flashinfer_bench/agents/ncu.py (1)

48-77: LGTM!

The command builder correctly constructs the NCU invocation with appropriate flags and validation. The use of -u for unbuffered Python output ensures real-time visibility during profiling.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @flashinfer_bench/agents/ncu.py:
- Around line 80-89: The _truncate_output function currently appends a
truncation message with a leading newline after slicing lines[:max_lines],
causing the final output to exceed max_lines; fix it by ensuring the truncation
message is counted within the max_lines limit—either replace the last retained
line with the message (e.g., set truncated[-1] to the formatted truncation
message) or slice one fewer line (lines[:max_lines-1]) and then append the
message without a leading newline; remove the leading "\n" before the message so
the final join returns exactly max_lines lines when truncation occurs.
🧹 Nitpick comments (4)
flashinfer_bench/agents/ncu.py (4)

48-54: Rename parameter set to avoid shadowing Python builtin.

The parameter name set shadows the built-in set type, which can cause confusion and makes the builtin inaccessible within the function scope.

♻️ Rename to `metric_set` or `ncu_set`
 def _build_ncu_command(
     data_path: Path,
-    set: str,
+    metric_set: str,
     sections: Optional[List[str]],
     kernel_name: Optional[str],
     report_path: Path,
 ) -> List[str]:
     """Build the NCU command line."""
     ncu_path = get_fib_ncu_path()

-    cmd = [ncu_path, "--export", str(report_path), "--page", "details", "--set", set]
+    cmd = [ncu_path, "--export", str(report_path), "--page", "details", "--set", metric_set]

Also update the call site at line 196 and the run_ncu function signature at line 96.


92-103: Rename parameter set to avoid shadowing Python builtin.

Consistent with the comment on _build_ncu_command, the set parameter should be renamed to avoid shadowing the built-in type.

See the refactoring suggestion in the earlier comment on lines 48-54.


98-98: Consider validating kernel_name parameter.

The kernel_name parameter is passed directly to NCU without validation. While NCU uses this as a regex filter and should handle invalid input gracefully, consider adding basic validation to provide clearer error messages for common mistakes.

💡 Optional validation example
if kernel_name is not None:
    if not kernel_name.strip():
        raise ValueError("kernel_name cannot be empty or whitespace")
    # Additional checks like detecting unescaped special regex chars could be added

This is optional since NCU should handle invalid patterns, but it could improve the user experience.


15-45: Consider implementing runtime validation of NCU sets and sections rather than hardcoded constants.

The current VALID_SETS and VALID_SECTIONS constants appear accurate for recent Nsight Compute versions, but they lack version checking. If a different NCU version is used (via FIB_NCU_PATH), these hardcoded values may become incorrect or incomplete. Consider querying ncu --list-sets and ncu --list-sections at runtime to dynamically validate user input against the actual installed NCU, or explicitly document which NCU version(s) these constants support.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae742ee and 718cce9.

📒 Files selected for processing (1)
  • flashinfer_bench/agents/ncu.py
🧰 Additional context used
🧬 Code graph analysis (1)
flashinfer_bench/agents/ncu.py (2)
flashinfer_bench/data/trace_set.py (2)
  • TraceSet (24-478)
  • from_path (86-146)
flashinfer_bench/env.py (1)
  • get_fib_ncu_path (60-70)
🪛 Ruff (0.14.10)
flashinfer_bench/agents/ncu.py

162-162: Avoid specifying long messages outside the exception class

(TRY003)


168-168: Avoid specifying long messages outside the exception class

(TRY003)


173-176: Avoid specifying long messages outside the exception class

(TRY003)


205-205: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
🔇 Additional comments (5)
flashinfer_bench/agents/ncu.py (5)

1-13: LGTM!

The imports are well-organized and appropriate for NCU profiling functionality.


161-177: LGTM! Comprehensive input validation.

The validation logic properly checks the set and sections parameters against predefined constants and verifies the definition exists in the trace database. The error messages are clear and actionable.

Note: The static analysis hint (TRY003) about exception messages is a style suggestion that can be safely ignored for this use case, as the inline messages provide good context.


204-209: Document security implications of FIB_NCU_PATH environment variable.

The subprocess call uses ncu_path from get_fib_ncu_path(), which reads the FIB_NCU_PATH environment variable. If an attacker controls this variable, they can execute arbitrary commands. While this is a design tradeoff for tool flexibility, it should be clearly documented.

Consider adding a security note to the docstring:

    Environment Variables
    ---------------------
    FIB_DATASET_PATH : str
        Path to the trace dataset. The solution's definition must exist in this dataset.
    FIB_NCU_PATH : str, optional
        Path to the NCU executable. Defaults to "ncu".
        ⚠️  Security: Only set this variable in trusted environments, as it controls
        which executable is invoked.

The static analysis warning (S603) is valid but acceptable given the tool's design. The use of a list-based command (avoiding shell=True) prevents shell injection attacks on other parameters.


180-192: LGTM! Proper data serialization.

The temporary directory creation and JSON data preparation are well-structured. Using model_dump(mode="json") ensures proper Pydantic model serialization.


211-223: Good error handling and output management.

The error handling correctly:

  • Combines stdout and stderr for comprehensive output
  • Returns full output on errors (line 217) to aid debugging
  • Applies truncation only to successful runs

Comment on lines +80 to +89
def _truncate_output(output: str, max_lines: int) -> str:
    """Truncate output to max_lines."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output

    truncated = lines[:max_lines]
    remaining = len(lines) - max_lines
    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
    return "\n".join(truncated)

⚠️ Potential issue | 🟡 Minor

Truncation produces more lines than max_lines specifies.

The function appends a truncation message with a leading newline after taking max_lines lines, resulting in max_lines + 2 total lines in the output. This may confuse users who expect exactly max_lines.

🔧 Fix to respect `max_lines` limit

Option 1: Include the message in the count

 def _truncate_output(output: str, max_lines: int) -> str:
     """Truncate output to max_lines."""
     lines = output.split("\n")
     if len(lines) <= max_lines:
         return output

-    truncated = lines[:max_lines]
+    truncated = lines[:max_lines - 1]
     remaining = len(lines) - max_lines
-    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
+    truncated.append(f"[Output truncated: {remaining + 1} more lines, use max_lines=None to see all]")
     return "\n".join(truncated)

Option 2: Clarify the behavior in docstring and parameter name

-def _truncate_output(output: str, max_lines: int) -> str:
-    """Truncate output to max_lines."""
+def _truncate_output(output: str, max_lines: int) -> str:
+    """Truncate output to approximately max_lines (may include truncation notice)."""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _truncate_output(output: str, max_lines: int) -> str:
    """Truncate output to max_lines."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output

    truncated = lines[:max_lines]
    remaining = len(lines) - max_lines
    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
    return "\n".join(truncated)

def _truncate_output(output: str, max_lines: int) -> str:
    """Truncate output to max_lines."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output

    truncated = lines[:max_lines - 1]
    remaining = len(lines) - max_lines
    truncated.append(f"[Output truncated: {remaining + 1} more lines, use max_lines=None to see all]")
    return "\n".join(truncated)
🤖 Prompt for AI Agents
In @flashinfer_bench/agents/ncu.py around lines 80 - 89, The _truncate_output
function currently appends a truncation message with a leading newline after
slicing lines[:max_lines], causing the final output to exceed max_lines; fix it
by ensuring the truncation message is counted within the max_lines limit—either
replace the last retained line with the message (e.g., set truncated[-1] to the
formatted truncation message) or slice one fewer line (lines[:max_lines-1]) and
then append the message without a leading newline; remove the leading "\n"
before the message so the final join returns exactly max_lines lines when
truncation occurs.

Signed-off-by: Ubospica <ubospica@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @flashinfer_bench/agents/ncu.py:
- Around line 84-94: The _truncate_output function incorrectly skips truncation
for max_lines=0 and inserts an extra blank line before the truncation marker;
change the logic to treat max_lines=None or max_lines<0 as "no truncation"
(return the full output), but allow max_lines=0 to return no original lines plus
a single truncation marker, and remove the leading newline in the marker so it
is appended as its own line (e.g., use "[Output truncated: ...]" not
"\n[Output...]"); apply the same fix to the other truncation occurrence around
lines 233-236 (update the same function/logic there).
- Around line 51-81: In _build_ncu_command replace the literal "python" string
used to invoke the runner with the current interpreter via sys.executable:
import sys at the module top if not already present and change the cmd extension
that currently contains ["python", "-u", "-m",
"flashinfer_bench.agents._ncu_runner", "--data", str(data_path)] to use
sys.executable instead of "python" so the virtualenv/interpreter in use is
honored.
🧹 Nitpick comments (1)
flashinfer_bench/agents/ncu.py (1)

96-160: Align error-handling with the docstring (currently returns strings/output; does not raise RuntimeError).
Either update the docstring to explicitly document “returns an ERROR: … string / returns output on failure”, or switch to raising exceptions for timeout / missing NCU / non-zero exit.

Also applies to: 214-237

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 718cce9 and 640bae5.

📒 Files selected for processing (1)
  • flashinfer_bench/agents/ncu.py
🧰 Additional context used
🪛 Ruff (0.14.10)
flashinfer_bench/agents/ncu.py

169-169: Avoid specifying long messages outside the exception class

(TRY003)


173-173: Avoid specifying long messages outside the exception class

(TRY003)


179-179: Avoid specifying long messages outside the exception class

(TRY003)


184-187: Avoid specifying long messages outside the exception class

(TRY003)


216-216: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.11
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
🔇 Additional comments (2)
flashinfer_bench/agents/ncu.py (2)

167-180: Validation approach looks solid (allowlists + helpful messages).
Good tradeoff for an LLM-exposed tool: constrain set/page/sections to known-safe values.


51-81: Validation for --page, --set, and --section is already in place; no further changes needed.

The codebase already defines valid values for these flags (lines 15-48) and validates them at runtime (lines 168-179) with clear error messages. The documented supported values align with NVIDIA's Nsight Compute CLI documentation. The runtime validation approach is sufficient for handling compatibility.

Comment on lines +84 to +94
def _truncate_output(output: str, max_lines: int) -> str:
    """Truncate output to max_lines."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output

    truncated = lines[:max_lines]
    remaining = len(lines) - max_lines
    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
    return "\n".join(truncated)


⚠️ Potential issue | 🟠 Major

Fix truncation edge-cases (max_lines=0, negatives) and remove the extra blank line in the truncation marker.
if max_lines: skips truncation for 0, and the f"\n[Output truncated...]" introduces an extra blank line.

Proposed diff
@@
 def _truncate_output(output: str, max_lines: int) -> str:
     """Truncate output to max_lines."""
+    if max_lines < 0:
+        raise ValueError("max_lines must be >= 0")
     lines = output.split("\n")
     if len(lines) <= max_lines:
         return output
@@
     truncated = lines[:max_lines]
     remaining = len(lines) - max_lines
-    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
+    truncated.append(f"[Output truncated: {remaining} more lines, use max_lines=None to see all]")
     return "\n".join(truncated)
@@
-        if max_lines:
+        if max_lines is not None:
             output = _truncate_output(output, max_lines)

Also applies to: 233-236

🤖 Prompt for AI Agents
In @flashinfer_bench/agents/ncu.py around lines 84 - 94, The _truncate_output
function incorrectly skips truncation for max_lines=0 and inserts an extra blank
line before the truncation marker; change the logic to treat max_lines=None or
max_lines<0 as "no truncation" (return the full output), but allow max_lines=0
to return no original lines plus a single truncation marker, and remove the
leading newline in the marker so it is appended as its own line (e.g., use
"[Output truncated: ...]" not "\n[Output...]"); apply the same fix to the other
truncation occurrence around lines 233-236 (update the same function/logic
there).

@Ubospica Ubospica marked this pull request as draft January 11, 2026 07:46
Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Ubospica <ubospica@gmail.com>
@Ubospica Ubospica marked this pull request as ready for review January 15, 2026 03:07
Signed-off-by: Ubospica <ubospica@gmail.com>
@Ubospica
Collaborator Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the NCU profiling tool, primarily for use by LLM agents. It adds a standalone solution runner for profiling, new agent functions to interact with NCU, and a schema generator to make these functions available as tools. The code is well-structured, documented, and includes good error handling. I have a couple of suggestions to further improve the robustness and API design of the new NCU module.

Comment on lines +50 to +58
try:
    sets_result = subprocess.run(
        [ncu_path, "--list-sets"], capture_output=True, text=True, timeout=10
    )
    sections_result = subprocess.run(
        [ncu_path, "--list-sections"], capture_output=True, text=True, timeout=10
    )
except subprocess.TimeoutExpired:
    return "ERROR: NCU command timed out."

medium

The current implementation doesn't check the return codes of the ncu subprocess calls. If a command fails, its error output is concatenated into the result string, but the function doesn't explicitly signal an error (e.g., by returning a string prefixed with ERROR:). This is inconsistent with the error handling in flashinfer_bench_run_ncu and can be misleading for the calling agent. Checking the returncode and returning a formatted error message would make the tool more robust and provide clearer feedback.

Suggested change
try:
    sets_result = subprocess.run(
        [ncu_path, "--list-sets"], capture_output=True, text=True, timeout=10
    )
    sections_result = subprocess.run(
        [ncu_path, "--list-sections"], capture_output=True, text=True, timeout=10
    )
except subprocess.TimeoutExpired:
    return "ERROR: NCU command timed out."

try:
    sets_result = subprocess.run(
        [ncu_path, "--list-sets"], capture_output=True, text=True, timeout=10
    )
    if sets_result.returncode != 0:
        return f"ERROR: 'ncu --list-sets' failed:\n{sets_result.stdout}{sets_result.stderr}"
    sections_result = subprocess.run(
        [ncu_path, "--list-sections"], capture_output=True, text=True, timeout=10
    )
    if sections_result.returncode != 0:
        return f"ERROR: 'ncu --list-sections' failed:\n{sections_result.stdout}{sections_result.stderr}"
except subprocess.TimeoutExpired:
    return "ERROR: NCU command timed out."


def flashinfer_bench_run_ncu(
    solution: Union[Solution, str],
    workload: Union[Workload, str],

medium

This function has many optional parameters. To improve the API's clarity and prevent accidental misuse (like passing arguments in the wrong order), it's a good practice to make these optional parameters keyword-only. You can achieve this by adding a * after the last positional argument (workload).

Suggested change
workload: Union[Workload, str],
workload: Union[Workload, str],
*,

Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Ubospica <ubospica@gmail.com>
@Ubospica Ubospica merged commit 4035a6e into flashinfer-ai:main Jan 19, 2026
9 checks passed
@Ubospica
Collaborator Author

Merging this first for agile development

@coderabbitai coderabbitai bot mentioned this pull request Jan 20, 2026