
Conversation

@Ubospica (Collaborator) commented Jan 11, 2026

This PR adds an NCU tool for profiling solutions. It can also be called by LLMs to build agents.

Signed-off-by: Ubospica <ubospica@gmail.com>

Summary by CodeRabbit

Release Notes

  • New Features

    • Integrated Nsight Compute (NCU) profiling support with configurable analysis options and filtering capabilities.
    • Added AI agent tool schema generation for seamless integration with language models in kernel development workflows.
    • New utilities for discovering profiling options and managing profiling workflows.
  • Dependencies

    • Added docstring-parser library support.


Signed-off-by: Ubospica <ubospica@gmail.com>
coderabbitai bot commented Jan 11, 2026

Warning

Rate limit exceeded

@Ubospica has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 1 minute and 22 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between c2ba5ef and e12b2e8.

📒 Files selected for processing (4)
  • docs/api/index.md
  • docs/api/rst/agents.md
  • flashinfer_bench/agents/ffi_prompt.py
  • flashinfer_bench/agents/ncu.py

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Introduces agent tools for LLM-based kernel development and debugging, including an Nsight Compute profiling workflow, solution runner script, and OpenAI/Anthropic-compatible tool schema generation. Adds module exports, input validation, command construction, subprocess execution, and type-aware schema generation with docstring parsing support.

Changes

  • Module Initialization — flashinfer_bench/agents/__init__.py: Adds a module docstring, converts to explicit relative imports, and exports new public functions (flashinfer_bench_list_ncu_options, flashinfer_bench_run_ncu, function_to_schema, get_all_tool_schemas) and constants (FFI_PROMPT, FFI_PROMPT_SIMPLE).
  • NCU Profiling Orchestration — flashinfer_bench/agents/ncu.py: Implements the Nsight Compute profiling workflow: validates inputs, generates JSON payloads, builds NCU commands, executes the subprocess with a timeout, and handles output truncation. Public functions: flashinfer_bench_run_ncu() and flashinfer_bench_list_ncu_options(). Introduces the VALID_PAGES constant.
  • Solution Runner Script — flashinfer_bench/agents/_solution_runner.py: CLI script invoked by the NCU profiler; loads Definition/Solution/Workload from JSON, builds an executable solution, generates/allocates tensors, and executes warmup and profiling runs within an NVTX range.
  • Schema Generation — flashinfer_bench/agents/schema.py: Generates OpenAI/Anthropic-compatible tool schemas from Python functions: converts type hints to JSON Schema (supports Optional, Union, List, Dict, Pydantic models), extracts docstring descriptions, and assembles parameter definitions. Public functions: function_to_schema() and get_all_tool_schemas().
  • Dependencies — pyproject.toml: Adds runtime dependency docstring-parser>=0.16.
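The schema generation described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: the demo function and the _TYPE_MAP table are hypothetical, and the real module additionally parses docstrings (via docstring-parser) and handles Pydantic models.

```python
import inspect
import typing

# Hypothetical mapping from Python annotations to JSON Schema type names.
_TYPE_MAP = {int: "integer", float: "number", str: "string", bool: "boolean"}

def function_to_schema(func) -> dict:
    """Convert a function's signature into an OpenAI-style tool schema."""
    hints = typing.get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name, str)
        if typing.get_origin(hint) is typing.Union:
            # Unwrap Optional[X] (i.e. Union[X, None]) to X.
            args = [a for a in typing.get_args(hint) if a is not type(None)]
            hint = args[0]
        properties[name] = {"type": _TYPE_MAP.get(hint, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

def demo(solution_name: str, max_lines: int = 200) -> str:
    """Profile a solution with NCU."""
    return solution_name

schema = function_to_schema(demo)
```

Parameters without defaults land in `required`, which is what lets an LLM know which arguments it must supply when calling the tool.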

Sequence Diagram(s)

sequenceDiagram
    participant Agent as LLM Agent
    participant Runner as flashinfer_bench_run_ncu()
    participant NCU as Nsight Compute
    participant Script as _solution_runner.py
    participant Device as GPU Device
    
    Agent->>Runner: Call with solution & workload
    Runner->>Runner: Validate inputs & pages
    Runner->>Runner: Create temp directory
    Runner->>Runner: Write JSON payloads
    Runner->>Runner: Build NCU command
    Runner->>NCU: Execute profiling command
    NCU->>Script: Invoke solution runner
    Script->>Script: Load Definition/Solution/Workload
    Script->>Script: Build executable solution
    Script->>Script: Generate input tensors
    Script->>Device: Warmup run (JIT compile)
    Script->>Device: NVTX profiling run
    Script->>Device: CUDA synchronize
    Script-->>NCU: Return results
    NCU-->>Runner: Profiling output
    Runner->>Runner: Truncate output (if needed)
    Runner-->>Agent: Return profiling report
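The "Build NCU command" step in the diagram might look roughly like this. This is a sketch under assumptions: the signature and the `runner_cmd` tail are simplified stand-ins for the PR's command builder, using the `--export`, `--page`, `--set`, `--section`, and `--kernel-name` flags discussed elsewhere in this review thread.

```python
from pathlib import Path
from typing import List, Optional

def build_ncu_command(
    ncu_path: str,
    metric_set: str,
    sections: Optional[List[str]],
    kernel_name: Optional[str],
    report_path: Path,
    runner_cmd: List[str],
) -> List[str]:
    """Assemble an NCU invocation that wraps a runner subprocess."""
    cmd = [ncu_path, "--export", str(report_path), "--page", "details",
           "--set", metric_set]
    for section in sections or []:
        cmd += ["--section", section]          # e.g. "MemoryWorkloadAnalysis"
    if kernel_name:
        cmd += ["--kernel-name", kernel_name]  # regex filter on kernel names
    return cmd + runner_cmd                    # profiled application comes last
```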

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • YiyanZhai

Poem

🐰 Hops of joy through profiling paths,
With schemas stitched from docstrings' craft,
NCU commands in tempfiles stored,
Solution runners on GPU soared! ✨
LLM agents now debug with glee. 🎯

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: The title 'Support NCU tool' directly matches the PR's primary objective of adding and exposing the NCU tool for profiling solutions.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 83.33%, which meets the required threshold of 80.00%.




@gemini-code-assist
Contributor

Summary of Changes

Hello @Ubospica, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates NVIDIA Nsight Compute (NCU) profiling capabilities into the flashinfer_bench framework. It enables users to profile the performance of defined solutions and workloads using NCU, facilitating performance analysis and optimization. The changes include a new agent tool for orchestrating NCU runs, a standalone runner script, and an environment variable for configuring the NCU executable path.

Highlights

  • NCU Tool Integration: Introduces a new agent tool to support NVIDIA Nsight Compute (NCU) profiling for solutions within the framework.
  • Standalone NCU Runner: Adds _ncu_runner.py, a new script designed to execute solutions in a controlled environment specifically for NCU profiling.
  • Programmatic Profiling API: Provides the run_ncu function in ncu.py for programmatically invoking NCU with various configurable options such as profiling sets, sections, kernel filters, and timeouts.
  • Configurable NCU Path: Adds get_fib_ncu_path in env.py to allow specifying the NCU executable path via an environment variable (FIB_NCU_PATH), defaulting to 'ncu' if not set.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for NVIDIA's Nsight Compute (NCU) profiler, allowing for detailed performance analysis of kernels. This is a significant addition, providing a new run_ncu tool within the flashinfer_bench.agents module. The implementation includes a dedicated runner script (_ncu_runner.py) that is invoked by the main ncu.py module, which constructs the NCU command and handles its execution.

My review focuses on improving the maintainability and robustness of the new NCU integration. I've suggested renaming a parameter that shadows a Python built-in and proposed a significant improvement to the error handling logic to ensure that profiling failures are easier to debug. Overall, the changes are well-structured and provide a valuable new capability.

    solution: Solution,
    workload: Workload,
    *,
    set: str = "detailed",

medium

The parameter set shadows the built-in set type in Python. It's a best practice to avoid using names of built-ins for variables. I suggest renaming it to ncu_set for clarity and to prevent potential issues.

This change should be applied consistently in the following places:

  • run_ncu signature (this line)
  • run_ncu body (lines 161, 196)
  • _build_ncu_command signature (line 50)
  • _build_ncu_command body (line 58)
Suggested change
set: str = "detailed",
ncu_set: str = "detailed",

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @flashinfer_bench/agents/ncu.py:
- Around line 16-45: VALID_SETS is missing the "source" entry causing incorrect
validation; update the VALID_SETS set in agents/ncu.py to include the string
"source" (i.e., add "source" to the VALID_SETS definition) and optionally add a
comment noting NCU version dependency or replace static validation with a
runtime query via `ncu --list-sets`/`ncu --list-sections` if you prefer dynamic
validation; ensure references to VALID_SETS elsewhere in the module remain
unchanged.
🧹 Nitpick comments (2)
flashinfer_bench/agents/ncu.py (2)

80-89: Consider removing leading newline in truncation message.

The truncation message on line 88 includes a leading \n, which may add an unintended blank line. Consider:

-    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
+    truncated.append(f"[Output truncated: {remaining} more lines, use max_lines=None to see all]")

214-217: Clarify error handling logic.

The error check at lines 214-217 tests for errors but then does nothing (pass). The comment "Keep error messages visible" suggests the intent is to return the output even on error, but this could be more explicit:

-    # Check for errors
-    if result.returncode != 0 and "==ERROR==" in output:
-        # Keep error messages visible
-        pass
+    # Note: Return output even on error to show NCU error messages to the user

Alternatively, if specific error handling is needed, implement it here.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6bb3ecc and ae742ee.

📒 Files selected for processing (4)
  • flashinfer_bench/agents/__init__.py
  • flashinfer_bench/agents/_ncu_runner.py
  • flashinfer_bench/agents/ncu.py
  • flashinfer_bench/env.py
🧰 Additional context used
🧬 Code graph analysis (2)
flashinfer_bench/agents/__init__.py (1)
flashinfer_bench/agents/ncu.py (1)
  • run_ncu (92-223)
flashinfer_bench/agents/ncu.py (2)
flashinfer_bench/data/trace_set.py (2)
  • TraceSet (24-478)
  • from_path (86-146)
flashinfer_bench/env.py (1)
  • get_fib_ncu_path (60-70)
🪛 Ruff (0.14.10)
flashinfer_bench/agents/ncu.py

162-162: Avoid specifying long messages outside the exception class

(TRY003)


168-168: Avoid specifying long messages outside the exception class

(TRY003)


173-176: Avoid specifying long messages outside the exception class

(TRY003)


205-205: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.11
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
🔇 Additional comments (4)
flashinfer_bench/env.py (1)

60-70: LGTM!

The function follows the established pattern in this file and correctly provides configurable NCU path resolution with a sensible default.

flashinfer_bench/agents/__init__.py (1)

1-5: LGTM!

The module correctly re-exports run_ncu for public use, following standard Python packaging conventions.

flashinfer_bench/agents/_ncu_runner.py (1)

14-51: LGTM!

The runner script correctly implements the NCU profiling workflow:

  • Properly validates input data using Pydantic models
  • Handles conditional safetensors loading
  • Performs JIT warmup before profiling
  • Uses appropriate CUDA synchronization

The lack of explicit error handling is acceptable for a subprocess runner where exceptions should propagate to the caller.
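A minimal sketch of the runner's entry path, assuming the payload layout implied above. `load_payload`, `build_solution`, and `make_inputs` are hypothetical names for illustration; the CUDA/NVTX phase is shown as comments because it needs torch and a GPU.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"definition", "solution", "workload"}

def load_payload(path: Path) -> dict:
    """Load the JSON payload that the NCU orchestrator writes for the runner."""
    data = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"payload missing keys: {sorted(missing)}")
    return data

# Profiling phase sketch (requires torch and a CUDA device):
#     runner = build_solution(payload["solution"])  # hypothetical helper
#     inputs = make_inputs(payload["workload"])     # hypothetical helper
#     runner(*inputs)                               # warmup: triggers JIT compile
#     torch.cuda.nvtx.range_push("fib_profile")
#     runner(*inputs)                               # the run NCU captures
#     torch.cuda.nvtx.range_pop()
#     torch.cuda.synchronize()
```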

flashinfer_bench/agents/ncu.py (1)

48-77: LGTM!

The command builder correctly constructs the NCU invocation with appropriate flags and validation. The use of -u for unbuffered Python output ensures real-time visibility during profiling.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @flashinfer_bench/agents/ncu.py:
- Around line 80-89: The _truncate_output function currently appends a
truncation message with a leading newline after slicing lines[:max_lines],
causing the final output to exceed max_lines; fix it by ensuring the truncation
message is counted within the max_lines limit—either replace the last retained
line with the message (e.g., set truncated[-1] to the formatted truncation
message) or slice one fewer line (lines[:max_lines-1]) and then append the
message without a leading newline; remove the leading "\n" before the message so
the final join returns exactly max_lines lines when truncation occurs.
🧹 Nitpick comments (4)
flashinfer_bench/agents/ncu.py (4)

48-54: Rename parameter set to avoid shadowing Python builtin.

The parameter name set shadows the built-in set type, which can cause confusion and makes the builtin inaccessible within the function scope.

♻️ Rename to `metric_set` or `ncu_set`
 def _build_ncu_command(
     data_path: Path,
-    set: str,
+    metric_set: str,
     sections: Optional[List[str]],
     kernel_name: Optional[str],
     report_path: Path,
 ) -> List[str]:
     """Build the NCU command line."""
     ncu_path = get_fib_ncu_path()

-    cmd = [ncu_path, "--export", str(report_path), "--page", "details", "--set", set]
+    cmd = [ncu_path, "--export", str(report_path), "--page", "details", "--set", metric_set]

Also update the call site at line 196 and the run_ncu function signature at line 96.


92-103: Rename parameter set to avoid shadowing Python builtin.

Consistent with the comment on _build_ncu_command, the set parameter should be renamed to avoid shadowing the built-in type.

See the refactoring suggestion in the earlier comment on lines 48-54.


98-98: Consider validating kernel_name parameter.

The kernel_name parameter is passed directly to NCU without validation. While NCU uses this as a regex filter and should handle invalid input gracefully, consider adding basic validation to provide clearer error messages for common mistakes.

💡 Optional validation example
if kernel_name is not None:
    if not kernel_name.strip():
        raise ValueError("kernel_name cannot be empty or whitespace")
    # Additional checks like detecting unescaped special regex chars could be added

This is optional since NCU should handle invalid patterns, but it could improve the user experience.


15-45: Consider implementing runtime validation of NCU sets and sections rather than hardcoded constants.

The current VALID_SETS and VALID_SECTIONS constants appear accurate for recent Nsight Compute versions, but they lack version checking. If a different NCU version is used (via FIB_NCU_PATH), these hardcoded values may become incorrect or incomplete. Consider querying ncu --list-sets and ncu --list-sections at runtime to dynamically validate user input against the actual installed NCU, or explicitly document which NCU version(s) these constants support.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae742ee and 718cce9.

📒 Files selected for processing (1)
  • flashinfer_bench/agents/ncu.py
🧰 Additional context used
🧬 Code graph analysis (1)
flashinfer_bench/agents/ncu.py (2)
flashinfer_bench/data/trace_set.py (2)
  • TraceSet (24-478)
  • from_path (86-146)
flashinfer_bench/env.py (1)
  • get_fib_ncu_path (60-70)
🪛 Ruff (0.14.10)
flashinfer_bench/agents/ncu.py

162-162: Avoid specifying long messages outside the exception class

(TRY003)


168-168: Avoid specifying long messages outside the exception class

(TRY003)


173-176: Avoid specifying long messages outside the exception class

(TRY003)


205-205: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
🔇 Additional comments (5)
flashinfer_bench/agents/ncu.py (5)

1-13: LGTM!

The imports are well-organized and appropriate for NCU profiling functionality.


161-177: LGTM! Comprehensive input validation.

The validation logic properly checks the set and sections parameters against predefined constants and verifies the definition exists in the trace database. The error messages are clear and actionable.

Note: The static analysis hint (TRY003) about exception messages is a style suggestion that can be safely ignored for this use case, as the inline messages provide good context.


204-209: Document security implications of FIB_NCU_PATH environment variable.

The subprocess call uses ncu_path from get_fib_ncu_path(), which reads the FIB_NCU_PATH environment variable. If an attacker controls this variable, they can execute arbitrary commands. While this is a design tradeoff for tool flexibility, it should be clearly documented.

Consider adding a security note to the docstring:

    Environment Variables
    ---------------------
    FIB_DATASET_PATH : str
        Path to the trace dataset. The solution's definition must exist in this dataset.
    FIB_NCU_PATH : str, optional
        Path to the NCU executable. Defaults to "ncu".
        ⚠️  Security: Only set this variable in trusted environments, as it controls
        which executable is invoked.

The static analysis warning (S603) is valid but acceptable given the tool's design. The use of a list-based command (avoiding shell=True) prevents shell injection attacks on other parameters.


180-192: LGTM! Proper data serialization.

The temporary directory creation and JSON data preparation are well-structured. Using model_dump(mode="json") ensures proper Pydantic model serialization.


211-223: Good error handling and output management.

The error handling correctly:

  • Combines stdout and stderr for comprehensive output
  • Returns full output on errors (line 217) to aid debugging
  • Applies truncation only to successful runs

Comment on lines +80 to +89
def _truncate_output(output: str, max_lines: int) -> str:
    """Truncate output to max_lines."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output

    truncated = lines[:max_lines]
    remaining = len(lines) - max_lines
    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
    return "\n".join(truncated)

⚠️ Potential issue | 🟡 Minor

Truncation produces more lines than max_lines specifies.

The function appends a truncation message with a leading newline after taking max_lines lines, resulting in max_lines + 2 total lines in the output. This may confuse users who expect exactly max_lines.

🔧 Fix to respect `max_lines` limit

Option 1: Include the message in the count

 def _truncate_output(output: str, max_lines: int) -> str:
     """Truncate output to max_lines."""
     lines = output.split("\n")
     if len(lines) <= max_lines:
         return output

-    truncated = lines[:max_lines]
+    truncated = lines[:max_lines - 1]
     remaining = len(lines) - max_lines
-    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
+    truncated.append(f"[Output truncated: {remaining + 1} more lines, use max_lines=None to see all]")
     return "\n".join(truncated)

Option 2: Clarify the behavior in docstring and parameter name

-def _truncate_output(output: str, max_lines: int) -> str:
-    """Truncate output to max_lines."""
+def _truncate_output(output: str, max_lines: int) -> str:
+    """Truncate output to approximately max_lines (may include truncation notice)."""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _truncate_output(output: str, max_lines: int) -> str:
    """Truncate output to max_lines."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output

    truncated = lines[:max_lines]
    remaining = len(lines) - max_lines
    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
    return "\n".join(truncated)

def _truncate_output(output: str, max_lines: int) -> str:
    """Truncate output to max_lines."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output

    truncated = lines[:max_lines - 1]
    remaining = len(lines) - max_lines
    truncated.append(f"[Output truncated: {remaining + 1} more lines, use max_lines=None to see all]")
    return "\n".join(truncated)
🤖 Prompt for AI Agents
In @flashinfer_bench/agents/ncu.py around lines 80 - 89, The _truncate_output
function currently appends a truncation message with a leading newline after
slicing lines[:max_lines], causing the final output to exceed max_lines; fix it
by ensuring the truncation message is counted within the max_lines limit—either
replace the last retained line with the message (e.g., set truncated[-1] to the
formatted truncation message) or slice one fewer line (lines[:max_lines-1]) and
then append the message without a leading newline; remove the leading "\n"
before the message so the final join returns exactly max_lines lines when
truncation occurs.

Signed-off-by: Ubospica <ubospica@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @flashinfer_bench/agents/ncu.py:
- Around line 84-94: The _truncate_output function incorrectly skips truncation
for max_lines=0 and inserts an extra blank line before the truncation marker;
change the logic to treat max_lines=None or max_lines<0 as "no truncation"
(return the full output), but allow max_lines=0 to return no original lines plus
a single truncation marker, and remove the leading newline in the marker so it
is appended as its own line (e.g., use "[Output truncated: ...]" not
"\n[Output...]"); apply the same fix to the other truncation occurrence around
lines 233-236 (update the same function/logic there).
- Around line 51-81: In _build_ncu_command replace the literal "python" string
used to invoke the runner with the current interpreter via sys.executable:
import sys at the module top if not already present and change the cmd extension
that currently contains ["python", "-u", "-m",
"flashinfer_bench.agents._ncu_runner", "--data", str(data_path)] to use
sys.executable instead of "python" so the virtualenv/interpreter in use is
honored.
🧹 Nitpick comments (1)
flashinfer_bench/agents/ncu.py (1)

96-160: Align error-handling with the docstring (currently returns strings/output; does not raise RuntimeError).
Either update the docstring to explicitly document “returns an ERROR: … string / returns output on failure”, or switch to raising exceptions for timeout / missing NCU / non-zero exit.

Also applies to: 214-237

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 718cce9 and 640bae5.

📒 Files selected for processing (1)
  • flashinfer_bench/agents/ncu.py
🧰 Additional context used
🪛 Ruff (0.14.10)
flashinfer_bench/agents/ncu.py

169-169: Avoid specifying long messages outside the exception class

(TRY003)


173-173: Avoid specifying long messages outside the exception class

(TRY003)


179-179: Avoid specifying long messages outside the exception class

(TRY003)


184-187: Avoid specifying long messages outside the exception class

(TRY003)


216-216: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.11
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
🔇 Additional comments (2)
flashinfer_bench/agents/ncu.py (2)

167-180: Validation approach looks solid (allowlists + helpful messages).
Good tradeoff for an LLM-exposed tool: constrain set/page/sections to known-safe values.


51-81: Validation for --page, --set, and --section is already in place; no further changes needed.

The codebase already defines valid values for these flags (lines 15-48) and validates them at runtime (lines 168-179) with clear error messages. The documented supported values align with NVIDIA's Nsight Compute CLI documentation. The runtime validation approach is sufficient for handling compatibility.

Comment on lines +84 to +94
def _truncate_output(output: str, max_lines: int) -> str:
    """Truncate output to max_lines."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output

    truncated = lines[:max_lines]
    remaining = len(lines) - max_lines
    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
    return "\n".join(truncated)


⚠️ Potential issue | 🟠 Major

Fix truncation edge-cases (max_lines=0, negatives) and remove the extra blank line in the truncation marker.
if max_lines: skips truncation for 0, and the f"\n[Output truncated...]" introduces an extra blank line.

Proposed diff
@@
 def _truncate_output(output: str, max_lines: int) -> str:
     """Truncate output to max_lines."""
+    if max_lines < 0:
+        raise ValueError("max_lines must be >= 0")
     lines = output.split("\n")
     if len(lines) <= max_lines:
         return output
@@
     truncated = lines[:max_lines]
     remaining = len(lines) - max_lines
-    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
+    truncated.append(f"[Output truncated: {remaining} more lines, use max_lines=None to see all]")
     return "\n".join(truncated)
@@
-        if max_lines:
+        if max_lines is not None:
             output = _truncate_output(output, max_lines)

Also applies to: 233-236

🤖 Prompt for AI Agents
In @flashinfer_bench/agents/ncu.py around lines 84 - 94, The _truncate_output
function incorrectly skips truncation for max_lines=0 and inserts an extra blank
line before the truncation marker; change the logic to treat max_lines=None or
max_lines<0 as "no truncation" (return the full output), but allow max_lines=0
to return no original lines plus a single truncation marker, and remove the
leading newline in the marker so it is appended as its own line (e.g., use
"[Output truncated: ...]" not "\n[Output...]"); apply the same fix to the other
truncation occurrence around lines 233-236 (update the same function/logic
there).

@Ubospica Ubospica marked this pull request as draft January 11, 2026 07:46
Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Ubospica <ubospica@gmail.com>
@Ubospica Ubospica marked this pull request as ready for review January 15, 2026 03:07
Signed-off-by: Ubospica <ubospica@gmail.com>
@Ubospica
Collaborator Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the NCU profiling tool, primarily for use by LLM agents. It adds a standalone solution runner for profiling, new agent functions to interact with NCU, and a schema generator to make these functions available as tools. The code is well-structured, documented, and includes good error handling. I have a couple of suggestions to further improve the robustness and API design of the new NCU module.

Comment on lines +50 to +58
try:
    sets_result = subprocess.run(
        [ncu_path, "--list-sets"], capture_output=True, text=True, timeout=10
    )
    sections_result = subprocess.run(
        [ncu_path, "--list-sections"], capture_output=True, text=True, timeout=10
    )
except subprocess.TimeoutExpired:
    return "ERROR: NCU command timed out."

medium

The current implementation doesn't check the return codes of the ncu subprocess calls. If a command fails, its error output is concatenated into the result string, but the function doesn't explicitly signal an error (e.g., by returning a string prefixed with ERROR:). This is inconsistent with the error handling in flashinfer_bench_run_ncu and can be misleading for the calling agent. Checking the returncode and returning a formatted error message would make the tool more robust and provide clearer feedback.

Suggested change
try:
    sets_result = subprocess.run(
        [ncu_path, "--list-sets"], capture_output=True, text=True, timeout=10
    )
    sections_result = subprocess.run(
        [ncu_path, "--list-sections"], capture_output=True, text=True, timeout=10
    )
except subprocess.TimeoutExpired:
    return "ERROR: NCU command timed out."

try:
    sets_result = subprocess.run(
        [ncu_path, "--list-sets"], capture_output=True, text=True, timeout=10
    )
    if sets_result.returncode != 0:
        return f"ERROR: 'ncu --list-sets' failed:\n{sets_result.stdout}{sets_result.stderr}"
    sections_result = subprocess.run(
        [ncu_path, "--list-sections"], capture_output=True, text=True, timeout=10
    )
    if sections_result.returncode != 0:
        return f"ERROR: 'ncu --list-sections' failed:\n{sections_result.stdout}{sections_result.stderr}"
except subprocess.TimeoutExpired:
    return "ERROR: NCU command timed out."


def flashinfer_bench_run_ncu(
    solution: Union[Solution, str],
    workload: Union[Workload, str],

medium

This function has many optional parameters. To improve the API's clarity and prevent accidental misuse (like passing arguments in the wrong order), it's a good practice to make these optional parameters keyword-only. You can achieve this by adding a * after the last positional argument (workload).

Suggested change
workload: Union[Workload, str],
workload: Union[Workload, str],
*,

Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Ubospica <ubospica@gmail.com>
@Ubospica Ubospica merged commit 4035a6e into flashinfer-ai:main Jan 19, 2026
9 checks passed
@Ubospica
Collaborator Author

Merging this first for agile development

@coderabbitai coderabbitai bot mentioned this pull request Jan 20, 2026