Support NCU tool #139
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit will avoid duplicating their findings in the review comments, which may lead to a less comprehensive review.

📝 Walkthrough

Introduces agent tools for LLM-based kernel development and debugging, including an Nsight Compute profiling workflow, a solution runner script, and OpenAI/Anthropic-compatible tool schema generation. Adds module exports, input validation, command construction, subprocess execution, and type-aware schema generation with docstring parsing support.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Agent as LLM Agent
    participant Runner as flashinfer_bench_run_ncu()
    participant NCU as Nsight Compute
    participant Script as _solution_runner.py
    participant Device as GPU Device
    Agent->>Runner: Call with solution & workload
    Runner->>Runner: Validate inputs & pages
    Runner->>Runner: Create temp directory
    Runner->>Runner: Write JSON payloads
    Runner->>Runner: Build NCU command
    Runner->>NCU: Execute profiling command
    NCU->>Script: Invoke solution runner
    Script->>Script: Load Definition/Solution/Workload
    Script->>Script: Build executable solution
    Script->>Script: Generate input tensors
    Script->>Device: Warmup run (JIT compile)
    Script->>Device: NVTX profiling run
    Script->>Device: CUDA synchronize
    Script-->>NCU: Return results
    NCU-->>Runner: Profiling output
    Runner->>Runner: Truncate output (if needed)
    Runner-->>Agent: Return profiling report
```
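The walkthrough mentions OpenAI/Anthropic-compatible tool schema generation with docstring parsing. As a rough, hypothetical sketch of the general technique (not the PR's actual implementation; the function name and type mapping here are assumptions), such a generator can be built on `inspect`:

```python
import inspect
from typing import Any, Callable, Dict

# Hypothetical sketch; the PR's real generator may differ substantially.
_TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def function_to_openai_schema(fn: Callable) -> Dict[str, Any]:
    """Build an OpenAI-style tool schema from a function's signature and docstring."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": _TYPE_MAP.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }
```

An agent harness could then pass `function_to_openai_schema(flashinfer_bench_run_ncu)` in its `tools` list.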
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 3 passed
Summary of Changes

Hello @Ubospica, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request integrates NVIDIA Nsight Compute (NCU) profiling capabilities into the flashinfer_bench project.
Code Review
This pull request introduces support for NVIDIA's Nsight Compute (NCU) profiler, allowing for detailed performance analysis of kernels. This is a significant addition, providing a new run_ncu tool within the flashinfer_bench.agents module. The implementation includes a dedicated runner script (_ncu_runner.py) that is invoked by the main ncu.py module, which constructs the NCU command and handles its execution.
My review focuses on improving the maintainability and robustness of the new NCU integration. I've suggested renaming a parameter that shadows a Python built-in and proposed a significant improvement to the error handling logic to ensure that profiling failures are easier to debug. Overall, the changes are well-structured and provide a valuable new capability.
```python
    solution: Solution,
    workload: Workload,
    *,
    set: str = "detailed",
```
The parameter `set` shadows the built-in `set` type in Python. It's a best practice to avoid using names of built-ins for variables. I suggest renaming it to `ncu_set` for clarity and to prevent potential issues.

This change should be applied consistently in the following places:

- `run_ncu` signature (this line)
- `run_ncu` body (lines 161, 196)
- `_build_ncu_command` signature (line 50)
- `_build_ncu_command` body (line 58)
| set: str = "detailed", | |
| ncu_set: str = "detailed", |
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @flashinfer_bench/agents/ncu.py:
- Around line 16-45: VALID_SETS is missing the "source" entry causing incorrect
validation; update the VALID_SETS set in agents/ncu.py to include the string
"source" (i.e., add "source" to the VALID_SETS definition) and optionally add a
comment noting NCU version dependency or replace static validation with a
runtime query via `ncu --list-sets`/`ncu --list-sections` if you prefer dynamic
validation; ensure references to VALID_SETS elsewhere in the module remain
unchanged.
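A sketch of the corrected constant follows; every name besides "source" is an assumption about the existing definition, and set availability varies by NCU version:

```python
# Verify against `ncu --list-sets` for the installed Nsight Compute version.
VALID_SETS = {"basic", "detailed", "full", "roofline", "source"}
```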
🧹 Nitpick comments (2)
flashinfer_bench/agents/ncu.py (2)
80-89: Consider removing leading newline in truncation message.

The truncation message on line 88 includes a leading `\n`, which may add an unintended blank line. Consider:

```diff
- truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
+ truncated.append(f"[Output truncated: {remaining} more lines, use max_lines=None to see all]")
```

214-217: Clarify error handling logic.

The error check at lines 214-217 tests for errors but then does nothing (`pass`). The comment "Keep error messages visible" suggests the intent is to return the output even on error, but this could be more explicit:

```diff
-    # Check for errors
-    if result.returncode != 0 and "==ERROR==" in output:
-        # Keep error messages visible
-        pass
+    # Note: Return output even on error to show NCU error messages to the user
```

Alternatively, if specific error handling is needed, implement it here.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- flashinfer_bench/agents/__init__.py
- flashinfer_bench/agents/_ncu_runner.py
- flashinfer_bench/agents/ncu.py
- flashinfer_bench/env.py
🧰 Additional context used
🧬 Code graph analysis (2)
flashinfer_bench/agents/__init__.py (1)
- flashinfer_bench/agents/ncu.py (1)
  - run_ncu (92-223)

flashinfer_bench/agents/ncu.py (2)
- flashinfer_bench/data/trace_set.py (2)
  - TraceSet (24-478)
  - from_path (86-146)
- flashinfer_bench/env.py (1)
  - get_fib_ncu_path (60-70)
🪛 Ruff (0.14.10)
flashinfer_bench/agents/ncu.py
162-162: Avoid specifying long messages outside the exception class
(TRY003)
168-168: Avoid specifying long messages outside the exception class
(TRY003)
173-176: Avoid specifying long messages outside the exception class
(TRY003)
205-205: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.11
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
🔇 Additional comments (4)
flashinfer_bench/env.py (1)
60-70: LGTM! The function follows the established pattern in this file and correctly provides configurable NCU path resolution with a sensible default.

flashinfer_bench/agents/__init__.py (1)

1-5: LGTM! The module correctly re-exports `run_ncu` for public use, following standard Python packaging conventions.

flashinfer_bench/agents/_ncu_runner.py (1)

14-51: LGTM! The runner script correctly implements the NCU profiling workflow:
- Properly validates input data using Pydantic models
- Handles conditional safetensors loading
- Performs JIT warmup before profiling
- Uses appropriate CUDA synchronization
The lack of explicit error handling is acceptable for a subprocess runner where exceptions should propagate to the caller.
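As an illustration of that workflow, here is a minimal sketch of the warmup-plus-NVTX pattern (assuming PyTorch; `run_kernel` is a hypothetical stand-in for the built solution):

```python
import torch

def profile_once(run_kernel, inputs: dict) -> None:
    run_kernel(**inputs)      # warmup run: triggers any JIT compilation
    torch.cuda.synchronize()  # ensure compilation/warmup has finished
    with torch.cuda.nvtx.range("fib_profile"):  # range NCU can filter on
        run_kernel(**inputs)  # the run that NCU captures
    torch.cuda.synchronize()  # flush the profiled kernel before exiting
```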
flashinfer_bench/agents/ncu.py (1)

48-77: LGTM! The command builder correctly constructs the NCU invocation with appropriate flags and validation. The use of `-u` for unbuffered Python output ensures real-time visibility during profiling.
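For orientation, based on the snippet quoted later in this review (`--export`, `--page details`, `--set`), the profiler half of the command might be assembled like this. This is a sketch, not the PR's exact code; `--section` and `--kernel-name` are standard ncu CLI flags:

```python
from pathlib import Path
from typing import List, Optional

def build_profiler_flags(
    metric_set: str,
    sections: Optional[List[str]],
    kernel_name: Optional[str],
    report_path: Path,
    ncu_path: str = "ncu",
) -> List[str]:
    # Shape mirrors the snippet quoted in this review; a sketch only.
    cmd = [ncu_path, "--export", str(report_path), "--page", "details", "--set", metric_set]
    for section in sections or []:
        cmd += ["--section", section]
    if kernel_name is not None:
        cmd += ["--kernel-name", kernel_name]
    return cmd
```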
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @flashinfer_bench/agents/ncu.py:
- Around line 80-89: The _truncate_output function currently appends a
truncation message with a leading newline after slicing lines[:max_lines],
causing the final output to exceed max_lines; fix it by ensuring the truncation
message is counted within the max_lines limit—either replace the last retained
line with the message (e.g., set truncated[-1] to the formatted truncation
message) or slice one fewer line (lines[:max_lines-1]) and then append the
message without a leading newline; remove the leading "\n" before the message so
the final join returns exactly max_lines lines when truncation occurs.
🧹 Nitpick comments (4)
flashinfer_bench/agents/ncu.py (4)
48-54: Rename parameter `set` to avoid shadowing Python builtin.

The parameter name `set` shadows the built-in `set` type, which can cause confusion and makes the builtin inaccessible within the function scope.

♻️ Rename to `metric_set` or `ncu_set`:

```diff
 def _build_ncu_command(
     data_path: Path,
-    set: str,
+    metric_set: str,
     sections: Optional[List[str]],
     kernel_name: Optional[str],
     report_path: Path,
 ) -> List[str]:
     """Build the NCU command line."""
     ncu_path = get_fib_ncu_path()
-    cmd = [ncu_path, "--export", str(report_path), "--page", "details", "--set", set]
+    cmd = [ncu_path, "--export", str(report_path), "--page", "details", "--set", metric_set]
```

Also update the call site at line 196 and the `run_ncu` function signature at line 96.

92-103: Rename parameter `set` to avoid shadowing Python builtin.

Consistent with the comment on `_build_ncu_command`, the `set` parameter should be renamed to avoid shadowing the built-in type. See the refactoring suggestion in the earlier comment on lines 48-54.

98-98: Consider validating the `kernel_name` parameter.

The `kernel_name` parameter is passed directly to NCU without validation. While NCU uses this as a regex filter and should handle invalid input gracefully, consider adding basic validation to provide clearer error messages for common mistakes.

💡 Optional validation example:

```python
if kernel_name is not None:
    if not kernel_name.strip():
        raise ValueError("kernel_name cannot be empty or whitespace")
    # Additional checks like detecting unescaped special regex chars could be added
```

This is optional since NCU should handle invalid patterns, but it could improve the user experience.

15-45: Consider implementing runtime validation of NCU sets and sections rather than hardcoded constants.

The current `VALID_SETS` and `VALID_SECTIONS` constants appear accurate for recent Nsight Compute versions, but they lack version checking. If a different NCU version is used (via `FIB_NCU_PATH`), these hardcoded values may become incorrect or incomplete. Consider querying `ncu --list-sets` and `ncu --list-sections` at runtime to dynamically validate user input against the actual installed NCU, or explicitly document which NCU version(s) these constants support.
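A sketch of that dynamic validation (the output parsing is an assumption about `ncu --list-sets` formatting and may need adjusting per Nsight Compute version):

```python
import re
import subprocess
from typing import Set

def query_ncu_set_names(ncu_path: str = "ncu") -> Set[str]:
    """Query the installed NCU for its available set names."""
    result = subprocess.run(
        [ncu_path, "--list-sets"], capture_output=True, text=True, timeout=10
    )
    names = set()
    for line in result.stdout.splitlines():
        # Assumes the set identifier is the first token on each row;
        # adjust for the table header/format of your NCU version.
        match = re.match(r"\s*([A-Za-z][\w-]*)\b", line)
        if match:
            names.add(match.group(1))
    return names
```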
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
flashinfer_bench/agents/ncu.py
🧰 Additional context used
🧬 Code graph analysis (1)
flashinfer_bench/agents/ncu.py (2)
- flashinfer_bench/data/trace_set.py (2)
  - TraceSet (24-478)
  - from_path (86-146)
- flashinfer_bench/env.py (1)
  - get_fib_ncu_path (60-70)
🪛 Ruff (0.14.10)
flashinfer_bench/agents/ncu.py
162-162: Avoid specifying long messages outside the exception class
(TRY003)
168-168: Avoid specifying long messages outside the exception class
(TRY003)
173-176: Avoid specifying long messages outside the exception class
(TRY003)
205-205: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
🔇 Additional comments (5)
flashinfer_bench/agents/ncu.py (5)
1-13: LGTM! The imports are well-organized and appropriate for NCU profiling functionality.

161-177: LGTM! Comprehensive input validation.

The validation logic properly checks the `set` and `sections` parameters against predefined constants and verifies the definition exists in the trace database. The error messages are clear and actionable.

Note: The static analysis hint (TRY003) about exception messages is a style suggestion that can be safely ignored for this use case, as the inline messages provide good context.

204-209: Document security implications of the `FIB_NCU_PATH` environment variable.

The subprocess call uses `ncu_path` from `get_fib_ncu_path()`, which reads the `FIB_NCU_PATH` environment variable. If an attacker controls this variable, they can execute arbitrary commands. While this is a design tradeoff for tool flexibility, it should be clearly documented.

Consider adding a security note to the docstring:

```
Environment Variables
---------------------
FIB_DATASET_PATH : str
    Path to the trace dataset. The solution's definition must exist in this dataset.
FIB_NCU_PATH : str, optional
    Path to the NCU executable. Defaults to "ncu".
    ⚠️ Security: Only set this variable in trusted environments, as it controls
    which executable is invoked.
```

The static analysis warning (S603) is valid but acceptable given the tool's design. The use of a list-based command (avoiding `shell=True`) prevents shell injection attacks on other parameters.
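For reference, the behavior described here amounts to something like the following sketch; the actual `env.py` implementation may differ:

```python
import os

def get_fib_ncu_path() -> str:
    # FIB_NCU_PATH overrides the executable; defaults to "ncu" on PATH.
    # Only set this variable in trusted environments (see note above).
    return os.environ.get("FIB_NCU_PATH", "ncu")
```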
180-192: LGTM! Proper data serialization.

The temporary directory creation and JSON data preparation are well-structured. Using `model_dump(mode="json")` ensures proper Pydantic model serialization.
211-223: Good error handling and output management.

The error handling correctly:
- Combines stdout and stderr for comprehensive output
- Returns full output on errors (line 217) to aid debugging
- Applies truncation only to successful runs
```python
def _truncate_output(output: str, max_lines: int) -> str:
    """Truncate output to max_lines."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output

    truncated = lines[:max_lines]
    remaining = len(lines) - max_lines
    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
    return "\n".join(truncated)
```
Truncation produces more lines than max_lines specifies.
The function appends a truncation message with a leading newline after taking `max_lines` lines, resulting in `max_lines + 2` total lines in the output. This may confuse users who expect exactly `max_lines`.
🔧 Fix to respect `max_lines` limit
Option 1: Include the message in the count
```diff
 def _truncate_output(output: str, max_lines: int) -> str:
     """Truncate output to max_lines."""
     lines = output.split("\n")
     if len(lines) <= max_lines:
         return output
-    truncated = lines[:max_lines]
+    truncated = lines[:max_lines - 1]
     remaining = len(lines) - max_lines
-    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
+    truncated.append(f"[Output truncated: {remaining + 1} more lines, use max_lines=None to see all]")
     return "\n".join(truncated)
```

Option 2: Clarify the behavior in docstring and parameter name

```diff
-def _truncate_output(output: str, max_lines: int) -> str:
-    """Truncate output to max_lines."""
+def _truncate_output(output: str, max_lines: int) -> str:
+    """Truncate output to approximately max_lines (may include truncation notice)."""
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def _truncate_output(output: str, max_lines: int) -> str:
    """Truncate output to max_lines."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output
    truncated = lines[:max_lines - 1]
    remaining = len(lines) - max_lines
    truncated.append(f"[Output truncated: {remaining + 1} more lines, use max_lines=None to see all]")
    return "\n".join(truncated)
```
🤖 Prompt for AI Agents
In @flashinfer_bench/agents/ncu.py around lines 80 - 89, The _truncate_output
function currently appends a truncation message with a leading newline after
slicing lines[:max_lines], causing the final output to exceed max_lines; fix it
by ensuring the truncation message is counted within the max_lines limit—either
replace the last retained line with the message (e.g., set truncated[-1] to the
formatted truncation message) or slice one fewer line (lines[:max_lines-1]) and
then append the message without a leading newline; remove the leading "\n"
before the message so the final join returns exactly max_lines lines when
truncation occurs.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @flashinfer_bench/agents/ncu.py:
- Around line 84-94: The _truncate_output function incorrectly skips truncation
for max_lines=0 and inserts an extra blank line before the truncation marker;
change the logic to treat max_lines=None or max_lines<0 as "no truncation"
(return the full output), but allow max_lines=0 to return no original lines plus
a single truncation marker, and remove the leading newline in the marker so it
is appended as its own line (e.g., use "[Output truncated: ...]" not
"\n[Output...]"); apply the same fix to the other truncation occurrence around
lines 233-236 (update the same function/logic there).
- Around line 51-81: In _build_ncu_command replace the literal "python" string
used to invoke the runner with the current interpreter via sys.executable:
import sys at the module top if not already present and change the cmd extension
that currently contains ["python", "-u", "-m",
"flashinfer_bench.agents._ncu_runner", "--data", str(data_path)] to use
sys.executable instead of "python" so the virtualenv/interpreter in use is
honored.
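Applied to the command construction, the fix amounts to roughly this (the data path shown is hypothetical):

```python
import sys

# Use the active interpreter so virtualenvs are honored.
runner_cmd = [
    sys.executable, "-u", "-m",
    "flashinfer_bench.agents._ncu_runner",
    "--data", "/tmp/fib_ncu/data.json",  # hypothetical data path
]
```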
🧹 Nitpick comments (1)
flashinfer_bench/agents/ncu.py (1)
96-160: Align error handling with the docstring (currently returns strings/output; does not raise `RuntimeError`).

Either update the docstring to explicitly document "returns an ERROR: … string / returns output on failure", or switch to raising exceptions for timeout / missing NCU / non-zero exit.

Also applies to: 214-237
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
flashinfer_bench/agents/ncu.py
🧰 Additional context used
🪛 Ruff (0.14.10)
flashinfer_bench/agents/ncu.py
169-169: Avoid specifying long messages outside the exception class
(TRY003)
173-173: Avoid specifying long messages outside the exception class
(TRY003)
179-179: Avoid specifying long messages outside the exception class
(TRY003)
184-187: Avoid specifying long messages outside the exception class
(TRY003)
216-216: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.11
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
🔇 Additional comments (2)
flashinfer_bench/agents/ncu.py (2)
167-180: Validation approach looks solid (allowlists + helpful messages).

Good tradeoff for an LLM-exposed tool: constrain `set`/`page`/`sections` to known-safe values.

51-81: Validation for `--page`, `--set`, and `--section` is already in place; no further changes needed.

The codebase already defines valid values for these flags (lines 15-48) and validates them at runtime (lines 168-179) with clear error messages. The documented supported values align with NVIDIA's Nsight Compute CLI documentation. The runtime validation approach is sufficient for handling compatibility.
```python
def _truncate_output(output: str, max_lines: int) -> str:
    """Truncate output to max_lines."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output

    truncated = lines[:max_lines]
    remaining = len(lines) - max_lines
    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
    return "\n".join(truncated)
```
Fix truncation edge-cases (max_lines=0, negatives) and remove the extra blank line in the truncation marker.
`if max_lines:` skips truncation for 0, and the `f"\n[Output truncated...]"` prefix introduces an extra blank line.

Proposed diff:

```diff
@@
 def _truncate_output(output: str, max_lines: int) -> str:
     """Truncate output to max_lines."""
+    if max_lines < 0:
+        raise ValueError("max_lines must be >= 0")
     lines = output.split("\n")
     if len(lines) <= max_lines:
         return output
@@
     truncated = lines[:max_lines]
     remaining = len(lines) - max_lines
-    truncated.append(f"\n[Output truncated: {remaining} more lines, use max_lines=None to see all]")
+    truncated.append(f"[Output truncated: {remaining} more lines, use max_lines=None to see all]")
     return "\n".join(truncated)
@@
-    if max_lines:
+    if max_lines is not None:
         output = _truncate_output(output, max_lines)
```

Also applies to: 233-236
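Putting both review rounds' suggestions together, the fixed helper might read as follows (a sketch, not committed code):

```python
from typing import Optional

def _truncate_output(output: str, max_lines: Optional[int]) -> str:
    """Truncate output, counting the truncation marker toward max_lines."""
    if max_lines is None or max_lines < 0:
        return output  # no truncation requested
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output
    keep = max(max_lines - 1, 0)  # reserve one slot for the marker
    remaining = len(lines) - keep
    truncated = lines[:keep]
    truncated.append(
        f"[Output truncated: {remaining} more lines, use max_lines=None to see all]"
    )
    return "\n".join(truncated)
```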
🤖 Prompt for AI Agents
In @flashinfer_bench/agents/ncu.py around lines 84 - 94, The _truncate_output
function incorrectly skips truncation for max_lines=0 and inserts an extra blank
line before the truncation marker; change the logic to treat max_lines=None or
max_lines<0 as "no truncation" (return the full output), but allow max_lines=0
to return no original lines plus a single truncation marker, and remove the
leading newline in the marker so it is appended as its own line (e.g., use
"[Output truncated: ...]" not "\n[Output...]"); apply the same fix to the other
truncation occurrence around lines 233-236 (update the same function/logic
there).
Signed-off-by: Ubospica <ubospica@gmail.com>
/gemini review
Code Review
This pull request introduces support for the NCU profiling tool, primarily for use by LLM agents. It adds a standalone solution runner for profiling, new agent functions to interact with NCU, and a schema generator to make these functions available as tools. The code is well-structured, documented, and includes good error handling. I have a couple of suggestions to further improve the robustness and API design of the new NCU module.
```python
try:
    sets_result = subprocess.run(
        [ncu_path, "--list-sets"], capture_output=True, text=True, timeout=10
    )
    sections_result = subprocess.run(
        [ncu_path, "--list-sections"], capture_output=True, text=True, timeout=10
    )
except subprocess.TimeoutExpired:
    return "ERROR: NCU command timed out."
```
The current implementation doesn't check the return codes of the `ncu` subprocess calls. If a command fails, its error output is concatenated into the result string, but the function doesn't explicitly signal an error (e.g., by returning a string prefixed with `ERROR:`). This is inconsistent with the error handling in `flashinfer_bench_run_ncu` and can be misleading for the calling agent. Checking the `returncode` and returning a formatted error message would make the tool more robust and provide clearer feedback.
Suggested change:

```python
    try:
        sets_result = subprocess.run(
            [ncu_path, "--list-sets"], capture_output=True, text=True, timeout=10
        )
        if sets_result.returncode != 0:
            return f"ERROR: 'ncu --list-sets' failed:\n{sets_result.stdout}{sets_result.stderr}"
        sections_result = subprocess.run(
            [ncu_path, "--list-sections"], capture_output=True, text=True, timeout=10
        )
        if sections_result.returncode != 0:
            return f"ERROR: 'ncu --list-sections' failed:\n{sections_result.stdout}{sections_result.stderr}"
    except subprocess.TimeoutExpired:
        return "ERROR: NCU command timed out."
```
```python
def flashinfer_bench_run_ncu(
    solution: Union[Solution, str],
    workload: Union[Workload, str],
```
This function has many optional parameters. To improve the API's clarity and prevent accidental misuse (like passing arguments in the wrong order), it's a good practice to make these optional parameters keyword-only. You can achieve this by adding a `*` after the last positional argument (`workload`).
```diff
     workload: Union[Workload, str],
+    *,
```
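The resulting signature would look roughly like this; the parameter names and defaults after `*` are drawn from elsewhere in this review, so treat them as approximate:

```python
from typing import List, Optional, Union

def flashinfer_bench_run_ncu(
    solution: Union["Solution", str],
    workload: Union["Workload", str],
    *,  # everything below must now be passed by keyword
    ncu_set: str = "detailed",
    sections: Optional[List[str]] = None,
    kernel_name: Optional[str] = None,
    max_lines: Optional[int] = None,
) -> str:
    ...
```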
Merging this first for agile development |
This PR adds an NCU tool to profile solutions. It can also be called by LLMs to build agents.
Signed-off-by: Ubospica ubospica@gmail.com