feat(swe-bench): implement iterative predictor for SWE-bench #1414
base: develop
- Add IterativeAgent
- Add config_iterative.yml
- Add git tools
- Add SweBenchPredictorIterativeConfig
- Register iterative predictor and git tool
- Update README.md

Signed-off-by: Jerry Guan <[email protected]>
Walkthrough
Adds an iterative agent-based SWE-bench predictor that runs shell commands step by step against a checked-out repo, observes the outputs, and iteratively refines fixes via an LLM until completion or limits are reached, then provides a git-diff patch. Also includes repo management tools and config/registration updates.
Sequence Diagram(s)
sequenceDiagram
participant Client as SWE-Bench Client
participant Predictor as SweBenchPredictor
participant Repo as RepoManager
participant Agent as IterativeAgent
participant LLM as LLM Backend
participant Executor as Command Executor
Client->>Predictor: predict_fn(swebench_input)
Predictor->>Repo: setup_repository(repo_url, commit)
Repo-->>Predictor: RepoContext
Predictor->>Agent: instantiate with config & builder
Predictor->>Agent: run(task_description, repo_path)
loop until COMPLETE or limits
Agent->>LLM: _query_llm(prompt/messages)
LLM-->>Agent: response (one bash code block)
Agent->>Executor: _execute_action(bash_command)
Executor->>Repo: run command in repo workspace
Repo-->>Executor: stdout/stderr/return_code
Executor-->>Agent: observation (truncated if needed)
Agent->>Agent: add_message(assistant,response)
Agent->>Agent: add_message(user,observation)
Agent->>Agent: check for COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT
end
alt Completed
Agent-->>Predictor: (patch, status)
else Error/Timeout/Limits
Agent-->>Predictor: (error_message, status)
end
Predictor->>Repo: cleanup()
Predictor-->>Client: final patch or error
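The loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `run_loop`, `scripted_llm`, the message tuples, and the status strings are hypothetical names, and it assumes the completion sentinel is detected at the start of the command output.

```python
import re
import subprocess
import tempfile
from collections.abc import Callable

SENTINEL = "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT"
FENCE = "`" * 3  # built at runtime so this example contains no literal fence
BASH_BLOCK = re.compile(FENCE + r"bash\n(.*?)\n" + FENCE, re.DOTALL)

Message = tuple[str, str]  # (role, content); hypothetical message shape


def run_loop(llm: Callable[[list[Message]], str],
             repo_path: str,
             step_limit: int = 5,
             timeout: int = 10,
             max_output_length: int = 200) -> tuple[str, str]:
    """Query the LLM, run its single bash block, feed the output back."""
    messages: list[Message] = []
    for _ in range(step_limit):
        response = llm(messages)
        messages.append(("assistant", response))
        blocks = BASH_BLOCK.findall(response)
        if len(blocks) != 1:
            # Recoverable: ask the model to follow the expected format.
            messages.append(("user", "Reply with exactly one bash code block."))
            continue
        try:
            result = subprocess.run(blocks[0], shell=True, cwd=repo_path,
                                    capture_output=True, text=True, timeout=timeout)
            # Truncate long output before it is added to the conversation.
            observation = (result.stdout + result.stderr)[:max_output_length]
        except subprocess.TimeoutExpired:
            observation = "Command timed out."
        messages.append(("user", observation))
        if observation.lstrip().startswith(SENTINEL):
            return observation, "completed"
    return "step limit reached", "limits_exceeded"


def scripted_llm(replies: list[str]) -> Callable[[list[Message]], str]:
    """Stand-in for a real LLM backend: returns canned replies in order."""
    it = iter(replies)
    return lambda messages: next(it)
```

A scripted stand-in for the LLM keeps the sketch runnable without a model backend; a real predictor would return the git diff rather than the raw observation.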
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In
`@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml`:
- Around line 1-6: Add the standard SPDX Apache-2.0 license header as the very
first lines of the YAML (before the "llms" key); update the top of the file
containing the "llms" / "claude_sonnet_llm" entries to begin with the SPDX
Apache-2.0 header so the file complies with the repo policy.
In
`@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py`:
- Around line 76-79: The async function clone_repository uses the synchronous
blocking call Repo.clone_from which will block the event loop; change the
implementation to run Repo.clone_from in a background thread (e.g., via
asyncio.to_thread) and await that result so the function remains async and
non-blocking. Locate the clone_repository function and replace the direct call
to Repo.clone_from(repo_url, target_path) with an awaited asyncio.to_thread call
(or equivalent executor) that invokes Repo.clone_from, and keep the logger.info
call as-is.
- Around line 82-85: The checkout_commit function performs blocking I/O by
calling the synchronous repo.git.checkout; change checkout_commit to have an
explicit return type hint (-> None) and call the blocking operation inside
asyncio.to_thread (e.g., await asyncio.to_thread(repo.git.checkout,
commit_hash)) so the checkout runs off the event loop; keep the logger.info call
and docstring unchanged and reference the function name checkout_commit and the
blocking call repo.git.checkout when making the change.
In
`@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py`:
- Around line 41-53: The git_operations function lacks input validation and
error handling: catch JSONDecodeError around json.loads(args_str) and return or
raise a clear error message, validate presence of 'operation' and for operation
== "setup" ensure required keys 'repo_url' and 'base_commit' exist before
calling repo_manager.setup_repository (raise ValueError or return a descriptive
error if missing), and wrap the repo_manager.setup_repository and
repo_manager.cleanup calls to catch and log exceptions so callers receive
actionable error messages referencing git_operations and
repo_manager.setup_repository/cleanup.
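One way the validation described for `git_operations` might look is sketched below. The return strings, the `FakeRepoManager` stand-in, and passing `repo_manager` as a parameter are assumptions made for illustration; the actual function in the PR may be structured differently.

```python
import asyncio
import json
import logging

logger = logging.getLogger(__name__)


class FakeRepoManager:
    """Hypothetical stand-in for the RepoManager used in the PR."""

    async def setup_repository(self, repo_url: str, base_commit: str) -> str:
        return f"{repo_url}@{base_commit}"

    async def cleanup(self) -> None:
        return None


async def git_operations(args_str: str, repo_manager: FakeRepoManager) -> str:
    """Validate the JSON payload before touching the repository manager."""
    try:
        args = json.loads(args_str)
    except json.JSONDecodeError as e:
        return f"Error: git_operations received invalid JSON - {e!s}"
    if not isinstance(args, dict) or "operation" not in args:
        return "Error: git_operations requires an 'operation' key"

    operation = args["operation"]
    try:
        if operation == "setup":
            missing = [k for k in ("repo_url", "base_commit") if k not in args]
            if missing:
                return f"Error: setup is missing required keys: {', '.join(missing)}"
            context = await repo_manager.setup_repository(args["repo_url"], args["base_commit"])
            return str(context)
        if operation == "cleanup":
            await repo_manager.cleanup()
            return "Cleanup complete"
    except Exception as e:
        # Not re-raised, so logger.exception captures the stack trace once.
        logger.exception("git_operations failed during %s", operation)
        return f"Error: {operation} failed - {e!s}"
    return f"Error: unknown operation {operation!r}"
```

Returning descriptive strings keeps the tool usable by an LLM caller; raising `ValueError` instead, as the comment also allows, would suit programmatic callers better.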
🧹 Nitpick comments (11)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py (1)
24-25: Unused import: `FunctionRef`
`FunctionRef` is imported but not used in this file. Only `LLMRef` is used for the `llm_name` field.
🧹 Remove unused import
 from nat.data_models.common import TypedBaseModel
-from nat.data_models.component_ref import FunctionRef
 from nat.data_models.component_ref import LLMRef
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (2)
37-67: Add type hints per coding guidelines.
The class is missing type hints on `__init__`, `active_repos`, and `cleanup()`. Per coding guidelines, all public APIs require type hints.
📝 Add type hints
 class RepoManager:
+    active_repos: dict[str, RepoContext]
-    def __init__(self, workspace_dir: str):
+    def __init__(self, workspace_dir: str) -> None:
         self.workspace = Path(workspace_dir)
         self.workspace.mkdir(parents=True, exist_ok=True)
-        self.active_repos = {}
+        self.active_repos: dict[str, RepoContext] = {}
     # ... setup_repository unchanged ...
-    async def cleanup(self):
+    async def cleanup(self) -> None:
         """Clean up all managed repositories."""
25-34: Misleading docstring: not a context manager.
The docstring states "Context manager for repository operations" but `RepoContext` is a plain dataclass without `__enter__`/`__exit__` methods. Consider updating the docstring to reflect its actual purpose as a data container.
📝 Fix docstring
 @dataclass
 class RepoContext:
-    """Context manager for repository operations."""
+    """Data container holding repository state and paths."""
     repo_url: str
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py (2)
25-29: Redundant `_type` field.
The `_type` field is redundant since `TypedBaseModel` (parent of `FunctionBaseConfig`) already manages the type discriminator via `name="git_repo_tool"`. This creates potential confusion with two type fields.
🧹 Remove redundant field
 class GitRepoToolConfig(FunctionBaseConfig, name="git_repo_tool"):
     """Configuration for git repository management tool."""
-    _type: typing.Literal["git_repo_tool"] = "git_repo_tool"
     workspace_dir: str = "./.workspace"  # Base directory for cloning repositories
     cleanup_on_exit: bool = True  # Whether to clean up repos after use
32-60: Unused `builder` parameter is acceptable for interface consistency.
The `builder` parameter is unused (as flagged by static analysis) but is likely required by the `register_function` decorator's expected signature. The cleanup pattern using `try`/`finally` is well implemented.
Consider adding a return type hint for the async generator:
📝 Add return type hint
+from collections.abc import AsyncGenerator
+
 @register_function(config_type=GitRepoToolConfig)
-async def git_repo_tool(tool_config: GitRepoToolConfig, builder: Builder):
+async def git_repo_tool(tool_config: GitRepoToolConfig, builder: Builder) -> AsyncGenerator[FunctionInfo, None]:
     """Git repository management tool for SWE Bench."""
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py (6)
76-81: Consider adding docstrings for the configuration fields.
The dataclass lacks documentation for its fields. While the class docstring exists, individual field descriptions would improve clarity.
📝 Suggested improvement
 @dataclass
 class IterativeAgentConfig:
     """Configuration for the iterative agent."""
-    step_limit: int = 250
-    timeout: int = 60
-    max_output_length: int = 10000
+    step_limit: int = 250  # Maximum number of agent steps before termination
+    timeout: int = 60  # Command execution timeout in seconds
+    max_output_length: int = 10000  # Maximum characters before output truncation
105-110: Add type hint for `llm` parameter.
The `llm` parameter lacks a type annotation. Per coding guidelines, all public APIs require type hints on parameters.
📝 Suggested fix
-    def __init__(self, llm, repo_path: Path, config: IterativeAgentConfig):
+    def __init__(self, llm: typing.Any, repo_path: Path, config: IterativeAgentConfig):
         self.llm = llm
         self.repo_path = repo_path
         self.config = config
-        self.messages: list = []
+        self.messages: list[SystemMessage | HumanMessage | AIMessage] = []
         self.n_steps = 0
Note: Add `import typing` at the top if not already present. Ideally, use the actual LLM interface type if available from the framework.
360-363: Chain exception and use explicit conversion.
Per coding guidelines, use `raise ... from err` to preserve the exception chain and use explicit conversion flag instead of `str(e)`.
🔧 Proposed fix
 except Exception as e:
     logger.error("LLM invocation failed: %s", e, exc_info=True)
-    # recoverable error, let the agent continue
-    raise NonTerminatingException(f"LLM call failed: {str(e)}")
+    # recoverable error, let the agent continue
+    raise NonTerminatingException(f"LLM call failed: {e!s}") from e
414-427: Chain exceptions and narrow the exception type.
Multiple issues flagged by static analysis:
- Missing exception chaining at lines 425 and 427
- Catching broad `Exception` at line 426 masks specific errors
🔧 Proposed fix
 except (TimeoutError, subprocess.TimeoutExpired) as e:
     # Extract output from exception if available (only subprocess.TimeoutExpired has output attribute)
     if isinstance(e, subprocess.TimeoutExpired) and hasattr(e, "output") and e.output:
         output = e.output.decode("utf-8", errors="replace")
     else:
         output = ""
     # Format timeout message using template
     timeout_message = self._TIMEOUT_TEMPLATE.format(
         action=command, output=output
     )
-    raise ExecutionTimeoutError(timeout_message)
-except Exception as e:
-    raise NonTerminatingException(f"Error executing command: {str(e)}")
+    raise ExecutionTimeoutError(timeout_message) from e
+except OSError as e:
+    raise NonTerminatingException(f"Error executing command: {e!s}") from e
Using `OSError` (or `subprocess.SubprocessError`) is more appropriate than catching all exceptions, as it covers typical subprocess failures without masking unexpected errors.
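To illustrate what chaining preserves, here is a small self-contained sketch. The two exception classes are stand-ins that only share names with those in the PR, and `run_command` is a hypothetical helper rather than the agent's actual `_execute_action`.

```python
import subprocess
import sys


class NonTerminatingException(Exception):
    """Stand-in for the recoverable error type named in the review."""


class ExecutionTimeoutError(Exception):
    """Stand-in for the timeout error type named in the review."""


def run_command(command: list[str], timeout: float) -> str:
    """Run a command and translate low-level failures into agent errors."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as e:
        # "from e" stores the original exception on __cause__.
        raise ExecutionTimeoutError(f"Timed out after {timeout}s") from e
    except OSError as e:
        # Covers a missing executable, permission errors, and similar failures.
        raise NonTerminatingException(f"Error executing command: {e!s}") from e
    return result.stdout


if __name__ == "__main__":
    try:
        run_command(["definitely-not-a-real-binary-xyz"], timeout=1)
    except NonTerminatingException as err:
        print(type(err.__cause__).__name__, file=sys.stderr)
```

With chaining in place, the traceback shows both the agent-level error and the underlying cause, which is what B904 asks for.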
462-464: Remove redundant exception object from `logger.exception`.
When using `logger.exception()`, the exception info is automatically included. Including `e` as an argument is redundant (TRY401).
🔧 Proposed fix
 except Exception as e:
-    logger.exception("Failed to setup repository: %s", e)
-    return f"Error: Failed to setup repository - {str(e)}"
+    logger.exception("Failed to setup repository")
+    return f"Error: Failed to setup repository - {e!s}"
493-495: Remove redundant exception object and use explicit conversion.
Same pattern as above - `logger.exception()` automatically includes exception info.
🔧 Proposed fix
 except Exception as e:
-    logger.exception(f"Error processing {swebench_input.instance_id}: {e}")
-    return f"Error: {str(e)}"
+    logger.exception("Error processing %s", swebench_input.instance_id)
+    return f"Error: {e!s}"
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
examples/evaluation_and_profiling/swe_bench/README.md
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/__init__.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/__init__.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/register_tools.py
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{md,mdx}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{md,mdx}: Use 'NVIDIA NeMo Agent toolkit' for full name (first use), 'NeMo Agent toolkit' or 'the toolkit' for subsequent references, and 'Toolkit' (capital T) in titles/headings, 'toolkit' (lowercase t) in body text
Never use deprecated names: 'Agent Intelligence toolkit', 'aiqtoolkit', 'AgentIQ', 'AIQ', or 'aiq' in documentation; update any occurrences unless intentionally referring to deprecated versions or implementing compatibility layers
Files:
examples/evaluation_and_profiling/swe_bench/README.md
**/*.{md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{md,mdx,rst}: Documentation must be clear, comprehensive, and free of TODOs, FIXMEs, placeholder text, offensive or outdated terms, and spelling mistakes
Do not use words listed in 'ci/vale/styles/config/vocabularies/nat/reject.txt' in documentation
Words listed in 'ci/vale/styles/config/vocabularies/nat/accept.txt' are acceptable even if they appear to be spelling mistakes
Files:
examples/evaluation_and_profiling/swe_bench/README.md
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}: Every file must start with the standard SPDX Apache-2.0 header
Confirm that copyright years are up-to-date whenever a file is changed
All source files must include the SPDX Apache-2.0 header template
Files:
examples/evaluation_and_profiling/swe_bench/README.md
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/register_tools.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
**/*.{py,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Version numbers are derived automatically by 'setuptools-scm'; never hard-code them in code or docs
Files:
examples/evaluation_and_profiling/swe_bench/README.md
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/register_tools.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values (except for return values of `None`, in that situation no return type hint is needed). Example: `def my_function(param1: int, param2: str) -> bool: pass`
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare `raise` statements to maintain the original stack trace, and use `logger.error()` (not `logger.exception()`) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use `logger.exception()` to capture the full stack trace information.
Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any words listed in the `ci/vale/styles/config/vocabularies/nat/reject.txt` file, words that might appear to be spelling mistakes but are listed in the `ci/vale/styles/config/vocabularies/nat/accept.txt` file are OK.
- Documentation in Markdown files should not contain usage of a possessive 's with inanimate objects (ex: "the system's performance" should be "the performance of the system").
- Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit. The exception to this rule is when referring to package names or code identifiers that contain "nat", th...
Files:
examples/evaluation_and_profiling/swe_bench/README.md
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/register_tools.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*:
- This directory contains example code and usage scenarios for the toolkit, at a minimum an example should contain a README.md or file README.ipynb.
- If an example contains Python code, it should be placed in a subdirectory named `src/` and should contain a `pyproject.toml` file. Optionally, it might also contain scripts in a `scripts/` directory.
- If an example contains YAML files, they should be placed in a subdirectory named `configs/`.
- If an example contains sample data files, they should be placed in a subdirectory named `data/`, and should be checked into git-lfs.
Files:
examples/evaluation_and_profiling/swe_bench/README.md
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/register_tools.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf with PEP 8 base and 'column_limit = 120' for code formatting
Use 'ruff check --fix' for linting with configuration from 'pyproject.toml', fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer 'collections.abc' / 'typing' abstractions (e.g., 'Sequence' over 'list') for type hints
Use 'typing.Annotated' for units or extra metadata when useful
Treat 'pyright' warnings (configured in 'pyproject.toml') as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare 'raise' statements when re-raising, and use 'logger.error()' for logging (not 'logger.exception()') to avoid duplicate stack trace output
When catching and logging exceptions without re-raising, always use 'logger.exception()' (equivalent to 'logger.error(exc_info=True)') to capture full stack trace information
Pydantic models using 'SecretStr', 'SerializableSecretStr', or 'OptionalSecretStr' should use 'default=None' for optional fields and 'default_factory=lambda: SerializableSecretStr("")' for non-optional fields to avoid initialization bugs
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period
Surround code entities in docstrings with backticks to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer 'httpx' with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use 'async'/'await' for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with 'functools.lru_cache' or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/register_tools.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py
**/*.{py,yaml,yml,json,toml}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Indent with 4 spaces (never tabs) and ensure every file ends with a single newline
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/register_tools.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
🧠 Learnings (1)
📚 Learning: 2025-12-12T20:49:44.305Z
Learnt from: zterek
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1243
File: examples/risk_and_security/retail_agent/src/nat_retail_agent/configs/red-teaming.yml:1-98
Timestamp: 2025-12-12T20:49:44.305Z
Learning: In the NVIDIA/NeMo-Agent-Toolkit repository, YAML files generally use 2-space indentation. When reviewing YAML, prefer 2-space indentation to match the existing style over a 4-space guideline until a repo-wide standardization is performed. This applies to YAML configuration files (e.g., red-teaming.yml) and, more broadly, all *.yml files in the project.
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
🧬 Code graph analysis (5)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py (2)
src/nat/data_models/common.py (3)
- TypedBaseModel (96-171)
- static_type (157-158)
- discriminator (165-171)
src/nat/data_models/component_ref.py (2)
- FunctionRef (94-102)
- LLMRef (116-124)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py (3)
src/nat/builder/function_info.py (2)
- FunctionInfo (290-625)
- from_fn (552-625)
src/nat/data_models/function.py (1)
- FunctionBaseConfig (26-36)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (3)
- RepoManager (37-67)
- setup_repository (44-58)
- cleanup (60-67)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (1)
src/nat/runtime/runner.py (1)
- context (93-94)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/register.py (1)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py (1)
- SweBenchPredictor (431-502)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py (3)
src/nat/builder/builder.py (1)
- Builder (84-811)
src/nat/builder/framework_enum.py (1)
- LLMFrameworkEnum (19-27)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py (1)
- SweBenchWorkflowConfig (51-52)
🪛 Ruff (0.14.11)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
33-33: Unused function argument: builder
(ARG001)
53-53: Avoid specifying long messages outside the exception class
(TRY003)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py
127-127: Avoid specifying long messages outside the exception class
(TRY003)
327-327: Avoid specifying long messages outside the exception class
(TRY003)
359-359: Consider moving this statement to an else block
(TRY300)
363-363: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
363-363: Avoid specifying long messages outside the exception class
(TRY003)
363-363: Use explicit conversion flag
Replace with conversion flag
(RUF010)
379-379: subprocess call with shell=True identified, security issue
(S602)
412-412: Consider moving this statement to an else block
(TRY300)
425-425: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
426-426: Do not catch blind exception: Exception
(BLE001)
427-427: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
427-427: Avoid specifying long messages outside the exception class
(TRY003)
427-427: Use explicit conversion flag
Replace with conversion flag
(RUF010)
463-463: Redundant exception object included in logging.exception call
(TRY401)
464-464: Use explicit conversion flag
Replace with conversion flag
(RUF010)
494-494: Redundant exception object included in logging.exception call
(TRY401)
495-495: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (11)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py (1)
40-49: LGTM!
The `SweBenchPredictorIterativeConfig` follows the established pattern, with appropriate type hints and field descriptions. The discriminated union is correctly extended to include the new iterative variant.
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (1)
70-73: LGTM!
Simple helper with proper type hints.
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/register_tools.py (1)
19-19: LGTM!
The import correctly triggers registration of the `git_repo_tool` via its decorator, following the established pattern in this file.
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/register.py (1)
20-20: The import follows the established pattern for predictor registration.
The addition of `IterativePredictor` on line 20 mirrors the existing `GoldPredictor` import on line 19, with proper `flake8: noqa` directives to allow unused imports (which are intentionally present to trigger registration side-effects). The file maintains proper Apache 2.0 licensing and copyright headers.
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/predict_iterative.py (4)
1-14: LGTM!
License header is correctly formatted with SPDX identifier and Apache 2.0 license text.
52-74: LGTM!
Well-structured exception hierarchy with clear separation between recoverable (`NonTerminatingException`) and terminal (`TerminatingException`) conditions. This enables clean control flow in the agent loop.
376-389: Acknowledged: `shell=True` is intentional for this agent design.
The static analysis flags `shell=True` as a security concern (S602). However, this is expected behavior since the agent is explicitly designed to execute arbitrary bash commands provided by the LLM. The commands are sandboxed within the repository workspace directory.
497-502: LGTM!
The `_build_task_description` method is clean and well-typed with appropriate type hints.
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml (2)
23-35: LGTM!
Workflow and functions configuration is well-structured with appropriate iterative predictor settings and git tool configuration. The 2-space indentation follows repository conventions. Based on learnings, this matches the existing YAML style in the repository.
37-70: LGTM!
Evaluation configuration is properly structured with dataset filtering for focused testing. The single-instance allowlist is appropriate for initial development/testing.
examples/evaluation_and_profiling/swe_bench/README.md (1)
159-159: LGTM!
Documentation entry for the iterative predictor follows the established pattern and provides clear, concise description with appropriate links to implementation and configuration files.
...ion_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
async def checkout_commit(repo: Repo, commit_hash: str):
    """Checkout a specific commit in the repository."""
    logger.info("Checking out commit %s", commit_hash)
    repo.git.checkout(commit_hash)
Same blocking I/O issue; also missing return type hint.
repo.git.checkout() is synchronous. Wrap with asyncio.to_thread(). Per coding guidelines, add return type hint.
🔧 Proposed fix
-async def checkout_commit(repo: Repo, commit_hash: str):
+async def checkout_commit(repo: Repo, commit_hash: str) -> None:
     """Checkout a specific commit in the repository."""
     logger.info("Checking out commit %s", commit_hash)
-    repo.git.checkout(commit_hash)
+    await asyncio.to_thread(repo.git.checkout, commit_hash)
🤖 Prompt for AI Agents
In
`@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py`
around lines 82 - 85, The checkout_commit function performs blocking I/O by
calling the synchronous repo.git.checkout; change checkout_commit to have an
explicit return type hint (-> None) and call the blocking operation inside
asyncio.to_thread (e.g., await asyncio.to_thread(repo.git.checkout,
commit_hash)) so the checkout runs off the event loop; keep the logger.info call
and docstring unchanged and reference the function name checkout_commit and the
blocking call repo.git.checkout when making the change.
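The effect of moving a blocking call off the event loop can be sketched as follows. This is a minimal illustration, not the PR's code: `blocking_checkout` is a hypothetical stand-in for a synchronous call such as `repo.git.checkout`, and it returns a string only so the result is visible (the reviewed function returns `None`).

```python
import asyncio
import time


def blocking_checkout(commit_hash: str) -> str:
    """Hypothetical stand-in for a synchronous call such as repo.git.checkout."""
    time.sleep(0.2)  # simulate blocking disk or network work
    return f"checked out {commit_hash}"


async def heartbeat(ticks: list[int]) -> None:
    """Append ticks; this only makes progress if the event loop is not blocked."""
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0.05)


async def checkout_commit(commit_hash: str) -> str:
    """Run the blocking call in a worker thread via asyncio.to_thread."""
    return await asyncio.to_thread(blocking_checkout, commit_hash)


async def main() -> tuple[str, list[int]]:
    ticks: list[int] = []
    # Both coroutines run concurrently; the checkout does not stall the heartbeat.
    result, _ = await asyncio.gather(checkout_commit("abc123"), heartbeat(ticks))
    return result, ticks


result, ticks = asyncio.run(main())
print(result, ticks)
```

Calling `blocking_checkout` directly inside the coroutine would still produce the same result, but the heartbeat could not run until the sleep finished.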
...ion_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
Outdated
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py`:
- Around line 69-74: Docstring for SweBenchWorkflowConfig incorrectly lists a
"full" predictor option that doesn't exist; update the docstring to reflect the
actual predictor options defined in SweBenchPredictorConfig (e.g., "gold,
skeleton, or iterative") or enumerate the exact union variants from
SweBenchPredictorConfig so the docs and code are consistent; locate the class
SweBenchWorkflowConfig and its docstring and replace "full, gold, skeleton, or
iterative" with the correct set of predictor types from SweBenchPredictorConfig.
- Around line 24-25: Remove the unused import FunctionRef from the top of the
module: delete the "FunctionRef" import token in the import statement that
currently reads "from nat.data_models.component_ref import FunctionRef" so only
LLMRef remains imported (referenced symbol: FunctionRef).
In
`@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py`:
- Around line 71-74: get_repo_path currently builds a path from only the repo
name causing collisions; update get_repo_path to parse the repo URL and extract
the owner/organization component (e.g., the segment immediately preceding the
repo name for HTTPS and the part after ":" for SSH forms) and return
Path(workspace_dir) / owner / repo_name so repositories with the same name under
different orgs are distinct; ensure you handle URLs like
"https://host/org/repo.git" and "git@host:org/repo.git" and strip ".git" from
repo_name.
In
`@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py`:
- Around line 32-38: The git_repo_tool function declares an unused parameter
named builder; rename it to _builder to follow the codebase convention for
intentionally unused parameters (update the function signature async def
git_repo_tool(tool_config: GitRepoToolConfig, _builder: Builder): and any
references in the decorator/register_function call if necessary) so
linters/readers know it is intentionally unused.
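The `get_repo_path` change requested above can be sketched as follows. This is one possible approach under stated assumptions: the helper `split_owner_and_repo` is hypothetical, the hosts in the usage lines are placeholders, and the real function in `git_tool.py` may have a different signature.

```python
from pathlib import Path
from urllib.parse import urlparse


def split_owner_and_repo(repo_url: str) -> tuple[str, str]:
    """Return (owner, repo_name) for HTTPS-style or scp-like SSH Git URLs."""
    if "://" in repo_url:
        # e.g. https://host/org/repo.git
        path = urlparse(repo_url).path
    elif ":" in repo_url:
        # scp-like form, e.g. git@host:org/repo.git
        path = repo_url.split(":", 1)[1]
    else:
        raise ValueError(f"Unrecognized repository URL: {repo_url!r}")

    parts = [part for part in path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"Cannot find owner and repo in: {repo_url!r}")

    owner, repo_name = parts[-2], parts[-1]
    return owner, repo_name.removesuffix(".git")


def get_repo_path(workspace_dir: str, repo_url: str) -> Path:
    """Build workspace/owner/repo so same-named repos from different orgs do not collide."""
    owner, repo_name = split_owner_and_repo(repo_url)
    return Path(workspace_dir) / owner / repo_name


https_path = get_repo_path("/tmp/ws", "https://example.com/org-a/repo.git")
ssh_path = get_repo_path("/tmp/ws", "git@example.com:org-b/repo.git")
print(https_path, ssh_path)
```

Both URLs name a repository called `repo`, yet the resulting paths differ because the owner segment is part of the path.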
♻️ Duplicate comments (1)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml (1)
16-28: Duplicate llms key causes configuration to be overwritten.
The YAML has two separate llms: keys (lines 16 and 23). With duplicate keys at the same level, many YAML parsers silently keep only the last one, meaning nim_llm will be discarded and only claude_sonnet_llm will be available.
Additionally, nim_llm uses 1-space indentation while claude_sonnet_llm uses 2-space indentation. Per learnings, the repository uses 2-space indentation for YAML files.
🔧 Proposed fix - merge into single llms block with consistent 2-space indentation
-llms:
- nim_llm:
-  _type: nim
-  model_name: mistralai/mistral-nemotron
-  temperature: 0.6
-  max_tokens: 4096
-
-llms:
-  claude_sonnet_llm:
-    _type: litellm
-    model_name: anthropic/claude-sonnet-4-5-20250929
-    temperature: 0.0
-    api_key: "${ANTHROPIC_API_KEY}" # Set this environment variable before running
+llms:
+  nim_llm:
+    _type: nim
+    model_name: mistralai/mistral-nemotron
+    temperature: 0.6
+    max_tokens: 4096
+
+  claude_sonnet_llm:
+    _type: litellm
+    model_name: anthropic/claude-sonnet-4-5-20250929
+    temperature: 0.0
+    api_key: "${ANTHROPIC_API_KEY}" # Set this environment variable before running
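A duplicated top-level key like the one flagged above can be caught before the config is loaded. The sketch below is a line-based heuristic using only the standard library, not a YAML parser: it assumes top-level keys are plain, unquoted, and start in column 0. The function name and sample text are illustrative.

```python
import re
from collections import Counter

# A plain key starting in column 0, e.g. "llms:" (an assumption about the file style).
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*)\s*:")


def find_duplicate_top_level_keys(yaml_text: str) -> list[str]:
    """Return top-level keys that appear more than once, in first-seen order."""
    counts: Counter[str] = Counter()
    for line in yaml_text.splitlines():
        match = TOP_LEVEL_KEY.match(line)
        if match:
            counts[match.group(1)] += 1
    return [key for key, count in counts.items() if count > 1]


sample = """\
llms:
 nim_llm:
  _type: nim

llms:
  claude_sonnet_llm:
    _type: litellm

workflow:
  _type: swe_bench
"""

duplicates = find_duplicate_top_level_keys(sample)
print(duplicates)
```

A linter such as YAMLlint performs this check properly (see the `key-duplicates` finding later in this review); the heuristic here only shows what is being detected.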
🧹 Nitpick comments (3)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (2)
84-88: Add return type hint for checkout_commit.
Per coding guidelines, all public APIs require type hints on return values. This function returns None implicitly.
🔧 Proposed fix
-async def checkout_commit(repo: Repo, commit_hash: str):
+async def checkout_commit(repo: Repo, commit_hash: str) -> None:
     """Checkout a specific commit in the repository."""
     logger.info("Checking out commit %s", commit_hash)
     # Use asyncio.to_thread to avoid blocking the event loop during checkout
     await asyncio.to_thread(repo.git.checkout, commit_hash)
38-43: Add type hint for active_repos dictionary.
Per coding guidelines, type hints are required. The dictionary maps repo paths to RepoContext objects.
🔧 Proposed fix
 class RepoManager:
     def __init__(self, workspace_dir: str):
         self.workspace = Path(workspace_dir)
         self.workspace.mkdir(parents=True, exist_ok=True)
-        self.active_repos = {}
+        self.active_repos: dict[str, RepoContext] = {}
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml (1)
80-82: Remove trailing blank lines.
YAMLlint reports too many blank lines at the end of the file. Files should end with a single newline.
🔧 Proposed fix
 evaluators:
   swe_bench:
     _type: swe_bench
     run_id: nat_iterative_1
     clean: true
-
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,yaml,yml,json,toml}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Indent with 4 spaces (never tabs) and ensure every file ends with a single newline
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}: Every file must start with the standard SPDX Apache-2.0 header
Confirm that copyright years are up-to-date whenever a file is changed
All source files must include the SPDX Apache-2.0 header template
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
**/*
⚙️ CodeRabbit configuration file
**/*:
# Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values (except for return values of None, in that situation no return type hint is needed).
  Example: def my_function(param1: int, param2: str) -> bool: pass
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare raise statements to maintain the original stack trace, and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use logger.exception() to capture the full stack trace information.
# Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.
- Documentation in Markdown files should not contain usage of a possessive 's with inanimate objects (ex: "the system's performance" should be "the performance of the system").
- Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit. The exception to this rule is when referring to package names or code identifiers that contain "nat", th...
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*:
- This directory contains example code and usage scenarios for the toolkit, at a minimum an example should contain a README.md or file README.ipynb.
- If an example contains Python code, it should be placed in a subdirectory named src/ and should contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
- If an example contains YAML files, they should be placed in a subdirectory named configs/.
- If an example contains sample data files, they should be placed in a subdirectory named data/, and should be checked into git-lfs.
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf with PEP 8 base and 'column_limit = 120' for code formatting
Use 'ruff check --fix' for linting with configuration from 'pyproject.toml', fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer 'collections.abc' / 'typing' abstractions (e.g., 'Sequence' over 'list') for type hints
Use 'typing.Annotated' for units or extra metadata when useful
Treat 'pyright' warnings (configured in 'pyproject.toml') as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare 'raise' statements when re-raising, and use 'logger.error()' for logging (not 'logger.exception()') to avoid duplicate stack trace output
When catching and logging exceptions without re-raising, always use 'logger.exception()' (equivalent to 'logger.error(exc_info=True)') to capture full stack trace information
Pydantic models using 'SecretStr', 'SerializableSecretStr', or 'OptionalSecretStr' should use 'default=None' for optional fields and 'default_factory=lambda: SerializableSecretStr("")' for non-optional fields to avoid initialization bugs
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period
Surround code entities in docstrings with backticks to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer 'httpx' with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use 'async'/'await' for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with 'functools.lru_cache' or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
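The two exception-handling rules in the Python guidelines above can be sketched side by side. The logger name and both function names are hypothetical; the default of 250 mirrors the `step_limit` default mentioned elsewhere in this review.

```python
import logging

logger = logging.getLogger("swe_bench_example")  # hypothetical logger name


def parse_step_limit(raw: str) -> int:
    """Re-raise case: log with logger.error() and use a bare raise.

    The caller receives the original traceback, so logging it here as well
    would print the stack trace twice.
    """
    try:
        return int(raw)
    except ValueError:
        logger.error("Invalid step limit: %r", raw)
        raise


def parse_step_limit_or_default(raw: str, default: int = 250) -> int:
    """Swallow case: logger.exception() keeps the traceback in the log.

    Nothing propagates to the caller, so the log record is the only place
    where the stack trace survives.
    """
    try:
        return int(raw)
    except ValueError:
        logger.exception("Invalid step limit %r, falling back to %d", raw, default)
        return default
```

`logger.exception()` attaches the active exception information to the log record, while `logger.error()` without `exc_info` does not, which is the distinction the guideline relies on.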
**/*.{py,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Version numbers are derived automatically by 'setuptools-scm'; never hard-code them in code or docs
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
🧠 Learnings (6)
📚 Learning: 2026-01-05T15:46:49.677Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2026-01-05T15:46:49.677Z
Learning: Applies to **/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst} : Every file must start with the standard SPDX Apache-2.0 header
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
📚 Learning: 2026-01-05T15:46:49.677Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2026-01-05T15:46:49.677Z
Learning: Applies to **/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst} : All source files must include the SPDX Apache-2.0 header template
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
📚 Learning: 2025-12-03T18:42:23.494Z
Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1147
File: packages/nvidia_nat_a2a/pyproject.toml:1-10
Timestamp: 2025-12-03T18:42:23.494Z
Learning: In the packages/ directory, pyproject.toml files typically do not include SPDX license headers. Out of 34 packages, only nvidia_nat_strands is an exception. This pattern differs from the requirement for SPDX headers in source code files (.py, .js, .ts, etc.).
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
📚 Learning: 2025-11-05T11:45:35.119Z
Learnt from: thepatrickchin
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1152
File: examples/config_inheritance/pyproject.toml:1-25
Timestamp: 2025-11-05T11:45:35.119Z
Learning: In the examples/ directory, pyproject.toml files typically do not include SPDX license headers, with only one exception (adk_demo). This is an established pattern that differs from the general guideline requiring SPDX headers in all .toml files.
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
📚 Learning: 2025-12-12T20:49:44.305Z
Learnt from: zterek
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1243
File: examples/risk_and_security/retail_agent/src/nat_retail_agent/configs/red-teaming.yml:1-98
Timestamp: 2025-12-12T20:49:44.305Z
Learning: In the NVIDIA/NeMo-Agent-Toolkit repository, YAML files generally use 2-space indentation. When reviewing YAML, prefer 2-space indentation to match the existing style over a 4-space guideline until a repo-wide standardization is performed. This applies to YAML configuration files (e.g., red-teaming.yml) and, more broadly, all *.yml files in the project.
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
📚 Learning: 2026-01-05T15:46:49.677Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2026-01-05T15:46:49.677Z
Learning: Applies to **/*.py : Use 'async'/'await' for I/O-bound work (HTTP, DB, file reads)
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
🧬 Code graph analysis (3)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py (2)
src/nat/data_models/function.py (1)
FunctionBaseConfig (26-36)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (3)
RepoManager (38-68)
setup_repository (45-59)
cleanup (61-68)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py (3)
src/nat/data_models/common.py (4)
BaseModelRegistryTag (91-93)
TypedBaseModel (96-171)
static_type (157-158)
discriminator (165-171)
src/nat/data_models/component_ref.py (2)
FunctionRef (94-102)
LLMRef (116-124)
src/nat/data_models/function.py (1)
FunctionBaseConfig (26-36)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (1)
src/nat/runtime/runner.py (1)
context(93-94)
🪛 Ruff (0.14.11)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
33-33: Unused function argument: builder
(ARG001)
45-45: Avoid specifying long messages outside the exception class
(TRY003)
51-51: Avoid specifying long messages outside the exception class
(TRY003)
59-59: Avoid specifying long messages outside the exception class
(TRY003)
🪛 YAMLlint (1.37.1)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
[error] 23-23: duplication of key "llms" in mapping
(key-duplicates)
[warning] 82-82: too many blank lines (2 > 0)
(empty-lines)
🔇 Additional comments (5)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (1)
77-81: LGTM - blocking I/O properly handled.
The use of asyncio.to_thread correctly offloads the blocking Repo.clone_from call to a thread pool, preventing event loop blocking.
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py (2)
41-59: Input validation improvements look good.
The JSON parsing is now wrapped in a try-except, and required keys are validated before use. This addresses the past review feedback.
61-66: Good resource cleanup pattern with finally block.
The use of try/finally ensures repo_manager.cleanup() is called even if an exception occurs during the tool's lifecycle, preventing orphaned repository directories.
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py (1)
51-61: LGTM - SweBenchPredictorIterativeConfig is well-structured.
The configuration properly uses LLMRef for the LLM reference, and Field with descriptions for step_limit and timeout. The defaults (250 steps, 60s timeout) are reasonable for iterative problem solving.
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml (1)
1-14: LGTM - SPDX license header is present.
The required Apache-2.0 license header has been added, addressing the past review feedback.
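The input validation and try/finally cleanup praised in the register.py comments above can be sketched together. This is a simplified, synchronous illustration rather than the PR's implementation: the key names in `REQUIRED_KEYS` and both function names are hypothetical, and the real tool is an async function built around `RepoManager`.

```python
import json
import shutil
from pathlib import Path

REQUIRED_KEYS = ("repo_url", "base_commit")  # hypothetical key names


def parse_tool_args(args_str: str) -> dict[str, str]:
    """Parse a JSON string and check that every required key is present."""
    try:
        args = json.loads(args_str)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON input: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("Expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in args]
    if missing:
        raise ValueError(f"Missing required keys: {missing}")
    return args


def run_with_cleanup(args_str: str, workspace: Path) -> str:
    """Create a workspace, do the work, and always remove the workspace."""
    workspace.mkdir(parents=True, exist_ok=True)
    try:
        args = parse_tool_args(args_str)
        return f"{args['repo_url']}@{args['base_commit']}"
    finally:
        # Runs on success and on failure, so no directory is left behind.
        shutil.rmtree(workspace, ignore_errors=True)
```

Because the cleanup sits in `finally`, a malformed input that raises `ValueError` still leaves no workspace directory on disk.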
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
Outdated
...ion_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
Outdated
@register_function(config_type=GitRepoToolConfig)
async def git_repo_tool(tool_config: GitRepoToolConfig, builder: Builder):
    """Git repository management tool for SWE Bench."""
    import json

    from .git_tool import RepoManager
    repo_manager = RepoManager(tool_config.workspace_dir)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, find and read the register.py file
fd "register.py" -p "examples/evaluation_and_profiling/swe_bench"
# Get the full content to understand context
find . -path "*/examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py" -type f -exec wc -l {} \;
Repository: NVIDIA/NeMo-Agent-Toolkit
Length of output: 450
🏁 Script executed:
# Search for all `@register_function` decorated functions to check builder parameter usage
rg "@register_function" -A 3 --type=py
Repository: NVIDIA/NeMo-Agent-Toolkit
Length of output: 50381
🏁 Script executed:
# Find the definition of register_function decorator
rg "def register_function" -B 2 -A 10 --type=py
Repository: NVIDIA/NeMo-Agent-Toolkit
Length of output: 6591
🏁 Script executed:
cat -n examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
Repository: NVIDIA/NeMo-Agent-Toolkit
Length of output: 3277
Prefix unused builder parameter with underscore.
The builder parameter is required by the @register_function interface but is not used in this function. Follow the established pattern in the codebase by renaming it to _builder to indicate intentional non-use.
🔧 Suggested fix
 @register_function(config_type=GitRepoToolConfig)
-async def git_repo_tool(tool_config: GitRepoToolConfig, builder: Builder):
+async def git_repo_tool(tool_config: GitRepoToolConfig, _builder: Builder):
     """Git repository management tool for SWE Bench."""
🧰 Tools
🪛 Ruff (0.14.11)
33-33: Unused function argument: builder
(ARG001)
🤖 Prompt for AI Agents
In
`@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py`
around lines 32 - 38, The git_repo_tool function declares an unused parameter
named builder; rename it to _builder to follow the codebase convention for
intentionally unused parameters (update the function signature async def
git_repo_tool(tool_config: GitRepoToolConfig, _builder: Builder): and any
references in the decorator/register_function call if necessary) so
linters/readers know it is intentionally unused.
Signed-off-by: Jerry Guan <[email protected]>
a39f138 to
dbc2dd6
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
`@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml`:
- Around line 16-28: The YAML defines two separate top-level llms mappings which
causes the first (nim_llm) to be overwritten by the second (claude_sonnet_llm);
merge both entries under a single llms key so both nim_llm and claude_sonnet_llm
are present, and fix nim_llm’s indentation to match the file’s 2-space style;
locate the nim_llm and claude_sonnet_llm blocks and combine them into one llms
mapping preserving their model_name, temperature, max_tokens and api_key fields.
In
`@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py`:
- Around line 33-35: RepoContext.__post_init__ sets repo_path = base_path /
repo_name which omits the organization and disagrees with get_repo_path; update
__post_init__ so repo_path includes the org (e.g., repo_path = base_path /
self.org_name / self.repo_name) or call the existing get_repo_path logic to
compute it, ensuring RepoContext.repo_path matches the path used by
setup_repository and clone operations.
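The `RepoContext` fix described above can be sketched as a dataclass whose `__post_init__` includes the organization in the path. The field set shown here is an assumption based on the names in the comment (`base_path`, `org_name`, `repo_name`, `repo_path`); the actual class in `git_tool.py` may carry other fields.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class RepoContext:
    """Hypothetical reduced version of the RepoContext discussed in the review."""

    base_path: Path
    org_name: str
    repo_name: str
    repo_path: Path = field(init=False)

    def __post_init__(self) -> None:
        # Include the organization so the path matches an org-aware get_repo_path.
        self.repo_path = self.base_path / self.org_name / self.repo_name


first = RepoContext(Path("/tmp/ws"), "org-a", "repo")
second = RepoContext(Path("/tmp/ws"), "org-b", "repo")
print(first.repo_path, second.repo_path)
```

Deriving the path in one place keeps the context object and the clone location in agreement, which is the mismatch the comment points out.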
♻️ Duplicate comments (1)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (1)
88-92: Add missing return type hint.
Per coding guidelines, all public functions require type hints. This async function returns None.
🔧 Proposed fix
-async def checkout_commit(repo: Repo, commit_hash: str):
+async def checkout_commit(repo: Repo, commit_hash: str) -> None:
     """Checkout a specific commit in the repository."""
🧹 Nitpick comments (4)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml (1)
81-82: Remove extra trailing blank line.
YAMLlint reports too many blank lines at the end. Files should end with exactly one newline.
🔧 Proposed fix
     clean: true
-
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py (2)
1-1: Copyright year should be updated to 2025-2026.
Other files in this PR use 2025-2026 in the copyright header. This file uses only 2025.
🔧 Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
32-34: Prefix unused builder parameter with underscore.
The builder parameter is required by the @register_function interface but is unused. Follow the codebase convention by renaming to _builder.
🔧 Proposed fix
 @register_function(config_type=GitRepoToolConfig)
-async def git_repo_tool(tool_config: GitRepoToolConfig, builder: Builder):
+async def git_repo_tool(tool_config: GitRepoToolConfig, _builder: Builder):
     """Git repository management tool for SWE Bench."""
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (1)
1-1: Copyright year should be updated to 2025-2026.
For consistency with other files in this PR.
🔧 Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
🧰 Additional context used
📓 Path-based instructions (6)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf with PEP 8 base and 'column_limit = 120' for code formatting
Use 'ruff check --fix' for linting with configuration from 'pyproject.toml', fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer 'collections.abc' / 'typing' abstractions (e.g., 'Sequence' over 'list') for type hints
Use 'typing.Annotated' for units or extra metadata when useful
Treat 'pyright' warnings (configured in 'pyproject.toml') as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare 'raise' statements when re-raising, and use 'logger.error()' for logging (not 'logger.exception()') to avoid duplicate stack trace output
When catching and logging exceptions without re-raising, always use 'logger.exception()' (equivalent to 'logger.error(exc_info=True)') to capture full stack trace information
Pydantic models using 'SecretStr', 'SerializableSecretStr', or 'OptionalSecretStr' should use 'default=None' for optional fields and 'default_factory=lambda: SerializableSecretStr("")' for non-optional fields to avoid initialization bugs
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period
Surround code entities in docstrings with backticks to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer 'httpx' with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use 'async'/'await' for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with 'functools.lru_cache' or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
**/*.{py,yaml,yml,json,toml}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Indent with 4 spaces (never tabs) and ensure every file ends with a single newline
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}: Every file must start with the standard SPDX Apache-2.0 header
Confirm that copyright years are up-to-date whenever a file is changed
All source files must include the SPDX Apache-2.0 header template
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
**/*.{py,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Version numbers are derived automatically by 'setuptools-scm'; never hard-code them in code or docs
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
**/*
⚙️ CodeRabbit configuration file
**/*:
# Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values (except for return values of None, in that situation no return type hint is needed).
  Example: def my_function(param1: int, param2: str) -> bool: pass
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare raise statements to maintain the original stack trace, and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use logger.exception() to capture the full stack trace information.
# Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.
- Documentation in Markdown files should not contain usage of a possessive 's with inanimate objects (ex: "the system's performance" should be "the performance of the system").
- Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit. The exception to this rule is when referring to package names or code identifiers that contain "nat", th...
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*:
- This directory contains example code and usage scenarios for the toolkit, at a minimum an example should contain a README.md or file README.ipynb.
- If an example contains Python code, it should be placed in a subdirectory named `src/` and should contain a `pyproject.toml` file. Optionally, it might also contain scripts in a `scripts/` directory.
- If an example contains YAML files, they should be placed in a subdirectory named `configs/`.
- If an example contains sample data files, they should be placed in a subdirectory named `data/`, and should be checked into git-lfs.
Files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
🧠 Learnings (6)
📚 Learning: 2026-01-05T15:46:49.677Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2026-01-05T15:46:49.677Z
Learning: Applies to **/*.py : Use 'async'/'await' for I/O-bound work (HTTP, DB, file reads)
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py
📚 Learning: 2026-01-05T15:46:49.677Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2026-01-05T15:46:49.677Z
Learning: Applies to **/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst} : Every file must start with the standard SPDX Apache-2.0 header
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
📚 Learning: 2026-01-05T15:46:49.677Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2026-01-05T15:46:49.677Z
Learning: Applies to **/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst} : All source files must include the SPDX Apache-2.0 header template
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
📚 Learning: 2025-12-03T18:42:23.494Z
Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1147
File: packages/nvidia_nat_a2a/pyproject.toml:1-10
Timestamp: 2025-12-03T18:42:23.494Z
Learning: In the packages/ directory, pyproject.toml files typically do not include SPDX license headers. Out of 34 packages, only nvidia_nat_strands is an exception. This pattern differs from the requirement for SPDX headers in source code files (.py, .js, .ts, etc.).
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
📚 Learning: 2025-11-05T11:45:35.119Z
Learnt from: thepatrickchin
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1152
File: examples/config_inheritance/pyproject.toml:1-25
Timestamp: 2025-11-05T11:45:35.119Z
Learning: In the examples/ directory, pyproject.toml files typically do not include SPDX license headers, with only one exception (adk_demo). This is an established pattern that differs from the general guideline requiring SPDX headers in all .toml files.
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
📚 Learning: 2025-12-12T20:49:44.305Z
Learnt from: zterek
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1243
File: examples/risk_and_security/retail_agent/src/nat_retail_agent/configs/red-teaming.yml:1-98
Timestamp: 2025-12-12T20:49:44.305Z
Learning: In the NVIDIA/NeMo-Agent-Toolkit repository, YAML files generally use 2-space indentation. When reviewing YAML, prefer 2-space indentation to match the existing style over a 4-space guideline until a repo-wide standardization is performed. This applies to YAML configuration files (e.g., red-teaming.yml) and, more broadly, all *.yml files in the project.
Applied to files:
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
🧬 Code graph analysis (3)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py (3)
src/nat/data_models/common.py (4)
BaseModelRegistryTag(91-93), TypedBaseModel(96-171), static_type(157-158), discriminator(165-171)
src/nat/data_models/component_ref.py (1)
LLMRef(116-124)
src/nat/data_models/function.py (1)
FunctionBaseConfig(26-36)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (1)
src/nat/runtime/runner.py (1)
context(93-94)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py (1)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (3)
RepoManager(38-68), setup_repository(45-59), cleanup(61-68)
🪛 Ruff (0.14.11)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py
33-33: Unused function argument: builder
(ARG001)
45-45: Avoid specifying long messages outside the exception class
(TRY003)
51-51: Avoid specifying long messages outside the exception class
(TRY003)
59-59: Avoid specifying long messages outside the exception class
(TRY003)
🪛 YAMLlint (1.37.1)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml
[error] 23-23: duplication of key "llms" in mapping
(key-duplicates)
[warning] 82-82: too many blank lines (2 > 0)
(empty-lines)
🔇 Additional comments (6)
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/config.py (3)
50-60: LGTM! Well-structured iterative predictor configuration. The new `SweBenchPredictorIterativeConfig` is properly defined with appropriate type hints and sensible defaults. The `LLMRef` type correctly references the component reference pattern used elsewhere in the codebase. Minor nit: Line 57 has trailing whitespace after the closing triple quotes.
62-66: Discriminated union correctly extended. The `SweBenchPredictorConfig` union properly includes the new iterative config using the established `Tag`/`Discriminator` pattern.
68-74: Docstring now correctly lists available predictor types. The docstring accurately reflects the available options: "gold, skeleton, or iterative".
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/register.py (2)
41-59: Good input validation and error handling. The function properly validates JSON input, checks for required keys on the `setup` operation, and provides clear error messages. This addresses the concerns from prior reviews.
61-66: Correct async generator pattern with guaranteed cleanup. The `try`/`finally` ensures `cleanup()` runs when the tool is disposed, respecting the `cleanup_on_exit` config flag.
examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py (1)
81-85: LGTM! Blocking I/O correctly wrapped with `asyncio.to_thread`. The clone operation now properly yields to the event loop, addressing the prior review concern.
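The two patterns praised above (offloading a blocking call with `asyncio.to_thread`, and `try`/`finally` cleanup in an async generator) can be sketched together. This is a simplified stand-in, not the PR code: `blocking_clone`, `repo_tool`, and the example URL are hypothetical, and a short `time.sleep` takes the place of a real clone.

```python
import asyncio
import time
from collections.abc import AsyncIterator


def blocking_clone(repo_url: str) -> str:
    """Hypothetical stand-in for a blocking operation such as a git clone."""
    time.sleep(0.05)
    return f"cloned:{repo_url}"


async def repo_tool(repo_url: str, events: list[str]) -> AsyncIterator[str]:
    """Yield a resource and run cleanup when the generator is closed."""
    # The blocking call runs in a worker thread, so the event loop stays free.
    result = await asyncio.to_thread(blocking_clone, repo_url)
    try:
        yield result
    finally:
        # Runs when the consumer closes the generator, even after an error.
        events.append("cleanup")


async def main() -> tuple[str, list[str]]:
    events: list[str] = []
    gen = repo_tool("https://example.invalid/org/repo.git", events)
    result = await gen.__anext__()
    await gen.aclose()  # triggers the finally block above
    return result, events
```

Running `asyncio.run(main())` returns the clone result together with the recorded cleanup event, showing that cleanup happens on disposal rather than on first use.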
| llms: | ||
| nim_llm: | ||
| _type: nim | ||
| model_name: mistralai/mistral-nemotron | ||
| temperature: 0.6 | ||
| max_tokens: 4096 | ||
|
|
||
| llms: | ||
| claude_sonnet_llm: | ||
| _type: litellm | ||
| model_name: anthropic/claude-sonnet-4-5-20250929 | ||
| temperature: 0.0 | ||
| api_key: "${ANTHROPIC_API_KEY}" # Set this environment variable before running |
Duplicate llms key causes first LLM configuration to be silently overwritten.
The YAML has two separate `llms:` keys (lines 16 and 23). The YAML specification requires mapping keys to be unique, but many parsers (PyYAML among them) accept duplicates without an error and keep only the last one, so `nim_llm` will be silently discarded and only `claude_sonnet_llm` will be available at runtime.
Additionally, the nim_llm block uses 1-space indentation (lines 17-21) instead of the 2-space standard used elsewhere.
🔧 Proposed fix - merge into single llms block
llms:
-nim_llm:
- _type: nim
- model_name: mistralai/mistral-nemotron
- temperature: 0.6
- max_tokens: 4096
-
-llms:
+ nim_llm:
+ _type: nim
+ model_name: mistralai/mistral-nemotron
+ temperature: 0.6
+ max_tokens: 4096
claude_sonnet_llm:
_type: litellm
model_name: anthropic/claude-sonnet-4-5-20250929
temperature: 0.0
api_key: "${ANTHROPIC_API_KEY}" # Set this environment variable before running
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 23-23: duplication of key "llms" in mapping
(key-duplicates)
🤖 Prompt for AI Agents
In
`@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/configs/config_iterative.yml`
around lines 16 - 28, The YAML defines two separate top-level llms mappings
which causes the first (nim_llm) to be overwritten by the second
(claude_sonnet_llm); merge both entries under a single llms key so both nim_llm
and claude_sonnet_llm are present, and fix nim_llm’s indentation to match the
file’s 2-space style; locate the nim_llm and claude_sonnet_llm blocks and
combine them into one llms mapping preserving their model_name, temperature,
max_tokens and api_key fields.
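A duplicate top-level key like the one flagged here can be caught before a config is loaded. The sketch below is a deliberately simple line scanner built on the standard library, not a YAML parser: it only looks at unindented `key:` lines, and the function name and sample text are illustrative assumptions.

```python
import re
from collections import Counter

# Matches an unindented "key:" at the start of a line (top-level keys only).
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):(?:\s|$)")


def duplicate_top_level_keys(yaml_text: str) -> list[str]:
    """Return top-level keys that appear more than once, sorted by name."""
    keys = []
    for line in yaml_text.splitlines():
        match = TOP_LEVEL_KEY.match(line)
        if match:
            keys.append(match.group(1))
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)


# Illustrative config text shaped like the one under review.
SAMPLE = """\
llms:
 nim_llm:
  _type: nim

llms:
  claude_sonnet_llm:
    _type: litellm
"""

print(duplicate_top_level_keys(SAMPLE))
```

A linter such as YAMLlint, which reported the `key-duplicates` error above, is the more thorough option; this check is only meant to show why the second `llms` block wins.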
| def __post_init__(self): | ||
| self.repo_name = self.repo_url.split('/')[-1].replace('.git', '') | ||
| self.repo_path = self.base_path / self.repo_name |
RepoContext.repo_path calculation is inconsistent with get_repo_path.
RepoContext.__post_init__ computes repo_path = base_path / repo_name (without organization), but get_repo_path returns workspace_dir / org_name / repo_name (with organization).
In setup_repository (line 57), a RepoContext is created with base_path=self.workspace, so context.repo_path will be workspace/repo_name. However, the actual cloned repository is at workspace/org/repo_name (from get_repo_path at line 47). This means context.repo_path points to the wrong location.
🔧 Proposed fix - align RepoContext with get_repo_path
 @dataclass
 class RepoContext:
     """Context manager for repository operations."""
     repo_url: str
     base_path: Path
     repo: Repo | None = None

     def __post_init__(self):
-        self.repo_name = self.repo_url.split('/')[-1].replace('.git', '')
-        self.repo_path = self.base_path / self.repo_name
+        parts = self.repo_url.rstrip('/').split('/')
+        self.repo_name = parts[-1].replace('.git', '')
+        self.org_name = parts[-2]
+        self.repo_path = self.base_path / self.org_name / self.repo_name
🤖 Prompt for AI Agents
In
`@examples/evaluation_and_profiling/swe_bench/src/nat_swe_bench/predictors/predict_iterative/tools/git_tool.py`
around lines 33 - 35, RepoContext.__post_init__ sets repo_path = base_path /
repo_name which omits the organization and disagrees with get_repo_path; update
__post_init__ so repo_path includes the org (e.g., repo_path = base_path /
self.org_name / self.repo_name) or call the existing get_repo_path logic to
compute it, ensuring RepoContext.repo_path matches the path used by
setup_repository and clone operations.
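The path mismatch described in this comment can be made concrete with a small standalone sketch. It assumes HTTPS-style URLs of the form `https://host/org/repo(.git)`; the helper names are hypothetical and do not come from `git_tool.py`.

```python
from pathlib import Path


def split_org_and_repo(repo_url: str) -> tuple[str, str]:
    """Split an HTTPS-style repository URL into (org_name, repo_name)."""
    parts = repo_url.rstrip("/").split("/")
    # removesuffix only strips a trailing ".git", unlike str.replace.
    repo_name = parts[-1].removesuffix(".git")
    org_name = parts[-2]
    return org_name, repo_name


def repo_path(base_path: Path, repo_url: str) -> Path:
    """Build base_path/org/repo so every caller agrees on the location."""
    org_name, repo_name = split_org_and_repo(repo_url)
    return base_path / org_name / repo_name


workspace = Path("/tmp/workspace")
url = "https://github.com/astropy/astropy.git"

without_org = workspace / url.split("/")[-1].replace(".git", "")
with_org = repo_path(workspace, url)
print(without_org)  # the path the original __post_init__ would compute
print(with_org)     # the path that includes the organization
```

SSH-style URLs such as `git@host:org/repo.git` are not handled by this split and would need separate parsing.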
Implemented an iterative agent that solves problems by executing bash commands step-by-step, observing the results, and generating patches. It achieved a 70% success rate (7/10) in the initial evaluation.
How Has This Been Tested?
Description
Closes #1397
By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Documentation