feat(iorails): Create IORails hierarchy with content-safety support #1638

tgasser-nv wants to merge 7 commits into develop
Conversation
@greptile review
Greptile Summary

This PR introduces the IORails engine — a new optimized orchestration path for input/output content-safety guardrails that bypasses the heavier LLMRails pipeline. The architecture is cleanly layered:
| Filename | Overview |
|---|---|
| nemoguardrails/guardrails/rails_manager.py | New RailsManager orchestrating input/output safety checks. Critical bug: _parse_content_safety_output_response uses the wrong parser (nemoguard_parse_prompt_safety instead of nemoguard_parse_response_safety), causing output safety checks to evaluate user input safety instead of bot response safety. Also has a misleading `str \| None` return annotation on `_render_prompt`. |
| nemoguardrails/guardrails/guardrails.py | Top-level Guardrails class updated with IORails routing logic. Correctly implements _has_only_iorails_flows() to detect compatible configs and routes to IORails or LLMRails accordingly. Both engines are always instantiated regardless of which is used. |
| nemoguardrails/guardrails/iorails.py | New IORails engine implementing a clean input-check → generate → output-check pipeline. Well-structured with clear separation of concerns delegating to ModelManager and RailsManager. |
| nemoguardrails/guardrails/model_engine.py | New ModelEngine wrapping aiohttp with retry support for OpenAI-compatible endpoints. Clean error handling, configurable timeouts, and proper API key resolution with graceful fallback for local models. |
| nemoguardrails/guardrails/model_manager.py | New ModelManager managing a collection of ModelEngine instances keyed by model type. Clean lifecycle management with async context manager support. |
| nemoguardrails/guardrails/guardrails_types.py | New shared types module with LLMMessage, LLMMessages type aliases and RailResult frozen dataclass. Clean, minimal type definitions. |
| tests/guardrails/test_rails_manager.py | Comprehensive tests for RailsManager. Output safety tests (lines 286-301) mask the parser bug by only checking the "User Safety" JSON field rather than the "Response Safety" field. |
| pyproject.toml | Adds aiohttp-retry dependency and reformats indentation from 2 to 4 spaces. No functional concerns. |
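As a caller-level illustration of the routing described above, here is a minimal sketch. It assumes the `Guardrails` constructor argument and `generate_async` signature described in this PR; exact names may differ in the merged code.

```python
# Sketch only: assumes the new Guardrails API from this PR (use_iorails flag,
# generate_async taking OpenAI-style message dicts). Not the final implementation.
import asyncio

from nemoguardrails import RailsConfig
from nemoguardrails.guardrails.guardrails import Guardrails

async def main():
    # A config containing only input/output content-safety flows that IORails can accelerate.
    config = RailsConfig.from_path("./config")
    rails = Guardrails(config, use_iorails=True)

    # Routed to IORails when _has_only_iorails_flows() is True; otherwise falls back to LLMRails.
    response = await rails.generate_async(
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response)

asyncio.run(main())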
Sequence Diagram
sequenceDiagram
participant User
participant Guardrails
participant IORails
participant RailsManager
participant ModelManager
participant ModelEngine
User->>Guardrails: generate_async(messages)
Guardrails->>Guardrails: _convert_to_messages()
alt _use_iorails_engine == true
Guardrails->>IORails: generate_async(messages)
IORails->>RailsManager: is_input_safe(messages)
RailsManager->>RailsManager: _render_prompt(input_template)
RailsManager->>ModelManager: generate_async("content_safety", prompt)
ModelManager->>ModelEngine: call(messages)
ModelEngine-->>ModelManager: API response
ModelManager-->>RailsManager: response text
RailsManager->>RailsManager: _parse_content_safety_input_response()
RailsManager-->>IORails: RailResult(is_safe)
alt Input is safe
IORails->>ModelManager: generate_async("main", messages)
ModelManager->>ModelEngine: call(messages)
ModelEngine-->>ModelManager: LLM response
ModelManager-->>IORails: response text
IORails->>RailsManager: is_output_safe(messages, response)
RailsManager->>RailsManager: _render_prompt(output_template)
RailsManager->>ModelManager: generate_async("content_safety", prompt)
ModelManager-->>RailsManager: response text
RailsManager->>RailsManager: _parse_content_safety_output_response()
RailsManager-->>IORails: RailResult(is_safe)
IORails-->>Guardrails: {role: assistant, content: response}
else Input is unsafe
IORails-->>Guardrails: {role: assistant, content: REFUSAL_MESSAGE}
end
else _use_iorails_engine == false
Guardrails->>Guardrails: delegate to LLMRails
end
Guardrails-->>User: response
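Read as code, the IORails branch of the diagram corresponds roughly to the following sketch; the method names are taken from the diagram, and the actual `iorails.py` implementation may differ.

```python
# Illustrative reconstruction of the IORails.generate_async flow shown above.
# REFUSAL_MESSAGE, rails_manager, and model_manager follow the diagram's naming.
async def generate_async(self, messages):
    input_result = await self.rails_manager.is_input_safe(messages)
    if not input_result.is_safe:
        return {"role": "assistant", "content": REFUSAL_MESSAGE}

    response = await self.model_manager.generate_async("main", messages)

    output_result = await self.rails_manager.is_output_safe(messages, response)
    if not output_result.is_safe:
        return {"role": "assistant", "content": REFUSAL_MESSAGE}

    return {"role": "assistant", "content": response}
```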
Last reviewed commit: 7a0a2e6
In nemoguardrails/guardrails/rails_manager.py (lines 88-94):

return RailResult(is_safe=True)

# TODO: Running these sequentially now, need to update to parallel and early-out if any rail fails
for flow in self.output_flows:
    result = await self._run_output_rail(flow, messages, response)
    if not result.is_safe:
        return result
Same TODO about parallel execution applies here. The `OutputRails` config also has a `parallel` field that's not being used.
@greptile see comment on the input flows above, same applies here
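For reference, one possible shape of the parallel early-out version requested in the TODO, written as a sketch only; it assumes each output rail check is independent, the method name is illustrative, and `RailResult` comes from `guardrails_types`.

```python
# Sketch of a parallel early-out variant of the sequential loop above.
# Not part of this PR; flow/result names mirror the existing code.
import asyncio

async def _run_output_rails(self, messages, response) -> RailResult:
    tasks = [
        asyncio.create_task(self._run_output_rail(flow, messages, response))
        for flow in self.output_flows
    ]
    try:
        # Consume results in completion order and stop at the first unsafe one.
        for finished in asyncio.as_completed(tasks):
            result = await finished
            if not result.is_safe:
                return result
    finally:
        # Cancel any checks still in flight (no-op for tasks that already completed).
        for task in tasks:
            task.cancel()
    return RailResult(is_safe=True)
```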
Force-pushed from f96039f to 2fc0e2c
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@greptile re-check on latest commit SHA 179b34e2
…by_role, remove unused guardrails_models.py
@greptile RE-check on latest commit.
In nemoguardrails/guardrails/rails_manager.py (lines 226-229):

def _parse_content_safety_output_response(self, response: str) -> RailResult:
    """Use the existing `nemoguard_parse_response_safety` method and convert to RailResult."""

    result = nemoguard_parse_prompt_safety(response)
**Wrong parser used for output safety**

`_parse_content_safety_output_response` calls `nemoguard_parse_prompt_safety` (line 229) instead of `nemoguard_parse_response_safety`. The docstring on line 227 correctly states it should use `nemoguard_parse_response_safety`, but the implementation doesn't match.

`nemoguard_parse_prompt_safety` checks the `"User Safety"` JSON field, while `nemoguard_parse_response_safety` checks the `"Response Safety"` field. This means output rail checks will evaluate whether the **user's input** was safe rather than whether the **bot's response** was safe. An unsafe bot response will pass through the output rail as long as the user's original message was safe.

Note that the tests at lines 286-301 (`test_output_safe_json` and `test_output_unsafe_json`) also only check the `"User Safety"` field, which masks this bug — they pass because they're testing against the wrong parser.
Suggested fix:

```suggestion
    def _parse_content_safety_output_response(self, response: str) -> RailResult:
        """Use the existing `nemoguard_parse_response_safety` method and convert to RailResult."""
        result = nemoguard_parse_response_safety(response)
        rail_result = self._parse_content_safety_result(result)
        return rail_result
```

This also requires adding `nemoguard_parse_response_safety` to the import on line 28:

```python
from nemoguardrails.llm.output_parsers import nemoguard_parse_prompt_safety, nemoguard_parse_response_safety
```
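Separately, a regression test that asserts on the `"Response Safety"` field would catch this class of bug. A hedged sketch, assuming a `rails_manager` fixture like the existing tests and the NemoGuard JSON shape quoted in this comment (the test name and exact JSON values are illustrative): with the current parser this assertion fails, and it passes once the suggested fix is applied.

```python
import json

def test_output_unsafe_response_safety_field(rails_manager):
    # Bot response flagged unsafe even though the user's input was safe.
    llm_output = json.dumps(
        {"User Safety": "safe", "Response Safety": "unsafe", "Safety Categories": "Violence"}
    )
    result = rails_manager._parse_content_safety_output_response(llm_output)
    assert result.is_safe is False
```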
In nemoguardrails/guardrails/rails_manager.py (lines 153-158):

def _render_prompt(
    self,
    prompt_key: str,
    user_input: str = "",
    bot_response: str = "",
) -> str | None:
Return type annotation includes None but None is never returned
_render_prompt is annotated as returning str | None, but it either raises RuntimeError (line 165) or returns a str (line 170) — it never returns None. The callers at lines 121 and 140 pass the result directly into a message dict without None checks, which would cause a runtime error if None were ever actually returned.
Suggested fix:

```suggestion
    def _render_prompt(
        self,
        prompt_key: str,
        user_input: str = "",
        bot_response: str = "",
    ) -> str:
```
In nemoguardrails/guardrails/rails_manager.py (line 25):

import logging
from typing import Sequence, cast

from nemoguardrails.guardrails.guardrails import LLMMessages
Import LLMMessages from guardrails_types instead
LLMMessages is defined in guardrails_types.py and re-exported from guardrails.py. Importing it from guardrails.py creates an unnecessary coupling to the top-level module. Since guardrails_types.py is the canonical location for this type alias (and RailResult is already imported from there on line 26), this import should come from the same place.
Suggested fix:

```suggestion
from nemoguardrails.guardrails.guardrails_types import LLMMessages
```
Description
This PR contains the key pieces of IORails (the new orchestration engine to speed up input and output rails). These are:

- `Guardrails` routing logic that selects between IORails and LLMRails
- the `IORails` engine implementing the input-check → generate → output-check pipeline
- `RailsManager`, which runs the content-safety input/output checks
- `ModelManager` and `ModelEngine`, which call OpenAI-compatible model endpoints
- shared type definitions in `guardrails_types.py`
The IORails object co-exists with LLMRails for backwards compatibility, as follows:
The `Guardrails` `__init__()` method has a new argument `use_iorails`. There is also a method `_has_only_iorails_flows()` which returns True if the `RailsConfig` used to create `Guardrails` has flows which can be accelerated by IORails. These are used in combination as follows (see the sketch below):

- `use_iorails == False`: LLMRails (existing production Guardrails) is used.
- `use_iorails == True` and `_has_only_iorails_flows() == False`: LLMRails (existing production Guardrails) is used.
- `use_iorails == True` and `_has_only_iorails_flows() == True`: IORails is used. Unimplemented top-level methods raise `NotImplementedError`s.
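A short illustrative sketch of that decision logic (not the actual implementation; the helper and attribute names are hypothetical):

```python
# Hypothetical routing helper; the real Guardrails code may structure this differently.
def _select_engine(self):
    if self.use_iorails and self._has_only_iorails_flows():
        return self._iorails    # optimized input/output content-safety path
    return self._llmrails       # existing production LLMRails path
```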
Still todo (further PRs)

- `generate()` synchronous generation in IORails.

Test Plan
Pre-commit
Unit-tests
Local integration chat-test (using NVCF)
OpenAI-compatible Server
Server-side: Note this still instantiates LLMRails, hence the "Registered Actions" log line. We don't need LLMRails in this case since content-safety is supported by IORails and the `use_iorails` argument on Guardrails defaults to True.

Client-side
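As a hedged illustration only, a client-side request to an OpenAI-compatible guardrails server could look like the following; the endpoint path, port, and config/model name are assumptions and should be adjusted to the actual server setup.

```python
# Illustrative client request against an OpenAI-compatible chat endpoint.
# URL, port, and "model" value are placeholders for the local guardrails server.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "content_safety_config",  # hypothetical rails config name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(resp.json())
```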
Related Issue(s)
Checklist