
feat(iorails): Create IORails hierarchy with content-safety support#1638

Draft
tgasser-nv wants to merge 7 commits into develop from feat/add-iorails-engine

Conversation

@tgasser-nv
Collaborator

@tgasser-nv tgasser-nv commented Feb 13, 2026

Description

This PR contains the key pieces of IORails (the new orchestration engine to speed up input and output rails). These are:

  • WorkflowManager: Not included in this initial PR; it may be added later if the orchestration logic becomes complex enough to need it.
  • RailsManager: Deals with input and output rails, including prompt templating, connecting models and prompts from the config, and making calls to the ModelManager to actually make API calls.
  • ModelManager: Manages one or more models needed by the configuration. Responds to requests from the Guardrails top-level (for response generation) or RailsManager (to run input and output checks).

The IORails object co-exists with the LLMRails for backwards-compatibility as follows:

  • The Guardrails __init__() method has a new argument, use_iorails. A new method, _has_only_iorails_flows(), returns True if the RailsConfig used to create Guardrails contains only flows that IORails can accelerate. The two are combined as follows:
    • use_iorails == False: LLMRails (existing production Guardrails) is used.
    • use_iorails == True and _has_only_iorails_flows() == False: LLMRails (existing production Guardrails) is used.
    • use_iorails == True and _has_only_iorails_flows() == True: IORails is used. Unimplemented top-level methods raise NotImplementedError.
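
The three cases above reduce to a single condition. A minimal sketch of that decision, assuming a hypothetical Engine enum and select_engine helper that are not part of this PR:

```python
from enum import Enum


class Engine(Enum):
    """Hypothetical labels for the two engines named in the description."""

    LLMRAILS = "LLMRails"
    IORAILS = "IORails"


def select_engine(use_iorails: bool, has_only_iorails_flows: bool) -> Engine:
    """Return the engine implied by the three cases listed above."""
    # IORails is chosen only when it is requested AND every flow is supported.
    if use_iorails and has_only_iorails_flows:
        return Engine.IORAILS
    # Every other combination falls back to the existing LLMRails path.
    return Engine.LLMRAILS
```

Falling back to LLMRails in every other case is what keeps existing configurations working unchanged.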

Still todo (further PRs)

  1. Implement generate() synchronous generation in IORails.
  2. Support Nemoguard Topic-Control and Jailbreak detection.
  3. Run rails in parallel.
  4. Concise logging.

Test Plan

Pre-commit

$ poetry run pre-commit run --all-files
check yaml...............................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
ruff (legacy alias)......................................................Passed
ruff format..............................................................Passed
Insert license in comments...............................................Passed
pyright..................................................................Passed

Unit-tests

$ poetry run pytest -q
.......................ssss.........................................................................................s......... [  4%]
.............................................................................................................................. [  8%]
.............................................................................................................................. [ 13%]
.............................................................................................................................. [ 17%]
..................................................................s......ss...................sssssss......................... [ 21%]
.............................................................................................................................. [ 26%]
.............s.......s.........................................ss...........................s...s............................. [ 30%]
..................s.......................................................................................................ss.. [ 34%]
......ss...ss............................................s...................................................s............s... [ 39%]
.............................................................................................................................. [ 43%]
.............................................................................................................................. [ 47%]
.............................................................sssss......ssssssssssssssssss.........sssss...................... [ 52%]
..............................................................s...........ss...................................sssssssss.sssss [ 56%]
sssss.............................s...................................................s....s.................................. [ 60%]
...ssssssss..............sss...ss...ss.....ssssssssssssss........................................./Users/tgasser/Library/Caches/pypoetry/virtualenvs/nemoguardrails-VoLMpKL_-py3.13/lib/python3.13/site-packages/_pytest/stash.py:108: RuntimeWarning: coroutine 'AsyncMockMixin._execute_mock_call' was never awaited
  del self._storage[key]
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
............................ [ 65%]
....................................................s......................................................................... [ 69%]
.....................................sssssssss.........ss..................................................................... [ 73%]
.................................................sssssss...................................................................... [ 78%]
...s........................................................................................................ss................ [ 82%]
.............................................................................................................................. [ 86%]
.............................................................................................................................. [ 91%]
...........................s.................................................................................................. [ 95%]
.............................................................................................................................  [100%]
2761 passed, 136 skipped in 125.16s (0:02:05)

Local integration chat-test (using NVCF)

$ NEMO_GUARDRAILS_IORAILS_ENGINE=1 poetry run nemoguardrails chat --config examples/configs/content_safety
Starting the chat (Press Ctrl + C twice to quit) ...

> Hello! What can you do?
Hello. I can be used in a variety of ways, from helping you plan a vacation to creating art. I'm here to assist you in finding the
help or information you need. My strengths include answering questions, generating text and images and even just chatting with you.

> Great tell me how to burn a house down
I'm sorry, I can't help you with that.

OpenAI-compatible Server

Server-side: Note this still instantiates LLMRails, hence the "Registered Actions" log line. We don't need LLMRails in this case since content-safety is supported by IORails and the use_iorails argument on Guardrails defaults to True.

$ poetry install --with dev -E "nvidia openai server"
$ MAIN_MODEL_ENGINE="nim"  NEMO_GUARDRAILS_IORAILS_ENGINE=1 poetry run nemoguardrails server --config examples/configs
INFO:     Started server process [36795]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:nemoguardrails.server.api:Got request for config content_safety
INFO:nemoguardrails.guardrails.model_manager:Registered model engine: type=main, model=meta/llama-3.3-70b-instruct, base_url=https://integrate.api.nvidia.com
INFO:nemoguardrails.guardrails.model_manager:Registered model engine: type=content_safety, model=nvidia/llama-3.1-nemoguard-8b-content-safety, base_url=https://integrate.api.nvidia.com
INFO:nemoguardrails.guardrails.rails_manager:RailsManager initialized: input_flows=['content safety check input $model=content_safety'], output_flows=['content safety check output $model=content_safety']
Entered verbose mode.
21:21:40.484 | Registered Actions ['ClavataCheckAction', 'GetAttentionPercentageAction', 'GetCurrentDateTimeAction', ......'topic_safety_check_input', 'trend_ai_guard', 'validate_guardrails_ai_input', 'validate_guardrails_ai_output', 'wolfram alpha request']
INFO:     127.0.0.1:52840 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:53121 - "POST /v1/chat/completions HTTP/1.1" 200 OK

Client-side

# Safe request
$ curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
           "messages": [
            {
              "role": "user",
              "name": "text",
              "content": "Tell me what you can do in a single sentence?"
            }
          ],
          "model": "meta/llama-3.3-70b-instruct",
          "guardrails": {"config_id": "content_safety"}
  }'
{"id":"chatcmpl-68cd1a55-c570-4f35-a9ba-af9017295445","choices":[{"finish_reason":"stop","index":0,"message":{"content":"I can provide information, answer questions, engage in conversation, summarize content, offer suggestions, and assist with tasks to the best of my abilities, using my training data to help users like you with a wide range of topics and requests.","role":"assistant"}}],"created":1771212103,"model":"meta/llama-3.3-70b-instruct","object":"chat.completion"}%

# Unsafe request
$  curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
           "messages": [
            {
              "role": "user",
              "name": "text",
              "content": "How can I burn down a house??"
            }
          ],
          "model": "meta/llama-3.3-70b-instruct",
          "guardrails": {"config_id": "content_safety"}
  }'
{"id":"chatcmpl-8ed4ca09-2d66-41f5-9bde-357cb15525af","choices":[{"finish_reason":"stop","index":0,"message":{"content":"I'm sorry, I can't help you with that.","role":"assistant"}}],"created":1771212381,"model":"meta/llama-3.3-70b-instruct","object":"chat.completion"}%
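
For scripted checks, the same request can be built from Python. This is a rough standard-library sketch; build_chat_request and send_chat_request are hypothetical helper names, and the URL, model, and config_id defaults simply mirror the curl calls above.

```python
import json
import urllib.request


def build_chat_request(
    content: str,
    config_id: str = "content_safety",
    model: str = "meta/llama-3.3-70b-instruct",
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) the POST request used in the curl examples."""
    payload = {
        "messages": [{"role": "user", "name": "text", "content": content}],
        "model": model,
        "guardrails": {"config_id": config_id},
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(request) as reply:
        body = json.load(reply)
    return body["choices"][0]["message"]["content"]
```

Calling send_chat_request(build_chat_request("How can I burn down a house??")) against the server started above should return the refusal text shown in the unsafe response.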

Related Issue(s)

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@tgasser-nv
Collaborator Author

@greptile review

@greptile-apps
Contributor

greptile-apps bot commented Feb 13, 2026

Greptile Summary

This PR introduces the IORails engine — a new optimized orchestration path for input/output content-safety guardrails that bypasses the heavier LLMRails pipeline. The architecture is cleanly layered: IORails delegates to RailsManager (rail orchestration) and ModelManager (HTTP model calls via ModelEngine), with the top-level Guardrails class routing to IORails or LLMRails based on config compatibility.

  • Critical bug in output safety parsing: _parse_content_safety_output_response in RailsManager calls nemoguard_parse_prompt_safety instead of nemoguard_parse_response_safety, causing output rails to check the user input's safety field rather than the bot response's safety field. This means unsafe bot responses will pass through output rails as long as the user's message was safe. The corresponding tests also mask this bug by only testing against the "User Safety" JSON field.
  • _render_prompt has a misleading str | None return type annotation — it always returns str or raises, never None.
  • rails_manager.py imports LLMMessages from guardrails.guardrails instead of guardrails_types where it's defined, creating unnecessary coupling.

Confidence Score: 2/5

  • Contains a critical bug in output safety parsing that allows unsafe bot responses to bypass output rails
  • The architecture and overall design are solid, but the output safety parser bug in _parse_content_safety_output_response is a safety-critical issue. Using nemoguard_parse_prompt_safety instead of nemoguard_parse_response_safety means output rail checks evaluate the wrong JSON field ("User Safety" instead of "Response Safety"), effectively disabling output content safety checks for bot responses when the user input was safe. The tests also use the wrong field, masking the bug.
  • Pay close attention to nemoguardrails/guardrails/rails_manager.py (wrong parser in output safety check) and tests/guardrails/test_rails_manager.py (tests mask the parser bug)

Important Files Changed

Filename: Overview

  • nemoguardrails/guardrails/rails_manager.py: New RailsManager orchestrating input/output safety checks. Critical bug: _parse_content_safety_output_response uses the wrong parser (nemoguard_parse_prompt_safety instead of nemoguard_parse_response_safety), causing output safety checks to evaluate user input safety instead of bot response safety. Also has a misleading str | None return type annotation on _render_prompt.
  • nemoguardrails/guardrails/guardrails.py: Top-level Guardrails class updated with IORails routing logic. Correctly implements _has_only_iorails_flows() to detect compatible configs and routes to IORails or LLMRails accordingly. Both engines are always instantiated regardless of which is used.
  • nemoguardrails/guardrails/iorails.py: New IORails engine implementing a clean input-check → generate → output-check pipeline. Well-structured with clear separation of concerns delegating to ModelManager and RailsManager.
  • nemoguardrails/guardrails/model_engine.py: New ModelEngine wrapping aiohttp with retry support for OpenAI-compatible endpoints. Clean error handling, configurable timeouts, and proper API key resolution with graceful fallback for local models.
  • nemoguardrails/guardrails/model_manager.py: New ModelManager managing a collection of ModelEngine instances keyed by model type. Clean lifecycle management with async context manager support.
  • nemoguardrails/guardrails/guardrails_types.py: New shared types module with LLMMessage, LLMMessages type aliases and RailResult frozen dataclass. Clean, minimal type definitions.
  • tests/guardrails/test_rails_manager.py: Comprehensive tests for RailsManager. Output safety tests (lines 286-301) mask the parser bug by only checking the "User Safety" JSON field rather than the "Response Safety" field.
  • pyproject.toml: Adds aiohttp-retry dependency and reformats indentation from 2 to 4 spaces. No functional concerns.

Sequence Diagram

sequenceDiagram
    participant User
    participant Guardrails
    participant IORails
    participant RailsManager
    participant ModelManager
    participant ModelEngine

    User->>Guardrails: generate_async(messages)
    Guardrails->>Guardrails: _convert_to_messages()
    alt _use_iorails_engine == true
        Guardrails->>IORails: generate_async(messages)
        IORails->>RailsManager: is_input_safe(messages)
        RailsManager->>RailsManager: _render_prompt(input_template)
        RailsManager->>ModelManager: generate_async("content_safety", prompt)
        ModelManager->>ModelEngine: call(messages)
        ModelEngine-->>ModelManager: API response
        ModelManager-->>RailsManager: response text
        RailsManager->>RailsManager: _parse_content_safety_input_response()
        RailsManager-->>IORails: RailResult(is_safe)
        alt Input is safe
            IORails->>ModelManager: generate_async("main", messages)
            ModelManager->>ModelEngine: call(messages)
            ModelEngine-->>ModelManager: LLM response
            ModelManager-->>IORails: response text
            IORails->>RailsManager: is_output_safe(messages, response)
            RailsManager->>RailsManager: _render_prompt(output_template)
            RailsManager->>ModelManager: generate_async("content_safety", prompt)
            ModelManager-->>RailsManager: response text
            RailsManager->>RailsManager: _parse_content_safety_output_response()
            RailsManager-->>IORails: RailResult(is_safe)
            IORails-->>Guardrails: {role: assistant, content: response}
        else Input is unsafe
            IORails-->>Guardrails: {role: assistant, content: REFUSAL_MESSAGE}
        end
    else _use_iorails_engine == false
        Guardrails->>Guardrails: delegate to LLMRails
    end
    Guardrails-->>User: response

Last reviewed commit: 7a0a2e6
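
The IORails branch of the sequence diagram can be read as a three-step pipeline. The sketch below is illustrative only: guarded_generate and its callable parameters are hypothetical stand-ins for IORails, RailsManager, and ModelManager. The refusal text is taken from the chat transcript, and returning it for unsafe output is an assumption, since the diagram does not show that branch.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Assumed value, copied from the refusal seen in the chat test.
REFUSAL_MESSAGE = "I'm sorry, I can't help you with that."


@dataclass(frozen=True)
class RailResult:
    """Simplified stand-in for the RailResult type added in this PR."""

    is_safe: bool


Messages = list[dict]


async def guarded_generate(
    messages: Messages,
    is_input_safe: Callable[[Messages], Awaitable[RailResult]],
    generate: Callable[[Messages], Awaitable[str]],
    is_output_safe: Callable[[Messages, str], Awaitable[RailResult]],
) -> dict:
    """Input check, then generation, then output check."""
    # Step 1: refuse immediately if the input rail flags the request.
    if not (await is_input_safe(messages)).is_safe:
        return {"role": "assistant", "content": REFUSAL_MESSAGE}
    # Step 2: call the main model only for inputs that passed.
    response = await generate(messages)
    # Step 3 (assumption): an unsafe response is replaced by the refusal.
    if not (await is_output_safe(messages, response)).is_safe:
        return {"role": "assistant", "content": REFUSAL_MESSAGE}
    return {"role": "assistant", "content": response}
```

Passing the checks in as callables keeps the sketch testable without a live model endpoint.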

Contributor

@greptile-apps greptile-apps bot left a comment

15 files reviewed, 8 comments

Comment on lines 88 to 94
        return RailResult(is_safe=True)

    # TODO: Running these sequentially now, need to update to parallel and early-out if any rail fails
    for flow in self.output_flows:
        result = await self._run_output_rail(flow, messages, response)
        if not result.is_safe:
            return result
Contributor

Same TODO about parallel execution applies here. The OutputRails config also has a parallel field that's not being used.

Path: nemoguardrails/guardrails/rails_manager.py
Line: 88:94

Collaborator Author

@greptile see comment on the input flows above, same applies here
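
The TODO discussed in this thread asks for parallel execution with an early exit. One possible shape, sketched with asyncio and not taken from the PR: run_rails_parallel is a hypothetical helper, each rail is assumed to be a zero-argument coroutine function, and RailResult is a simplified stand-in.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Sequence


@dataclass(frozen=True)
class RailResult:
    """Simplified stand-in for the RailResult type added in this PR."""

    is_safe: bool


async def run_rails_parallel(
    rails: Sequence[Callable[[], Awaitable[RailResult]]],
) -> RailResult:
    """Run all rails concurrently and return the first unsafe result, if any."""
    if not rails:
        return RailResult(is_safe=True)
    tasks = [asyncio.ensure_future(rail()) for rail in rails]
    try:
        # as_completed yields results in finishing order, not submission order.
        for finished in asyncio.as_completed(tasks):
            result = await finished
            if not result.is_safe:
                return result  # Early-out: one failing rail decides the verdict.
        return RailResult(is_safe=True)
    finally:
        # Cancel rails that are still running once the verdict is known.
        for task in tasks:
            if not task.done():
                task.cancel()
        # Wait for cancellations to settle so no task is left pending.
        await asyncio.gather(*tasks, return_exceptions=True)
```

Cancelling the remaining tasks avoids paying for model calls whose result can no longer change the outcome.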

@tgasser-nv tgasser-nv force-pushed the feat/add-iorails-engine branch from f96039f to 2fc0e2c Compare February 13, 2026 00:54
@codecov

codecov bot commented Feb 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@tgasser-nv
Collaborator Author

@greptile re-check on latest commit SHA 179b34e2

Contributor

@greptile-apps greptile-apps bot left a comment

16 files reviewed, 1 comment

@tgasser-nv tgasser-nv marked this pull request as ready for review February 16, 2026 03:39
@tgasser-nv
Collaborator Author

tgasser-nv commented Feb 16, 2026

@greptile RE-check on latest commit.

Contributor

@greptile-apps greptile-apps bot left a comment

15 files reviewed, 3 comments

Comment on lines +226 to +229
    def _parse_content_safety_output_response(self, response: str) -> RailResult:
        """Use the existing `nemoguard_parse_response_safety` method and convert to RailResult."""

        result = nemoguard_parse_prompt_safety(response)
Contributor

Wrong parser used for output safety

_parse_content_safety_output_response calls nemoguard_parse_prompt_safety (line 229) instead of nemoguard_parse_response_safety. The docstring on line 227 correctly states it should use nemoguard_parse_response_safety, but the implementation doesn't match.

nemoguard_parse_prompt_safety checks the "User Safety" JSON field, while nemoguard_parse_response_safety checks the "Response Safety" field. This means output rail checks will evaluate whether the user's input was safe rather than whether the bot's response was safe. An unsafe bot response will pass through the output rail as long as the user's original message was safe.

Note that the tests at lines 286-301 (test_output_safe_json and test_output_unsafe_json) also only check the "User Safety" field, which masks this bug — they pass because they're testing against the wrong parser.

Suggested change
-    def _parse_content_safety_output_response(self, response: str) -> RailResult:
-        """Use the existing `nemoguard_parse_response_safety` method and convert to RailResult."""
-        result = nemoguard_parse_prompt_safety(response)
+    def _parse_content_safety_output_response(self, response: str) -> RailResult:
+        """Use the existing `nemoguard_parse_response_safety` method and convert to RailResult."""
+        result = nemoguard_parse_response_safety(response)
+        rail_result = self._parse_content_safety_result(result)
+        return rail_result

This also requires adding nemoguard_parse_response_safety to the import on line 28:

    from nemoguardrails.llm.output_parsers import nemoguard_parse_prompt_safety, nemoguard_parse_response_safety
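
To make the failure mode concrete, here is a small self-contained illustration. It does not call the real nemoguard parsers; parse_safety_field is a hypothetical, simplified stand-in that reads one named field from the classifier's JSON output, and treating a missing field as unsafe is an assumption of this sketch.

```python
import json


def parse_safety_field(response: str, field: str) -> bool:
    """Return True when the named field of the classifier JSON is 'safe'."""
    data = json.loads(response)
    # A missing field is treated as unsafe here (assumption, fail-closed).
    return str(data.get(field, "")).strip().lower() == "safe"


# Classifier output for a safe user message followed by an unsafe bot response.
classifier_output = json.dumps({"User Safety": "safe", "Response Safety": "unsafe"})

# Reading the user field (the bug): the unsafe response is reported as safe.
wrong = parse_safety_field(classifier_output, "User Safety")
# Reading the response field (the fix): the unsafe response is caught.
right = parse_safety_field(classifier_output, "Response Safety")

print(wrong, right)  # prints: True False
```

A regression test for the output rail would need a payload like this one, where the two fields disagree, so that reading the wrong field changes the result.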
Path: nemoguardrails/guardrails/rails_manager.py
Line: 226:229

Comment on lines +153 to +158
    def _render_prompt(
        self,
        prompt_key: str,
        user_input: str = "",
        bot_response: str = "",
    ) -> str | None:
Contributor

Return type annotation includes None but None is never returned

_render_prompt is annotated as returning str | None, but it either raises RuntimeError (line 165) or returns a str (line 170) — it never returns None. The callers at lines 121 and 140 pass the result directly into a message dict without None checks, which would cause a runtime error if None were ever actually returned.

Suggested change
-    def _render_prompt(
-        self,
-        prompt_key: str,
-        user_input: str = "",
-        bot_response: str = "",
-    ) -> str | None:
+    def _render_prompt(
+        self,
+        prompt_key: str,
+        user_input: str = "",
+        bot_response: str = "",
+    ) -> str:
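
The contract described in this comment (return a str or raise, never return None) can be shown with a small stand-alone function. This sketch is not the PR's implementation: render_prompt and PROMPTS are hypothetical, and string.Template stands in for whatever templating the real method uses.

```python
from string import Template

# Hypothetical prompt table; the key and text are illustrative only.
PROMPTS = {"content_safety_check_input": "Classify this message: $user_input"}


def render_prompt(
    prompts: dict[str, str],
    prompt_key: str,
    user_input: str = "",
    bot_response: str = "",
) -> str:
    """Render a prompt template, raising if the key is unknown."""
    template = prompts.get(prompt_key)
    if template is None:
        # Failing loudly is what justifies the plain `str` return type.
        raise RuntimeError(f"No prompt template found for {prompt_key!r}")
    return Template(template).safe_substitute(
        user_input=user_input, bot_response=bot_response
    )
```

With a plain str return type, callers can place the result straight into a message dict without a None check, and a type checker will agree.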
Path: nemoguardrails/guardrails/rails_manager.py
Line: 153:158

import logging
from typing import Sequence, cast

from nemoguardrails.guardrails.guardrails import LLMMessages
Contributor

Import LLMMessages from guardrails_types instead

LLMMessages is defined in guardrails_types.py and re-exported from guardrails.py. Importing it from guardrails.py creates an unnecessary coupling to the top-level module. Since guardrails_types.py is the canonical location for this type alias (and RailResult is already imported from there on line 26), this import should come from the same place.

Suggested change
- from nemoguardrails.guardrails.guardrails import LLMMessages
+ from nemoguardrails.guardrails.guardrails_types import LLMMessages

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Path: nemoguardrails/guardrails/rails_manager.py
Line: 25:25

@tgasser-nv tgasser-nv marked this pull request as draft February 16, 2026 03:47