Handle tool-call streaming loops and strengthen model cache validation #23

Open
christopherkarani wants to merge 3 commits into main from codex/fix-missioncritical-framework-bugs

Conversation

@christopherkarani
Owner

Summary

  • Introduce a recursive streaming loop in ChatSession that handles tool calls, executor retries, and proper state updates, with improved cancellation and error handling
  • Guard PartialToolCall construction with validation that logs and skips invalid fragments across OpenAI streaming helpers
  • Harden ModelCache lookups to verify disk paths and prune stale entries, and enrich model download progress with speed/ETA tracking

Testing

  • Not run (not requested)

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@claude

claude bot commented Feb 19, 2026

Code Review

Thanks for these improvements. The overall direction is good — the streaming loop is significantly cleaner than the old one-shot approach and the disk-validation in ModelCache addresses a real class of bugs. A few issues worth addressing before merge:


Bugs / Correctness

1. allCachedModels() is inconsistent with the validated accessors (ModelCache.swift)

isCached(_:), info(for:), and localPath(for:) now call validateCachedEntry and prune stale entries, but allCachedModels() still returns cache.values directly. A caller who iterates allCachedModels() can get entries that isCached() would report as absent:

let models = await cache.allCachedModels()   // may include stale entries
for m in models {
    await cache.isCached(m.identifier)       // returns false ← mismatch
}

allCachedModels() should filter through the same validation logic (or at least document the divergence).

2. Double progress callback per event (ModelManager.swift)

enrichedProgress calls callback(enriched) immediately (no speed/ETA), then yields to speedTask which calls progress(updated) again with speed info. The user's closure fires twice per tick. This was the same behaviour as the old code, but the refactor was an opportunity to fix it — callers expecting one call per event will see two.


Design / Code Quality

3. Redundant continuation variables (ModelManager.swift, lines ~696-704)

Three variables are declared for what is ultimately the same value:

var speedContinuation: AsyncStream<DownloadProgress>.Continuation?
var speedContinuationRef: AsyncStream<DownloadProgress>.Continuation?
// ...
speedContinuationRef = speedContinuation            // ← copy
let continuationSnapshot = speedContinuationRef     // ← another copy

speedContinuation is only ever used to assign speedContinuationRef. This can be collapsed into a single var speedContinuation captured by the closure. The intermediate continuationSnapshot inside the task body is also unnecessary since the outer speedContinuationRef is never mutated after assignment.

4. Redundant continuationSnapshot?.finish() inside speedTask (ModelManager.swift)

The for await loop inside speedTask exits only after the stream finishes (i.e., after the continuation is already finished). Calling continuationSnapshot?.finish() at that point is a no-op. The real cleanup is already handled by the defer.

5. PartialToolCall.init is a source-breaking public API change (GenerationChunk.swift)

Changing a non-throwing public init to throws is a breaking change for any downstream consumer who constructs PartialToolCall directly (e.g., custom provider implementations, unit tests). If this library follows semantic versioning this warrants a major bump or, at minimum, clear documentation in the changelog/migration guide.

The call sites in OpenAIProvider+Streaming.swift use try? to silently skip invalid fragments. Consider whether swallowing the error here is always correct — a misconfigured accumulator could silently drop tool calls with no indication to the caller. A logger.error (vs logger.warning) may be more appropriate depending on how severe an empty id / toolName is.


Minor

6. Tool-round counting semantics (ChatSession.swift)

The guard toolRoundCount < currentMaxToolCallRounds is evaluated before executing the current round. This means maxToolCallRounds = 0 throws immediately on the first tool call without executing it, and maxToolCallRounds = N allows exactly N rounds. This is correct but non-obvious — a doc comment on the property clarifying inclusive/exclusive semantics would prevent future confusion.
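
To make these semantics concrete, here is a minimal, self-contained sketch of such a guard; runToolLoop and ToolLoopLimitError are illustrative names, not the framework's actual API:

struct ToolLoopLimitError: Error {}

// Hypothetical stand-in for the round-counting guard described above.
func runToolLoop(maxToolCallRounds: Int, toolCallResponses: Int) throws -> Int {
    var toolRoundCount = 0
    var remaining = toolCallResponses
    while remaining > 0 {
        // Checked *before* the round runs: 0 throws on the first tool call,
        // N permits exactly N rounds before the (N+1)-th response throws.
        guard toolRoundCount < maxToolCallRounds else { throw ToolLoopLimitError() }
        toolRoundCount += 1          // execute one round of tool calls here
        remaining -= 1
    }
    return toolRoundCount
}

// try runToolLoop(maxToolCallRounds: 0, toolCallResponses: 1)  // throws immediately
// try runToolLoop(maxToolCallRounds: 2, toolCallResponses: 2)  // returns 2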

7. No new tests

The tool-call streaming loop is the most complex change here. The existing ChatSessionTests has the mock infrastructure (MockTextProvider) to cover:

  • Exactly N rounds of tool calls completing successfully
  • The maxToolCallRounds overflow path throwing the right error
  • toolExecutor == nil when tool calls are returned
  • Cancellation mid-loop not leaving partial state in messages

These are all high-value cases that would also catch regressions in the loopMessages / turnMessages state management.


Summary

Area and severity:

  • allCachedModels() stale entries: Medium
  • Double progress callback: Low–Medium
  • Redundant continuation variables: Low
  • PartialToolCall breaking change: Medium (API contract)
  • Missing tests for streaming loop: Medium

The core logic in all three areas is sound. Addressing the allCachedModels consistency issue and adding tests for the tool-call loop are the most important follow-ups before merging.

@christopherkarani
Owner Author

@claude review and fix all identified issues

@claude

claude bot commented Feb 19, 2026

Claude finished @christopherkarani's task in 9m 28s.


All Identified Issues Fixed

All seven issues from the previous review have been addressed in commit 21cf27d.

Tasks

  • Gather context and read changed files
  • Fix allCachedModels() consistency with validated accessors (ModelCache.swift)
  • Fix double progress callback + clean up redundant continuation variables (ModelManager.swift)
  • Restore non-throwing PartialToolCall.init, add throwing validated factory (GenerationChunk.swift)
  • Update OpenAI streaming call sites to use PartialToolCall.validated, upgrade log levels (OpenAIProvider+Streaming.swift)
  • Add doc comment clarifying maxToolCallRounds inclusive semantics (ChatSession.swift)
  • Add streaming loop tests + update MockTextProvider (ChatSessionTests.swift)
  • Commit and push changes

What was changed

ModelCache.swift — allCachedModels() consistency (bug fix)

allCachedModels() now snapshots cache.values into an Array first, then filters each entry through validateCachedEntry. This prunes stale entries the same way isCached, info(for:), and localPath(for:) do, so callers can no longer get an entry from allCachedModels() that isCached() would report as absent.
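
For illustration, a minimal sketch of that snapshot-then-validate pattern, assuming stand-in types (CachedModelInfo, ModelCacheSketch, and the validateCachedEntry signature here are approximations, not the library's real declarations):

import Foundation

struct CachedModelInfo {
    let identifier: String
    let localPath: URL
}

actor ModelCacheSketch {
    private var cache: [String: CachedModelInfo] = [:]

    // Returns true only if the entry's file still exists on disk;
    // otherwise prunes the stale entry and returns false.
    private func validateCachedEntry(_ entry: CachedModelInfo) -> Bool {
        guard FileManager.default.fileExists(atPath: entry.localPath.path) else {
            cache[entry.identifier] = nil
            return false
        }
        return true
    }

    // Snapshot first, then validate, so results always agree with
    // isCached/info(for:)/localPath(for:).
    func allCachedModels() -> [CachedModelInfo] {
        var valid: [CachedModelInfo] = []
        for entry in Array(cache.values) where validateCachedEntry(entry) {
            valid.append(entry)
        }
        return valid
    }
}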


ModelManager.swift — Double callback + redundant vars (bug fix + cleanup)

downloadWithEstimation had three variables all holding the same continuation (speedContinuation, speedContinuationRef, continuationSnapshot/speedContinuationSnapshot). These were collapsed into one speedContinuation plus a single capturedContinuation snapshot for the @Sendable closure. The immediate callback(enriched) call was removed from enrichedProgress; the speed-stream task now fires the user callback exactly once per event (whether or not speed data is available yet). The no-op continuationSnapshot?.finish() at the end of the for await body was also removed.
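
A rough, self-contained sketch of the single-continuation, single-callback shape described above; DownloadProgressSketch and downloadWithEstimationSketch are hypothetical stand-ins, not the real ModelManager API:

import Foundation

struct DownloadProgressSketch: Sendable {
    var bytesDownloaded: Int64
    var totalBytes: Int64
    var bytesPerSecond: Double?
    var estimatedTimeRemaining: TimeInterval?
}

func downloadWithEstimationSketch(
    rawEvents: [DownloadProgressSketch],
    progress: @escaping @Sendable (DownloadProgressSketch) -> Void
) async {
    let (stream, continuation) = AsyncStream.makeStream(of: DownloadProgressSketch.self)

    let speedTask = Task {
        var lastBytes: Int64 = 0
        var lastTime = Date()
        for await rawEvent in stream {
            var event = rawEvent
            let now = Date()
            let elapsed = now.timeIntervalSince(lastTime)
            if elapsed > 0 {
                let speed = Double(event.bytesDownloaded - lastBytes) / elapsed
                event.bytesPerSecond = speed
                if speed > 0 {
                    event.estimatedTimeRemaining = Double(event.totalBytes - event.bytesDownloaded) / speed
                }
            }
            lastBytes = event.bytesDownloaded
            lastTime = now
            progress(event)              // the user callback fires exactly once per tick
        }
    }

    for event in rawEvents {
        continuation.yield(event)        // the single captured continuation
    }
    continuation.finish()
    await speedTask.value
}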


GenerationChunk.swift — Backward-compatible PartialToolCall API (breaking-change fix)

The public init was restored as non-throwing with precondition guards, preserving the original ABI for any downstream consumers (custom provider implementations, unit tests). A new public static func validated(id:toolName:index:argumentsFragment:) throws -> PartialToolCall factory was added for streaming code that may encounter malformed server data — it throws AIError.invalidInput instead of trapping.
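
A simplified sketch of the described API surface, under the assumption that validation only checks for an empty id/toolName; the real PartialToolCall and AIError carry more fields and cases:

// Hypothetical, simplified mirror of the shape described above.
enum AIErrorSketch: Error {
    case invalidInput(String)
}

public struct PartialToolCallSketch {
    public let id: String
    public let toolName: String
    public let index: Int
    public let argumentsFragment: String

    /// Original non-throwing initializer, kept source-compatible.
    /// Programmer error (empty id/toolName) traps via precondition.
    public init(id: String, toolName: String, index: Int, argumentsFragment: String) {
        precondition(!id.isEmpty, "PartialToolCall.id must not be empty")
        precondition(!toolName.isEmpty, "PartialToolCall.toolName must not be empty")
        self.id = id
        self.toolName = toolName
        self.index = index
        self.argumentsFragment = argumentsFragment
    }

    /// Throwing factory for untrusted streaming data: malformed server
    /// fragments surface as an error instead of trapping.
    public static func validated(
        id: String, toolName: String, index: Int, argumentsFragment: String
    ) throws -> PartialToolCallSketch {
        guard !id.isEmpty, !toolName.isEmpty else {
            throw AIErrorSketch.invalidInput("tool call fragment missing id or toolName")
        }
        return PartialToolCallSketch(id: id, toolName: toolName, index: index, argumentsFragment: argumentsFragment)
    }
}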


OpenAIProvider+Streaming.swift — Use validated factory + upgrade log level

Both call sites now use PartialToolCall.validated(...). The Responses API path was also refactored from try? (which silently dropped errors) to an explicit do/catch. Both sites log at .error level instead of .warning, reflecting that an empty id or toolName in a streaming response indicates a real problem.
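
Roughly, a call site could look like the sketch below, reusing PartialToolCallSketch from the previous sketch; ToolCallDeltaSketch and the os.Logger usage are assumptions about the surrounding code, not the provider's actual types:

import Foundation
import os

// Illustrative delta shape only; the real streaming payload differs.
struct ToolCallDeltaSketch {
    let id: String?
    let name: String?
    let index: Int
    let arguments: String?
}

let streamingLogger = Logger(subsystem: "com.example.openai-provider", category: "streaming")

func accumulate(_ delta: ToolCallDeltaSketch, into partials: inout [PartialToolCallSketch]) {
    do {
        let partial = try PartialToolCallSketch.validated(
            id: delta.id ?? "",
            toolName: delta.name ?? "",
            index: delta.index,
            argumentsFragment: delta.arguments ?? ""
        )
        partials.append(partial)
    } catch {
        // Explicit do/catch instead of try?: the failure is visible and logged
        // at .error, since a missing id/toolName is a real problem.
        streamingLogger.error("Dropping invalid tool-call fragment at index \(delta.index): \(String(describing: error))")
    }
}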


ChatSession.swift — maxToolCallRounds doc comment

The doc comment now explicitly states the inclusive semantics:

  • = 0 → first tool-call response throws immediately (no rounds executed)
  • = N → exactly N rounds are permitted before throwing
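
A possible shape for that doc comment, sketched on a stand-in class; the default value shown is hypothetical:

public final class ChatSessionSketch {
    /// Maximum number of tool-call rounds permitted for a single request.
    ///
    /// The guard runs *before* each round executes, so the limit is inclusive
    /// of completed rounds and exclusive of the one that trips it:
    /// - `0`: the first tool-call response throws immediately; no rounds execute.
    /// - `N`: exactly `N` rounds may execute before the next tool-call response throws.
    public var maxToolCallRounds: Int = 3

    public init() {}
}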

ChatSessionTests.swift — Streaming loop test coverage

MockTextProvider was extended with setQueuedStreamChunkSets(_:) so tests can enqueue per-call [GenerationChunk] arrays for the streaming path (mirroring the existing setQueuedGenerationResults for the generate path). The stream(messages:model:config:) actor method now tracks call counts and received messages the same way generate does.

Four new @Test cases were added to ChatSessionTests:

  • streamExecutesToolCallsThenYieldsFinalAnswer: N rounds of tool calls completing successfully; verifies tokens, message structure, and call count
  • streamThrowsWhenToolLoopExceedsMaxRounds: maxToolCallRounds overflow throws AIError.invalidInput and rolls back messages
  • streamThrowsWhenToolExecutorNilAndToolCallsReturned: nil toolExecutor throws and rolls back messages
  • streamMaxToolCallRoundsZeroThrowsImmediately: maxToolCallRounds = 0 throws without executing any tool calls
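
As a self-contained illustration of the queueing idea (not the real MockTextProvider, GenerationChunk, or ChatSession APIs), a per-call chunk queue plus a Swift Testing case might look like:

import Testing

// Standalone sketch of per-call chunk queueing; the real mock is richer.
actor MockStreamQueueSketch {
    private var queuedChunkSets: [[String]] = []
    private(set) var streamCallCount = 0

    func setQueuedStreamChunkSets(_ sets: [[String]]) {
        queuedChunkSets = sets
    }

    /// Returns the next queued chunk set, mirroring how a mock provider
    /// would back one stream() call per queued array.
    func nextChunkSet() -> [String] {
        streamCallCount += 1
        return queuedChunkSets.isEmpty ? [] : queuedChunkSets.removeFirst()
    }
}

@Test func queuedChunkSetsAreConsumedPerCall() async {
    let mock = MockStreamQueueSketch()
    await mock.setQueuedStreamChunkSets([["tool-call round"], ["final answer"]])

    let first = await mock.nextChunkSet()
    let second = await mock.nextChunkSet()
    let calls = await mock.streamCallCount

    #expect(first == ["tool-call round"])   // round 1
    #expect(second == ["final answer"])     // final turn
    #expect(calls == 2)
}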

- ModelCache: filter allCachedModels() through validateCachedEntry so
  its results are consistent with isCached/info(for:)/localPath(for:)
- ModelManager: collapse three redundant speedContinuation variables
  into one, remove no-op finish() inside speedTask, and fire the user
  progress callback once per tick (from the speed stream task) instead
  of twice
- GenerationChunk: restore non-throwing public init with precondition
  guards to preserve backward compatibility; add public static
  PartialToolCall.validated(...) throwing factory for use in streaming
  code that receives potentially malformed server data
- OpenAIProvider+Streaming: use PartialToolCall.validated at both call
  sites and upgrade log level from warning to error for invalid fragments
- ChatSession: clarify maxToolCallRounds doc comment with
  inclusive/exclusive semantics (0 = no rounds, N = exactly N rounds)
- ChatSessionTests: extend MockTextProvider with setQueuedStreamChunkSets
  and add four streaming tool-call loop tests covering: N rounds
  completing successfully, maxToolCallRounds overflow, nil toolExecutor,
  and maxToolCallRounds = 0

Co-authored-by: Christopher Karani <christopherkarani@users.noreply.github.com>
@christopherkarani
Owner Author

@claude review and fix issues identified

@codex review

@claude

claude bot commented Feb 20, 2026

Claude finished @christopherkarani's task in 3m 9s.


Code Review & Fix Pass

Tasks

  • Gather context and read repository setup
  • Read all changed files
  • Review for remaining/new issues
  • Fix identified issues
  • Commit and push changes (4982f22)

Issues Found & Fixed

1. Build-breaking: ToolCallRetryPolicy is undefined (ChatSession.swift:667)

The stream() method's capturedState tuple annotation used the type ToolCallRetryPolicy, which does not exist anywhere in the codebase. The correct type is ToolExecutor.RetryPolicy. This was a compilation error that would prevent the project from building entirely.

// Before (broken — ToolCallRetryPolicy is undefined)
let capturedState: (..., toolCallRetryPolicy: ToolCallRetryPolicy, ...) = withLock { ... }

// After (correct)
let capturedState: (..., toolCallRetryPolicy: ToolExecutor.RetryPolicy, ...) = withLock { ... }

2. maxToolCallRounds not clamped in stream() (ChatSession.swift:672)

send(_:) clamps negative maxToolCallRounds values to 0 via max(0, maxToolCallRounds) before capturing. The stream() method was passing the raw value, producing confusing error messages like "exceeded maxToolCallRounds (-2)" for negative inputs.

// After: matches send(_:) behaviour
return (messages, toolExecutor, toolCallRetryPolicy, max(0, maxToolCallRounds))

3. totalSize(), count, isEmpty ignored stale entries (ModelCache.swift)

These three accessors read cache.values / cache.count / cache.isEmpty directly without validating on-disk existence, while allCachedModels(), isCached(), info(for:), and localPath(for:) all call validateCachedEntry. This meant callers could see inflated size/count numbers for models whose files had been removed from disk. All three now route through validateCachedEntry() for consistency.
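
Following the same stand-in pattern as the ModelCache sketch earlier in this thread, the derived accessors could route through validation like this (fileSize and the dictionary layout are assumptions):

import Foundation

struct CacheEntrySketch {
    let identifier: String
    let localPath: URL
    let fileSize: Int64
}

actor ValidatedCacheSketch {
    private var cache: [String: CacheEntrySketch] = [:]

    private func validateCachedEntry(_ entry: CacheEntrySketch) -> Bool {
        guard FileManager.default.fileExists(atPath: entry.localPath.path) else {
            cache[entry.identifier] = nil   // prune the stale entry
            return false
        }
        return true
    }

    private func validEntries() -> [CacheEntrySketch] {
        var valid: [CacheEntrySketch] = []
        for entry in Array(cache.values) where validateCachedEntry(entry) {
            valid.append(entry)
        }
        return valid
    }

    // All three derived accessors route through the same validation,
    // so they cannot disagree with isCached/info(for:)/localPath(for:).
    func totalSize() -> Int64 { validEntries().reduce(0) { $0 + $1.fileSize } }
    var count: Int { validEntries().count }
    var isEmpty: Bool { validEntries().isEmpty }
}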


Summary

Issue, file, severity, and status:

  • ToolCallRetryPolicy undefined (build error) in ChatSession.swift:667: Critical, fixed
  • maxToolCallRounds not clamped in stream() at ChatSession.swift:672: Low, fixed
  • totalSize()/count/isEmpty include stale entries in ModelCache.swift: Low–Medium, fixed

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

…ate totalSize/count

- ChatSession.stream(): fix `ToolCallRetryPolicy` (undefined) to the correct
  `ToolExecutor.RetryPolicy` type in the capturedState tuple annotation,
  which was a build-breaking error introduced in the previous commit.

- ChatSession.stream(): clamp `maxToolCallRounds` to `max(0, ...)` before
  capturing into the loop, matching the behavior of `send(_:)` and
  producing a consistent error message for negative values.

- ModelCache.totalSize(): filter through validateCachedEntry() before
  summing sizes, so stale (deleted from disk) entries are excluded and
  pruned, consistent with isCached/info/localPath/allCachedModels.

- ModelCache.count / isEmpty: route through validateCachedEntry() so
  callers get an accurate count of entries that actually exist on disk,
  consistent with the rest of the validated accessor family.

Co-authored-by: Christopher Karani <christopherkarani@users.noreply.github.com>