Avoid string intermediates in MCP transport read path #1319

Open
Copilot wants to merge 6 commits into main from copilot/fix-string-encoding-issue
Contributor

Copilot AI commented Feb 19, 2026

The read side of StreamServerTransport and StreamClientSessionTransport was decoding bytes → string via TextReader.ReadLineAsync(), then re-encoding back to UTF-8 internally during JsonSerializer.Deserialize(string, ...). This round-trip was unnecessary since System.Text.Json can deserialize directly from bytes.

Changes

  • Replace TextReader with PipeReader on the input side of both transports. PipeReader.Create(stream) works across all targets; the System.IO.Pipelines dependency was already present.
  • Deserialize directly from bytes — single-segment buffers use JsonSerializer.Deserialize(ReadOnlySpan<byte>, typeInfo); multi-segment use new Utf8JsonReader(ReadOnlySequence<byte>, ...) + JsonSerializer.Deserialize(ref reader, typeInfo).
  • String conversion only when needed: EncodingUtilities.GetUtf8String() (for trace logging) only allocates when Logger.IsEnabled(LogLevel.Trace). For single-segment sequences it calls Encoding.UTF8.GetString(span) directly; multi-segment falls back to ToArray().
  • CRLF trimming without slicing: EndsWithCarriageReturn() checks the last byte of the last segment directly rather than creating an intermediate slice.
  • Remove encoding constructor parameter from StreamClientSessionTransport — it was only used to configure the now-removed TextReader wrapper. Both call sites already passed null.
  • Factor out shared line-reading logic into PipeReaderExtensions — the buffer management loop, newline detection, and CRLF trimming are now in a single PipeReaderExtensions class (in Protocol/). Both transports call await pipeReader.ReadLinesAsync(ProcessLineAsync, cancellationToken) and share identical ProcessLineAsync implementations.
  • Move GetUtf8String to EncodingUtilities — with an explicit empty-sequence fast path, following the existing utility pattern.
  • Move NoBomUtf8Encoding to StdioClientTransport — removed from StreamClientSessionTransport (no longer needed there) and added as a private static readonly field in StdioClientTransport, its only consumer.
  • Add PipeReaderExtensionsTests — 16 tests covering line-splitting edge cases via StreamServerTransport: empty/blank lines, LF and CRLF termination (single and multi-segment), mixed line endings, standalone \r preservation, CRLF split across pipe segments, non-ASCII and multi-byte UTF-8 content, multi-byte character split across segments, invalid JSON line recovery, unterminated data not delivered, and a multi-byte UTF-8 sequence interrupted by \n (verifying both resulting lines raise JsonException and are silently skipped, matching the behavior of the previous StreamReader-based implementation).
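The single-segment versus multi-segment deserialization described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: Utf8LineDeserializer and Segment are hypothetical names, and the sketch deserializes to JsonElement with default options instead of the JsonTypeInfo for JsonRpcMessage that the transports use.

```csharp
using System;
using System.Buffers;
using System.Text.Json;

// Hypothetical helper for illustration; not part of the PR.
public static class Utf8LineDeserializer
{
    public static JsonElement Deserialize(in ReadOnlySequence<byte> line)
    {
        if (line.IsSingleSegment)
        {
            // Contiguous buffer: deserialize straight from the UTF-8 span.
            return JsonSerializer.Deserialize<JsonElement>(line.First.Span);
        }

        // Non-contiguous buffer: Utf8JsonReader walks the segments itself,
        // so no copy into a single array or string is needed.
        var reader = new Utf8JsonReader(line);
        return JsonSerializer.Deserialize<JsonElement>(ref reader);
    }
}

// Scaffolding used only to build a multi-segment sequence by hand.
public sealed class Segment : ReadOnlySequenceSegment<byte>
{
    public Segment(ReadOnlyMemory<byte> memory) => Memory = memory;

    public Segment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new Segment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}
```

A caller could build a two-segment sequence with new Segment(bytes.AsMemory(0, 3)).Append(bytes.AsMemory(3)) and pass it to Deserialize; both paths should yield the same JsonElement as the contiguous case.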
Original prompt

This section details the original issue you should resolve

<issue_title>More string intermediates in read side if MCP transport</issue_title>
<issue_description>Similar issue to #1259 which was fixed in #1274.

Read / deserialization is still encoding to a string. We should try to keep as utf8 bytes and deserialize from those:

var line = await _inputReader.ReadLineAsync(shutdownToken).ConfigureAwait(false);
if (string.IsNullOrWhiteSpace(line))
{
    if (line is null)
    {
        LogTransportEndOfStream(Name);
        break;
    }
    continue;
}
LogTransportReceivedMessageSensitive(Name, line);
try
{
    if (JsonSerializer.Deserialize(line, McpJsonUtilities.DefaultOptions.GetTypeInfo(typeof(JsonRpcMessage))) is JsonRpcMessage message)

if (await _serverOutput.ReadLineAsync(cancellationToken).ConfigureAwait(false) is not string line)
{
    LogTransportEndOfStream(Name);
    break;
}
if (string.IsNullOrWhiteSpace(line))
{
    continue;
}
LogTransportReceivedMessageSensitive(Name, line);
await ProcessMessageAsync(line, cancellationToken).ConfigureAwait(false);

private async Task ProcessMessageAsync(string line, CancellationToken cancellationToken)
{
    try
    {
        var message = (JsonRpcMessage?)JsonSerializer.Deserialize(line.AsSpan().Trim(), McpJsonUtilities.DefaultOptions.GetTypeInfo(typeof(JsonRpcMessage)));
        if (message != null)

</issue_description>

<agent_instructions>Try to use built in BCL types like PipeReader + ReadOnlySpan callback + System.Text.Json. Avoid adding ifdefs. If a dependency is missing, add it. If we need to polyfill, add the polyfill source for downlevel.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Co-authored-by: ericstj <8918108+ericstj@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix more string intermediates in MCP transport" to "Avoid string intermediates in MCP transport read path" Feb 19, 2026
Copilot AI requested a review from ericstj February 19, 2026 23:57
Collaborator

ericstj commented Feb 20, 2026

@copilot both of these implementations are very similar -- do you think it would be possible to factor out the functionality of reading a line from a stream in UTF-8 as either a helper method, or an extension method on PipeReader?

…elper

Co-authored-by: ericstj <8918108+ericstj@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 20, 2026

@copilot both of these implementations are very similar -- do you think it would be possible to factor out the functionality of reading a line from a stream in UTF-8 as either a helper method, or an ext...

Done in b530d53. The common functionality is now in PipeReaderExtensions (in Protocol/) with:

  • ReadLinesAsync(this PipeReader, Func<ReadOnlySequence<byte>, CancellationToken, Task>, CancellationToken) — handles the buffer loop, newline detection, and CRLF trimming
  • GetUtf8String(in ReadOnlySequence<byte>) — efficient byte→string (only called when trace logging is enabled)

Both transports now call await pipeReader.ReadLinesAsync(ProcessLineAsync, ct) and have identical ProcessLineAsync implementations.
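For readers unfamiliar with the pattern, a line-reading loop over PipeReader with this shape might look roughly like the sketch below. It is an assumption-laden illustration rather than the code from b530d53: the class name PipeLineReaderSketch and the slicing-based EndsWithCarriageReturn are hypothetical, and only fully empty lines are skipped here.

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch; the PR's PipeReaderExtensions may differ in detail.
public static class PipeLineReaderSketch
{
    public static async Task ReadLinesAsync(
        this PipeReader reader,
        Func<ReadOnlySequence<byte>, CancellationToken, Task> onLine,
        CancellationToken cancellationToken = default)
    {
        while (true)
        {
            ReadResult result = await reader.ReadAsync(cancellationToken).ConfigureAwait(false);
            ReadOnlySequence<byte> buffer = result.Buffer;

            // Deliver every complete line that is currently buffered.
            while (buffer.PositionOf((byte)'\n') is SequencePosition newline)
            {
                ReadOnlySequence<byte> line = buffer.Slice(0, newline);

                // Trim one trailing '\r' so CRLF and LF input behave the same.
                if (EndsWithCarriageReturn(line))
                {
                    line = line.Slice(0, line.Length - 1);
                }

                if (!line.IsEmpty)
                {
                    await onLine(line, cancellationToken).ConfigureAwait(false);
                }

                // Continue after the '\n' itself.
                buffer = buffer.Slice(buffer.GetPosition(1, newline));
            }

            // Consumed up to the last newline; the remainder was examined but is incomplete.
            reader.AdvanceTo(buffer.Start, buffer.End);

            if (result.IsCompleted || result.IsCanceled)
            {
                break; // Unterminated trailing data is never delivered.
            }
        }
    }

    // Simplified check that slices the final byte (the PR avoids this slice).
    private static bool EndsWithCarriageReturn(in ReadOnlySequence<byte> line) =>
        !line.IsEmpty && line.Slice(line.Length - 1).First.Span[0] == (byte)'\r';
}
```

A caller would wrap its stream with PipeReader.Create(stream) and pass a callback that deserializes each ReadOnlySequence<byte> line; the callback must finish with the line before returning, because the buffer is recycled after AdvanceTo.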

Collaborator

@ericstj ericstj left a comment


@copilot please address feedback

…ove NoBomUtf8Encoding from StreamClientSessionTransport

Co-authored-by: ericstj <8918108+ericstj@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 20, 2026

@copilot please address feedback

Done in 09cf5eb. All four review items addressed:

  • GetUtf8String moved to EncodingUtilities (with an empty-sequence fast path)
  • GetUtf8String removed from PipeReaderExtensions
  • NoBomUtf8Encoding removed from StreamClientSessionTransport and moved as a private static readonly field into StdioClientTransport (its only consumer)
  • All usages of the encoding parameter confirmed to have passed null — encoding removal is safe

Copilot AI requested a review from ericstj February 20, 2026 19:53
Collaborator

ericstj commented Feb 20, 2026

@copilot This is looking good. Can you add any tests around this? Is it possible to measure our memory footprint for reads and ensure it's below a particular threshold?

It might also be interesting to add some tests around the line splitting behavior to ensure coverage of all the special cases: empty lines, a mix of lines with and without carriage returns, non-ASCII characters, and invalid encoding adjacent to a newline, such as an unpaired surrogate. Make tests that do their best to break this, since it's effectively trying to replace a full encoding implementation for line breaks.

Co-authored-by: ericstj <8918108+ericstj@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 20, 2026

@copilot This is looking good. Can you add any tests around this? Is it possible to measure our memory footprint for reads and ensure it's below a particular threshold?

It might also be interest...

Added 15 tests in PipeReaderExtensionsTests.cs (commit abb1879), exercised via StreamServerTransport since PipeReaderExtensions is internal:

  • Empty / blank lines — bare \n and \r\n are skipped
  • LF and CRLF termination — both single-segment and byte-by-byte (multi-segment)
  • Mixed line endings — LF, CRLF and mixed in the same stream
  • Standalone \r — preserved as payload when not immediately before \n
  • CRLF where \r is last byte of a segment — multi-segment trim path
  • Non-ASCII UTF-8 — Chinese + emoji content, single and multi-segment
  • Multi-byte char split across pipe segments — 3-byte split at byte boundary
  • Invalid JSON line — silently skipped; subsequent valid lines still delivered
  • No terminating newline — unterminated data never delivered as a message

On memory measurement: adding a precise allocation bound in xUnit tests is not straightforward without a benchmark harness (e.g. BenchmarkDotNet with MemoryDiagnoser). The correctness tests above cover the key behavioral contracts; allocation reduction is validated by the implementation change itself (no string allocation on the hot path).

Collaborator

ericstj commented Feb 20, 2026

@copilot can you also add a case for a multi-byte character sequence that's interrupted with a \n. I know it's invalid, but I want to make sure we'll identify it in the same way that the previous implementation would and throw the same exception.

Co-authored-by: ericstj <8918108+ericstj@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 20, 2026

@copilot can you also add a case for a multi-byte character sequence that's interrupted with a \n. I know it's invalid, but I want to make sure we'll identify it in the same way that the previous i...

Added in d050187 as MultiByteSequenceInterruptedByNewline_BothLinesSkipped_NextValidLineDelivered.

The test uses € (U+20AC, encoded as 0xE2 0x82 0xAC) split by \n after byte 0xE2:

  • Line 1 ends with the truncated lead byte 0xE2 → JsonException (same as old impl, which produced U+FFFD → also JsonException)
  • Line 2 starts with continuation bytes 0x82 0xAC → JsonException (old impl produced U+FFFD bytes → also JsonException)
  • Both are silently skipped; the following valid JSON line is still delivered

Both old and new paths raise JsonException for each split line, so the behavior is identical.
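The equivalence can be checked in isolation with a small sketch like the one below. The payloads are hypothetical stand-ins (the actual test input is not shown in this thread), SplitEuroDemo is an invented name, and JsonElement replaces the JsonRpcMessage type info; the point is only that both halves of a split € fail to parse whether they arrive as raw bytes or as a decoded string.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical demo; payloads are illustrative, not the PR's test data.
public static class SplitEuroDemo
{
    // {"v":" followed by the lone lead byte 0xE2 (the line is cut by '\n' here).
    public static readonly byte[] Line1 =
        { (byte)'{', (byte)'"', (byte)'v', (byte)'"', (byte)':', (byte)'"', 0xE2 };

    // Continuation bytes 0x82 0xAC followed by "} (the rest of the message).
    public static readonly byte[] Line2 = { 0x82, 0xAC, (byte)'"', (byte)'}' };

    // New path: parse the UTF-8 bytes directly.
    public static bool FailsFromBytes(byte[] utf8)
    {
        try
        {
            ReadOnlySpan<byte> span = utf8;
            JsonSerializer.Deserialize<JsonElement>(span);
            return false;
        }
        catch (JsonException)
        {
            return true;
        }
    }

    // Old path: decode to a string first (invalid bytes become U+FFFD), then parse.
    public static bool FailsFromString(byte[] utf8)
    {
        try
        {
            JsonSerializer.Deserialize<JsonElement>(Encoding.UTF8.GetString(utf8));
            return false;
        }
        catch (JsonException)
        {
            return true;
        }
    }
}
```

With these inputs, both FailsFromBytes and FailsFromString are expected to return true for Line1 and Line2, while an intact message such as {"v":"€"} parses on both paths.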

@ericstj ericstj marked this pull request as ready for review February 20, 2026 20:58
Collaborator

ericstj commented Feb 20, 2026

Another instance of this problem in StreamableHttpClientSessionTransport:

var responseContent = await response.Content.ReadAsStringAsync(cancellationToken).ConfigureAwait(false);
rpcResponseOrError = await ProcessMessageAsync(responseContent, rpcRequest, cancellationToken).ConfigureAwait(false);

There are some string intermediates in Sse types, but not for the payload so it's not worth changing unless it shows up through usage.


Development

Successfully merging this pull request may close these issues.

More string intermediates in read side if MCP transport
