@jboolean jboolean commented Jan 16, 2026

Fixes the cause of https://app.datadoghq.com/incidents/48132

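Tools are shared between requests, and the intent capture logic wrote to them in place. Concurrent requests therefore triggered a concurrent writes error that crashed the server. This change resolves the issue by cloning the affected slices and nested objects before modifying them.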
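
To illustrate the clone-before-write pattern described above, here is a minimal sketch. The `Tool` type, the `withIntent` helper, and the `"intent"` property name are hypothetical stand-ins, not the actual contrib code; the sketch only assumes that each request must add a field to a schema map that other requests can also see.

```go
package main

import (
	"fmt"
	"maps"
	"sync"
)

// Tool is a simplified, hypothetical stand-in for a registered tool whose
// Properties map is visible to every request.
type Tool struct {
	Name       string
	Properties map[string]any
}

// withIntent returns a per-request copy of the tools with a hypothetical
// "intent" property added. The shared maps are only read, never written.
func withIntent(shared []Tool) []Tool {
	out := make([]Tool, len(shared))
	for i, t := range shared {
		props := maps.Clone(t.Properties) // private map for this request
		if props == nil {
			props = map[string]any{} // maps.Clone preserves nil
		}
		props["intent"] = "string"
		t.Properties = props
		out[i] = t
	}
	return out
}

// serveConcurrently simulates n simultaneous requests and returns how many
// of them saw the "intent" property on their own copy.
func serveConcurrently(shared []Tool, n int) int {
	var (
		wg sync.WaitGroup
		mu sync.Mutex
		ok int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			res := withIntent(shared)
			if _, found := res[0].Properties["intent"]; found {
				mu.Lock()
				ok++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return ok
}

func main() {
	shared := []Tool{{Name: "search", Properties: map[string]any{"query": "string"}}}
	fmt.Println(serveConcurrently(shared, 100), len(shared[0].Properties))
}
```

Writing `t.Properties["intent"]` directly, without the clone, would have every goroutine write to the same map, which is the kind of access the Go runtime reports as a fatal "concurrent map writes" error.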

@github-actions github-actions bot added the apm:ecosystem contrib/* related feature requests or bugs label Jan 16, 2026
@jboolean jboolean changed the base branch from main to graphite-base/4362 January 16, 2026 21:58
@jboolean jboolean force-pushed the jb/intent-capture-fix-concurrent branch from 85d8a90 to c353afa January 16, 2026 21:58
@jboolean jboolean changed the base branch from graphite-base/4362 to jb/replace-ddtrace-with-telemetry- January 16, 2026 21:58
@jboolean jboolean changed the title from "Replace ddtrace with telemetry" to "fix(mark3labs/mcp): fix concurrent writes bug in intent capture" Jan 16, 2026
@jboolean jboolean marked this pull request as ready for review January 16, 2026 21:59
@jboolean jboolean requested review from a team as code owners January 16, 2026 21:59
@jboolean jboolean force-pushed the jb/replace-ddtrace-with-telemetry- branch from 44f4456 to 7a43c31 January 16, 2026 22:00
@jboolean jboolean force-pushed the jb/intent-capture-fix-concurrent branch from c353afa to 928df14 January 16, 2026 22:00
codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.20%. Comparing base (8f4902e) to head (7288e17).

Additional details and impacted files

see 428 files with indirect coverage changes

pr-commenter bot commented Jan 16, 2026

Benchmarks

Benchmark execution time: 2026-01-20 17:27:31

Comparing candidate commit 7288e17 in PR branch jb/intent-capture-fix-concurrent with baseline commit 8f4902e in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 156 metrics, 8 unstable metrics.

Base automatically changed from jb/replace-ddtrace-with-telemetry- to main January 20, 2026 14:37
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d bot requested a review from a team as a code owner January 20, 2026 14:37
}

// The server reuses tools across requests. Slices and nested objects are cloned to avoid concurrent writes.
result.Tools = slices.Clone(result.Tools)
Member

@jboolean What are the odds of result.Tools's items being updated while cloning? It doesn't seem that we have any lock to ensure result.Tools or its items aren't modified while slices.Clone does its work.

Contributor Author

Interesting thought. Extremely low as tools are not dynamic afaik, but let me add some locking.

Contributor Author

@jboolean jboolean Jan 20, 2026

Actually, if it is written to anywhere inside the library itself, I wouldn't be able to control that.

I had Claude do a deep investigation, and it concluded that this is not an issue. What do you think?

Summary: Line 39 slices.Clone(result.Tools) is Safe

TL;DR: No other goroutine can write to result.Tools during the clone operation because each request gets its own local slice that no other goroutine has a reference to.

Key Evidence from mcp-go

Each ListTools request creates its own local tools slice:

  // /Users/julian.boilen/dd/mcp-go/server/server.go:1159
  func (s *MCPServer) handleListTools(
      ctx context.Context,
      id any,
      request mcp.ListToolsRequest,
  ) (*mcp.ListToolsResult, *requestError) {
      s.toolsMu.RLock()
      tools := make([]mcp.Tool, 0, len(s.tools))  // ← Local to THIS request

      // Build tools slice...
      for _, name := range toolNames {
          tools = append(tools, s.tools[name].Tool)  // Struct copy
      }
      s.toolsMu.RUnlock()

      // Apply pagination (returns subslice sharing backing array)
      toolsToReturn, nextCursor, err := listByPagination(ctx, s, request.Params.Cursor, tools)

      result := mcp.ListToolsResult{
          Tools: toolsToReturn,  // ← Shares backing array with local 'tools'
      }
      return &result, nil
  }

Why it's safe:

  • The local tools variable is scoped to a single handleListTools call; each request allocates its own slice
  • result.Tools shares the backing array with this local variable
  • No other goroutine has a reference to this local backing array
  • The mutex (toolsMu) is released before the hook runs, but that only protected the global s.tools map, not this local slice
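
The aliasing described in the list above can be sketched in isolation. The `paginate` helper below is a hypothetical stand-in for `listByPagination` (its real signature and behavior are not reproduced here); the sketch only assumes that pagination returns a subslice of the per-request slice.

```go
package main

import (
	"fmt"
	"slices"
)

// paginate is a hypothetical stand-in for listByPagination: it returns a
// subslice, which shares its backing array with the input.
func paginate(items []string, limit int) []string {
	if limit > len(items) {
		limit = len(items)
	}
	return items[:limit]
}

// sharesBacking reports whether two non-empty slices start at the same element.
func sharesBacking(a, b []string) bool {
	return len(a) > 0 && len(b) > 0 && &a[0] == &b[0]
}

// demo builds a per-request slice, paginates it, and clones the page.
// It reports whether the page aliases the local slice and whether the
// clone aliases the page.
func demo() (pageAliasesLocal, cloneAliasesPage bool) {
	local := []string{"search", "fetch", "summarize"} // built per request
	page := paginate(local, 2)
	cloned := slices.Clone(page)
	return sharesBacking(local, page), sharesBacking(page, cloned)
}

func main() {
	pageAliasesLocal, cloneAliasesPage := demo()
	fmt.Println(pageAliasesLocal, cloneAliasesPage)
}
```

Because the only slice that the page aliases is one created inside the same call, no other goroutine holds a reference to that backing array while `slices.Clone` reads it.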

The actual race (now fixed) was in nested structures:

  • Tool structs contain InputSchema.Properties (map) and InputSchema.Required (slice)
  • These nested references were shared across all requests (line 1179 copies the struct by value, which is a shallow copy, not a deep copy)
  • Without cloning them (lines 55, 64), concurrent writes to the same map/slice caused "concurrent map writes" panics

Verification: The test TestIntentCaptureConcurrentListTools passes with -race detector, confirming no race on line 39.
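
The shallow-copy point in the analysis above can be reproduced without any concurrency. In this sketch, `Schema`, `Tool`, and `deepCopy` are simplified, hypothetical types and helpers loosely modeled on the field names mentioned in the thread; they are not the mcp-go definitions.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// Schema and Tool are simplified, hypothetical stand-ins for the mcp-go types.
type Schema struct {
	Properties map[string]any
	Required   []string
}

type Tool struct {
	Name        string
	InputSchema Schema
}

// deepCopy clones the nested map and slice so the copy can be mutated
// without touching the original. Map values themselves are not cloned.
func deepCopy(t Tool) Tool {
	t.InputSchema.Properties = maps.Clone(t.InputSchema.Properties)
	t.InputSchema.Required = slices.Clone(t.InputSchema.Required)
	return t
}

func main() {
	registered := Tool{
		Name: "search",
		InputSchema: Schema{
			Properties: map[string]any{"query": "string"},
			Required:   []string{"query"},
		},
	}

	// A struct copy duplicates the map header, not the map contents,
	// so this write is visible through the original as well.
	shallow := registered
	shallow.InputSchema.Properties["intent"] = "string"
	_, leaked := registered.InputSchema.Properties["intent"]
	fmt.Println("shallow write leaked:", leaked)

	// After deepCopy, writes stay local to the copy.
	deep := deepCopy(registered)
	deep.InputSchema.Properties["extra"] = "string"
	_, leaked = registered.InputSchema.Properties["extra"]
	fmt.Println("deep write leaked:", leaked)
}
```

With many requests performing the shallow write at once, every goroutine targets the same map, which matches the "concurrent map writes" panic described above.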

Member

Ok, LGTM.

@jboolean jboolean force-pushed the jb/intent-capture-fix-concurrent branch from 928df14 to 7288e17 January 20, 2026 17:07
@jboolean jboolean requested a review from darccio January 20, 2026 17:34
gh-worker-devflow-routing-ef8351 bot commented Jan 21, 2026

View all feedbacks in Devflow UI.

2026-01-21 15:33:13 UTC ℹ️ Start processing command devflow:merge


2026-01-21 15:33:55 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2026-01-21 19:35:09 UTC ⚠️ MergeQueue: This merge request was unqueued

devflow unqueued this merge request: It did not become mergeable within the expected time

gh-worker-devflow-routing-ef8351 bot commented Jan 23, 2026

View all feedbacks in Devflow UI.

2026-01-23 18:11:13 UTC ℹ️ Start processing command devflow:merge


2026-01-23 18:11:27 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2026-01-23 22:12:09 UTC ⚠️ MergeQueue: This merge request was unqueued

devflow unqueued this merge request: It did not become mergeable within the expected time

Labels

apm:ecosystem (contrib/* related feature requests or bugs)
mergequeue-status: removed

4 participants