
[BUG]: Data race in gRPC client interceptor when using grpc.Peer() CallOption #3895

@lihao-figma

Tracer Version(s)

1.65.1

Go Version(s)

1.22

Bug Report

gRPC version: v1.73.0

  Race Detector Output

  WARNING: DATA RACE
  Read at 0x00c0007e9920 by goroutine 380:
    gopkg.in/DataDog/dd-trace-go.v1/contrib/google.golang.org/grpc.doClientRequest()
        contrib/google.golang.org/grpc/client.go:197 +0x506

  Previous write at 0x00c0007e9920 by goroutine 763:
    google.golang.org/grpc.PeerCallOption.after()
        google.golang.org/grpc@v1.73.0/rpc_util.go:271 +0xbe

This happens on a streaming gRPC call when the context is cancelled shortly after the stream starts.

The issue occurs in doClientRequest (client.go):

  // line 191-197
  var p peer.Peer
  opts = append(opts, grpc.Peer(&p))

  handlerCtx := injectSpanIntoContext(ctx)
  err := handler(handlerCtx, opts)

  setSpanTargetFromPeer(span, p)  // RACE: Reading p here

In the gRPC library, google.golang.org/grpc@v1.73.0/stream.go, line 391:

	if desc != unaryStreamDesc {
		// Listen on cc and stream contexts to cleanup when the user closes the
		// ClientConn or cancels the stream context.  In all other cases, an error
		// should already be injected into the recv buffer by the transport, which
		// the client will eventually receive, and then we will cancel the stream's
		// context in clientStream.finish.
		go func() {
			select {
			case <-cc.ctx.Done():
				cs.finish(ErrClientConnClosing)
			case <-ctx.Done():
				cs.finish(toRPCErr(ctx.Err()))
			}
		}()
	}

The race happens because:

  1. A local peer.Peer variable is created and passed to gRPC via grpc.Peer(&p)
  2. The handler (which creates the gRPC stream/call) returns immediately
  3. DataDog immediately reads from p via setSpanTargetFromPeer()
  4. Meanwhile, once the context cancellation is observed, gRPC finishes the call on a separate goroutine and calls PeerCallOption.after(), which writes to p

The grpc.Peer() CallOption is designed to be populated after the RPC completes, not immediately after stream creation. The peer information is only available once the connection is established and the call finishes. DataDog's code incorrectly assumes the peer is populated synchronously when the handler returns.

Could we instead fetch the peer information from the stream context?

  if p, ok := peer.FromContext(stream.Context()); ok {
      setSpanTargetFromPeer(span, *p)
  }

Labels: apm:ecosystem (contrib/* related feature requests or bugs), bug (unintended behavior that has to be fixed), waiting-for-info (waiting for answer from issue creator)
