This repository has been archived by the owner on Jan 19, 2022. It is now read-only.

Tracing between services in app engine flexible #1019

Closed
hlarsson opened this issue Sep 13, 2018 · 7 comments

@hlarsson

Tracing across different services in App Engine flexible doesn't seem to work properly.
In an HTTP call from service A to service B, the X-Cloud-Trace-Context header isn't set by A, so App Engine creates a new trace ID and sets it in the X-Cloud-Trace-Context header. The X-B3-TraceId and X-Cloud-Trace-Context headers are now different. Service B sees X-Cloud-Trace-Context and uses it for tracing.
This breaks the trace chain in the Google console.
When running locally, X-Cloud-Trace-Context is never set, and everything works as expected.

As an example of this, replace the URL in the trace sample here:
https://github.com/spring-cloud/spring-cloud-gcp/blob/8ac48c5dea656c9eeb138a2b3bc2f27690372435/spring-cloud-gcp-samples/spring-cloud-gcp-trace-sample/src/main/java/com/example/WorkService.java#L40
with the public URL of the service, i.e. https://<service>-dot-<project>.appspot.com. With localhost it works.
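
The mismatch described above can be checked with a small helper. This is only a sketch: the class and method names are hypothetical, and it assumes the documented `TRACE_ID/SPAN_ID;o=OPTIONS` layout of X-Cloud-Trace-Context and that a 16-character B3 trace ID is compared after left-padding it with zeros to 32 characters.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical diagnostic helper, not part of spring-cloud-gcp or zipkin-gcp.
final class TraceHeaderCheck {

    // Returns the trace ID part of an X-Cloud-Trace-Context value (TRACE_ID/SPAN_ID;o=OPTIONS).
    static String cloudTraceId(String headerValue) {
        int slash = headerValue.indexOf('/');
        String traceId = slash >= 0 ? headerValue.substring(0, slash) : headerValue;
        int semi = traceId.indexOf(';');
        if (semi >= 0) {
            traceId = traceId.substring(0, semi);
        }
        return traceId.toLowerCase(Locale.ROOT);
    }

    // Assumption: a 64-bit (16 hex chars) B3 trace ID is left-padded with zeros to 128 bits.
    static String normalizeB3TraceId(String b3TraceId) {
        String id = b3TraceId.toLowerCase(Locale.ROOT);
        return id.length() == 16 ? "0000000000000000" + id : id;
    }

    // True when both headers name the same trace, or when only one scheme is present.
    static boolean sameTrace(Map<String, String> headers) {
        String b3 = headers.get("X-B3-TraceId");
        String cloud = headers.get("X-Cloud-Trace-Context");
        if (b3 == null || cloud == null) {
            return true; // nothing to conflict with
        }
        return normalizeB3TraceId(b3).equals(cloudTraceId(cloud));
    }
}
```

Logging the result of `sameTrace` on the incoming request in service B should show `false` for calls that go through the public URL if the behaviour is as described.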

@meltsufin
Contributor

@hlarsson Why does Service B in your example use the X-Cloud-Trace-Context header when X-B3-TraceId is present? AFAIK, in the library we always default to using X-B3-TraceId when present.

@hlarsson
Author

@meltsufin Thanks for the speedy reply.
If I have understood it correctly, X-Cloud-Trace-Context is what Stackdriver uses to do its tracing. It seems to be set somewhere in App Engine and is not passed on when making calls to a different service.
Google sets an X-Cloud-Trace-Context in A, but it's not passed along to B, so Stackdriver loses the trace because Google then generates a new X-Cloud-Trace-Context header for B.

As in the linked GCP sample, you see everything in the same trace if localhost:8080 is used, I assume because the calls then stay inside that server. When using https://<service>-dot-<project>.appspot.com, the call goes through Google's load balancer and such, which I further assume creates a new X-Cloud-Trace-Context because it is not present in the request. That causes a mismatch between X-Cloud-Trace-Context and X-B3-TraceId.

@meltsufin
Contributor

Well, our integration (Brave/Sleuth) does not propagate the X-Cloud-Trace-Context header when you call out to a different service. It only propagates X-B3-TraceId.
Regardless of what the load balancer generates for X-Cloud-Trace-Context, our integration will always use X-B3-TraceId whenever available. The question is, does service A include the X-B3-TraceId header when calling out to service B? It should, if you're using our integration and RestTemplate.
Then service B should be using X-B3-TraceId and traces should continue across service boundaries.

Aside from all this, perhaps we should always copy X-B3-TraceId into X-Cloud-Trace-Context on the way out. This might help in situations where X-B3-TraceId is not used but X-Cloud-Trace-Context is used. Is this what you're looking for?
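
A rough sketch of what copying the B3 IDs into X-Cloud-Trace-Context on the way out could look like. The class and method names are hypothetical and this is not the library's code. It assumes the `TRACE_ID/SPAN_ID;o=OPTIONS` header layout with a 32-character hex trace ID and a decimal span ID, and that a 16-character B3 trace ID is left-padded with zeros.

```java
// Hypothetical helper, not the actual Brave/Sleuth or zipkin-gcp injector.
final class CloudTraceHeader {

    // Builds an X-Cloud-Trace-Context value from B3 trace and span IDs (both lowercase hex).
    static String fromB3(String b3TraceId, String b3SpanId, boolean sampled) {
        // Assumption: pad a 64-bit B3 trace ID to the 32 hex chars the header expects.
        String traceId = b3TraceId.length() == 16 ? "0000000000000000" + b3TraceId : b3TraceId;
        // B3 span IDs are hex; X-Cloud-Trace-Context carries the span ID as an unsigned decimal.
        String spanId = Long.toUnsignedString(Long.parseUnsignedLong(b3SpanId, 16));
        return traceId + "/" + spanId + ";o=" + (sampled ? "1" : "0");
    }
}
```

For example, `CloudTraceHeader.fromB3("463ac35c9f6413ad48485a3953bb6124", "000000000000000a", true)` yields `463ac35c9f6413ad48485a3953bb6124/10;o=1`, which an outgoing request could carry next to the B3 headers.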

@hlarsson
Author

Did some more digging, and I think the reason X-Cloud-Trace-Context is used in B is
propagation-stackdriver
It says it will use X-Cloud-Trace-Context if it exists, and not X-B3-TraceId.
So I think that if you add X-Cloud-Trace-Context it would probably work for us, but I'm not sure now which one is best practice. Maybe you can answer which one should be used when running spring-cloud-gcp-starter-trace?

@meltsufin
Contributor

Indeed, what I said about X-B3-TraceId having higher precedence is wrong. In fact, our reference documentation clearly says so. Sorry about the confusion.
There's clearly a problem here, because the zipkin-gcp extractor extracts from X-Cloud-Trace-Context but the injector only injects X-B3-TraceId.

@elefeint
Contributor

Prioritizing B3 headers in extraction (openzipkin/zipkin-gcp#97) fixed this issue.
Double propagating the headers was split off as a follow-up (openzipkin/zipkin-gcp#98).
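
To illustrate the extraction order described above, here is a minimal sketch that prefers X-B3-TraceId and only falls back to X-Cloud-Trace-Context. It is not the zipkin-gcp implementation. The class and method names are hypothetical, and the header map is assumed to use exactly these header names as keys.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of "B3 first, X-Cloud-Trace-Context as fallback".
final class TraceIdExtractor {

    static Optional<String> extractTraceId(Map<String, String> headers) {
        // Prefer the B3 trace ID, since that is what the calling service injected.
        String b3 = headers.get("X-B3-TraceId");
        if (b3 != null && !b3.isEmpty()) {
            return Optional.of(b3.toLowerCase(Locale.ROOT));
        }
        // Fall back to the trace ID part of X-Cloud-Trace-Context (TRACE_ID/SPAN_ID;o=OPTIONS).
        String cloud = headers.get("X-Cloud-Trace-Context");
        if (cloud != null && !cloud.isEmpty()) {
            int slash = cloud.indexOf('/');
            String traceId = slash >= 0 ? cloud.substring(0, slash) : cloud;
            return Optional.of(traceId.toLowerCase(Locale.ROOT));
        }
        return Optional.empty();
    }
}
```

With this order, a header that the load balancer adds in front of service B no longer overrides the trace ID that service A propagated.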

@meltsufin
Contributor

We do still need to upgrade to the new zipkin-gcp version with the change when it's out.
