-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make it easy to code failure cases when using core annotations #52
Comments
made up a new distribution for those who own zipkin instrumentation. feel free to ignore invite if I mis-tagged you, or ask for an invite if I missed you @openzipkin/instrumentation-owners |
some things that might not be immediately obvious to folks who don't use normal zipkin spans
|
Particularly, this flags existing finagle formatted errors, like so: "Server Send Error: TimeoutException: socket timed out" Fixes #1472 See openzipkin/openzipkin.github.io#52
Particularly, this flags existing finagle formatted errors, like so: "Server Send Error: TimeoutException: socket timed out" Fixes #1472 See openzipkin/openzipkin.github.io#52
FYI, summary of right now is most use normal core annotations, and just add an "error" tag/binary annotation when there's an error. Use of timeline annotations to add transient errors isn't common, but not discouraged either. |
Right now, most instrumentation do not model spans properly on error. For example, they log "sr" and "ss", but don't know what to do, or worse don't handle the error case. While we have a new "error" annotation, I think we should help people understand what to do with this, ideally learning from the past, or even adapting towards it.
Right now, Finagle addresses failure annotations consistently. They have a simple design that looks like this:
These work for client and server spans, as well tasks within a span, such as wire send or wire fragment send. Notably, the failure annotation is based on the context. ex there's a different annotation prefix for a failed server span vs a failed wire send attempt within a span. In finagle, all error annotations include the word "Error".
For example, for a server span, the variables apply like this:
Using a simplified format for annotations as
{timestamp, annotation}
Say a client-server span succeeded, you have the following in finagle:
Say the server-side failed, but was able to send a failed response to the client. I'm making up exceptions here..
This is even more helpful when annotation bookends that are not modeled as a span are used, for example when a wire fragment failed to send.
I believe that we should encourage Finagle's approach. Logging a useful formatted string as opposed to the word "error" as an annotation, and always close core spans with the bookends.
This mostly requires no change to zipkin, except backfilling some constants that are in finagle's version of the thrift, and changing the ui to stain spans yellow when an annotation contains "error" vs being exactly "error". We can leave the binary annotation key of "error" alone.
Backfilling error constants could help in @rogeralsing's work, where he's trying to color spans according to annotations.
cc @mosesn @sveinnfannar @eirslett @kristofa @virtuald @openzipkin/cross-implementation-team-zipkin-pinpoint-htrace-etc
See openzipkin/zipkin#1140
Longer note on why not make this json!
Zipkin's model was designed for simple human-readable string annotations. While tempting to try and structure the error into its components (type, class, etc) in a json ball, doing so would literally be json inside json.
Even if we changed the ui to accommodate json embedded in a string, the raw json logged into storage (like elasticsearch or cassandra) would be garbled, making other tooling (like kibana, cql queries etc) worse.
Please resist the temptation to structure errors as escaped json and instead format them in some readable way.
Example impact of reporting an formatted string as opposed to a json encoded value
Here's an example real span as most zipkin libraries would report it:
And here's how terrible it looks when encoding json in json string, something that some OpenTracing libraries are starting to do.
What to do when a tracer doesn't accept strings, and instead accepts a raw map
Using normal text formatting for annotation values allow existing deployed zipkin installations to have the best experience, as they allow users to easily read json directly via api or the UI, and allow storage queries and tooling to work in the simplest manner. Tracers should always format annotations in a storage and user-friendly String before reporting to zipkin.
Dapper-based tracers only log strings, so there's no concern if using one of those. For example, finagle, brave, htrace, zipkin-csharp, pyzipkin etc don't accept structured logs (or require users or libraries to format them before sending to zipkin!)
Notably, OpenTracing recently changed to prefer logging raw maps while luckily still retaining the string log function. Unfortunately, users often don't act on tracers directly, so it isn't in their direct control which api to use.
OpenTracing zipkin bindings should include a formatting hook so that Map annotations can be stringified before reporting to zipkin. If they don't, they should mark clearly in their README the practices used to avoid json wrapped in a string. Either approach puts tradeoffs in the hands of the caller, before they accidentally start filling zipkin with data like this.
openzipkin-attic/zipkin-ui#11
The text was updated successfully, but these errors were encountered: