Investigate Observability Options

Cylc is made up of several distributed components, so a bottleneck anywhere in the system can be difficult to pinpoint. A key objective of observability is not only to show that there is a problem, but also to help discover **where** the problem has occurred.

Observability offers a detailed view of the internals of a software system, and OpenTelemetry offers a standardised way of collecting and looking at traces.

By using a standard that is not tied to any language or platform, we can send traces from all parts of the system, e.g. to see the flow from `cylc-flow` to the `ui-server` and through to the `ui` itself.
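To make the cross-component flow concrete: OpenTelemetry's default propagator carries trace identity between services using the W3C Trace Context `traceparent` header. The sketch below uses only the standard library (it is not the OpenTelemetry API) to show how a downstream component could keep the same trace ID while minting a new span ID. The function names `new_traceparent` and `child_traceparent` are hypothetical and exist only for illustration.

```python
import re
import secrets

# W3C Trace Context header layout: version-traceid-parentid-flags
# (2, 32, 16 and 2 lowercase hex characters respectively).
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace, e.g. at the first component handling a request."""
    trace_id = secrets.token_hex(16)  # 32 hex chars, shared by the whole trace
    span_id = secrets.token_hex(8)    # 16 hex chars, unique to this span
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Continue an incoming trace: keep the trace ID, mint a new span ID."""
    match = _TRACEPARENT_RE.match(incoming)
    if match is None:
        raise ValueError(f"not a valid traceparent header: {incoming!r}")
    _version, trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    # Hypothetical hop: an upstream component starts the trace and a
    # downstream component continues it under the same trace ID.
    upstream = new_traceparent()
    downstream = child_traceparent(upstream)
    print(upstream)
    print(downstream)
```

Because every hop shares the trace ID, an aggregating backend can stitch the spans from each component into a single end-to-end trace. In practice the OpenTelemetry SDK for each language would handle this propagation for us.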

This would be independent of our current logging. We _could_ also look at Open Logging and Open Metrics in the future, once those standards are finalised.

Because OpenTelemetry is an open standard rather than a proprietary logging method, users are free to send all telemetry to the trace aggregation tools of their choice, as these increasingly support OpenTelemetry; for example, Zipkin and Jaeger.

For example, we may be able to set up spans so that, for a given request, we can display a detailed view of how time is spent in each process: a Gantt chart that you can drill down into to spot any bottlenecks.
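As a rough illustration of the span idea, here is a minimal standard-library sketch (again, not the OpenTelemetry API) that records nested, timed spans and renders them as a crude text Gantt chart. `SpanRecorder` and the span names are hypothetical; a real setup would use an OpenTelemetry SDK and a backend such as Jaeger or Zipkin for the drill-down view.

```python
import time
from contextlib import contextmanager


class SpanRecorder:
    """Records nested, timed spans for a single thread of execution."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._stack = []   # spans that are currently open
        self.spans = []    # every span, in the order it was started

    @contextmanager
    def span(self, name):
        record = {
            "name": name,
            "depth": len(self._stack),  # nesting level, 0 for the root span
            "start": self._clock(),
            "end": None,
        }
        self.spans.append(record)
        self._stack.append(record)
        try:
            yield record
        finally:
            record["end"] = self._clock()
            self._stack.pop()

    def gantt(self, width=40):
        """Render finished spans as one text bar per span."""
        if not self.spans:
            return ""
        t0 = min(s["start"] for s in self.spans)
        total = (max(s["end"] for s in self.spans) - t0) or 1.0
        lines = []
        for s in self.spans:
            offset = int((s["start"] - t0) / total * width)
            length = max(1, int((s["end"] - s["start"]) / total * width))
            label = "  " * s["depth"] + s["name"]
            lines.append(f"{label:<24}|{' ' * offset}{'#' * length}")
        return "\n".join(lines)


if __name__ == "__main__":
    recorder = SpanRecorder()
    # Hypothetical request passing through two stages.
    with recorder.span("request"):
        with recorder.span("scheduler query"):
            time.sleep(0.02)
        with recorder.span("ui-server resolve"):
            time.sleep(0.01)
    print(recorder.gantt())
```

Each bar starts where its span began relative to the root and is scaled to the span's duration, so a disproportionately long child span stands out as a candidate bottleneck.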
  
Although not a current priority, this may be worth some investigation once things settle.

Investigate Observability Options #132
