Investigate Observability Options #132

@datamel

Description

Cylc comprises a distributed set of systems, so a bottleneck anywhere in it can be difficult to pinpoint. A key objective of observability is not only to show that there is a problem, but to help discover where the problem has occurred.

Observability offers a detailed view of the internals of a software system, and OpenTelemetry offers a standardised way of producing and looking at traces.

By using a standard that is not tied to any language or platform, we can send traces from all parts of the system, e.g. to see the flow from cylc-flow to the ui-server, through to the ui itself.

This would be independent of our current logging. We could also look at Open Logging and Open Metrics in the future, once those standards are finalised.

Because OpenTelemetry is an open standard rather than a proprietary logging method, users are free to send all telemetry to the trace-aggregation tools of their choice, as these increasingly support the OpenTelemetry framework; for example, Zipkin and Jaeger.

For example, we may be able to set up spans such that, for a given request, we can display a detailed view of how time is spent in each process: a Gantt chart that you could drill down into to spot any bottlenecks.
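To make the span idea concrete, here is a minimal standard-library sketch of nested, timed spans rendered as a text Gantt chart. It is not the OpenTelemetry API: `Recorder`, `Span`, and the span names are hypothetical, and a real setup would export spans to a tool such as Zipkin or Jaeger rather than printing them.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    depth: int       # nesting level, used to indent the label
    start: float
    end: float = 0.0


class Recorder:
    """Collects nested spans for one request (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.spans = []
        self._depth = 0

    @contextmanager
    def span(self, name):
        record = Span(name, self._depth, self.clock())
        self.spans.append(record)
        self._depth += 1
        try:
            yield record
        finally:
            self._depth -= 1
            record.end = self.clock()

    def gantt(self, width=40):
        """Render one text row per span, scaled to `width` columns."""
        origin = min(s.start for s in self.spans)
        total = (max(s.end for s in self.spans) - origin) or 1.0
        rows = []
        for s in self.spans:
            label = "  " * s.depth + s.name
            offset = round((s.start - origin) / total * width)
            length = max(1, round((s.end - s.start) / total * width))
            rows.append(f"{label:<24}|{' ' * offset}{'#' * length}")
        return "\n".join(rows)


rec = Recorder()
with rec.span("request"):
    with rec.span("ui-server"):
        time.sleep(0.01)   # stand-in for real work
    with rec.span("cylc-flow"):
        time.sleep(0.02)   # stand-in for real work
print(rec.gantt())
```

The longest bar under a parent span is the first place to look for a bottleneck, which is the drill-down behaviour described above.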

Although not a current priority, this may be worth some investigation once things settle.
