Inform user of dropped events #8572

@cardil

Description

Problem
Whenever events are dropped by the event mesh, we should inform the end-user about it. Just logging is not enough in such a case. IMHO, the best option is to create a dedicated Warning-level Kubernetes event, which is designed for such cases: https://www.cncf.io/blog/2023/03/13/how-to-use-kubernetes-events-for-effective-alerting-and-monitoring/

Background
With the recent addition of the EventTransform API, the chance of misconfiguration by the end-user rises greatly, as they could easily create an infinite loop by transforming the source event in such a way that the original trigger matches it again. The TTL mechanism should eventually break that loop, but when it does, we should inform the end-user about it.
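To make the failure mode concrete, here is a minimal Go sketch of such a loop. It does not use any Knative code: the `event` type, the `triggerMatches` filter, the `transform` function, and the starting TTL of 5 are all assumptions made for illustration only.

```go
package main

import "fmt"

// event is a hypothetical, simplified stand-in for a CloudEvent.
type event struct {
	Type string
	TTL  int // remaining hops before the mesh gives up on the event
}

// triggerMatches is a hypothetical trigger filter on the event type.
func triggerMatches(e event) bool { return e.Type == "my-event-type" }

// transform models a misconfigured transformation: it leaves the type
// unchanged, so its output matches the same trigger again.
func transform(e event) event { return e }

// deliver routes the event through the trigger until the filter stops
// matching or the TTL is exhausted. It returns the number of deliveries
// and whether the event was dropped because of the TTL.
func deliver(e event) (hops int, dropped bool) {
	for triggerMatches(e) {
		if e.TTL <= 0 {
			return hops, true // the loop is broken here, silently today
		}
		e = transform(e)
		e.TTL--
		hops++
	}
	return hops, false
}

func main() {
	hops, dropped := deliver(event{Type: "my-event-type", TTL: 5})
	fmt.Printf("hops=%d dropped=%t\n", hops, dropped)
}
```

In this sketch the only signal that something went wrong is the `dropped` return value, which is the point at which this issue proposes emitting a Kubernetes event.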

Persona:
System Operator, Developer

Exit Criteria
The end-user can easily identify that the Event Mesh configuration is invalid and that some messages are being dropped. Ideally, this should work with well-known K8s monitoring tools; using K8s events should be adequate.

Time Estimate (optional):
5d (events are dropped in a number of places across the codebase)

Additional context (optional)
My proposal to solve this is to reconcile a Kubernetes event whenever such a situation occurs. Such an event may look like:

apiVersion: events.k8s.io/v1
kind: Event
eventTime: 2025-04-23T09:09:54Z
metadata: 
  namespace: user-evening-bits-namespace
  name: knative-eventing-mt-broker.1838e7822e31b835
  labels:
    eventing.knative.dev/event-type: my-event-type
    eventing.knative.dev/event-source: my-event-source
regarding: 
  apiVersion: eventing.knative.dev/v1
  kind: Broker
  name: default
  namespace: user-evening-bits-namespace
type: Warning
action: event-dropped
reason: EventLoop
note: Event of type "my-event-type" and source "my-event-source" has reached internal TTL, which most likely signals an event loop. The event was dropped.
series: 
  count: 351
  lastObservedTime: 2025-04-23T09:09:54Z

Notice the series.count. It should be bumped whenever the "same event" is dropped again. To do that, the reconciler should match existing K8s events by their metadata.labels values for eventing.knative.dev/event-type and eventing.knative.dev/event-source, and bump series.count each time another matching message is dropped.

Metadata

Labels: good first issue, kind/feature-request, triage/accepted
Status: Backlog