-
Notifications
You must be signed in to change notification settings - Fork 5
fix: notification event payload #2299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d8f62af to
31512d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the notification event payload structure to support notification idempotency by separating the notification ID from the resource ID and adding automatic ID generation.
Key changes:
- Splits the payload's
IDfield intoID(for notification identification) andResourceID(for resource identification) - Adds
WithIDSet()method to generate deterministic notification IDs based on notification context - Updates all references throughout the codebase to use the appropriate ID field
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| notification/send.go | Refactors NotificationEventPayload struct to separate notification and resource IDs, adds ID generation logic |
| notification/job.go | Updates logging and event processing to use ResourceID instead of generic ID |
| notification/events.go | Updates all notification processing logic to use the correct ID fields and calls WithIDSet() for ID generation |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
31512d2 to
819e0c8
Compare
0948ae5 to
d1271f6
Compare
f10666c to
638b796
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
notification/events.go (1)
329-331:⚠️ Potential issue | 🟡 MinorUse
duty.Now()instead oftime.Now()for theNotBeforetimestamp.Line 331 uses
time.Now()which should beduty.Now()per project conventions for database timestamps.Suggested fix
- NotBefore: lo.ToPtr(time.Now().Add(*n.WaitFor)), + NotBefore: lo.ToPtr(duty.Now().Add(*n.WaitFor)),Based on learnings:
Use duty.Now() instead of time.Now() for database timestamps and soft deletesnotification/job.go (1)
320-325: 🛠️ Refactor suggestion | 🟠 MajorInconsistent event reconstruction: inline construction here vs.
ParentEvent()inprocessPendingNotification.Line 320 constructs the event inline while
processPendingNotification(line 410) usespayload.ParentEvent(). TheParentEvent()method also setsProperties["config_id"]for config.changed/config.updated events, providing a safety net whenpayload.Propertiesis empty. Consider usingParentEvent()here too for consistency.Suggested refactor
- originalEvent := models.Event{Name: payload.EventName, EventID: payload.EventID, CreatedAt: payload.EventCreatedAt} + originalEvent := payload.ParentEvent()
🤖 Fix all issues with AI agents
In `@events/resource.go`:
- Line 42: The doc comment currently mentions the unexported name
buildEventResource while the function is exported; update the comment to use the
exported identifier and proper Go doc format—e.g., change the comment to start
with "BuildEventResource creates an EventResource..." and place it immediately
above the BuildEventResource function to match Go documentation conventions and
the exported symbol.
- Around line 46-60: The DB lookup error handling in the Event resource loader
(involving ctx.DB().Where(...).First(&eventResource.Check),
.First(&eventResource.Canary), etc.) currently replaces all errors with a
generic dutyAPI.ENOTFOUND; update each lookup to distinguish
gorm.ErrRecordNotFound (use errors.Is(err, gorm.ErrRecordNotFound)) from other
DB errors: if record-not-found return dutyAPI.Errorf(dutyAPI.ENOTFOUND, ...) as
before, but for any other err return that original err (or wrap it with context)
so the caller sees real connection/timeout/DB errors; apply the same change to
every call site in this function and keep duty.CheckSummary logic intact.
In `@go.mod`:
- Line 439: Remove or comment out the local replace directive "replace
github.com/flanksource/duty => ../duty" so the module doesn't require a local
checkout; instead update the go.mod requirement for github.com/flanksource/duty
to the published version that contains the changes from duty PR `#1533` (use the
released semver or commit hash for that PR). Ensure the replace line is not left
active and run "go mod tidy" to verify dependency resolution succeeds without
the local path.
In `@incidents/responder/responder_events.go`:
- Around line 61-62: generateCommentAddedAsyncEvent reads event.EventID into
commentID but when building downstream events (the ones that set
Properties["id"]) it never populates the EventID field, causing handlers like
reconcileCommentEvent to receive a zero UUID; update
generateCommentAddedAsyncEvent so each created models.Event has its EventID set
to the original event.EventID (or parsed from commentID) before
publishing/returning, ensuring EventID and Properties["id"] remain consistent
for functions such as reconcileCommentEvent to operate correctly.
- Line 32: The downstream events currently only put the responder ID into
Properties["id"] which lets GORM generate a different api.Event.ID, so
reconcileResponderEvent reads the wrong ID; in generateResponderAddedAsyncEvent
(and the equivalent responder comment event generator) parse the responder ID
string into a UUID and assign it to the api.Event.ID field (instead of leaving
it unset) before creating the event, and handle parse errors; this ensures
reconcileResponderEvent (which uses event.ID/EventID.String() semantics) can
look up the responder correctly.
In `@notification/events.go`:
- Around line 920-924: The code reads configID from event.Properties for events
named api.EventConfigChanged or api.EventConfigUpdated and then calls
query.GetCachedConfig; add an explicit non-empty check for configID before
invoking query.GetCachedConfig to avoid silent failures—if configID == "" skip
the cache lookup (or return/log an error) so you don't call
query.GetCachedConfig with an empty string; reference the event.Name check and
the variables configID and the call to query.GetCachedConfig (and note
ParentEvent() in send.go is not sufficient to guarantee properties) when
applying the guard.
In `@playbook/scheduler_test.go`:
- Around line 447-473: The test name is misleading: the TestCase.Name says
"config.updated" but the test actually creates a models.ConfigChange in
DatabaseChange and asserts api.EventConfigChanged in ExpectedEvents; update the
TestCase.Name to reflect the actual behavior (e.g., "should generate correct
EventResource for organically triggered config.changed event" or similar) so it
matches the DatabaseChange logic and ExpectedEvents in the testEvent invocation.
- Around line 486-488: In the DatabaseChange callback (the anonymous func
DatabaseChange(ctx context.Context, res TestResource) error) replace use of
time.Now() with duty.Now() when updating the deleted_at column (i.e., change
UpdateColumn("deleted_at", time.Now()) to UpdateColumn("deleted_at",
duty.Now())); also ensure the duty package is imported where this test lives so
the symbol duty.Now() resolves.
🧹 Nitpick comments (2)
notification/send.go (1)
80-99:GenerateEventIDsilently discards the error fromhash.DeterministicUUIDand doesn't coverPlaybookIDrecipients.Two observations:
The error from
hash.DeterministicUUIDis silently discarded on line 97. If it ever fails,generatedwould be a zero UUID, potentially causing silent deduplication collisions.When
PlaybookIDis set as the recipient (no Connection, Person, Team, or CustomService),recipientSigremains empty, making the generated ID independent of the recipient. This may be intentional for playbooks (only one run per event), but worth confirming.Suggested improvement for error handling
- generated, _ := hash.DeterministicUUID(sig) - return generated + generated, err := hash.DeterministicUUID(sig) + if err != nil { + // Fallback to a random UUID if deterministic generation fails + return uuid.New() + } + return generated,
events/resource.go (1)
67-80: Duplicate switch cases can be consolidated.The
EventConfigHealthy/Unhealthy/Warning/Unknown/Degradedcase (Lines 67-70) and theEventConfigCreated/Deletedcase (Lines 72-75) have identical bodies — both fetch aConfigItembyevent.EventID. Merging them into a single case reduces duplication.Suggested consolidation
- case api.EventConfigHealthy, api.EventConfigUnhealthy, api.EventConfigWarning, api.EventConfigUnknown, api.EventConfigDegraded: - if err := ctx.DB().Model(&models.ConfigItem{}).Where("id = ?", event.EventID).First(&eventResource.Config).Error; err != nil { - return eventResource, dutyAPI.Errorf(dutyAPI.ENOTFOUND, "config(id=%s) not found", event.EventID) - } - - case api.EventConfigCreated, api.EventConfigDeleted: + case api.EventConfigHealthy, api.EventConfigUnhealthy, api.EventConfigWarning, api.EventConfigUnknown, api.EventConfigDegraded, + api.EventConfigCreated, api.EventConfigDeleted: if err := ctx.DB().Model(&models.ConfigItem{}).Where("id = ?", event.EventID).First(&eventResource.Config).Error; err != nil { return eventResource, dutyAPI.Errorf(dutyAPI.ENOTFOUND, "config(id=%s) not found", event.EventID) }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
playbook/events.go (1)
356-357:⚠️ Potential issue | 🟡 MinorUse
duty.Now()instead oftime.Now()for database timestamps.These timestamps are stored in a
PlaybookRunrecord. Per coding guidelines, database timestamps should useduty.Now().♻️ Proposed fix
- StartTime: lo.ToPtr(time.Now()), - EndTime: lo.ToPtr(time.Now()), + StartTime: lo.ToPtr(duty.Now()), + EndTime: lo.ToPtr(duty.Now()),(Ensure
dutyis imported:"github.com/flanksource/duty")As per coding guidelines, "Use
duty.Now()instead oftime.Now()for database timestamps and soft deletes".
🧹 Nitpick comments (3)
playbook/events.go (1)
219-226: Fallback chain is reasonable but verify behavior when all lookups are empty.If
playbook_id,id, andEventIDare all zero/empty,playbookIDwill be"00000000-0000-0000-0000-000000000000", and the query on line 287 will return a "record not found" error. This is safe but produces a less descriptive error than an explicit validation. Consider adding a guard.notification/send.go (1)
81-100:GenerateEventIDproduces a good deterministic signature, butCustomService.Namealone may not be unique.If two custom notification services share the same
Namebut differ in configuration (URL, connection, etc.), they'd produce the sameEventID, potentially colliding. This is likely a very rare edge case given that Name is typically unique within a notification's custom services.playbook/actions/ai.go (1)
291-296: The"id"property is redundant alongside"playbook_id".Both
"id"and"playbook_id"are set toplaybook.ID.String(). Inplaybook/events.go(lines 220-226),onNewRunchecksplaybook_idfirst, then falls back toid. The fallback exists for backward compatibility with events from other sources, but no other code in the codebase directly consumesProperties["id"]. Removing"id"from this code is safe and would clarify thatplaybook_idis the canonical property.
AI actions that trigger multiple context-provider playbooks all used the same ActionID as event_id. Since event_queue has a unique constraint on (name, event_id), the second child playbook insert would fail with a constraint violation, silently preventing all but the first child run. Derive a deterministic UUID from (ActionID, target playbook ID) so each child gets its own dedupe slot while remaining idempotent on retry.
Plain Create(&event) on the playbook.run insert means a duplicate dispatch replay (e.g. async consumer tx.Commit failure) would hard-fail with a unique constraint error instead of being handled gracefully. Switch to EventQueueOnConflictClause so replays reset attempts and created_at rather than crashing.
Include source trigger context in notification.send event_id generation to reduce unintended collisions across event types and recipient configs. Use EventQueueOnConflictClause when AI actions enqueue playbook.run events so retries are idempotent under (name,event_id) uniqueness.
df59fbf to
665f6c8
Compare
WalkthroughAdds a stable EventID to the Event model and updates event creation, queuing, resource resolution, notification payloads, playbook actions, and responder logic to use EventID and ResourceID. Introduces events.BuildEventResource and centralizes event-queue conflict handling. Changes
Sequence Diagram(s)sequenceDiagram
participant Source as Event Source
participant DB as Database
participant Events as events.BuildEventResource
participant Notif as Notification Service
participant Responder as Responder/Playbook
Source->>DB: Create models.Event (includes EventID)
DB-->>Source: Persisted event (queued)
Events->>DB: Load related resources by EventID / properties
DB-->>Events: Return Component/Config/Check/Canary
Events->>Notif: Provide EventResource / map
Notif->>DB: Create NotificationEventPayload (ResourceID + EventID) and ParentEvent()
Notif->>DB: Enqueue/send notification using EventID
Responder->>DB: Lookup responder/comment using EventID
DB-->>Responder: Return responder/comment records
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
incidents/responder/responder_events.go (1)
94-119:⚠️ Potential issue | 🟠 MajorMultiple responders of the same type will collide in the event_queue due to identical (Name, EventID) pairs.
When iterating over responders for comment events (lines 94–119), all events of the same type share the same
(Name, EventID). TheON CONFLICTclause resetsattempts,last_attempt,created_at, anderror, but does not updateProperties, so only the first responder'sresponder_idis preserved. Currently,reconcileCommentEventre-queries all responders from the database, masking the issue. However, the TODO at line 174 indicates an intended refactoring to useresponder_idfrom event properties, which would expose this as a critical bug where only one responder per type is notified.The same concern applies to responder-added events (lines 46–63), though having multiple integrations of the same type is less common there.
🤖 Fix all issues with AI agents
In `@events/resource.go`:
- Around line 47-67: The Find() calls are passing pointers to potentially nil
struct fields (e.g., eventResource.Check and eventResource.Canary), causing
possible nil-pointer panics when you later access .ID; instead, allocate local
non-nil variables (e.g., var chk models.Check; var canary models.Canary) and use
ctx.DB().Where(...).Limit(1).Find(&chk) (or First(&chk) and check for
gorm.ErrRecordNotFound), validate chk.ID against uuid.Nil and only then assign
eventResource.Check = chk (same pattern for eventResource.Canary,
eventResource.Component, and eventResource.Config); also preserve the existing
duty.CheckSummary call (use checkID.String()) and return the same dutyAPI errors
when records are missing.
🧹 Nitpick comments (2)
events/resource.go (2)
77-99: Config health events and config created/deleted share identical lookup logic.Lines 77-83 and 85-91 are exact duplicates — both look up
ConfigItembyevent.EventID. Consider merging these cases to reduce duplication:♻️ Suggested consolidation
- case api.EventConfigHealthy, api.EventConfigUnhealthy, api.EventConfigWarning, api.EventConfigUnknown, api.EventConfigDegraded: - if err := ctx.DB().Model(&models.ConfigItem{}).Where("id = ?", event.EventID).Limit(1).Find(&eventResource.Config).Error; err != nil { - return eventResource, err - } - if eventResource.Config.ID == uuid.Nil { - return eventResource, dutyAPI.Errorf(dutyAPI.ENOTFOUND, "config(id=%s) not found", event.EventID) - } - - case api.EventConfigCreated, api.EventConfigDeleted: - if err := ctx.DB().Model(&models.ConfigItem{}).Where("id = ?", event.EventID).Limit(1).Find(&eventResource.Config).Error; err != nil { - return eventResource, err - } - if eventResource.Config.ID == uuid.Nil { - return eventResource, dutyAPI.Errorf(dutyAPI.ENOTFOUND, "config(id=%s) not found", event.EventID) - } + case api.EventConfigHealthy, api.EventConfigUnhealthy, api.EventConfigWarning, api.EventConfigUnknown, api.EventConfigDegraded, + api.EventConfigCreated, api.EventConfigDeleted: + if err := ctx.DB().Model(&models.ConfigItem{}).Where("id = ?", event.EventID).Limit(1).Find(&eventResource.Config).Error; err != nil { + return eventResource, err + } + if eventResource.Config.ID == uuid.Nil { + return eventResource, dutyAPI.Errorf(dutyAPI.ENOTFOUND, "config(id=%s) not found", event.EventID) + }
93-99: Missing validation:config_idproperty must exist forconfig.changed/config.updatedevents.If
event.Properties["config_id"]is empty (missing key), the queryWhere("id = ?", "")will match zero rows and return a generic "not found" error. Consider validating the property exists upfront to provide a more actionable error message.🛡️ Suggested validation
case api.EventConfigChanged, api.EventConfigUpdated: + configID := event.Properties["config_id"] + if configID == "" { + return eventResource, dutyAPI.Errorf(dutyAPI.EINVALID, "config_id property is required for %s event", event.Name) + } - if err := ctx.DB().Model(&models.ConfigItem{}).Where("id = ?", event.Properties["config_id"]).Limit(1).Find(&eventResource.Config).Error; err != nil { + if err := ctx.DB().Model(&models.ConfigItem{}).Where("id = ?", configID).Limit(1).Find(&eventResource.Config).Error; err != nil { return eventResource, err } if eventResource.Config.ID == uuid.Nil { - return eventResource, dutyAPI.Errorf(dutyAPI.ENOTFOUND, "config(id=%s) not found", event.Properties["config_id"]) + return eventResource, dutyAPI.Errorf(dutyAPI.ENOTFOUND, "config(id=%s) not found", configID) }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@notification/events.go`:
- Around line 849-851: The DB error returned by ctx.DB().Where("id = ?",
event.EventID).Find(&comment).Error is being discarded; update the error return
in the block that queries for comment (using ctx.DB().Where / Find and
event.EventID) to include and wrap the original err (e.g., use fmt.Errorf with
%w or include err in the formatted message) so the underlying database error is
preserved when returning from this function.
merge after: flanksource/duty#1533
resolves: flanksource/duty#1770
Summary by CodeRabbit
New Features
Refactor
Tests