Skip to content

Comments

Add alarm to cut trigger on sentry app webhook timing out#108563

Open
Christinarlong wants to merge 10 commits intomasterfrom
crl/webhook-alarm
Open

Add alarm to cut trigger on sentry app webhook timing out#108563
Christinarlong wants to merge 10 commits intomasterfrom
crl/webhook-alarm

Conversation

@Christinarlong
Copy link
Contributor

@Christinarlong Christinarlong commented Feb 19, 2026

This PR is in a stacked chain of
crl/add-breaker-to-webhook-sending -> crl/add-ff-to-breaker -> crl/impl-circuit-breaker -> crl/webhook-alarm -> master

Summary

  • Adds a SIGALRM-based hard timeout for webhook requests to prevent indefinite hangs when safe_urlopen sockets can't reach or resolve the target URL
  • Reuses the existing timeout_alarm context manager from sentry.taskworker.workerchild
  • Gated behind the organizations:sentry-app-webhook-hard-timeout feature flag and configurable via the sentry-apps.webhook.hard-timeout.sec option (default 5s)

Details

  • Introduces WebhookTimeoutError (a BaseException subclass) and a _handle_webhook_timeout signal handler in webhooks.py
  • When the feature flag is enabled, wraps the safe_urlopen call in timeout_alarm so requests that exceed the hard timeout are interrupted
  • Records a HARD_TIMEOUT halt reason on the event lifecycle when the alarm fires
  • Adds the new HARD_TIMEOUT variant to SentryAppWebhookHaltReason

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Feb 19, 2026
@Christinarlong Christinarlong changed the title Add circuit breaker to track sentry app webhook failures Add alarm to cut trigger on sentry app webhook timing out Feb 19, 2026
With Exception, ignore_unpublished_app_errors swallows timeouts for
unpublished (dev/test) apps, which is the correct behavior — the
lifecycle halt metric still fires before the re-raise. Tests updated
to use published=True so the exception propagates for assertion.
Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Copy link
Member

@GabeVillalobos GabeVillalobos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of a philosophical question, but could we change the DNS resolution timeout instead, or make it an optional parameter to configure? This seems to sidestep the core issue a little bit.

Comment on lines +130 to +136
with timeout_alarm(timeout_seconds, _handle_webhook_timeout):
response = safe_urlopen(
url=url,
data=app_platform_event.body,
headers=app_platform_event.headers,
timeout=options.get("sentry-apps.webhook.timeout.sec"),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of discerning between this and the task-level timeout we already enforce? Is this covering a different code path?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants