@fntlnz (Collaborator) commented Dec 18, 2025

Adds a `numSpotInterruptions` field to trace records for Tower/Seqera Platform telemetry:

  • AWS Batch: detect spot interruptions via a status reason matching "Host EC2*"
  • Google Batch: detect preemptions via exit code 50001
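
The two detection rules can be sketched in Java (Nextflow itself is written in Groovy). This is a minimal illustration based only on the description above, not the code from this PR. The class, constant, and method names are hypothetical. The "Host EC2*" glob is read as a plain prefix match. The Google Batch status events are reduced to a list of the exit codes they report.

```java
import java.util.List;

// Hypothetical helper illustrating the two rules; not Nextflow's actual implementation.
final class SpotInterruptionDetector {
    // Values taken from the PR description; constant names are assumptions.
    static final String AWS_SPOT_REASON_PREFIX = "Host EC2";
    static final int GOOGLE_PREEMPTION_EXIT_CODE = 50001;

    // AWS Batch: a status reason starting with "Host EC2" means the host instance was reclaimed.
    static boolean isAwsSpotInterruption(String statusReason) {
        return statusReason != null && statusReason.startsWith(AWS_SPOT_REASON_PREFIX);
    }

    // Google Batch: any status event reporting exit code 50001 means the VM was preempted.
    static boolean isGooglePreemption(List<Integer> eventExitCodes) {
        return eventExitCodes != null && eventExitCodes.contains(GOOGLE_PREEMPTION_EXIT_CODE);
    }

    public static void main(String[] args) {
        System.out.println(isAwsSpotInterruption("Host EC2 (instance i-0123456789abcdef0) terminated."));
        System.out.println(isAwsSpotInterruption("Essential container in task exited"));
        System.out.println(isGooglePreemption(List.of(0, 50001)));
        System.out.println(isGooglePreemption(List.of(1)));
    }
}
```

Running `main` prints `true`, `false`, `true`, `false`. The null checks make a missing status reason or event list count as "not interrupted".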

(cherry picked from commit eecd816)

@fntlnz force-pushed the backport/6606-spot-interruption-tracking branch from bf2119b to db6e9a4 on December 18, 2025 11:24
Track and report spot/preemptible instance interruptions for cloud batch executors.

Changes:
- Add `numSpotInterruptions` transient field to TraceRecord
- AWS Batch: detect spot interruptions by matching the status reason against the pattern "Host EC2*"
- Google Batch: detect spot preemptions via exit code 50001 in status events
- Tower plugin: send numSpotInterruptions to Seqera Platform telemetry
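
The following sketch shows how per-attempt interruption flags could be rolled up into a single `numSpotInterruptions` count and placed in a telemetry payload. It is a minimal illustration, not the PR's code, and it needs Java 16+ for records. Only the field name `numSpotInterruptions` comes from the PR. The `Attempt` record, the `name` and `attempts` keys, and the payload shape are hypothetical. How the real `TraceRecord` stores the value, and what makes it transient, is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical aggregation sketch; not the actual TraceRecord or Tower plugin code.
final class SpotInterruptionTelemetry {
    // One attempt of a task and whether it ended because the spot/preemptible VM was reclaimed.
    record Attempt(int number, boolean spotInterrupted) {}

    // Count the attempts that were cut short by a spot interruption.
    static int countSpotInterruptions(List<Attempt> attempts) {
        int count = 0;
        for (Attempt a : attempts) {
            if (a.spotInterrupted()) {
                count++;
            }
        }
        return count;
    }

    // Build an illustrative payload; LinkedHashMap keeps insertion order for readable output.
    static Map<String, Object> telemetryPayload(String taskName, List<Attempt> attempts) {
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("name", taskName);
        payload.put("attempts", attempts.size());
        payload.put("numSpotInterruptions", countSpotInterruptions(attempts));
        return payload;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = List.of(
                new Attempt(1, true),
                new Attempt(2, true),
                new Attempt(3, false));
        System.out.println(telemetryPayload("ALIGN (sample1)", attempts));
    }
}
```

Running `main` prints `{name=ALIGN (sample1), attempts=3, numSpotInterruptions=2}`: a task that succeeded on its third attempt after two spot reclamations.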

This enables workflow optimization and cost analysis by tracking how often
tasks are retried due to spot instance reclamation.

(cherry picked from commit eecd816)
Signed-off-by: Lorenzo Fontana <[email protected]>
@fntlnz force-pushed the backport/6606-spot-interruption-tracking branch from db6e9a4 to b62f0e6 on December 18, 2025 12:58
@fntlnz requested a review from munishchouhan on December 18, 2025 13:02
@pditommaso (Member) left a comment

Looks good, just don't merge yet.
