[batch] Add Support for AWS Batch #612

Open
cschwartz1020 opened this issue Dec 11, 2024 · 2 comments

Labels
feature-request New feature

Comments

@cschwartz1020

Feature scope

AWS Batch

Describe your suggested feature

Feature request is for an AWS Batch Monitoring construct

@cschwartz1020 cschwartz1020 added the feature-request New feature label Dec 11, 2024
@echeung-amzn
Member

Do you have particular alarms and dashboard widgets that you think would make sense for Batch users?

@echeung-amzn echeung-amzn changed the title [AWS Batch] Add Support for AWS Batch [batch] Add Support for AWS Batch Dec 13, 2024
@cschwartz1020
Author

> Do you have particular alarms and dashboard widgets that you think would make sense for Batch users?

The most basic requirement would be widgets that show the number of Batch jobs in each status (SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING, SUCCEEDED, FAILED) for a given Job Queue or Job Definition.

However, I understand this would likely be a large effort, given that these metrics are not currently sent to CloudWatch (i.e. there's no Batch CW namespace, so no native metrics or CW integration). I have seen this solved before via EventBridge rules that route events with the "Batch Job State Change" detail type to an SNS Topic target; from there you can track the AWS/SNS namespace "NumberOfMessagesPublished" metric. This is somewhat of a heuristic, though, as it tells you how many jobs entered a given state during a period rather than how many jobs are in a given state. Regardless, it would be nice to have a construct that takes care of all this heavy lifting for you via .monitorBatchJob(..). It would also be nice to add a dimension for EC2 Instance Type, so you can see how workloads are spread across the instances configured on the Batch ComputeEnvironment.
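
To make the EventBridge approach and its limitation concrete, here is a rough sketch in plain TypeScript with no AWS SDK or CDK dependency. The helper names (`batchJobStatePattern`, `countJobsInState`) and the `JobStateChange` shape are hypothetical and not part of any existing construct. The pattern fields (`source`, `detail-type`, `detail.status`, `detail.jobQueue`) follow the Batch Job State Change event format as I understand it, and the tally assumes events are supplied in chronological order, which EventBridge does not guarantee.

```typescript
// Job statuses listed earlier in this comment.
const BATCH_JOB_STATUSES = [
  "SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING", "SUCCEEDED", "FAILED",
] as const;
type BatchJobStatus = (typeof BATCH_JOB_STATUSES)[number];

// Subset of an EventBridge event pattern used in this sketch.
interface BatchJobEventPattern {
  source: string[];
  "detail-type": string[];
  detail: { status: BatchJobStatus[]; jobQueue?: string[] };
}

// Hypothetical helper: builds the pattern for one rule per status, so that
// each rule can target its own SNS topic (one topic per status is an assumption).
function batchJobStatePattern(
  status: BatchJobStatus,
  jobQueueArn?: string,
): BatchJobEventPattern {
  const detail: BatchJobEventPattern["detail"] = { status: [status] };
  if (jobQueueArn !== undefined) {
    detail.jobQueue = [jobQueueArn];
  }
  return {
    source: ["aws.batch"],
    "detail-type": ["Batch Job State Change"],
    detail,
  };
}

// Hypothetical minimal view of a state-change event's detail.
interface JobStateChange {
  jobId: string;
  status: BatchJobStatus;
}

// Counts jobs currently in each status by keeping only the latest status per
// job. Assumes `events` is in chronological order.
function countJobsInState(
  events: JobStateChange[],
): Record<BatchJobStatus, number> {
  const latest: Record<string, BatchJobStatus> = {};
  for (const event of events) {
    latest[event.jobId] = event.status;
  }
  const counts = {} as Record<BatchJobStatus, number>;
  for (const status of BATCH_JOB_STATUSES) {
    counts[status] = 0;
  }
  for (const jobId of Object.keys(latest)) {
    counts[latest[jobId]] += 1;
  }
  return counts;
}
```

For example, if job `a` goes SUBMITTED then RUNNING and job `b` goes SUBMITTED, RUNNING, then FAILED, a per-status "NumberOfMessagesPublished" count would report two transitions into RUNNING, while `countJobsInState` reports one job in RUNNING and one in FAILED. That gap is the heuristic mentioned above.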

Beyond that, it would be nice to have basic CPU/GPU metric widgets (memory and utilization) for the nodes of the underlying ECS/EKS cluster powering the Batch ComputeEnvironment.
