@cmoussa1 cmoussa1 commented Oct 8, 2025

Problem

As mentioned by @vsoch in #650, there is a need to enforce a limit on an association's ability to run jobs on a certain instance type. flux-accounting already tracks the relevant quantity in its SQLite database by calculating job usage (the product of the number of nodes allocated to a job and its duration). However, there is currently no way to enforce such a limit using flux-accounting's mf_priority jobtap plugin.


This PR begins to lay the groundwork for this "usage" limit enforcement by adding a new jobtap plugin called compute_hours_limits. For now, the plugin just tracks an association's current usage across all of their running jobs (a job's anticipated usage is its nnodes multiplied by its requested duration) and computes the job's actual usage when the job completes. The workflow looks something like this:

The user submits a job and specifies its size and duration (or a default duration is set on the job):

$ flux submit -N4 -S duration=3600 my_job

The plugin takes these resource specifications and calculates an expected usage by multiplying the two numbers together:

job->expected_usage = counts.nnodes * duration; 
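
As a rough illustration of the arithmetic for the example submission above, here is a minimal Python sketch. The function name `expected_usage` is hypothetical and not part of the plugin, and the node-seconds unit assumes the requested duration is given in seconds.

```python
def expected_usage(nnodes: int, duration: float) -> float:
    """Anticipated usage of a job: node count times requested duration."""
    return nnodes * float(duration)


# The example submission: -N4 with duration=3600 (assumed to be seconds).
usage = expected_usage(4, 3600)
print(usage)         # 14400.0 node-seconds
print(usage / 3600)  # 4.0 node-hours
```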

When the job transitions to RUN, the expected_usage is added to the association's current_usage attribute:

current_usage += job->expected_usage;

When the job completes, the job's actual usage is calculated and added to the association's total_usage attribute:

total_usage += (job->nnodes * (t_inactive - job->t_run));

and the expected usage from the job is subtracted from the association's current_usage attribute:

current_usage -= job->expected_usage;
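
The bookkeeping in the steps above can be modeled outside of Flux with a small Python sketch. This is only a simulation of the accounting, not the plugin's code: the `Job` and `Association` classes and the `job_run`/`job_inactive` method names are hypothetical, and timestamps are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass
class Job:
    nnodes: int
    duration: float              # requested duration (assumed seconds)
    expected_usage: float = 0.0
    t_run: float = 0.0

    def __post_init__(self):
        # Anticipated usage is fixed when the job is created.
        self.expected_usage = self.nnodes * self.duration


@dataclass
class Association:
    current_usage: float = 0.0   # expected usage of running jobs
    total_usage: float = 0.0     # actual usage of completed jobs

    def job_run(self, job: Job, now: float) -> None:
        # Job transitions to RUN: reserve its expected usage.
        job.t_run = now
        self.current_usage += job.expected_usage

    def job_inactive(self, job: Job, t_inactive: float) -> None:
        # Job completes: record actual usage, release the reservation.
        self.total_usage += job.nnodes * (t_inactive - job.t_run)
        self.current_usage -= job.expected_usage


assoc = Association()
job = Job(nnodes=4, duration=3600)

assoc.job_run(job, now=1000.0)
running_usage = assoc.current_usage   # 14400.0 while the job runs

# The job finishes early, after 1800 seconds of its 3600-second request.
assoc.job_inactive(job, t_inactive=2800.0)
print(running_usage, assoc.current_usage, assoc.total_usage)
```

Note that the value subtracted from current_usage is the expected usage rather than the actual usage, so current_usage returns to its prior value regardless of how long the job really ran.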

Associations' total_usage attributes can be reset to 0.0 by sending a "clear" RPC to the plugin:

flux.Flux().rpc("job-manager.compute_hours_limits.clear")
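
The effect of the "clear" request can be pictured with a small stand-alone sketch. It assumes the plugin keeps a map of associations and that only total_usage is reset, leaving current_usage (which reflects jobs still running) untouched. The `clear` function and the (user, bank) keys are hypothetical stand-ins, not the plugin's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Association:
    current_usage: float = 0.0
    total_usage: float = 0.0


def clear(associations: Dict[Tuple[str, str], Association]) -> None:
    """Reset every association's total_usage to 0.0 (assumed behavior)."""
    for assoc in associations.values():
        assoc.total_usage = 0.0


# Hypothetical associations keyed by (user, bank).
associations = {
    ("user1", "bankA"): Association(current_usage=14400.0, total_usage=7200.0),
    ("user2", "bankA"): Association(current_usage=0.0, total_usage=300.0),
}

clear(associations)
print([a.total_usage for a in associations.values()])
```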

To keep the plugin code manageable to review, I've only added tracking of an association's job usage in this PR. If this looks like the right way to track an association's usage (and eventually enforce a limit), I can submit follow-up PRs to add that functionality.

I've added some basic tests to showcase how the plugin tracks job usage for a given association by submitting one or more jobs and ensuring the current_usage and total_usage values are calculated correctly.


Still some things to take care of:

  • not tracking a job's usage if it is cancelled before it ever runs
  • removing a Job object after it has transitioned to job.state.inactive
  • updating (resetting?) an association's total_usage value so it doesn't just increase indefinitely
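
One possible way to handle the first two items, sketched in Python rather than in the plugin's own code: treat a missing run timestamp as "never ran" and drop the job record as soon as it goes inactive. This is an assumption about how it could be done, not what the PR implements, and every name here (`Job`, `Association`, `on_run`, `on_inactive`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    nnodes: int
    expected_usage: float
    t_run: Optional[float] = None   # stays None if the job never reaches RUN


class Association:
    def __init__(self) -> None:
        self.current_usage = 0.0
        self.total_usage = 0.0
        self.jobs: Dict[int, Job] = {}

    def on_run(self, jobid: int, now: float) -> None:
        job = self.jobs[jobid]
        job.t_run = now
        self.current_usage += job.expected_usage

    def on_inactive(self, jobid: int, t_inactive: float) -> None:
        # Remove the Job object once it is inactive so it is not kept around.
        job = self.jobs.pop(jobid)
        if job.t_run is None:
            # Cancelled before it ever ran: nothing to account for.
            return
        self.total_usage += job.nnodes * (t_inactive - job.t_run)
        self.current_usage -= job.expected_usage


assoc = Association()
assoc.jobs[1] = Job(nnodes=2, expected_usage=2 * 3600.0)
assoc.jobs[2] = Job(nnodes=8, expected_usage=8 * 3600.0)

assoc.on_run(1, now=100.0)
assoc.on_inactive(2, t_inactive=150.0)   # job 2 is cancelled while pending
assoc.on_inactive(1, t_inactive=700.0)   # job 1 ran for 600 seconds

print(assoc.current_usage, assoc.total_usage, len(assoc.jobs))
```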

@cmoussa1 cmoussa1 changed the title from "[WIP] plugin: add new jobtap plugin to track job usage across associations' running jobs" to "[WIP] plugin: add new jobtap plugin to track job usage across associations' jobs" Oct 8, 2025

Problem: As mentioned in flux-framework#650, there is a need to enforce a
limit on an association's ability to run jobs on a certain instance
type, which flux-accounting already tracks in its SQLite database by
calculating job usage (which is a product of the number of nodes
allocated to a job and its duration). However, there is currently no way
to enforce this using flux-accounting's mf_priority jobtap plugin.

Begin to lay the groundwork for this limit enforcement by adding a new
jobtap plugin called compute_hours_limits which, for now, just tracks
an association's current usage across all of their running jobs and adds
the actual usage of the job to the association's total_usage value when
jobs complete.

Problem: There is no way to send flux-accounting database information to
the compute_hours_limits plugin.

Add a command that extracts flux-accounting database information and
packs it into JSON objects to be sent over to and unpacked by the
compute_hours_limits plugin.

Problem: There are no tests for the compute_hours_limits plugin.

Add some basic tests.

@cmoussa1 cmoussa1 force-pushed the compute.hours.limits branch from e3a7f50 to 3b4ed80 Compare October 9, 2025 00:07
codecov bot commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 76.28866% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.02%. Comparing base (2b428e4) to head (3b4ed80).

Files with missing lines Patch % Lines
src/plugins/compute_hours_limits.cpp 76.28% 46 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #770      +/-   ##
==========================================
- Coverage   83.56%   83.02%   -0.54%     
==========================================
  Files          27       28       +1     
  Lines        2421     2615     +194     
==========================================
+ Hits         2023     2171     +148     
- Misses        398      444      +46     
Files with missing lines Coverage Δ
src/plugins/compute_hours_limits.cpp 76.28% <76.28%> (ø)