-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add per-Task cpu time metric #56320
base: master
Are you sure you want to change the base?
Add per-Task cpu time metric #56320
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. I believe you may need to add the reporting functions to public and docs, and the PR to NEWs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I'm not too satisfied with the metrics being directly in jl_task_t
; hopefully @vtjnash has a suggestion for what to do there.
We now have to separate out spin time, wait time, compilation time, and GC time (in another PR I guess) and the task CPU time will be actual user CPU time.
base/timing.jl
Outdated
This metric is only updated when the task yields or completes. | ||
See also [`task_wall_time_ns`](@ref). | ||
|
||
Will be `UInt64(0)` if task timings are not enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this method should throw if timings are not enabled for this task instead of returning 0, which is a valid return value. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or missing
or nothing
are also possible options for that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO you should not use unsigned integers for representing positive values - you should use Int64
for this instead.
It's better in julia than it is in C, since julia's defaults are less dangerous, but using unsigned ints for positive numeric values is a consistent source of bugs in C programs, and i think should be still frowned upon in julia. (https://jacobegner.blogspot.com/2019/11/unsigned-integers-are-dangerous.html)
IMO Unsigned ints should be used only for things like hashes, identifiers, addresses, and possibly for indexes but even for indexes i'd use signed ints.
But anyway, another benefit is that this also gives -1
as a possible return value for the unset case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've so far opted for returning an Int
when enabled and nothing
otherwise. I think nothing
is maybe nicer because it'll force an error if you try to use it like an Int
.
That said, these functions currently have _ns()
names, similar to cumulative_compile_time_ns()
and gc_time_ns()
and time_ns()
which all return UInt64
, so don't love the inconsistency
I also moved these all to Experimental
so we can change these APIs as we learn more about what makes sense (as we add more metrics).
src/jltypes.c
Outdated
"is_timing_enabled", | ||
"first_enqueued_at", | ||
"last_dequeued_at", | ||
"cpu_time_ns", | ||
"wall_time_ns"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these go in a separate struct called metrics
or something @vtjnash? We won't be able to initialize that struct type before this, so it would have to be jl_any_type
, but that would mean we'd have to alloc this in jl_new_task
. I don't know how/if we can do an embedded struct here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be a separate struct inlined here, but I don't see much value to that. More valuable would probably be marking each of these as atomic though (in jltypes.c) so that observing them isn't UB. Most reads can probably be monotonic, but some writes are probably release and some reads might be acquires
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay on the separation concern. And yeah, I guess it is necessary for these to be atomic. Will discuss with @nickrobinson251.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to atomics. But i wasn't sure from the docs how to select the correct memory ordering (so put :monotonic
everywhere for now) -- would love advice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtjnash would you be able to take a look and advise on which orderings to use for the various atomic reads and writes in the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can merge it with the current orderings.
It'd be helpful to add an example of using this to the original post. |
src/task.c
Outdated
@@ -1146,6 +1146,11 @@ JL_DLLEXPORT jl_task_t *jl_new_task(jl_function_t *start, jl_value_t *completion | |||
t->ptls = NULL; | |||
t->world_age = ct->world_age; | |||
t->reentrant_timing = 0; | |||
t->is_timing_enabled = jl_atomic_load_relaxed(&jl_task_timing_enabled) != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting, so once a Task is spawned, enabling/disabling the feature doesn't affect live tasks?
So i guess if we wanted to enable this e.g. in @time
, we would have to explicitly set this field on the Task struct? 🤔 I think that seems okay to me. Either you enable the global field before spawning any Task of interest, or you set the field on the Task when you want to start timing it. 👍 Coolcool.
🤔 And actually, i'm not sure if these numbers would get screwed up if you enable it while a task is running. So maybe you really do need to do this only at Task creation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coooooooool! 😁 Looks great from my perspective!
I'll let everyone else finish their reviews and your responses and such, but this LGTM from my perspective. 😎
for the running task only. Other tasks report metrics that are non-continuous but updated only on task switches.
80b52fa
to
466f189
Compare
"metrics_enabled", | ||
"first_enqueued_at", | ||
"last_started_running_at", | ||
"cpu_time_ns", | ||
"finished_at"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to instead store this state out-of-line and just have a single pointer here?
You could be cute and have a Union{Nothing,Metric} for the metric enabled query.
Alternativly these values could live in task-local-storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We originally proposed that, yeah, but they were moved inline. I don't remember why. Maybe to dodge the extra cache-miss on the task switches?
CC @vtjnash
JL_DLLEXPORT void jl_task_metrics_enable(void) | ||
{ | ||
// Increment the flag to allow reentrant callers. | ||
jl_atomic_fetch_add(&jl_task_metrics_enabled, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a global, an alternative option here would be a scoped value.
That would avoid having to deal with underflow and global state and allow for precise measuring of task-metrics for sub-regions of the program.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested the same, but i think the concern was whether scoped values are accessible from C, if I remember right?
Are they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can call a Julia function from C that performs that query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that not going to be slow?
Since this has to happen on every task switch.
Co-authored-by: Valentin Churavy <[email protected]>
…ot." This reverts commit d23a96f.
fff2c91
to
22dd957
Compare
22dd957
to
5b58e04
Compare
!!! compat "Julia 1.12" | ||
This method was added in Julia 1.12. | ||
""" | ||
function task_cpu_time_ns(t::Task=current_task()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name seems like it might be a bit mis-leading - What about task_running_time_ns()
?
As the note you added makes clear, this is the time that the task has been "in the running state" (i.e. scheduled by Julia) but not the time the task has actually spent running on-cpu (i.e. scheduled by the OS)
It's plausible to me that one day we might want to hook up perf
with our scheduler so that you can get the actual instruction / branch / tsc spent on-core, so it's probably good to leave room for the future names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like good feedback to me. 👍
I'm supportive of a name change.
I like task_running_time_ns()
. I also like task_executing_time_ns()
or maybe task_scheduled_time_ns()
?
I think task_running_time_ns
is the best I've seen so far.
Close #47351 (builds on top of #48416)
Adds to
Task
a metric for scheduled CPU time and wall timeI have put
record_cpu_time!
inwait()
, so there is a singular place this is recorded by the scheduler, plus inyield(t)
,yieldto(t)
andthrowto(t)
(which skip the scheduler).Channel
,ReentrantLock
,Event
,Timer
,Semaphore
are all implemented in terms ofwait(Condition)
, which in turn callswait()
. AndLibuvStream
similarly callswait()
. So hopefully this captures everything (albeit, slightly over-counting task CPU time by including any enqueuing work done before we hitwait()
).The various metrics counters could be a separate inlined struct if we think that's a useful abstraction, but for now i've just put them directly in
jl_task_t
. They are all atomic, except themetrics_enabled
flag itself (which we now have to check on task start/switch/done even if metrics are not enabled) which is set on task construction and markedconst
on the julia side.I'd love to add these metrics in a way that is extendable, so that in future PRs we could add more per-task metrics (e.g. compilation time, GC time, allocations, potentially a wait-time breakdown (time waiting on locks, channels, in the scheduler run queue, etc.), potentially count the number of times it yielded).
Perhaps in future there could be ways to enable this on a per-thread and per-task basis. And potentially in future these same timings could be used by
@time
(e.g. writing this same timing data to a ScopedValue like in #55103 but only for tasks lexically scoped to inside the@time
block).Timings are off be default but can be turned on globally via starting julia with
--task-metrics=yes
or callingBase.Experimental.task_metrics(true)
. Metrics are collected for all tasks created when metrics are enabled. In other words, enabling/disabling timings viaBase.Experimental.task_metrics
does not affect existingTask
s, only newTask
s.The other new APIs are
Base.Experimental.task_cpu_time_ns(::Task)
andBase.Experimental.task_wall_time_ns(::Task)
for retrieving the new metrics. These are safe to call on any task (including the current task, or a task running on another thread). Names are up for discussion. I've put these all inBase.Experimental
to give us room to change up the APIs as we add more metrics in future PRs (without worrying about release timelines).cc @NHDaly @kpamnany @d-netto