-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aperf: Add support for analytics #241
Conversation
src/lib.rs
Outdated
@@ -180,6 +187,7 @@ impl PerformanceData { | |||
let meta_data_handle = fs::OpenOptions::new() | |||
.create(true) | |||
.write(true) | |||
.truncate(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given it's an error when any output directory already exists, in what scenario is this code encountering an existing file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Removed. It was a suggestion from a different version of clippy.
src/utils.rs
Outdated
pub p50: f64, | ||
pub mean: f64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mean and p50 are virtually interchangeable, I'd dump p50 and add p90.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Changing it to p90.
src/html_files/analytics.ts
Outdated
|
||
class Finding { | ||
text: string; | ||
status: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the type 'string' instead of 'Status'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Good catch. It's simpler to have the Status everywhere.
src/html_files/analytics.ts
Outdated
function is_unique_map(values_map) { | ||
return new Set([...values_map.values()]).size == 1; | ||
} | ||
|
||
function is_unique_array(values_array) { | ||
return new Set(values_array).size == 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are odd definitions of 'unique'. It suggests that somewhere else you're using the wrong data type if you have to call into something to de-duplicate and check the size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. We're using arrays everywhere now. We were using arrays and maps. This is cleaner.
src/html_files/cpu_utilization.ts
Outdated
rules: [ | ||
{ | ||
name: "User", | ||
func: function (ruleOpts: RuleOpts) { | ||
let system_util = get_data_key(ruleOpts.data_type, "System"); | ||
let findings = []; | ||
let init_key = ruleOpts.runs[0]; | ||
let init_total_util: number = ruleOpts.per_run_data.get(init_key) + system_util.get(init_key); | ||
for (const [key, value] of ruleOpts.per_run_data) { | ||
if (key == init_key) { | ||
continue; | ||
} | ||
let run_total_util: number = value + system_util.get(key); | ||
let cpu_diff = Math.ceil(Math.abs(run_total_util - init_total_util)); | ||
findings.push(new Finding( | ||
`Average CPU Utilization difference between ${init_key} and ${key} is ${cpu_diff}%.`, | ||
cpu_diff > 10 ? Status.NotGood : Status.Good, | ||
)); | ||
} | ||
return findings; | ||
}, | ||
good: "", | ||
bad: "", | ||
}, | ||
{ | ||
name: "idle", | ||
func: function (ruleOpts: RuleOpts) { | ||
let findings = []; | ||
let init_key = ruleOpts.runs[0]; | ||
for (const [key, value] of ruleOpts.per_run_data) { | ||
if (key == init_key) { | ||
continue; | ||
} | ||
let idle_diff = Math.abs(ruleOpts.per_run_data.get(key) - ruleOpts.per_run_data.get(init_key)); | ||
if (idle_diff > 10) { | ||
findings.push(new Finding( | ||
`Difference in Average 'Idle time' between ${key} and ${init_key} is ${idle_diff}.`, | ||
)); | ||
} | ||
} | ||
return findings; | ||
}, | ||
good: "", | ||
bad: "", | ||
} | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So rules don't start turning into copy-and-paste-fests of eachother, start refactoring this now so that things as simple as comparing aggregates across runs is not duplicated in multiple rules's 'func's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm splitting it up into 3 types of rules.
- Runs == 1, Rules to point out things. Rules are implemented in single_run_rules[].
- Runs > 1,
a. Set base run = Runs[0]. Iterate over the other runs and pass in base_run and other run to a function. Prevents needing repeat for loops for all rules. Rules are implemented in per_run_rules[].
b. Provide all details to a different set of functions which needs all the details. Rules are implemented in all_run_rules[].
src/html_files/system_info.ts
Outdated
func: function (ruleOpts: RuleOpts) { | ||
return is_unique_map(ruleOpts.per_run_data); | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cpu utilization rules are returning arrays of Findings, and this is returning boolean. Lock this function signature down or it will be a nightmare to maintain having to support a wide variety of return types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I'm changing it so that only Finding should be returned.
This is in preparation for adding rules for SystemInfo and CPU Utilization.
Form the metrics which will be used by the front-end for analytics work. Generate it for SystemInfo and CPU Utilization.
Attached a newer version of the report. The SUT Config and Findings are now clubbed together in a single tab. |
pub mean: f64, | ||
} | ||
|
||
impl Stats { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will only saving P99, P90 and mean be flexible enough for people writing rules in Javascript? Could this be a histogram instead that is lighter weight to traverse than the full data-set but you can reconstitute any stat you wanted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which stat do you anticipate we would need to have? We can calculate any stat in the backend and have it brought to the UI. The other approach is to have all the data points brought to the front-end and let the user decide which works for them. That will be a rework of the way we are doing things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I anticipate this could be any stat: trimmed mean, different percentiles etc. The frontend already has all the data present correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now all aggregations are computed at report generation time. The frontend has the data (required in order to graph the data), but it is not computing these aggregates itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend mirroring p99 and p90 with p01 and p10. I expect there will be some counters you'd like to see when they fall off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. This structure can be updated as needed with any stats we need in the future.
enum Status { | ||
Good = '✅', | ||
NotGood = '❌', | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that soon you are going to discover there is frequently a grey area between good and evil NotGood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. We can add an 'Ehhh' when we need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an item for "Mediocre" sooner rather than later. I don't want us to be handling tickets with NotGood cpu at 49%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we add a rule for 'Mediocre' we can add that item. I don't want to add code which isn't being explicitly used.
pub mean: f64, | ||
} | ||
|
||
impl Stats { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend mirroring p99 and p90 with p01 and p10. I expect there will be some counters you'd like to see when they fall off.
src/html_files/cpu_utilization.ts
Outdated
let cpu_utilization_rules = { | ||
data_type: "cpu_utilization", | ||
pretty_name: "CPU Utilization", | ||
single_run_rules: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long before these are config files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the scenario you see where users can provide their own rules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'm just thinking about ease of adding/subtracting rules. For these simple ones it's basically just "metric" and "threshold" and maybe "reduction function". Seems kind of silly to make a new function for each of them. If we had a more reliable way to look up the data it might be easier to break these into some sort of config structure.
On the other hand more complex rules probably deserve their own function.
Config file probably isn't right (especially as I'm realizing this is typescript not rust), but maintaining many different very similar functions is going to become painful.
src/html_files/cpu_utilization.ts
Outdated
}, | ||
} | ||
], | ||
per_run_rules: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is still quite a bit of copy-paste here, could make more sense to have 1 rule, but multiple functions to implement the logic for each way to compare the run(s) to each other. Then the rule running engine just iterates over the functions in each rule to generate the report.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated it to have a per_run, all_run and a single_run attached to one rule.
src/html_files/utils.ts
Outdated
return data['String']; | ||
} else if ('Stats' in data) { | ||
if (comparator == 'mean') { | ||
return data['Stats']['mean']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to keep growing if we add more default aggregations, can leverage this typescript syntax to return a default if not found:
data.get('Stats').get('p99.9') ?? undefined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I'll update all the aggregations to use this.
rules: [ | ||
{ | ||
name: "User", | ||
single_run_rule: function* (opts): Generator<Finding, void, any> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking a bit more on this, all_run_rule
and single_run_rule
are not much different. If I'm looking at CPU utilization, I'd be just as interested knowing all have low utilization in addition to just the base run. We can eliminate the "single run" case and just apply the rules that look at all runs independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aim is to logically separate 3 use cases. Consider 3 runs A, B and C. all_run_rule will allow us to compare the data across all runs. per_run_rule allows us to compare A B and A C. single_run_rule will allow us to write rules for each run. This will apply when comparison is not possible when a report is generated for a single run in addition to applying it for multiple runs. Ex: A config is KConfig is known to cause performance regressions. We want that flagged in both cases. The split is a logical separation helper for implementers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, each single_run_rule will need an if check to make sure the other comparison run exists. If we have 3 separate rule calls it allows basic assumptions.
src/html_files/utils.ts
Outdated
return data['String'] ?? undefined; | ||
} else if ('Stats' in data) { | ||
if (comparator == 'mean') { | ||
return data['Stats']['mean'] ?? undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can simply do return data['Stats'][comparator] ?? undefined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Got it. Changing.
@geoffreyblake Updated to include comments describing each type of rule. |
Thanks. LGTM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Add base support for Aperf Analytics. The stats are formed and created in the Aperf Rust backend. These values are then acted on by the rules in the Javascript frontend. Basic implementation for the keys in 'SystemInfo' and 'CPU Utilization' show how the rules can operate using the API provided.
Attached is a report comparing two runs. The output of the rules are shown in the landing page 'SystemInfo' tab.
build_metrics.tar.gz
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.