Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aperf: Add support for analytics #241

Merged
merged 3 commits into from
Jan 27, 2025
Merged

Aperf: Add support for analytics #241

merged 3 commits into from
Jan 27, 2025

Conversation

janaknat
Copy link
Contributor

@janaknat janaknat commented Dec 9, 2024

Add base support for Aperf Analytics. The stats are formed and created in the Aperf Rust backend. These values are then acted on by the rules in the Javascript frontend. Basic implementation for the keys in 'SystemInfo' and 'CPU Utilization' show how the rules can operate using the API provided.

Attached is a report comparing two runs. The output of the rules are shown in the landing page 'SystemInfo' tab.

build_metrics.tar.gz

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@janaknat janaknat requested a review from a team as a code owner December 9, 2024 19:09
src/lib.rs Outdated
@@ -180,6 +187,7 @@ impl PerformanceData {
let meta_data_handle = fs::OpenOptions::new()
.create(true)
.write(true)
.truncate(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given it's an error when any output directory already exists, in what scenario is this code encountering an existing file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Removed. It was a suggestion from a different version of clippy.

src/utils.rs Outdated
Comment on lines 16 to 17
pub p50: f64,
pub mean: f64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mean and p50 are virtually interchangeable, I'd dump p50 and add p90.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Changing it to p90.


class Finding {
text: string;
status: string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the type 'string' instead of 'Status'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Good catch. It's simpler to have the Status everywhere.

Comment on lines 56 to 68
function is_unique_map(values_map) {
return new Set([...values_map.values()]).size == 1;
}

function is_unique_array(values_array) {
return new Set(values_array).size == 1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are odd definitions of 'unique'. It suggests that somewhere else you're using the wrong data type if you have to call into something to de-duplicate and check the size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. We're using arrays everywhere now. We were using arrays and maps. This is cleaner.

Comment on lines 7 to 29
rules: [
{
name: "User",
func: function (ruleOpts: RuleOpts) {
let system_util = get_data_key(ruleOpts.data_type, "System");
let findings = [];
let init_key = ruleOpts.runs[0];
let init_total_util: number = ruleOpts.per_run_data.get(init_key) + system_util.get(init_key);
for (const [key, value] of ruleOpts.per_run_data) {
if (key == init_key) {
continue;
}
let run_total_util: number = value + system_util.get(key);
let cpu_diff = Math.ceil(Math.abs(run_total_util - init_total_util));
findings.push(new Finding(
`Average CPU Utilization difference between ${init_key} and ${key} is ${cpu_diff}%.`,
cpu_diff > 10 ? Status.NotGood : Status.Good,
));
}
return findings;
},
good: "",
bad: "",
},
{
name: "idle",
func: function (ruleOpts: RuleOpts) {
let findings = [];
let init_key = ruleOpts.runs[0];
for (const [key, value] of ruleOpts.per_run_data) {
if (key == init_key) {
continue;
}
let idle_diff = Math.abs(ruleOpts.per_run_data.get(key) - ruleOpts.per_run_data.get(init_key));
if (idle_diff > 10) {
findings.push(new Finding(
`Difference in Average 'Idle time' between ${key} and ${init_key} is ${idle_diff}.`,
));
}
}
return findings;
},
good: "",
bad: "",
}
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So rules don't start turning into copy-and-paste-fests of eachother, start refactoring this now so that things as simple as comparing aggregates across runs is not duplicated in multiple rules's 'func's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm splitting it up into 3 types of rules.

  1. Runs == 1, Rules to point out things. Rules are implemented in single_run_rules[].
  2. Runs > 1,
    a. Set base run = Runs[0]. Iterate over the other runs and pass in base_run and other run to a function. Prevents needing repeat for loops for all rules. Rules are implemented in per_run_rules[].
    b. Provide all details to a different set of functions which needs all the details. Rules are implemented in all_run_rules[].

Comment on lines 17 to 25
func: function (ruleOpts: RuleOpts) {
return is_unique_map(ruleOpts.per_run_data);
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cpu utilization rules are returning arrays of Findings, and this is returning boolean. Lock this function signature down or it will be a nightmare to maintain having to support a wide variety of return types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I'm changing it so that only Finding should be returned.

This is in preparation for adding rules for SystemInfo and CPU
Utilization.
Form the metrics which will be used by the front-end for analytics work.
Generate it for SystemInfo and CPU Utilization.
@janaknat
Copy link
Contributor Author

Attached a newer version of the report. The SUT Config and Findings are now clubbed together in a single tab.

cmp.tar.gz

pub mean: f64,
}

impl Stats {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will only saving P99, P90 and mean be flexible enough for people writing rules in Javascript? Could this be a histogram instead that is lighter weight to traverse than the full data-set but you can reconstitute any stat you wanted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which stat do you anticipate we would need to have? We can calculate any stat in the backend and have it brought to the UI. The other approach is to have all the data points brought to the front-end and let the user decide which works for them. That will be a rework of the way we are doing things.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I anticipate this could be any stat: trimmed mean, different percentiles etc. The frontend already has all the data present correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now all aggregations are computed at report generation time. The frontend has the data (required in order to graph the data), but it is not computing these aggregates itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend mirroring p99 and p90 with p01 and p10. I expect there will be some counters you'd like to see when they fall off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. This structure can be updated as needed with any stats we need in the future.

Comment on lines +25 to +39
enum Status {
Good = '✅',
NotGood = '❌',
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that soon you are going to discover there is frequently a grey area between good and evil NotGood.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. We can add an 'Ehhh' when we need it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an item for "Mediocre" sooner rather than later. I don't want us to be handling tickets with NotGood cpu at 49%

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we add a rule for 'Mediocre' we can add that item. I don't want to add code which isn't being explicitly used.

pub mean: f64,
}

impl Stats {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend mirroring p99 and p90 with p01 and p10. I expect there will be some counters you'd like to see when they fall off.

Comment on lines 4 to 7
let cpu_utilization_rules = {
data_type: "cpu_utilization",
pretty_name: "CPU Utilization",
single_run_rules: [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long before these are config files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the scenario you see where users can provide their own rules?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm just thinking about ease of adding/subtracting rules. For these simple ones it's basically just "metric" and "threshold" and maybe "reduction function". Seems kind of silly to make a new function for each of them. If we had a more reliable way to look up the data it might be easier to break these into some sort of config structure.

On the other hand more complex rules probably deserve their own function.

Config file probably isn't right (especially as I'm realizing this is typescript not rust), but maintaining many different very similar functions is going to become painful.

},
}
],
per_run_rules: [

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still quite a bit of copy-paste here, could make more sense to have 1 rule, but multiple functions to implement the logic for each way to compare the run(s) to each other. Then the rule running engine just iterates over the functions in each rule to generate the report.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated it to have a per_run, all_run and a single_run attached to one rule.

return data['String'];
} else if ('Stats' in data) {
if (comparator == 'mean') {
return data['Stats']['mean'];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to keep growing if we add more default aggregations, can leverage this typescript syntax to return a default if not found:
data.get('Stats').get('p99.9') ?? undefined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll update all the aggregations to use this.

rules: [
{
name: "User",
single_run_rule: function* (opts): Generator<Finding, void, any> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking a bit more on this, all_run_rule and single_run_rule are not much different. If I'm looking at CPU utilization, I'd be just as interested knowing all have low utilization in addition to just the base run. We can eliminate the "single run" case and just apply the rules that look at all runs independently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aim is to logically separate 3 use cases. Consider 3 runs A, B and C. all_run_rule will allow us to compare the data across all runs. per_run_rule allows us to compare A B and A C. single_run_rule will allow us to write rules for each run. This will apply when comparison is not possible when a report is generated for a single run in addition to applying it for multiple runs. Ex: A config is KConfig is known to cause performance regressions. We want that flagged in both cases. The split is a logical separation helper for implementers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, each single_run_rule will need an if check to make sure the other comparison run exists. If we have 3 separate rule calls it allows basic assumptions.

return data['String'] ?? undefined;
} else if ('Stats' in data) {
if (comparator == 'mean') {
return data['Stats']['mean'] ?? undefined;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can simply do return data['Stats'][comparator] ?? undefined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Got it. Changing.

@janaknat
Copy link
Contributor Author

@geoffreyblake Updated to include comments describing each type of rule.

@geoffreyblake
Copy link

Thanks. LGTM.

Copy link

@geoffreyblake geoffreyblake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@janaknat janaknat merged commit 202c756 into aws:main Jan 27, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants