A common problem while running benchmarks is assuming you know which limiting factor the benchmark is stressing; the simpler the benchmark, the easier it is to believe you know the unknowns, or to ignore that there are any.
To fix this, it is highly suggested to practice active benchmarking (https://www.brendangregg.com/activebenchmarking.html) in the form of opt-in profiling/monitoring of the different components involved while the tests are running, e.g.:
- Enable lightweight monitoring, like sar/mpstat/time/pidstat or docker stats, against all components (see the sketch after this list)
- Keep it opt-in, and verify that running with/without such observability tools does not introduce any unexpected and unpredictable "observer" effect
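A minimal sketch of what the opt-in lightweight monitoring could look like; `./run-benchmark.sh` and the output paths are placeholders, not actual project files:

```sh
#!/usr/bin/env bash
# Sketch: start lightweight monitors alongside a benchmark run (opt-in).
OUT=./monitoring
mkdir -p "$OUT"

# System-wide per-core CPU utilization, sampled every second
mpstat -P ALL 1 > "$OUT/mpstat.log" &
MPSTAT_PID=$!

# Launch the benchmark (placeholder entry point) and track its
# per-process CPU/memory usage with pidstat
./run-benchmark.sh &
BENCH_PID=$!
pidstat -u -r -p "$BENCH_PID" 1 > "$OUT/pidstat.log" &
PIDSTAT_PID=$!

wait "$BENCH_PID"                       # let the benchmark finish
kill "$MPSTAT_PID" "$PIDSTAT_PID" 2>/dev/null
```

Running the same workload with and without such a wrapper, and comparing the scores, is a cheap way to check for the "observer" effect mentioned above.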
As part of this, although at a higher and more biased resolution level, I would enable startup profiling too (still opt-in, since it is heavier for this type of test), which I can help set up, for both JVM and native image mode.
The latter is not trivial, but AFAIK we can ask our Mandrel team whether the CPU profiler recently introduced for native-image is suitable for this job and accurate enough compared to Linux perf. If we enable native image profiling, this is (IIRC) going to impact the native build too, because it requires additional compilation flags to produce the debug symbols needed to resolve the application frame stacks properly.
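For illustration, a hedged sketch of what startup profiling could look like in each mode; the jar/binary names are placeholders, and the native-image flags are an assumption to be confirmed with the Mandrel team:

```sh
# JVM mode: JDK Flight Recorder can record from the very first frames
java -XX:StartFlightRecording=duration=60s,filename=startup.jfr -jar app.jar

# Native image mode: Linux perf needs the image built with debug symbols,
# which is the build-time impact mentioned above (assumed flags)
native-image -g -H:-DeleteLocalSymbols ...
perf record -g -o startup.perf -- ./app
```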
In short:
- Let's start by providing lightweight monitoring (opt-in) via sar/mpstat/time/pidstat on the components
- Optionally, but highly recommended, enable system-wide collection of similar statistics (see the sketch after this list): this is very tied to our current choice to run everything locally, but it is good to ensure there is no overall saturation of the system due to scarce resources or the presence of noisy neighbors
- Eventually, add finer-grained profiling to JVM mode and native image (the JVM already does it under load, but not at start-up, which is a whole different beast)
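A minimal sketch of the system-wide collection, assuming the sysstat tools are available locally (file names are placeholders):

```sh
# Record system-wide statistics in sar's binary format for the whole run
sar -o system-wide.sar 1 > /dev/null 2>&1 &
SAR_PID=$!

# ... run the full benchmark suite here ...

kill "$SAR_PID"
```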
For the first point, we can collect the output of the monitoring and report it raw too; it doesn't need to be part of the results ATM, but we still need to make it easy to consume in order to detect unwanted/unexpected behaviors which could invalidate our results.
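For example, the binary file from the previous sketch can be exported to a delimiter-separated format with sadf, which makes the raw data much easier to eyeball or post-process (file names assumed):

```sh
# Export CPU (-u) and memory (-r) statistics as ';'-separated records
sadf -d system-wide.sar -- -u -r > system-wide.csv
```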
This is key to making sure our results are valid, especially if they are run by others whose environment is drastically different from our lab, e.g. while they mine Bitcoin with their CPUs eheh
I will provide a few pointers to existing qDup scripts from other OSS tests we performed which could be (re)used here.