performance observations while interacting with a 23k-tip tree #1878

jameshadfield · 2024-10-26T01:59:56Z

A summary of observations while rendering & interacting with a huge tree. Ideally we'd like to have Auspice be performant on such big trees, but also this is a simple way to flag up slow parts of the code. The timings are just to indicate bottlenecks - while they are reasonably consistent for me they'll be different for you. (This issue is mainly for me to keep track of things in a public-facing way. If you'd like to work on anything please spin it out into a new issue.)

Setup:

Dataset: groups/blab-private/ncov/north-america/usa/washington/king-county/large?c=ns_kc&d=tree&p=full (i.e. only tree panel rendering). 23k tips. The dataset is only 45MB, so lots of nodes but (relatively) few mutations (45MB minimised but uncompressed).

Auspice 2.59.1 at master branch (35e4ca6) running in production mode with the following changes:

webpack.config.js
-    devtool: !devMode ? undefined : "eval-source-map",
+    devtool: "eval-source-map",
-      minimize: !devMode,
+      minimize: false,

Chrome 130.0 running in incognito mode on an Apple M3 Pro chip. GC triggered before each snapshot was taken.

Test 1:

Filter to f_region=South%20America (n=259 tips selected). Start a snapshot then inactivate the filter, immediately hover over a tip so we can see when that listenerr will fire/render. This shares a lot of similarities to dragging the date range / animating but is much easier to reproduce consistently & isolate. Trace-20241026T131320.json

Heap Size: 230MB to 330Mb.
Timings:

666ms JS "Event: click". This is our code & d3 code.
240ms JS "Animation frame fired' (triggered from (1)). This is mostly d3 code.
115ms recalculate style (triggered from (2))
40ms (pre-paint, paint, layerize) (no visual changes - looks identical)
1170ms Commit
- During this time, partially-presented frames are rendered at 60fps
- About ~100ms into the Commit cycle we get the tips all shown but not all the branches are correct. This implies step (2) is a partial d3 update, perhaps due to transitions.
- All the other partially-presented frames look identical
- (Aside: I don't fully understand this stage in Chrome. The docs imply it's simply copying the property tree & display list to the compositor thread and blocking main while that happens, but during the commit phase there's activity on the rasterization thread / GPU and the screen is redrawn, implying that the compositor is doing its thing.)
<1ms Event: mouseover
3ms Recalculate style, layout, pre-paint
- NOTE: no paint here - does chrome bail out of the pipeline? Maybe there's no change to the property tree / display list?
Mouseover / onTipHover (1ms) + recalculate style, layout, pre-paint.

Note this is before the queued animation frame (from the d3 call in (1))

75ms JS "Animation frame fired" (triggered from (1))
80ms recalculate style (triggered from (6))
45ms (pre-paint, paint, layerize, commit)

This (I think) is the stage which results in the first screenshot where the tree is "correct".
We also have the tip hover showing here in the same frame as the tree becomes correctly rendered.

TLDR: Total time to partial tree update painted: ~1250ms. Total time to tree fully painted & responsive: 2500ms.

Don't try to animate things UPDATE: done in #1880

Improvements: huge; should be an easy win if we can make a heuristic about when to animate vs not. Changing genericSelectAndModify to no longer have.transition().duration(transitionTime) provides a huge improvement.

In the timings above, (1) is reduced by 100ms. Most of this saving is a reduction of time spent in genericSelectAndModify.
We completely get rid of steps (2,8,9,10)
The paint during (5: commit) is the correct one - i.e the tree is correctly rendered. Note the commit still takes ~1200ms with the paint coming very early on in this phase. My understanding of what the commit phase actually does is limited.

Note: Achieving the same thing by using a transitionTime=0 and thus calling timerFlush() is worse. Step 1 takes ~900ms with timerFlush taking ~300ms.

Recalculating entropy, despite the entropy panel not being shown: UPDATE: done in #1879

Improvements: large. calcEntropyInView takes 295ms. The entropy panel's not even rendered!

We should add a concept of "thing X doesn't need to be recalculated" - e.g. when the entropy panel's off the screen or not even on. The data can then be recalculated when needed.

Even when the entropy panel is on the screen, it can have delayed updating. Seems perfect for web-workers - e.g. show a grey overlay on the panel itself to indicate data out-of-date, recalculate things in the worker, then trigger the redux update. (Aside: we don't have any web workers in Auspice, but there are many situations they would be useful. This blog post seems a good resource on how workers may access the tree structure.)

Somewhat related to delayed updating, the thunk which calls this - updateEntropyVisibility - is debounced. Simply changing this to not fire on the leading edge (leading: false) shifts this to a separate task on the event loop. That won't necessarily improve responsiveness, but it'll speed up the tree being re-drawn.

Unnecessary recalculation of filtering (sidebar) options UPDATE: issue #1882

Improvements: medium, but easy wins. Total time: 41ms. The sidebar filtering component <FilterData> updates on any filter change (not date range). Each time it calls its makeOptions method which recomputes all options (to show in the dropdown), even though most of them are unchanged. Calling collectGenotypeStates takes 30ms, but the result of this doesn't ever change so is completely unnecessary.

Note: the options are calculated every render time, but we're debouncing the callback which hands them to <AsyncSelect>. Can we just defer the calculation of options to this callback?

Expensive tooltips

The unmounting of the react tooltip (shown when hovering on the filter badge) takes 40ms and then it immediately mounts again (40ms). These aren't deferred / put onto the event loop for some reason (I don't quite get this). Probably hard to improve but seems egregious.

Test 2:

No filters. Hover over a branch to bring up the info-box, then mouse out to remove it (very slow re-rendering). Now do the same action over tips and it's extremely responsive.

Comparing timings, the big difference is a ~1000ms commit phase for the branch hover, and a <1ms commit for tip hover. (Similar to test 1, the branch hover is drawn to screen during that long commit, but the main thread remains blocked.) The tip-hover is 60fps on my machine.

With tip hover, we select the tip element (within the SVG) and change its radius as well as creating/destroying a react component (a <div>). The branch hover similarly selects the branch and changes it's stroke colour, creates/destroys the info-box <div>, and additionally adds a confidence-interval <path> to the SVG. Removing this CI code reduces commit phase time to match that of tip-hover, i.e. 60fps.

This tells us something about property trees / display lists - selecting and modifying a single (SVG?) element is very very different from creating / removing one.

There are a few options to fix this:

We could try having all CIs in the DOM and just changing their opacity / hidden state? I presume this would be a faster commit in this case, but slower performance in general.
Don't highlight CIs on branch-hover for trees > n tips.
Or perhaps trigger the CI display when a branch has been hovered for n seconds. (Probably my preferred option.)
We could try using will-change to put the CIs on a different rasterization layer, but I think this is too fine-grained usage of will-change and I'm not sure you could use it within a SVG.

Test 3: initial load

Load http://localhost:4000/groups/blab-private/ncov/north-america/usa/washington/king-county/large?c=ns_kc&d=tree&p=full. Running on localhost without any network slowness etc we can see the following timings (which seem to vary quite a bit more than the above tests):

~250ms of webpack code compilation & loading. During this time the spinner appears.
~500ms GC
~1900ms main loading phase.
- 100ms treeJsonToState
- 55ms modifyControlsStateViaTree
- 55ms countTraitsAcrossTree
- 330ms calcEntropyInView (see Test 1, above)
- 950ms renderTree (over half of this in drawBranches)
300ms recalculate style
1600ms commit phase
- As above, the tree is drawn to screen during this phase
~100ms the dataset part of the sidebar flashes in

Deferring calcEntropyInView is an obvious win here (see above). In the wild, the JSON fetch time will probably dominate this phase so I wouldn't prioritise this above the other observations in this issue. (One nice change would be to investigate embedding the spinner image in the JS - I often notice it's not loaded for a long time when the network is slower.)

The text was updated successfully, but these errors were encountered:

For big trees, attempting to animate changes results in poor performance and we're better off making all SVG changes in a single pass. "big" here depends on both the browser's performance and the size of the tree. Here we just use the latter as it's simpler to code, but in the future we may measure performance as the user interacts with the viz and update these flags accordingly. (This is one of the reasons we calculate these in middleware.) For future, we should add nuance to this performance flag. For instance, viewing a 2k tree and zooming into a clade has poor performance and we'd be better off skipping the animation, however once in a small clade it works well. For that same tree (viewing all 2k tips) filtering to a clade still looks good with the animation. 4000 tips is arbitrary, it was chosen after testing a variety of datasets locally. Note that there is a big difference between not setting a transition (within d3) and setting a transition of 0ms - the former is much faster. See <#1878> for more details here. I didn't implement this change for `modifySVGInStages` as it uses a more complex scheduling system; but the performance here is terrible for really large trees so we should revisit this in future work. Performance changes ------------------- Auspice built in production mode on Chrome 130.0 on a Apple M3 pro chip. Datasets: a large ncov tree (23k tips) (see <#1878> for more), and a flu tree of 2k tips. Both only rendering the tree panel. Flu performance is unchanged, as 2k tips isn't enough to trigger this performance toggle. For posterity: * Zooming into clade 2a. Initial frame ~160ms, then 30fps. * Filtering to clade 2a. Initial frame ~115ms, then 60fps. * Animation: 60fps ncov performance: * Zoom into clade 21M. * Previously: Initial frame 1500ms, second (final frame) 430ms later, main blocked for a further 700ms * Now: Single frame of 921ms. Main blocked for a further 550ms. * Change: Time to correct tree reduced by ~1000ms (50%). Time until main is no longer blocked: reduced by ~1100ms (45%). * Filter to clade 21M. * Previously: Initial frame 1050ms, second frame 930ms later, main blocked for a further 660ms * Now: Single frame of 1018ms, main blocked for a further 660ms * Change: Time to correct tree reduced by ~950ms (50%). Time until main is no longer blocked: reduced by ~950ms (35%). * Animate. This is unchanged - Auspice sets the transition time to 0 (see end of this commit message). We get around 4 frames at 70ms each (14fps) then a long commit phase blocking for ~650ms. For unknown reasons we trigger a timer flush but it doesn't show up in the performance snapshot, which is why these times are unchanged by this commit. Note that the timings for the initial frame are slightly variable in chrome. The JS time / blinkng pipeline parts shown on the main thread seem pretty consistent, but when the frame is drawn (relative to when the commit phase starts) is more variable. So take the above results with a grain of salt. P.S. there is a somewhat parallel implementation of this concept in Auspice where PhyloTree sets a transition time of zero if subsequent updates come in quick enough, but that is buggy. Firstly, if an update takes a long time the main thread will be blocked so the next update will be indistinguishable from one which was triggered an acceptable time later. Secondly, it only works for animation where we have a stream of updates.

jameshadfield added the performance improvement label Oct 26, 2024

jameshadfield mentioned this issue Oct 28, 2024

Improve entropy recalculations #1879

Open

This was referenced Oct 28, 2024

Improve tree performance for trees with over 4000 tips #1880

Open

Improve recalculation of filtering (sidebar) options #1882

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

performance observations while interacting with a 23k-tip tree #1878

performance observations while interacting with a 23k-tip tree #1878

jameshadfield commented Oct 26, 2024 •

edited

Loading

performance observations while interacting with a 23k-tip tree #1878

performance observations while interacting with a 23k-tip tree #1878

Comments

jameshadfield commented Oct 26, 2024 • edited Loading

Setup:

Test 1:

Don't try to animate things UPDATE: done in #1880

Recalculating entropy, despite the entropy panel not being shown: UPDATE: done in #1879

Unnecessary recalculation of filtering (sidebar) options UPDATE: issue #1882

Expensive tooltips

Test 2:

Test 3: initial load

jameshadfield commented Oct 26, 2024 •

edited

Loading