feat: [DSM-102] Wrap CanisterStates in Arcs #8839

Open
alin-at-dfinity wants to merge 2 commits into master from alin/DSM-102-Arc-CanisterState

@alin-at-dfinity

Have `ReplicatedState::canister_states` hold `Arc<CanisterState>` values instead of `CanisterState` directly. This makes it (and `ReplicatedState`) much cheaper to clone, and makes manipulating the contents of `canister_states` (e.g. for routing and scheduling) much cheaper as well.

Callers must now call `Arc::make_mut()` on the individual entries they want to mutate and, to keep overhead in check, do so only when necessary (i.e. don't `make_mut` every canister every round).
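
To illustrate the calling pattern described above, here is a minimal sketch. It is not the code from this PR: `CanisterState`, `CanisterId`, `ReplicatedState` and `charge` below are simplified, hypothetical stand-ins, and only `Arc::make_mut`, `Arc::ptr_eq` and `BTreeMap` are real standard-library APIs.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Hypothetical, heavily simplified stand-ins for the real types.
type CanisterId = u64;

#[derive(Clone, Debug, PartialEq)]
struct CanisterState {
    cycles: u64,
}

#[derive(Clone)]
struct ReplicatedState {
    // Cloning the map only bumps reference counts; no `CanisterState` is copied.
    canister_states: BTreeMap<CanisterId, Arc<CanisterState>>,
}

// Hypothetical mutation: only calls `make_mut` when there is something to change.
fn charge(state: &mut ReplicatedState, id: CanisterId, amount: u64) {
    if amount == 0 {
        return; // Nothing to mutate, so the entry stays shared.
    }
    if let Some(entry) = state.canister_states.get_mut(&id) {
        // Clones the inner `CanisterState` only if the `Arc` is shared;
        // otherwise mutates in place.
        let canister = Arc::make_mut(entry);
        canister.cycles = canister.cycles.saturating_sub(amount);
    }
}

fn main() {
    let mut original = ReplicatedState { canister_states: BTreeMap::new() };
    original.canister_states.insert(1, Arc::new(CanisterState { cycles: 100 }));
    original.canister_states.insert(2, Arc::new(CanisterState { cycles: 200 }));

    let mut next = original.clone();
    charge(&mut next, 1, 30); // Copies canister 1 on write.
    charge(&mut next, 2, 0); // No-op: canister 2 is still shared.

    assert_eq!(original.canister_states[&1].cycles, 100);
    assert_eq!(next.canister_states[&1].cycles, 70);
    assert!(Arc::ptr_eq(&original.canister_states[&2], &next.canister_states[&2]));
}
```

In this sketch the untouched entry keeps pointing at the same allocation in both states, which is where the cheaper clone comes from; calling `make_mut` unconditionally on every entry would copy every shared canister and give that saving back.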

@alin-at-dfinity

Benchmark results (a bit shaky, as successive runs on my devenv differed by as much as 10%).

Current master:

round                   time:   [514.67 ms 522.53 ms 530.71 ms]

"execution_round_preparation_duration_seconds": Some(HistogramStats { count: 132, sum: 4.119863824 }),
"execution_round_consensus_queue_duration_seconds": Some(HistogramStats { count: 132, sum: 0.00035218599999999994 }),
"execution_round_advance_long_install_code_duration_seconds": Some(HistogramStats { count: 132, sum: 0.00017579000000000002 }),
"execution_round_scheduling_duration_seconds": Some(HistogramStats { count: 132, sum: 1.462299983 }),
"execution_round_inner_duration_seconds": Some(HistogramStats { count: 132, sum: 52.37131959399998 }),
"execution_round_subnet_queue_duration_seconds": Some(HistogramStats { count: 147, sum: 16.871549279000014 }),
"execution_round_inner_heartbeat_overhead_duration_seconds": Some(HistogramStats { count: 264, sum: 0.374448894 }),
"execution_round_inner_preparation_duration_seconds": Some(HistogramStats { count: 147, sum: 5.124614065999999 }),
"execution_round_inner_execution_duration_seconds": Some(HistogramStats { count: 147, sum: 29.995616233000007 }),
"execution_round_inner_finalization_duration_seconds": Some(HistogramStats { count: 147, sum: 0.3685645859999999 }),
"execution_round_finalization_duration_seconds": Some(HistogramStats { count: 132, sum: 6.350381167000002 }),
"execution_round_finalization_stop_canisters_duration_seconds": Some(HistogramStats { count: 132, sum: 0.33941595300000005 }),
"execution_round_finalization_ingress_history_prune_duration_seconds": Some(HistogramStats { count: 132, sum: 0.0019545249999999995 }),
"execution_round_finalization_charge_resources_duration_seconds": Some(HistogramStats { count: 132, sum: 0.19223597800000003 }),
"execution_round_inner_preparation_step_duration_seconds": {},
"state_manager_checkpoint_op_duration_seconds": {{"op": "compute_manifest"}: HistogramStats { count: 0, sum: 0.0 }, {"op": "copy_state"}: HistogramStats { count: 133, sum: 12.989015154999995 }, {"op": "create"}: HistogramStats { count: 0, sum: 0.0 }, {"op": "hash_tree"}: HistogramStats { count: 134, sum: 5.424613341999999 }},
"mr_process_batch_phase_duration_seconds": {{"phase": "commit"}: HistogramStats { count: 132, sum: 18.454822891000003 }, {"phase": "execution"}: HistogramStats { count: 132, sum: 64.30791173699998 }, {"phase": "induction"}: HistogramStats { count: 132, sum: 0.6500734589999998 }, {"phase": "load_state"}: HistogramStats { count: 132, sum: 0.0008985979999999998 }, {"phase": "message_routing"}: HistogramStats { count: 132, sum: 0.6035860849999997 }, {"phase": "shed_messages"}: HistogramStats { count: 132, sum: 0.27849730899999997 }, {"phase": "time_out_callbacks"}: HistogramStats { count: 132, sum: 0.3212080410000001 }, {"phase": "time_out_messages"}: HistogramStats { count: 132, sum: 0.322334562 }}

[Flamegraph image: flamegraph-split02131618]

This branch:

round                   time:   [397.96 ms 408.03 ms 418.54 ms]

"execution_round_preparation_duration_seconds": Some(HistogramStats { count: 140, sum: 6.719237387999999 }),
"execution_round_consensus_queue_duration_seconds": Some(HistogramStats { count: 140, sum: 0.0004725249999999999 }),
"execution_round_advance_long_install_code_duration_seconds": Some(HistogramStats { count: 140, sum: 0.00023484099999999996 }),
"execution_round_scheduling_duration_seconds": Some(HistogramStats { count: 140, sum: 2.224596809 }),
"execution_round_inner_duration_seconds": Some(HistogramStats { count: 140, sum: 55.90155255600003 }),
"execution_round_subnet_queue_duration_seconds": Some(HistogramStats { count: 155, sum: 17.786476231999988 }),
"execution_round_inner_heartbeat_overhead_duration_seconds": Some(HistogramStats { count: 280, sum: 0.5375481759999998 }),
"execution_round_inner_preparation_duration_seconds": Some(HistogramStats { count: 155, sum: 6.753478399000002 }),
"execution_round_inner_execution_duration_seconds": Some(HistogramStats { count: 155, sum: 30.901487884000005 }),
"execution_round_inner_finalization_duration_seconds": Some(HistogramStats { count: 155, sum: 0.44893904400000006 }),
"execution_round_finalization_duration_seconds": Some(HistogramStats { count: 140, sum: 8.162084068000002 }),
"execution_round_finalization_stop_canisters_duration_seconds": Some(HistogramStats { count: 140, sum: 0.23767537600000005 }),
"execution_round_finalization_ingress_history_prune_duration_seconds": Some(HistogramStats { count: 140, sum: 0.0023041079999999996 }),
"execution_round_finalization_charge_resources_duration_seconds": Some(HistogramStats { count: 140, sum: 0.22467061000000008 }),
"execution_round_inner_preparation_step_duration_seconds": {},
"state_manager_checkpoint_op_duration_seconds": {{"op": "compute_manifest"}: HistogramStats { count: 0, sum: 0.0 }, {"op": "copy_state"}: HistogramStats { count: 141, sum: 0.9683901430000001 }, {"op": "create"}: HistogramStats { count: 0, sum: 0.0 }, {"op": "hash_tree"}: HistogramStats { count: 142, sum: 6.361884002999999 }},
"mr_process_batch_phase_duration_seconds": {{"phase": "commit"}: HistogramStats { count: 140, sum: 7.3691273279999985 }, {"phase": "execution"}: HistogramStats { count: 140, sum: 73.01186541500003 }, {"phase": "induction"}: HistogramStats { count: 140, sum: 0.8010207990000004 }, {"phase": "load_state"}: HistogramStats { count: 140, sum: 0.0007118310000000001 }, {"phase": "message_routing"}: HistogramStats { count: 140, sum: 0.7301934729999998 }, {"phase": "shed_messages"}: HistogramStats { count: 140, sum: 0.3757221769999999 }, {"phase": "time_out_callbacks"}: HistogramStats { count: 140, sum: 0.48366583199999996 }, {"phase": "time_out_messages"}: HistogramStats { count: 140, sum: 0.5086816049999999 }}

[Flamegraph image: flamegraph-Arc02131704]

This shows an improvement of about 20% in round time, which is significantly more than I remembered.

Note that the metrics cover the full execution of the benchmark binary, so they include canister creation (hence the significant inner round, i.e. execution, times) and warm-up.
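
Because the two runs recorded different sample counts (132 vs. 140 rounds), the histogram sums above are easier to compare as per-sample means. The sketch below does that arithmetic for a few of the figures quoted above; the helper names `relative_improvement` and `mean_ms` are hypothetical, and the numbers are copied from the output above rather than re-measured.

```rust
// Fractional reduction going from `before` to `after` (0.25 means 25% less).
fn relative_improvement(before: f64, after: f64) -> f64 {
    (before - after) / before
}

// Mean duration in milliseconds, given a histogram sum in seconds and a sample count.
fn mean_ms(sum_seconds: f64, count: u64) -> f64 {
    sum_seconds / count as f64 * 1000.0
}

fn main() {
    // Round time point estimates (ms), master vs. this branch.
    let round = relative_improvement(522.53, 408.03);
    println!("round time reduction: {:.1}%", round * 100.0);

    // state_manager_checkpoint_op_duration_seconds, op = "copy_state".
    let copy_master = mean_ms(12.989015154999995, 133);
    let copy_branch = mean_ms(0.9683901430000001, 141);
    println!("copy_state mean: {:.2} ms -> {:.2} ms", copy_master, copy_branch);

    // mr_process_batch_phase_duration_seconds, phase = "commit".
    let commit_master = mean_ms(18.454822891000003, 132);
    let commit_branch = mean_ms(7.3691273279999985, 140);
    println!("commit mean: {:.2} ms -> {:.2} ms", commit_master, commit_branch);
}
```

On these figures the round time reduction works out to roughly 22%, with the mean `copy_state` time dropping from about 98 ms to about 7 ms and the mean `commit` phase from about 140 ms to about 53 ms. Given the run-to-run variation mentioned above, these should be read as indicative rather than precise.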
