-
Interesting idea :) My current understanding is that the topology of the graph is already factored into the probability distribution of path lengths: assuming some number of nodes and some average degree, one (not me, but I know it's doable) can compute or estimate this distribution. It's a simplification, but it seems to me it describes reasonably well what we care about, at least in the Praos (and Peras) cases: given that some node produces a block, what's the 95th percentile of the delay distribution for this block across the given distribution of possible paths. Interestingly, we have been able to compare the distribution that DeltaQ gives us with the actual 1-hop distribution observed across a significant part of the network (that's in the first tech report for Peras), and it seems pretty accurate. Perhaps this is an artefact and more analysis might be required, but to me this is a solid hint that, within the error margin of a coarse-grained simulation, a simple Delta-Q expression can model real-world behaviour accurately.

I therefore don't understand why you would want to go to that level of detail with DeltaQ in the case of Leios. We should first try to model Leios using the existing combinators, possibly factoring in congestion by including in each hop's delay a random variable modelling the additional delay according to some queueing model.
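For what it's worth, here is a rough sketch of what "a queueing model's congestion folded into each hop's delay" could look like, assuming an M/M/1 queue on top of a fixed propagation latency; every parameter value below is illustrative, not measured.

```python
import math

def hop_delay_cdf(t: float, base_latency: float,
                  arrival_rate: float, service_rate: float) -> float:
    """P(hop delay <= t): a fixed propagation latency plus an M/M/1 sojourn
    time, which is exponentially distributed with rate service_rate - arrival_rate."""
    if t < base_latency or arrival_rate >= service_rate:
        return 0.0
    return 1.0 - math.exp(-(service_rate - arrival_rate) * (t - base_latency))

# Illustrative numbers only: 69 ms propagation, a 100 msg/s link at 30% load.
for t_ms in (70, 100, 150, 300):
    p = hop_delay_cdf(t_ms / 1000, base_latency=0.069,
                      arrival_rate=30.0, service_rate=100.0)
    print(f"P(hop delay <= {t_ms} ms) = {p:.3f}")
```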
-
You're right about the elegance and usefulness of the ΔQ approach when it comes to timeliness, and what I propose above isn't needed to improve that. My impetus is to tackle the load analysis. As far as I can see, the beauty of ΔQ lies in just abstracting away exactly those details that would be needed for such an analysis: notice how the amplification of messages at each hop is absent in the expression, hence it isn't possible to judge the bandwidth usage of the sending node.
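To make the amplification point concrete, here is a back-of-the-envelope comparison (every number is an invented placeholder) of a node's egress if it pushed the full body to every downstream peer versus announcing headers and serving the body only to whoever actually fetches it from this node.

```python
# All numbers are invented placeholders, just to show the shape of the calculation.
BODY_BYTES = 90_000        # hypothetical block body size
HEADER_BYTES = 1_000       # hypothetical header size
BLOCKS_PER_SECOND = 0.05   # e.g. one block every 20 seconds
DOWNSTREAM_PEERS = 10      # fan-out of this node

# Naive flooding: push the full body to every downstream peer.
flood_egress = BODY_BYTES * DOWNSTREAM_PEERS * BLOCKS_PER_SECOND

# Header/body split: announce the header to everyone, but assume on average
# only one downstream peer fetches the body from this particular node.
split_egress = (HEADER_BYTES * DOWNSTREAM_PEERS + BODY_BYTES) * BLOCKS_PER_SECOND

print(f"flooding    : {flood_egress / 1e3:.1f} kB/s egress")
print(f"header/body : {split_egress / 1e3:.1f} kB/s egress")
```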
-
So far, ΔQ expressions have been used to model the timeliness of block diffusion in Cardano using an approach like the following:
The probabilities for needing one hop, two hops, etc. have been taken from measurements or simulations of a peer-to-peer network shaped like the Cardano network.
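To illustrate that structure with a toy sketch (placeholder numbers, not the measured hop-count probabilities): each additional hop is a sequential composition, i.e. a convolution of the single-hop delay distribution with itself, and the hop counts are combined by probabilistic choice.

```python
# Toy sketch of the "choice over hop counts" structure; all numbers are placeholders.
from itertools import product

def convolve(a: dict, b: dict) -> dict:
    """Sequential composition: delays add, probabilities multiply."""
    out = {}
    for (da, pa), (db, pb) in product(a.items(), b.items()):
        key = round(da + db, 6)
        out[key] = out.get(key, 0.0) + pa * pb
    return out

def choice(weighted: list) -> dict:
    """Probabilistic choice: mixture of distributions with the given weights."""
    out = {}
    for weight, dist in weighted:
        for d, p in dist.items():
            out[d] = out.get(d, 0.0) + weight * p
    return out

def cdf(dist: dict) -> list:
    """Render as the CDF[(x, p), ...] notation used below."""
    acc, steps = 0.0, []
    for d in sorted(dist):
        acc += dist[d]
        steps.append((round(d, 3), round(acc, 3)))
    return steps

# Hypothetical single-hop delay: 69 ms or 268 ms, equally likely.
one_hop = {0.069: 0.5, 0.268: 0.5}
two_hops = convolve(one_hop, one_hop)
three_hops = convolve(two_hops, one_hop)

# Hypothetical hop-count probabilities (in reality taken from measurements).
diffusion = choice([(0.6, one_hop), (0.3, two_hops), (0.1, three_hops)])
print("CDF", cdf(diffusion))
```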
This approach is not satisfactory for any analysis that aims to take network load or buffers into account, because it has no access to the branching nature of the messages traveling through the network, nor to the crucial deduplication functionality: when information reaches a node where it is already known, it won't be replicated further. With the separation into headers and block data this becomes even more of an inaccuracy, as a node will only fetch the heavy block data from one peer.
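A tiny simulation makes visible exactly the quantities the hop-count model hides: on a random graph (parameters invented), headers fan out to all peers of every informed node, but each node downloads the heavy body only once.

```python
# Minimal gossip simulation with deduplication; all parameters are invented.
import random
from collections import defaultdict

random.seed(1)
S, D = 200, 8   # hypothetical network size and per-node fan-out
peers = {n: random.sample([m for m in range(S) if m != n], D) for n in range(S)}

have_body = {0}                  # node 0 produced the block
frontier = [0]
header_sends = defaultdict(int)  # per-node count of header announcements sent
body_sends = defaultdict(int)    # per-node count of body uploads

while frontier:
    nxt = []
    for node in frontier:
        for p in peers[node]:
            header_sends[node] += 1      # headers are pushed to every peer...
            if p not in have_body:
                body_sends[node] += 1    # ...but the body is fetched only once
                have_body.add(p)
                nxt.append(p)
    frontier = nxt

print("nodes reached:", len(have_body), "of", S)
print("max header sends by one node:", max(header_sends.values()))
print("max body uploads by one node:", max(body_sends.values()))
print("total body transfers:", sum(body_sends.values()),
      "(= nodes reached - 1, thanks to deduplication)")
```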
Notation: CDF[(0.1, 0.3), (0.5, 0.6), (0.8, 1)] describes a cumulative distribution function starting at (0,0) and having steps at 0.1, 0.5, and 0.8, rising to the values 0.3, 0.6, and 1, respectively.
It seems to me that we need to capture more information about the network in order to solve this problem. The first step is to recognise that diffusion is hierarchical: peers in the same location will very quickly get the data, much quicker than diffusion takes across locations (cf. the latency of 12ms vs. 69ms or 268ms cited in the technical report). The second step is to quantify the graph in terms of size S and degree D. Let L1{small,large} be the short local latency and L2{small,large} be the latency between locations — the system can be extended to more tiers later. Let R=D-1 be the replication factor during diffusion steps.
Ignoring co-located (L1) peers for now and assuming a random graph between locations, the process may be approximated like so:
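For concreteness, here is my guess at the shape of that recursion, tracking only expected node counts per step instead of the full probability distributions (all numbers are invented); the cutoff plays the role of the pruning described next.

```python
# Expected-value sketch of the diffusion recursion; all parameters are invented.
S = 200          # nodes in the (far-away) graph
D = 8            # degree, so replication factor R = D - 1
R = D - 1
L2 = 0.069       # latency per far-away hop, in seconds
CUTOFF = 0.5     # stop when fewer than this many new nodes are expected

reached, frontier, step = 1.0, 1.0, 0
while frontier >= CUTOFF and reached < S:
    contacts = frontier * R
    # Occupancy estimate: each not-yet-reached node is hit by at least one of
    # the `contacts` uniformly random contacts with this probability.
    p_hit = 1.0 - (1.0 - 1.0 / S) ** contacts
    new = (S - reached) * p_hit
    reached += new
    frontier = new
    step += 1
    print(f"after {step} steps (~{step * L2 * 1000:.0f} ms): "
          f"expected {reached:.1f} of {S} nodes reached")
```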
This process continues as long as new nodes can be reached. Since theoretically lots of duplicates may happen, the probability distribution won’t reach certainty about reaching all S nodes soon, but we can apply a suitable cutoff for pruning the distribution functions that appear in the process, leading to steady progress of the lower bound on how many nodes should have been reached at a given step. This will end the recursion without having to specify other limits than the graph degree and size.
@dcoutts Is this roughly the kind of fix point you had in mind?
Now, coming back to the hierarchy, the above process may be added as a ΔQ operator, something like a gossip combinator, where `light` is the CDF for doing the part that allows deduplication (i.e. header transfer) and `heavy` is the CDF for performing the rest of the transfer (i.e. the block body).

The hierarchical approach then allows us to decompose the diffusion process into the far-away part followed by the co-located part; the latter starts at each location as it is reached and otherwise proceeds in parallel, thanks to how convolution works.

I don't have sufficient information on the actual network, but perhaps S_far=200 and S_near=12. From the design discussion it seems that the choice of "half random, half closeby" for peer selection results in D_far and D_near being roughly the same (because random is overwhelmingly far-away unless there are some really large locations). It should be obvious that the second Gossip step involves a much shorter timeframe than the first, so even if we pessimistically made S_near larger the result wouldn't be dramatically different.
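For concreteness, a sketch of what such an operator and the far/near composition might look like. The name gossip, the parameter values, and the use of single durations for `light` and `heavy` (instead of full delay distributions) are all my simplifications, not a proposed API.

```python
# Hypothetical gossip operator and its hierarchical composition; all names
# and numbers are placeholders of my own invention.

def gossip(light: float, heavy: float, size: int, degree: int, cutoff: float = 0.5):
    """Return a step CDF [(time, fraction_of_nodes_reached), ...] for one
    gossip tier, using the expected-value recursion sketched earlier."""
    R = degree - 1
    reached, frontier, t, steps = 1.0, 1.0, 0.0, []
    while frontier >= cutoff and reached < size:
        contacts = frontier * R
        new = (size - reached) * (1.0 - (1.0 - 1.0 / size) ** contacts)
        reached += new
        frontier = new
        t += light + heavy
        steps.append((t, reached / size))
    return steps

def seq(first, second):
    """Sequential composition of two step CDFs (their convolution)."""
    out, prev = {}, 0.0
    for t1, c1 in first:
        p1 = c1 - prev          # probability mass added at time t1
        prev = c1
        prev2 = 0.0
        for t2, c2 in second:
            p2 = c2 - prev2
            prev2 = c2
            key = round(t1 + t2, 6)
            out[key] = out.get(key, 0.0) + p1 * p2
    acc, cdf = 0.0, []
    for t in sorted(out):
        acc += out[t]
        cdf.append((round(t, 3), round(acc, 3)))
    return cdf

# Placeholder parameters: 200 far-away locations, 12 co-located peers each.
far = gossip(light=0.069, heavy=0.100, size=200, degree=8)
near = gossip(light=0.012, heavy=0.020, size=12, degree=8)
print("far followed by near:", seq(far, near))
```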
If any of this makes any sense, the hard part is finding the right formulae for the probability distributions; I'm a bit rusty with that right now.