Commit d838a5e

more ΔQ experiments and Logbook update
1 parent 484f228 commit d838a5e

3 files changed: +65 -38 lines changed

Logbook.md

+53-37
@@ -2,13 +2,29 @@

  ## 2024-12-06

+ ### ΔQ
+
+ - created a ΔQ model (`comparison_rs.txt`) of transaction diffusion in the Rust simulation:
+ - propagation among the five clusters followed by propagation within the 40 nodes of each cluster
+ - only fixed message delays of 12ms, 69ms, 268ms, independent of message size
+ - general structure of completion matches the timings, but the completion rate is overall quite different
+ - spreading to neighbor clusters (3×68ms) followed by another such hop should hit all clusters, but that also doesn’t happen in the simulation; it waits until 3×268ms before it can break through 74% completion
+ - **conclusion:** I don’t really understand what the simulation is doing, even though the Rust code looks obvious enough, and obviously correct on the node level; will dive into the machine room later
+ - created a ΔQ model (`comparison_hs.txt`) of Praos block diffusion in the Haskell simulation:
+ - hs simulates TCP window collapse, which adds a very latency-dependent additional delay to block transfer times; I wasn’t able to adequately model that, and plausible ΔQ expressions lead to too slow completion
+ - when TCP window collapse is hacked out (thanks Andrea!) I get a close match of the result with a ΔQ expression; however, that expression does not match the stated simulation behaviour
+ - in particular: it matches only when assuming that blocks are _not validated_ during relaying, only afterwards before adoption
+ - one suspicious detail: according to my (hopefully not buggy!) measurement, the network topology for the hs simulation has a clustering coefficient of exactly zero; I was unable to find a single triangle
+
  ### Haskell simulation

  First Leios visualisations implemented (on `andrea/leios-p2p` branch atm):
+
  - short-leios-1: 2 nodes, showing every mini-protocol message.
  - short-leios-p2p-1: 100 nodes, showing transfers of RB,IB,EB,Votes and some statistics.

  Next steps:
+
  - Improve readability of short-leios-p2p-1 to differentiate pipelines
  and kinds of blocks.
  - Verify parameters are set to sensible values (in particular wrt
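The zero clustering coefficient noted in the ΔQ entry above can be cross-checked with an independent triangle count. The following is a minimal Python sketch, not the measurement used in the logbook. It assumes the hs topology can be exported as an undirected edge list; the function names `adjacency`, `count_triangles` and `transitivity` are hypothetical.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(edges):
    """Count triangles: pairs of neighbours of a node that are themselves linked."""
    adj = adjacency(edges)
    closed = sum(
        1
        for nbrs in adj.values()
        for a, b in combinations(nbrs, 2)
        if b in adj[a]
    )
    # Every triangle is found once from each of its three corners.
    return closed // 3


def transitivity(edges):
    """Global clustering coefficient: 3 * triangles / connected triples."""
    adj = adjacency(edges)
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return 3 * count_triangles(edges) / triples if triples else 0.0
```

A result of exactly zero is what any triangle-free graph produces, for example a tree or a bipartite topology, so a zero here would point at how the topology was generated rather than necessarily at a measurement bug.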
@@ -30,33 +46,33 @@ This all is a work and progress and values may change significantly in the futur

  - Sortition: 50 ms
  - Votes
- - Number: 500
- - Size: 500 B
- - Construction: 0.65 ms
- - Verification: 0.15 ms
+ - Number: 500
+ - Size: 500 B
+ - Construction: 0.65 ms
+ - Verification: 0.15 ms
  - ALBA certificate
- - Size: 75 kB
- - Construction (aggregation plus proof): 200 ms
- - Verification: 0.15 ms
+ - Size: 75 kB
+ - Construction (aggregation plus proof): 200 ms
+ - Verification: 0.15 ms

  ### Draft of several sections of the first tech report

  We now have a full draft of several sections of the technical report.

  - Cost analysis
- - Simulation of transaction volume on Cardano
- - Estimation of costs for a Leios SPO
- - Cost of storage
- - Break-even cost for perpetual storage of blocks
- - Compressed storage of Praos blocks
+ - Simulation of transaction volume on Cardano
+ - Estimation of costs for a Leios SPO
+ - Cost of storage
+ - Break-even cost for perpetual storage of blocks
+ - Compressed storage of Praos blocks
  - Rewards received
- - Importance of Cardano Reserves
+ - Importance of Cardano Reserves
  - Insights for Leios techno-economics
  - Approximate models of Cardano mainnet characteristics
- - Transaction sizes and frequencies
- - Stake distribution
+ - Transaction sizes and frequencies
+ - Stake distribution

- Work is in progress on voting and certificates in https://github.com/input-output-hk/ouroboros-leios/pull/94. The following subsections have been fully drafted:
+ Work is in progress on voting and certificates in <https://github.com/input-output-hk/ouroboros-leios/pull/94>. The following subsections have been fully drafted:

  - Voting and certificates
  - Structure of votes
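The vote and certificate parameters in the hunk above allow a quick back-of-the-envelope comparison. This Python sketch only multiplies the listed values. It assumes that "Number" is the number of votes feeding one certificate, that 1 kB is 1000 B, and that a node verifies votes one after another; the logbook states none of these, and the name `vote_totals` is hypothetical.

```python
# Values copied from the logbook parameters above.
VOTE_COUNT = 500        # assumed: votes per certificate
VOTE_SIZE_B = 500       # bytes per vote
VOTE_VERIFY_MS = 0.15   # milliseconds to verify one vote
CERT_SIZE_B = 75_000    # ALBA certificate, 75 kB (assumed 1 kB = 1000 B)


def vote_totals(count=VOTE_COUNT, size_b=VOTE_SIZE_B, verify_ms=VOTE_VERIFY_MS):
    """Return (total bytes of raw votes, total sequential verification time in ms)."""
    return count * size_b, count * verify_ms


raw_bytes, verify_ms = vote_totals()
print(f"raw votes: {raw_bytes} B, sequential verification: {verify_ms:.1f} ms")
print(f"raw votes / certificate size: {raw_bytes / CERT_SIZE_B:.2f}")
```

Under these assumptions the raw votes add up to 250 kB, against 75 kB for the certificate, and verifying all votes sequentially takes 75 ms.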
@@ -334,7 +350,7 @@ Findings:

  ### Techno-economic analysis of SPO nodes

- The *Refined Estimate* tab of the [Leios High-Level Resources Estimates spreadsheet](analysis/Leios%20resource%20estimates%20-%20ROUGH%20ESTIMATE.ods) computes node costs for SPOs under Praos and Leios.
+ The _Refined Estimate_ tab of the [Leios High-Level Resources Estimates spreadsheet](analysis/Leios%20resource%20estimates%20-%20ROUGH%20ESTIMATE.ods) computes node costs for SPOs under Praos and Leios.

  - Each SPO has one block producer and two relays.
  - CPU, IOPS, disk, and network costs are estimated.
@@ -721,7 +737,7 @@ Agenda:
  - IB of shard i should not have tx consuming token of shard j
  - fees of IB i are paid with token shard i
  - ensure IB from different shards will never consume token from other shards
- - *important* : fees are always paid, even if tx is not included in the ledger
+ - _important_ : fees are always paid, even if tx is not included in the ledger
  - Q: what about multiple tokens per UTxO?
  - grinding with people trying to overload one shard?
  - \# shards w.r.t IB rate => decrease probability of concurrent IBs for the same shard
@@ -781,11 +797,11 @@ The diagram above illustrates a techno-economic business case for Leios adoption

  We could consider the following goals for January 2025.

- - *Technical goal for PI8:* Estimate a reasonably tight upper bound on the cost of operating a Leios node, as a function of transaction throughput, and estimate the maximum practical throughput.
+ - _Technical goal for PI8:_ Estimate a reasonably tight upper bound on the cost of operating a Leios node, as a function of transaction throughput, and estimate the maximum practical throughput.
  - Target level: SRL2
- - *Business goal for PI8:* Identify (a) the acceptable limit of transaction cost for Cardano stakeholders, (b) the maximum throughput required by stakeholders, and (c) the throughput-cost relationship for other major blockchains.
+ - _Business goal for PI8:_ Identify (a) the acceptable limit of transaction cost for Cardano stakeholders, (b) the maximum throughput required by stakeholders, and (c) the throughput-cost relationship for other major blockchains.
  - Target level: IRL3
- - *Termination criteria for Leios:* Transaction costs are unacceptably high for Leios or the practical maximum throughput fails to meet stakeholder expectations. In this case the Leios protocol may need reconceptualization and redesign, or it may need to be abandoned.
+ - _Termination criteria for Leios:_ Transaction costs are unacceptably high for Leios or the practical maximum throughput fails to meet stakeholder expectations. In this case the Leios protocol may need reconceptualization and redesign, or it may need to be abandoned.

  ### Haskell Simulation

@@ -973,7 +989,7 @@ Main question is what to test (first)? And how to test? Network diffusion seems
  - The node can fetch new headers and blocks
  - The node can diffuse new headers and blocks
  - It must not propagate equivocated blocks more than once
- - But it must propagate them at least once to ensure a *proof-of-equivocation* is available to all honest nodes in the network
+ - But it must propagate them at least once to ensure a _proof-of-equivocation_ is available to all honest nodes in the network

  How does coverage come into play here?

@@ -1008,7 +1024,7 @@ Discussing some possible short-term objectives:
  - start with Adversarial scenarios, answering the question on where to define the behaviour: in the spec or in the tester?
  - simulations/prototypes will need to have some ways to interact w/ tester => interfaces can be refined later
  - Define a taxonomy of "adversarialness"
- - strong adversary that's *misbehaving*
+ - strong adversary that's _misbehaving_
  - adversarial "natural" conditions, eg. outages/split brains
  - transaction level adversary
  - we need to qualify those different scenarios
@@ -1079,16 +1095,16 @@ We can run the conformance tests in the ledger spec :tada:
  #### What approach for Leios?

  - We don't have an executable Agda spec for Leios, only a relational one (with holes).
- - We need to make the spec executable, but we know from experience with Peras that maintaining *both* a relational spec and an executable spec is costly
+ - We need to make the spec executable, but we know from experience with Peras that maintaining _both_ a relational spec and an executable spec is costly
  - to guarantee at least soundness we need to prove the executable spec implements correctly the relational one which is non trivial
  - Also, a larger question is how do we handle adversarial behaviour in the spec?
  - it's expected the specification uses dependent types to express the preconditions for a transition, so that only valid transitions can be expressed at the level of the specification
- - but we want the *implementation* to also rule out those transitions and therefore we want to explicitly test failed preconditions
+ - but we want the _implementation_ to also rule out those transitions and therefore we want to explicitly test failed preconditions
  - then the question is: how does the (executable) specification handle failed preconditions? does it crash? can we know in some ways it failed?
  - we need to figure how this is done in the ledger spec
  - In the case of Peras, we started out modelling an `Adversary` or dishonest node in the spec but this proved cumbersome and we needed to relax or remove that constraint to make progress

- - however, it seems we really want the executable spec to be *total* in the sense that any sequence of transitions, valid or invalid, has a definite result
+ - however, it seems we really want the executable spec to be _total_ in the sense that any sequence of transitions, valid or invalid, has a definite result

  - we have summarized short term plan [here](https://github.com/input-output-hk/ouroboros-leios/issues/42)
  - we also need to define a "longer" term plan, eg. 2 months horizon
@@ -1260,7 +1276,7 @@ ND starts raising a few concerns he has about leios that should be answered:
  - How does it work at saturation?

  A key issue is the potential attack vector that comes from de-duplicating txs: how is it handled by Leios forwarding infra? In general, how does Leios deal with adversarial behaviour?
- We acknowledge this needs to be answered, and there's work on mempool management that needs to happen, but that's not the core topic we want to work on *now*
+ We acknowledge this needs to be answered, and there's work on mempool management that needs to happen, but that's not the core topic we want to work on _now_

  Another important question to answer is "What resources are needed?" as this has a deep impact on centralisation:

@@ -1633,7 +1649,7 @@ Here are a few comments by @bwbush about the `leios-sim` package:

  Added some documentation to the Leios simulator:

- - Added *tooltips* to document the various parameters available
+ - Added _tooltips_ to document the various parameters available
  - Added readonly fields computing various aggregates from the simulation's data: Throughput, latency to inclusion in EB, dropped IB rate
  - Added a [comment](https://github.com/input-output-hk/ouroboros-leios/issues/7#issuecomment-2236521300) on the simulator issue as I got perplexed with the throughput computation's result: I might be doing something wrong and not computing what I think I am computing as the results are inconsistent. I think this comes from the fact we are simulating 2 nodes so the throughput aggregates the 2 nodes' and should be assigned individually to each one, perhaps more as a distribution?

@@ -1656,12 +1672,12 @@ Managed to configure the ECS cluster, service, and task to run the image, but it

  need to configure a secret containing a PAT for pulling the manifest: <https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_repositoryCredentials>

- I gave up trying to run on AWS, every solution I found is an insanely intricate maze of stupidly complicated solutions which I don't care about as I only need to deploy a *single* image without any data dependency attached.
+ I gave up trying to run on AWS, every solution I found is an insanely intricate maze of stupidly complicated solutions which I don't care about as I only need to deploy a _single_ image without any data dependency attached.

  I managed to get Gcloud run deployment working, mostly copy pasting what I did for Peras and fiddling with it.

  - I reused the same service account as Peras which is a mistake -> should create a new service account with limited rights
- - Needed to add service account as an *owner* of the domain in the google console (a manual task) in order to allow subdomain mapping
+ - Needed to add service account as an _owner_ of the domain in the google console (a manual task) in order to allow subdomain mapping
  - Changed the server code to support defining its port from `PORT` environment variable which is provided by the deployment configuration

  Allowing anyone to access the server proved annoying too: The following configuration works
@@ -1790,7 +1806,7 @@ The recording is available on GDrive: <https://drive.google.com/file/d/1r04nrjMt

  Discussing with researchers on some early simulations that are being worked on for Leios.

- - Constraint: Setup threshold on *egress* bandwidth, then simulate diffusion of a block to downstream peers
+ - Constraint: Setup threshold on _egress_ bandwidth, then simulate diffusion of a block to downstream peers
  - upstream sends notification (Eg. header)
  - downstream asks for block body if it does not have it
  - then it "validates" (simulated time) and advertises to neighbours
@@ -1804,7 +1820,7 @@ Discussing with researchers on some early simulations that are being worked on f
  - δ = 8 (4 inbound, 4 outbound)
  - b/w limit = 1Mb/s
  - block size ~ 1kB
- - when sending 10 blocks/s we observe more variation, a bit more contention as the *freshest first* policy starts to kick in
+ - when sending 10 blocks/s we observe more variation, a bit more contention as the _freshest first_ policy starts to kick in
  - at 1block/ms there's a much wider variation in time it takes to reach nodes
  - the first blocks take the longest as the queues are filling up with fresher blocks
  - latest blocks go faster, almost as fast as when rate is much slower, but this is also an artifact of the simulation (eg. time horizon means there's no block coming after which decreases contention)
@@ -1854,7 +1870,7 @@ Spyros will work this week on network simulation for Leios
  - need to queue local actions according to bandwidth availability
  - main input parameter is IB generation rate
  - output = delivery ratio of IBs
- - if IB rate > threshold -> most blocks won't make it because of *freshest first* policy
+ - if IB rate > threshold -> most blocks won't make it because of _freshest first_ policy

  Next steps:

@@ -1961,10 +1977,10 @@ Here is some draft we drew:

  Couple explanations:

- - Upper part is about *equivocation*, eg. an adversary producing different IBs at the same slot.
- - a node will observe the equivocation (on the far right) by being offered 2 *equivocated* headers from different peers
- - This node will be able to produce a *proof of equivocation* that's useful when voting for IBs (and EBs?)
- - Lower part is about *freshest first* download policy: Two nodes producing valid IBs at different slots.
+ - Upper part is about _equivocation_, eg. an adversary producing different IBs at the same slot.
+ - a node will observe the equivocation (on the far right) by being offered 2 _equivocated_ headers from different peers
+ - This node will be able to produce a _proof of equivocation_ that's useful when voting for IBs (and EBs?)
+ - Lower part is about _freshest first_ download policy: Two nodes producing valid IBs at different slots.
  - given the choice of headers (and bodies) consumer node will choose to download the freshest body first, eg. B in this case
  - headers are downloaded in any order as we can't know whether or not they are "freshest" before reading them
  - It seems that's only relevant if there are more blocks offered than available bandwidth :thinking:
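The _freshest first_ download policy described in the bullets above can be illustrated with a small ordering function. This is a sketch only, not the Leios implementation: it assumes "freshest" means the highest slot number, and the `Header` type and `download_order` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    block_id: str
    slot: int  # assumed: a higher slot means a fresher block


def download_order(offered, already_have=frozenset()):
    """Return ids of bodies still to fetch, freshest (highest slot) first.

    Headers arrive in any order; freshness is only known after reading them,
    so the ordering is applied to the set of headers seen so far.
    """
    pending = {h.block_id: h for h in offered if h.block_id not in already_have}
    ranked = sorted(pending.values(), key=lambda h: h.slot, reverse=True)
    return [h.block_id for h in ranked]
```

With headers A (slot 10) and B (slot 12) on offer, `download_order` returns B before A, matching the "B in this case" example. The ordering only changes the outcome when more bodies are offered than the available bandwidth can carry.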

delta_q/comparison_rs.txt

+11
Large diffs are not rendered by default.

sim-rs/txn_diffusion.sh

+1-1
@@ -6,7 +6,7 @@ jq --unbuffered -rc 'select(.message.type=="TransactionGenerated") | (.message.i
  echo $id $t
  CDF=`(
  echo $t
- jq -c 'select(.message|.type=="TransactionReceived" and .id=='$id') | {time,id:.message.id}' < "$1"
+ jq -c $id' as $id|select(.message|{type,id}|. == {type: "TransactionReceived", id: $id|tostring}) | {time,id:.message.id}' < "$1"
  ) | jq -srf convert.jq`
  if [ -z "$RET" ]; then
  RET="$CDF"
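The added `tostring` in the new `jq` filter suggests that `.message.id` is stored as a string in the event log, so a comparison against a bare number interpolated from `$id` would not match. That reading is an inference from the diff, not something the commit states. The Python sketch below mirrors the new filter's logic under that assumption; `received_events` is a hypothetical name and the field names are taken from the filter itself.

```python
import json


def received_events(lines, txn_id):
    """Yield {time, id} for TransactionReceived events of one transaction.

    The id is compared as a string, mirroring `$id|tostring` in the jq filter.
    """
    wanted = {"type": "TransactionReceived", "id": str(txn_id)}
    for line in lines:
        event = json.loads(line)
        msg = event.get("message") or {}
        # Compare only the two fields of interest, like `{type,id}` in jq.
        if {"type": msg.get("type"), "id": msg.get("id")} == wanted:
            yield {"time": event.get("time"), "id": msg.get("id")}
```

Passing the id as the integer `7` still matches log entries whose id is the string `"7"`, while an entry with a numeric id `7` is skipped, which is the same strictness the `jq` equality check has.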
