
Birthday attack on 64-bit seed #54

Open

solardiz opened this issue Mar 15, 2020 · 5 comments

@solardiz (Contributor)

The 64-bit seed looked to me like it provided only a low safety margin during my ProgPoW review last year. I was going to revisit this and share some thoughts with the community, but in the end I ran out of time, and I felt that ProgPoW was non-final anyway (to my liking, at least) yet further tweaks were not encouraged. Now, reminded and inspired by @kik's #51 and by this community's prompt response to it and willingness to tweak ProgPoW to fix it, I present another related yet very different attack:

While mining ProgPoW with a large cluster, maintain a cluster-wide cache of mappings from 64-bit seed to 256-bit mix digest (immediate result of the memory-hard computation). This cache can be emptied on every new period (10 blocks) and filled during the period, maybe for up to a pre-defined maximum size (as memory permits) such as 2^32 entries (128 GiB).
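The cache described above could be sketched as follows. This is a hypothetical illustration, not ProgPoW code: the class and method names are invented, and a real cluster-wide implementation would be a distributed key-value store rather than a single in-process dict.

```python
PERIOD_BLOCKS = 10      # ProgPoW changes its random program every 10 blocks
MAX_ENTRIES = 2 ** 32   # example cap: 2^32 digests * 32 bytes = 128 GiB

class SeedCache:
    """Per-period cache of 64-bit seed -> 256-bit mix digest (sketch)."""

    def __init__(self):
        self.period = None
        self.entries = {}   # seed (int) -> mix digest (32 bytes)

    def on_block(self, block_number):
        # A new period invalidates all cached seeds, so empty the cache.
        period = block_number // PERIOD_BLOCKS
        if period != self.period:
            self.period = period
            self.entries.clear()

    def put(self, seed, mix_digest):
        # Fill up to a pre-defined maximum size, as memory permits.
        if len(self.entries) < MAX_ENTRIES:
            self.entries[seed] = mix_digest

    def get(self, seed):
        return self.entries.get(seed)
```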

Once the cache fill is above a threshold, each cluster node can reasonably start to utilize its attached Keccak ASICs to search the nonce space until a previously cached seed is seen. For example, with a cache fill of 2^32 entries, it'd take around 2^32 Keccak computations until a cache hit. With a large enough cache and with enough Keccak ASICs working in parallel, this might be cheaper than doing a mix digest computation for a previously unseen seed (although the node's GPUs would also continue working on new seeds in parallel).

Now, what cache size would make this attack worthwhile? We'd need to match (and then exceed) a GPU's hash rate with our rate of finding previously cached seeds. With a cache of 2^32 entries, and thus 2^32 Keccak computations expected per hit, matching a GPU's rate of e.g. 2^24 (16.8M) hashes/s requires 2^56 Keccak computations per second. Can an ASIC with enough Keccak cores (perhaps across many chips) to accomplish this potentially consume less power than a GPU does? Probably not.
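A back-of-the-envelope check of the numbers above (the quantities are those from the text; this is arithmetic only, not a cost model):

```python
cache_entries = 2 ** 32
seed_space = 2 ** 64                    # seeds are 64-bit
hit_prob = cache_entries / seed_space   # chance a random nonce's seed is cached
trials_per_hit = 1 / hit_prob           # expected Keccak computations per hit
assert trials_per_hit == 2 ** 32

gpu_rate = 2 ** 24                      # ~16.8M hashes/s, the example GPU rate
keccak_rate_needed = gpu_rate * trials_per_hit
assert keccak_rate_needed == 2 ** 56    # Keccak computations per second
```

Doubling the cache halves `trials_per_hit`, which is why a much larger distributed cache (next paragraph) directly reduces the required number of Keccak cores.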

Can a cache much larger than 2^32 reasonably be maintained? Probably yes, distributed across the cluster nodes' RAM. Then fewer Keccak cores would be needed, and their energy efficiency vs. GPUs would be better.

Can a cluster node's Keccak ASICs quickly determine if a seed is (likely) cached? Probably yes, with a probabilistic data structure such as a Bloom filter in RAM closely attached to the ASICs. (They would not need to wait for this check result, but would proceed to test more nonces. There would also need to be locally stored queues of seed values to check.) This RAM could be many times smaller than the cache itself (perhaps 10 to 20 bits per seed, not 256), but it would nevertheless be the limiting factor on the total size of the cache.
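A minimal Bloom filter sketch of this "is this seed (likely) cached?" check, in software for clarity (an ASIC would use dedicated hash circuits and SRAM instead). The class, its parameters, and the use of BLAKE2b to derive bit positions are all illustrative assumptions, not part of ProgPoW:

```python
import hashlib

class BloomFilter:
    """Probabilistic membership test: no false negatives, rare false positives."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, seed):
        # Derive k bit positions from the 64-bit seed (illustrative hashing;
        # the salt makes each of the k hash functions distinct).
        for i in range(self.k):
            h = hashlib.blake2b(seed.to_bytes(8, "little"),
                                digest_size=8, salt=bytes([i] * 8))
            yield int.from_bytes(h.digest(), "little") % self.m

    def add(self, seed):
        for p in self._positions(seed):
            self.bits[p // 8] |= 1 << (p % 8)

    def maybe_contains(self, seed):
        # True means "probably cached"; False means "definitely not cached".
        return all(self.bits[p // 8] >> (p % 8) & 1
                   for p in self._positions(seed))
```

At roughly 14 bits per entry and k near 10, the false-positive rate is on the order of 10^-3, consistent with the "10 to 20 bits per seed" figure above; a false positive merely wastes one lookup against the full cache.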

Can the Bloom filter RAM have enough throughput to accommodate the many candidate seeds coming out of Keccak every second? That's probably the worst bottleneck. I guess it'd be tricky ASIC design with Keccak+SRAM cores (distributed RAM), NOC, and inter-chip mesh to implement that.

Overall, this doesn't look practical yet. But if we want to have a better safety margin and greater confidence with respect to attacks like this, we need to move to larger seeds.

My bigger concern isn't this attack per se, but rather that this line of thought could become the missing piece of the puzzle in making some other yet-undiscovered attack practical.

@olalawal commented Mar 15, 2020 via email

@solardiz (Contributor, Author)

@olalawal It is unhealthy to the project if/when we have to self-censor our sharing of potential issues out of concern that these would be looked at from a political ([inadvertently] helping one of the camps - for or against ProgPoW) rather than a technical perspective ("strengthening" ProgPoW, as @OhGodAGirl correctly put it in a tweet).

Can we please stick to the tech and discuss it openly? With self-censorship, we'd end up doing the tech worse and then playing catch-up to address issues that we could have discussed and dealt with (or made informed decisions not to) in time.

One of the reasons I said nothing about the 64-bit seed providing only a low safety margin last year was out of concern that my saying so would be misinterpreted as arguing against ProgPoW. I now realize I should have said this anyway - it could have saved us from @kik's exploit and needing to address it now.

@solardiz mentioned this issue Mar 15, 2020
@solardiz (Contributor, Author)

One way to look at the attack I described is that it doesn't let attackers eliminate memory-hard computation (unlike #51), but it gives miners the (unintended) flexibility to partially replace the intended parallel memory-hard computation with different yet also parallel memory-hard computation (lookups against the Bloom filters or such). This makes it harder to analyze expected amortized costs of ProgPoW computation on future hardware, and it potentially gives large clusters an advantage (have the choice to do things differently) over smaller individual miners.

That said, with the numbers I came up with so far (in my original comment), this alternative choice looks less optimal than computing things in the straightforward manner. So at this point the attack is academic.

@solardiz (Contributor, Author) commented Mar 15, 2020

Here's an extension of the attack:

During initial cache filling, also use the Keccak ASICs to search for nonces such that, e.g., the upper 20 bits of the seed are all 0's.

Then, during the attack I initially described, we can filter out most non-cached seeds by requiring the same bits to be all 0's, which is an extremely cheap check without any memory hardness. With the example of requiring 20 bits to be all 0's, we'd reduce the frequency of Bloom filter lookups by a factor of over 1 million (2^20).
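The prefilter step above can be sketched as follows. Names are illustrative; the point is that the check is a single mask-and-compare, trivially cheap next to a memory access:

```python
PREFILTER_BITS = 20
# Mask selecting the upper 20 bits of a 64-bit seed.
PREFILTER_MASK = ((1 << PREFILTER_BITS) - 1) << (64 - PREFILTER_BITS)

def passes_prefilter(seed):
    # True for ~1 in 2^20 (~1 in a million) uniformly random 64-bit seeds.
    # During cache filling, only seeds passing this check are cached, so a
    # cached seed always passes; during the attack, seeds failing it can be
    # discarded without any Bloom filter lookup.
    return seed & PREFILTER_MASK == 0

# Reduction in Bloom filter lookup frequency:
reduction_factor = 2 ** PREFILTER_BITS
assert reduction_factor == 1_048_576   # "a factor of over 1 million"
```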

This arbitrarily reduces the Bloom filter RAM throughput requirement, thereby removing what I described as "probably the worst bottleneck" in my initial description of the attack.

With fewer possibly-nonzero seed bits left, this also reduces the required cache size and/or increases the cache hit rate.

I think this extension is enough to make the attack practical, needing to be addressed now.

@solardiz (Contributor, Author) commented Mar 16, 2020

[This comment previously included corrections to the initial revision of my comment above, but I've since edited the comment above to reflect my latest understanding. I am not deleting this comment in case someone wants to see the full edits history.]
