Truncate hash output rather than modulo p #142
Have done some research into this. If we look at this as a pigeonhole-principle problem: reducing a k-bit value modulo p maps 2^k inputs onto p residues, so whenever 2^k is not a multiple of p some residues receive one more preimage than others. The ideal mean and median occur when truncating the number of bits to `floor(log2(p))`. Thus there are two patterns which are safe:

- truncating the hash output to `floor(log2(p))` bits, so it always fits below p, or
- reducing a sample much wider than p (e.g. twice its bit length, as with the full 256-bit digest) modulo p, where the residual bias is negligible.

Using one or two extra bits seems to introduce large biases. This reinforces the notion that simply taking `floor(log2(p))` bits is the safest option. I am unsure of the real-world implications of other approaches. So this change should definitely be adopted.

Example code used to derive stats about the probabilities:

```python
from math import log2, ceil, floor
from collections import defaultdict
from hashlib import sha256
from os import urandom
from statistics import mean, median, pstdev, pvariance

# primes1.txt: whitespace-separated small primes (header lines stripped).
with open('primes1.txt') as handle:
    for lineno, line in enumerate(handle):
        for p in map(int, line.split()):
            if p == 2:
                continue
            # Occurrence counts per residue for each reduction strategy:
            b32_occurs = defaultdict(int)  # full 256-bit sample mod p
            sm_occurs = defaultdict(int)   # truncated to floor(log2(p)) bits
            mm_occurs = defaultdict(int)   # ceil(log2(p)) bits, then mod p
            mm1_occurs = defaultdict(int)  # one extra bit, then mod p
            mm2_occurs = defaultdict(int)  # twice the bit length, then mod p
            maxbits = ceil(log2(p))
            safebits = floor(log2(p))
            max_mask = (1 << maxbits) - 1
            max1_mask = (1 << (maxbits + 1)) - 1
            max2_mask = (1 << (maxbits * 2)) - 1
            safe_mask = (1 << safebits) - 1
            print(p, log2(p), ceil(log2(p)), bin(max_mask), safe_mask)
            hasher = sha256(urandom(32))
            p_cubed = p ** 3
            p_squared = p ** 2
            for _ in range(p_cubed):
                rand_sample_bytes = hasher.digest()
                rand_sample = int.from_bytes(rand_sample_bytes, 'little')
                safe_rand = rand_sample & safe_mask
                max_rand = rand_sample & max_mask
                max1_rand = rand_sample & max1_mask
                max2_rand = rand_sample & max2_mask
                b32_occurs[rand_sample % p] += 1
                sm_occurs[safe_rand % p] += 1
                mm_occurs[max_rand % p] += 1
                mm1_occurs[max1_rand % p] += 1
                mm2_occurs[max2_rand % p] += 1
                hasher.update(rand_sample_bytes)  # chain the hash for the next sample
            for n, v in [('b32', b32_occurs), ('sm', sm_occurs), ('mm', mm_occurs),
                         ('mm1', mm1_occurs), ('mm2', mm2_occurs)]:
                vals = v.values()
                # With p**3 samples over p residues, an unbiased strategy gives
                # mean and median counts of p**2, so the first two columns
                # should both be ~1.0.
                print(n, round(mean(vals) / p_squared, 3),
                      round(median(vals) / p_squared, 3),
                      round(p / pstdev(vals), 3),
                      round(pvariance(vals) / p_squared, 3))
            print()
        if lineno > 10:  # only sample the first few lines of the file
            break
```
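As a minimal worked example of the pigeonhole effect described above (my own illustration, not part of the original stats run): reducing a uniform 3-bit value modulo p = 5 folds eight inputs onto five residues, so three residues must receive an extra preimage.

```python
from collections import Counter

# 8 inputs onto 5 residues: residues 0..2 get two preimages, 3..4 get one.
p = 5
print(Counter(x % p for x in range(1 << 3)))
# Counter({0: 2, 1: 2, 2: 2, 3: 1, 4: 1})
```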
The approach used by …
As per: https://crypto.stackexchange.com/questions/21006/security-concern-about-reducing-hash-value-using-modulo-operation/21010#21010
I know @sanchopansa asked this question privately, and I remember responding at the time that I was unable to find bias in the outputs with a range of small primes (where a brute-force approach was possible).
Also referencing: Loopring/protocol3-circuits#7
The goal is to identify where hash output is computed modulo N rather than truncated to a number of bits where it fits safely within the field. I would have thought that for modulo reduction vs truncation, if the source number is at least 2 bits larger than the result, then you would get an even, uniformly random distribution, rather than the bias described in the reference below.

The following reference is useful: https://ericlippert.com/2013/12/16/how-much-bias-is-introduced-by-the-remainder-technique/ - however, that is with a `2^n` extension field rather than a prime field.
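To put a number on that intuition, here is a small helper (`mod_bias` is a hypothetical function of my own, not from the codebase) that computes the exact statistical distance from uniform when a uniform k-bit value is reduced mod p; the distance works out to r(p - r)/(p * 2^k) where r = 2^k mod p, so it shrinks roughly as p / 2^k:

```python
def mod_bias(k: int, p: int) -> float:
    """Statistical distance from uniform of (uniform k-bit value) mod p."""
    n = 1 << k
    q, r = divmod(n, p)
    # r residues receive q + 1 preimages, the remaining p - r receive q.
    return (r * ((q + 1) / n - 1 / p) + (p - r) * (1 / p - q / n)) / 2

print(mod_bias(3, 5))   # one bit wider than p: distance 0.15, clearly biased
print(mod_bias(16, 5))  # many extra bits: distance ~1.2e-05, negligible
```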
The `libff::Fp_model::random_element()` function sets the highest bits to zero until it's below the modulus, which is equivalent to masking the least significant `FieldT::capacity()` bits.
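A rough Python sketch of that behaviour (my own illustration, assuming `p` is the field modulus; this is not libff code):

```python
import secrets

def random_field_element(p: int) -> int:
    """Clear the highest set bit of a random sample until it is below p."""
    x = secrets.randbits(p.bit_length())
    while x >= p:
        x &= ~(1 << (x.bit_length() - 1))  # zero the most significant set bit
    # Compare: masking to the low p.bit_length() - 1 bits (the field's
    # "capacity") also always yields a value below p.
    return x
```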
Locations:
I can't find any other locations where something is interpreted as `mod p` rather than truncating to `floor(log2(p))` bits.
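For illustration, the safe truncation pattern might look like the following (`hash_to_field` is a hypothetical name, not the repository's function):

```python
from hashlib import sha256

def hash_to_field(data: bytes, p: int) -> int:
    """Truncate a digest to floor(log2(p)) bits so it always fits below p."""
    digest = int.from_bytes(sha256(data).digest(), 'little')
    return digest & ((1 << (p.bit_length() - 1)) - 1)
```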
There is one location where the output of a hash is truncated: