Truncate hash output rather than modulo p #142

Open
HarryR opened this issue Jul 26, 2019 · 2 comments
Labels
help wanted, question

Comments

HarryR (Owner) commented Jul 26, 2019

As per: https://crypto.stackexchange.com/questions/21006/security-concern-about-reducing-hash-value-using-modulo-operation/21010#21010

I know @sanchopansa asked this question privately, and I remember responding at the time that I was unable to find bias in the outputs with a range of small primes (where a brute-force approach was possible).

Also referencing: Loopring/protocol3-circuits#7

The goal is to identify where hash output is computed modulo N rather than truncated to a number of bits where it fits safely within the field. I would have thought that for modulo reduction versus truncation, if the source number is at least 2 bits larger than the result, then you would get an even, uniformly random distribution, rather than the bias described in the reference above.

The following reference is useful: https://ericlippert.com/2013/12/16/how-much-bias-is-introduced-by-the-remainder-technique/ - however, that analysis works with a 2^n range rather than a prime field.
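For a prime modulus the remainder bias can also be computed exactly rather than estimated by sampling. A minimal sketch (the bias_ratio helper and the example prime are illustrative, not from the codebase) showing the max/min residue probability ratio for a few input widths:

from math import ceil, log2

def bias_ratio(p, k):
    # For U uniform on [0, 2**k) reduced mod p, residues below 2**k % p
    # occur ceil(2**k / p) times and all others floor(2**k / p) times
    # (2**k % p is never zero for an odd prime p), so this is the exact
    # ratio between the most and least likely output residue.
    hi = -((1 << k) // -p)  # ceil(2**k / p)
    lo = (1 << k) // p      # floor(2**k / p)
    return hi / lo

p = 65537  # an arbitrary 17-bit prime, for illustration only
b = ceil(log2(p))
for k in (b, b + 1, b + 2, 2 * b):
    print(k, bias_ratio(p, k))  # 2.0, 1.333..., 1.142..., then ~1.0000038

The ratio shrinks with every extra bit, but only becomes negligible near 2*b bits, which matches the experimental findings in the comments below.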

The libff::Fp_model::random_element() function sets the highest bits to zero until it's below the modulus, which is equivalent to masking the least significant FieldT::capacity() bits.

Locations:

I can't find any other locations where something is interpreted as mod p rather than truncating to floor(log2(p)) bits.

There is one location where the output of a hash is truncated:

HarryR added the help wanted and question labels on Jul 27, 2019
HarryR (Owner, Author) commented Jul 27, 2019

Have done some research into this.

With 2*ceil(log2(p)) bits versus n*ceil(log2(p)) bits for n > 2, you get diminishing returns on how uniform the distribution is. Say the prime is b bits: if you obtain 2*b random bits and then reduce modulo p, the resulting distribution is roughly as close to uniform as the one you would get from 10*b or 100*b random bits.

If we look at this as a pigeonhole-principle problem with p**3 random samples falling into p buckets, each bucket count is Binomial(p**3, 1/p).

The ideal mean and median will be p**2, the ideal population standard deviation will be sqrt(p**3 * (1/p) * (1 - 1/p)) ≈ p, and the ideal population variance will be ≈ p**2.

When truncating to k=floor(log2(p)) bits, the population standard deviation and variance are close to ideal, but the mean and median are larger, because only the 2**k values that fit in k bits are ever produced: the mean becomes p**3 / 2**k rather than p**2.

Thus there are two patterns which are safe:

  1. Truncating random input to k=floor(log2(p)) bits, where the output only covers the 2**k values that fit in k bits.
  2. Taking k=2*ceil(log2(p)) random bits, where the result is the k-bit sample reduced modulo p.

And using just one or two extra bits introduces large biases.

This reinforces the notion that taking only ceil(log2(p)) bits, even with a few extra, is generally not secure when reduced modulo p to derive a uniformly random distribution, and as such either of the two techniques above should be used for better quality randomness; both are sketched below.
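As a minimal sketch of the two patterns (helper names and signatures are illustrative only, not from the codebase; they assume the input is a hash digest of sufficient length):

from hashlib import sha256

def truncate_to_field(p, digest):
    # Pattern 1: keep only k = floor(log2(p)) bits; the result is always
    # below p, but only 2**k of the p field elements are ever produced.
    k = p.bit_length() - 1  # exact floor(log2(p)), avoids float rounding
    return int.from_bytes(digest, 'little') & ((1 << k) - 1)

def reduce_wide(p, digest):
    # Pattern 2: take k = 2*ceil(log2(p)) bits, then reduce mod p; the
    # digest must be at least k bits wide (e.g. sha512 for a ~254-bit prime).
    k = 2 * p.bit_length()  # ceil(log2(p)) == bit_length() for a prime > 2
    assert len(digest) * 8 >= k, "digest too short for this modulus"
    return (int.from_bytes(digest, 'little') & ((1 << k) - 1)) % p

# e.g. with the same small prime as above:
digest = sha256(b'example').digest()
print(truncate_to_field(65537, digest), reduce_wide(65537, digest))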

I am unsure of the real-world implications of other approaches, so this change should definitely be adopted.

Example code used to derive stats about the distributions:

from math import log2, ceil, floor
from collections import defaultdict
from hashlib import sha256
from os import urandom
from statistics import mean, median, pstdev, pvariance


# primes1.txt: whitespace-separated list of small primes, several per line
with open('primes1.txt') as handle:
    for lineno, line in enumerate(handle):
        # split() without arguments discards empty strings and newlines
        for p in map(int, line.split()):
            if p == 2:
                continue

            # occurrence counts per residue, for each reduction strategy
            b32_occurs = defaultdict(int)  # full 256-bit sample, mod p
            sm_occurs = defaultdict(int)   # truncated to floor(log2(p)) bits
            mm_occurs = defaultdict(int)   # ceil(log2(p)) bits, mod p
            mm1_occurs = defaultdict(int)  # ceil(log2(p)) + 1 bits, mod p
            mm2_occurs = defaultdict(int)  # 2 * ceil(log2(p)) bits, mod p

            maxbits = ceil(log2(p))
            safebits = floor(log2(p))
            max_mask = (1 << maxbits) - 1
            max1_mask = (1 << (maxbits + 1)) - 1
            max2_mask = (1 << (maxbits * 2)) - 1
            safe_mask = (1 << safebits) - 1
            print(p, log2(p), maxbits, bin(max_mask), safe_mask)

            # hash-chain PRG seeded from the OS RNG
            hasher = sha256(urandom(32))
            p_cubed = p**3
            p_squared = p**2
            for _ in range(p_cubed):
                rand_sample_bytes = hasher.digest()
                rand_sample = int.from_bytes(rand_sample_bytes, 'little')
                safe_rand = rand_sample & safe_mask
                max_rand = rand_sample & max_mask
                max1_rand = rand_sample & max1_mask
                max2_rand = rand_sample & max2_mask

                b32_occurs[rand_sample % p] += 1
                sm_occurs[safe_rand % p] += 1
                mm_occurs[max_rand % p] += 1
                mm1_occurs[max1_rand % p] += 1
                mm2_occurs[max2_rand % p] += 1

                hasher.update(rand_sample_bytes)

            # print each statistic normalised by its ideal value:
            # mean/p**2, median/p**2, p/pstdev and pvariance/p**2 should
            # all be close to 1 for an unbiased distribution
            for n, v in [('b32', b32_occurs), ('sm', sm_occurs), ('mm', mm_occurs),
                         ('mm1', mm1_occurs), ('mm2', mm2_occurs)]:
                vals = v.values()
                print(n, round(mean(vals) / p_squared, 3),
                      round(median(vals) / p_squared, 3),
                      round(p / pstdev(vals), 3),
                      round(pvariance(vals) / p_squared, 3))
            print()
        if lineno > 10:
            break

HarryR (Owner, Author) commented Jul 27, 2019

The approach used by libff is to clear all bits higher than the MSB of the modulus, then reject samples until the result is within the field. This has a potentially unbounded number of iterations, but in practice there are few, although on average it may consume more entropy than the 2*log2(p) bits used by approach 2.
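For reference, a minimal Python rendering of that rejection-sampling loop (my own sketch based on the description above, not the actual libff code):

from os import urandom

def rejection_sample(p):
    # Mask off everything above the MSB of the modulus, then retry until
    # the sample lands inside the field. Each round succeeds with
    # probability p / 2**n > 1/2, so on average fewer than two rounds are
    # needed, though the worst case is unbounded.
    n = p.bit_length()
    mask = (1 << n) - 1
    nbytes = (n + 7) // 8
    while True:
        x = int.from_bytes(urandom(nbytes), 'little') & mask
        if x < p:
            return x

Unlike truncation, this covers all p field elements uniformly, at the cost of a variable number of iterations.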
