better documentation.
lemire committed Jun 27, 2023
1 parent 7b03a3a commit 46401e8
Showing 3 changed files with 73 additions and 20 deletions.
35 changes: 35 additions & 0 deletions README.md
@@ -85,6 +85,41 @@ You would expect `ActualfpRate` to be close to the desired false-positive rate `
The `EstimateFalsePositiveRate` function creates a temporary Bloom filter. It is
also relatively expensive and only meant for validation.

## Serialization

You can read and write the Bloom filters as follows:


```Go
f := New(1000, 4)
var buf bytes.Buffer
bytesWritten, err := f.WriteTo(&buf)
if err != nil {
t.Fatal(err.Error())
}
var g BloomFilter
bytesRead, err := g.ReadFrom(&buf)
if err != nil {
t.Fatal(err.Error())
}
if bytesRead != bytesWritten {
t.Errorf("read unexpected number of bytes %d != %d", bytesRead, bytesWritten)
}
```

*Performance tip*:
When reading and writing to a file or a network connection, you may get better performance by
wrapping your streams with `bufio` instances.

E.g.,
```Go
f, err := os.Create("myfile")
w := bufio.NewWriter(f)
```
```Go
f, err := os.Open("myfile")
r := bufio.NewReader(f)
```

## Contributing

40 changes: 28 additions and 12 deletions bloom.go
@@ -22,31 +22,31 @@ a non-cryptographic hashing function.
This implementation accepts keys for setting as testing as []byte. Thus, to
add a string item, "Love":
-uint n = 1000
-filter := bloom.New(20*n, 5) // load of 20, 5 keys
-filter.Add([]byte("Love"))
+uint n = 1000
+filter := bloom.New(20*n, 5) // load of 20, 5 keys
+filter.Add([]byte("Love"))
Similarly, to test if "Love" is in bloom:
-if filter.Test([]byte("Love"))
+if filter.Test([]byte("Love"))
For numeric data, I recommend that you look into the binary/encoding library. But,
for example, to add a uint32 to the filter:
-i := uint32(100)
-n1 := make([]byte,4)
-binary.BigEndian.PutUint32(n1,i)
-f.Add(n1)
+i := uint32(100)
+n1 := make([]byte,4)
+binary.BigEndian.PutUint32(n1,i)
+f.Add(n1)
Finally, there is a method to estimate the false positive rate of a
Bloom filter with _m_ bits and _k_ hashing functions for a set of size _n_:
-if bloom.EstimateFalsePositiveRate(20*n, 5, n) > 0.001 ...
+if bloom.EstimateFalsePositiveRate(20*n, 5, n) > 0.001 ...
You can use it to validate the computed m, k parameters:
-m, k := bloom.EstimateParameters(n, fp)
-ActualfpRate := bloom.EstimateFalsePositiveRate(m, k, n)
+m, k := bloom.EstimateParameters(n, fp)
+ActualfpRate := bloom.EstimateFalsePositiveRate(m, k, n)
or
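The snippets in this package comment are fragments rather than compilable Go. For example, `uint n = 1000` is C-style; Go spells it `n := uint(1000)`. The step that most often needs care is turning non-string keys into `[]byte`. Below is a stdlib-only sketch of that step with hypothetical helper names `uint32Key` and `uint64Key`. The `bloom` calls stay in comments so the block runs without the library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uint32Key (hypothetical) encodes i as 4 big-endian bytes, the same
// approach the package comment takes with binary.BigEndian.PutUint32.
func uint32Key(i uint32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, i)
	return b
}

// uint64Key (hypothetical) is the 8-byte analogue. uint32Key(100) and
// uint64Key(100) are different keys, so Add and Test must share one encoding.
func uint64Key(i uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, i)
	return b
}

func main() {
	n := uint(1000) // Go spelling of "uint n = 1000"
	m := 20 * n     // 20 bits per expected item; the 5 below is k, the number of hash functions
	// filter := bloom.New(m, 5)
	// filter.Add([]byte("Love"))
	// filter.Add(uint32Key(100))
	fmt.Println(m)              // 20000
	fmt.Println(uint32Key(100)) // [0 0 0 100]
	fmt.Println(uint64Key(100)) // [0 0 0 0 0 0 0 100]
	fmt.Println([]byte("Love")) // [76 111 118 101]
}
```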
@@ -281,7 +281,9 @@ func (f *BloomFilter) ClearAll() *BloomFilter {

// EstimateFalsePositiveRate returns, for a BloomFilter of m bits
// and k hash functions, an estimation of the false positive rate when
-// storing n entries. This is an empirical, relatively slow
+//
+// storing n entries. This is an empirical, relatively slow
+//
// test using integers as keys.
// This function is useful to validate the implementation.
func EstimateFalsePositiveRate(m, k, n uint) (fpRate float64) {
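`EstimateFalsePositiveRate` measures the rate empirically by filling a temporary filter, which is why the README calls it relatively expensive. A cheap cross-check is the standard closed-form approximation `p ≈ (1 - e^(-k·n/m))^k`. The usual sizing rules for a target rate `p` are `m = ceil(-n·ln p / (ln 2)^2)` and `k = ceil((m/n)·ln 2)`. The sketch below evaluates these textbook formulas. The function names are hypothetical, and this diff does not show whether the library's `EstimateParameters` uses exactly the same rounding.

```go
package main

import (
	"fmt"
	"math"
)

// theoreticalFPRate (hypothetical) is the textbook approximation
// (1 - e^(-k*n/m))^k for m bits, k hash functions and n inserted items.
func theoreticalFPRate(m, k, n uint) float64 {
	return math.Pow(1-math.Exp(-float64(k)*float64(n)/float64(m)), float64(k))
}

// optimalParameters (hypothetical) applies the standard sizing rules:
// m = ceil(-n ln p / (ln 2)^2) and k = ceil(ln 2 * m / n).
func optimalParameters(n uint, p float64) (m, k uint) {
	m = uint(math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)))
	k = uint(math.Ceil(math.Ln2 * float64(m) / float64(n)))
	return m, k
}

func main() {
	n := uint(1000)
	// Parameters from the package comment: m = 20*n bits, k = 5.
	fmt.Printf("%.6f\n", theoreticalFPRate(20*n, 5, n)) // 0.000530
	m, k := optimalParameters(n, 0.01)
	fmt.Println(m, k) // 9586 7
}
```

For the package comment's `20*n, 5` example, the formula predicts roughly 0.00053, under the 0.001 threshold in its check. An empirical estimate far from this value would be worth investigating.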
@@ -343,6 +345,13 @@ func (f *BloomFilter) UnmarshalJSON(data []byte) error {

// WriteTo writes a binary representation of the BloomFilter to an i/o stream.
// It returns the number of bytes written.
//
// Performance: if this function is used to write to a disk or network
// connection, it might be beneficial to wrap the stream in a bufio.Writer.
// E.g.,
//
// f, err := os.Create("myfile")
// w := bufio.NewWriter(f)
func (f *BloomFilter) WriteTo(stream io.Writer) (int64, error) {
err := binary.Write(stream, binary.BigEndian, uint64(f.m))
if err != nil {
@@ -359,6 +368,13 @@ func (f *BloomFilter) WriteTo(stream io.Writer) (int64, error) {
// ReadFrom reads a binary representation of the BloomFilter (such as might
// have been written by WriteTo()) from an i/o stream. It returns the number
// of bytes read.
//
// Performance: if this function is used to read from a disk or network
// connection, it might be beneficial to wrap the stream in a bufio.Reader.
// E.g.,
//
// f, err := os.Open("myfile")
// r := bufio.NewReader(f)
func (f *BloomFilter) ReadFrom(stream io.Reader) (int64, error) {
var m, k uint64
err := binary.Read(stream, binary.BigEndian, &m)
18 changes: 10 additions & 8 deletions murmur.go
@@ -55,7 +55,7 @@ type digest128 struct {
h2 uint64 // Unfinalized running hash part 2.
}

-//bmix will hash blocks (16 bytes)
+// bmix will hash blocks (16 bytes)
func (d *digest128) bmix(p []byte) {
nblocks := len(p) / block_size
for i := 0; i < nblocks; i++ {
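The `bmix` loop walks the input in `block_size` chunks of 16 bytes and hands each chunk to `bmix_words` as two 64-bit words. In MurmurHash3 x64_128 those words are conventionally read little-endian. Leftover bytes (fewer than 16) are mixed separately as a tail. Below is a minimal sketch of just the splitting step. `splitBlocks` is a hypothetical name, and the real `bmix` may load the words differently.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const blockSize = 16 // bytes per MurmurHash3 x64_128 block

// splitBlocks (hypothetical) returns each full 16-byte block of p as a pair
// of little-endian 64-bit words (k1, k2), plus the leftover tail bytes.
func splitBlocks(p []byte) (words [][2]uint64, tail []byte) {
	nblocks := len(p) / blockSize
	for i := 0; i < nblocks; i++ {
		b := p[i*blockSize : (i+1)*blockSize]
		words = append(words, [2]uint64{
			binary.LittleEndian.Uint64(b[:8]),
			binary.LittleEndian.Uint64(b[8:]),
		})
	}
	return words, p[nblocks*blockSize:]
}

func main() {
	p := make([]byte, 40) // two full blocks plus an 8-byte tail
	for i := range p {
		p[i] = byte(i)
	}
	words, tail := splitBlocks(p)
	fmt.Println(len(words), len(tail))               // 2 8
	fmt.Println(words[0][0] == 0x0706050403020100) // true
}
```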
@@ -65,7 +65,7 @@ func (d *digest128) bmix(p []byte) {
}
}

-//bmix_words will hash two 64-bit words (16 bytes)
+// bmix_words will hash two 64-bit words (16 bytes)
func (d *digest128) bmix_words(k1, k2 uint64) {
h1, h2 := d.h1, d.h2

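For reference, one body round of MurmurHash3 x64_128, as in Austin Appleby's public-domain reference implementation, folds the two block words into the running state `h1`, `h2` with multiplications, rotations, and additions. The sketch below follows that reference rather than `murmur.go`, so its structure may differ from `bmix_words`. `mixWords` is a hypothetical name. With an all-zero state and zero words, only the additive constants survive, so the result is easy to check by hand.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Multiplication constants from the MurmurHash3 x64_128 reference.
const (
	c1 = 0x87c37b91114253d5
	c2 = 0x4cf5ad432745937f
)

// mixWords (hypothetical) applies one body round to the state (h1, h2)
// using the block words k1 and k2, and returns the updated state.
func mixWords(h1, h2, k1, k2 uint64) (uint64, uint64) {
	k1 *= c1
	k1 = bits.RotateLeft64(k1, 31)
	k1 *= c2
	h1 ^= k1
	h1 = bits.RotateLeft64(h1, 27)
	h1 += h2
	h1 = h1*5 + 0x52dce729

	k2 *= c2
	k2 = bits.RotateLeft64(k2, 33)
	k2 *= c1
	h2 ^= k2
	h2 = bits.RotateLeft64(h2, 31)
	h2 += h1
	h2 = h2*5 + 0x38495ab5
	return h1, h2
}

func main() {
	h1, h2 := mixWords(0, 0, 0, 0)
	fmt.Printf("%#x %#x\n", h1, h2) // 0x52dce729 0x1d699de82
}
```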
@@ -246,12 +246,14 @@ func fmix64(k uint64) uint64 {
// It is designed to never allocate memory on the heap. So it
// works without any byte buffer whatsoever.
// It is designed to be strictly equivalent to
-// a1 := []byte{1}
-// hasher := murmur3.New128()
-// hasher.Write(data) // #nosec
-// v1, v2 := hasher.Sum128()
-// hasher.Write(a1) // #nosec
-// v3, v4 := hasher.Sum128()
+//
+// a1 := []byte{1}
+// hasher := murmur3.New128()
+// hasher.Write(data) // #nosec
+// v1, v2 := hasher.Sum128()
+// hasher.Write(a1) // #nosec
+// v3, v4 := hasher.Sum128()
+//
// See TestHashRandom.
func (d *digest128) sum256(data []byte) (hash1, hash2, hash3, hash4 uint64) {
// We always start from zero.