This repository hosts a Rust library implementing xor filters and their derivates:
- Binary Fuse filters (most recommended)
- Xor filters
- Fuse filters (deprecated, use Binary Fuse filters instead)
Xor filters are data structures for fast approximation of set membership using little memory. Probabilistic filters like xor filters are useful when it's okay to have false positives sometimes, but it's important to be space and time efficient. In other words, they trade off accuracy for efficiency as compared to general-purpose hashsets. Filters like xor filter are often used in conjunction with larger hash-based data structures, with the filter doing a "first pass" of the work to avoid using a more expensive resource unnecessarily. For example, filters like xor filters can be used to reduce disk writes in a cache or identify malicious URLs in a browser.
Xor filters are faster and smaller than Bloom and Cuckoo filters. Xor filters incur a relative time penalty in construction, but are very fast in lookups; the expectation is that construction of a filter is amortized after many queries. Daniel Lemire's go implementation provides a useful summary of xor filters' benefits and listing of other xor filter libraries.
This library is no_std
and
needs_allocator
.
xorf
also provides a HashProxy
for using Xor filters
with arbitrary key types.
Add a dependency to xorf
in Cargo.toml
:
[dependencies]
xorf = "M.m.p" # use a desired version
Available versions are listed on crates and the in repository's releases.
Please see the library documentation for usage information.
To use a custom global allocator,
you must be using a nightly release of rustc and have enabled the nightly
feature for xorf
.
[dependencies]
xorf = { version = "M.m.p", features = ["nightly"] }
This will tag the crate as needs_allocator
, which you will then have to
provide. At this time, a custom allocator is used globally.
Serialization and deserialization with serde cab be enabled
with the serde
feature.
[dependencies]
xorf = { version = "M.m.p", features = ["serde"] }
By default, xorf
uses the uniform-random
feature, which uses random values for unused
fingerprint entries rather than initializing them to zero. This provides a slightly lower false-positive
rate, but incurs a higher initialization cost. The cost of lookups is not affected.
The distribution of zero-valued fingerprints for a 1,000,000-element
BinaryFuse8
filter, both without and with the uniform-random
feature, is
visualized below. Notice that the scales of the y-axes differ.
BinaryFuse8 without uniform-random |
BinaryFuse8 with uniform-random |
---|---|
To disable the uniform-random
feature, specify that default features should be disabled:
[dependencies]
xorf = { version = "M.m.p", default-features = false }
By default, xorf
uses the binary-fuse
feature, which adds support for and
exposes Binary Fuse filter implementations. This feature pulls in a dependency
of libm
, but has no runtime cost. This feature is highly recommended, as
Binary Fuse filters are the most powerful in the Xor filter family.
Development of xorf
targets the master branch of this repository.
Changes can be tested by running the check
script:
scripts/check lf # validates lint and format
scripts/check test # tests source code
Contributions are warmly welcomed. No contribution is too small, and all are appreciated.