Releases: Cyan4973/xxHash
xxHash v0.8.2
xxHash v0.8.2 is an incremental update featuring multiple small improvements and fixes spread out over ~300 commits.
Faster performance
Several updates by @easyaspi314 and @hzhuang1 impact the arm platform, most notably the neon code path. On the M1 Pro, this translates into +20% speed for xxh3 and xxh128 (from 30.0 GB/s to 36 GB/s).
Some of the changes are generic, so other platforms can be affected too, though typically to a lesser extent (~5%).
On wasm, the speed of xxh3 is improved by a large factor, x2 to x3 (depending on underlying hardware), through the use of simd128 (@easyaspi314). This is especially efficient under the v8 js engine, notably used by chrome and node.js.
Finally, @hzhuang1 added support for arm's SVE vector extension. This is useful for server-side aarch64 cpus with hardware support for wide vectors, such as Fujitsu's A64FX.
Fixes and improvements
Notable fixes in this update include the resolution of issues with the XXH3 S390x vector implementation, PowerPC vector compilation with the IBM XL compiler, and -Og compilation.
Furthermore, the command line interface (CLI) was refined with features such as support for comment lines in check files and commands such as --binary and --ignore-missing (@t-mat). Additionally, issues with filenames containing the /LF character were resolved.
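As an illustration of what comment support in check files means for a consumer of such files, here is a minimal sketch in Python. It is not the xxhsum implementation: the helper name iter_check_entries is hypothetical, the entry layout (hex digest, whitespace, filename) is assumed from the usual GNU-style checksum convention, and the digests in the usage note are placeholders.

```python
def iter_check_entries(lines):
    """Yield (digest, filename) pairs from check-file lines.

    Assumptions (not taken from the xxhsum source): an entry looks like
    '<hex digest><whitespace><filename>', and a line starting with '#'
    is a comment. Blank and malformed lines are skipped.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # comment or empty line: nothing to verify
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # not a '<digest> <filename>' entry
        digest, filename = parts
        yield digest, filename
```

For example, feeding it the lines "# generated by xxhsum", "ef46db3751d8e999  a.txt" and "44bc2cf5ad770999  b.txt" yields only the two file entries.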
The build process was also refined, with improvements such as fixing pkgconfig generation with cmake (@ilya-fedin), icc compilation, cmake install directories, and new build options to reduce binary size (@easyaspi314). Dedicated install targets were introduced (@ffontaine), and support for DISPATCH mode in cmake was added (@hzhuang1).
In terms of portability, the update includes the SVE vector implementation of XXH3, compatibility with freestanding environments using XXH_NO_STDLIB, and the ability to build on Haiku. The code has also been validated on m68k and risc-v.
Documentation
XXH3 finally has a written specification, thanks to @adrien1018!
Source code can also be digested by doxygen to generate code documentation automatically. An instance is now available on the homepage.
Erratum
There is a bug in this version when invoking the function XXH3_128bits_withSecretandSeed(), specifically when the parameter seed == 0, and input length < XXH3_MIDSIZE_MAX (< 240 bytes), and the secret is different from the one created with XXH3_generateSecret_fromSeed(), and the user is invoking the Streaming API. The hash values produced in this case are incorrect: as stated in the documentation, they should be == XXH3_128bits_withSeed(). This is fixed in later versions and in the dev branch, thanks to @hltj (a9b2f18).
Changelog
- fix : XXH3 S390x vector implementation (@hzhuang1)
- fix : PowerPC vector compilation with IBM XL compiler (@MaxiBoether)
- perf : improved WASM speed by x2/x3 using SIMD128 (@easyaspi314)
- perf : improved speed (+20%) for XXH3 on ARM NEON (@easyaspi314)
- cli : Fix filename contain /LF character (@t-mat)
- cli : Support # comment lines in --check files (@t-mat)
- cli : Support commands --binary and --ignore-missing (@t-mat)
- build: fix -Og compilation (@easyaspi314, @t-mat)
- build: fix pkgconfig generation with cmake (@ilya-fedin)
- build: fix icc compilation
- build: fix cmake install directories
- build: new build options XXH_NO_XXH3, XXH_SIZE_OPT and XXH_NO_STREAM to reduce binary size (@easyaspi314)
- build: dedicated install targets (@ffontaine)
- build: support DISPATCH mode in cmake (@hzhuang1)
- portability: fix x86dispatch when building with Visual + clang-cl (@t-mat)
- portability: SVE vector implementation of XXH3 (@hzhuang1)
- portability: compatibility with freestanding environments, using XXH_NO_STDLIB
- portability: can build on Haiku (@Begasus)
- portability: validated on m68k and risc-v
- doc : XXH3 specification (@adrien1018)
- doc : improved doxygen documentation (@easyaspi314, @t-mat)
- misc : dedicated sanity test binary (@t-mat)
Full change list (github generated)
- Fix an assert comparison the same values (flagged by PVS Studio in 0.8.1) by @kcgen in #628
- Add GitHub Actions badge for release branch by @t-mat in #633
- Add windows-2022 to ci.yml by @t-mat in #634
- Add macOS matrix to ci.yml by @t-mat in #635
- Fix compilation on RHEL 7 ppc64le (gcc 4.8) by @ellert in #631
- Add clang-cl for MSVC 2019 to ci.yml by @t-mat in #637
- [NEON] Split XXH3 into 6 NEON lanes and 2 scalar lanes on aarch64 by @easyaspi314 in #632
- Fix some ARM/clang-cl feature detection issues by @easyaspi314 in #623
- Add QEMU/gcc matrix to ci.yml by @t-mat in #640
- fix #625 by @Cyan4973 in #638
- fix #627 by @Cyan4973 in #639
- added m68k emulation tests to GA by @Cyan4973 in #643
- Document some nerdy ARM stuff, move scalarRound down. by @easyaspi314 in #642
- fix minor static analyzer warning by @Cyan4973 in #644
- fix man page installation by @Cyan4973 in #648
- fix cmake --install by @Cyan4973 in #649
- Use __attribute__((aligned)) instead of packed by @Hello71 in #650
- [ARM/AArch64] Fix multiple GCC codegen problems by @easyaspi314 in #651
- removed XXH3 declarations when XXH_NO_XXH3 is defined by @Cyan4973 in #653
- new build macro XXH_NO_STDLIB by @Cyan4973 in #654
- improved nostdlib test by @Cyan4973 in #656
- added __attribute__((const)) by @Cyan4973 in #657
- added __attribute__((malloc)) by @Cyan4973 in #658
- added __attribute__((pure)) by @Cyan4973 in #659
- Documentation update by @easyaspi314 in #661
- Makefile: add dedicated install targets by @ffontaine in #665
- XXH_HAS_C_ATTRIBUTE(x)?! by @easyaspi314 in #662
- do no longer depend on <assert.h> for XXH_STATIC_ASSERT by @Cyan4973 in #670
- Properly fix altivec namespace collisions by @easyaspi314 in #672
- Introduce XXH_SIZE_OPT and XXH_NO_STREAM by @easyaspi314 in #667
- Remove duplicated definition of XXH3_128bits by @mterron in #676
- Removed windows-2016 from ci.yml by @t-mat in #690
- tipi.build instructions by @pysco68 in #688
- Fix issue #695 by @t-mat in #698
- Build fix for Haiku by @Begasus in #696
- Use inline assembler for Power/IBM XL Compiler by @MaxiBoether in #708
- test filename-escape by @Cyan4973 in #710
- avoid add_compile_definitions for cmake < v3.12 by @Cyan4973 in #711
- just more cmake v2.8.12 tests by @Cyan4973 in #721
- CPack Added in #719
- Remove stream loads and slightly improve avx512 seed generation by @goldsteinn in #726
- Fix: brace expansion by @t-mat in #729
- Fix issue #724 by @t-mat in #730
- Remove macOS-10.15 from ci.yml by @t-mat in #736
- blind fix for fallthrough on icc by @Cyan4973 in #718
- Optimize XXH3_accumulate_512_neon by @dougallj in #734
- Fix typos found by codespell by @DimitriPapadopoulos in #739
- ci: fix tipi build error on github CI workflow by @hzhuang1 in #749
- Update GitHub Actions by @DimitriPapadopoulos in #742
- xxhash: support SVE by intrinsic code by @hzhuang1 in #752
- fix issues reported by cppcheck by @hzhuang1 in #746
- CI: fix missing space by @hzhuang1 in #758
- Fixing tipi-build / Build as dependency CI step by @pysco68 in #760
- Customize full accumulating loop for SVE by @hzhuang1 in #756
- added macos-12 test to GH CI by @Cyan4973 in #765
- Small improvement to x86 vectorized hashes and medium-sizes hash. by @goldsteinn in #754
- dispatch: Use __attribute__((constructor)) on XXH_setDispatch by @goldsteinn in #773
- Fix typo found by codespell by @di...
v0.8.1
xxHash v0.8.1 is a general clean up of the code base, following the stabilization of xxh3 and xxh128 in v0.8.0.
There are a few welcome evolutions and improvements, but for the most part, this release consists of fixes for multiple corner cases and scenarios, which should improve the usability of libxxhash and xxhsum across a wide range of platforms.
Stable API entry points have not changed: all entry points labelled "stable" will continue to work as intended in this release and future ones.
Improved performance
While the "big picture" is unchanged, there are a few notable improvements.
XXH3 / XXH128 feature a large speed improvement in streaming mode, by as much as +40%. The gain is particularly noticeable with gcc and MSVC (clang was already in good shape), and makes streaming speed essentially on par with single-shot mode when ingesting large quantities of data.
XXH64 and even XXH32 feature improved latency performance for small inputs of random sizes. Perhaps as importantly, their binary size is smaller.
New capabilities
There is a new experimental XXH3 variant, named _withSecretandSeed(). In a nutshell, it combines seed for small inputs, with secret for large inputs.
The main driver for this variant is a wish to skip the delay of the secret's transparent generation when using the _withSeed() variant with large inputs, which causes a measurable performance drop for "not so large" sizes (< 1 KB) (note: this delay is negligible for "large" inputs, such as > 256 KB). Coupled with the new function XXH3_generateSecret_fromSeed(), which generates the same secret as the one generated internally when using the _withSeed() variant, it results in exactly the same return values, while skipping the secret generation stage, thus improving speed.
The experimental XXH3_generateSecret() has been extended to allow generation of a secret of any size (though respecting the specification's minimum size). It's generally recommended to use this generator to ensure a source of "high entropy" for the secret.
On the CLI front, a much-requested xxhsum feature was the ability to generate XXH3 checksum values. This is achieved in v0.8.1, using the --tag format, which ensures that XXH3 results cannot be confused with (default) XXH64 ones, even though they feature the same 64-bit width.
Detailed changelist
- perf : much improved performance for XXH3 streaming variants, notably on gcc and msvc
- perf : improved XXH64 speed and latency on small inputs
- perf : small XXH32 speed and latency improvement on small inputs of random size
- perf : minor stack usage improvement for XXH32 and XXH64
- api : new experimental variants XXH3_*_withSecretandSeed()
- api : updated XXH3_generateSecret(), can now generate secret of any size (>= XXH3_SECRET_SIZE_MIN)
- cli : xxhsum can now generate and check XXH3 checksums, using command -H3
- build: can build xxhash without XXH3, with new build macro XXH_NO_XXH3
- build: fix xxh_x86dispatch build with MSVC, by @apankrat
- build: XXH_INLINE_ALL can always be used safely, even after XXH_NAMESPACE or a previous XXH_INLINE_ALL
- build: improved PPC64LE vector support, by @mpe
- install: fix pkgconfig, by @ellert
- install: compatibility with Haiku, by @Begasus
- doc : code comments made compatible with doxygen, by @easyaspi314
- misc : XXH_ACCEPT_NULL_INPUT_POINTER is no longer necessary, all functions can accept NULL input pointers, as long as size == 0
- misc : complete refactor of CI tests on Github Actions, offering much larger coverage, by @t-mat
- misc : xxhsum code base split into multiple specialized units, within directory cli/, by @easyaspi314
xxHash v0.8.0 - Stable XXH3
Stable XXH3
After more than a year in the making, XXH3 has finally reached stable status, for both its 64-bit and 128-bit variants.
While the code itself was in good enough shape for production use, the generated values could still change between versions. This limited XXH3 to local sessions only.
From now on, output values produced by XXH3 for a given input and parameter set will remain identical across systems and across future versions. It makes it possible to store these values for later comparison, or to exchange them across network connections.
BSD-style checksums
Official stabilization being the main goal of this release, there are only minimal additional changes.
A notable one though is the ability for xxhsum CLI to produce and check BSD-style checksum lines, using command --tag.
One advantage of --tag format is that it explicitly specifies the algorithm and format used to represent the checksum. For example, it explicitly mentions if a checksum value follows the canonical format (XXH32) or the alternative little-endian format (XXH32_LE).
Generating BSD-style checksum lines was actually already possible, but as the CLI was unable to --check them, it remained a hidden option.
This situation changes with v0.8.0, thanks to a patch by @WayneD which makes it possible to --check BSD-style checksum lines.
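To make the idea of a self-describing checksum line concrete, here is a small Python sketch that splits one into its parts. It assumes the common BSD layout "ALGO (filename) = hexdigest"; the function name parse_bsd_line is hypothetical, and this is an illustration only, not the parser used by xxhsum.

```python
import re

# Assumed BSD-style layout: ALGO (filename) = hexdigest
# The filename group is greedy so that names containing parentheses still parse.
_BSD_LINE = re.compile(
    r"^(?P<algo>[A-Za-z0-9_]+) \((?P<name>.*)\) = (?P<digest>[0-9a-fA-F]+)$"
)

def parse_bsd_line(line):
    """Return (algo, filename, digest) for a BSD-style line, or None otherwise."""
    match = _BSD_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None  # e.g. a GNU-style '<digest>  <filename>' line
    return match.group("algo"), match.group("name"), match.group("digest").lower()
```

Because the algorithm tag travels with the digest, a checker can tell an XXH32 line from an XXH32_LE one without any extra command-line hint.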
Detailed list
- api : stabilize XXH3
- cli : xxhsum can produce BSD-style lines, with command --tag
- cli : xxhsum can parse and check BSD-style lines, using command --check, by @WayneD
- cli : xxhsum - accepts console input, requested by @jaki
- cli : xxhsum accepts -- separator, by @jaki
- cli : fix : print correct default algo for symlinked helpers, by @martinetd
- install: improved pkgconfig script, allowing custom install locations, requested by @ellert
xxHash v0.7.4 - Finalizing XXH3 and XXH128
xxHash v0.7.4 is the last evolution of xxh3 and xxh128, primarily designed to finalize the algorithm.
It is considered a release candidate for v0.8.0, which means that if all goes right, this version will be rebranded v0.8.0, almost "as is", within the next few weeks, after receiving sufficient feedback.
v0.8.0 is the official version after which XXH3 and XXH128 are considered "stabilized", meaning that return values will never change given the same input and seed, making the hash suitable for long-term storage and transmission.
Beyond these "final touches", the new version also brings a few notable improvements.
Automatic vector detection
x86/x64 systems can enjoy a new unit, xxh_x86dispatch, which can detect at runtime the best vector instruction set present on host system (none, sse2, avx2 or avx512), thanks to a cpu feature detector designed by @easyaspi314. It then automatically runs the appropriate vector code.
This makes it safer to deploy a single binary with advanced vector instruction sets, such as AVX2, since there is no hard requirement for all target systems to actually support it : the binary can automatically switch to SSE2 instead.
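The selection logic described above can be sketched conceptually as follows. This is a toy model in Python, not the xxh_x86dispatch code: the feature names are taken from the list in the text, while the function name select_code_path, the "scalar" label for the "none" case, and the idea of passing the detected features as a set are assumptions made for illustration.

```python
# Candidate vector code paths, ordered from widest to narrowest.
PREFERENCE = ("avx512", "avx2", "sse2")

def select_code_path(cpu_features):
    """Pick the widest vector code path that the host CPU reports.

    cpu_features is a set of lowercase feature names (hypothetical input;
    a real dispatcher would query the CPU at runtime).
    """
    for name in PREFERENCE:
        if name in cpu_features:
            return name
    return "scalar"  # no supported vector extension detected
```

A binary built this way carries every code path, and a host that only reports sse2 simply ends up on the SSE2 path instead of failing on an unsupported instruction.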
As a proof of concept, the windows builds provided alongside this release are compiled with this new capability.
AVX512 support
A new vector instruction set is supported, thanks to @gzm55 : AVX512. It can be applied on XXH3 and XXH128, using some of the most recent Intel cpus, such as IceLake on laptop. It typically offers +50% more performance compared to AVX2.
Secret Generator
Advanced users can be interested in the highly customizable variant _withSecret(), which makes it possible to run XXH3 and XXH128 algorithms using one's own secret.
However, the quality of the hash depends on the high entropy (randomness) of the secret. And sometimes, it can be difficult to ensure that the candidate secret is "random enough".
In order to produce a secret of high quality, a new function XXH3_generateSecret() is proposed in the advanced API section. It will convert any blob of bytes, named customSeed, into a high quality secret which respects all conditions expected by XXH3 and XXH128. This is true even if customSeed itself is of poor quality, such as a bunch of \0 bytes or some short or repeated common sequence.
No API modification
The existing API present in 0.7.3 has remained unchanged in 0.7.4. Any programs linking with 0.7.3 should continue to work as-is.
Note however that xxh3/xxh128 return values are not comparable across these versions.
0.7.x are labelled development versions, and should only be used for ephemeral data (hashes produced and consumed in the same local session).
(note : this limitation does not extend to XXH32 and XXH64, which are considered fully stable and specified).
Changelist
There are multiple smaller bug fixes and minor improvements that have been brought to this repository by great contributors. Here is a summarized list:
- perf: automatic vector detection and selection at runtime (xxh_x86dispatch.h), initiated by @easyaspi314
- perf: added AVX512 support, by @gzm55
- api : new: secret generator XXH3_generateSecret(), suggested by @koraa
- api : fix: XXH3_state_t is movable, identified by @koraa
- api : fix: state is correctly aligned in AVX mode (unlike malloc()), by @easyaspi314
- api : fix: streaming generated wrong values in some combination of random ingestion lengths, reported by @WayneD
- cli : fix unicode print on Windows, by @easyaspi314
- cli : can -c check file generated by sfv
- build: make DISPATCH=1 generates xxhsum and libxxhash with runtime vector detection (x86/x64 only)
- install: cygwin installation support
- doc : Cryptol specification of XXH32 and XXH64, by @weaversa
xxHash v0.7.3
xxHash v0.7.3 is a major evolution for xxh3 and xxh128, with a focus on speed and dispersion performance.
Speed improvements
v0.7.3 pays a lot of attention to small data, by delivering generally faster latency metrics (about +10%).
Inlining is now a first class citizen, as it is generally key to best performance on small inputs.
Among the visible changes:
- XXH_INLINE_ALL can always be set before including xxhash.h, even if xxhash.h was previously included (for example transitively, as part of a prior *.h header file).
- The algorithm implementation has been transferred into xxhash.h. It's no longer necessary to keep a copy of xxhash.c in the /include directory for inlining to work correctly.
  - Note: xxhash.c still exists, as it's useful to instantiate xxhash functions as public symbols accessible from a library or a *.o object file. It also remains compatible with existing projects.
Large data has also received a boost, which can go up to +20% for very large samples (> many MB).
Let's underline the remarkable optimization work of @easyaspi314, who hand optimized several hot loops and instructions, and even added a new Z-vector target for s390x hardware.
No API modification
The API has remained completely stable between 0.7.2 and 0.7.3. Any programs linking with 0.7.2 should work as-is.
Note that xxh3/xxh128 results are not comparable across these versions.
New test tool
Testing a 64-bit hash algorithm for its collision rate has remained elusive for most. The sheer volume of data required to assess quality at this scale is too large for traditional test tools like SMHasher. As a general guide, it requires on the order of 4 to 5 billion hashes to reach a 50% probability of getting a single collision. Accurate collision ratio evaluation requires many more hashes to actually measure something meaningful.
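The order of magnitude quoted above can be checked with the standard birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / 2N), where n is the number of hashes and N = 2^64 the number of possible values. The Python sketch below assumes an ideal, uniformly distributed 64-bit hash; the function names are illustrative and not part of the tests/collisions tool.

```python
import math

def collision_probability(n_hashes, bits=64):
    """Approximate P(at least one collision) among n_hashes ideal hashes."""
    space = 2.0 ** bits
    # expm1 keeps precision when the probability is very small
    return -math.expm1(-n_hashes * (n_hashes - 1) / (2.0 * space))

def hashes_for_probability(p, bits=64):
    """Approximate number of ideal hashes needed to reach collision probability p."""
    return math.sqrt(2.0 * (2.0 ** bits) * math.log(1.0 / (1.0 - p)))
```

Under this approximation, 2^32 hashes (about 4.3 billion) give roughly a 39% chance of at least one collision, and the 50% mark is reached at about 5.1 billion hashes. Either way, a single expected collision is far too few to estimate a collision ratio, which is why many more hashes are needed.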
A new open-source tool in tests/collisions offers this capability. It requires a lot of memory to run, with a minimum of 32 GB to measure anything significant. But provided that one has a system with enough capacity, it can accurately measure the collision ratio of any 64-bit hash algorithm.
Several algorithms were measured thanks to this tool, the result of which is currently consolidated on this wiki page. More can be added in the future.
This new development round also introduced several improvements to the SMHasher test suite, uncovering new requirements for new scenarios. This proved beneficial to improve the general dispersion qualities of xxh3 and xxh128.
Changelist
Here is a summarized list of changes for this version:
- perf: improved speed for large inputs (~+20%)
- perf: improved latency for small inputs (~10%)
- perf: s390x Vectorial code, by @easyaspi314
- cli: Improved support for Unicode filenames on Windows, thanks to @easyaspi314 and @t-mat
- api: xxhash.h can now be included in any order, multiple times, with and without XXH_STATIC_LINKING_ONLY or XXH_INLINE_ALL
- build: xxHash's implementation has been transferred into xxhash.h. There is no more need to have xxhash.c in the /include directory for XXH_INLINE_ALL to work
- install: created pkg-config file, by @bket
- install: VCpkg installation instructions, by @LilyWangL
- doc: Highly improved code documentation, by @easyaspi314
- misc: New test tool in /tests/collisions : brute force collision tester for 64-bit hashes
xxHash v0.7.2
This is a maintenance release, focused on the newer 128-bit variant.
Note that XXH3 is still labelled experimental : return values from this version are not comparable with other versions.
- Fixed collision ratio of XXH128 for some specific input lengths, reported by @svpv
- Improved VSX and NEON variants, by @easyaspi314
- Improved performance of scalar code path (XXH_VECTOR=0), by @easyaspi314
- xxhsum : can generate 128-bit hash with command -H2 (note : for experimental purposes only ! XXH128 is not yet frozen)
- xxhsum : option -q removes status notifications
xxHash v0.7.1
The main feature of this release is an update of XXH3, building upon extensive user feedback gathered during this test period. The main points are :
- Secret first : the algorithm computation can be altered by providing a "secret", which is any blob of bytes, of size >= XXH3_SECRET_SIZE_MIN. seed is still available, and acts as a secret generator
- As a consequence of these changes, note that new return values of XXH3 are not compatible with v0.7.0
- updated ARM NEON variant by @easyaspi314
- Streaming implementation is available
- Improve compatibility and performance with Visual Studio, with help from @aras-p
- Better integration when using XXH_INLINE_ALL : do not pollute host namespace, use its own macros, such as XXH_ASSERT(), XXH_ALIGN, etc.
- 128-bit variant provides a helper function for comparison of hashes.
Note that XXH3 is still considered experimental at this stage. It will have to remain stable for at least 2 releases before being branded "stable". After which stage, the algorithm and produced results will no longer evolve.
Several general improvements are also present in this release :
- Better clang generation of rotl instruction, thanks to @easyaspi314
- XXH_REROLL build macro, to reduce binary size, by @easyaspi314
- Improved cmake script, by @Mezozoysky
- Full benchmark program provided in /tests/bench
xxHash v0.7.0
The main highlight of this release is the introduction of XXH3, a new hash algorithm offering much improved speed, for both large and small inputs.
XXH3 is still labelled experimental, and must be unlocked with macro XXH_STATIC_LINKING_ONLY. The source code is located in its own xxh3.h file, which is automatically included (and therefore required) by xxhash.c. It's also possible to include xxh3.h directly, which will have a similar effect as triggering XXH_INLINE_ALL.
At this stage, XXH3 is suitable for ephemeral data and tests, but avoid storing long term hash values yet.
XXH3 will be transferred into stable in a future release, after a period dedicated to gathering user feedback.
For more details on XXH3 performance, see this article.
note : there are known compilation issues under Visual Studio, which were later fixed in the dev branch.
xxHash v0.6.5
- Improved performance on small keys, thanks to suggestions from Jens Bauer
- New build macro, XXH_INLINE_ALL, extremely effective for small keys of fixed length (see this article for details)
- XXH32() : better performance on OS-X clang by disabling auto-vectorization
- Improved benchmark measurements accuracy on small keys
- Included xxHash specification document
xxHash v0.6.4
- build: new target make lib
- build: make install also installs library libxxhash
- build: cmake builds library by default