Base64 is a standard approach to represent any binary data as ASCII. It is part of the email standard (MIME) and is commonly used to embed data in XML, HTML or JSON files. For example, images can be encoded as text using base64. Base64 is also used to represent cryptographic keys.
Our processors have fast instructions (SIMD) that can process blocks of data at once. They are ideally suited to encode and decode base64. The C# .NET runtime library has fast (SIMD-based) base64 functions (see footnote 1) when the input is UTF-8.
Encoding is easier than decoding: all inputs are valid for encoding, but only some inputs are valid for decoding. A decoder must validate its input and must also skip the white space characters that are allowed to appear anywhere in it. Having to skip white space makes accelerated decoding more difficult, because the useful characters no longer sit at fixed positions within a block. We refer to this decoding as WHATWG forgiving-base64 decoding.
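To make the two difficulties concrete, here is a minimal scalar sketch of forgiving-base64 decoding as we understand the WHATWG rules: remove ASCII white space, strip up to two trailing `=` when the length is a multiple of four, reject a remaining length of 1 modulo 4, and reject any character outside the alphabet. The class and method names (`ForgivingBase64Sketch`, `TryDecode`) are hypothetical and are not part of this library. It is not the SIMD algorithm; it only illustrates the behavior a decoder has to reproduce.

```csharp
using System;

public static class ForgivingBase64Sketch
{
    // Returns the 6-bit value of a standard base64 character, or -1 if invalid.
    private static int SextetOf(char c)
    {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    }

    // ASCII white space: tab, line feed, form feed, carriage return, space.
    private static bool IsAsciiWhiteSpace(char c) =>
        c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';

    public static bool TryDecode(ReadOnlySpan<char> input, out byte[] output)
    {
        output = Array.Empty<byte>();

        // Step 1: copy the input without white space.
        char[] data = new char[input.Length];
        int n = 0;
        foreach (char c in input)
        {
            if (!IsAsciiWhiteSpace(c)) data[n++] = c;
        }

        // Step 2: if the length is a multiple of 4, drop one or two trailing '='.
        if (n > 0 && n % 4 == 0 && data[n - 1] == '=')
        {
            n--;
            if (data[n - 1] == '=') n--;
        }

        // Step 3: a single leftover character cannot encode a full byte.
        if (n % 4 == 1) return false;

        // Step 4: validate and accumulate 6 bits at a time, emitting full bytes.
        byte[] result = new byte[n * 6 / 8];
        int written = 0, acc = 0, bits = 0;
        for (int i = 0; i < n; i++)
        {
            int v = SextetOf(data[i]);
            if (v < 0) return false; // includes '=' in the middle of the input
            acc = (acc << 6) | v;
            bits += 6;
            if (bits >= 8)
            {
                bits -= 8;
                result[written++] = (byte)(acc >> bits);
                acc &= (1 << bits) - 1; // keep only the bits not yet emitted
            }
        }
        output = result; // leftover bits (fewer than 8) are discarded
        return true;
    }
}
```

A SIMD decoder has to produce the same results while handling many characters per instruction, which is why white space removal and validation dominate the design.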
To handle spaces and validation, we recently designed a faster base64 decoding algorithm. It has been deployed in the simdutf C++ library and used in production systems (e.g., the JavaScript runtime systems Node.js and Bun). With this new algorithm, we beat the C# .NET runtime functions by 1.7 x to 2.3 x on realistic inputs of a few kilobytes.
The algorithm is unpatented (free) and we make our C# code available under a liberal open-source licence (MIT).
We use the enron base64 data for benchmarking, see benchmark/data/email. We process the data as UTF-8 (ASCII) using the .NET accelerated functions as a reference (System.Buffers.Text.Base64.DecodeFromUtf8). Our benchmark results are fully reproducible.
processor and base freq. | SimdBase64 (GB/s) | .NET speed (GB/s) | speed up |
---|---|---|---|
Apple M2 processor (ARM, 3.5 GHz) | 6.5 | 3.8 | 1.7 x |
AWS Graviton 3 (ARM, 2.6 GHz) | 3.6 | 2.0 | 1.8 x |
Intel Ice Lake (2.0 GHz) | 6.5 | 3.4 | 1.9 x |
AMD EPYC 7R32 (Zen 2, 2.8 GHz) | 6.8 | 2.9 | 2.3 x |
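For readers unfamiliar with the reference function used in the table above, the following sketch shows one way to call System.Buffers.Text.Base64.DecodeFromUtf8 on UTF-8 input. The wrapper name `DotNetBaselineSketch.DecodeUtf8` is hypothetical, and this is not the benchmark code itself.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;

public static class DotNetBaselineSketch
{
    // Decodes UTF-8 (ASCII) base64 bytes using the .NET runtime function.
    public static byte[] DecodeUtf8(ReadOnlySpan<byte> base64Utf8)
    {
        // Upper bound on the decoded size for this input length.
        byte[] buffer = new byte[Base64.GetMaxDecodedFromUtf8Length(base64Utf8.Length)];
        OperationStatus status = Base64.DecodeFromUtf8(
            base64Utf8, buffer, out int bytesConsumed, out int bytesWritten);
        if (status != OperationStatus.Done)
        {
            throw new FormatException(
                $"Decoding stopped with status {status} after {bytesConsumed} input bytes.");
        }
        // Keep only the bytes that were actually written.
        return buffer.AsSpan(0, bytesWritten).ToArray();
    }
}
```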
The .NET runtime did not accelerate the Convert.FromBase64String(mystring) functions.
We can multiply the decoding speed compared to the .NET standard library.
Replace the following code based on the standard library...
byte[] newBytes = Convert.FromBase64String(s);
with our version...
byte[] newBytes = SimdBase64.Base64.FromBase64String(s);
processor and base freq. | SimdBase64 (GB/s) | .NET speed (GB/s) | speed up |
---|---|---|---|
Apple M2 processor (ARM, 3.5 GHz) | 4.0 | 1.1 | 3.6 x |
Intel Ice Lake (2.0 GHz) | 2.5 | 0.65 | 3.8 x |
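If you want a rough idea of what a GB/s figure means on your own machine, a crude stopwatch measurement of Convert.FromBase64String can be sketched as below. This is not the repository's benchmark harness and will be noisier than it; the class name `ThroughputSketch` is hypothetical, and counting input characters (rather than output bytes) as the volume is our assumption.

```csharp
using System;
using System.Diagnostics;

public static class ThroughputSketch
{
    // Returns an approximate decoding throughput in GB/s,
    // counting base64 input characters as the processed volume (assumption).
    public static double MeasureGigabytesPerSecond(string base64, int iterations = 1000)
    {
        // Warm up so that the JIT has compiled the hot path.
        for (int i = 0; i < 10; i++)
        {
            Convert.FromBase64String(base64);
        }

        long sink = 0; // accumulate a result so the work is not optimized away
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += Convert.FromBase64String(base64).Length;
        }
        stopwatch.Stop();
        GC.KeepAlive(sink);

        double totalBytes = (double)base64.Length * iterations;
        return totalBytes / stopwatch.Elapsed.TotalSeconds / 1e9;
    }
}
```

Swapping the call inside the loop for the replacement function shown above gives a comparable number for the same input.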
As of .NET 9, the support for AVX-512 remains incomplete in C#. In particular, important VBMI2 instructions are missing. Hence, we are not using AVX-512 on x64 systems at this time. However, as soon as .NET offers the necessary support, we will update our results.
We require .NET 9 or better: https://dotnet.microsoft.com/en-us/download/dotnet/9.0
The library only provides Base64 decoding functions, because the .NET library already has fast Base64 encoding functions. We support both Span<byte> (ASCII or UTF-8) and Span<char> (UTF-16) as input. If you have a C# string, you can get a ReadOnlySpan<char> view of it with the AsSpan() method.
string base64 = "SGVsbG8sIFdvcmxkIQ=="; // could be span<byte> in UTF-8 as well
byte[] buffer = new byte[SimdBase64.Base64.MaximalBinaryLengthFromBase64(base64.AsSpan())];
int bytesConsumed; // gives you the number of characters consumed
int bytesWritten;
var result = SimdBase64.Base64.DecodeFromBase64(base64.AsSpan(), buffer, out bytesConsumed, out bytesWritten, false); // false is for regular base64, true for base64url
// result == OperationStatus.Done
var answer = buffer.AsSpan().Slice(0, bytesWritten); // decoded result
// Encoding.UTF8.GetString(answer) == "Hello, World!"
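The example above only shows the successful case. The returned value is a System.Buffers.OperationStatus, which has four members in .NET (Done, DestinationTooSmall, NeedMoreData, InvalidData). The helper below is a hypothetical sketch of how a caller might report each of them; which of these values the library actually returns, and in which situations, is an assumption to check against the tests.

```csharp
using System;
using System.Buffers;

public static class StatusSketch
{
    // Turns a decoding status into a human-readable message.
    public static string Describe(OperationStatus status, int consumed, int written) => status switch
    {
        OperationStatus.Done => $"decoded {written} bytes from {consumed} input units",
        OperationStatus.DestinationTooSmall => "output buffer too small; allocate a larger one and retry",
        OperationStatus.NeedMoreData => "input ended in the middle of a base64 group",
        OperationStatus.InvalidData => $"invalid base64 input near offset {consumed}",
        _ => throw new ArgumentOutOfRangeException(nameof(status)),
    };
}
```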
dotnet test
To get a list of available tests, enter the command:
dotnet test --list-tests
To run specific tests, it is helpful to use the filter parameter:
dotnet test -c Release --filter DecodeBase64CasesScalar
To run the benchmarks, run the following command:
cd benchmark
dotnet run -c Release
To run just one benchmark, use a filter:
cd benchmark
dotnet run -c Release --filter "SimdUnicodeBenchmarks.RealDataBenchmark.AVX2DecodingRealDataUTF8(FileName: \"data/email/\")"
If you are under macOS or Linux, you may want to run the benchmarks in privileged mode:
cd benchmark
sudo dotnet run -c Release
For UTF-16 benchmarks, you need to pass a flag as they are not enabled by default:
cd benchmark
dotnet run -c Release --anyCategories UTF16
cd src
dotnet build
We recommend you use dotnet format. E.g.,
cd test
dotnet format
You can print the content of a vector register like so:
public static void ToString(Vector256<byte> v)
{
Span<byte> b = stackalloc byte[32];
v.CopyTo(b);
Console.WriteLine(Convert.ToHexString(b));
}
public static void ToString(Vector128<byte> v)
{
Span<byte> b = stackalloc byte[16];
v.CopyTo(b);
Console.WriteLine(Convert.ToHexString(b));
}
You can convert an integer to a hex string like so: $"0x{MyVariable:X}".
- Be careful: Vector128.Shuffle is not the same as Ssse3.Shuffle nor is Vector256.Shuffle the same as Avx2.Shuffle. Prefer the latter.
- Similarly Vector128.Shuffle is not the same as AdvSimd.Arm64.VectorTableLookup, use the latter.
- stackalloc arrays should probably not be used in class instances.
- In C#, struct might be preferable to class instances as it makes it clear that the data is thread local.
- You can ask for an asm dump: DOTNET_JitDisasm=NEON64HTMLScan dotnet run -c Release. See Viewing JIT disassembly and dumps.
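To illustrate the first point in the list above, the sketch below compares Vector128.Shuffle with Ssse3.Shuffle on an out-of-range index. To our understanding, the portable function returns zero for any index of 16 or more, whereas the SSSE3 instruction only zeroes a lane when the high bit of the index is set and otherwise uses the low four bits. The class name `ShuffleSketch` is hypothetical, and the SSSE3 result is only computed on hardware that supports it.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class ShuffleSketch
{
    // Looks up index 16 in a 16-byte table with both shuffle flavors.
    // Returns the first lane of each result (null when SSSE3 is unavailable).
    public static (byte Portable, byte? Ssse3Result) Compare()
    {
        Vector128<byte> table = Vector128.Create(
            (byte)10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25);
        // 0x10 is out of range for a 16-entry table, but its high bit is clear.
        Vector128<byte> indices = Vector128.Create((byte)0x10);

        byte portable = Vector128.Shuffle(table, indices).GetElement(0);
        byte? ssse3 = Ssse3.IsSupported
            ? Ssse3.Shuffle(table, indices).GetElement(0)
            : (byte?)null;
        return (portable, ssse3);
    }
}
```

When the two disagree, the portable function has to do extra work to emulate its own semantics, which is one reason to prefer the platform-specific intrinsic in hot loops.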
- Wojciech Muła, Daniel Lemire, Base64 encoding and decoding at almost the speed of a memory copy, Software: Practice and Experience 50 (2), 2020.
- Wojciech Muła, Daniel Lemire, Faster Base64 Encoding and Decoding using AVX2 Instructions, ACM Transactions on the Web 12 (3), 2018.
- base64 encoding with simd-support
- gfoidl.Base64: original code that led to the SIMD-based code in the runtime
- simdutf's base64 decode
- WHATWG forgiving-base64 decode
- https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/
- https://learn.microsoft.com/en-us/dotnet/csharp/fundamentals/coding-style/coding-conventions
Footnotes
1. The .NET runtime appears to have received some of its fast SIMD base64 functions from gfoidl.Base64, which built on earlier work by Klomp, Muła and others. See Faster Base64 Encoding and Decoding using AVX2 Instructions for a review.