Recently I came across code that concatenates several values (strings and a GUID) into a Base64-encoded string, which is then used as a primary key in a document database.
In addition, the resulting Base64 string has to be “url safe” – the character ‘/’ is replaced with ‘_’, the character ‘+’ is replaced with ‘-’, and the trailing padding characters ‘=’ are trimmed.
As you can imagine, this code is on the hot path. The computation should be done with special care so that it is as fast as possible without creating memory pressure.
Let’s first define what needs to be encoded. In order to simulate a real-world scenario I use one randomly generated Guid (which could represent an object id) and two strings (which could represent settings ids).
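The post’s data setup isn’t reproduced here, so the following is a minimal sketch of such inputs (the names objectId, setting1 and setting2 and the sample values are my own, not the post’s):

```csharp
using System;

// Hypothetical inputs: one object id and two settings ids (values are illustrative).
Guid objectId = Guid.NewGuid();
string setting1 = "profile-settings";
string setting2 = "dark-theme";

Console.WriteLine($"{objectId} {setting1} {setting2}");
```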
Version 1: Base64UrlEncoder
The initial version used the method Microsoft.IdentityModel.Tokens.Base64UrlEncoder.Encode from the Microsoft.IdentityModel.Tokens package. The implementation was as follows:
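Since the original listing isn’t shown here, a sketch of that initial version might look like this (the helper name EncodeV1 and the variable names are mine):

```csharp
using System;
using Microsoft.IdentityModel.Tokens; // NuGet: Microsoft.IdentityModel.Tokens

static string EncodeV1(Guid objectId, string setting1, string setting2)
{
    // First allocation: build the key string from the Guid and the two strings.
    string raw = objectId.ToString() + setting1 + setting2;

    // Encode() internally performs the remaining allocations:
    // UTF-8 byte array, Base64 string, Split on '=', Replace of '/' and '+'.
    return Base64UrlEncoder.Encode(raw);
}

Console.WriteLine(EncodeV1(Guid.NewGuid(), "setting-a", "setting-b"));
```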
Let’s count the number of allocations:
- Constructing string from GUID + several strings and then calling Encode() method
- Getting byte array from string using Utf8 encoding
- Converting to base 64 string
- Splitting in case of base 64 pad character ‘=’
- Replacing ‘/’
- Replacing ‘+’
As you can see, there are 6 allocations, 5 of which are only temporary. The garbage collector is designed for frequent allocations of short-lived objects, but they still create memory pressure over a longer period. How much? Let’s see the memory diagnoser results from the benchmark.
Version 2: WebEncoders
In order to avoid the temporary allocations 3, 4 and 5 from the previous version we can use the Microsoft.AspNetCore.WebUtilities.WebEncoders class from the Microsoft.AspNetCore.WebUtilities NuGet package, specifically the method
public static int Base64UrlEncode(byte[] input, int offset, char[] output, int outputOffset, int count);
The method is able to encode input bytes into the output char array without any allocations. More info here.
For allocation #2 (getting the byte array from the string) and the new extra allocation for the output char array we can use the shared pools of bytes and characters, ArrayPool<T>. The resulting code looks like the following:
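A sketch of that version, assuming the rent/return pattern described above (the method and variable names are mine, not the post’s):

```csharp
using System;
using System.Buffers;
using System.Text;
using Microsoft.AspNetCore.WebUtilities; // NuGet: Microsoft.AspNetCore.WebUtilities

static string EncodeV2(Guid objectId, string setting1, string setting2)
{
    // Allocation 1 is still here: building the input string.
    string raw = objectId.ToString() + setting1 + setting2;

    // Rent temporary buffers from the shared pools instead of allocating them.
    byte[] bytes = ArrayPool<byte>.Shared.Rent(Encoding.UTF8.GetMaxByteCount(raw.Length));
    char[] chars = ArrayPool<char>.Shared.Rent(
        WebEncoders.GetArraySizeRequiredToEncode(bytes.Length));
    try
    {
        int byteCount = Encoding.UTF8.GetBytes(raw, 0, raw.Length, bytes, 0);
        int charCount = WebEncoders.Base64UrlEncode(bytes, 0, chars, 0, byteCount);

        // The only remaining allocation is the resulting key string itself.
        return new string(chars, 0, charCount);
    }
    finally
    {
        ArrayPool<char>.Shared.Return(chars);
        ArrayPool<byte>.Shared.Return(bytes);
    }
}
```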
The benchmark shows the following results:
Compared with the previous version there is almost a 30% speed improvement and a 70% memory improvement! But let’s go further.
Version 3: Spans and System.Buffers.Text.Base64
Version #2 eliminated allocations 2, 3, 4 and 5 by leveraging the shared ArrayPool<T>. There is still the initial allocation, building the string. Let’s eliminate it too. Instead of building the string we can write the raw bytes representing the input artifacts (Guid, strings) directly into the byte array (e.g. rented from ArrayPool<T> or allocated on the stack) used for further processing.
In addition, ArrayPool<T> is most beneficial for large arrays. Our arrays are small and size-limited, so we can use so-called stackalloc-ed buffers instead, accessible in the form of Span<T>. Let’s look at the code, which leans heavily on the System.Buffers NuGet package, especially Utf8Formatter.TryFormat and Base64.EncodeToUtf8InPlace.
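A sketch of this approach, under the assumption that the total input is small enough to live on the stack (the names and the size arithmetic are mine):

```csharp
using System;
using System.Buffers.Text;
using System.Text;

static string EncodeV3(Guid objectId, string setting1, string setting2)
{
    // A Guid formats to 36 UTF-8 bytes ("D" format); add the settings ids.
    int rawLength = 36 + Encoding.UTF8.GetByteCount(setting1)
                       + Encoding.UTF8.GetByteCount(setting2);

    // One stack buffer, sized for the Base64-encoded result.
    Span<byte> buffer = stackalloc byte[Base64.GetMaxEncodedToUtf8Length(rawLength)];

    // Write the raw UTF-8 bytes directly, with no intermediate string.
    Utf8Formatter.TryFormat(objectId, buffer, out int written);
    written += Encoding.UTF8.GetBytes(setting1, buffer.Slice(written));
    written += Encoding.UTF8.GetBytes(setting2, buffer.Slice(written));

    // Base64-encode within the same buffer, then patch to the URL-safe alphabet.
    Base64.EncodeToUtf8InPlace(buffer, written, out int encoded);
    while (encoded > 0 && buffer[encoded - 1] == (byte)'=') encoded--; // drop padding
    for (int i = 0; i < encoded; i++)
    {
        if (buffer[i] == (byte)'/') buffer[i] = (byte)'_';
        else if (buffer[i] == (byte)'+') buffer[i] = (byte)'-';
    }

    // The only allocation left is the final key string.
    return Encoding.UTF8.GetString(buffer.Slice(0, encoded));
}
```

Note that stackalloc is only safe here because the key is known to be small; for unbounded inputs the rented-array path would still be needed.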
The benchmark results are:
The end result is almost 20% faster than the previous version, with a 30-40% memory reduction.
It is still possible to improve it a little bit.
Version 4: Binary Guid version
All the versions above format the Guid into 32 hexadecimal digits separated by 4 hyphens. If it is acceptable, it is possible to write just the 16 raw Guid bytes instead of the 36 bytes of the formatted form.
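Sketched as a variation of the previous version (again with my own names), the only change is writing the Guid’s 16 raw bytes instead of formatting it:

```csharp
using System;
using System.Buffers.Text;
using System.Text;

static string EncodeV4(Guid objectId, string setting1, string setting2)
{
    // 16 raw Guid bytes instead of 36 formatted bytes.
    int rawLength = 16 + Encoding.UTF8.GetByteCount(setting1)
                       + Encoding.UTF8.GetByteCount(setting2);
    Span<byte> buffer = stackalloc byte[Base64.GetMaxEncodedToUtf8Length(rawLength)];

    objectId.TryWriteBytes(buffer); // binary Guid, no formatting
    int written = 16;
    written += Encoding.UTF8.GetBytes(setting1, buffer.Slice(written));
    written += Encoding.UTF8.GetBytes(setting2, buffer.Slice(written));

    Base64.EncodeToUtf8InPlace(buffer, written, out int encoded);
    while (encoded > 0 && buffer[encoded - 1] == (byte)'=') encoded--; // drop padding
    for (int i = 0; i < encoded; i++)
    {
        if (buffer[i] == (byte)'/') buffer[i] = (byte)'_';
        else if (buffer[i] == (byte)'+') buffer[i] = (byte)'-';
    }
    return Encoding.UTF8.GetString(buffer.Slice(0, encoded));
}
```

The trade-off is that the Guid is no longer human-readable inside the key, which is exactly the “if it is acceptable” condition above.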
The benchmark results are:
This version is around 20% faster than the previous one, with up to 20% memory reduction.
That’s all. Source is here.