High-performance Base64 encoding

Recently I came across code which concatenates several values (strings and a GUID) into a Base64-encoded string, which is then used as a primary key in a document database.

In addition, the resulting Base64 string has to be “URL safe”: the character ‘/’ is replaced with ‘_’, the character ‘+’ is replaced with ‘-’, and the trailing ‘=’ padding characters are dropped.

As you can imagine, this code is on the hot path. The computation should be done with special care so that it is as fast as possible without creating memory pressure.

The input

Let’s first start by defining what needs to be encoded. In order to simulate a real-world scenario I use 1 randomly generated Guid (which could represent an object id) and 2 strings (which could represent settings ids).

 

Version 1: Base64UrlEncoder

The initial version used the method Microsoft.IdentityModel.Tokens.Base64UrlEncoder.Encode from the Microsoft.IdentityModel.Tokens package. The implementation was as follows:

Base64UrlEncoder
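The original snippet is not reproduced in this export, so here is only a minimal sketch of what that first version likely looked like (the input shape, one GUID plus two settings ids, is taken from the description above):

public static string Encode(Guid objectId, string settingId1, string settingId2)
{
    // concatenate the input values and let the library do UTF-8 + Base64 + URL-safe fix-up
    string input = objectId.ToString() + settingId1 + settingId2;
    return Base64UrlEncoder.Encode(input); // Microsoft.IdentityModel.Tokens
}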

Let’s count the number of allocations:

  1. Constructing string from GUID + several strings and then calling Encode() method
  2. Getting byte array from string using Utf8 encoding
  3. Converting to base 64 string
  4. Splitting in case of base 64 pad character ‘=’
  5. Replacing ‘/’
  6. Replacing ‘+’

 

As you can see, there are 6 allocations, and 5 of them are only temporary. The garbage collector is designed for frequent allocations of short-lived objects, but they still create memory pressure over a longer period. How much? Let’s see the memory diagnoser results from the benchmark.

BenchBase64UrlEncoder

 

Version 2: WebEncoders

In order to avoid temporary allocations 3, 4, 5 from the previous version, we can use the Microsoft.AspNetCore.WebUtilities.WebEncoders class from the Microsoft.AspNetCore.WebUtilities NuGet package, specifically the method

public static int Base64UrlEncode (byte[] input, int offset, char[] output, int outputOffset, int count);

The method is able to encode input bytes into the output char array without any allocations. More info here.

For allocation #2 (getting the byte array from the string) and the new extra allocation for the output char array, we can use the shared pools of bytes and characters, ArrayPool&lt;T&gt;. The resulting code looks like the following:
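Again, the original gist is not reproduced here; a minimal sketch of the approach, assuming the same input shape as above:

public static string Encode(Guid objectId, string settingId1, string settingId2)
{
    string input = objectId.ToString() + settingId1 + settingId2;

    byte[] inputBytes = ArrayPool<byte>.Shared.Rent(Encoding.UTF8.GetMaxByteCount(input.Length));
    char[] outputChars = null;
    try
    {
        int byteCount = Encoding.UTF8.GetBytes(input, 0, input.Length, inputBytes, 0);
        outputChars = ArrayPool<char>.Shared.Rent(WebEncoders.GetArraySizeRequiredToEncode(byteCount));
        int charCount = WebEncoders.Base64UrlEncode(inputBytes, 0, outputChars, 0, byteCount);
        return new string(outputChars, 0, charCount); // only the input string and the result remain allocated
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(inputBytes);
        if (outputChars != null)
        {
            ArrayPool<char>.Shared.Return(outputChars);
        }
    }
}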

The benchmark shows the following results:

BenchWebEncoder

Compared with the previous version, there is almost a 30% speed improvement and a 70% memory improvement! But let’s go further.

Version 3: Spans and System.Buffers.Text.Base64

Version #2 eliminated allocations 2, 3, 4, 5 and we leveraged the shared ArrayPool&lt;T&gt;. There is still the initial allocation: building the string. Let’s eliminate that as well. Instead of building the string, we can write the raw bytes representing the input artifacts (GUID, strings) directly into the byte buffer (e.g. rented from the ArrayPool or allocated on the stack) used for further processing.

In addition, ArrayPool&lt;T&gt; is mostly beneficial for large arrays. Our arrays are small and size-limited, so we can use stackalloc-ed buffers instead, accessible in the form of Span&lt;T&gt;. Let’s look at the code, which leans heavily on the System.Buffers NuGet package, especially Utf8Formatter.TryFormat and Base64.EncodeToUtf8InPlace.
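A rough sketch of the idea rather than the author’s exact code; the buffer sizing assumes the inputs stay small enough for the stack, and the URL-safe fix-up is spelled out inline:

public static string Encode(Guid objectId, string settingId1, string settingId2)
{
    // worst case: 36 bytes for the formatted Guid + UTF-8 bytes of both strings,
    // sized up so that the Base64 encoding can happen in the same buffer
    int maxDataLength = 36 + Encoding.UTF8.GetMaxByteCount(settingId1.Length)
                           + Encoding.UTF8.GetMaxByteCount(settingId2.Length);
    Span<byte> buffer = stackalloc byte[Base64.GetMaxEncodedToUtf8Length(maxDataLength)];

    Utf8Formatter.TryFormat(objectId, buffer, out int written);          // default "D" format, 36 bytes
    written += Encoding.UTF8.GetBytes(settingId1, buffer.Slice(written));
    written += Encoding.UTF8.GetBytes(settingId2, buffer.Slice(written));

    Base64.EncodeToUtf8InPlace(buffer, written, out int encodedLength);

    // make it URL safe: '+' -> '-', '/' -> '_', drop the '=' padding
    Span<byte> encoded = buffer.Slice(0, encodedLength);
    for (int i = 0; i < encoded.Length; i++)
    {
        if (encoded[i] == (byte)'+') encoded[i] = (byte)'-';
        else if (encoded[i] == (byte)'/') encoded[i] = (byte)'_';
    }
    while (encodedLength > 0 && buffer[encodedLength - 1] == (byte)'=')
    {
        encodedLength--;
    }

    return Encoding.ASCII.GetString(buffer.Slice(0, encodedLength));
}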

The benchmark results are:

BenchBuffersBase64

The end result is almost 20% faster than the previous version, with a 30-40% memory reduction.

It is still possible to improve it a little bit.

Version 4: Binary Guid version

All the versions above format the Guid into 32 hexadecimal digits separated by 4 hyphens. If it is acceptable, it is possible to write just the 16 raw Guid bytes instead of the 36 formatted bytes.
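The change over version 3 is small; a sketch of the difference (Guid.TryWriteBytes has been available since .NET Core 2.1):

// in the version 3 sketch, account for 16 bytes instead of 36 when sizing the buffer,
// and replace the Utf8Formatter.TryFormat call with:
objectId.TryWriteBytes(buffer);   // writes the 16 raw Guid bytes into the span
int written = 16;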

The benchmark results are:

BenchBuffersBase64BinaryGuid

This version is around 20% faster than the previous version, with up to a 20% memory reduction.

 

That’s all. Source is here.

 

ASP.NET Core logging scope filtering

ASP.NET Core logging infrastructure supports so-called “scopes”. In short, a scope can group a set of logical operations. You can find more information about it here.

In order to use it, let’s have a sample configuration:
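The configuration snippet is not shown in this export; a minimal appsettings.json along these lines should do (here the flag is set for the console provider; depending on the version it can also be set globally):

{
  "Logging": {
    "Console": {
      "IncludeScopes": true
    },
    "LogLevel": {
      "Default": "Information"
    }
  }
}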

 

In order to keep track of scopes, it is necessary to turn on the “IncludeScopes” flag per logger provider or globally for all providers. After that, the logger is able to track the scope via the IExternalScopeProvider interface. The default implementation is LoggerExternalScopeProvider.

 

Side note: it is also possible to pass the external scope provider in from the outside (e.g. when using a singleton instance and setting the scope externally).

The problem

The problem we are facing with scopes is that any code can populate the scope, e.g. ASP.NET Core framework code. This means any sensitive/private information can be logged in plain text. See the following example using the default logging setup.

secrets

This is not acceptable for us due to privacy reasons. We need to be able to control the format of the scopes or filter them out. I brainstormed the possible solution with my colleagues, Jan Guttek and Martin Obratil. We came up with the following proposal.

The proposal

All loggers have names, so-called categories. The developer has the possibility to adjust the logging level per category, which means messages on the lower levels are filtered out. This proposal uses a similar approach for controlling the scopes.

So messages of any level are logged, but scopes are populated only for categories starting with the name ScopeFilteringWebApplication.

The implementation

Thanks to the ASP.NET Core logging infrastructure, it is possible to solve this issue easily.

All logging providers are kept inside the LoggerFactory class. It is a repository of all logger providers as well as a factory for creating logger instances. When a logger is created for a particular category name, all providers are enumerated and each of them creates its own logger instance. All these instances are then aggregated under a general Logger instance, which is returned as ILogger. This is the instance used in the application code.

The solution is to wrap the returned ILogger instance with a logger with scope filtering capabilities.

I was inspired by the log-level settings implemented in ASP.NET Core itself, i.e. the code that parses and applies the following configuration:
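That is the well-known per-category LogLevel section, for example:

{
  "Logging": {
    "LogLevel": {
      "Default": "Warning",
      "Microsoft": "Information",
      "ScopeFilteringWebApplication": "Debug"
    }
  }
}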

I did the same for the scope filtering. The scope filtering settings are handled by the LoggerScopeSettings class. This class is responsible for parsing the configuration and providing the scope setting for a particular logger category.

The whole idea is to wrap each logger instance created by LoggerFactory so that scope filtering is applied. This can be done by wrapping LoggerFactory with ScopeFilteringLoggerFactory. When creating a logger instance, the factory wraps it with a ScopeFilteringLogger instance. This class is responsible for handling scopes based on the configuration (provided by LoggerScopeSettings).
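The full implementation is in the repository linked at the end of the post; a minimal sketch of the wrapping logger illustrates the idea (the per-category decision is assumed to be made once by LoggerScopeSettings and passed in as a flag):

public class ScopeFilteringLogger : ILogger
{
    private readonly ILogger _inner;
    private readonly bool _scopesEnabled;   // decided per category by LoggerScopeSettings

    public ScopeFilteringLogger(ILogger inner, bool scopesEnabled)
    {
        _inner = inner;
        _scopesEnabled = scopesEnabled;
    }

    public IDisposable BeginScope<TState>(TState state)
        => _scopesEnabled ? _inner.BeginScope(state) : NullScope.Instance;

    public bool IsEnabled(LogLevel logLevel) => _inner.IsEnabled(logLevel);

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state, Exception exception,
        Func<TState, Exception, string> formatter)
        => _inner.Log(logLevel, eventId, state, exception, formatter);

    private sealed class NullScope : IDisposable
    {
        public static readonly NullScope Instance = new NullScope();
        public void Dispose() { }
    }
}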

That’s all. Let’s see it in action.

Configuration

Code

Let’s have a sample controller code.
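Something along these lines (the controller and the scope content are illustrative; the real sample lives in the linked repository):

[Route("api/[controller]")]
[ApiController]
public class ValuesController : ControllerBase
{
    private readonly ILogger<ValuesController> _logger;

    public ValuesController(ILogger<ValuesController> logger)
    {
        _logger = logger;
    }

    [HttpGet]
    public ActionResult<string> Get()
    {
        // the controller's own scope; ASP.NET Core adds its own scopes around it
        using (_logger.BeginScope("Processing user {UserId}", 42))
        {
            _logger.LogInformation("Returning the value");
            return "value";
        }
    }
}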

 

Let’s enable the scope filtering for the logging.

The whole scope filtering is done using the extension method UseScopeFiltering().

Results

no_secrets_in_scope

The scope filtering is applied according to the configuration.

The code can be found here.

 

HTTP2 – HttpClient Connection Pooling in .NET Core

Steve Gordon published a great post describing the history of HttpClient, its evolution (WinHttpHandler, SocketsHttpHandler, etc.) and the connection pooling details under the HttpClient in .NET Core.

I was especially interested in the connection pooling with HTTP/2. .NET Core 3.0 brings HTTP/2 support (together with TLS support). For more details see: https://docs.microsoft.com/en-us/dotnet/core/whats-new/dotnet-core-3-0#http2-support

Unfortunately, all the connection pooling tests and details mentioned in Steve’s blog apply only to HTTP/1.1 and not to HTTP/2.

I’ll cover HTTP2 in this blog post.

Show me the code!

I built a sample .NET Core application based on the code from Steve’s post. I changed it to display the IP (v4, v6) addresses and, mainly, to use HTTP/2.
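The sample is in the linked repository; a condensed sketch of what it does (URL and request count are arbitrary here):

using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static async Task Main()
{
    var handler = new SocketsHttpHandler
    {
        MaxConnectionsPerServer = 20               // honoured only for HTTP/1.1, as it turns out
    };
    using var client = new HttpClient(handler)
    {
        DefaultRequestVersion = new Version(2, 0)  // ask for HTTP/2
    };

    Console.WriteLine("Starting, press a key to continue ...");
    foreach (var address in await Dns.GetHostAddressesAsync("www.google.com"))
    {
        Console.WriteLine(address);                // IPv4/IPv6 addresses used for the netstat filter
    }
    Console.ReadKey();

    await Task.WhenAll(Enumerable.Range(0, 50)
        .Select(_ => client.GetStringAsync("https://www.google.com")));

    Console.WriteLine("Press a key to exit...");
    Console.ReadKey();
}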

As you can see, I try to set MaxConnectionsPerServer to 20. The program also outputs the IPv4 and IPv6 addresses retrieved from DNS.

Starting, press a key to continue ...
2a00:1450:4014:80c::2004
216.58.201.100
Press a key to exit...

I do the same as Steve did to check what connections are open, using the netstat command.

netstat -ano | findstr 2a00:1450:4014:80c::2004

The result is:

TCP [2a00:...:e4e1]:5472 [2a00:1450:4014:80c::2004]:443 ESTABLISHED 19744

As you can see, in the case of HTTP/2 there is only 1 connection created; the setting I tried to apply affects HTTP/1.1 only. Sending the messages (streams) over 1 connection has its own limitations. The RFC recommends supporting at least 100 concurrent streams per connection. By default, in the .NET Core implementation the number of concurrent streams transferred over 1 connection is int.MaxValue (see the code Http2Connection.cs#L118) unless adjusted by the server using a SETTINGS frame (Http2Connection.cs#L470).

We run our services in high-volume scenarios. We need connection pooling together with HTTP/2 support, the ability to adjust the maximum number of streams, etc. If you know about any implementation covering this, please let me know.

Let’s dive a little deeper

Let’s dive into the code to prove the theory about the single connection. The class implementing an HTTP/2 connection is Http2Connection.cs.

Observability is built into the code using EventSource (that’s the Microsoft-System-Net-Http provider used below). Let’s look under the hood at what’s going on.

Steps to do:

  1. Run the netcore application
  2. Run dotnet-trace ps to list the processes and their IDs

     21104 Http2NetCoreApp .....\bin\Debug\netcoreapp3.1\Http2NetCoreApp.exe
    
  3. Run dotnet-trace collect --process-id 21104 --providers Microsoft-System-Net-Http
  4. Move on with the application (hit Enter in the netcoreapp).
  5. Switch to the tracing window; the trace recording is in progress.
  6. Once the netcore application is done, close it (hit Enter).
  7. The trace recording is finished. The whole trace is stored into a file with the .nettrace suffix.
    Provider Name                           Keywords            Level               Enabled By
    Microsoft-System-Net-Http               0xFFFFFFFFFFFFFFFF  Verbose(5)          --providers
    
    Process        : .....\bin\Debug\netcoreapp3.1\Http2NetCoreApp.exe
    Output File    : C:\temp\http2netcoreapp\trace.nettrace
    
    [00:00:00:22]   Recording trace 2.4378   (MB)
    Press <Enter> or <Ctrl+C> to exit...
    
    Trace completed.
    
    

     

    Let’s see what’s inside. We can inspect it with perfview!

  8. Download and run PerfView, and open the .nettrace file.
  9. Navigate into “Events”.
  10. Double-click the event Microsoft-System-Net-Http/HandlerMessage to see the events with this name. Pay attention to the column called Rest.

    This column contains all the custom event details. After inspecting it you can find out that there is only 1 event with the message “Attempting new HTTP2 connection.”

 

That’s all for now.

.NET Core application hosted from PaasV1

What?! Why?! These days!? You are probably wondering …

There are many service owners running their services on Azure PaasV1 – aka Cloud services. There are several reasons why it is needed, e.g. compliance requirements.

If you are in a similar space and want to leverage the power of the .NET Core runtime, read on.

It’s not possible to write worker roles on .NET Core. By default, PaasV1 hosts the worker role inside the WaWorkerHost process, which runs on the full .NET Framework runtime. If we want to leverage .NET Core, we need to take another path.

Let’s explore the path

The trick is using ProgramEntryPoint from the Azure ServiceDefinition file. You can read more about the whole schema here. It’s enough to add the desired application into the package and then execute it; the Azure worker hosting is able to track the process.

ProgramEntryPoint

The .NET Core publish command is able to export the whole application into a folder with all the files needed to run it. What’s more, .NET Core 3.0 preview 5 comes with the possibility to publish just 1 file containing the whole application. That’s great, isn’t it?

OK, we have a .NET Core application and it’s running within PaasV1. We need to integrate the application into the Azure PaasV1 hosting runtime, in other words leverage Microsoft.WindowsAzure.ServiceRuntime. In .NET Core, as a NuGet package! Not possible.

But there is a way.

There is a package called Unofficial.Microsoft.WindowsAzure.ServiceRuntime.Msshrtmi filling this need. It’s basically a package containing managed wrappers around native modules, full of P/Invokes. AFAIK, it was not possible to use such so-called mixed (C++/CLI) assemblies from .NET Core directly. It looks like the situation has changed with .NET Core 3.0 preview 6 on Windows.

That’s all.

Show me the code!

.NET Core application

In the sample application we just initialize the role manager interop, and then we are able to read the configuration from the .cscfg file, register callbacks for role status checks, shutdown, etc.

.NET Core publish
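The exact command isn’t reproduced in this export; something along these lines (the runtime identifier is just an example) produces a single-file, self-contained executable:

dotnet publish -c Release -r win-x64 /p:PublishSingleFile=true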

This command publishes the whole .NET Core application into one executable file.

 

Azure service definition

See the ProgramEntryPoint tag that starts the application.
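The full csdef is in the sample repository; a trimmed sketch of the relevant part (role name and size are illustrative, NetCoreConsoleApp.exe is the published single-file application):

<ServiceDefinition name="AzureCloudServiceNetCore" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WorkerRole name="WorkerRole1" vmsize="Small">
    <Runtime>
      <EntryPoint>
        <!-- starts the packaged .NET Core executable; the role is reported ready once the process starts -->
        <ProgramEntryPoint commandLine="NetCoreConsoleApp.exe" setReadyOnProcessStart="true" />
      </EntryPoint>
    </Runtime>
  </WorkerRole>
</ServiceDefinition>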

 

Result

Lines with the prefix “WaWorkerHost.exe Information: 0 : NetCoreConsoleApp” come from the .NET Core application. We are able to read the configuration settings, respond to role status checks, react to shutdown callbacks and more.

Complete sample

The whole sample is downloadable from https://github.com/kadukf/blog.msdn/tree/master/.NETCore/AzureCloudServiceNetCore

 

.NET Core 3.0 rocks! Happy coding 😉

Isolated ASP.NET Core MVC testing

Tests should be isolated so that we can run them in any order and in parallel.

Introduction

Recently I was working on a feature targeting ASP.NET Core MVC. I was able to test it at the class level. These tests were pure unit tests and all was fine.

In order to validate that it works correctly, I wanted to be sure that it works as expected when integrated into ASP.NET Core MVC, especially that it works in conjunction with controllers. So I decided to expand the scope of my test. I call this an integration unit test.
I found ASP.NET Core to be quite powerful for writing isolated tests spanning MVC handlers, formatters, etc. together with controllers.

 

Coding

Let’s have a custom formatter which we need to test.

Let’s see how to use it in the startup class:

Let’s write an integration test for it.

First we need a controller which will be used for testing. The aim of the controller is to return the input data.
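A minimal sketch of such a controller (the names are illustrative):

[Route("api/[controller]")]
[ApiController]
public class EchoController : ControllerBase
{
    // simply returns whatever the input formatter deserialized from the body
    [HttpPost]
    public ActionResult<InputModel> Post([FromBody] InputModel input) => input;
}

public class InputModel
{
    public string Value { get; set; }
}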

Test

Let’s write the test finally! The goal is to test that a POST request without a specified content-type is processed the same way as it would be when using content-type: application/json.
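A sketch of such a test using TestServer (xUnit is assumed; the route points at the hypothetical echo controller above and Startup is the application’s startup class):

[Fact]
public async Task Post_WithoutContentType_IsHandledAsJson()
{
    var builder = new WebHostBuilder().UseStartup<Startup>();
    using var server = new TestServer(builder);
    using var client = server.CreateClient();

    var content = new StringContent("{\"value\":\"hello\"}");
    content.Headers.ContentType = null;   // simulate a POST without a Content-Type header

    var response = await client.PostAsync("/api/echo", content);

    Assert.Equal(HttpStatusCode.OK, response.StatusCode);
    Assert.Contains("hello", await response.Content.ReadAsStringAsync());
}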

 

All works fine until there are several such tests with different controllers and routes. Just a side note: when using ASP.NET Core MVC, the Microsoft.AspNetCore.Mvc.Controllers.ControllerFeatureProvider class is responsible for discovering the controllers to be used.

ASP.NET Core is highly customizable, and that includes the controller discovery process.

 

Test Isolation

In order to have highly isolated integration tests, we need to discover only the specific controllers used by each test. This is possible by configuring another MVC component, Microsoft.AspNetCore.Mvc.ApplicationParts.ApplicationPartManager. Let’s see:
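A sketch of one way to do it: a ControllerFeatureProvider that only reports explicitly listed controllers, plugged in via ConfigureApplicationPartManager (the extension-method and class names are mine):

public class SelectedControllersFeatureProvider : ControllerFeatureProvider
{
    private readonly HashSet<TypeInfo> _allowed;

    public SelectedControllersFeatureProvider(params Type[] controllerTypes)
    {
        _allowed = new HashSet<TypeInfo>(controllerTypes.Select(t => t.GetTypeInfo()));
    }

    protected override bool IsController(TypeInfo typeInfo) => _allowed.Contains(typeInfo);
}

public static class MvcBuilderTestExtensions
{
    public static IMvcBuilder UseOnlyControllers(this IMvcBuilder builder, params Type[] controllerTypes)
    {
        return builder.ConfigureApplicationPartManager(manager =>
        {
            // replace the default controller discovery with the explicit list
            var defaultProvider = manager.FeatureProviders.OfType<ControllerFeatureProvider>().FirstOrDefault();
            if (defaultProvider != null)
            {
                manager.FeatureProviders.Remove(defaultProvider);
            }
            manager.FeatureProviders.Add(new SelectedControllersFeatureProvider(controllerTypes));
        });
    }
}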

I created it as an extension method so I can use it from the tests.

Final version

Now, the integration test is fully isolated and contains only controllers needed for the test.

 

Summary

As usual, the whole sample is located at https://github.com/kadukf/blog.msdn/tree/master/.NET/ControllerFeatureProvider

Happy coding!

CosmosDB change feed support for manual checkpoint

It has been quite a long time since I wrote the last post about the change feed processor. Let’s continue the series. This time, let’s look into the manual checkpoint feature.

Basics

Let’s refresh the knowledge. The CosmosDB change feed processor SDK is a library for dispatching document-related changes (inserts, updates) as a stream of documents. The documents can be split into multiple partitions (determined by the partition key). The change feed processor is able to read the changes from all partitions, and the reading is executed in batches. In order to move forward with reading the documents, the change feed processor needs to track the last processed batch. The tracking takes the form of storing the continuation token from the last processed batch – the so-called “checkpoint” step. The checkpoints are kept in the lease documents.

Automatic checkpoint

By default, the change feed processor checkpoints the processed batches automatically. Updating the lease document takes some time and costs some request units (RUs). In order to fine-tune it, it’s possible to set the frequency of this process, as sketched below.
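For example, a sketch using the options the v2 library exposes (the numbers are arbitrary):

var options = new ChangeFeedProcessorOptions
{
    // checkpoint roughly every 50 processed documents or every minute
    CheckpointFrequency = new CheckpointFrequency
    {
        ProcessedDocumentCount = 50,
        TimeInterval = TimeSpan.FromMinutes(1)
    }
};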

Let’s look at the change feed processing logic for one partition in the pseudo steps:

  1. read the change feed from the last known “continuation token”
  2. dispatch the documents for processing by calling IObserver.ProcessChangesAsync
  3. wait for the observer to process whole batch
  4. if it’s time to checkpoint (based on the configured checkpoint frequency), update the lease document with the continuation token from the last processed batch of documents. That’s so called “checkpoint” step.
  5. repeat from step 1

The whole process is shown on the following picture:

In most cases, the above algorithm works without issues. But do you see a possible problem?

Stuck partition problem

It’s step 3. The automatic checkpoint algorithm waits until the whole batch is processed and only then checkpoints. While it is waiting, no other documents are read and hence none are processed. It could happen that, e.g. in a batch with 10 documents, there is one document whose processing takes several minutes. In such a case, the whole change feed partition processing is “paused” and you can’t do anything about it. Such a situation can be easily detected when the change feed estimator is used (for more information see my previous post). See the following graph showing the remaining work on such a stuck partition:

I simulated that the processing of some document takes several hours. This stopped reading from the change feed (the number of polling reads per second dropped to zero) and the remaining work started to grow. If there were a defined SLA stating that events are scheduled for handling within 2 seconds from the time they were inserted into the collection, the system would not be able to meet it.

Solution – manual checkpoint

The solution is to use the manual checkpoint. It’s possible to turn off the automatic checkpoint process. See the following code:
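A sketch of the relevant option, assuming the v2 library API (CheckpointFrequency.ExplicitCheckpoint):

var options = new ChangeFeedProcessorOptions
{
    // the library will no longer checkpoint on its own; the observer has to call
    // IChangeFeedObserverContext.CheckpointAsync() explicitly
    CheckpointFrequency = new CheckpointFrequency { ExplicitCheckpoint = true }
};

The options are then handed to the builder via WithProcessorOptions(options).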

By doing this, the resulting lease document is never automatically updated. See the following:

That’s not good. After a process restart, the change feed would start back from the beginning. We need to checkpoint manually. That’s possible using the CheckpointAsync method exposed on the IChangeFeedObserverContext interface. But when? We cannot wait for the whole batch to be processed and only then call CheckpointAsync. A simple solution could look like this:

  1. read the change feed from the last known “continuation token”
  2. dispatch the documents for processing by calling IObserver.ProcessChangesAsync
  3. schedule the processing of all documents; this creates as many Tasks as there are documents,
  4. register the scheduled Tasks and the IChangeFeedObserverContext instance into a manual periodic checkpointer,
  5. repeat from step 1

See the sample code of the observer:
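The linked sample is more elaborate; the following is a simplified, self-contained sketch of the same idea, assuming the v2 library interfaces (IChangeFeedObserver, IChangeFeedObserverContext.CheckpointAsync). The document processing itself is faked with a delay:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.ChangeFeedProcessor.FeedProcessing;

public class ManualCheckpointObserver : IChangeFeedObserver
{
    private readonly ConcurrentQueue<(Task Batch, IChangeFeedObserverContext Context)> _pending
        = new ConcurrentQueue<(Task, IChangeFeedObserverContext)>();
    private CancellationTokenSource _cts;
    private Task _checkpointLoop;

    public Task OpenAsync(IChangeFeedObserverContext context)
    {
        // start the periodic checkpointer together with the observer
        _cts = new CancellationTokenSource();
        _checkpointLoop = Task.Run(() => CheckpointLoopAsync(_cts.Token));
        return Task.CompletedTask;
    }

    public async Task CloseAsync(IChangeFeedObserverContext context, ChangeFeedObserverCloseReason reason)
    {
        _cts.Cancel();
        try { await _checkpointLoop; } catch (OperationCanceledException) { }
    }

    public Task ProcessChangesAsync(IChangeFeedObserverContext context, IReadOnlyList<Document> docs, CancellationToken cancellationToken)
    {
        // schedule the whole batch and return immediately so that reading can continue
        Task batch = Task.WhenAll(docs.Select(ProcessDocumentAsync));
        _pending.Enqueue((batch, context));
        return Task.CompletedTask;
    }

    private async Task CheckpointLoopAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            await Task.Delay(TimeSpan.FromSeconds(10), token);

            // checkpoint batches in order, up to the last fully processed one
            while (_pending.TryPeek(out var entry) && entry.Batch.IsCompleted)
            {
                _pending.TryDequeue(out entry);
                await entry.Context.CheckpointAsync();
            }
        }
    }

    private Task ProcessDocumentAsync(Document document) => Task.Delay(100); // simulated work
}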

The code above is for demonstration purposes. It integrates the periodic partition checkpointer into the observer. The checkpointer is started when the observer is opened and stopped when the observer is closed. In between, when the batches are dispatched for processing, the processing tasks are scheduled and the whole batch is enqueued into the checkpointer. The checkpointer then periodically checks for the last processed batch in the queue. For a running example check the sample here.

After running the sample, the result is:

And the console would look like:

Pros and cons of the solution

The above solution solves the issue with the stuck partition, which is great (advantage). The change feed processor keeps reading changes while the previously read documents are still being processed. But if reading documents and scheduling the work is much faster than the actual processing, the instance can become overloaded (disadvantage). It’s necessary to limit the number of concurrent document processings, e.g. using a semaphore.
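For example, a simple throttle around the per-document processing from the sketch above (the limit of 32 is arbitrary):

private static readonly SemaphoreSlim Throttle = new SemaphoreSlim(32); // at most 32 documents in flight

private async Task ProcessDocumentAsync(Document document)
{
    await Throttle.WaitAsync();
    try
    {
        await Task.Delay(100); // the actual processing goes here
    }
    finally
    {
        Throttle.Release();
    }
}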

That’s all for now. In the future posts I’ll describe the pitfalls of the change feed processing in general.

CosmosDB change feed processor 2.1 released (monitoring enhancements)

There is a new version of the CosmosDB change feed processor library released, version 2.1, and the two main enhancements are:

  1. Upgrade to Microsoft.Azure.DocumentDB 2.0 nuget package
  2. Extensions to the monitoring

 

Microsoft.Azure.DocumentDB upgrade

The change feed library is now built against the latest SDK 2.0. The new SDK version brings several improvements, and one of the most important changes is the new multiplexing connection model. More about it in a future blog post.

 

Extensions to the monitoring

When you take change feed processing seriously (I mean, production ready), you need to be sure that the processing of the feed is working as expected. You need to be sure that “the feed is flowing”, which means the reading is in progress and, as soon as there is a document change (insert, replace), the change feed processor receives it.

 

I wrote about the monitoring in my previous post. The whole monitoring is built on the so-called “remaining work estimator”, which estimates the number of documents remaining until the end of the feed. An improvement was introduced into the library.

It’s possible to get the estimation per partition now! Why does it matter? Because it gives you better visibility into the system. You are able to see which partition is left behind and by how much.

Let’s see how to create the remaining work estimator instance:
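Roughly like this (account, key and collection names are placeholders; the builder members come from the 2.x library):

var feedCollectionInfo = new DocumentCollectionInfo
{
    Uri = new Uri("https://account.documents.azure.com:443/"),
    MasterKey = "<key>",
    DatabaseName = "db",
    CollectionName = "data"
};
var leaseCollectionInfo = new DocumentCollectionInfo
{
    Uri = new Uri("https://account.documents.azure.com:443/"),
    MasterKey = "<key>",
    DatabaseName = "db",
    CollectionName = "leases"
};

IRemainingWorkEstimator estimator = await new ChangeFeedProcessorBuilder()
    .WithHostName("estimator-host")
    .WithFeedCollection(feedCollectionInfo)
    .WithLeaseCollection(leaseCollectionInfo)
    .BuildEstimatorAsync();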

And let’s see how to use it:
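Something along these lines; the per-partition call is the 2.1 addition, and the exact type and member names below are written from memory, so treat them as an approximation:

while (true)
{
    var remainingWork = await estimator.GetEstimatedRemainingWorkPerPartitionAsync();

    foreach (var partitionWork in remainingWork)
    {
        Console.WriteLine($"partition {partitionWork.PartitionKeyRangeId}: {partitionWork.RemainingWork} document(s) behind");
    }

    await Task.Delay(TimeSpan.FromSeconds(5));
}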

And that’s the result:

 

 

Real life scenario

I’m working on an event sourcing system powered by CosmosDB (I’ll be writing about it in future posts) and we are heavily dependent on the change feed. We need to be sure the system works 24/7, and we need to be prepared for failures also when using the change feed, in this case when a partition’s processing is stuck. So, we are monitoring the change feed on several levels.

 

 

Who processes what

We monitor which process consumes which partition; in other words, we are able to say which process is consuming which partitions. In practice, we record the following metric dimensions: data center, role, instance id, database account, partitions.

 

What’s the estimated work left to process

We have a runner which runs periodically and reports the estimated work left per account/partition. That’s the input for the graphs and alerting. It is built on top of the estimator shown in this post. If any estimated work hits a limit, it means we have a problem with a “stuck” partition.

 

See the graph of the simulation from our test environment:

 

The red dotted line is the alert level. Once we are alerted, we are able to see which partitions are stuck. Because we record who processes what, we are able to find the instance which was processing the partition last and diagnose the issue.

 

That’s all for now, happy monitoring!

 

 

Previous posts:

  1. New Azure CosmosDB change feed processor released!
  2. Logging in CosmosDB change feed processor
  3. CosmosDB change feed monitoring
  4. CosmosDB change feed processor – leases and work balancing strategy

CosmosDB change feed processor – leases and work balancing strategy

Work distribution is one of the core change feed processor features. The purpose of the change feed processor is to receive and dispatch all the documents from all partitions of the feed collection for further processing. In case of high load, it’s necessary to scale out the number of processor instances so that the documents are processed within the desired time SLA.

As you can see, the change feed processing system has to deal with this. The system has to distribute the work between instances somehow. That’s the aim of the so-called “partition balancing strategy”. The change feed processor library contains a default implementation of this strategy, EqualPartitionsBalancingStrategy.

 

 

EqualPartitionsBalancingStrategy

This strategy tries to distribute the work evenly between all active hosts. The algorithm is based on the lease documents.

Just a recap (taken from my first post in this series). There are two collections in the game:

  • Data collection – also known as the feed collection. It’s the collection of data over which the change feed is processed. It can be created as a partitioned collection so that the data is split into separate physical locations.
  • Lease collection – it’s the collection where the change feed processor keeps the pointer to the last processed document per partition.

 

Let’s see an example of such lease document:

{
  "id": "account.documents.azure.com_EH9XAA==_EH9XAP7EOwA=..2",
  "PartitionId": "2",
  "Owner": "unique_owner_id",
  "ContinuationToken": "\"2468759\"",
  "properties": {},
  "timestamp": "2018-08-01T20:43:54.9476294Z"
}


There will be as many lease documents created as there are partitions in the feed collection.

Id

Each lease document has “id” which is in the following form:

{prefix}{feed database account}_{feed database resource id}_{collection resource id}..{partition range ID}

The prefix is used to differentiate change feed processor groups if they use the same lease collection. The rest of the parameters in the “id” are self-explanatory.

 

Owner

That’s a string which uniquely identifies the processor instance. Make sure this string is unique per processor instance; otherwise the same documents will be processed as many times as you have processor instances. We name it the following way: {deployment id}_{processor name}_{instance id}

 

Continuation token

That’s the pointer to the last successfully processed document change on the feed.

 

Timestamp

That’s the epoch timestamp of the last change feed checkpoint time or heartbeat time. This timestamp is used by the work balancing strategy to determine whether the lease is expired or not.

If the lease is expired, then it’s available to be taken over by any change feed processor instance.

If the lease is not expired, it means that the instance is active and the feed is processed.

The aim of the strategy is to distribute the work evenly between all active instances.

 

Properties

That’s a free-form dictionary of string to string. The properties are the way strategy executions communicate with each other. It gives you the possibility to create more advanced strategies, e.g. when shifting the load between “A/B” deployments.

The properties can be manipulated by the strategy execution and then they will be persisted.

 

Configuration – Lease expiration interval

The strategy is configured by ChangeFeedProcessorOptions.LeaseExpirationInterval. It’s the interval for which the lease is taken by a particular owner, and it also determines how often the strategy is executed.
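For illustration, these are the lease-related knobs on the processor options (the values are examples):

var options = new ChangeFeedProcessorOptions
{
    LeaseExpirationInterval = TimeSpan.FromSeconds(60), // an un-renewed lease is considered expired after this interval
    LeaseRenewInterval = TimeSpan.FromSeconds(17),      // how often the owner renews its leases
    LeaseAcquireInterval = TimeSpan.FromSeconds(13)     // how often the strategy looks for leases to acquire
};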

So, if the lease is not renewed or checkpoint-ed within this interval, it will be identified as expired and ownership of the partition will move to another change feed processor instance.

If the processor is stopped, the lease is freed-up. It means that the partition won’t be processed until some processor instance takes it (which depends on lease expiration interval).

There are special situations when you need to enhance the default work balancing strategy. Imagine you stop an instance. The partitions processed by the stopped instance are not processed until some other instance takes them over. This “gap” depends on the expiration interval. So, if you have a strictly time-limited SLA, you may be required to write a custom strategy leveraging the properties. Ping me if you need more information about it.

Bear in mind, this is a really rare condition and in almost all cases you will be more than happy with the default strategy. We run quite a heavy system using the change feed processor with the default strategy without issues.

 

Conclusion, consequences

In case a change feed processor instance crashes and then starts again within the lease expiration interval, it takes back all its previously owned leases immediately after start.

In case the instance is gracefully stopped, the owner is reset, which means the lease is free to be owned by another change feed processor instance.

In case the number of change feed processor instances is scaled out, the new ideal work balance is calculated based on the new set of active instances. The new instance(s) will try to take ownership of partitions one at a time per execution cycle, which is configured by the lease expiration interval.

Keep in mind that the leases are stored in the lease collection. The collection has its own throughput (== money). In case you set a low expiration interval, you can end up with throttling issues, because the leases are scanned once per lease expiration interval by each instance. This can cause the leases not to be taken for a longer time, so their processing will be delayed.

 

Previous posts:

  1. New Azure CosmosDB change feed processor released!
  2. Logging in CosmosDB change feed processor
  3. CosmosDB change feed monitoring