
Serilog with Application Insights: Correlating logs with other telemetry by using Operation Id

Despite the odds, Serilog and Application Insights are a fairly common combination. Let's dive in and find out whether such a partnership is justified.

Introduction

Serilog is an extremely popular structured logging library, with a powerful DSL and serialization features. The benefits of structured logging are well known and widely appreciated, so if you're not convinced yet, do spend some time reading up on the topic.

Application Insights is a very popular APM SaaS offering from Microsoft on Azure, especially in the .NET world.

Motivation

Now, you might wonder - why put those two together? After all, not all great tech plays together well.
And I completely agree with that.

In fact, when starting a greenfield application, Elasticsearch seems to be the better choice for storing and searching structured log data; one of the obvious benefits is the speed of its data ingestion pipeline. Logs are only part of the story, however. When we look at APM in general, Elastic APM, specifically for .NET, is somewhat behind Application Insights.

Ok, but why not simply use ILogger from Microsoft.Extensions.Logging and Application Insights? ILogger supports structured logging by allowing you to pass arbitrary generic state, as can be seen from the Log method. Indeed, this combination of tools seems to be a proper choice, and I would recommend it for any new project.
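For illustration, a minimal sketch of what that looks like in practice (the OrderService class and the property names here are purely hypothetical):

    using Microsoft.Extensions.Logging;

    public class OrderService
    {
        private readonly ILogger<OrderService> _logger;

        public OrderService(ILogger<OrderService> logger)
        {
            _logger = logger;
        }

        public void Ship(int orderId, string carrier)
        {
            // {OrderId} and {Carrier} are captured as structured properties,
            // not just interpolated into the message text
            _logger.LogInformation("Shipping order {OrderId} via {Carrier}", orderId, carrier);
        }
    }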

Still, you might find yourself in a situation where you have both App Insights and Serilog, for example when

  • you are introducing APM to an application that is already using Serilog
  • your application is using Serilog and Elastic Search for logs, and you are adding APM using Application Insights

So sometimes you need to use Serilog and Application Insights together; let's find out how.

Hold on, why should I read a whole article about simply installing a sink?
Can't I just install a Serilog sink for Application Insights and be "done" with it?

Well, as usual, it depends on your definition of "done".
If you simply need logs to appear in App Insights and Log Analytics, then yes - simply installing a sink (as sketched below) will be enough.
However, if you want your logs to be correlated with the other telemetry (by means of Operation Id), then you do need to jump through some hoops to set it up.
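For reference, such a sink-only setup could look roughly like this, assuming the Serilog.Sinks.ApplicationInsights package (configuration details, such as the instrumentation key, are elided):

    using Microsoft.ApplicationInsights.Extensibility;
    using Serilog;

    // Picks up the instrumentation key / connection string
    // from the default configuration
    var telemetryConfiguration = TelemetryConfiguration.CreateDefault();

    Log.Logger = new LoggerConfiguration()
        .WriteTo.ApplicationInsights(telemetryConfiguration, TelemetryConverter.Traces)
        .CreateLogger();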

Prerequisites

Since we are trying to correlate telemetry, we need to understand how App Insights does it.
Familiarity with the HTTP Correlation Protocol proposal and with how the Hierarchical Request-Id format works is really helpful.
Also, because Activity plays a key role in propagating operation context within the code, it would be beneficial to get acquainted with the Activity User Guide.

Disclaimer

To limit the scope of this article, keep things modular, and allow an incremental approach, some of the "under the hood" implementation details are moved to another post.
All the code samples you need to make things work are provided here, but the code samples from the corefx libraries that explain why things work are extracted to a separate article.
A link will follow.
I would like you to actually finish reading this article before you grow old and retire!

Assumptions

  • You have enabled Application Insights for your application
  • Your application is a web service
  • You are running ASP.NET Core 2.0 or newer
  • The business operation for which we would like both logs and telemetry correlated is triggered from an HTTP endpoint (so that Application Insights can hook into the ASP.NET Core diagnostic events and create a Request Telemetry)

Implementation

Quite obviously, using an existing Serilog sink for Application Insights makes for a good choice.
Under the assumptions above we are guaranteed to be in the scope of a business transaction, within an Application Insights Request, and to have a non-null current Activity, with the Activity.Id matching the Operation Id (see Access to Operation Id).

Log Event Enricher

When logging something in the scope of a business transaction, we need to make sure we add the current Activity.Id (and possibly Activity.RootId) to the log event.
In Serilog lingo this technique is called enrichment, and it is widely used. I bet you've used some of the standard enrichers, like the thread or process ones, on a project before.
Conceptually, this is as simple as adding some extra information to the event data.

So, first we create the enricher itself, called OperationIdEnricher, as sketched below.
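A minimal sketch of such an enricher (the property names "Operation Id" and "Parent Id" are my choice here, and must match whatever the telemetry converter below looks up):

    using System.Diagnostics;
    using Serilog.Core;
    using Serilog.Events;

    public class OperationIdEnricher : ILogEventEnricher
    {
        public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
        {
            var activity = Activity.Current;
            if (activity == null)
                return;

            // Activity.RootId identifies the whole business transaction,
            // Activity.ParentId the operation that caused the current one
            logEvent.AddPropertyIfAbsent(new LogEventProperty("Operation Id", new ScalarValue(activity.RootId)));
            logEvent.AddPropertyIfAbsent(new LogEventProperty("Parent Id", new ScalarValue(activity.ParentId)));
        }
    }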
Then we create an extension method to register it, calling it WithOperationId to align with existing conventions.
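Something along these lines should do:

    using Serilog;
    using Serilog.Configuration;

    public static class LoggerEnrichmentConfigurationExtensions
    {
        public static LoggerConfiguration WithOperationId(this LoggerEnrichmentConfiguration enrichmentConfiguration)
        {
            // Registers the enricher with the logger configuration
            return enrichmentConfiguration.With<OperationIdEnricher>();
        }
    }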
And lastly, we add this enricher (WithOperationId) to the config file.
Please note that we have to supply the name of the assembly that holds the WithOperationId extension method inside the "Using" section.
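The relevant fragment of the config could look like this (MyCompany.Logging stands in for whatever assembly holds your enricher and converter; note the telemetry converter, which is covered next):

    {
      "Serilog": {
        "Using": [ "MyCompany.Logging" ],
        "Enrich": [ "WithOperationId" ],
        "WriteTo": [
          {
            "Name": "ApplicationInsights",
            "Args": {
              "telemetryConverter": "MyCompany.Logging.OperationTelemetryConverter, MyCompany.Logging"
            }
          }
        ]
      }
    }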

Telemetry Converter

You have probably spotted OperationTelemetryConverter in the config above. That is not by accident.
As described in the docs for the Serilog sink for Application Insights, all custom properties are logged as custom dimensions. So an event enricher alone is not enough: we would just get a custom dimension called "Operation Id" that has nothing to do with the Operation Id used for distributed tracing and telemetry correlation.
This is where a custom telemetry converter comes in handy (and, by the way, the docs above give a fairly decent example).
Please be aware that casting to ScalarValue and extracting the underlying value is necessary. Otherwise, LogEventPropertyValue does not know the type of your value; it cannot assume it's a scalar (it might well be a collection), so its ToString() method will wrap the value in double quotes.
In the UI the value will look identical, but the events will simply not be correlated.
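A sketch of such a converter, derived from the sink's TraceTelemetryConverter (the exact namespace of the base class varies between sink versions; the property names mirror the enricher above):

    using System;
    using System.Collections.Generic;
    using Microsoft.ApplicationInsights.Channel;
    using Serilog.Events;
    using Serilog.Sinks.ApplicationInsights.Sinks.ApplicationInsights.TelemetryConverters;

    public class OperationTelemetryConverter : TraceTelemetryConverter
    {
        public const string OperationId = "Operation Id";
        public const string ParentId = "Parent Id";

        public override IEnumerable<ITelemetry> Convert(LogEvent logEvent, IFormatProvider formatProvider)
        {
            foreach (var telemetry in base.Convert(logEvent, formatProvider))
            {
                // Promote the enriched properties to the telemetry's operation
                // context, where App Insights expects them for correlation
                if (TryGetScalarProperty(logEvent, OperationId, out var operationId))
                    telemetry.Context.Operation.Id = operationId.ToString();

                if (TryGetScalarProperty(logEvent, ParentId, out var parentId))
                    telemetry.Context.Operation.ParentId = parentId.ToString();

                yield return telemetry;
            }
        }

        private static bool TryGetScalarProperty(LogEvent logEvent, string propertyName, out object value)
        {
            // Cast to ScalarValue and extract the raw value; calling ToString()
            // on the LogEventPropertyValue itself would wrap strings in quotes
            if (logEvent.Properties.TryGetValue(propertyName, out var propertyValue)
                && propertyValue is ScalarValue scalarValue
                && scalarValue.Value != null)
            {
                value = scalarValue.Value;
                return true;
            }

            value = null;
            return false;
        }
    }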

Conclusion

As we can see, while the initial investigation and setup might take some time, the end solution is quite simple and generic.
Stay tuned for upcoming posts that will dive into the internals to uncover how things work under the hood.
Also, the topic of distributed-tracing propagation through brokered messaging is coming right up!

Comments

  1. This is a really good article, but I'm not sure if it's accurate anymore.

    If you look at the operation_parentId values currently sent to my Application Insights for .NET Core 3.0, they look like this:
    |35c86dcf90b921419985976240b4540c.b7a4285dce1b2a46.
    and can be constructed like this:

    logEvent.AddPropertyIfAbsent(new LogEventProperty(OperationTelemetryConverter.OperationId, new ScalarValue(activity.TraceId)));
    if (activity.Parent != null)
    {
        // Like internal method in AI W3CUtilities.FormatTelemetryId
        var parentId = $"|{activity.TraceId}.{activity.Parent.SpanId}.";
        logEvent.AddPropertyIfAbsent(new LogEventProperty(OperationTelemetryConverter.OperationParentId, new ScalarValue(parentId)));
    }

    Or at least it seems that way.


