Monitoring, Tracing and Instrumentation

Application Performance Monitoring in Elastic Stack

Elastic APM Server is a separate component that takes input from APM agents and writes the data to Elasticsearch (see the guide).


Elastic APM Agent for .NET is a library that runs the APM agent in-process, so there is no need to install an agent on the host machine.

You might argue that nobody does "installs" nowadays and everything is built on containers, so we can just pull the Docker image
docker pull docker.elastic.co/apm/apm-server:7.0.1
or take a Helm chart.
I would agree. Running APM in-process and running it out-of-process result in different architectures, each with its pros and cons, which we will talk about in another post.

Apm-agent-dotnet writes data to APM Server, and not into Elasticsearch directly.
Apm-agent-dotnet uses two mechanisms to provide automatic instrumentation:
  1. ASP.NET Core middleware and in particular ApmMiddleware
  2. DiagnosticSource and in particular AspNetCoreDiagnosticListener, EfCoreDiagnosticListener, and HttpDiagnosticListener  (search https://github.com/elastic/apm-agent-dotnet for these classes)
The main purpose of apm-agent-dotnet here is to
Having that specific data will allow the APM section in Kibana to show a number of default visualizations, including a distributed tracing chart.
Now that looks awesome, but we have to keep in mind that it's just a middleware plus diagnostic listeners for ASP.NET and EF Core, so it does not cover some frequently encountered real-life scenarios:

  1. You are not using HTTP for communication between services (any brokered messaging product that talks AMQP, like RabbitMQ, MassTransit or Azure Service Bus, or WCF; really anything except HTTP)
  2. You are running resource-intensive operations inside one business transaction that consists of several distinct steps, and you would like visibility into each step separately
  3. You're not using EF Core, but perhaps Dapper or NHibernate
  4. And many more...

So, there are plenty of reasons to be interested in providing custom metrics and data.
Is that easy to do?
Is it even possible without wrapping everything in Transactions and usings, etc?
Can we "simply" hook up using DiagnosticSource and Activity classes, a standard mechanism in .NET?

I've raised a question about that with the Elastic team, and a GitHub issue was created.
So, it might be added eventually, or maybe not, but what if you need it, like now?

We can take inspiration from what apm-agent-dotnet does internally. Let's look at HttpDiagnosticListenerImplBase, specifically at OnNext and ProcessStartEvent.

We can use a similar approach and call the public StartSpan method; this will automatically send the span data to Elastic APM Server.

That will do the job, if we are ready to instrument our code with DiagnosticSource (firm yes!) and also to map DiagnosticSource and Activity manually to Transactions and Spans (vague eh..)
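To make the "manual mapping" idea concrete, here is a minimal sketch of an observer that turns Activity start/stop events into span start/end calls. The names ActivityToSpanObserver and MappingDemo are hypothetical, and the span callback is a stand-in that only records what happened. With the Elastic agent, that callback is where you would presumably call StartSpan on the current transaction and End on the returned span; that wiring is an assumption and is not shown here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical adapter: maps "<Operation>.Start" / "<Operation>.Stop" events to spans.
public sealed class ActivityToSpanObserver : IObserver<KeyValuePair<string, object>>
{
    // Starts a span for an activity and returns the callback that ends it.
    private readonly Func<Activity, Action> _startSpan;

    // Open spans, keyed by Activity.Id (which is assigned when the activity starts).
    private readonly ConcurrentDictionary<string, Action> _open =
        new ConcurrentDictionary<string, Action>();

    public ActivityToSpanObserver(Func<Activity, Action> startSpan) => _startSpan = startSpan;

    public void OnNext(KeyValuePair<string, object> evt)
    {
        Activity activity = Activity.Current;
        if (activity == null) return; // event was written outside of any activity

        if (evt.Key.EndsWith(".Start", StringComparison.Ordinal))
        {
            _open[activity.Id] = _startSpan(activity);
        }
        else if (evt.Key.EndsWith(".Stop", StringComparison.Ordinal)
                 && _open.TryRemove(activity.Id, out Action endSpan))
        {
            endSpan();
        }
    }

    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

public static class MappingDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();

        // Stand-in for the APM agent (assumption): record instead of sending to APM Server.
        var observer = new ActivityToSpanObserver(activity =>
        {
            log.Add("start:" + activity.OperationName);
            return () => log.Add("end:" + activity.OperationName);
        });

        using (var listener = new DiagnosticListener("Demo.Orders"))
        using (listener.Subscribe(observer))
        {
            var activity = new Activity("ProcessOrder");
            listener.StartActivity(activity, null); // publishes "ProcessOrder.Start"
            listener.StopActivity(activity, null);  // publishes "ProcessOrder.Stop"
        }

        return log;
    }
}
```

The sketch keys open spans by Activity.Id rather than by event name, so nested or concurrent activities with the same operation name do not end each other's spans.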

Hard truth is: no one will do the instrumentation for us.
But, maybe someone can do the mapping?

Elastic APM supports an OpenTracing bridge (OpenTracing as in the OpenTracing project), so what can we learn from their GitHub and from the OpenTracing API Contributions GitHub?

The docs are somewhat brief, so let's dig into the code, starting with
services.AddOpenTracing();
Looking into it, we can see that it calls .AddCoreFx(), which in turn calls builder.AddDiagnosticSubscriber<GenericDiagnostics>();
The comments for GenericDiagnostics say
    /// <summary>
    /// A <see cref="DiagnosticListener"/> subscriber that logs ALL events to <see cref="ITracer.ActiveSpan"/>.
    /// </summary>
That's exactly what we need, so let's look at how it works under the hood.
First off, GenericDiagnostics uses a GenericDiagnosticsSubscription, so let's look at how it implements IObserver
Obviously, GenericEventProcessor is the next place we look:
So, the Activity class is used here, which is a good thing, but notice that only Activity.Tags are carried over, and Activity.Baggage is ignored. The next piece of sad news is that, as you might have noticed, object untypedArg is not used at all, and this is the context that was passed to diagnosticSource.StartActivity(activity, context)
(Internally, DiagnosticSource.StartActivity calls Write, which is implemented in the derived class DiagnosticListener, and Write calls OnNext on all subscriptions; this is the standard "publishing" mechanism)
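To see what a subscriber actually has at its disposal, here is a small sketch that uses only System.Diagnostics. RecordingObserver and PublishingDemo are hypothetical names made up for this illustration; the listener name, tag and baggage values are arbitrary. It shows that StartActivity and StopActivity publish "<OperationName>.Start" and "<OperationName>.Stop" events, that the context object arrives as the event value, and that both tags and baggage are readable from Activity.Current.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Records everything a subscriber can observe for each published event.
public sealed class RecordingObserver : IObserver<KeyValuePair<string, object>>
{
    public List<string> Seen { get; } = new List<string>();

    public void OnNext(KeyValuePair<string, object> evt)
    {
        Activity activity = Activity.Current;
        if (activity == null) return; // event was written outside of any activity

        string tags = string.Join(",", activity.Tags.Select(t => t.Key + "=" + t.Value));
        string baggage = string.Join(",", activity.Baggage.Select(b => b.Key + "=" + b.Value));
        bool hasPayload = evt.Value != null; // the context passed to StartActivity/StopActivity

        Seen.Add($"{evt.Key}|tags:{tags}|baggage:{baggage}|payload:{hasPayload}");
    }

    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

public static class PublishingDemo
{
    public static List<string> Run()
    {
        var observer = new RecordingObserver();

        using (var listener = new DiagnosticListener("Demo.Orders"))
        using (listener.Subscribe(observer))
        {
            var activity = new Activity("ProcessOrder");
            activity.AddTag("order.kind", "express");
            activity.AddBaggage("tenant", "acme");

            // StartActivity starts the activity, then writes "ProcessOrder.Start".
            listener.StartActivity(activity, new { OrderId = 42 });
            // StopActivity writes "ProcessOrder.Stop", then stops the activity.
            listener.StopActivity(activity, new { OrderId = 42 });
        }

        return observer.Seen;
    }
}
```

A subscriber that copies only the tags and ignores both the event value and the baggage drops two of the three pieces of information recorded here.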
That does not look good.
Will it get fixed? Unlikely. At the time of writing, the last commits to that repo were back in 2018.
But what if we were to accept these issues (and maybe contribute to the project on GitHub later) and use the package anyway?
How can we get this info to Elastic stack?

Looks like we would have to harness the Events API of the Elastic Stack.
We could try to utilize NEST and Elasticsearch.Net, but do they cover the Events API?
That seems like a long shot. I'd say it's not worth it.

Well then, how about Azure Diagnostics EventFlow?
This one has an ElasticSearch sink and a DiagnosticSource listener.
I could ramble on about what's inside, but I'll cut straight to the chase: the ElasticSearch sink will write data to Elasticsearch, not to Elastic APM, so you won't see your data "out of the box" in the Kibana APM section.

So, how do we go about this? Is there really no way to ship telemetry from already instrumented code to some SaaS?
Stay tuned, next time we will look into Azure Application Insights.
