03-03-2020, 05:45 AM
ASP.NET Core Apps Observability
<div style="margin: 5px 5% 10px 5%;"><img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability.jpg" width="150" height="150" title="" alt="" /></div><div><div class="row justify-content-center">
<div class="col-md-4">
<div><img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability.jpg" width="58" height="58" alt="Francisco Beltrao" class="avatar avatar-58 wp-user-avatar wp-user-avatar-58 alignnone photo">
<p>Francisco</p>
</div>
</div>
</div>
<div class="entry-meta">
<p>February 26th, 2020</p>
</div>
<p><em>Thank you <a href="https://github.com/SergeyKanzhelev/" rel="noopener noreferrer" target="_blank">Sergey Kanzhelev</a> for the support and review of this ASP.NET Core Apps Observability article.</em></p>
<p>Modern software development practices value quick and continuous updates, following processes that minimize the impact of software failures. Finding out whether changes improve business value is as important as identifying bugs early. These practices can only work when a monitoring solution is in place. This article explores options for adding observability to .NET Core apps, collected from interactions with customers using .NET Core in different environments. We will look at the OpenTelemetry and Application Insights SDKs to add observability to a sample distributed application.</p>
<p>Identifying software errors and business impact requires a monitoring solution that can observe and report how the system and its users behave. The collected data must provide the information required to analyze and identify a bad update, answering questions such as:</p>
<ul>
<li>Are we observing more errors than before? </li>
<li>Were there new error types?</li>
<li>Did the request duration unexpectedly increase compared to previous versions?</li>
<li>Has the throughput (req/sec) decreased?</li>
<li>Has the CPU and/or Memory usage increased?</li>
<li>Were there changes in our KPIs? </li>
<li>Is it selling less than before? </li>
<li>Did our visitor count decrease?</li>
</ul>
<p>The impact of a bad system update can be minimized by combining the monitoring information with progressive deployment strategies such as canary, mirroring, rings, and blue/green.</p>
<h2>Observability is Built on 3 Pillars:</h2>
<ul>
<li>
<p><strong>Logging</strong>: collects information about events happening in the system, helping the team analyze unexpected application behavior. Searching through the logs of suspect services can provide the hint needed to identify the root cause of a problem, such as a service throwing out-of-memory exceptions, app configuration not reflecting expected values, calls to an external service using an incorrect address or returning unexpected results, and incoming requests with unexpected input.</p>
</li>
<li>
<p><strong>Tracing</strong>: collects information to create an end-to-end view of how transactions are executed in a distributed system. A trace is like a stack trace spanning multiple applications. Once a problem has been recognized, traces are a good starting point for identifying its source in distributed operations, for example when calls from service A to B are taking longer than normal or calls to the payment service are failing.</p>
</li>
<li>
<p><strong>Metrics</strong>: provide a real-time indication of how the system is running. They can be leveraged to build alerts, allowing a proactive reaction to unexpected values. As opposed to logs and traces, the amount of data collected using metrics remains constant as the system load increases. Application problems are often first detected through abnormal metric values, such as CPU usage being higher than before, the payment error count spiking, or the queued item count continuing to grow.</p>
</li>
</ul>
<h2>Adding Observability to a .NET Core Application</h2>
<p>There are many ways to add observability to an application. <a href="https://dapr.io/" rel="noopener noreferrer" target="_blank">Dapr</a>, for example, is a runtime for building distributed applications that <a href="https://github.com/dapr/docs/blob/master/concepts/distributed-tracing/README.md" rel="noopener noreferrer" target="_blank">transparently adds distributed tracing</a>. Another option is to use a service mesh in Kubernetes (<a href="https://istio.io/docs/tasks/observability/distributed-tracing/overview/" rel="noopener noreferrer" target="_blank">Istio</a>, <a href="https://linkerd.io/2/features/distributed-tracing/" rel="noopener noreferrer" target="_blank">Linkerd</a>).</p>
<p>Built-in and transparent tracing typically covers basic scenarios and answers generic questions, such as observed request duration or CPU trends. Other questions, such as custom KPIs or user behavior, require adding instrumentation to your code.</p>
<p>To illustrate how observability can be added to a .NET Core application we will be using the following asynchronous distributed transaction example:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability.png" alt="Sample Observability Application Overview" width="484" height="239" class="alignnone size-full wp-image-23256"></a></p>
<ol>
<li><em>Main Api</em> receives a request from a “source”.</li>
<li><em>Main Api</em> enriches the request body with current day, obtained from <em>Time Api</em>.</li>
<li><em>Main Api</em> enqueues enriched request to a RabbitMQ queue for asynchronous processing.</li>
<li><em>RabbitMQProcessor</em> dequeues request.</li>
<li><em>RabbitMQProcessor</em>, as part of the request processing, calls <em>Time Api</em> to get dbtime.</li>
<li><em>Time Api</em> calls SQL Server to get current time.</li>
</ol>
<p>To run the sample application locally (including dependencies and observability tools), follow this <a href="https://github.com/Azure-Samples/application-insights-aspnet-sample-opentelemetry#setup---quickstart-with-docker-compose" rel="noopener noreferrer" target="_blank">guide</a>. The article walks through adding each observability pillar (logging, tracing, metrics) to the sample asynchronous distributed transaction.</p>
<p>Note: for information on bootstrapping OpenTelemetry or Application Insights SDK please refer to the documentation: <a href="https://github.com/open-telemetry/opentelemetry-dotnet" rel="noopener noreferrer" target="_blank">OpenTelemetry</a> and <a href="https://docs.microsoft.com/azure/azure-monitor/app/asp-net-core" rel="noopener noreferrer" target="_blank">Application Insights</a>.</p>
<h2>Logging</h2>
<p>Logging was redesigned in .NET Core, bringing an integrated and extensible API. <a href="https://docs.microsoft.com/aspnet/core/fundamentals/logging/#built-in-logging-providers" rel="noopener noreferrer" target="_blank">Built-in</a> and <a href="https://docs.microsoft.com/aspnet/core/fundamentals/logging/#third-party-logging-providers" rel="noopener noreferrer" target="_blank">external logging providers</a> allow the collection of logs in multiple formats and targets. When choosing a logging platform, consider the following features:</p>
<ul>
<li>Centralized: allowing the collection/storage of all system logs in a central location.</li>
<li>Structured logging: allows you to add searchable metadata to logs. </li>
<li>Searchable: allows searching by multiple criteria (app version, date, category, level, text, metadata, etc.) </li>
<li>Configurable: allows changing verbosity without code changes (based on log level and/or scope). </li>
<li>Integrated: integrated into tracing, facilitating analysis of traces and logs in the same tool. </li>
</ul>
<p>The sample application uses the ILogger interface for logging. The snippet below demonstrates structured logging, which captures events using a <a href="https://messagetemplates.org/" rel="noopener noreferrer" target="_blank">message template</a> and generates information that is both human and machine readable.</p>
<pre class="prettyprint">var result = await repository.GetTimeFromSqlAsync();
logger.LogInformation("{operation} result is {result}", nameof(repository.GetTimeFromSqlAsync), result);</pre>
<p>When using a logging backend that understands structured logs, such as Application Insights, you can search for instances of the example log item where “operation” is equal to “GetTimeFromSqlAsync”:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-1.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-1.png" alt="Observability Application Insights structured log search" width="640" height="321" class="alignnone size-large wp-image-23258"></a> </p>
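<p>Searchable metadata can also be attached to a whole block of log entries through logging scopes. The following is a minimal sketch, assuming the Microsoft.Extensions.Logging and Microsoft.Extensions.Logging.Console packages; the class, the method and the property names (<em>Source</em>, <em>MessageId</em>) are hypothetical and not taken from the sample application. Whether scope properties become searchable depends on the logging provider and its configuration.</p>
<pre class="prettyprint">using System.Collections.Generic;
using Microsoft.Extensions.Logging;

public static class ScopedLoggingSketch
{
    // Hypothetical helper: logs the processing of one queued item.
    public static void ProcessItem(ILogger logger, string source, string messageId)
    {
        // Log entries written inside the scope carry these properties,
        // provided the configured provider captures scopes.
        using (logger.BeginScope(new Dictionary&lt;string, object&gt;
        {
            ["Source"] = source,
            ["MessageId"] = messageId,
        }))
        {
            logger.LogInformation("Processing started");
            logger.LogInformation("{operation} finished with {result}", nameof(ProcessItem), "ok");
        }
    }

    public static void Main()
    {
        // The console provider is used only to keep the sketch self-contained.
        using var loggerFactory = LoggerFactory.Create(builder =&gt; builder.AddConsole());
        var logger = loggerFactory.CreateLogger("Sample");
        ProcessItem(logger, "web", "42");
    }
}
</pre>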
<h2>Tracing</h2>
<p>Tracing collects the information required to observe a transaction as it “walks” through the system. To be effective, it must be implemented in every service taking part in the transaction.</p>
<p>.NET Core defines a common way to create traces through the System.Diagnostics.Activity class. By using this class, dependency implementations (e.g. HTTP, SQL, Azure, EF Core, StackExchange.Redis) can create traces in a neutral way, independent of the monitoring tool used.</p>
<p>Note that those activities are not automatically available in a monitoring system. Publishing them is the responsibility of the monitoring SDK used. Typically, SDKs have built-in collectors for common activities, transferring them to the destination platform automatically.</p>
<p>In 2019, <a href="https://opentelemetry.io/" rel="noopener noreferrer" target="_blank">OpenTelemetry</a> was announced, promising to standardize telemetry instrumentation and collection across languages and tools. Before OpenTelemetry (and its predecessors OpenCensus and OpenTracing), adding observability would often mean adding proprietary SDKs, directly or indirectly, to the code base.</p>
<p>The OpenTelemetry .NET SDK is currently in alpha. The Azure Monitor Application Insights team is investing in OpenTelemetry as the next step in the evolution of the Azure Monitor SDKs.</p>
<h3>Quick Intro on Tracing with OpenTelemetry</h3>
<p>In a nutshell, <a href="https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/overview.md" rel="noopener noreferrer" target="_blank">OpenTelemetry</a> collects traces using spans. A span delimits an operation (HTTP request processing, dependency call). It contains a start and end time, among other properties. It has a unique identifier (SpanId, 16 hexadecimal characters, 8 bytes) and a trace identifier (TraceId, 32 hexadecimal characters, 16 bytes). The trace identifier is used to correlate all spans of a given transaction. A span can contain child spans (like calls in a stack trace). If you are familiar with Azure Application Insights, the following table might help you understand OpenTelemetry terms:</p>
<table>
<thead>
<tr>
<td> <strong>Application Insights</strong> </td>
<td> <strong>OpenTelemetry</strong> </td>
</tr>
</thead>
<tbody>
<tr>
<td> Request, PageView </td>
<td> Span with span.kind = server </td>
</tr>
<tr>
<td> Dependency </td>
<td> Span with span.kind = client </td>
</tr>
<tr>
<td> Id of Request and Dependency </td>
<td> SpanId </td>
</tr>
<tr>
<td> Operation_Id </td>
<td> TraceId </td>
</tr>
<tr>
<td> Operation_ParentId </td>
<td> ParentId </td>
</tr>
</tbody>
</table>
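<p>The identifiers in the table can be inspected directly on System.Diagnostics.Activity. The following is a minimal sketch using only the .NET base class library; the class, method and operation names are hypothetical. It starts a parent and a child activity in W3C format and prints the identifiers that a monitoring SDK would map to TraceId, SpanId and ParentId.</p>
<pre class="prettyprint">using System;
using System.Diagnostics;

public static class TraceIdentifiersSketch
{
    // Starts a parent activity and a child activity nested inside it.
    public static (Activity Parent, Activity Child) StartParentAndChild()
    {
        Activity.DefaultIdFormat = ActivityIdFormat.W3C;
        Activity.ForceDefaultIdFormat = true;

        // Comparable to a span with span.kind = server (an incoming request).
        var parent = new Activity("Incoming request").Start();
        // Started while the parent is current, so it is recorded as its child.
        var child = new Activity("Dependency call").Start();
        return (parent, child);
    }

    public static void Main()
    {
        var (parent, child) = StartParentAndChild();

        Console.WriteLine($"TraceId (shared): {child.TraceId.ToHexString()}");
        Console.WriteLine($"Parent SpanId:    {parent.SpanId.ToHexString()}");
        Console.WriteLine($"Child SpanId:     {child.SpanId.ToHexString()}");
        Console.WriteLine($"Child ParentId:   {child.ParentSpanId.ToHexString()}");
        // In W3C format the Id has the shape version-traceid-spanid-flags.
        Console.WriteLine($"Child Id:         {child.Id}");

        child.Stop();
        parent.Stop();
    }
}
</pre>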
<h2>Adding Tracing to a .NET Core Application</h2>
<p>As mentioned previously, an SDK is needed in order to collect and publish distributed tracing in a .NET Core application. Application Insights SDK sends traces to its centralized database while OpenTelemetry supports multiple exporters (including Application Insights). When configured to use OpenTelemetry, the sample application sends traces to a <a href="https://www.jaegertracing.io/" rel="noopener noreferrer" target="_blank">Jaeger</a> instance.</p>
<p>In the asynchronous distributed transaction scenario, track the following operations:</p>
<h4>HTTP Requests between microservices</h4>
<p>HTTP correlation propagation is part of both SDKs. The only requirement is setting the activity ID format to <a href="https://www.w3.org/TR/trace-context-1/" rel="noopener noreferrer" target="_blank">W3C</a> at application start:</p>
<pre class="prettyprint">public static void Main(string[] args)
{
    Activity.DefaultIdFormat = ActivityIdFormat.W3C;
    Activity.ForceDefaultIdFormat = true;
    // rest is omitted
}</pre>
<h4>Dependency calls (SQL, RabbitMQ)</h4>
<p>Unlike Application Insights SDK, OpenTelemetry (in early alpha) does not yet have support for SQL Server trace collection. A simple way to track dependencies with OpenTelemetry is to wrap the call like the following example:</p>
<pre class="prettyprint">var span = this.tracer.StartSpan("My external dependency", SpanKind.Client);
try
{
    return CallToMyDependency();
}
catch (Exception ex)
{
    span.Status = Status.Internal.WithDescription(ex.ToString());
    throw;
}
finally
{
    span.End();
}
</pre>
<h4>Asynchronous Processing / Queued Items</h4>
<p>There is no built-in trace correlation between publishing and processing a RabbitMQ message. Custom code is required to create the publishing activity (optional) and to reference the parent trace when the item is dequeued.</p>
<p>We previously covered creating traces by wrapping the dependency call. This option allows expressing additional semantic information, such as links between spans for batching and other fan-in patterns. Another option is to use System.Diagnostics.Activity, which is an SDK-independent way to create traces. It has a limited set of features but is built into .NET.</p>
<p>These two options work well with each other, and the .NET team is <a href="https://github.com/dotnet/corefx/issues/42307" rel="noopener noreferrer" target="_blank">working on making .NET Activity and OpenTelemetry spans integration better</a>.</p>
<h3>Creating an Operation Trace</h3>
<p>The snippet below demonstrates how the publish operation trace can be created. It adds the trace information to the enqueued message header, which will later be used to link both operations.</p>
<pre class="prettyprint">Activity activity = null;
if (diagnosticSource.IsEnabled("Sample.RabbitMQ"))
{
    // Generates the Publishing to RabbitMQ trace
    // Only generated if there is an actual listener
    activity = new Activity("Publish to RabbitMQ");
    diagnosticSource.StartActivity(activity, null);
}

// Add current activity identifier to the RabbitMQ message
basicProperties.Headers.Add("traceparent", Activity.Current.Id);

channel.BasicPublish(...)

if (activity != null)
{
    // Signal the end of the activity
    diagnosticSource.StopActivity(activity, null);
}</pre>
<p>A collector, which subscribes to target activities, is required to publish the trace to a backend. Implementing a collector is not a straightforward task and is mostly intended for SDK implementors. The snippet below is taken from the sample application, where a simplified, not production-ready RabbitMQ collector for OpenTelemetry was implemented:</p>
<pre class="prettyprint">public class RabbitMQListener : ListenerHandler
{
    public override void OnStartActivity(Activity activity, object payload)
    {
        var span = this.Tracer.StartSpanFromActivity(activity.OperationName, activity);
        foreach (var kv in activity.Tags)
            span.SetAttribute(kv.Key, kv.Value);
    }

    public override void OnStopActivity(Activity activity, object payload)
    {
        var span = this.Tracer.CurrentSpan;
        span.End();
        if (span is IDisposable disposableSpan)
        {
            disposableSpan.Dispose();
        }
    }
}

var subscriber = new DiagnosticSourceSubscriber(new RabbitMQListener("Sample.RabbitMQ", tracer), DefaultFilter);
subscriber.Subscribe();
</pre>
<p>For more information on how to build collectors, please refer to OpenTelemetry/Application Insights built-in collectors as well as this <a href="https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/ActivityUserGuide.md" rel="noopener noreferrer" target="_blank">user guide</a>.</p>
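<p>To make the subscription mechanism more concrete, the sketch below shows what a collector builds on, using only System.Diagnostics types from the .NET base class library. It is an illustration under assumptions, not the way OpenTelemetry or Application Insights implement their collectors: the <em>EventRecorder</em>, <em>SourceSubscriber</em> and <em>CollectorSketch</em> classes are hypothetical, and a real collector would translate the events into spans instead of recording their names.</p>
<pre class="prettyprint">using System;
using System.Collections.Generic;
using System.Diagnostics;

// Records the name of every event written to the DiagnosticListener it observes.
public sealed class EventRecorder : IObserver&lt;KeyValuePair&lt;string, object&gt;&gt;
{
    public List&lt;string&gt; Events { get; } = new List&lt;string&gt;();

    public void OnNext(KeyValuePair&lt;string, object&gt; evt)
    {
        // While the event is written, Activity.Current is the activity being started or stopped.
        Events.Add(evt.Key);
        Console.WriteLine($"{evt.Key} (activity: {Activity.Current?.OperationName})");
    }

    public void OnCompleted() { }
    public void OnError(Exception error) { }
}

// Gets notified about every DiagnosticListener in the process and
// attaches the recorder to the one with the expected name.
public sealed class SourceSubscriber : IObserver&lt;DiagnosticListener&gt;
{
    private readonly string sourceName;
    private readonly EventRecorder recorder;

    public SourceSubscriber(string sourceName, EventRecorder recorder)
    {
        this.sourceName = sourceName;
        this.recorder = recorder;
    }

    public void OnNext(DiagnosticListener listener)
    {
        if (listener.Name == sourceName)
        {
            listener.Subscribe(recorder);
        }
    }

    public void OnCompleted() { }
    public void OnError(Exception error) { }
}

public static class CollectorSketch
{
    public static List&lt;string&gt; Run()
    {
        var recorder = new EventRecorder();
        using var subscription = DiagnosticListener.AllListeners.Subscribe(
            new SourceSubscriber("Sample.RabbitMQ", recorder));

        using var diagnosticSource = new DiagnosticListener("Sample.RabbitMQ");
        if (diagnosticSource.IsEnabled("Publish to RabbitMQ"))
        {
            var activity = new Activity("Publish to RabbitMQ");
            diagnosticSource.StartActivity(activity, null);
            // ... the message would be published here ...
            diagnosticSource.StopActivity(activity, null);
        }
        return recorder.Events;
    }

    public static void Main() =&gt; Run();
}
</pre>
<p>StartActivity and StopActivity write events named after the activity's operation name with a “.Start” and “.Stop” suffix, which is what the recorder receives.</p>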
<h2>Activity</h2>
<p>As mentioned, HTTP requests in ASP.NET have built-in activity correlation injected by the framework. That is not the case for the RabbitMQ consumer. In order to continue the distributed transaction, we must create the span referencing the parent trace. This was injected into the message by the publisher. The snippet below uses an extension method to build the activity:</p>
<pre class="prettyprint">public static Activity ExtractActivity(this BasicDeliverEventArgs source, string name)
{
    var activity = new Activity(name ?? Constants.RabbitMQMessageActivityName);

    if (source.BasicProperties.Headers.TryGetValue("traceparent", out var rawTraceParent) &amp;&amp; rawTraceParent is byte[] binRawTraceParent)
    {
        activity.SetParentId(Encoding.UTF8.GetString(binRawTraceParent));
    }

    return activity;
}</pre>
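<p>The header round trip can be tried in isolation, without RabbitMQ. The sketch below is an assumption-laden illustration: a plain dictionary stands in for the message headers, the value is stored as UTF-8 bytes to mimic the byte array the extension method above expects, and the <em>Inject</em> and <em>Extract</em> helpers are hypothetical names. It checks that the consuming activity ends up in the publisher's trace and points at the publishing activity as its parent.</p>
<pre class="prettyprint">using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

public static class TraceParentRoundTripSketch
{
    // Publisher side: copies the current activity id into the message headers.
    public static void Inject(IDictionary&lt;string, object&gt; headers)
    {
        if (Activity.Current != null)
        {
            headers["traceparent"] = Encoding.UTF8.GetBytes(Activity.Current.Id);
        }
    }

    // Consumer side: creates an activity that continues the publisher's trace.
    public static Activity Extract(IDictionary&lt;string, object&gt; headers, string name)
    {
        var activity = new Activity(name);
        if (headers.TryGetValue("traceparent", out var raw) &amp;&amp; raw is byte[] bytes)
        {
            activity.SetParentId(Encoding.UTF8.GetString(bytes));
        }
        return activity;
    }

    public static void Main()
    {
        Activity.DefaultIdFormat = ActivityIdFormat.W3C;
        Activity.ForceDefaultIdFormat = true;

        var headers = new Dictionary&lt;string, object&gt;();

        var publish = new Activity("Publish").Start();
        Inject(headers);
        var publishTraceId = publish.TraceId.ToHexString();
        var publishSpanId = publish.SpanId.ToHexString();
        publish.Stop();

        // Later, typically in another process:
        var process = Extract(headers, "Process").Start();
        Console.WriteLine($"Same trace: {process.TraceId.ToHexString() == publishTraceId}");
        Console.WriteLine($"Parent is publisher: {process.ParentSpanId.ToHexString() == publishSpanId}");
        process.Stop();
    }
}
</pre>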
<p>The activity is then used to create the concrete trace. In OpenTelemetry the code looks like this:</p>
<pre class="prettyprint">// Note: OpenTelemetry requires the activity to be started
activity.Start();
tracer.StartActiveSpanFromActivity(activity.OperationName, activity, SpanKind.Consumer, out span);
</pre>
<p>The snippet below creates the telemetry using Application Insights SDK:</p>
<pre class="prettyprint">// Note: Application Insights will start the activity
var operation = telemetryClient.StartOperation&lt;DependencyTelemetry&gt;(activity);
</pre>
<p>Using activities gives flexibility in terms of the SDK used, as they are a neutral way to create traces. Once instrumented, the distributed end-to-end transaction in Jaeger looks like this:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-2.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-2.png" alt="Distributed Trace in Jaeger" width="640" height="222" class="alignnone size-large wp-image-23260"></a></p>
<p>The same transaction in Application Insights looks like this:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-1.jpg"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-1.jpg" alt="Distributed Trace in Application Insights" width="640" height="183" class="alignnone size-large wp-image-23261"></a></p>
<p>When using a single monitoring solution for traces and logs, such as Application Insights, the logs become part of the end-to-end transaction:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-3.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-3.png" alt="Observability Application Insights: traces and logs" width="640" height="420" class="alignnone size-large wp-image-23263"></a></p>
<h2>Metrics</h2>
<p>Some metrics are applicable to most applications, such as CPU usage, allocated memory, and request time. Others are business specific, such as visitors, page views, sold items, and sent items. Exposing business metrics in a .NET Core application typically requires using an SDK.</p>
<p>Metrics collection in .NET Core happens through third-party SDKs, which aggregate values locally before sending them to a backend. Most libraries have built-in collection for common application metrics. However, business-specific metrics need to be built into the application logic, since they are created based on events that occur in the application domain.</p>
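<p>Local aggregation is the reason why metric volume stays constant as load grows. The sketch below illustrates the idea with a hypothetical, SDK-independent <em>AggregatingCounter</em> class. It is not how OpenTelemetry or Application Insights implement aggregation, only a minimal model of it: values are summed per dimension value in memory, and a periodic flush would send one data point per dimension value, no matter how many events were tracked.</p>
<pre class="prettyprint">using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical counter that pre-aggregates values by a single dimension.
public sealed class AggregatingCounter
{
    private readonly ConcurrentDictionary&lt;string, long&gt; totals =
        new ConcurrentDictionary&lt;string, long&gt;();

    // Adds a value to the running total of the given dimension value.
    public void Add(string dimensionValue, long value) =&gt;
        totals.AddOrUpdate(dimensionValue, value, (_, current) =&gt; current + value);

    // Returns the accumulated totals and resets them, as an exporter would do on each interval.
    public IReadOnlyDictionary&lt;string, long&gt; Flush()
    {
        var snapshot = new Dictionary&lt;string, long&gt;();
        foreach (var key in totals.Keys)
        {
            if (totals.TryRemove(key, out var total))
            {
                snapshot[key] = total;
            }
        }
        return snapshot;
    }
}

public static class AggregationSketch
{
    public static void Main()
    {
        var enqueued = new AggregatingCounter();
        for (var i = 0; i &lt; 1000; i++) enqueued.Add("web", 1);
        for (var i = 0; i &lt; 500; i++) enqueued.Add("mobile", 1);

        // 1,500 tracked events are reported as two data points, one per source.
        foreach (var point in enqueued.Flush())
        {
            Console.WriteLine($"Enqueued Item [Source={point.Key}] = {point.Value}");
        }
    }
}
</pre>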
<p>In the sample application we are using metric counters for: enqueued items, successfully processed items and unsuccessfully processed items. The implementation in both SDKs is similar, requiring setting up a metric, dimensions and finally, tracking the counter values.</p>
<p>OpenTelemetry supports multiple exporters and we will be using <a href="https://prometheus.io/" rel="noopener noreferrer" target="_blank">Prometheus</a> exporter. Prometheus combined with <a href="https://grafana.com/" rel="noopener noreferrer" target="_blank">Grafana</a>, for visualization and alerting, is a popular choice for open source monitoring. Application Insights supports metrics as any other instrumentation type, requiring no additional SDK or tool.</p>
<p>Defining a metric and tracking values using OpenTelemetry looks like this:</p>
<pre class="prettyprint">// Create counter
var simpleProcessor = new UngroupedBatcher(exporter, TimeSpan.FromSeconds(5));
var meterFactory = MeterFactory.Create(simpleProcessor);
var meter = meterFactory.GetMeter("Sample App");
var enqueuedCounter = meter.CreateInt64Counter("Enqueued Item");

// Incrementing counter for specific source
var labelSet = new Dictionary&lt;string, string&gt;() { { "Source", source } };
enqueuedCounter.Add(context, 1L, this.meter.GetLabelSet(labelSet));
</pre>
<p>The visualization with Grafana is illustrated in the image below:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-4.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-4.png" alt="Metrics with Grafana/Prometheus" width="640" height="186" class="alignnone size-large wp-image-23265"></a></p>
<p>The snippet below demonstrates how to define a metric and track its values using the Application Insights SDK:</p>
<pre class="prettyprint">// Create counter
var enqueuedCounter = telemetryClient.GetMetric(new MetricIdentifier("Sample App", "Enqueued Item", "Source"));

// Incrementing counter for specific source
enqueuedCounter.TrackValue(metricValue, source);
</pre>
<p>The visualization in Application Insights is illustrated below:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-5.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-5.png" alt="Observability Application Insights custom metrics" width="640" height="339" class="alignnone size-large wp-image-23266"></a></p>
<h2>Troubleshooting</h2>
<p>Now that we have added the 3 observability pillars to a sample application, let’s use them to troubleshoot a scenario where the application is experiencing problems.</p>
<p>The first signs of an application problem are usually detected through anomalies in metrics. The snapshot below illustrates such a scenario, where the number of failed processed items spikes (red line).</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-6.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-6.png" alt="Metrics indicating failure" width="640" height="323" class="alignnone size-large wp-image-23248"></a></p>
<p>A possible next step is to look for hints in distributed traces. This should help us identify where the problem is happening. In Jaeger, searching with the tag “error=true” filters the results, listing transactions where at least one error happened.</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-7.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-7.png" alt="Jaeger traces with error" width="640" height="243" class="alignnone size-large wp-image-23249"></a></p>
<p>In Application Insights, we can search for errors in end-to-end transactions by looking in the Failures/Dependencies or Failures/Exceptions views.</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-8.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-8.png" alt="Search traces with error in Application Insights" width="640" height="252" class="alignnone size-large wp-image-23251"></a></p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-9.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-9.png" alt="Application Insights error details in trace" width="640" height="159" class="alignnone size-large wp-image-23252"></a></p>
<p>The problem seems to be related to the Sample.RabbitMQProcessor service. Logs of this service can help us identify the problem. When using the Application Insights logging provider, logs and traces are correlated and displayed in the same view:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-10.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-10.png" alt="Observability Application Insights errors and logs" width="640" height="382" class="alignnone size-large wp-image-23253"></a></p>
<p>Looking at the details, we discover that the exception <em>InvalidEventNameException</em> is being raised. Since we are logging the message payload, details of the failed message are available in the monitoring tool. It appears the message being processed has the <em>eventName</em> value of “error”, which is causing the exception to be raised.</p>
<p>When introducing observability into a .NET Core application, two decisions need to be made:</p>
<ul>
<li>The backend(s) where collected data will be stored and analyzed.</li>
<li>How instrumentation will be added to the application code.</li>
</ul>
<p>Depending on your organization, the monitoring tool might already be selected. However, if you do have the chance to make this decision, consider the following:</p>
<ul>
<li>Centralized: having all data (for example, logs, distributed traces and CPU usage) in a single place makes it simple to correlate information. If the data is split across tools, more effort is required.</li>
<li>Manageability: how simple is it to manage the monitoring tool? Is it hosted on the same machines/VMs where your application is running? In that case, shared infrastructure unavailability might leave you in the dark. When monitoring is not working, alerts won’t be triggered and metrics won’t be collected.</li>
<li>Vendor Lock-in: if you need to run the same application in different environments (e.g. on premises and cloud), a solution that can run everywhere might be preferable.</li>
<li>Application Dependencies: parts of your infrastructure or tooling might require you to use a specific monitoring vendor. For example, Kubernetes scaling and/or progressive deployment based on Prometheus metrics. </li>
</ul>
<p>Once the monitoring tool has been defined, choosing an SDK is limited to two options: using the one provided by the monitoring vendor or using a library capable of integrating with multiple backends.</p>
<p>Vendor SDKs typically yield few or no surprises regarding stability and functionality. That is the case with Application Insights, for example. It is stable with a rich feature set, including live stream, a feature specific to this monitoring system.</p>
<h2>OpenTelemetry</h2>
<p>Using the OpenTelemetry SDK gives you more flexibility, offering integration with multiple monitoring backends. You can even mix them: use a centralized monitoring solution for all collected data while sending a subset to Prometheus to fulfill a specific requirement. If you are unsure whether OpenTelemetry is a good fit for your project, consider the following:</p>
<ul>
<li>When is your project going to production? The SDK is currently in alpha, meaning breaking changes are expected and it is not production-ready.</li>
<li>Are you using vendor specific features not yet available through the OpenTelemetry SDK (specific collectors, etc.)?</li>
<li>Is your monitoring backend supported by the SDK?</li>
<li>Are you replacing a vendor SDK with OpenTelemetry? Plan some time to compare both SDKs, since OpenTelemetry exporters might differ from how the vendor SDK collects data.</li>
</ul>
<p>Source code with the sample application can be found <a href="https://github.com/Azure-Samples/application-insights-aspnet-sample-opentelemetry">in this GitHub Repository</a>.</p>
<div class="authorinfoarea">
<div><img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability.jpg" width="96" height="96" alt="Francisco Beltrao" class="avatar avatar-96 wp-user-avatar wp-user-avatar-96 alignnone photo"></div>
</div>
</div>
https://www.sickgaming.net/blog/2020/02/...rvability/
<div style="margin: 5px 5% 10px 5%;"><img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability.jpg" width="150" height="150" title="" alt="" /></div><div><div class="row justify-content-center">
<div class="col-md-4">
<div><img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability.jpg" width="58" height="58" alt="Francisco Beltrao" class="avatar avatar-58 wp-user-avatar wp-user-avatar-58 alignnone photo"></p>
<p>Francisco</p>
</div>
</div>
</div>
<div class="entry-meta">
<p>February 26th, 2020</p>
</p></div>
<p><!-- .entry-meta --> </p>
<p><em>Thank you <a href="https://github.com/SergeyKanzhelev/" rel="noopener noreferrer" target="_blank">Sergey Kanzhelev</a> for the support and review of this ASP.NET Core Apps Observability article.</em></p>
<p>Modern software development practices value quick and continuous updates, following processes that minimize the impact of software failures. As important as identifying bugs early, finding out if changes are improving business value are equally important. These practices can only work when a monitoring solution is in place. This article explores options for adding observability to .NET Core apps. They have been collected based on interactions with customers using .NET Core in different environments. We will be looking into OpenTelemetry and Application Insights SDKs to add observability to a sample distributed application.</p>
<p>Identifying software error and business impact require a monitoring solution with the ability to observe and report how the system and users behave. The collected data must provide the required information to analyze and identify a bad update. Answering questions such as:</p>
<ul>
<li>Are we observing more errors than before? </li>
<li>Were there new error types?</li>
<li>Did the request duration unexpectedly increase compared to previous versions?</li>
<li>Has the throughput (req/sec) decreased?</li>
<li>Has the CPU and/or Memory usage increased?</li>
<li>Were there changes in our KPIs? </li>
<li>Is it selling less than before? </li>
<li>Did our visitor count decrease?</li>
</ul>
<p>The impact of a bad system update can be minimized by combining the monitoring information with progressive deployment strategies. Such as canary, mirroring, rings, blue/green, etc.</p>
<h2>Observability is Built on 3 Pillars:</h2>
<ul>
<li>
<p><strong>Logging</strong>: collects information about events happening in the system. Helping the team analyze unexpected application behavior. Searching through the logs of suspect services can provide the necessary hint to identify the problem root cause. Such as: service throwing out of memory exceptions and app configuration not reflecting expected values. As well as calls to external service with incorrect address, calls to external service returns with unexpected results, and incoming requests with unexpected input.</p>
</li>
<li>
<p><strong>Tracing</strong>: collects information to create an end-to-end view of how transactions are executed in a distributed system. A trace is like a stack trace spanning multiple applications. Once a problem has been recognized, traces are a good starting point in identifying the source in distributed operations. Like calls from service A to B are taking longer than normal, service payment calls are failing, etc.</p>
</li>
<li>
<p><strong>Metrics</strong>: provide a real-time indication of how the system is running. They can be used to build alerts, allowing a proactive reaction to unexpected values. As opposed to logs and traces, the amount of data collected using metrics remains constant as the system load increases. Application problems are often first detected through abnormal metric values, such as CPU usage being higher than before, the payment error count spiking, or the queued item count growing continuously.</p>
</li>
</ul>
<h2>Adding Observability to a .NET Core Application</h2>
<p>There are many ways to add observability aspects to an application. <a href="https://dapr.io/" rel="noopener noreferrer" target="_blank">Dapr</a>, for example, is a runtime for building distributed applications that <a href="https://github.com/dapr/docs/blob/master/concepts/distributed-tracing/README.md" rel="noopener noreferrer" target="_blank">transparently adds distributed tracing</a>. Another example is the use of service meshes in Kubernetes (<a href="https://istio.io/docs/tasks/observability/distributed-tracing/overview/" rel="noopener noreferrer" target="_blank">Istio</a>, <a href="https://linkerd.io/2/features/distributed-tracing/" rel="noopener noreferrer" target="_blank">Linkerd</a>).</p>
<p>Built-in and transparent tracing typically covers basic scenarios and answers generic questions, such as observed request duration or CPU trends. Other questions, such as custom KPIs or user behavior, require adding instrumentation to your code.</p>
<p>To illustrate how observability can be added to a .NET Core application we will be using the following asynchronous distributed transaction example:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability.png" alt="Sample Observability Application Overview" width="484" height="239" class="alignnone size-full wp-image-23256"></a></p>
<ol>
<li><em>Main Api</em> receives a request from a “source”.</li>
<li><em>Main Api</em> enriches the request body with current day, obtained from <em>Time Api</em>.</li>
<li><em>Main Api</em> enqueues enriched request to a RabbitMQ queue for asynchronous processing.</li>
<li><em>RabbitMQProcessor</em> dequeues request.</li>
<li><em>RabbitMQProcessor</em>, as part of the request processing, calls <em>Time Api</em> to get dbtime.</li>
<li><em>Time Api</em> calls SQL Server to get current time.</li>
</ol>
<p>To run the sample application locally (including dependencies and observability tools), follow this <a href="https://github.com/Azure-Samples/application-insights-aspnet-sample-opentelemetry#setup---quickstart-with-docker-compose" rel="noopener noreferrer" target="_blank">guide</a>. The article will walk through adding each observability pillar (logging, tracing, metrics) to the sample asynchronous distributed transaction.</p>
<p>Note: for information on bootstrapping OpenTelemetry or Application Insights SDK please refer to the documentation: <a href="https://github.com/open-telemetry/opentelemetry-dotnet" rel="noopener noreferrer" target="_blank">OpenTelemetry</a> and <a href="https://docs.microsoft.com/azure/azure-monitor/app/asp-net-core" rel="noopener noreferrer" target="_blank">Application Insights</a>.</p>
<h2>Logging</h2>
<p>Logging was redesigned in .NET Core, bringing an integrated and extensible API. <a href="https://docs.microsoft.com/aspnet/core/fundamentals/logging/#built-in-logging-providers" rel="noopener noreferrer" target="_blank">Built-in</a> and <a href="https://docs.microsoft.com/aspnet/core/fundamentals/logging/#third-party-logging-providers" rel="noopener noreferrer" target="_blank">external logging providers</a> allow the collection of logs in multiple formats and targets. When deciding on a logging platform, consider the following features:</p>
<ul>
<li>Centralized: allows the collection and storage of all system logs in a central location.</li>
<li>Structured logging: allows you to add searchable metadata to logs.</li>
<li>Searchable: allows searching by multiple criteria (app version, date, category, level, text, metadata, etc.).</li>
<li>Configurable: allows changing verbosity without code changes (based on log level and/or scope).</li>
<li>Integrated: works together with tracing, facilitating analysis of traces and logs in the same tool.</li>
</ul>
<p>The sample application uses the ILogger interface for logging. The snippet below demonstrates an example of structured logging, which captures events using a <a href="https://messagetemplates.org/" rel="noopener noreferrer" target="_blank">message template</a> and generates information that is both human and machine readable.</p>
<pre class="prettyprint">var result = await repository.GetTimeFromSqlAsync();
logger.LogInformation("{operation} result is {result}", nameof(repository.GetTimeFromSqlAsync), result);</pre>
<p>When using a logging backend that understands structured logs, such as Application Insights, you can search for instances of the example log item where “operation” is equal to “GetTimeFromSqlAsync”:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-1.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-1.png" alt="Observability Application Insights structured log search" width="640" height="321" class="alignnone size-large wp-image-23258"></a> </p>
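<p>Scopes are another way to attach searchable metadata to log entries. The following is a minimal sketch, assuming the Microsoft.Extensions.Logging packages are referenced. The MessageProcessor class and the MessageId and Source property names are hypothetical and are not taken from the sample application. Whether scope values show up as searchable fields depends on the logging provider and its configuration.</p>
<pre class="prettyprint">using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Hypothetical class used only for illustration.
public class MessageProcessor
{
    private readonly ILogger&lt;MessageProcessor&gt; logger;

    public MessageProcessor(ILogger&lt;MessageProcessor&gt; logger)
    {
        this.logger = logger;
    }

    public void Process(string messageId, string source)
    {
        // Key/value pairs passed to BeginScope are offered to the provider
        // for every entry logged inside the using block.
        var scopeState = new Dictionary&lt;string, object&gt;
        {
            ["MessageId"] = messageId,
            ["Source"] = source,
        };

        using (logger.BeginScope(scopeState))
        {
            logger.LogInformation("Processing message from {source}", source);
            // ... actual processing would go here ...
            logger.LogInformation("Finished processing");
        }
    }
}</pre>
<p>With a provider that supports scopes, both entries can then be found by filtering on the message identifier, without repeating it in every message template.</p>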
<h2>Tracing</h2>
<p>Tracing collects the information required to observe a transaction as it “walks” through the system. To be effective, it must be implemented in every service taking part in the transaction.</p>
<p>.NET Core defines a common way in which traces can be defined through the System.Diagnostics.Activity class. Through the usage of this class, dependency implementations (e.g. HTTP, SQL, Azure, EF Core, StackExchange.Redis, etc.) can create traces in a neutral way, independent of the monitoring tool used.</p>
<p>It is important to notice that those activities will not be available automatically in a monitoring system. Publishing them is the responsibility of the monitoring SDK used. Typically, SDKs have built-in collectors for common activities, transferring them to the destination platform automatically.</p>
<p>In 2019, <a href="https://opentelemetry.io/" rel="noopener noreferrer" target="_blank">OpenTelemetry</a> was announced, promising to standardize telemetry instrumentation and collection across languages and tools. Before OpenTelemetry (or its predecessors OpenCensus and OpenTracing), adding observability would often mean adding proprietary SDKs, directly or indirectly, to the code base.</p>
<p>The OpenTelemetry .NET SDK is currently in alpha. The Azure Monitor Application Insights team is investing in OpenTelemetry as the next step in the evolution of the Azure Monitor SDKs.</p>
<h3>Quick Intro on Tracing with OpenTelemetry</h3>
<p>In a nutshell, <a href="https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/overview.md" rel="noopener noreferrer" target="_blank">OpenTelemetry</a> collects traces using spans. A span delimits an operation (HTTP request processing, dependency call). It contains a start and end time (among other properties). It has a unique identifier (SpanId, 16 characters, 8 bytes) and a trace identifier (TraceId, 32 characters, 16 bytes). The trace identifier is used to correlate all spans for a given transaction. A span can contain child spans (like calls in a stack trace). If you are familiar with Azure Application Insights, the following table might be helpful to understand OpenTelemetry terms:</p>
<table>
<thead>
<tr>
<th> <strong>Application Insights</strong> </th>
<th> <strong>OpenTelemetry</strong> </th>
</tr>
</thead>
<tbody>
<tr>
<td> Request, PageView </td>
<td> Span with span.kind = server </td>
</tr>
<tr>
<td> Dependency </td>
<td> Span with span.kind = client </td>
</tr>
<tr>
<td> Id of Request and Dependency </td>
<td> SpanId </td>
</tr>
<tr>
<td> Operation_Id </td>
<td> TraceId </td>
</tr>
<tr>
<td> Operation_ParentId </td>
<td> ParentId </td>
</tr>
</tbody>
</table>
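<p>The identifier sizes and the parent/child relationship described above can be observed directly on System.Diagnostics.Activity when the W3C id format is used. The following is a small sketch using only the base class library. The TraceIdDemo class and the operation names are hypothetical and exist only for illustration.</p>
<pre class="prettyprint">using System;
using System.Diagnostics;

// Hypothetical demo class, not part of the sample application.
public static class TraceIdDemo
{
    public static (string TraceId, string SpanId, string ChildTraceId, string ChildParentSpanId) Run()
    {
        // Use W3C identifiers: 16-byte trace id, 8-byte span id.
        Activity.DefaultIdFormat = ActivityIdFormat.W3C;
        Activity.ForceDefaultIdFormat = true;

        var parent = new Activity("incoming-request").Start();
        // Started while "parent" is Activity.Current, so it becomes its child.
        var child = new Activity("dependency-call").Start();

        var traceId = parent.TraceId.ToHexString();
        var spanId = parent.SpanId.ToHexString();
        var childTraceId = child.TraceId.ToHexString();
        var childParentSpanId = child.ParentSpanId.ToHexString();

        child.Stop();
        parent.Stop();

        return (traceId, spanId, childTraceId, childParentSpanId);
    }
}</pre>
<p>Calling TraceIdDemo.Run() returns a TraceId of 32 hexadecimal characters and a SpanId of 16. The child shares the TraceId of its parent, and its ParentSpanId equals the parent's SpanId, which is how spans of one transaction are correlated.</p>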
<h2>Adding Tracing to a .NET Core Application</h2>
<p>As mentioned previously, an SDK is needed in order to collect and publish distributed tracing in a .NET Core application. Application Insights SDK sends traces to its centralized database while OpenTelemetry supports multiple exporters (including Application Insights). When configured to use OpenTelemetry, the sample application sends traces to a <a href="https://www.jaegertracing.io/" rel="noopener noreferrer" target="_blank">Jaeger</a> instance.</p>
<p>In the asynchronous distributed transaction scenario, track the following operations:</p>
<h4>HTTP Requests between microservices</h4>
<p>HTTP correlation propagation is part of both SDKs. The only requirement is setting the activity id format to <a href="https://www.w3.org/TR/trace-context-1/" rel="noopener noreferrer" target="_blank">W3C</a> at application start:</p>
<pre class="prettyprint">public static void Main(string[] args)
{
    Activity.DefaultIdFormat = ActivityIdFormat.W3C;
    Activity.ForceDefaultIdFormat = true;

    // rest is omitted
}</pre>
<h4>Dependency calls (SQL, RabbitMQ)</h4>
<p>Unlike Application Insights SDK, OpenTelemetry (in early alpha) does not yet have support for SQL Server trace collection. A simple way to track dependencies with OpenTelemetry is to wrap the call like the following example:</p>
<pre class="prettyprint">var span = this.tracer.StartSpan("My external dependency", SpanKind.Client);
try
{
    return CallToMyDependency();
}
catch (Exception ex)
{
    span.Status = Status.Internal.WithDescription(ex.ToString());
    throw;
}
finally
{
    span.End();
}
</pre>
<h4>Asynchronous Processing / Queued Items</h4>
<p>There is no built-in trace correlation between publishing and processing a RabbitMQ message. Custom code is required, creating the publishing activity (optional) and referencing the parent trace when the item is dequeued.</p>
<p>We previously covered creating traces by wrapping the dependency call. This option allows expressing additional semantic information, such as links between spans for batching and other fan-in patterns. Another option is to use System.Diagnostics.Activity, which is an SDK-independent way to create traces. This option has a limited set of features, but it is built into .NET.</p>
<p>These two options work well with each other, and the .NET team is <a href="https://github.com/dotnet/corefx/issues/42307" rel="noopener noreferrer" target="_blank">working on making .NET Activity and OpenTelemetry spans integration better</a>.</p>
<h3>Creating an Operation Trace</h3>
<p>The snippet below demonstrates how the publish operation trace can be created. It adds the trace information to the enqueued message header, which will later be used to link both operations.</p>
<pre class="prettyprint">Activity activity = null;
if (diagnosticSource.IsEnabled("Sample.RabbitMQ"))
{
    // Generates the Publishing to RabbitMQ trace
    // Only generated if there is an actual listener
    activity = new Activity("Publish to RabbitMQ");
    diagnosticSource.StartActivity(activity, null);
}

// Add current activity identifier to the RabbitMQ message
basicProperties.Headers.Add("traceparent", Activity.Current.Id);

channel.BasicPublish(...);

if (activity != null)
{
    // Signal the end of the activity
    diagnosticSource.StopActivity(activity, null);
}</pre>
<p>A collector, which subscribes to target activities, is required to publish the trace to a backend. Implementing a collector is not a straightforward task and is intended for SDK implementors. The snippet below is taken from the sample application, where a simplified, not production-ready RabbitMQ collector for OpenTelemetry was implemented:</p>
<pre class="prettyprint">public class RabbitMQListener : ListenerHandler
{
    public override void OnStartActivity(Activity activity, object payload)
    {
        var span = this.Tracer.StartSpanFromActivity(activity.OperationName, activity);
        foreach (var kv in activity.Tags)
            span.SetAttribute(kv.Key, kv.Value);
    }

    public override void OnStopActivity(Activity activity, object payload)
    {
        var span = this.Tracer.CurrentSpan;
        span.End();
        if (span is IDisposable disposableSpan)
        {
            disposableSpan.Dispose();
        }
    }
}

var subscriber = new DiagnosticSourceSubscriber(new RabbitMQListener("Sample.RabbitMQ", tracer), DefaultFilter);
subscriber.Subscribe();
</pre>
<p>For more information on how to build collectors, please refer to OpenTelemetry/Application Insights built-in collectors as well as this <a href="https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/ActivityUserGuide.md" rel="noopener noreferrer" target="_blank">user guide</a>.</p>
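<p>To get a feeling for what a collector receives, the sketch below subscribes a plain observer to a DiagnosticListener and records the events raised when an activity is started and stopped through it. It uses only System.Diagnostics and is not the OpenTelemetry ListenerHandler shown above. The RecordingObserver and CollectorDemo names are hypothetical.</p>
<pre class="prettyprint">using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical observer: records the name of every event it is notified about.
public sealed class RecordingObserver : IObserver&lt;KeyValuePair&lt;string, object&gt;&gt;
{
    public List&lt;string&gt; Events { get; } = new List&lt;string&gt;();

    public void OnNext(KeyValuePair&lt;string, object&gt; value) =&gt; Events.Add(value.Key);
    public void OnCompleted() { }
    public void OnError(Exception error) { }
}

public static class CollectorDemo
{
    public static List&lt;string&gt; Run()
    {
        using var source = new DiagnosticListener("Sample.RabbitMQ");
        var observer = new RecordingObserver();

        // Without a subscriber, IsEnabled returns false and no activity is created.
        using (source.Subscribe(observer))
        {
            var activity = new Activity("Publish to RabbitMQ");
            if (source.IsEnabled(activity.OperationName))
            {
                source.StartActivity(activity, null); // raises "&lt;name&gt;.Start"
                source.StopActivity(activity, null);  // raises "&lt;name&gt;.Stop"
            }
        }

        return observer.Events;
    }
}</pre>
<p>CollectorDemo.Run() returns two event names, “Publish to RabbitMQ.Start” and “Publish to RabbitMQ.Stop”. A real collector translates these notifications into spans or telemetry items for its backend.</p>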
<h2>Activity</h2>
<p>As mentioned, HTTP requests in ASP.NET have built-in activity correlation injected by the framework. That is not the case for the RabbitMQ consumer. In order to continue the distributed transaction, we must create the span referencing the parent trace, which was injected into the message by the publisher. The snippet below uses an extension method to build the activity:</p>
<pre class="prettyprint">public static Activity ExtractActivity(this BasicDeliverEventArgs source, string name)
{
    var activity = new Activity(name ?? Constants.RabbitMQMessageActivityName);

    if (source.BasicProperties.Headers.TryGetValue("traceparent", out var rawTraceParent) && rawTraceParent is byte[] binRawTraceParent)
    {
        activity.SetParentId(Encoding.UTF8.GetString(binRawTraceParent));
    }

    return activity;
}</pre>
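<p>The extension method relies on Activity.SetParentId accepting a W3C traceparent value. The following sketch, using only the base class library, shows the effect: once started, the new activity continues the trace identified in the header. The TraceParentDemo class and the header value used in the usage note are hypothetical examples.</p>
<pre class="prettyprint">using System;
using System.Diagnostics;
using System.Text;

// Hypothetical demo class, not part of the sample application.
public static class TraceParentDemo
{
    public static (string TraceId, string ParentSpanId) Continue(byte[] rawHeader)
    {
        // The header value arrives as bytes, as in the extension method above.
        var traceParent = Encoding.UTF8.GetString(rawHeader);

        var activity = new Activity("Process RabbitMQ message");
        activity.SetParentId(traceParent);
        activity.Start();
        try
        {
            // Both values are taken from the traceparent header.
            return (activity.TraceId.ToHexString(), activity.ParentSpanId.ToHexString());
        }
        finally
        {
            activity.Stop();
        }
    }
}</pre>
<p>For a header value of “00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01” (version, trace id, parent span id, flags), Continue returns the trace id “0af7651916cd43dd8448eb211c80319c” and the parent span id “b7ad6b7169203331”, so the consumer's span is attached to the publisher's trace.</p>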
<p>The activity is then used to create the concrete trace. In OpenTelemetry the code looks like this:</p>
<pre class="prettyprint">// Note: OpenTelemetry requires the activity to be started
activity.Start();
tracer.StartActiveSpanFromActivity(activity.OperationName, activity, SpanKind.Consumer, out span);
</pre>
<p>The snippet below creates the telemetry using Application Insights SDK:</p>
<pre class="prettyprint">// Note: Application Insights will start the activity
var operation = telemetryClient.StartOperation&lt;DependencyTelemetry&gt;(activity);
</pre>
<p>The usage of activities gives flexibility in terms of the SDK used, as it is a neutral way to create traces. Once instrumented, the distributed end-to-end transaction in Jaeger looks like this:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-2.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-2.png" alt="Distributed Trace in Jaeger" width="640" height="222" class="alignnone size-large wp-image-23260"></a></p>
<p>The same transaction in Application Insights looks like this:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-1.jpg"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-1.jpg" alt="Distributed Trace in Application Insights" width="640" height="183" class="alignnone size-large wp-image-23261"></a></p>
<p>When using a single monitoring solution for traces and logs, such as Application Insights, the logs become part of the end-to-end transaction:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-3.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-3.png" alt="Observability Application Insights: traces and logs" width="640" height="420" class="alignnone size-large wp-image-23263"></a></p>
<h2>Metrics</h2>
<p>There are common metrics applicable to most applications, like CPU usage, allocated memory, and request time, as well as business-specific ones like visitors, page views, sold items, and sent items. Exposing business metrics in a .NET Core application typically requires using an SDK.</p>
<p>Collecting metrics in .NET Core happens through 3rd-party SDKs, which aggregate values locally before sending them to a backend. Most libraries have built-in collection for common application metrics. However, business-specific metrics need to be built into the application logic, since they are created based on events that occur in the application domain.</p>
<p>In the sample application we are using metric counters for enqueued items, successfully processed items, and unsuccessfully processed items. The implementation in both SDKs is similar: set up a metric, define its dimensions, and finally track the counter values.</p>
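<p>Local aggregation is the reason why the volume of metric data stays constant under load: the application keeps one running total per dimension value and only the totals are sent. The sketch below illustrates the idea with a hypothetical DimensionedCounter class. It is not how either SDK is implemented, and the Flush method only stands in for what an exporter would do periodically.</p>
<pre class="prettyprint">using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper, not part of OpenTelemetry or Application Insights.
public sealed class DimensionedCounter
{
    private readonly ConcurrentDictionary&lt;string, long&gt; totals =
        new ConcurrentDictionary&lt;string, long&gt;();

    public DimensionedCounter(string name)
    {
        Name = name;
    }

    public string Name { get; }

    // Adds to the running total kept for one dimension value (e.g. a source name).
    public void Add(string dimensionValue, long amount)
    {
        totals.AddOrUpdate(dimensionValue, amount, (_, current) =&gt; current + amount);
    }

    // Returns the aggregated totals and resets them, as a periodic export would.
    public IReadOnlyDictionary&lt;string, long&gt; Flush()
    {
        var snapshot = new Dictionary&lt;string, long&gt;();
        foreach (var key in totals.Keys)
        {
            if (totals.TryRemove(key, out var value))
            {
                snapshot[key] = value;
            }
        }
        return snapshot;
    }
}</pre>
<p>After calling Add("web", 1) twice and Add("mobile", 5) once, Flush returns two entries (web = 2, mobile = 5), no matter how many individual events were recorded. A second Flush without new values returns an empty result.</p>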
<p>OpenTelemetry supports multiple exporters, and we will be using the <a href="https://prometheus.io/" rel="noopener noreferrer" target="_blank">Prometheus</a> exporter. Prometheus combined with <a href="https://grafana.com/" rel="noopener noreferrer" target="_blank">Grafana</a>, for visualization and alerting, is a popular choice for open source monitoring. Application Insights supports metrics like any other instrumentation type, requiring no additional SDK or tool.</p>
<p>Defining a metric and tracking values using OpenTelemetry looks like this:</p>
<pre class="prettyprint">// Create counter
var simpleProcessor = new UngroupedBatcher(exporter, TimeSpan.FromSeconds(5));
var meterFactory = MeterFactory.Create(simpleProcessor);
var meter = meterFactory.GetMeter("Sample App");
var enqueuedCounter = meter.CreateInt64Counter("Enqueued Item");

// Incrementing counter for specific source
var labelSet = new Dictionary&lt;string, string&gt;() { { "Source", source } };
enqueuedCounter.Add(context, 1L, this.meter.GetLabelSet(labelSet));
</pre>
<p>The visualization with Grafana is illustrated in the image below:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-4.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-4.png" alt="Metrics with Grafana/Prometheus" width="640" height="186" class="alignnone size-large wp-image-23265"></a></p>
<p>The snippet below demonstrates how to define a metric and track its values using the Application Insights SDK:</p>
<pre class="prettyprint">// Create counter
var enqueuedCounter = telemetryClient.GetMetric(new MetricIdentifier("Sample App", "Enqueued Item", "Source"));

// Incrementing counter for specific source
enqueuedCounter.TrackValue(metricValue, source);
</pre>
<p>The visualization in Application Insights is illustrated below:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-5.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-5.png" alt="Observability Application Insights custom metrics" width="640" height="339" class="alignnone size-large wp-image-23266"></a></p>
<h2>Troubleshooting</h2>
<p>Now that we have added the 3 observability pillars to a sample application, let’s use them to troubleshoot a scenario where the application is experiencing problems.</p>
<p>The first signs of an application problem are usually detected as anomalies in metrics. The snapshot below illustrates such a scenario, where the number of failed processed items spikes (red line).</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-6.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-6.png" alt="Metrics indicating failure" width="640" height="323" class="alignnone size-large wp-image-23248"></a></p>
<p>A possible next step is to look for hints in distributed traces. This should help us identify where the problem is happening. In Jaeger, searching with the tag “error=true” filters the results, listing the transactions where at least one error happened.</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-7.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-7.png" alt="Jaeger traces with error" width="640" height="243" class="alignnone size-large wp-image-23249"></a></p>
<p>In Application Insights, we can search for errors in end-to-end transactions by looking in Failures/Dependencies or Failures/Exceptions.</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-8.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-8.png" alt="Search traces with error in Application Insights" width="640" height="252" class="alignnone size-large wp-image-23251"></a></p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-9.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-9.png" alt="Application Insights error details in trace" width="640" height="159" class="alignnone size-large wp-image-23252"></a></p>
<p>The problem seems to be related to the Sample.RabbitMQProcessor service. Logs of this service can help us identify the problem. When using the Application Insights logging provider, logs and traces are correlated and displayed in the same view:</p>
<p><a href="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-10.png"> <img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/03/asp-net-core-apps-observability-10.png" alt="Observability Application Insights errors and logs" width="640" height="382" class="alignnone size-large wp-image-23253"></a></p>
<p>Looking at the details, we discover that the exception <em>InvalidEventNameException</em> is being raised. Since we are logging the message payload, details of the failed message are available in the monitoring tool. It appears the message being processed has the <em>eventName</em> value of “error”, which is causing the exception to be raised.</p>
<p>When introducing observability into a .NET Core application, two decisions need to be made:</p>
<ul>
<li>The backend(s) where collected data will be stored and analyzed.</li>
<li>How instrumentation will be added to the application code.</li>
</ul>
<p>Depending on your organization, the monitoring tool might already be selected. However, if you do have the chance to make this decision, consider the following:</p>
<ul>
<li>Centralized: having all data (for example logs, distributed traces, and CPU usage) in a single place makes it simple to correlate information. If the data is split across tools, more effort is required.</li>
<li>Manageability: how simple is it to manage the monitoring tool? Is it hosted on the same machines/VMs where your application is running? In that case, shared infrastructure unavailability might leave you in the dark. When monitoring is not working, alerts won’t be triggered and metrics won’t be collected.</li>
<li>Vendor lock-in: if you need to run the same application in different environments (e.g. on premises and cloud), a solution that can run everywhere might be preferable.</li>
<li>Application dependencies: parts of your infrastructure or tooling might require you to use a specific monitoring vendor. For example, Kubernetes scaling and/or progressive deployment based on Prometheus metrics.</li>
</ul>
<p>Once the monitoring tool has been defined, choosing an SDK is limited to two options: using the one provided by the monitoring vendor, or using a library capable of integrating with multiple backends.</p>
<p>Vendor SDKs typically yield few or no surprises regarding stability and functionality. That is the case with Application Insights, for example. It is stable and has a rich feature set, including live stream, a feature specific to this monitoring system.</p>
<h2>OpenTelemetry</h2>
<p>Using the OpenTelemetry SDK gives you more flexibility, offering integration with multiple monitoring backends. You can even mix them: a centralized monitoring solution for all collected data, while having a subset sent to Prometheus to fulfill a requirement. If you are unsure whether OpenTelemetry is a good fit for your project, consider the following:</p>
<ul>
<li>When is your project going to production? The SDK is currently in alpha, so breaking changes should be expected and it is not yet production-ready.</li>
<li>Are you using vendor specific features not yet available through the OpenTelemetry SDK (specific collectors, etc.)?</li>
<li>Is your monitoring backend supported by the SDK?</li>
<li>Are you replacing a vendor SDK with OpenTelemetry? Plan some time to compare both SDKs, since OpenTelemetry exporters might collect data differently from the vendor SDK.</li>
</ul>
<p>Source code with the sample application can be found <a href="https://github.com/Azure-Samples/application-insights-aspnet-sample-opentelemetry">in this GitHub Repository</a>.</p>
</div>