APM tips blog

Blog about application monitoring.

Deployments, Scale Ups and Downs


I once lost track of what was deployed in the staging slot of a cloud service. I was also wondering how other people deploy that service. This post shows how you can answer questions like this using Application Insights Analytics queries.

The service I am looking at is deployed as two cloud services in different regions. It uses automatic code versioning via the BuildInfo.config file. A new version is deployed to the staging slot and then VIP swapped into production.

As mentioned, Application Insights is configured to report the application version with every telemetry item. So you can group by application version and find when a new version got deployed.

performanceCounters
| where timestamp >= ago(5d)
| where name == "Requests/Sec" 
| summarize dcount(cloud_RoleInstance) by application_Version, bin(timestamp, 15m)

The query above detects deployments to staging, but it will not detect the VIP swap accurately. When a VIP swap happens, the same computers are running the same code, so the number of role instances reporting a specific application version in the query above does not change. The only thing that changes during the VIP swap is the virtual IP address of those computers.

I posted before about how Application Insights associates the IP address of the incoming connection with a telemetry item if the item doesn't have one specified. So all the performance counters will have the client_IP field of the incoming connection. In the case of a cloud service it will be the IP address of the slot sending telemetry. Let's use this fact and extend application_Version with client_IP.

let interval = 5d;
performanceCounters
| where timestamp >= ago(interval)
| where name == "Requests/Sec" 
| extend deployment = strcat(application_Version, " ", client_IP)
| summarize dcount(cloud_RoleInstance) by deployment, bin(timestamp, 5m)
| render areachart

This query gave me this picture:

This application is deployed to two regions, hence the two general areas - 5 instances and 3 instances. You can also see the spikes when deployments were happening. You can also notice that the staging slot doesn't last long - the spike is very short. It turns out that the staging computers are shut down as part of the release procedure. Typically you would see a scaled-down number of staging computers running all the time to speed up a rollback when it's needed.

Let’s zoom into a single deployment:

let fromDate = datetime(2017-01-18 21:50:00z);
let toDate = datetime(2017-01-18 22:15:00z);
performanceCounters
| where timestamp >= fromDate
| where timestamp <= toDate
| where name == "Requests/Sec" 
| extend deployment = strcat(application_Version, " ", client_IP)
| summarize dcount(cloud_RoleInstance) by deployment, bin(timestamp, 1m)
| render areachart  

The result is quite interesting:

You can see the new version of the application deployed into the staging environment in one region and running for ~10 minutes. The same version was deployed in staging in a different region for a much shorter time. It seems that production traffic triggered application initialization after the VIP swap, which is typically a bad practice, by the way. At least some smoke tests need to be run against the staging slot to validate the configuration.

Dig deeper

Analyzing the picture is not easy. Let’s modify the query to print out every deployment, scale up, and scale down. Basically, we need to query for every time interval where the previous interval had a different number of role instances reporting the same application version.

Here is a query that returns the number of instances per minute:

let query = (_fromDate:datetime, _toDate:datetime) 
{ 
performanceCounters
| where timestamp >= _fromDate
| where timestamp <= _toDate
| where name == "Requests/Sec" 
| summarize num_instances = dcount(cloud_RoleInstance) 
    by application_Version, client_IP, bin(timestamp, 1m) };

You can call this query as query(fromDate, toDate). Now let’s join it with the same results shifted a minute back:

let fromDate = datetime(2017-01-18 21:50:00z);
let toDate = datetime(2017-01-18 22:15:00z);
let query = (_fromDate:datetime, _toDate:datetime) 
{ 
  performanceCounters
    | where timestamp >= _fromDate
    | where timestamp <= _toDate
    | where name == "Requests/Sec" 
    | summarize num_instances = dcount(cloud_RoleInstance) 
        by application_Version, client_IP, bin(timestamp, 1m) 
};
query(fromDate, toDate) | extend ttt = timestamp | join kind=leftouter 
(
  query(fromDate - 1m, toDate + 1m) | extend ttt = timestamp + 1m
) on ttt, application_Version, client_IP

Note the use of the leftouter join in the query. The only thing left is to filter the results and make them more human readable:

let fromDate = datetime(2017-01-18 21:50:00z);
let toDate = datetime(2017-01-18 22:15:00z);
let query = (_fromDate:datetime, _toDate:datetime) 
{ 
performanceCounters
| where timestamp >= _fromDate
| where timestamp <= _toDate
| where name == "Requests/Sec" 
| summarize num_instances = dcount(cloud_RoleInstance) by application_Version, client_IP, bin(timestamp, 1m) };
query(fromDate, toDate) | extend ttt = timestamp | join kind=leftouter (
query(fromDate - 1m, toDate + 1m) | extend ttt = timestamp + 1m
) on ttt, application_Version, client_IP
| project timestamp, before = num_instances1, after = num_instances, application_Version, client_IP
| where after != before
| extend name = 
  strcat( 
      iff(isnull(before), "Deployment", iff(after > before, "Scale Up", "Scale Down")),
      " in ",
      iff(client_IP == "52.175.18.0" or client_IP == "13.77.108.0", "Production", "Staging")
  )
| order by timestamp 

The resulting table will look like this:

timestamp             before  after  application_Version           client_IP     name
2017-01-18T21:54:00Z  null    2      vstfs:///Build/Build/3562348  13.77.107.0   Deployment in Staging
2017-01-18T21:59:00Z  2       3      vstfs:///Build/Build/3562348  13.77.107.0   Scale Up in Staging
2017-01-18T22:06:00Z  3       2      vstfs:///Build/Build/3555787  52.175.18.0   Scale Down in Production
2017-01-18T22:07:00Z  2       3      vstfs:///Build/Build/3555787  52.175.18.0   Scale Up in Production
2017-01-18T22:07:00Z  5       1      vstfs:///Build/Build/3555787  13.77.108.0   Scale Down in Production
2017-01-18T22:07:00Z  null    3      vstfs:///Build/Build/3555787  13.77.107.0   Deployment in Staging
2017-01-18T22:08:00Z  null    3      vstfs:///Build/Build/3562348  13.77.108.0   Deployment in Production
2017-01-18T22:09:00Z  3       5      vstfs:///Build/Build/3562348  13.77.108.0   Scale Up in Production
2017-01-18T22:09:00Z  3       2      vstfs:///Build/Build/3555787  52.175.18.0   Scale Down in Production
2017-01-18T22:09:00Z  null    1      vstfs:///Build/Build/3555787  168.63.221.0  Deployment in Staging
2017-01-18T22:10:00Z  null    3      vstfs:///Build/Build/3562348  52.175.18.0   Deployment in Production

Using ad hoc analytical queries I found that deployments of this service can be improved. Smoke tests should be added for the staging deployment, and staging machines should keep running for some time after deployment in case you need to VIP swap the deployment back.

Automatically detecting deployments, scale ups, and scale downs may be useful in other scenarios. You may want to notify the service owner by writing a connector for your favorite chat platform. Or you can list the latest deployments to production and staging to know what was deployed and when. You can even report those deployments back to Application Insights as release annotations to see markers on charts. With the power of Analytics queries in Application Insights it is easy to automate any of these scenarios.
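
For instance, the chat connector can be a small console app that runs the deployment query through the query API (described in the next post) and forwards the result to a chat webhook. Here is a minimal sketch, where appId, apiKey, and webhookUrl are placeholders you'd substitute:

using System;
using System.Net.Http;
using System.Text;

class DeploymentNotifier
{
    static void Main()
    {
        const string appId = "<appId>";            // placeholder
        const string apiKey = "<API key>";         // placeholder
        const string webhookUrl = "<webhook URL>"; // placeholder

        // The deployment-detection query from above, URL-encoded
        string query = Uri.EscapeDataString(
            "performanceCounters" +
            "| where timestamp >= ago(15m)" +
            "| where name == \"Requests/Sec\"" +
            "| extend deployment = strcat(application_Version, \" \", client_IP)" +
            "| summarize dcount(cloud_RoleInstance) by deployment, bin(timestamp, 5m)");

        using (var http = new HttpClient())
        {
            http.DefaultRequestHeaders.Add("x-api-key", apiKey);
            string json = http.GetStringAsync(
                "https://api.applicationinsights.io/beta/apps/" + appId +
                "/query?query=" + query).Result;

            // A real connector would parse the result rows into a readable message
            http.PostAsync(webhookUrl,
                new StringContent(json, Encoding.UTF8, "application/json"))
                .Result.EnsureSuccessStatusCode();
        }
    }
}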

Alerting Over Analytics Queries


This is a DIY post on how you can use Availability Tests and the Data Access API together to implement some of the most popular requests on the Application Insights UserVoice.

The Application Insights UserVoice has four very popular items. It is not hard to implement them yourself using Application Insights extensibility points.

Let’s start with an alert on a segmented metric. Let’s say I want to receive an alert when nobody opens any posts on this site. Posts differ from the default and about pages by the /blog/ substring in the url. You can go to Application Insights Analytics and write a query like this to get the number of viewed posts:

pageViews
| where timestamp > ago(10min)
| where timestamp < ago(5min)
| where url contains "/blog/"
| summarize sum(itemCount)

Note also that I’m looking 5 minutes into the past to allow some time for data to arrive. Typical latency for telemetry is under a minute; I’m erring on the safe side here.

In order to convert this query into a Pass/Fail statement I can do something like this:

pageViews
| where timestamp > ago(10min)
| where timestamp < ago(5min)
| where url contains "/blog/"
| summarize isPassed = (sum(itemCount) > 1)
| project iff(isPassed, "PASSED", "FAILED")

This query returns a single value: PASSED or FAILED.

Now I can go to the query API explorer at dev.applicationinsights.io and enter the appId, the API key, and the query. You will get a URL like this:

GET /beta/apps/cbf775c7-b52e-4533-8673-bd6fbd7ab04a/query?query=pageViews%7C%20where%20timestamp%20%3E%20ago(10min)%7C%20where%20timestamp%20%3C%20ago(5min)%7C%20where%20url%20contains%20%22%2Fblog%2F%22%20%7C%20summarize%20isPassed%20%3D%20(sum(itemCount)%20%3E%201)%7C%20project%20iff(isPassed%2C%20%22PASSED%22%2C%20%22FAILED%22) HTTP/1.1
Host: api.applicationinsights.io
x-api-key: 8083guxbvatm4bq7kruraw8p8oyj7yd2i2s4exnr

Instead of a header you can pass the API key as a query string parameter named api_key. The resulting URL will look like this:

https://api.applicationinsights.io/beta/apps/cbf775c7-b52e-4533-8673-bd6fbd7ab04a/query
?query=pageViews%7C%20where%20timestamp%20%3E%20ago(10min)%7C%20where%20timestamp%20%3C%20ago(5min)%7C%20where%20url%20contains%20%22%2Fblog%2F%22%20%7C%20summarize%20isPassed%20%3D%20(sum(itemCount)%20%3E%201)%7C%20project%20iff(isPassed%2C%20%22PASSED%22%2C%20%22FAILED%22)
&api_key=8083guxbvatm4bq7kruraw8p8oyj7yd2i2s4exnr

The final step is to set up a ping test that queries this URL, with a content match success criterion that searches for the keyword PASSED.

You can change the queries to satisfy other requests. You can query customEvents by name the same way I queried pageViews by url. You can set an alert when CPU is very high on at least one instance, instead of the standard average across all instances:

performanceCounters
| where timestamp > ago(10min) and timestamp < ago(5min)
| where category == "Process" and counter == "% Processor Time"
| summarize cpu_per_instance = avg(value) by cloud_RoleInstance
| summarize isPassed = (max(cpu_per_instance) < 80)
| project iff(isPassed, "PASSED", "FAILED")

You can also join multiple metrics or tables:

exceptions
| where timestamp > ago(10min) and timestamp < ago(5min)
| summarize exceptionsCount = sum(itemCount) | extend t = "" | join
(requests 
| where timestamp > ago(10min) and timestamp < ago(5min)
| summarize requestsCount = sum(itemCount) | extend t = "") on t
| project isPassed = 1.0 * exceptionsCount / requestsCount < 0.5
| project iff(isPassed, "PASSED", "FAILED")

Some thoughts about this implementation:

  • An availability test runs once every 5 minutes from a single location. With 5 locations the analytics query will run about every minute.
  • The limit on the number of analytics queries is 1500 per day. That allows running a single ping test once a minute, or more tests less frequently.
  • If the query is too long you may need to use POST instead of GET. You can implement POST as a multi-step test, but multi-step tests cost money. So you may be better off implementing a simple proxy that runs the queries (a sketch follows below) - the same way I set up certificate expiration monitoring.
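
Such a proxy can be a tiny Web API controller that runs a pre-registered query via POST (so query length is no longer a concern) and returns plain PASSED/FAILED for the content match. A sketch, assuming the query API accepts a JSON body on POST the way it accepts the query string on GET, and a hypothetical QueryStore lookup:

using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using System.Web.Http;
using Newtonsoft.Json;

public class QueryProxyController : ApiController
{
    private const string AppId = "<appId>";    // placeholder
    private const string ApiKey = "<API key>"; // placeholder

    [HttpGet]
    public async Task<IHttpActionResult> Get(string checkName)
    {
        string query = QueryStore.Lookup(checkName); // hypothetical registry of stored queries

        using (var http = new HttpClient())
        {
            http.DefaultRequestHeaders.Add("x-api-key", ApiKey);
            var body = new StringContent(
                JsonConvert.SerializeObject(new { query }),
                Encoding.UTF8, "application/json");
            var response = await http.PostAsync(
                "https://api.applicationinsights.io/beta/apps/" + AppId + "/query", body);
            string json = await response.Content.ReadAsStringAsync();

            // the query already projects PASSED/FAILED, so a substring check is enough
            return Ok(json.Contains("PASSED") ? "PASSED" : "FAILED");
        }
    }
}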

Update to the Last Post - Set the Name in MVC Web API


Answering the question in this comment - how do you set the name of the request for attribute-based MVC Web API routing? It can be done as an extension of the previous post. Something like this would work:

public class ApplicationInsightsCorrelationHttpActionFilter : System.Web.Http.Filters.ActionFilterAttribute, ITelemetryInitializer
{
    private static AsyncLocal<RequestTelemetry> currentRequestTelemetry = new AsyncLocal<RequestTelemetry>();

    public override Task OnActionExecutingAsync(HttpActionContext actionContext, CancellationToken cancellationToken)
    {
        var template = actionContext.RequestContext.RouteData.Route.RouteTemplate;

        var request = System.Web.HttpContext.Current.GetRequestTelemetry();
        request.Name = template;
        request.Context.Operation.Name = request.Name;

        currentRequestTelemetry.Value = request;

        return base.OnActionExecutingAsync(actionContext, cancellationToken);
    }
}

Update: A more complete version of this filter is posted by @snboisen on GitHub.

This is an action filter for Web API. At the beginning of action execution, the name can be taken from the route data.

An action filter won’t run when execution doesn’t reach the controller. So you may need to duplicate the logic in the telemetry initializer’s Initialize method itself. However, in this case you’d need to get the currently executing request, and it may not always be available.
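
For reference, a sketch of such a fallback inside Initialize; the verb-plus-path fallback name is my own convention here, not something the SDK produces:

public void Initialize(ITelemetry telemetry)
{
    var request = telemetry as RequestTelemetry;
    if (request == null || !string.IsNullOrEmpty(request.Name))
    {
        return; // not a request, or the action filter already named it
    }

    var context = System.Web.HttpContext.Current;
    if (context == null)
    {
        return; // no ambient HTTP context - nothing to derive the name from
    }

    // Execution never reached the controller, so no route template is
    // available; fall back to the HTTP verb and the raw path.
    request.Name = context.Request.HttpMethod + " " + context.Request.Path;
    request.Context.Operation.Name = request.Name;
}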

Manual Correlation in ASP.NET MVC Apps


I already wrote that correlation does not work well in ASP.NET MVC applications. Here is how you can fix it manually.

Assuming you are using the Microsoft.ApplicationInsights.Web NuGet package, you will have access to the RequestTelemetry stored in HttpContext.Current. You can store it in an AsyncLocal (for .NET 4.5 you can use CallContext) so it will be available for all telemetry - sync and async - produced inside the action.
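
For reference, a minimal .NET 4.5 variant could wrap CallContext like this (a sketch; the slot name is arbitrary):

using System.Runtime.Remoting.Messaging;
using Microsoft.ApplicationInsights.DataContracts;

internal static class RequestStore
{
    private const string Slot = "ai_current_request"; // arbitrary slot name

    public static RequestTelemetry Current
    {
        // CallContext flows with the logical call context across async calls,
        // similar to AsyncLocal<T> on .NET 4.6+
        get { return CallContext.LogicalGetData(Slot) as RequestTelemetry; }
        set { CallContext.LogicalSetData(Slot, value); }
    }
}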

Here is an example implementation that uses the same class as both an action filter and a telemetry initializer:

namespace ApmTips
{
    public class ApplicationInsightsCorrelationActionFilter : ActionFilterAttribute, ITelemetryInitializer
    {
        private static AsyncLocal<RequestTelemetry> currentRequestTelemetry = new AsyncLocal<RequestTelemetry>();

        public override void OnActionExecuting(ActionExecutingContext filterContext)
        {
            var request = HttpContext.Current.GetRequestTelemetry();
            currentRequestTelemetry.Value = request;

            base.OnActionExecuting(filterContext);
        }

        public override void OnActionExecuted(ActionExecutedContext filterContext)
        {
            currentRequestTelemetry.Value = null;

            base.OnActionExecuted(filterContext);
        }

        public override void OnResultExecuting(ResultExecutingContext filterContext)
        {
            var request = HttpContext.Current.GetRequestTelemetry();
            currentRequestTelemetry.Value = request;

            base.OnResultExecuting(filterContext);
        }

        public override void OnResultExecuted(ResultExecutedContext filterContext)
        {
            currentRequestTelemetry.Value = null;

            base.OnResultExecuted(filterContext);
        }

        public void Initialize(ITelemetry telemetry)
        {
            var request = currentRequestTelemetry.Value;

            if (request == null)
                return;

            if (string.IsNullOrEmpty(telemetry.Context.Operation.Id) && !string.IsNullOrEmpty(request.Context.Operation.Id))
            {
                telemetry.Context.Operation.Id = request.Context.Operation.Id;
            }

            if (string.IsNullOrEmpty(telemetry.Context.Operation.ParentId) && !string.IsNullOrEmpty(request.Id))
            {
                telemetry.Context.Operation.ParentId = request.Id;
            }

            if (string.IsNullOrEmpty(telemetry.Context.Operation.Name) && !string.IsNullOrEmpty(request.Name))
            {
                telemetry.Context.Operation.Name = request.Name;
            }

            if (string.IsNullOrEmpty(telemetry.Context.User.Id) && !string.IsNullOrEmpty(request.Context.User.Id))
            {
                telemetry.Context.User.Id = request.Context.User.Id;
            }

            if (string.IsNullOrEmpty(telemetry.Context.Session.Id) && !string.IsNullOrEmpty(request.Context.Session.Id))
            {
                telemetry.Context.Session.Id = request.Context.Session.Id;
            }
        }
    }
}

Here is how you’d register it in Global.asax.cs:

var filter = new ApplicationInsightsCorrelationActionFilter();
GlobalFilters.Filters.Add(filter);
TelemetryConfiguration.Active.TelemetryInitializers.Add(filter);

You can always use one of the community-supported MVC monitoring NuGet packages, which do similar things to enable this correlation.

Build Information in Different Environments


I wrote before about the automatic telemetry versioning you can implement for ASP.NET apps. With a single-line change in the project file you can generate the BuildInfo.config file. This file contains basic build information, including the build id.
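
For reference, that single line is the GenerateBuildInfoConfigFile property in the project file (it also appears in the snippets below):

<PropertyGroup>
  <GenerateBuildInfoConfigFile>true</GenerateBuildInfoConfigFile>
</PropertyGroup>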

Note that when you build the application locally, BuildInfo.config is generated under the bin/ folder and has an AutoGen_<GUID> build id. With the new VSTS build infrastructure, the same AutoGen_ id appears in production builds as well.

The reason is that the VSTS build infrastructure defines new build property names. Specifically, BuildUri was renamed to Build.BuildUri. Here is the list of all predefined variables in VSTS builds. So the fix for the BuildInfo.config generation is easy:

<BuildUri Condition="$(BuildUri) == ''">$(Build_BuildUri)</BuildUri>
<GenerateBuildInfoConfigFile>true</GenerateBuildInfoConfigFile>

You can review the file C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v14.0\BuildInfo\Microsoft.VisualStudio.ReleaseManagement.BuildInfo.targets for other properties that got broken. For instance, you may want to fix BuildLabel as well. The fix above will make BuildLabel use the BuildId:

<BuildLabel kind="label">vstfs:///Build/Build/3497900</BuildLabel>
<BuildId kind="id">vstfs:///Build/Build/3497900</BuildId>

instead of

build id: 3497900
build name: 20161214.1

You can use the same trick for Azure Web Apps. When you set up continuous integration for an Azure Web App from GitHub, Kudu will download the sources and build them locally. Every deployment is identified by a commit ID. So you can set the BuildId like I did in this commit in Glimpse.ApplicationInsights:

<BuildId Condition="$(BuildId) == ''">$(SCM_COMMIT_ID)</BuildId>
<GenerateBuildInfoConfigFile>true</GenerateBuildInfoConfigFile>

Once implemented I can see the deployment id as an application version in Glimpse:

You can also filter by it in the Azure portal:

Using this deployment ID you can query deployment information using the link https://%WEBSITE_HOSTNAME%.scm.azurewebsites.net/api/deployments/<deployment id> to see something like this:

{
  "id": "7e5aeb37764b195a721d193be2b3ab8601276ef4",
  "status": 4,
  "status_text": "",
  "author_email": "SergKanz@microsoft.com",
  "author": "Sergey Kanzhelev",
  "deployer": "GitHub",
  "message": "commit ID\n",
  "progress": "",
  "received_time": "2016-12-14T21:59:50.8705503Z",
  "start_time": "2016-12-14T21:59:51.0919654Z",
  "end_time": "2016-12-14T22:05:29.940095Z",
  "last_success_end_time": "2016-12-14T22:05:29.940095Z",
  "complete": true,
  "active": true,
  "is_temp": false,
  "is_readonly": false,
  "url": "https://ai-glimpse-web-play-develop.scm.azurewebsites.net/api/deployments/7e5aeb37764b195a721d193be2b3ab8601276ef4",
  "log_url": "https://ai-glimpse-web-play-develop.scm.azurewebsites.net/api/deployments/7e5aeb37764b195a721d193be2b3ab8601276ef4/log",
  "site_name": "ai-glimpse-web-play"
}

As mentioned in this issue you may also override BuildId for other platforms. For AppVeyor it seems that this property will work: APPVEYOR_BUILD_VERSION.
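
Following the pattern of the Kudu snippet above, the AppVeyor override would presumably look like this (an untested assumption based on that issue):

<BuildId Condition="$(BuildId) == ''">$(APPVEYOR_BUILD_VERSION)</BuildId>
<GenerateBuildInfoConfigFile>true</GenerateBuildInfoConfigFile>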

Request Success and Response Code


Application Insights monitors web application requests. This article explains the difference between two fields describing a request: success and responseCode.

There are many ways to use an application monitoring tool. You can use it for a daily status check, bug triage, or deep diagnostics. For the daily status check you want to know quickly whether anything unusual is going on. The commonly used chart is the number of failed requests. When this number is higher than yesterday - it's time for triage and deep diagnostics. You want to know how exactly these requests failed.

For web applications, Application Insights defines a request as successful when the response code is less than 400 or equal to 401. Quite straightforward. So why are there two fields being sent - responseCode and success? Wouldn’t it be easier to map the response code to a success status on the backend?

Response code 401 is marked as “successful” because it is part of a normal authentication handshake. Marking it as “failed” could cause an alert in the middle of the night when people on a different continent have just come to work and log in to the application. However, this logic is oversimplified. You probably want to get notified when all those people who just came to work cannot log in to the application because of some recent code change. Those 401 responses would be legitimate “failures”.

So you may want to override the default success = true value for the 401 response code when authentication has actually failed.

There are other cases where the response code does not map directly to request success.

Response code 404 may indicate “no records”, which can be part of a regular flow. It may also indicate a broken link. For broken links you can even implement logic that marks them as failures only when they are located on the same web page (by analyzing urlReferrer) or accessed from the company’s mobile application. Similarly, 301 and 302 indicate failure when accessed from a client that doesn’t support redirects.

Partially accepted content (206) may indicate a failure of the overall request. For instance, the Application Insights endpoint allows sending a batch of telemetry items as a single request. It returns 206 when some items in the request were not processed successfully. An increasing rate of 206 responses indicates a problem that needs to be investigated. Similar logic applies to 207 Multi-Status, where the success may be the worst of the separate response codes.

You may want to set success = false for 200 responses representing an error page.

And definitely set success = true for 418 I'm a teapot (RFC 2324), as a request for coffee should never fail.

Here is how you can set the success flag for the request telemetry.

Implement a telemetry initializer

You can write a simple telemetry initializer that overrides the default behavior:

public class SetFailedFor401 : ITelemetryInitializer
{
    public void Initialize(ITelemetry telemetry)
    {
        if (telemetry is RequestTelemetry)
        {
            var r = (RequestTelemetry)telemetry;

            if (r.ResponseCode == "401")
            {
                r.Success = false;
            }
        }
    }
}

You can make a telemetry initializer configurable. This telemetry initializer sets success to true for 404 requests referred from external sites:

public class SetSuccessFor404FromExternalSite : ITelemetryInitializer
{
    public string ApplicationHost { get; set; }

    public void Initialize(ITelemetry telemetry)
    {
        if (telemetry is RequestTelemetry)
        {
            var r = (RequestTelemetry)telemetry;

            if (r.ResponseCode == "404" &&
                (HttpContext.Current.Request.UrlReferrer != null &&
                 !HttpContext.Current.Request.UrlReferrer.Host.Contains(this.ApplicationHost)
                )
               )
            {
                r.Success = true;
            }
        }
    }
}

You’d need to configure it like this:

<Add Type="SetSuccesFor404FromExternalSite, WebApplication1" >
    <ApplicationHost>apmtips.com</ApplicationHost>
</Add>

From Code

From anywhere in the code you can set the success status of the request. This value will not be overridden by the standard request telemetry collection code.

if (returnEmptyCollection)
{
    HttpContext.Current.GetRequestTelemetry().Success = true;
}

How Application Insights Status Monitor DOES NOT Monitor Dependencies


In this article I’ll address the common misperception that Status Monitor collects telemetry. I’ll show how it helps to collect (but does not itself collect) application dependency information.

Application Insights collects information about application dependencies. Most of the time you don’t need to do anything special to collect all outbound HTTP and SQL calls with basic information like the URL and the stored procedure name. The Application Insights SDK uses the EventSource traces that the .NET Framework 4.6 emits.

However, if you need more information about the dependencies, like the raw SQL statement, we recommend installing Status Monitor.

I created a small demo application that shows how Status Monitor helps to collect dependencies. In this demo you can learn how to collect the SmtpClient.Send method as an external dependency call.

The demo application doesn’t use Status Monitor. Instead, it downloads the binaries distributed by Status Monitor as NuGet packages:

<package id="Microsoft.ApplicationInsights.Agent_x64" version="2.0.5" />
<package id="Microsoft.ApplicationInsights.Agent_x86" version="2.0.5" />

Status Monitor will install the binaries (called Runtime Instrumentation Agent) from those NuGet packages into the folder %ProgramFiles%\Microsoft Application Insights\Runtime Instrumentation Agent.

If you run this demo application from Visual Studio, you’ll get the message Please run this application using Startup.cmd script.

Running the script instructs the .NET Framework to enable the Runtime Instrumentation Agent for the current process. When you run the application using the Startup.cmd script, you’ll see the message Application started with the Runtime Instrumentation Agent enabled. Press any key to continue.... Looking into the code you may notice that the method SmtpClient.Send was already called by that moment. However, since Status Monitor does NOT monitor dependencies, the Runtime Instrumentation Agent did nothing when this call was made.

We want to report every call of the method SmtpClient.Send as a dependency call. We know what information to collect with that dependency call - mail subject, SMTP host, etc.

This is how to configure the monitoring. First, import another NuGet package:

<package id="Microsoft.ApplicationInsights.Agent.Intercept" version="2.0.5" />

Second, call the method Decorate and pass the method information - assembly name, module name, and full method name.

Decorator.InitializeExtension();
Functions.Decorate("System", "System.dll", "System.Net.Mail.SmtpClient.Send",
    OnBegin, OnEnd, OnException, false);

You also need to pass three callbacks:

public static object OnBegin(object thisObj, object arg1)
public static object OnEnd(object context, object returnValue, object thisObj, object arg1)
public static void OnException(object context, object exception, object thisObj, object arg1)

The call to Decorate does the magic. It finds the method you specified and, using the Runtime Instrumentation Agent, inserts those callbacks into the beginning, the end, and a global try/catch statement of that method. This magic is only possible when the Runtime Instrumentation Agent is enabled for the process.

In the callbacks implementation, I start the operation in the OnBegin callback:

public static object OnBegin(object thisObj, object arg1)
{
    // start the operation
    var operation = new TelemetryClient().StartOperation<DependencyTelemetry>("Send");
    operation.Telemetry.Type = "Smtp";
    operation.Telemetry.Target = ((SmtpClient)thisObj).Host;
    if (arg1 != null)
    {
        operation.Telemetry.Data = ((MailMessage)arg1).Subject;
    }
    // save the operation in the local context
    return operation;
}

And stop the operation in the OnEnd and OnException callbacks:

public static void OnException(object context, object exception, object thisObj, object arg1)
{
    // mark operation as failed and stop it. Getting the operation from the context
    var operation = (IOperationHolder<DependencyTelemetry>)context;
    operation.Telemetry.Success = false;
    operation.Telemetry.ResultCode = exception.GetType().Name;
    new TelemetryClient().StopOperation(operation);
}
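
The OnEnd callback is not shown above; based on its signature, a minimal version that assumes success whenever no exception was thrown could look like this:

public static object OnEnd(object context, object returnValue, object thisObj, object arg1)
{
    // mark operation as succeeded and stop it. Getting the operation from the context
    var operation = (IOperationHolder<DependencyTelemetry>)context;
    operation.Telemetry.Success = true;
    new TelemetryClient().StopOperation(operation);
    return returnValue;
}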

Notice that the runtime arguments passed to the original method are used in those callbacks to collect information. The argument called thisObj is the instance of SmtpClient that made the call, and arg1 is the MailMessage that was passed as an argument.

So all the data collection logic is implemented in the application itself. The Runtime Instrumentation Agent just provides a way to inject the callbacks into the methods of interest.

This is how Status Monitor helps to collect dependencies. During installation, Status Monitor enables the Runtime Instrumentation Agent for all IIS-based applications. The agent does nothing if the application does not use the Application Insights SDK; it has zero runtime impact. Only when the Application Insights SDK is initialized can it use the Runtime Instrumentation Agent to monitor any methods it chooses. Status Monitor doesn’t have information on what needs to be instrumented, what data should be collected, or where this information has to be sent. It is all defined in code by the Application Insights SDK.

This approach allows versioning of the Application Insights SDK data collection logic without the need to re-install the agent. It also guarantees that an update of the agent will not change the way telemetry is collected for your application.

I’ll describe other functions of Status Monitor in the next blog posts.

BTW, the same Runtime Instrumentation Agent is installed by Microsoft Monitoring Agent and the Application Insights Azure Web App extension.

Sync Channel


The Application Insights API library provides basic methods to track telemetry in the application, like TrackTrace and TrackMetric. It also implements a basic channel to send this data to Application Insights.

If you are using the Application Insights API in short-running tasks, you may hit a problem where some telemetry isn’t sent. The basic channel does not provide control over data delivery. The method Flush makes an effort to flush telemetry left in the buffers, but does not guarantee delivery either.

Here is a very simple sync channel you can use to overcome the issues above. It has three features:

  1. No need to Flush. When a Track method completes, the event is delivered.
  2. Sending is synchronous; no new threads or tasks are started.
  3. When delivery fails, the Track method throws an exception and you can retry.

Here is the code with the usage example:

using System;
using System.Collections.Generic;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.ApplicationInsights.Extensibility.Implementation;

namespace DemoSyncChannel
{
    class Program
    {
        class SyncTelemetryChannel : ITelemetryChannel
        {
            private Uri endpoint = new Uri("https://dc.services.visualstudio.com/v2/track");

            public bool? DeveloperMode { get; set; }

            public string EndpointAddress { get; set; }

            public void Dispose() { }

            public void Flush() { }

            public void Send(ITelemetry item)
            {
                byte[] json = JsonSerializer.Serialize(new List<ITelemetry>() { item }, true);
                Transmission transmission = new Transmission(endpoint, json, "application/x-json-stream", JsonSerializer.CompressionType);
                var t = transmission.SendAsync();
                t.Wait();
            }
        }

        static void Main(string[] args)
        {
            TelemetryConfiguration.Active.TelemetryChannel = new SyncTelemetryChannel();

            TelemetryConfiguration.Active.InstrumentationKey = "c92059c3-9428-43e7-9b85-a96fb7c9488f";
            new TelemetryClient().TrackTrace("Sync trace");

            // this will throw exception
            TelemetryConfiguration.Active.InstrumentationKey = "invalid instrumentation key";
            new TelemetryClient().TrackTrace("Sync trace");
        }
    }
}

Friday Post - Application Insights Misspellings


Here is a list of popular misspellings of Application Insights. Welcome to my blog if you found this post by searching for one of those. Here are some Application Insights links for you.

lication insights         application inishgt      application inssights
application insight       application inishgts     application instight
spplication insight       application inisight     application instights
application insights      application insghits     application insugfhts
aapalication insight      application insght       application isights
aoolication insight       application insghts      application isnights
aoolication insights      application insgiht      application isnisghts
aaplication insight       application insgihts     application iunsights
pplication insights       application insgith      application lisights
aaplication insights      apllication insights     application sinsights
aapplication insights     application insgiths     application sisights
appliaction insights      application insgsights   apoplication insights
appliation insight        application inshgts      applicationb insights
appliation insights       application insi         applicationinisghts
applicaino insights       application insifhts     applicationinsight
applicaion insights       application insigghts    applicationinsights
applicaition insights     application insigh       applicationinsigts
applicaiton inisght       application insighes     applicationm insights
applicaiton inisghts      application insighets    applicationn insights
applicaiton insigh        application insighhts    applicatioon insights
applicaiton insight       application insighrs     applicatipn insights
applicaiton insights      application insighs      applicatno insights
applicaiton isnights      application insight      applicatoin insights
applicaitoninsights       application insightas    applicatoininsights
aplicaiton insgights      application insightd     applicaton insight
aplicatationm insisghts   application insighte     applicaton insights
applicarion inaights      application insightes    applicatuon insights
applicarion insights      application insightrs    appliciation insight
applicartion insights     application insights     applicsation insights
aplication insight        application insightsa    applictaion insight
aplication insights       application insightsd    appliction insight
aplication insites        application insightss    appliction insights
applicat5io insigh        application insighys     appicaiotn insight
applicataion insight      application insigight    appication insghts
applicated insight        application insignts     appication insight
applicatin insights       application insigt       appication insights
applicatino insighs       application insigth      appliocation insights
applicatino insights      application insigths     applkication insghts
applicatio insight        application insigts      appllication insights
applicatio insights       application insiguts     appilcation insights
applicatio insigths       application insihts      appilication insight
applicatioin insight      application insiights    applucation insights
applicatiom insights      application insinghts    appolication insights
application hinsights     application insioghts    appplication insight
application in sights     application insisghts    appplication insights
application inaighta      application insites      applation insights
application ingishts      application insitghs     applcation insights
application inights       application insitghts    applciaiton insights
application inisghts      application insoghts     applciation insights

SSL Expiration Monitoring


Now this blog is available via https, thanks to the courtesy of Let’s Encrypt and a great step-by-step instruction on how to install it on an Azure Web App.

An SSL certificate from Let’s Encrypt expires in 3 months. The instruction above configures a web job to renew the certificate before it expires. However, you may want to set up an extra reminder for the certificate expiration.

An Application Insights web test will fail when the certificate is invalid. But that is a little bit late, as the certificate has already expired.

So I created a little tool that returns certificate information for a given domain name.

When you call http://webteststools.azurewebsites.net/certificate/apmtips.com/ - it will return JSON with certificate information like this:

{
    "ExpirationDate":"10/18/2016 6:30:00 AM",
    "ExpiresIn10Days":false,
    "IssuerName":"CN=Let's Encrypt Authority X3, O=Let's Encrypt, C=US",
    "Subject":"CN=apmtips.com",
    "Error":false
}
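
The core of such a tool can be very small. Here is a sketch (my reconstruction, not the tool's actual source) that makes a request to the domain and reads the certificate off the connection:

using System;
using System.Net;
using System.Security.Cryptography.X509Certificates;

internal static class CertificateInfo
{
    public static X509Certificate2 Get(string host)
    {
        var request = (HttpWebRequest)WebRequest.Create("https://" + host + "/");
        using (request.GetResponse()) { } // forces the TLS handshake
        return new X509Certificate2(request.ServicePoint.Certificate);
    }
}

ExpirationDate then maps to cert.NotAfter, and ExpiresIn10Days to cert.NotAfter < DateTime.UtcNow.AddDays(10).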

So you can set up an Application Insights web test that calls that URL and validates the response:

Now, when "ExpiresIn10Days":false turns into "ExpiresIn10Days":true, the alert will fire and there will be 10 more days to fix the certificate.

There is now a new point of failure - this new tool. If it is down, you will get a false alarm. Considering the Azure Web Apps SLA and the fact that certificates do not expire too often, it may be a good compromise.