APM tips blog

Blog about application monitoring.

Find Application by Its Instrumentation Key


I meant this post to show how to use the new Azure Cloud Shell. Unfortunately, two scenarios I wanted to use it for are not that easy to implement. If you have time, go comment on and upvote these two issues: azure-cli#3457 and azure-cli#3641.

Here is how you can find out the name of an application given its instrumentation key. This situation is not that rare, especially if you have access to quite a few subscriptions and monitor many services deployed to different environments and regions. You have an instrumentation key in a configuration file, but you are not sure where to search for the telemetry.

First, go to Azure Cloud Shell. It gives you bash and allows you to access all your Azure resources.

Second, create a file findApplicationByIkey.sh with the following content:

#!/bin/bash

if [ -z "$1" ]; then
    echo "specify the instrumentation key"
    exit 1
fi
echo "search for instrumentation key $1"
ikeyToFind=$1

# this function searches for the instrumentation key in a given subscription
function findIKeyInSubscription {
  echo "Switch to subscription $1"
  az account set --subscription "$1"

  # list all the Application Insights resources;
  # for each of them take the instrumentation key
  # and compare it with the one you are looking for
  az resource list \
    --namespace microsoft.insights --resource-type components --query [*].[id] --out tsv \
      | while \
          read ID; \
          do  printf "$ID " && \
              az resource show --id "$ID" --query properties.InstrumentationKey --out tsv; \
        done \
      | grep "$ikeyToFind"
}

# run the search in every subscription...
az account list --query [*].[id] --out tsv \
    | while read OUT; do findIKeyInSubscription $OUT; done

Finally, run it: ./findApplicationByIkey.sh ce85cf15-de20-49bb-83d7-234b5116623b

sergey@Azure:~/Sergey$ ./findApplicationByIkey.sh ce85cf15-de20-49bb-83d7-234b5116623b
search for instrumentation key ce85cf15-de20-49bb-83d7-234b5116623b
A few accounts are skipped as they don't have 'Enabled' state. Use '--all' to display them.
Switch to subscription 5fb94e1c-7bbf-4ab8-9c51-5dda40adc12e
Switch to subscription 52f57f24-51d5-479f-a532-facd9ee907a6
Switch to subscription eec57090-02b8-48f2-b78e-a38b7a53e1ab
/subscriptions/c3becfa8-419b-4b30-b08b-a2865ace64bf/resourceGroups/MY-RG/providers/
microsoft.insights/components/test-ai-app ce85cf15-de20-49bb-83d7-234b5116623b
Switch to subscription a8308a0b-9ee1-4548-9bbf-2b1d670e0767
The client 'Sergey@' with object id '03aa4cb5-650f-45bf-8d45-474664262685' does not have 
authorization to perform action 'Microsoft.Resources/subscriptions/resources/read' over 
scope '/subscriptions/edfd8475-8c5f-45c3-b533-a5132e8f9ada'.
Switch to subscription d6043348-75b2-41cd-ba7e-e1d317619002
...
...

The answer is /subscriptions/c3becfa8-419b-4b30-b08b-a2865ace64bf/resourceGroups/MY-RG/providers/microsoft.insights/components/test-ai-app. Better than guessing.
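
If you prefer C# to bash, the same search can be expressed against the Azure Resource Manager REST API. Below is a minimal sketch, assuming you already have an AAD access token for https://management.azure.com/ (for example from az account get-access-token); the ARM_TOKEN variable is my own convention, and the api-version values may need adjusting:

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

class FindByIKey
{
    static async Task Main()
    {
        var ikeyToFind = "ce85cf15-de20-49bb-83d7-234b5116623b";
        // an AAD token for ARM, e.g. obtained via `az account get-access-token`
        var accessToken = Environment.GetEnvironmentVariable("ARM_TOKEN");

        using (var http = new HttpClient())
        {
            http.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Bearer", accessToken);

            // enumerate the subscriptions visible to the caller
            var subs = JObject.Parse(await http.GetStringAsync(
                "https://management.azure.com/subscriptions?api-version=2016-06-01"));

            foreach (var sub in subs["value"])
            {
                // list the Application Insights components in this subscription
                var components = JObject.Parse(await http.GetStringAsync(
                    "https://management.azure.com/subscriptions/" + sub["subscriptionId"] +
                    "/providers/Microsoft.Insights/components?api-version=2015-05-01"));

                foreach (var c in components["value"])
                {
                    if ((string)c["properties"]["InstrumentationKey"] == ikeyToFind)
                    {
                        Console.WriteLine(c["id"]);
                    }
                }
            }
        }
    }
}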

Page View and Telemetry Correlation


For any monitoring and diagnostics solution, it is important to provide visibility into transaction execution across multiple components. The Application Insights data model supports telemetry correlation, so you can express the interconnections of every telemetry item. A significant subset of these interconnections is collected by default by the Application Insights SDK. Let’s talk about page view correlation and its auto-collection.

Today you can enable telemetry correlation in the JavaScript SDK by setting the flag disableCorrelationHeaders to false.

// Default true. If false, the SDK will add correlation headers 
// to all dependency requests (within the same domain) 
// to correlate them with corresponding requests on the server side. 
disableCorrelationHeaders: boolean;

You get the page view correlated to AJAX calls and the corresponding server requests, something like what’s shown in the picture:

As you may see, this assumes that the page view initiated the correlation, which is not always true. I explain these scenarios later in the post.

The Application Insights JavaScript SDK hijacks AJAX calls and inserts correlation headers into them. However, there is no easy way to correlate page views to other resources (scripts or images) without a specialized browser extension or “hacky heuristics.” You can use the referrer value or set short-lived cookies, but neither gives you a generic and reliable solution.

An SPA, or single page application, may introduce multiple page views correlated to each other. React components may call/contain each other:

SPAs are one of the reasons telemetry correlation is not enabled by default. An SPA has only one page that initiates all communication to the server. Suddenly all application telemetry may become correlated to a single page view, which is not useful information.

By the way, the ability to correlate page views is a primary reason for the GitHub issue PageView should have own ID for proper correlation. As you see, PageViews may create their own execution hierarchy in an SPA, and the Application Insights data model should support it.

You may also want to correlate page view with the originating server request:

It is easy to implement with a few lines of code. If you are using Application Insights Web SDK 2.4-beta1 or higher, you can write something like this:

var appInsights=window.appInsights||function(config){
function i(config){t[config]=function(){var i=arguments;t.queue.push(function(){t[config]......
    instrumentationKey:"a8cdcad4-2bcb-4ed4-820f-9b2296821ef8",
    disableCorrelationHeaders: false
});

window.appInsights = appInsights;
window.appInsights.queue.push(function () {
    var serverId = "@this.Context.GetRequestTelemetry().Context.Operation.Id";
    appInsights.context.operation.id = serverId;
});

appInsights.trackPageView();

If you are using a lower version of the Application Insights SDK (like 2.3), the snippet is a bit more complicated, as the RequestTelemetry object needs to be initialized. But it is still easy:

var serverId = "@{
    var r = HttpContext.Current.GetRequestTelemetry();
    new Microsoft.ApplicationInsights.TelemetryClient().Initialize(r);
    @Html.Raw(r.Context.Operation.Id);
}";

This snippet renders the server request ID as a JavaScript variable serverId and sets it as the context’s operation ID, so all telemetry from this page shares it with the originating server request.

This approach, however, may cause trouble for cached pages. A page can be cached at different layers and even shared between users, and correlating telemetry from different users is often not a desired behavior.

Also, make sure you are not taking it to the extreme. You may want to correlate the server request with the page view that initiated the request:

As a result, all the pages the user visited are correlated. The operation ID is playing the role of a session ID here. For this kind of analysis I’d suggest employing other mechanisms and not using the telemetry correlation fields.

Import Datasets in Application Insights Analytics


I was thinking about how to improve the querying experience in Application Insights Analytics. In the previous post, I demonstrated how to use datasets in your query. In particular, I needed timezones for countries, and I used the datatable operator to create a dictionary of country timezones. In this post, I show how to use the Application Insights data import feature to work around the user voice request “Return ISO 2/3 letter country code in the REST API”.

I downloaded the country codes from the UN website and saved them as a blob in Azure. Then I defined the Application Insights Analytics open schema by uploading this file as an example. I named the columns and chose to use the ingestion time as the required time column.

Then I used the code example from the documentation for the data import feature: get a reference to the blob, create a security token, and notify Application Insights about this blob.

var storageAccount =
    CloudStorageAccount.Parse(ConfigurationManager.AppSettings.Get("StorageConnectionString"));
var blobClient = storageAccount.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("testopenschema");
var blob = container.GetBlobReferenceFromServer("countrycodes.csv");

var sasConstraints = new SharedAccessBlobPolicy();
sasConstraints.SharedAccessExpiryTime = DateTimeOffset.MaxValue;
sasConstraints.Permissions = SharedAccessBlobPermissions.Read;
string uri = blob.Uri + blob.GetSharedAccessSignature(sasConstraints);

AnalyticsDataSourceClient client = new AnalyticsDataSourceClient();
var ingestionRequest = new AnalyticsDataSourceIngestionRequest(
    ikey: "074608ec-29c0-41f1-a7c6-54f30d520629",
    schemaId: "440f9d45-9b1f-4760-9aa5-3d1bc828cedc",
    blobSasUri: uri);

await client.RequestBlobIngestion(ingestionRequest);

Originally I had a bug in the application and received the error below. It means that the security token is verified right away; however, the actual data upload happens after some delay. So set the expiration time to some point in the future.

Ingestion request failed with status code: Forbidden.
    Error: Blob does not exist or not accessible.
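
For example, a minimal tweak to the example above that gives the token a generous but bounded lifetime instead of DateTimeOffset.MaxValue (the one-week margin is an arbitrary choice of mine, not a documented requirement):

// the delayed ingestion must happen before the token expires;
// one week is an arbitrary safety margin, not a documented requirement
sasConstraints.SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddDays(7);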

Here is what a successful request and response look like:

POST https://dc.services.visualstudio.com/v2/track HTTP/1.1
Content-Type: application/json; charset=UTF-8
Accept: application/json
Host: dc.services.visualstudio.com
Content-Length: 472

{
    "data": {
        "baseType":"OpenSchemaData",
        "baseData": {
            "ver":"2",
            "blobSasUri":"https://apmtips.blob.core.windows.net/testopenschema/countrycodes.csv?sv=2016-05-31&sr=b&sig=y3oWWTWvAefer7N%2FN%2B49sy4j%2BpR2NA%2F7797EvXQAQEk%3D&se=2017-05-12T00%3A09%3A12Z&sp=rl",
            "sourceName":"440f9d45-9b1f-4760-9aa5-3d1bc828cedc",
            "sourceVersion":"1"
        }
    },
    "ver":1,
    "name":"Microsoft.ApplicationInsights.OpenSchema",
    "time":"2017-05-11T00:09:14.6255207Z",
    "iKey":"074608ec-29c0-41f1-a7c6-54f30d520629"
}
HTTP/1.1 200 OK
Content-Length: 49
Content-Type: application/json; charset=utf-8
Server: Microsoft-IIS/8.5
x-ms-session-id: 0C2E28FE-6085-4DD7-BFB9-8A6195C73A2A
Strict-Transport-Security: max-age=31536000
Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Name, Content-Type, Accept
Access-Control-Allow-Origin: *
Access-Control-Max-Age: 3600
X-Content-Type-Options: nosniff
X-Powered-By: ASP.NET
Date: Thu, 11 May 2017 00:09:15 GMT

{"itemsReceived":1,"itemsAccepted":1,"errors":[]}

Once the data is uploaded, you can query it by joining standard tables with the imported data:

pageViews
  | join kind= innerunique (CountryCodes_CL)
      on $left.client_CountryOrRegion == $right.CountryOrRegion
  | project name, ISOalpha3code

Refresh the data in this table periodically, as Application Insights keeps data only for 90 days. You can set up an Azure Function to run every 90 days, as sketched below.
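
A minimal sketch of such a function, reusing the AnalyticsDataSourceClient helper from the documentation example above. The CountryCodesBlobSasUri application setting is my own invention, and the NCRONTAB schedule fires every third month rather than exactly every 90 days, since timer triggers cannot express the latter:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class RefreshCountryCodes
{
    // fires at midnight on the 1st day of every 3rd month - an
    // approximation of the 90 day retention window
    [FunctionName("RefreshCountryCodes")]
    public static async Task Run(
        [TimerTrigger("0 0 0 1 */3 *")] TimerInfo timer, ILogger log)
    {
        // a long-lived SAS URI to the blob, stored in application settings
        var uri = Environment.GetEnvironmentVariable("CountryCodesBlobSasUri");

        // same ingestion request as in the example above
        var client = new AnalyticsDataSourceClient();
        var ingestionRequest = new AnalyticsDataSourceIngestionRequest(
            ikey: "074608ec-29c0-41f1-a7c6-54f30d520629",
            schemaId: "440f9d45-9b1f-4760-9aa5-3d1bc828cedc",
            blobSasUri: uri);

        await client.RequestBlobIngestion(ingestionRequest);
        log.LogInformation("Country codes blob re-ingested");
    }
}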

By the way, imported logs are also billed by size. You see them as a separate table in the billing blade. You can see how many times I ran the application trying things out =)…

IP Lookup


Application Insights performs automatic IP lookup for your telemetry. Geo information can be quite useful for monitoring, troubleshooting, and usage scenarios.

I already wrote about IP address collection. Application Insights collects the IP address of the monitored service’s visitor, so you can group telemetry by the country of origin. This allows you to filter out long-running AJAX calls made from countries with high latency, or to group usage metrics by “nighttime” visitors vs. “daytime” visitors.

First, a word of caution. Application Insights uses a snapshot of the MaxMind geo IP database (Credits) taken some time ago, so it may give wrong results at times and may not be in sync with MaxMind’s online demo.

For instance, this query demonstrates that not all availability test locations are geo-mapped correctly by Application Insights.

availabilityResults 
  | where timestamp > ago(10m) 
  | join (requests 
    | where timestamp > ago(10m)) on $left.id == $right.session_Id
  | extend
    originatingLocation = location,
    receivedLocation = strcat(client_CountryOrRegion, " ", client_StateOrProvince, " ", client_City)
  | summarize count()
    by originatingLocation, receivedLocation, client_IP

This is the resulting view. Note that some locations were not correctly mapped and some do not have a city associated with them:

originatingLocation receivedLocation client_IP
US : CA-San Jose United States California San Jose 207.46.98.0
US : FL-Miami United States Florida Miami 65.54.78.0
US : TX-San Antonio United States Texas San Antonio 65.55.82.0
NL : Amsterdam Netherlands North Holland Amsterdam 213.199.178.0
US : IL-Chicago United States Illinois Chicago 207.46.14.0
IE : Dublin Ireland Leinster Dublin 157.55.14.0
JP : Kawaguchi Japan Tokyo Tokyo 202.89.228.0
RU : Moscow United Kingdom 94.245.82.0
CH : Zurich United Kingdom 94.245.66.0
HK : Hong Kong Hong Kong Long Keng 207.46.71.0
AU : Sydney United States Washington Redmond 70.37.147.0
BR : Sao Paulo Brazil Sao Paulo São Paulo 65.54.66.0
SE : Stockholm United Kingdom 94.245.78.0
SG : Singapore United States Delaware Wilmington 52.187.30.0
US : VA-Ashburn United States 13.106.106.0
FR : Paris United Kingdom 94.245.72.0

Try this query yourself for up-to-date information.

I authored a simple query to check whether my blog is read during the day or at night. This demo is not production-ready and I might have messed up the timezones; however, for an ad hoc analysis it was OK. It also demonstrates the use of the datatable operator and the power of join:

let timezones = datatable (timezone_location:string, shift:time)
    [
        "United States", time(-6h),
        "Canada", time(-6h),
        "Japan", time(9h),
        "Brazil", time(-3h),
        "United Kingdom", time(0),
        "Hong Kong", time(8h),
        "Ireland", time(0),
        "Switzerland", time(2h),
        "Slovenia", time(1h),
        "South Africa", time(2h),
        "Sweden", time(1h),
        "Poland", time(1h),
        "Ukraine", time(2h),
        "Netherlands", time(2h),
    ];
pageViews
 | extend timezone_location = client_CountryOrRegion
 | where timestamp > ago(10h) and timestamp < ago(5h)
 | join kind= leftouter (
     timezones
 ) on timezone_location
 | extend localtimehour = datepart("Hour", timestamp + shift)
 | project name, timezone_location, timestamp, localtimehour, isDay = iff(localtimehour > 5 and localtimehour < 20, "day", "night")
 | summarize count() by isDay
 | render piechart

Here is the result view:

One-liner to Send an Event to Application Insights


Sometimes you need to send an event to Application Insights from the command line, and you cannot download ApplicationInsights.dll and use a PowerShell script like the one described here. You may need it for a startup task or deployment script. It’s a good thing Application Insights has an easy-to-use REST API. Here is a single-line command that runs PowerShell and passes the script as a parameter. I split it into multiple lines for readability; you will need to remove all newlines before running it. Just replace the event name and add custom properties if needed:

powershell "$body = (New-Object PSObject 
    | Add-Member -PassThru NoteProperty name 'Microsoft.ApplicationInsights.Event' 
    | Add-Member -PassThru NoteProperty time $([System.dateTime]::UtcNow.ToString('o')) 
    | Add-Member -PassThru NoteProperty iKey '1aadbaf5-1497-ae49-8e89-cd0324aafe6b' 
    | Add-Member -PassThru NoteProperty tags (New-Object PSObject 
    | Add-Member -PassThru NoteProperty 'ai.cloud.roleInstance' $env:computername 
    | Add-Member -PassThru NoteProperty 'ai.internal.sdkVersion' 'one-line-ps:1.0.0') 
    | Add-Member -PassThru NoteProperty data (New-Object PSObject 
        | Add-Member -PassThru NoteProperty baseType 'EventData' 
        | Add-Member -PassThru NoteProperty baseData (New-Object PSObject 
            | Add-Member -PassThru NoteProperty ver 2 
            | Add-Member -PassThru NoteProperty name 'Event from one line script' 
            | Add-Member -PassThru NoteProperty properties (New-Object PSObject 
                | Add-Member -PassThru NoteProperty propName 'propValue')))) 
    | ConvertTo-JSON -depth 5; 
    Invoke-WebRequest -Uri 'https://dc.services.visualstudio.com/v2/track' -Method 'POST' -UseBasicParsing -body $body" 

Running it will return the status:

StatusCode        : 200
StatusDescription : OK
Content           : {"itemsReceived":1,"itemsAccepted":1,"errors":[]}
RawContent        : HTTP/1.1 200 OK
                    x-ms-session-id: 960F3184-51B6-4E74-B113-88ACD106B7F3
                    Strict-Transport-Security: max-age=31536000
                    Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Name, Content-Type,...
Forms             :
Headers           : {[x-ms-session-id, 960F3184-51B6-4E74-B113-88ACD106B7F3], [Strict-Transport-Security,
                    max-age=31536000], [Access-Control-Allow-Headers, Origin, X-Requested-With, Content-Name,
                    Content-Type, Accept], [Access-Control-Allow-Origin, *]...}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        :
RawContentLength  : 49

And the event will look like this in Application Insights Analytics:

name value
timestamp 2017-03-27T15:25:11.788Z
name Event from one line script
customDimensions {"propName":"propValue"}
client_Type PC
client_Model Other
client_OS Windows 10
client_IP 167.220.1.0
client_City Redmond
client_StateOrProvince Washington
client_CountryOrRegion United States
client_Browser Other
cloud_RoleInstance SERGKANZ-VM
appId d4cbb70f-f58f-ac6d-8457-c2e326fcc587
appName test-application
iKey 1aadbaf5-1497-ae49-8e89-cd0324aafe6b
sdkVersion one-line-ps:1.0.0
itemId 927362e0-1301-11e7-88a4-211449da9ad2
itemType customEvent
itemCount 1

Note that the sender’s IP address and location were added to the event. Also, PowerShell sets a User-Agent like User-Agent: Mozilla/5.0 (Windows NT; Windows NT 10.0; en-US) WindowsPowerShell/5.1.15063.0, so Application Insights detected that the event was sent from a Windows 10 machine.

It is much easier to use Application Insights through one of the numerous SDKs, but when you need to, you can send data directly.
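
The same envelope can be sent from C# with nothing but HttpClient. Here is a minimal sketch mirroring the PowerShell payload above; the one-line-cs sdk version string is just an arbitrary marker, like one-line-ps was:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

class SendEvent
{
    static async Task Main()
    {
        // same envelope the PowerShell one-liner builds
        var envelope = new
        {
            name = "Microsoft.ApplicationInsights.Event",
            time = DateTime.UtcNow.ToString("o"),
            iKey = "1aadbaf5-1497-ae49-8e89-cd0324aafe6b",
            tags = new Dictionary<string, string>
            {
                ["ai.cloud.roleInstance"] = Environment.MachineName,
                ["ai.internal.sdkVersion"] = "one-line-cs:1.0.0"
            },
            data = new
            {
                baseType = "EventData",
                baseData = new
                {
                    ver = 2,
                    name = "Event from one line script",
                    properties = new Dictionary<string, string>
                    {
                        ["propName"] = "propValue"
                    }
                }
            }
        };

        using (var http = new HttpClient())
        {
            var response = await http.PostAsync(
                "https://dc.services.visualstudio.com/v2/track",
                new StringContent(JsonConvert.SerializeObject(envelope),
                    Encoding.UTF8, "application/json"));
            Console.WriteLine(response.StatusCode); // expect OK
        }
    }
}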

Clone Application Insights Dashboard for a Different Application


You may have many environments where your application is running. For every environment you will create a separate Application Insights resource, so you can set up access and billing for the production telemetry differently from the QA environment. However, you may want to have the same dashboard for every environment. You may even want to deploy dashboard updates alongside the application deployment, so when your application exposes new telemetry, the dashboard will visualize it.

This blog post explains how to clone a dashboard and retarget it to a different Application Insights component using Azure Resource Manager (ARM).

Let’s say you have Dashboard A for component A and you want to create the same dashboard for component B. In my example I simply pinned the servers chart to the dashboard, but it may be way more advanced in your case.

In order to clone the dashboard you need to share it first. Sharing places the dashboard definition into a resource group so you can see it in the Azure Resource Manager portal.

Once shared, the URL for your dashboard will look like this: https://portal.azure.com/#dashboard/arm/subscriptions/6b984a40-aa54-452b-b975-acc3bf105fa7/resourcegroups/dashboards/providers/microsoft.portal/dashboards/7a2a64c5-a661-47c1-a1a3-afae823d7533. It includes the subscription, resource group, and the dashboard’s unique name. Copy the dashboard’s unique name (in this case 7a2a64c5-a661-47c1-a1a3-afae823d7533) and find it at https://resources.azure.com.

The direct URL to your dashboard definition will look like this: https://resources.azure.com/subscriptions/6b984a40-aa54-452b-b975-acc3bf105fa7/resourceGroups/dashboards/providers/Microsoft.Portal/dashboards/7a2a64c5-a661-47c1-a1a3-afae823d7533

Now you can copy the dashboard definition:

{
  "properties": {
    "lenses": {
      "0": {
        "order": 0,
        "parts": {
          "0": {
            "position": {
              "x": 0,
              "y": 0,
              "rowSpan": 5,
              "colSpan": 6
            },
            "metadata": {
              "inputs": [
                {
                  "name": "ComponentId",
                  "value": {
                    "SubscriptionId": "6b984a40-aa54-452b-b975-acc3bf105fa7",
                    "ResourceGroup": "A",
                    "Name": "A"
                  }
                },
                {
                  "name": "MetricsExplorerJsonDefinitionId",
                  "value": "pinJson:?name={\n  \"version\": \"1.4.1\",\n  \"isCustomDataModel\": false,\n  \"items\": [\n    {\n      \"id\": \"b2f8708b-4a48-4b35-b96e-7622caca21ce\",\n      \"chartType\": \"Area\",\n      \"chartHeight\": 4,\n      \"metrics\": [\n        {\n          \"id\": \"performanceCounter.percentage_processor_time.value\",\n          \"metricAggregation\": \"Avg\",\n          \"color\": \"msportalfx-bgcolor-g0\"\n        }\n      ],\n      \"priorPeriod\": false,\n      \"clickAction\": {\n        \"defaultBlade\": \"SearchBlade\"\n      },\n      \"horizontalBars\": true,\n      \"showOther\": true,\n      \"aggregation\": \"Avg\",\n      \"percentage\": false,\n      \"palette\": \"blueHues\",\n      \"yAxisOption\": 0\n    },\n    {\n      \"id\": \"093583d1-bc86-4c2e-91d8-527a2411910b\",\n      \"chartType\": \"Area\",\n      \"chartHeight\": 1,\n      \"metrics\": [\n        {\n          \"id\": \"performanceCounter.available_bytes.value\",\n          \"metricAggregation\": \"Avg\",\n          \"color\": \"msportalfx-bgcolor-j1\"\n        }\n      ],\n      \"priorPeriod\": false,\n      \"clickAction\": {\n        \"defaultBlade\": \"SearchBlade\"\n      },\n      \"horizontalBars\": true,\n      \"showOther\": true,\n      \"aggregation\": \"Avg\",\n      \"percentage\": false,\n      \"palette\": \"greenHues\",\n      \"yAxisOption\": 0\n    },\n    {\n      \"id\": \"03fd5488-b020-417b-97e2-bf7564568d3b\",\n      \"chartType\": \"Area\",\n      \"chartHeight\": 1,\n      \"metrics\": [\n        {\n          \"id\": \"performanceCounter.io_data_bytes_per_sec.value\",\n          \"metricAggregation\": \"Avg\",\n          \"color\": \"msportalfx-bgcolor-g0\"\n        }\n      ],\n      \"priorPeriod\": false,\n      \"clickAction\": {\n        \"defaultBlade\": \"SearchBlade\"\n      },\n      \"horizontalBars\": true,\n      \"showOther\": true,\n      \"aggregation\": \"Avg\",\n      \"percentage\": false,\n      \"palette\": \"blueHues\",\n      \"yAxisOption\": 0\n    },\n    {\n      \"id\": \"c31fd4cc-be41-449e-a657-d16d2e9c8487\",\n      \"chartType\": \"Area\",\n      \"chartHeight\": 1,\n      \"metrics\": [\n        {\n          \"id\": \"performanceCounter.number_of_exceps_thrown_per_sec.value\",\n          \"metricAggregation\": \"Avg\",\n          \"color\": \"msportalfx-bgcolor-d0\"\n        }\n      ],\n      \"priorPeriod\": false,\n      \"clickAction\": {\n        \"defaultBlade\": \"SearchBlade\"\n      },\n      \"horizontalBars\": true,\n      \"showOther\": true,\n      \"aggregation\": \"Avg\",\n      \"percentage\": false,\n      \"palette\": \"fail\",\n      \"yAxisOption\": 0\n    },\n    {\n      \"id\": \"8b942f02-ef58-46ac-877a-2f4c16a17a4f\",\n      \"chartType\": \"Area\",\n      \"chartHeight\": 1,\n      \"metrics\": [\n        {\n          \"id\": \"performanceCounter.requests_per_sec.value\",\n          \"metricAggregation\": \"Avg\",\n          \"color\": \"msportalfx-bgcolor-b2\"\n        }\n      ],\n      \"priorPeriod\": false,\n      \"clickAction\": {\n        \"defaultBlade\": \"SearchBlade\"\n      },\n      \"horizontalBars\": true,\n      \"showOther\": true,\n      \"aggregation\": \"Avg\",\n      \"percentage\": false,\n      \"palette\": \"warmHues\",\n      \"yAxisOption\": 0\n    }\n  ],\n  \"title\": \"Servers\",\n  \"currentFilter\": {\n    \"eventTypes\": [\n      10\n    ],\n    \"typeFacets\": {},\n    \"isPermissive\": false\n  },\n  \"jsonUri\": \"MetricsExplorerPinJsonDefinitionId - 
Dashboard.f9bfee41-bd32-47a7-ae11-7d2038cd3c44 - Pinned from 'AspNetServersMetrics'\"\n}"
                },
                {
                  "name": "BladeId",
                  "value": "Dashboard.f9bfee41-bd32-47a7-ae11-7d2038cd3c44"
                },
                {
                  "name": "TimeContext",
                  "value": {
                    "durationMs": 86400000,
                    "createdTime": "2017-03-23T19:54:01.552Z",
                    "isInitialTime": false,
                    "grain": 1,
                    "useDashboardTimeRange": false
                  }
                },
                {
                  "name": "Version",
                  "value": "1.0"
                },
                {
                  "name": "DashboardTimeRange",
                  "value": {
                    "relative": {
                      "duration": 1440,
                      "timeUnit": 0
                    }
                  },
                  "isOptional": true
                }
              ],
              "type": "Extension/AppInsightsExtension/PartType/MetricsExplorerOutsideMEBladePart",
              "settings": {},
              "viewState": {
                "content": {}
              },
              "asset": {
                "idInputName": "ComponentId",
                "type": "ApplicationInsights"
              }
            }
          }
        }
      }
    },
    "metadata": {
      "model": {
        "timeRange": {
          "value": {
            "relative": {
              "duration": 24,
              "timeUnit": 1
            }
          },
          "type": "MsPortalFx.Composition.Configuration.ValueTypes.TimeRange"
        }
      }
    }
  },
  "id": "/subscriptions/6b984a40-aa54-452b-b975-acc3bf105fa7/resourceGroups/dashboards/providers/Microsoft.Portal/dashboards/7a2a64c5-a661-47c1-a1a3-afae823d7533",
  "name": "7a2a64c5-a661-47c1-a1a3-afae823d7533",
  "type": "Microsoft.Portal/dashboards",
  "location": "centralus",
  "tags": {
    "hidden-title": "Dashboard A"
  }
}

In order to retarget the dashboard, just find all mentions of your Application Insights component and replace them with the new component. In my example there was only one mention:

"inputs": [
{
    "name": "ComponentId",
    "value": {
        "SubscriptionId": "6b984a40-aa54-452b-b975-acc3bf105fa7",
        "ResourceGroup": "B",
        "Name": "B"
    }
},

Then rename the dashboard:

"id": "/subscriptions/6b984a40-aa54-452b-b975-acc3bf105fa7/resourceGroups
                    /dashboards/providers/Microsoft.Portal/dashboards/DashboardB",
"name": "DashboardB",
"type": "Microsoft.Portal/dashboards",
"location": "centralus",
"tags": {
    "hidden-title": "Dashboard B"
}

You can create the new dashboard in the ARM portal now. Type “DashboardB” as the {Resource Name} and the updated JSON as the definition.

Then start using your dashboard in the portal. Note that one perk of creating the dashboard manually is that the unique name of the dashboard you created is human-readable, not a GUID: https://portal.azure.com/#dashboard/arm/subscriptions/6b984a40-aa54-452b-b975-acc3bf105fa7/resourcegroups/dashboards/providers/microsoft.portal/dashboards/dashboardb

With Azure Resource Manager you can automate this process and configure dashboard updates/deployments alongside the application, so the monitoring configuration becomes part of your service definition.
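
For example, here is a minimal C# sketch that PUTs the retargeted definition through the ARM REST API. It assumes you already acquired an AAD access token for https://management.azure.com/ and saved the updated JSON to DashboardB.json; the api-version value is the one in use at the time of writing and may need adjusting:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class DeployDashboard
{
    static async Task Main()
    {
        var subscription = "6b984a40-aa54-452b-b975-acc3bf105fa7";
        // an AAD token for ARM, e.g. obtained via `az account get-access-token`
        var accessToken = Environment.GetEnvironmentVariable("ARM_TOKEN");
        // the retargeted definition produced above
        var definition = File.ReadAllText("DashboardB.json");

        using (var http = new HttpClient())
        {
            http.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Bearer", accessToken);

            // PUT creates or updates the dashboard resource in place
            var response = await http.PutAsync(
                "https://management.azure.com/subscriptions/" + subscription +
                "/resourceGroups/dashboards/providers/Microsoft.Portal" +
                "/dashboards/DashboardB?api-version=2015-08-01-preview",
                new StringContent(definition, Encoding.UTF8, "application/json"));

            Console.WriteLine(response.StatusCode);
        }
    }
}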

When 404 Is Not Tracked by Application Insights


Sometimes Application Insights doesn’t track web requests made to bad routes that result in response code 404. The reason may not be clear initially. However, once you open the application from localhost and see the standard IIS error page, it becomes clearer. Without a default route set up in your application, the 404 will be returned by the StaticFile handler, not by the managed handler. This is what the error page says:

The easiest and most straightforward workaround is to change the web.config according to this blog post: add runAllManagedModulesForAllRequests="true" and remove preCondition="managedHandler":

<modules runAllManagedModulesForAllRequests="true">
  <remove name="ApplicationInsightsWebTracking" />
  <add name="ApplicationInsightsWebTracking"
   type="Microsoft.ApplicationInsights.Web.ApplicationInsightsHttpModule, Microsoft.AI.Web"/>
</modules>

This way the Application Insights HTTP module will run on every request, and you’ll capture all requests made to bad routes.

Enable Application Insights Live Metrics From Code


A small tip on how to enable Application Insights Live Metrics from code.

Application Insights allows you to view telemetry like CPU and memory in real time. The feature is called Live Metrics; we also call it Quick Pulse. You’d typically use it when something is happening with your application: deploying a new version, investigating an ongoing incident, or scaling it out. You can use it free of charge, as traffic to the Live Stream endpoint is not counted towards the bill.

The feature is implemented in the NuGet package Microsoft.ApplicationInsights.PerfCounterCollector. If you are using ApplicationInsights.config to configure monitoring, you need to add a telemetry module and a telemetry processor like you’d normally do:

<TelemetryModules>
  <Add Type="Microsoft.ApplicationInsights.Extensibility.PerfCounterCollector.
    QuickPulse.QuickPulseTelemetryModule, Microsoft.AI.PerfCounterCollector"/>
</TelemetryModules>

<TelemetryProcessors>
  <Add Type="Microsoft.ApplicationInsights.Extensibility.PerfCounterCollector.
    QuickPulse.QuickPulseTelemetryProcessor, Microsoft.AI.PerfCounterCollector"/>
</TelemetryProcessors>

However, simply adding them in code like you’d expect wouldn’t work:

TelemetryConfiguration configuration = new TelemetryConfiguration();
configuration.InstrumentationKey = "9d3ebb4f-7a11-4fb1-91ac-7ca8a17a27eb";

configuration.TelemetryProcessorChainBuilder
    .Use((next) => { return new QuickPulseTelemetryProcessor(next); })
    .Build();

var QuickPulse = new QuickPulseTelemetryModule();
QuickPulse.Initialize(configuration);

You need to “connect” the module and the processor: store the processor when constructing the chain and register it with the telemetry module. The code will look like this:

TelemetryConfiguration configuration = new TelemetryConfiguration();
configuration.InstrumentationKey = "9d3ebb4f-7a11-4fb1-91ac-7ca8a17a27eb";

QuickPulseTelemetryProcessor processor = null;

configuration.TelemetryProcessorChainBuilder
    .Use((next) =>
    {
        processor = new QuickPulseTelemetryProcessor(next);
        return processor;
    })
    .Build();

var QuickPulse = new QuickPulseTelemetryModule();
QuickPulse.Initialize(configuration);
QuickPulse.RegisterTelemetryProcessor(processor);

Now, with a few lines of code, you can start monitoring your application in real time for free.

Fast OPTIONS Response Using Url Rewrite


Imagine you run a high-load web application. If this application should be accessible from different domains, you need to configure your server to correctly respond to OPTIONS requests. With IIS, it is easy to configure a UrlRewrite rule that will reply with preconfigured headers without any extra processing cost.

You need to configure an inbound rule that matches {REQUEST_METHOD} and replies 200 immediately. You’d also need a set of outbound rules that set the proper response headers like Access-Control-Allow-Methods. It will look like this:

<rewrite>
    <outboundRules>
        <rule name="Set Access-Control-Allow-Methods for OPTIONS response" preCondition="OPTIONS" patternSyntax="Wildcard">
            <match serverVariable="RESPONSE_Access-Control-Allow-Methods" pattern="*" negate="false" />
            <action type="Rewrite" value="POST" />
        </rule>
        <rule name="Set Access-Control-Allow-Headers for OPTIONS response" preCondition="OPTIONS" patternSyntax="Wildcard">
            <match serverVariable="RESPONSE_Access-Control-Allow-Headers" pattern="*" negate="false" />
            <action type="Rewrite" value="Origin, X-Requested-With, Content-Name, Content-Type, Accept" />
        </rule>
        <rule name="Set Access-Control-Allow-Origin for OPTIONS response" preCondition="OPTIONS" patternSyntax="Wildcard">
            <match serverVariable="RESPONSE_Access-Control-Allow-Origin" pattern="*" negate="false" />
            <action type="Rewrite" value="*" />
        </rule>
        <rule name="Set Access-Control-Max-Age for OPTIONS response" preCondition="OPTIONS" patternSyntax="Wildcard">
            <match serverVariable="RESPONSE_Access-Control-Max-Age" pattern="*" negate="false" />
            <action type="Rewrite" value="3600" />
        </rule>
        <rule name="Set X-Content-Type-Options for OPTIONS response" preCondition="OPTIONS" patternSyntax="Wildcard">
            <match serverVariable="RESPONSE_X-Content-Type-Options" pattern="*" negate="false" />
            <action type="Rewrite" value="nosniff" />
        </rule>
        <preConditions>
            <preCondition name="OPTIONS">
                <add input="{REQUEST_METHOD}" pattern="OPTIONS" />
            </preCondition>
        </preConditions>
    </outboundRules>
    <rules>
    <rule name="OPTIONS" patternSyntax="Wildcard" stopProcessing="true">
        <match url="*" />
        <conditions logicalGrouping="MatchAny">
            <add input="{REQUEST_METHOD}" pattern="OPTIONS" />
        </conditions>
        <action type="CustomResponse" statusCode="200" subStatusCode="0" statusReason="OK" statusDescription="OK" />
    </rule>
    </rules>
</rewrite>

I did some measurements locally and found that this simple rule saves a lot of CPU under high load. You can add this rule to your site’s web.config, or for Azure Web Apps you can configure these rules using the applicationHost.xdt file.

Now that you’ve configured it, how will you make sure it is working in production? Application Insights allows you to run multi-step availability tests. Configuring one for OPTIONS required two hacks.

First, Visual Studio didn’t allow picking the OPTIONS HTTP method, only GET and POST. To work around this issue I simply opened my .webtest file in a text editor and manually set the method to the value I needed:

<Request Method="OPTIONS" Version="1.1" Url="https://dc.services.visualstudio.com/v2/track"..

Second, there is no built-in response header value validator, so I configured the web test to run a “bad” request if the value of the extracted response header doesn’t match the expected value.

After I configured my web test, I can see the test results in the standard UI or simply run a query like this in Application Insights Analytics:

availabilityResults
| where timestamp > ago(1d)
| where name == "OPTIONS"
| summarize percentile(duration, 99) by location, bin(timestamp, 15m)

Deployments, Scale Ups and Downs


I once lost track of what was deployed in the staging slot of a cloud service. I was also wondering how other people were deploying that service. This post shows how you can answer questions like these using Application Insights Analytics queries.

The service I am looking at is deployed as two cloud services in different regions. It uses automatic code versioning via the BuildInfo.config file. A new version is deployed into the staging slot and then VIP swapped into production.

As I said, Application Insights is configured to report the application version with every telemetry item, so you can group by application version and find when a new version was deployed.

performanceCounters
| where timestamp >= ago(5d)
| where name == "Requests/Sec" 
| summarize dcount(cloud_RoleInstance) by application_Version, bin(timestamp, 15m)

The query above detects deployments to staging, but it will not detect the VIP swap accurately. When a VIP swap happens, the same computers are running the same code, so the number of role instances reporting a specific application version in the query above does not change. The only thing that changes during the VIP swap is the virtual IP address of those computers.

I posted before about how Application Insights associates the IP address of the incoming connection with a telemetry item if the item doesn’t have it specified. So all the performance counters will have the client_IP field of the incoming connection. In the case of a cloud service, it will be the IP address of the slot sending the telemetry. Let’s use this fact and extend application_Version with the client_IP.

let interval = 5d;
performanceCounters
| where timestamp >= ago(interval)
| where name == "Requests/Sec" 
| extend deployment = strcat(application_Version, " ", client_IP)
| summarize dcount(cloud_RoleInstance) by deployment, bin(timestamp, 5m)
| render areachart

This query gave me this picture:

There are two regions this application is deployed to, hence two general areas: 5 instances and 3 instances. You can also see the spikes when deployments were happening. Notice that the staging slot doesn’t last long; the spike is very short. It turns out that the staging computers are shut down as part of the release procedure. Typically you would see a scaled-down number of staging computers running all the time to speed up a rollback when it’s needed.

Let’s zoom into the single deployment:

let fromDate = datetime(2017-01-18 21:50:00z);
let toDate = datetime(2017-01-18 22:15:00z);
performanceCounters
| where timestamp >= fromDate
| where timestamp <= toDate
| where name == "Requests/Sec" 
| extend deployment = strcat(application_Version, " ", client_IP)
| summarize dcount(cloud_RoleInstance) by deployment, bin(timestamp, 1m)
| render areachart  

The result is quite interesting:

You can see the new version of the application deployed into the staging environment in one region and running for ~10 minutes. The same version was deployed in the staging slot of a different region for a much shorter time. It seems that production traffic started the application initialization after the VIP swap, which is typically a bad practice, by the way. At least some smoke tests need to be run against the staging slot to validate the configuration.

Dig deeper

Analyzing the picture is not easy. Let’s modify the query to print out every deployment, scale-up, and scale-down. Basically, we need to query for every time interval where the previous interval had a different number of role instances reporting the same application version.

Here is a query that returns the number of instances per minute:

let query = (_fromDate:datetime, _toDate:datetime) 
{ 
performanceCounters
| where timestamp >= _fromDate
| where timestamp <= _toDate
| where name == "Requests/Sec" 
| summarize num_instances = dcount(cloud_RoleInstance) 
    by application_Version, client_IP, bin(timestamp, 1m) };

You can call this query as query(fromDate, toDate). Now let’s join it with the same results shifted a minute back:

let fromDate = datetime(2017-01-18 21:50:00z);
let toDate = datetime(2017-01-18 22:15:00z);
let query = (_fromDate:datetime, _toDate:datetime) 
{ 
  performanceCounters
    | where timestamp >= _fromDate
    | where timestamp <= _toDate
    | where name == "Requests/Sec" 
    | summarize num_instances = dcount(cloud_RoleInstance) 
        by application_Version, client_IP, bin(timestamp, 1m) 
};
query(fromDate, toDate) | extend ttt = timestamp | join kind=leftouter 
(
  query(fromDate - 1m, toDate + 1m) | extend ttt = timestamp + 1m
) on ttt, application_Version, client_IP

Note the use of the leftouter join in the query. The only thing left is to filter the results and make them more human-readable:

let fromDate = datetime(2017-01-18 21:50:00z);
let toDate = datetime(2017-01-18 22:15:00z);
let query = (_fromDate:datetime, _toDate:datetime) 
{ 
performanceCounters
| where timestamp >= _fromDate
| where timestamp <= _toDate
| where name == "Requests/Sec" 
| summarize num_instances = dcount(cloud_RoleInstance) by application_Version, client_IP, bin(timestamp, 1m) };
query(fromDate, toDate) | extend ttt = timestamp | join kind=leftouter (
query(fromDate - 1m, toDate + 1m) | extend ttt = timestamp + 1m
) on ttt, application_Version, client_IP
| project timestamp, before = num_instances1, after = num_instances, application_Version, client_IP
| where after != before
| extend name = 
  strcat( 
      iff(isnull(before), "Deployment", iff(after > before, "Scale Up", "Scale Down")),
      " in ",
      iff(client_IP == "52.175.18.0" or client_IP == "13.77.108.0", "Production", "Staging")
  )
| order by timestamp 

The resulting table will look like this:

timestamp before after application_Version client_IP name
2017-01-18T21:54:00Z null 2 vstfs:///Build/Build/3562348 13.77.107.0 Deployment in Staging
2017-01-18T21:59:00Z 2 3 vstfs:///Build/Build/3562348 13.77.107.0 Scale Up in Staging
2017-01-18T22:06:00Z 3 2 vstfs:///Build/Build/3555787 52.175.18.0 Scale Down in Production
2017-01-18T22:07:00Z 2 3 vstfs:///Build/Build/3555787 52.175.18.0 Scale Up in Production
2017-01-18T22:07:00Z 5 1 vstfs:///Build/Build/3555787 13.77.108.0 Scale Down in Production
2017-01-18T22:07:00Z null 3 vstfs:///Build/Build/3555787 13.77.107.0 Deployment in Staging
2017-01-18T22:08:00Z null 3 vstfs:///Build/Build/3562348 13.77.108.0 Deployment in Production
2017-01-18T22:09:00Z 3 5 vstfs:///Build/Build/3562348 13.77.108.0 Scale Up in Production
2017-01-18T22:09:00Z 3 2 vstfs:///Build/Build/3555787 52.175.18.0 Scale Down in Production
2017-01-18T22:09:00Z null 1 vstfs:///Build/Build/3555787 168.63.221.0 Deployment in Staging
2017-01-18T22:10:00Z null 3 vstfs:///Build/Build/3562348 52.175.18.0 Deployment in Production

Using ad hoc analytical queries, I found that deployments of this service can be improved. Smoke tests should be added for the staging deployment, and staging machines should run for some time after deployment in case you need to VIP swap the deployment back.

Automatically detecting deployments, scale-ups, and scale-downs may be useful in other scenarios. You may want to notify the service owner by writing a connector for your favorite chat platform. Or you can list the latest deployments to production and staging to know what was deployed and when. You can even report those deployments back to Application Insights as release annotations to see markers on charts. With the power of analytical queries in Application Insights, it is easy to automate any of these scenarios, for example with the sketch below.
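
Here is a minimal sketch that runs the per-minute instance-count query through the Application Insights REST API; the application ID and API key come from the API Access blade of the resource, and both values below are placeholders:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class DetectDeployments
{
    static async Task Main()
    {
        var appId = "<your application id>";   // from the API Access blade
        var apiKey = "<your api key>";         // created in the same blade

        var query = Uri.EscapeDataString(
            "performanceCounters" +
            " | where timestamp >= ago(1d)" +
            " | where name == 'Requests/Sec'" +
            " | summarize num_instances = dcount(cloud_RoleInstance)" +
            " by application_Version, client_IP, bin(timestamp, 1m)");

        using (var http = new HttpClient())
        {
            http.DefaultRequestHeaders.Add("x-api-key", apiKey);
            var json = await http.GetStringAsync(
                "https://api.applicationinsights.io/v1/apps/" + appId +
                "/query?query=" + query);
            Console.WriteLine(json); // tables of per-minute instance counts
        }
    }
}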