Having worked with SCOM for a number of years, one of the things I grew to really like is the performance reporting available from the SCOM Data Warehouse. Log Analytics currently offers no real out-of-the-box performance reporting; some of the solutions offer dashboards, but that’s about it. In this post I’ll show you how I created a Log Analytics server performance report. As always with Log Analytics, there is more than one way to accomplish the same result, and I’m by no means saying this is the only way to do it.
This is one of my favorite reports out of SCOM, and I’ll be using it as a target: the goal is to get the same type of data and report out of our Azure Log Analytics performance data.
The Queries
First, we’ll use a few different summarize queries and joins, many of which I talked about here.
I’ll start by getting the avg, min and max of CPU usage.
//Get CPU Max, Min and Avg
let CPU = Perf
| where ObjectName == "Processor"
    and CounterName == "% Processor Time"
    and InstanceName == "_Total"
    and TimeGenerated > ago(Time)
| summarize AvgCPU = avg(CounterValue)
    , MinCPU = min(CounterValue)
    , MaxCPU = max(CounterValue)
    by Computer, bin(TimeGenerated, Bin);
//end CPU
I’ll do the same for Memory.
//Get Memory Max, Min, and Avg
let Memory = Perf
| where ObjectName == "Memory"
    and CounterName == "% Committed Bytes In Use"
    and TimeGenerated > ago(Time)
| summarize AvgMemory = avg(CounterValue)
    , MinMemory = min(CounterValue)
    , MaxMemory = max(CounterValue)
    by Computer, bin(TimeGenerated, Bin);
//end Memory
Next, I use an inner unique join to join the two data sets together and only project the fields I want to carry over.
//join CPU and Memory together
let Report = CPU
| join (Memory) on Computer
| project Computer, TimeGenerated, AvgMemory, MinMemory, MaxMemory, AvgCPU, MinCPU, MaxCPU;
//end join
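Note that when no kind is specified, Kusto’s join defaults to innerunique, which deduplicates the left side on the join key before matching. If you prefer to make that explicit (or want to experiment with other kinds later), the same join can be written as this equivalent sketch:

```kusto
// Same result as the join above, with the default kind spelled out
let Report = CPU
| join kind=innerunique (Memory) on Computer
| project Computer, TimeGenerated, AvgMemory, MinMemory, MaxMemory, AvgCPU, MinCPU, MaxCPU;
```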
You might ask why I didn’t use project on each of these first queries to carry over only the important fields. Recall from my Summarize operator post that the summarize operator actually works like project: only the fields referenced by summarize get carried over.
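You can see this project-like behavior with a quick standalone test. Even though Perf has many columns (ObjectName, CounterName, InstanceName, and so on), the result of this query contains exactly three:

```kusto
// Output columns are only Computer, TimeGenerated, and AvgCPU;
// everything else from Perf is dropped by summarize
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by Computer, bin(TimeGenerated, 1d)
```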
The last metrics I’ll get, at least for now, are % Free Space and the average disk read/write metrics for LogicalDisk.
//get disk metrics Free Space, Write and Read
let Disk = Perf
| where (ObjectName == "LogicalDisk" and CounterName == "Avg. Disk sec/Write" and InstanceName != "HarddiskVolume1" and InstanceName != "_Total" and TimeGenerated > ago(Time))
    or (ObjectName == "LogicalDisk" and CounterName == "Avg. Disk sec/Read" and InstanceName != "HarddiskVolume1" and InstanceName != "_Total" and TimeGenerated > ago(Time))
    or (ObjectName == "LogicalDisk" and CounterName == "% Free Space" and InstanceName != "HarddiskVolume1" and InstanceName != "_Total" and TimeGenerated > ago(Time))
| summarize avg(CounterValue) by Computer, InstanceName, DiskMetrics = CounterName, bin(TimeGenerated, Bin)
| evaluate pivot(InstanceName, avg(avg_CounterValue));
//end Disk
There are a few things going on in this query. First, we grab three different metrics in one where clause and summarize their averages; then I rename CounterName to DiskMetrics and pivot on the instance name. I did this because joining to the other data without the pivot either created extra rows with empty CPU/memory values in certain spots or eliminated some of my drives (e.g., my file server has a number of drive letters). If you only have one disk, you can drop the pivot portion of this query.
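Here is a minimal sketch of what the pivot does, with made-up computer and drive values. Before the pivot, each drive is its own row; after it, each drive letter becomes a column, so one Computer/DiskMetrics/time combination occupies a single row:

```kusto
// Before the pivot (values are made up for illustration):
//   Computer  DiskMetrics   InstanceName  avg_CounterValue
//   FS01      % Free Space  C:            42.1
//   FS01      % Free Space  D:            77.3
Perf
| where ObjectName == "LogicalDisk"
    and CounterName == "% Free Space"
    and InstanceName != "_Total"
| summarize avg(CounterValue) by Computer, DiskMetrics = CounterName
| evaluate pivot(InstanceName, avg(avg_CounterValue))
// After the pivot (one row, one column per drive):
//   Computer  DiskMetrics   C:     D:
//   FS01      % Free Space  42.1   77.3
```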
Finally, I join the Disk data set to the Report data set that already contains CPU and Memory.
Report
| join (Disk) on Computer
| extend TimeGenerated = TimeGenerated1
| extend format_datetime(TimeGenerated, Date)
| project-away TimeGenerated1, Computer1
| sort by TimeGenerated, Computer desc
Throughout the queries you may have noticed a few names like Time, Date, and Bin. I define these as variables at the top of the query, so if you want to run the report for only the last 5 days, you can set Time to 5d; the same goes for Bin, and for Date, which controls the date formatting. If you bin by hours, I recommend changing your date format to include hours.
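For example, an hourly version of the report might declare the variables like this (the exact format string is an assumption; adjust to taste):

```kusto
let Time = 7d;              // report window: last 7 days
let Bin = 1h;               // aggregate per hour instead of per day
let Date = 'MM-dd-yyyy HH'; // include the hour, since we bin hourly
```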
The full query
//set report time frame
let Time = 35d;
//set report bin time frame
let Bin = 1d;
//set date format
let Date = 'MM-dd-yyyy';
//Get CPU Max, Min and Avg
let CPU = Perf
| where ObjectName == "Processor"
    and CounterName == "% Processor Time"
    and InstanceName == "_Total"
    and TimeGenerated > ago(Time)
| summarize AvgCPU = avg(CounterValue)
    , MinCPU = min(CounterValue)
    , MaxCPU = max(CounterValue)
    by Computer, bin(TimeGenerated, Bin);
//end CPU
//Get Memory Max, Min, and Avg
let Memory = Perf
| where ObjectName == "Memory"
    and CounterName == "% Committed Bytes In Use"
    and TimeGenerated > ago(Time)
| summarize AvgMemory = avg(CounterValue)
    , MinMemory = min(CounterValue)
    , MaxMemory = max(CounterValue)
    by Computer, bin(TimeGenerated, Bin);
//end Memory
//join CPU and Memory together
let Report = Memory
| join (CPU) on Computer
| project Computer, TimeGenerated, AvgMemory, MinMemory, MaxMemory, AvgCPU, MinCPU, MaxCPU;
//end join
//get disk metrics Free Space, Write and Read
let Disk = Perf
| where (ObjectName == "LogicalDisk" and CounterName == "Avg. Disk sec/Write" and InstanceName != "HarddiskVolume1" and InstanceName != "_Total" and TimeGenerated > ago(Time))
    or (ObjectName == "LogicalDisk" and CounterName == "Avg. Disk sec/Read" and InstanceName != "HarddiskVolume1" and InstanceName != "_Total" and TimeGenerated > ago(Time))
    or (ObjectName == "LogicalDisk" and CounterName == "% Free Space" and InstanceName != "HarddiskVolume1" and InstanceName != "_Total" and TimeGenerated > ago(Time))
| summarize avg(CounterValue) by Computer, InstanceName, DiskMetrics = CounterName, bin(TimeGenerated, Bin)
| evaluate pivot(InstanceName, avg(avg_CounterValue));
//end Disk
//join disk to Report
Report
| join (Disk) on Computer
| extend format_datetime(TimeGenerated, Date)
| project-away TimeGenerated1, Computer1
| sort by TimeGenerated, Computer desc
Here is sample output excluding any drives other than C:
Here is the cropped section for my file server. You can see it has a number of drives, but pivoting on the instance name puts them all on one row, eliminating a ton of extra rows and more easily allowing inner unique joins without excluding data.
Demo
Those of you who’ve run the SCOM report I used as my target know it can take quite a while to run; sometimes it’s taken so long I’ve had to raise the default timeout in SSRS so the report would actually complete. For reference, for the exact same computers in my SCOM environment, a 30-day period with daily aggregation took roughly 45 seconds to run, and my SCOM DB and Data Warehouse DB are on SSD drives. Here is a video of the same 30 days with a daily bin running for the same computers, taking roughly 6 seconds. Of course the hardware Microsoft uses for Log Analytics is far better, but even so, I think the query language itself is very fast, since it was designed to be.
Summary
Hopefully you find this report useful, either in your environment or as a starting point for writing your own. As I said above, there are multiple ways to do things in Log Analytics; let me know if you come up with a better way to do reporting, I’m all about learning better ways to do things. Also, as I understand it, Linux metrics are named slightly differently, so this report currently works only with Windows machines. The concepts are the same, though; you would just need to swap in the Linux counters.
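I haven’t built a Linux version, but a safe first step is to discover what your Linux agents actually report, then substitute those names into the where clauses above:

```kusto
// List the object/counter names present in your workspace,
// then swap them into the CPU, Memory, and Disk queries
Perf
| where TimeGenerated > ago(1d)
| distinct ObjectName, CounterName
| sort by ObjectName asc
```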
GitHub repo, if you’re into that kind of thing: https://github.com/scautomation/Log-Analytics-Server-Performance-Report