I have found that the following very basic set of monitors gives me a realistic view of the CPU, memory, and disk use of the system under test. If I run into performance issues, I generally expand this list to target specific problem areas, and I rely on the system experts to select which additional metrics to collect.
Here is the basic set:
- **System\Processor Queue Length**: If more threads are ready to run than there are processors, the extra threads queue up. The processor queue is the collection of threads that are ready to run but cannot, because another thread is currently executing on the processor. A sustained or recurring queue of more than two threads is a clear indication of a processor bottleneck.
- **Processor\% Processor Time**: This counter is the primary indicator of processor activity. High values are not necessarily bad. However, if other processor-related counters, such as Processor Queue Length, are increasing linearly alongside it, high CPU utilization is worth investigating.
Less than 60% consumed = Healthy
60% - 90% consumed = Monitor or Caution
91% - 100% consumed = Critical or Out of Spec
- **System\Context Switches/sec**: Context switching happens when a higher priority thread preempts a lower priority thread that is currently running or when a high priority thread blocks. High levels of context switching can occur when many threads share the same priority level. This often indicates that there are too many threads competing for the processors on the system.
- **Memory\Available MBytes**: Available MBytes is the amount of physical memory available to processes running on the computer, in megabytes, rather than in bytes as reported by Memory\Available Bytes. The Virtual Memory Manager continually adjusts the space used in physical memory and on disk to maintain a minimum number of available bytes for the operating system and processes. When available bytes are plentiful, the Virtual Memory Manager lets the working sets of processes grow, or keeps them stable by removing an old page for each new page added. When available bytes are few, the Virtual Memory Manager must trim the working sets of processes to maintain the required minimum.
Low on available memory - less than 10% available
Very low on available memory - less than 5% available
Decreasing trend of 10 MB or more per hour. This could indicate a memory leak.
- **Current Disk Queue Length**: This counter is a direct measurement of the disk queue present at the time of sampling. As with the processor queue length, a disk queue length greater than one indicates that requests to access the disk subsystem are waiting. As this value grows, the I/O subsystem takes longer to service requests. A positive value on its own is only an indicator of a potential issue. If the value becomes very high, check other counters to see if virtual memory is being overused or if the software is making excessively heavy requests on one or more disk spindles.
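To make the thresholds in this list concrete, here is a minimal Python sketch of how collected samples could be checked against them. It assumes the counter values have already been exported as plain numbers. The function names, the handling of boundary values between bands, and the `min_consecutive` setting used to decide what counts as "sustained" are my own assumptions, not part of any monitoring tool.

```python
from typing import Sequence, Tuple


def classify_cpu(percent_busy: float) -> str:
    """Map a % Processor Time sample to a health band (boundaries assumed)."""
    if percent_busy < 60:
        return "healthy"
    if percent_busy <= 90:
        return "caution"
    return "critical"


def classify_memory(available_mb: float, total_mb: float) -> str:
    """Classify Available MBytes as a share of total physical memory."""
    percent_available = 100.0 * available_mb / total_mb
    if percent_available < 5:
        return "very low"
    if percent_available < 10:
        return "low"
    return "ok"


def sustained_queue(lengths: Sequence[float], limit: float,
                    min_consecutive: int = 3) -> bool:
    """True if the queue exceeds `limit` for `min_consecutive` samples in a row.

    `min_consecutive` is an assumed definition of "sustained".
    """
    run = 0
    for length in lengths:
        run = run + 1 if length > limit else 0
        if run >= min_consecutive:
            return True
    return False


def mb_per_hour_trend(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (hours, available_mb) samples, in MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    numerator = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    return numerator / denominator


def possible_leak(samples: Sequence[Tuple[float, float]],
                  threshold_mb_per_hour: float = 10.0) -> bool:
    """Flag a steady decline of at least `threshold_mb_per_hour`."""
    return mb_per_hour_trend(samples) <= -threshold_mb_per_hour
```

For example, `sustained_queue(queue_samples, limit=2)` would apply the processor queue rule and `sustained_queue(disk_samples, limit=1)` the disk queue rule. A flagged result is a prompt to look closer, not a diagnosis.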
Additionally, I recommend adding Network Delay Monitors between each segment and sub-segment of the network data flow of the application under test. As performance deteriorates under increasing load, you will need to ensure that network latency is not the root cause of the issue.
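If no dedicated network delay monitor is available, a rough stand-in is to time a TCP connection to each tier from the tier in front of it. The sketch below is not the load-testing tool's own monitor. It measures only connection setup time, and the function names, host, port, and sample count are hypothetical.

```python
import socket
import statistics
import time
from typing import Dict, List


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP connection setup to host:port, in milliseconds."""
    start = time.perf_counter()
    # create_connection raises OSError if the target is unreachable.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def latency_summary(host: str, port: int, samples: int = 5) -> Dict[str, float]:
    """Collect several connect times and report min, median, and max."""
    timings: List[float] = [tcp_connect_ms(host, port) for _ in range(samples)]
    return {
        "min_ms": min(timings),
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }
```

Running `latency_summary` against each hop before the test and again at peak load gives a baseline to compare against. If the median stays flat while response times climb, the network is unlikely to be the root cause.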