Tuesday, December 10, 2013

Sometimes, what SQL Server really needs is more work to do (surprise ending)

As long as the SQL Server schedulers each have enough tasks to keep them busy (or waiting), tuning based on waits & queues, as well as the efficiency of work (eg 'good' plans and low spinlock-type overhead) can be very successful.

But there's a special case where that type of system tuning will only chase its tale: when the bottleneck is actually the number of tasks given to the schedulers.  A couple of examples that can be diagnosed on SQL Server itself are when inadequate parallelism is employed for the executing queries(perhaps plans forced to serial), and when query memory limits are throttling the number of queries that can run concurrently (pending query memory grants).  But there are a number of factors outside SQL Server that could also cause this class of performance problem: BusinessObjects, for example, uses services called JobServers which will submit a maximum number of concurrent jobs/queries.  Too few concurrent job slots, and there might not be enough work to keep the RDBMS server busy.  Another case could be an SSIS package or other ETL execution with dependencies among job tasks resulting in a lull of tasks that can be run concurrently.

Below is a graph of perfmon \Processor(_Total)\%Processor Time (red, right-hand axis) vs perfmon Workload Group Stats(default)\Active requests & Workload Group Stats(default)\Active parallel threads.  The system is a 4 vCPU virtual machine.  I've skipped a couple of steps for the sake of brevity :) I've already checked SQL Server CPU utilization against total server CPU utilization: SQL Server is indeed the major CPU consumer for the time period.  I've also already determined that the default workload group is the only significant workload group consumer of CPU for the timeperiod.  Having done that preliminary work...


 

The core licenses have been paid for, I don't really want them underutilized if I can help it :) I'd really like to achieve consistently high CPU utilization on this DSS-style system between 7 am and 11 am, instead of very brief spikes to high CPU utilization among much longer lulls around 40% utilization.

Between 5:00 am and 6:30 am and after 11:00 am, there were much higher counts of active parallel threads, and CPU utilization was higher.  The rate of logical work was higher as well, based on buffer page lookups/second.

So, maybe SQL Server just needs more active requests > more tasks assigned to the workers > more active parallel threads.

Pending memory grants can lead to low active requests and low resource utilization.  But, in this case they were completely absent.

Next... blame it on the other guy :) Some workloads, like an SSIS package, may have dependencies outside of SQL Server proper that throttle the number of active requests.  (BusinessObjects, too - each 'jobserver' service has a maximum number of concurrent job requests it will submit).

But... there are periods where 3 or 4 active requests and 4 active parallel threads drove high CPU utilization.  Rats!  And this was supposed to be my big finish - declaring that the low CPU utilization was jut because there wasn't enough work to do!

In this case, I actually suspect that the VM host was using paging space disk to back VM guest 'physical memory'.  That dramatically increases memory latency and reduces CPU utilization.  But it doesn't show up in the guest as page file activity.

Okay, okay, I'll ask the ESX server owners to check for paging on the server during the low CPU time.  Sorry - guess I'll have to wait for another time to show an example of "its not doing much because nobody is asking it to do much."

No comments:

Post a Comment