Wednesday, May 4, 2016

SQL Server: A quick word about large sorts/hashes, lookups/sec, and CPU utilization

Several years ago I began trying to model SQL Server DSS system behavior based on activity rates and resource utilization.  Its an ongoing effort :-)  This is the motivation for many of my strange-looking perfmon graphs - I'm trying to fit it all together.

One of my first stops was to graph CPU utilization against "page lookups" across many systems.
I quickly learned that spinlock contention could disrupt the correlation between CPU utilization and page lookups or database logical IO.  I was suspecting that anyway - I'd seen the same effect on another database platform.

It stands to reason that high rates of plan compile, or lots of XML parsing and manipulation, or usage of CPU by CLR, or CPU consumed by something other than SQL Server will likewise disrupt the correlation.  Operations using worktables can also throw things for a loop, whether reviewing perfmon or results from SET STATISTICS IO ON.

But today I want to mention another disrupter of the relationship between CPU utilization and database logical IO: significant query use of large sorts and hashes.

Consider the following perfmon from a DSS system(96 GB RAM on the system).  Page Lookups/second in blue against the left vertical axis, CPU utilization in orange against the right vertical axis.  I've color-coded this observation period into green, yellow and red portions of time.  During the initial green period of time, there is a strong correlation between CPU utilization and page lookups.  That gives way to the yellow time period, where the relationship between CPU and logical IO comes unglued.  Finally, in the red section the amount of CPU utilized is completely disproportionate to database logical IO.



I don't want to bore you with my "home movie" of the dozens of perfmon graphs I build when I review systems.  :-)  Let's cut to the chase.

 

The graph above has tempdb footprint (light blue) stacked on top of used query memory (dark blue) against the left vertical axis.  The green period has very limited use of query memory.  During the yellow period, a moderate amount of query memory was used.  During the red period, a large amount of query memory was used and at a number of points operations spilled into tempdb.  As query memory was used more extensively, the CPU:lookups/sec correlation was more disrupted.

Once fully considered, this makes sense: query memory is "stolen" from the database page buffer pool.  References to pages in the page pool are "page lookups", but each time stolen query memory is poked and prodded... well, that's not a page lookup.  But it has CPU cost.

So if you are reviewing a system that usually has high correlation between CPU utilization and page lookups/second and suddenly the relationship is disrupted: yeah, could be spinlock contention.  Could be lots of other things, too.  Might be large sorts/hashes dominating memory traffic and less reliance on the database cache where "page lookups" are tracked.

OK... I guess that wasn't really a "quick word".

I tried; not sorry. :-)

No comments:

Post a Comment