
Wednesday, April 12, 2017

#SQLServer: Waits keeping CPU utilization down? Or Management Cost inflating CPU utilization? Part I

Two of the first graphs I look at to understand system resource utilization are active parallel threads against CPU utilization, and active requests against CPU utilization.  One or both relationships should stay fairly well correlated - as parallel threads and/or active queries increase, CPU utilization should increase until reaching CPU saturation.  

*Unless* a wait type or a group of wait types starts to dominate the workload, such that time passes with *no* runnable thread on one or more schedulers.
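For a quick point-in-time look at the same relationship from DMVs rather than perfmon, a rough sketch like the one below works. The session_id > 50 filter as a stand-in for "user sessions" is a simplifying assumption on my part.

-- active user requests, and how many are runnable right now
SELECT COUNT(*) AS active_requests,
       SUM(CASE WHEN r.status = 'runnable' THEN 1 ELSE 0 END) AS runnable_requests
FROM sys.dm_exec_requests AS r
WHERE r.session_id > 50;

-- tasks in flight for user requests; anything above active_requests is parallel workers
SELECT COUNT(*) AS active_tasks
FROM sys.dm_os_tasks AS t
JOIN sys.dm_exec_requests AS r
  ON r.session_id = t.session_id
 AND r.request_id = t.request_id
WHERE r.session_id > 50;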

In the graphs below (30 second perfmon interval), the time periods from 12:45 am to about 1:20 am and from 1:35 am to 2:00 am look fairly healthy from that standpoint.

But the yellow box is confounding.  I personally hate CPU sawtooth patterns like that - just about my least favorite pattern to come out of graphing system resource utilization.  The only thing I like less is a straight horizontal line at saturation.




I don't know much about this particular system yet.  But... I can't help myself.  I must know. I do know it's a 2-socket physical server, with 32 cores.

There are no added Resource Governor Resource Pools or Workload Groups - just Default and Internal.
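Something like this quick sketch against the DMVs will confirm that only Default and Internal are present:

-- resource pools and their workload groups
SELECT p.name AS pool_name, g.name AS workload_group_name
FROM sys.resource_governor_resource_pools AS p
LEFT JOIN sys.resource_governor_workload_groups AS g
  ON g.pool_id = p.pool_id
ORDER BY p.name, g.name;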

Well - could it be something other than SQL Server that is chewing up CPU?  I guess...

Ooohhh.  Nope, I take that back.  Definitely SQL Server chewing up that CPU.
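If perfmon isn't handy, the scheduler monitor ring buffer gives a rough in-SQL-Server view of the same thing - SQL Server process CPU vs. overall system idle, roughly one sample per minute. A sketch:

-- recent SQL Server vs. system-wide CPU from the scheduler monitor ring buffer
SELECT TOP (30)
       DATEADD(ms, -1 * (si.ms_ticks - rb.[timestamp]), SYSDATETIME()) AS sample_time,
       rb.record.value('(//SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_percent,
       rb.record.value('(//SystemHealth/SystemIdle)[1]', 'int') AS system_idle_percent
FROM (SELECT [timestamp], CONVERT(xml, record) AS record
      FROM sys.dm_os_ring_buffers
      WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
        AND record LIKE N'%<SystemHealth>%') AS rb
CROSS JOIN sys.dm_os_sys_info AS si
ORDER BY rb.[timestamp] DESC;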



My money is on 1:20 - 1:35 am being a case of exorbitant management cost. Spinlock contention, probably.  Maybe related to handling a system rowset object, or the plan cache... or it could even be good old spinlock contention surrounding CMEMTHREAD waits.

I'm not sure right now.  But considering the system is at a low point for concurrent queries and parallel threads at that time - it's hard for me to believe waits are keeping CPU utilization lower than the trend.  Rather, it looks more likely that, during a time of low concurrency, something is artificially increasing CPU utilization with management work.
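Spinlock stats are the natural next thing to check.  The counters in sys.dm_os_spinlock_stats are cumulative since startup, so ideally two snapshots bracketing the sawtooth window get diffed - but even the raw top spinners can be suggestive. A sketch:

-- cumulative top spinners since startup
SELECT TOP (10) name, collisions, spins, spins_per_collision, backoffs
FROM sys.dm_os_spinlock_stats
ORDER BY spins DESC;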

This is particularly intriguing.  About 4 GB of tempdb in use during the CPU sawtooth, but almost no granted memory used.  Hmmmm.


There was memory granted at the time.
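To see granted vs. used memory per request live (rather than from perfmon counters), something like this sketch works while the behavior is happening:

-- current memory grants: how much was granted vs. how much is actually used
SELECT session_id, dop, request_time, grant_time,
       requested_memory_kb, granted_memory_kb, used_memory_kb, max_used_memory_kb
FROM sys.dm_exec_query_memory_grants
ORDER BY granted_memory_kb DESC;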



This is also quite intriguing.  Plan cache was at its plateau during this unusual period of time.  And when a significant part of plan cache was released shortly after 1:35 am, a more expected relationship between CPU utilization and queries/parallel threads was restored.
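To track plan cache size over time, sampling the cache stores works - a sketch, with the store list trimmed to the usual suspects (my assumption that these are the relevant ones here):

-- plan cache memory by cache store
SELECT [type], SUM(pages_kb) / 1024.0 AS cache_mb
FROM sys.dm_os_memory_clerks
WHERE [type] IN (N'CACHESTORE_SQLCP', N'CACHESTORE_OBJCP', N'CACHESTORE_PHDR')
GROUP BY [type]
ORDER BY cache_mb DESC;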



Stay tuned... when I figure it out I'll let you know...


(to be continued...)

#SQLServer: Are waits keeping CPU utilization down? Or is Management Cost inflating CPU utilization? Part II
http://sql-sasquatch.blogspot.com/2017/04/sqlserver-are-waits-keeping-cpu_13.html

Friday, March 3, 2017

SQL Server 2016 Spinlocks: Query Store

Recently on #sqlhelp there was a case of a 1 GB Query Store being nearly full, and the instance slowing to a crawl.  It was difficult to even run diagnostic queries; to resolve the issue the service was restarted and Query Store was set to read-only (so all contents were retained).

Hard to say if the problem was work or wait. Certainly both could have contributed.  But most diagnostic queries are pretty light - unless CPU is saturated or there are long waits to compile or get a memory grant, they'll usually run even if a bit sluggish.

So I suspect that this was a work problem - e.g., with the Query Store nearly full, management work became so expensive that user query work was starved for CPU time.  That doesn't seem far-fetched, because the number of queries in the 1 GB Query Store was over 100,000 if I recall correctly.

There have been issues in the past with the plan cache when it was at or near plan count capacity - it would become a one-in-one-out scenario, with the session that wanted to insert a new plan being responsible for the necessary cleanup.  If the hash buckets were really long, it became expensive to walk them.

I'm not sure how size-based cleanup is implemented for the Query Store, other than the documentation noting that it kicks in near 80% of the configured limit.  If sessions waiting to compile a query take on responsibility for cleaning out stuff from Query Store, there could be a situation very similar to the plan cache conundrum of old.
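For the affected database, a sketch of how to check where Query Store sits relative to its limit and whether size-based cleanup is even enabled (run in that database's context):

-- Query Store state, size vs. limit, and cleanup mode for the current database
SELECT actual_state_desc, desired_state_desc, readonly_reason,
       current_storage_size_mb, max_storage_size_mb,
       size_based_cleanup_mode_desc
FROM sys.database_query_store_options;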

If that's happening and CPU utilization is near 100% either server-wide or on a subset of cores, then spinlock stats may become more enlightening than wait stats.

I did see a number of new spinlocks added for Query Store...

SELECT name FROM sys.dm_os_spinlock_stats WHERE name LIKE '%QUERY_STORE%' ORDER BY name

QUERY_STORE_ASYNC_PERSIST
QUERY_STORE_CAPTURE_POLICY_INTERVAL
QUERY_STORE_CAPTURE_POLICY_STATS
QUERY_STORE_HT_CACHE
QUERY_STORE_PLAN_COMP_AGG
QUERY_STORE_PLAN_LIST
QUERY_STORE_SELF_AGG
QUERY_STORE_STMT_COMP_AGG
SPL_QUERY_STORE_EXEC_STATS_AGG
SPL_QUERY_STORE_EXEC_STATS_READ
SPL_QUERY_STORE_STATS_COOKIE_CACHE
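
If the slowdown recurs, a short delta over just those spinlocks should show which ones are actually hot.  A sketch - the temp table name and the 60 second interval are arbitrary choices:

-- snapshot Query Store spinlock counters
SELECT name, collisions, spins, backoffs
INTO #qs_spinlock_before
FROM sys.dm_os_spinlock_stats
WHERE name LIKE '%QUERY_STORE%';

-- wait a minute, then diff against the snapshot
WAITFOR DELAY '00:01:00';

SELECT s.name,
       s.collisions - b.collisions AS delta_collisions,
       s.spins - b.spins AS delta_spins,
       s.backoffs - b.backoffs AS delta_backoffs
FROM sys.dm_os_spinlock_stats AS s
JOIN #qs_spinlock_before AS b ON b.name = s.name
ORDER BY delta_spins DESC;

DROP TABLE #qs_spinlock_before;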