Wednesday, June 18, 2014

Adventures with SQL Server & perfmon Part 3: Less is more for Me & TokenAndPermUserStore

 This is a continuation of my adventures with SQL Server and perfmon.
Adventures with SQL Server & perfmon Part 1: Resource Governor & Query Memory
Adventures with SQL Server & perfmon Part 2: TokenAndPermUserStore & spinlock suspects

In Part 2 I explained that I often evaluate the amount of logical work (in the form of buffer pool lookups/sec) performed in a database against the CPU utilized.  If I see something suspicious, I dig in.  What I saw on that system made me suspicious of some type of spinlock condition at the very beginning of queries in the batch.

My previous post has some graphs that explain the situation a bit.  Here's a brief recap.  RG1 is a resource governor workload group, attached to its own resource governor resource pool.  And here are two graphs showing the disparity between CPU utilization and logical reads, as well as the prevalence of CMEMTHREAD wait starts in the database instance.

Bpool Lookups (light blue above) = "\SQLServer:Buffer Manager\Page lookups/sec"
%CPU (dark red above)= "\SQLServer:Resource Pool Stats(RG1)\CPU usage %"
CMEMTHREAD Wait Starts (red dots above) =  "\SQLServer:Wait Statistics(Waits started per second)\Thread-safe memory objects waits"

So I hit the books.  Or the internet, or whatever.  And I learned about TokenAndPermUserStore.

Here's the stuff I read through.

TokenAndPermUserStore cache performance issue
Explicitly parameterize adhoc queries
Use stored procedures
Force parameterization
Periodic use of:

Trace flag 4618 for 1024 "access check cache quota" entries

Add trace flag 4610 to 4618 for 8192 "access check cache quota" entries

Trace flag 4621 + Windows Registry setting 'TokenPermQuota' to customize size

Blog Post about trace flag 4621

The blog post below gives a query example from sys.dm_os_memory_clerks that can be used to evaluate this condition. The sys.dm_os_memory_clerks query is lightweight; the blog post also includes a sys.dm_os_memory_cache_entries query that should not be used during peak workload due to its overhead.

Check this out. This blog post uses a sys.dm_os_memory_clerks query together with perfmon "\SQLServer\User Settable" to track this condition. Maybe its a good thing I didn't see this post until I was almost done with my own post - maybe it would have scared me off :-)

So I evaluated the system.  Then I evaluated system behavior based on what I know about the workflow.  Its a batch-based workflow, with almost all queries executed by a single service account.  The security/access cache doesn't necessarily deliver a lot of value.
So I opted to set the cache elements on the 64 bit system to the 32 bit defaults.

--default sizes below if 'access check cache bucket'=0 & 'access check cache quota'=0
--x86 32 bit default sizes 'access check cache bucket'=256,  'access check cache quota'=1024
--x64 / ia64 default sizes 'access check cache bucket'=2048, 'access check cache quota'=8192
--check current values
EXEC sp_configure @configname = 'access check cache bucket' ;
EXEC sp_configure @configname = 'access check cache quota' ;

EXEC sp_configure @configname = 'access check cache bucket' , @configvalue = 256 ;
EXEC sp_configure @configname = 'access check cache quota' , @configvalue = 1024 ;

--verify the new values
EXEC sp_configure @configname = 'access check cache bucket' ;
EXEC sp_configure @configname = 'access check cache quota' ;

--reset 'show advanced options' to 0
EXEC sp_configure @configname = 'show advanced option', @configvalue = 0 ;

That turned out to be a pretty good strategy.

From 7:30 to 8:00 am the peak CMEMTHREAD wait starts per second dropped from almost 60,000 to under 160!  Even better, CPU utilization became "unstuck" from its previous plateau of about 75%... and the relationship between CPU utilized and database logical IO was restored.

Here's a before and after, which will make the drop in CPU and increase in logical IO rate quite evident.

Pretty good for not tuning a single query, eh? :)

Again - its not that I think query tuning is unimportant.  But first things first: make sure the system is delivering expected performance capacity... then tune queries to leverage the system performance capacity.

Gives an idea what I'm moving on to now :)  Its time to push this system a little bit harder, and then make sure that the logical work performed at the instruction of the queries and query plans is efficient - so that the system is working hard and smart.