Tuesday, March 31, 2015

Resource Governor to restrict Max Query Memory Grant

Quite a while ago I put up a brief post explaining that, once SQL Server had achieved stability at "max server memory", the stacked graph of "database memory", "stolen memory", and "free memory" was a meaningful account of the "total memory" - the memory governed by the "max server memory" setting and used for shared memory allocations.

Here is such a stacked graph from a system.  This graph is composed of the perfmon metrics "\SQLServer:Buffer Manager\Stolen pages", "\SQLServer:Buffer Manager\Database pages", and "\SQLServer:Buffer Manager\Free pages".

There is a problem on this system.   Resource_semaphore waits are evident on the system - some queries are waiting due to pending query memory grants.  Perfmon gives two places to track outstanding and pending memory grants.

These are a little tricky.  If there are Resource Governor pools other than default for user workloads - beware!!  Although you might think that these metrics account for memory grants across all resource pools, my experience has been that they only show the default resource pool grants.
SQLServer:Memory Manager\Memory Grants Outstanding   
SQLServer:Memory Manager\Memory Grants Pending

Resource pool stats also show these numbers - for default, internal, and any other resource pools.
SQLServer:Resource Pool Stats(default)\Active memory grants count
SQLServer:Resource Pool Stats(default)\Pending memory grants count

The graph below has pending grants stacked on top of outstanding grants for the observed workload.

From approximately 8:30 am to 12:30 am, only roughly half the submitted queries at any given time are executing - the remainder are waiting for their query memory grants.

Often I see this described as a sign of insufficient memory on the system.  Its often - maybe even usually not.  That's because a memory grant isn't an amount of memory that is already delivered - rather its a promise that the specified amount of memory can be stolen from the buffer pool when needed.  The total amount of query memory granted at any given time is not the amount of query memory in use, but it is the amount promised.  If the queries with grants have "eyes bigger than stomach" - having requested more query memory than they will ever actually use, and there are pending query memory grants that's a sign of over-promise, not over-use.

Perfmon tells us how much memory has been stolen at any given time in "\SQLServer:Buffer Manager\Stolen pages", which we used above.  Does it give us the amount of stolen memory that is query memory?  Not directly.

Perfmon does tell us how much memory is currently promised in outstanding grants in "SQLServer:Memory Manager\Granted Workspace Memory (KB)".  Perfmon also tells us how much of the outstanding grants have yet to be called in for steal: "\SQLServer:Buffer Manager\Reserved pages".

So the complement of the "Reserved Pages" is the amount of query memory currently in use.

Wow.  Can already tell there is a level of "eyes bigger than stomach" here. Over the entire timeperiod, the total query memory stolen never gets much above 1/3 of the total query memory granted.

Combine what we know about memory grants on the system with what we know about use of query memory.

By default, a query is allowed to ask for 25% of the grantable memory on the system.  Since there are at least 8 grants outstanding throughout most of this timeperiod (and because I know the type of query workload) I imagine that the largest grants are for about 10% of grantable memory, with many even smaller.  But in aggregate they aren't using close to their total outstanding grant amount.  The largest among them most likely have the largest amount of granted-but-never-stolen memory.  Its possible that outdated stats are causing high estimates of intermediate or final result sets.  Maybe data skew is causing an overload in histogram steps and a high estimate.  Maybe trace flag 4199 could help by way of 4101, which I've seen work wonders in memory grant size for queries with llooonnnggg join lists? (Not in this case - because I know that trace flag 4199 is already in place.)  Sometimes a scalar UDF can obscure the query and result in a really large grant request.

You get the idea :-)  On a busy system with a huge variety of queries, it can be tough to evaluate and tune every query for the sake of a better fit between grant requests and the query memory actually used during execution of the query.

S'ok.  Seems to me plenty of folks have run into this issue now.

Page 16 of the "Hitachi Unified Compute Platform Select for Microsoft® SQL Server® 2012 for Medium Level Data Warehousing on Hitachi Compute Blade 2000 and Hitachi Unified Storage 150" (wheww - what a mouthful) has  the following excerpt.


I've recommended that approach on a number of systems.  In fact, although the Hitachi Fast Track was the first Fast Track I saw that recommended this, I know that a number of the other Fast Track DW documents contain similar recommendations.  REQUEST_MAX_MEMORY_GRANT_PERCENT is the setting to work with.  On this system during that workload, decreasing the max grant from 25% to 10% likely wouldn't change much.  At 8% maybe there would be some change. Some systems in a similar position have had to adopt a maximum of 3-5% for memory grants, in order to reach desired query concurrency.  If you decide to take the route of using Resource Governor to manage this behavior,  lower the max grant until a better fit between granted and stolen query memory becomes evident.  Can it be lowered too far?  Yep.  If so, some queries will use their entire grant and spill to tempdb.  So keep watch on tempdb while modifying the maximum grant size, as a sanity check.  The information in sys.dm_exec_query_memory_grants can also be extremely helpful in finding individual queries with large memory grant requests, and those with really, really large ideal memory (if the ideal memory is absurdly large for multiple queries, adding memory to the server will likely not result in higher memory concurrency as desired but instead larger grant requests).


I'll throw this graph in for free, because I love graphing perfmon stuff.  This is total stolen memory on the system.  The lighter red stacked on top is stolen query memory.  But there are other consumers of stolen memory.  On this system, the other consumers of stolen memory remain relatively constant in size and query memory is the main volatile component of stolen memory.

A few more recent blog posts related to perfmon & query/workspace memory.

SQL Server Granted/Reserved/Stolen Workspace Memory in Perfmon

SQL Server Workspace Memory - with a Twist