Thursday, June 13, 2013

My pitch for async stats auto updates and... Yikes! A SQL Server stats update async memory leak!!!

FIX: Memory leak if you enable the AUTO_UPDATE_STATISTICS_ASYNC statistics option in SQL Server 2008, in SQL Server 2012 or in SQL Server 2008 R2

A few days ago, a SQL Server DBA asked whether this memory leak was significant to consider against my recommendation for enabling auto_update_statistics_async for a given system.  My answer reminds me of a lesson I learned a long time ago, cooking in a deli.

"Serve 'em up... and watch 'em real close."

Here's the deal.  Maybe async auto stats updates aren't appropriate for the primary workflows on the systems in your sphere.   But in my world, its all about minimizing batch execution, with query concurrency higher than logical CPU count, and many of the concurrent queries peeking at the same statistics objects as the work their way through the query optimizer.  And almost all of these queries are run repetitively at a given frequency - most of them daily, some weekly, a few of them monthly.  Most of the tables grow in fairly regular fashion, and I haven't yet seen query plan problems in these environments caused by skewed (as opposed to outdated - and particularly ascending-key-absent) statistics.

A wait for a synchronous auto stats update won't just bloat the elapsed time for a single query - it will bloat the elapsed time for all concurrent queries peeking at the stats object being updated.  So lets say there's a stats object that has just tripped over the threshold for auto update - first time in one month.  Takes 25 minutes to update the stats.

Normally, for a single query, you do the calculus: how much elapsed time is at risk if a plan based on the old stats is used, vs a plan based on the new stats?  Not total elapsed time... but the delta between the two query executions.  If the expected delta between the query plan execution elapsed time is LESS than the expected elapsed time for a synchronous stats update... you come out ahead by doing the async update.  If the stats update requires less elapsed time than the difference between the query plan based on outdated stats and the faster query plan based on new stats... seems better to do sync update.

But, throw in lots concurrent queries stacked up behind the synchronous stats update, and you got trouble.  Some of those concurrent queries would benefit from cache hit based on buffer contents brought in by other other concurrent queries.  And, assuming they all ran acceptably yesterday and are growing reasonably predictably - why have ALL of them wait for a sync stats update?

Ahhh... but what about this scary memory leak?  Well, maybe its not so scary.  80 bytes of memory leaked with each update stats async task (one task per object).  12,800 update async tasks before 1 mb of memory has been leaked.

How many async stats updates are expected before a SQL Server service restart clears the memory leak?  Especially until the restart AFTER the fix is installed? :)  I'm not sure, honestly.  For systems that install trace flag 2371, the answer is more than previously.  My plan: serve 'em up, and watch 'em real close.

The KB article indicates this can be done with the version specific code below.  If appropriate, take a baseline value before enabling async stats update, so that the growth can be evaluated against the baseline and the threat from the leak evaluated.  Hey... I could be making the wrong bet in this scenario.  But I don't think so.  YMMV.  Know your systems, know your primary workflows, know the business cycle.

SQL Server 2012

select [Memory Used KB] = pages_in_bytes /1024 from sys.dm_os_memory_objects where type = 'MEMOBJ_RESOURCE'

SQL Server 2008

select [Memory Used KB] = (pages_allocated_count * page_size_in_bytes)/1024 from sys.dm_os_memory_objects where type = 'MEMOBJ_RESOURCE'

No comments:

Post a Comment