Friday, June 24, 2016

SQL Server trace flag 8721 - auto stats sync & async update info

Aaron Morelli (@sqlcrossjoin on Twitter) has a great collection of trace flags in the pdf linked below.  (Check to see if there's a more recent update, v6 was current as of April 2016.)
https://sqlcrossjoin.files.wordpress.com/2016/04/sqlcrossjoin_traceflagrepository_v6.pdf

Trace flag 8721 is of interest to me when investigating ETL behavior.  A complex ETL could change stats numerous times for lots of stats keys.  Information for auto stats updates in addition to stats updates performed explicitly by ETL code can really help complete the picture of what happens during an investigation.

I found this post from Prince Rastogi where he walks through a demo of trace flag 8721 by creating a new table with a stats key, increasing the rows from 0 to 1, then executing a query that requests the stats.  Here's an important note: use trace flag 3605 (redirect some output to SQL Server log) together with trace flag 8721.
http://www.sqlservergeeks.com/sql-server-trace-flag-8721/

I had two questions, and answered them both by following the same process.
My first question: Does it still work in SQL Server 2016? Yep.
Second question: Does it capture both async & sync auto stats updates? Yep.

Snippet from the SQL Server error log below.

*****
Date,Source,Severity,Message
06/24/2016 12:16:22,spid23s,Unknown,AUTOSTATS: UPDATED Stats: test_Tflag8721..PK__test_Tfl__1B434632D45E0816 Dbid = 7 Indid = 1 Rows: 1 Duration: 0ms
06/24/2016 12:16:22,spid57,Unknown,AUTOSTATS: Tbl: test_Tflag8721 Objid:2089019169 Rows: 1.000000 Threshold: 500 Duration: 0ms **** >>>> ASYNC <<<< ****
06/24/2016 12:16:22,spid57,Unknown,AUTOSTATS: Tbl: test_Tflag8721 Objid:2089019169 Rows: 1.000000 Threshold: 500 Duration: 0ms **** >>>> ASYNC <<<< ****
06/24/2016 12:16:22,spid57,Unknown,AUTOSTATS: Tbl: test_Tflag8721 Objid:2089019169 Rows: 1.000000 Threshold: 500 Duration: 0ms **** >>>> ASYNC <<<< ****
06/24/2016 12:16:11,spid57,Unknown,The error log has been reinitialized. See the previous log for older entries.
06/24/2016 12:16:11,spid57,Unknown,Default collation: SQL_Latin1_General_CP1_CI_AS (us_english 1033)
06/24/2016 12:16:11,spid57,Unknown,The service account is 'NT Service\MSSQLSERVER'. This is an informational message; no user action is required.
06/24/2016 12:16:11,spid57,Unknown,Logging SQL Server messages in file 'M:\MSSQL\Log\ERRORLOG'.
06/24/2016 12:16:11,spid57,Unknown,Authentication mode is MIXED.
06/24/2016 12:16:11,spid57,Unknown,System Manufacturer: 'VMware<c/> Inc.'<c/> System Model: 'VMware Virtual Platform'.
06/24/2016 12:16:11,spid57,Unknown,Server process ID is 1488.
06/24/2016 12:16:11,spid57,Unknown,All rights reserved.
06/24/2016 12:16:11,spid57,Unknown,(c) Microsoft Corporation.
06/24/2016 12:16:11,spid57,Unknown,UTC adjustment: -5:00
06/24/2016 12:16:11,spid57,Unknown,Microsoft SQL Server 2016 (CTP3.0) - 13.0.700.242 (X64) <nl/>    Oct 26 2015 18:46:06 <nl/>    Copyright (c) Microsoft Corporation<nl/>    Enterprise Evaluation Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor)
06/24/2016 12:16:11,spid57,Unknown,Attempting to cycle error log. This is an informational message only; no user action is required.
06/24/2016 12:15:52,spid57,Unknown,Setting database option AUTO_UPDATE_STATISTICS_ASYNC to ON for database 'SEASIDE'.
06/24/2016 12:12:56,spid57,Unknown,AUTOSTATS: Tbl: test_Tflag8721 Objid:2057019055 Rows: 1.000000 Threshold: 500 Duration: 2ms
06/24/2016 12:12:56,spid57,Unknown,AUTOSTATS: UPDATED Stats: test_Tflag8721..PK__test_Tfl__1B434632ECB10C7B Dbid = 7 Indid = 1 Rows: 1 Duration: 0ms
06/24/2016 12:12:50,spid57,Unknown,The error log has been reinitialized. See the previous log for older entries.
06/24/2016 12:12:50,spid57,Unknown,Default collation: SQL_Latin1_General_CP1_CI_AS (us_english 1033)
06/24/2016 12:12:50,spid57,Unknown,The service account is 'NT Service\MSSQLSERVER'. This is an informational message; no user action is required.
06/24/2016 12:12:50,spid57,Unknown,Logging SQL Server messages in file 'M:\MSSQL\Log\ERRORLOG'.
06/24/2016 12:12:50,spid57,Unknown,Authentication mode is MIXED.
06/24/2016 12:12:50,spid57,Unknown,System Manufacturer: 'VMware<c/> Inc.'<c/> System Model: 'VMware Virtual Platform'.
06/24/2016 12:12:50,spid57,Unknown,Server process ID is 1488.
06/24/2016 12:12:50,spid57,Unknown,All rights reserved.
06/24/2016 12:12:50,spid57,Unknown,(c) Microsoft Corporation.
06/24/2016 12:12:50,spid57,Unknown,UTC adjustment: -5:00
06/24/2016 12:12:50,spid57,Unknown,Microsoft SQL Server 2016 (CTP3.0) - 13.0.700.242 (X64) <nl/>    Oct 26 2015 18:46:06 <nl/>    Copyright (c) Microsoft Corporation<nl/>    Enterprise Evaluation Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor)
*****

Tuesday, June 14, 2016

SSIS "buffers spooled" on a NUMA server


 So this happened today during an ETL.



A quick look through the InterTubes tells me that SSIS should spool buffers to disk only if
  • memory allocation failed
  • low memory resource notification sent by Windows OS

So the server had plenty of free RAM.  But NUMA node 1 was in a pinch.  And SSIS spooled its buffers to disk.  Doggone it.

I guess I'd figured that notifications were sent based on server-wide memory state.  But I guess maybe memory state on each NUMA node can lead to a memory notification?

The target SQL Server instance, a beefy one, was also on this physical server.  There's 1.5 TB of RAM on the server.  :-)

We lowered max server memory by 100GB, hoping to get a reasonable amount of free memory on NUMA node 1 for tomorrow's ETL.  (The server was also rebooted for an unrelated reason.)

I dunno though.  This is SQL Server 2014 SP1 CU2.   

Memory is already being consumed unevenly across the NUMA nodes.  If NUMA node 1 gets short end of the stick by 6:00 am tomorrow, I think for now we may try to affinitize SSIS to NUMA node 2, in hopes that its memory allocations will come from that node and it that memory allocations will continue to allow enough free memory there even while short-changing NUMA node 1.  

*****Update*****
A blog post showing use of Windows Resource Manager to affinitize SSIS.  But... this feature was deprecated in Windows Server 2012 R2. 
:-( Sad trombone.  

Maybe next version of SSIS can support affinity natively, similar to SQL Server database engine?  Or maybe SSIS can be made NUMA aware?  Or...

Well, at any rate, for maximum scalability and flexibility, gotta find some way of making this work...




Monday, June 13, 2016

L'esprit de L'escalier - Whither SQL Server Backup Buffers?

I've got a long, long list of interesting questions.  And I blog frightfully slowly :-)

So... why don't I start blogging more of the questions?  Someone else may very well be able to springboard from one of my questions to a *really* astounding answer!  Maybe.  I hope my questions are that good anyway :-)

Sometimes the questions come to me as a form of L'esprit de L'escalier - not something I wish I'd said but a question I wish I had asked.

So it is with my question: "Whither SQL Server Backup Buffers?"

Yesterday (or maybe really early this morning) I posed the question: "Whence SQL Server Backup Buffers?"

Here's what I noticed: 'Target server memory' 33553968 hadn't been achieved.  'Total server memory' during the backup was 14947794.  Of that 4194496 was allocated to memoryclerk_backup.

When the backup was cancelled, the memoryclerk_backup allocation was simply returned to the OS.

Why not give it to free memory?  It should have been nice, contiguous chunks of memory addresses.  The server was far from reaching target server memory.  The server wasn't under memory pressure.  This is what it looks like right now... and it would have looked nearly the same right after the backup was cancelled.
That's a lot of free memory.  Take away 4 GB from it for the backup buffers... yeah, its still a lot of free memory.

I wonder:
-If 'max server memory' wasn't being overridden by a more reasonable target (because max server memory is 120 GB on a 32GB VM), would the behavior still be the same before reaching target?  I bet it would be.
-Is this behavior specific to not having reached target?  Or when reaching target would backup buffers be allocated, potentially triggering a shrink of the bpool, then returned to the OS afterward requiring the bpool to grow again?
-What other allocations are returned directly to the OS rather than given back to the memory manager to add to free memory?  I bet CLR does this, too.  Really large query plans?  Nah, I bet they go back into the memory manager's free memory.
-Does this make a big deal?  I bet it could.  Especially if a system is prone to develop persistent foreign memory among its NUMA nodes.  In most cases, it probably wouldn't matter.

Maybe I'll get around to testing some of this.  But it probably won't be for a while.

Whence SQL Server Backup Buffers?

Recently I saw a question on Twitter #sqlhelp about backup buffers, asking if they came from the bpool or MTL.

I think the question is trickier than first realized :-)

MTL (or mem-to-leave) is at this point more of an historic term than anything else.  This term was most relevant in the days of 32-bit OS.  You can search old blog posts (not of mine) and old forum posts to see tons of fun related to 32-bit Windows and attempting to ensure that multi-page allocations and Windows direct allocations had enough available memory in the first 4 GB of address space. (Hunt around for "-g" startup option and "/3GB" switch to *really* appreciate 64-bit Windows & SQL Server.)

Next, SQL Server 2012 saw a very significant change in SQL Server memory management.  Previous to SQL Server 2012, there was both a single page allocator (SPA) and a multi-page allocator (MPA) in SQL Server.  While single page allocations were under governance of "max server memory", multiple page allocations were not.  SQL Server 2012 instead saw the introduction of the any page allocator, which took over the jobs of both previous allocators.  On a related note, the multi-page allocations which were previously outside of "max server memory" control were moved under control of "max server memory".  (I'm really glad that happened - in general it made server behavior much more predictable for a given configuration.)

Now about the idea of "coming from the buffer pool."  SQL Server still uses the term "stolen memory" to describe behavior and within its memory accounting.  But in SQL Server 2012 and beyond, the bpool is a consumer of memory from the memory manager in much the same way as others.  So although the bpool may shrink in order that another consumer can grow, that's a process managed by the memory manager.  And the bpool is usually on the "need to shrink" side only because it tends to be one of the largest, if not the largest, consumers.

OK... so now I'll mention what I *think* the question really meant.  I think the question was meant to determine:
"Are backup buffers governed by 'max server memory', or should I plan for backup buffers outside of 'max server memory' like we still do for SQL Server worker threadstacks?"

That's a pretty good question.  If you search around, you may find a number of posts where the conclusion is... "I don't remember if its included in max server memory or not." :-)

Let's see what can be learned with a quick little experiment.

Don't try this at home.  :-)

Only if you have exclusive access to the memory resources on the server AND if a significant read load on the storage won't disrupt anyone.

What will happen if I run a backup to DISK='NUL' on a very large database with buffercount = 1024 and maxtransfersize = 4194304 (4 mb)?  Altogether, that should be 4 GB of memory allocation - should be quite noticeable.

I'm testing on SQL Server 2016 RC2.

Here's how things looked before the backup.

   
Kinda boring.  Memoryclerk_backup has nothing in node 0 or node 64.
'Total Server Memory (KB)'  is just a hair over 10GB, even though 'Target Server Memory (KB)' is just shy of 32 GB.  'Max server memory' is actually 120000 MB.  There hasn't been much happening in this instance for a few days.

Lets fire up a backup.
The test database is several terabytes, but in this case I'm writing the backup to NUL and I'll cancel the backup as soon as I learn what I want to know.

How do the backup memory clerks and performance counters look now?




Memoryclerk_backup on node 0 now has just over 4 GB allocated to it.  Total server memory has increased by... let's see... 14947744 KB - 10753248 KB =  4194496 KB.  That's exactly the size of the allocation to the backup clerk.

What about stolen memory?  It increased by 5457832 KB - 1257344 KB = 4200488 KB.  That's a little bit more than what memoryclerk_backup picked up.  In part because I opened a second query window while testing :-)

Remember - the memory isn't really stolen from the buffer pool necessarily.  Notice that 'Database Cache Memory (KB)' is the same 9467600 KB before and during the backup.  'Free Memory (KB)' has declined by 28304 KB - 22312 KB = 5992 KB.

Doesn't that number look suspicious?

4194496 KB (backup allocation) + 5992 KB (additional stolen memory) = 4200488 KB

Whew! Glad that worked out :-)

So some memory really was stolen.  But it wasn't stolen from the bpool, which remained the same size during the backup (until I canceled it) as it was before the backup.  The additional stolen memory came from the free memory.

Now, because target memory in this SQL Server instance is so much lower than 'max server memory' (my VM got shrunk from 128GB RAM to 32 GB RAM and I never adjusted 'max server memory'), I can't say that I proved that backup buffers will always be from within the allocation governed by 'max server memory'.  Can I say it's left as an exercise for the reader?

It's late and I gotta get some sleep...

:-)