Monday, July 2, 2018

SQL Server 2016 Memory Accounting: A Suspicious Surprise

Originally published 2018 July 2
Updated 2020 December 15
*****
I wanted to update this blog post with a link to a fix that corrects the memory double-counting described here (and the condition that leads to it).

KB4536005 - Improvement: Fix incorrect memory page accounting that causes out-of-memory errors in #SQLServer #SQLServer 2019 CU2 #SQLServer 2017 CU20 #SQLServer 2016 SP2 CU15
*****

Wading through all of the SQL Server memory-related perfmon counters to understand how they related to each other took me a really long time.  Time-series graphs that show the relationship help me tremendously, and when I started trying to account for SQL Server memory years ago I couldn't find any.  So I started to blog some time-series graphs, under the theory that either my understanding was correct and my graphs would be helpful to someone... or they'd be wrong and someone would correct me.
Well... its been about 5 years and my graphs haven't generated too much discussion, but they've really helped me 😀😀😀

Perfmon: SQL Server Database pages + Stolen pages + Free pages = Total pages
http://sql-sasquatch.blogspot.com/2013/09/perfmon-database-pages-stolen-pages.html

Working with SQL Server 2016 and some demanding ColumnStore batch mode workloads, I began to see suspicious numbers, and graphs that didn't make sense to me.  Today I got pretty close to figuring it out so I wanted to share what I've learned.

The following graphs are from a 4x10 physical server running Windows and SQL Server.  Four sockets, 4 NUMA nodes.

Perfmon has SQL Server "total" memory numbers for each SQLOS memory node, in addition to the instance-level measure.  The sum of "total node" memory across the nodes should be equal to the instance measure of "total server" memory.  That checks out.



 Database cache, free memory, and stolen memory report measures at the instance level and at the SQLOS memory node level.  Let's check out the instance measures first.  The sum of these categories should be "total server memory".  That checks out, too.



Now, since we've got instance measures of database cache, stolen, and free memory at the SQLOS memory node level as well, we can check whether they sum to "total node memory" on their respective nodes.

Uh oh.  I smell trouble...



Hey!! Trouble on this node too...


SQLOS node 001 also shows a suspicious graph...


Not to be left behind, SQLOS node 000 also shows an unexpected graph.


OK.  So either some memory is being counted multiple times... or some memory is being lost by the SQL Server memory manager and not reflected in "total node memory" and "total server memory".  Maybe I can narrow down the problem space.

Lets start by looking at Free Memory.


That graph above looks pretty good, even though it is pretty volatile.  Lets zoom in a little bit just to make sure.  The graph below makes me pretty confident - the node level "free memory" measures and the instance level "free memory" measures are aligned.



OK, since free memory accounting seems to agree between SQLOS nodes and the instance level measure, lets look at database cache.  The graph below shows that the instance measure aligns nicely with the node measures.



That leaves just Stolen Memory.  Take a look at the graph... and... yep, there's the problem.  Or at least there's a problem.


Having seen that Stolen Memory and only Stolen Memory indicates a discrepancy between the instance measure and the sum of the node measures, lets consider again the memory position on SQLOS memory node 002.  The sum of free + stolen + database is nearly 50GB greater than total node memory for an hour!  When there is not much stolen memory in that SQLOS memory node at all!!


Just to clarify what I mean by "not much stolen memory at all" on node 002...

[yeah, i name my excel workbook tabs "perfmon" and "graphs".  i use RC style format.  and its not unusual for my perfmon tabs to have over 1000 columns.]



Now... if I graph the difference between the sum of node-level stolen memory and instance-level stolen memory...


Now what if I sum database cache, stolen and free memory across all 4 SQLOS nodes and subtract the instance-level Total Server memory? Lets lay that - in transparent red - on top of the blue we just graphed.


So here's what I think is happening: in some cases of batchmode queries, a portion of memory is getting double-counted.  Its counted in one SQLOS memory node as "database cache" and in another memory node as "stolen memory"!!  That leads to the discrepancy seen here for instance and memory node measures - in stolen memory, and in total memory.

At this point i'm not sure if this represents a performance and scalability problem, or if it just masks some problems by making memory state and trends more difficult to accurately observe.  When i learn more about this, i'll blog an update post and link the two together.

Ciao for now!

[for another example of this condition, from a 2 vNUMA node VM please see the later blog post below]

SQL Server 2016 Memory Accounting Part II: Another Suspicious Example
https://sql-sasquatch.blogspot.com/2018/10/sql-server-2016-memory-accounting-part.html






Sunday, July 1, 2018

TugaIT Summer 2018 Edition: Insight-Based Administration

This will be my 3rd year presenting at TugaIT, and I'm more excited about it than ever!

This year I'll be giving a workshop with the aspirational title "Insight-Based Administration."

I've got something in mind - I want to shape and deliver a workshop that matches the needs of the folks that attend.  In order to do that... I want to invite those interested in attending to submit perfmon captures to me.

I've put .bat scripts to start and stop perfmon, as well as counter sets for SQL Server and other system concerns.  The start script will ensure that the SQL Server counters are captured for each SQL Server instance on a VM or server.  The counters will be captured in 30 second intervals.  The log file format is csv - the csv files are in some cases only 1/10 the size of blg binary files.  The files will have a maximum size of 32 mb.  If perfmon is still running 7 days later (168 hours) it will stop.

https://github.com/sqLhandLe/TugaIT_2018Summer

Over the next several days I'll be pointing out specific things about the counters in these countersets.