Thursday, May 23, 2019

SQL Server 2016: Here a NUMA, there a NUMA... Part 2

This post takes a further look at some memory behaviors explored earlier in the following blog post:

SQL Server 2017: here a NUMA, there a NUMA... Part 1

The previous blog post described a significant performance degradation event on a 4-node perf/scale test system. This blog post instead details observations from a 4-node production system.

4 vNUMA nodes x 14 vCPUs each, 1472 GB total vRAM (368 GB per vNUMA node).

[max server memory] set at 1375000 MB.
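A quick arithmetic check of that sizing, as a Python sketch. It assumes that the instance's [target] equals [max server memory] and that SQL Server splits it evenly across the 4 vNUMA nodes; neither is stated above, so treat the per-node [target] figure as approximate.

```python
# Sanity-check the VM sizing described above.
NODES = 4                       # vNUMA nodes
VCPUS_PER_NODE = 14
TOTAL_VRAM_GB = 1472
MAX_SERVER_MEMORY_MB = 1_375_000

total_vcpus = NODES * VCPUS_PER_NODE                  # 56
vram_per_node_gb = TOTAL_VRAM_GB / NODES              # 368.0
max_server_memory_gb = MAX_SERVER_MEMORY_MB / 1024    # ~1342.8

# Assumption: [target] == [max server memory], split evenly across nodes,
# so each node's [target] is roughly a quarter of [max server memory].
target_per_node_gb = max_server_memory_gb / NODES     # ~335.7
headroom_per_node_gb = vram_per_node_gb - target_per_node_gb  # ~32.3

print(f"{total_vcpus} vCPUs, {vram_per_node_gb:.0f} GB vRAM per node")
print(f"[max server memory] ~{max_server_memory_gb:.1f} GB, "
      f"~{target_per_node_gb:.1f} GB [target] per node, "
      f"~{headroom_per_node_gb:.1f} GB per node left for everything else")
```

Under that assumption each node's share of [max server memory] fits inside its 368 GB of vRAM with roughly 32 GB to spare, which is what makes the node 000 numbers below so surprising.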

See if you can spot the unexpected system state below. 🙂

The [db cache] on node 000 is ginormous!!

There is only 368 GB of vRAM on each vNUMA node. Yet node 000 alone is accounting for [db cache] that approaches the combined vRAM of 3 vNUMA nodes!!
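The reasoning here is simple arithmetic, sketched below in Python. The function name is hypothetical, and the 1000 GB reading is an illustrative value, not one taken from this system. Whatever a node reports as [db cache] beyond its own 368 GB cannot be backed by that node's local vRAM.

```python
NODE_VRAM_GB = 368   # vRAM per vNUMA node on this system


def min_nonlocal_gb(node_db_cache_gb: float, node_vram_gb: float = NODE_VRAM_GB) -> float:
    """Lower bound on the part of one node's [db cache] that its own vRAM can't hold.

    Anything above the node's vRAM must be memory homed on other nodes, or
    memory that is being accounted for in more than one place.
    """
    return max(0.0, node_db_cache_gb - node_vram_gb)


# The ceiling node 000's [db cache] was approaching: the vRAM of 3 nodes.
three_node_vram_gb = 3 * NODE_VRAM_GB   # 1104 GB

# Hypothetical reading, for illustration only.
example_db_cache_gb = 1000
print(min_nonlocal_gb(example_db_cache_gb))  # 632
```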

How does that happen if instance-wide [total] memory stays constant at [max server memory], and [total] memory on each of the other 3 NUMA nodes stays constant at its [target]?

Some memory is getting counted twice. Comparing [stolen] memory summed across all 4 nodes to the instance-wide measure of [stolen] memory shows that something's up 🙂 On this system, nodes 001, 002, and 003 each have a portion of their memory that they account for as [stolen], but that node 000 accounts for as [db cache].
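That comparison can be sketched as a small Python reconciliation. I'm assuming the per-node numbers come from the SQLServer:Memory Node counters (Database Node Memory (KB), Stolen Node Memory (KB), Free Node Memory (KB)) and the instance-wide numbers from SQLServer:Memory Manager (Database Cache Memory (KB), Stolen Server Memory (KB), Free Memory (KB)), for example via sys.dm_os_performance_counters. The class and function names are hypothetical, and the check makes no assumption about which direction a mismatch goes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemoryBreakdown:
    """[db cache], [stolen] and [free] memory in KB, for one node or the whole instance."""
    db_cache_kb: int
    stolen_kb: int
    free_kb: int


def accounting_gaps(nodes: list[MemoryBreakdown],
                    instance: MemoryBreakdown) -> dict[str, int]:
    """Per category: sum over nodes minus the instance-wide value.

    If every page were accounted for exactly once, each gap would be close
    to zero. A large gap in one category means the nodes together report a
    different amount of that kind of memory than the instance does.
    """
    return {
        "db_cache": sum(n.db_cache_kb for n in nodes) - instance.db_cache_kb,
        "stolen": sum(n.stolen_kb for n in nodes) - instance.stolen_kb,
        "free": sum(n.free_kb for n in nodes) - instance.free_kb,
    }
```

Pass one MemoryBreakdown per node plus one for the instance, all sampled at the same moment; the "stolen" entry corresponds to the check described above, and the other entries show where the same memory may be showing up under a different category.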

There were no free list stalls during this period, thankfully. Somehow the instance manages to eke along, keeping a small amount of free memory throughout.

But there is still a cost to pay for this unusual memory state.

I suspect that the cost of cross-node accounting and of double-counting this memory is partly reflected in latch waits.
