This post continues a look at memory behaviors explored in an earlier blog post:
SQL Server 2017: here a NUMA, there a NUMA... Part 1
That post detailed a significant performance degradation event on a 4-node perf/scale test system. This post instead details observations from a production 4-node system.
4 vNUMA nodes x 14 vCPUs each, 1472 GB total vRAM (368 GB per vNUMA node).
[Max server memory] set at 1375000 MB (roughly 1343 GB).
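A quick sketch of the arithmetic behind those numbers (the values come straight from the configuration above; the headroom figure is just the subtraction, not a measured quantity):

```python
# System configuration from the production system described above.
NODES = 4
VCPUS_PER_NODE = 14
VRAM_PER_NODE_GB = 368

total_vram_gb = NODES * VRAM_PER_NODE_GB            # 1472 GB total vRAM
max_server_memory_mb = 1375000
max_server_memory_gb = max_server_memory_mb / 1024  # ~1342.8 GB

# Headroom left outside [max server memory] for the OS and
# non-SQL-Server allocations.
headroom_gb = total_vram_gb - max_server_memory_gb  # ~129.2 GB

print(total_vram_gb, round(max_server_memory_gb, 1), round(headroom_gb, 1))
# → 1472 1342.8 129.2
```

Note that [max server memory] (~1343 GB) is well above the 368 GB of vRAM on any single vNUMA node, which is what makes the per-node accounting below worth a close look.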
See if you can spot the unexpected system state below. 🙂
The [db cache] on node 000 is ginormous!!
There is only 368 GB of vRAM on the vNUMA node, yet node 000 is accounting for [db cache] that alone approaches the combined vRAM of 3 vNUMA nodes!!
How does that happen if instance-wide [total] memory stays constant at [max server memory], and [total] memory on the other 3 NUMA nodes stays constant at [target]?
We've got memory counted twice. Comparing the sum of [stolen] memory across all 4 nodes to the instance-wide measure of [stolen] memory shows that something's up 🙂 On this system, nodes 001, 002, and 003 each have a portion of memory that they account for as [stolen] but that node 000 accounts for as [db cache].
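The reconciliation can be sketched like this. All of the per-node values below are made up for illustration (they are not the actual perfmon numbers from this system); the point is the shape of the check, not the magnitudes:

```python
# Hypothetical per-node counter values, in GB. Node 000's [db cache]
# is inflated well past the 368 GB of vRAM on its vNUMA node.
node_stolen   = {0: 20,   1: 150, 2: 140, 3: 145}
node_db_cache = {0: 1100, 1: 95,  2: 100, 3: 92}

VRAM_PER_NODE_GB = 368

sum_node_stolen = sum(node_stolen.values())   # 455 GB summed across nodes
instance_stolen = 40                          # hypothetical instance-wide [stolen]

# If the per-node [stolen] figures rolled up cleanly to the instance-wide
# figure, this gap would be zero. A large positive gap is the signature of
# memory that nodes 001-003 report as [stolen] while node 000 reports the
# same memory as [db cache].
double_counted = sum_node_stolen - instance_stolen
print(double_counted)                         # → 415

# The same memory shows up as node 000 [db cache] exceeding its node vRAM.
print(node_db_cache[0] > VRAM_PER_NODE_GB)    # → True
```

Under this accounting, the instance-wide totals can still sit quietly at [max server memory] even while one node's [db cache] looks impossibly large on its own.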
There were no free list stalls during this period, thankfully. Somehow the instance is able to eke along, keeping a small amount of free memory throughout.
But there is still a cost to pay for this unusual memory state.
I suspect that the cost of the cross-node accounting and the double-counted memory shows up, in part, as latch waits.