Wednesday, November 8, 2017

SQL Server 2016/2017 Soft-NUMA Part I

Just what you probably wanted - another "part 1" 😂😂😂 .

I'll start looking at soft-NUMA in SQL Server 2016 & 2017 - probably 3 or 4 blog posts to get where I want to go for now.  Let's start by looking at soft-NUMA on my dual core + HT laptop on SQL Server 2017 RTM CU1.

How does the soft-NUMA and NUMA configuration look right now? Plain - like you'd expect.  A single memory_node_id, single node_id.  4 scheduler_ids, 4 cpu_ids associated with that node_id/memory_node_id.

How about the lazy writer, transaction log writer, and resource monitor sessions?  Yep, they're pretty plain.  One each. Whaddya expect out of a little laptop? 😅

Pssssst. Before I did this testing, I backed up the registry. Just in case.  Please do that if you decide to follow along - even if its on your own laptop.

OK... let's add manually configured soft-NUMA to the mix. I'll use these commands in an administrator cmd.exe...

REG DELETE "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration" /f
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node0"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node1"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node2"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node3"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node0" /v CPUMask /t REG_DWORD /d 0x001
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node1" /v CPUMask /t REG_DWORD /d 0x002
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node2" /v CPUMask /t REG_DWORD /d 0x004
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node3" /v CPUMask /t REG_DWORD /d 0x008

First command above deletes a key and all of its subkeys (clearing any previous manual soft-NUMA config for SQL Server 2017). If the NodeConfiguration key doesn't exist, the benign error indicating it couldn't be deleted is seen.

Then NodeConfiguration and Node* keys are added. Finally, CPUMask values are added to each of the Node* keys - REG_DWORD data type and hex value for the CPUMask for each soft-NUMA node.

This is what that looks like...

Now, the soft-NUMA configuration can't catch hold until the next time SQL Server starts up.  So let's stop and start SQL Server from the cmd.exe.

OK. First stop - the SQL Server error log.  Hmmm.  That looks promising.

OK, now lets look at memory_node_ids & node_ids again.  There's still a single memory node - that makes sense, because this laptop is a single NUMA node construct. But now there are 4 node_ids, each with a single scheduler_id & cpu_id associated. Rather than a single node_id with all scheduler_ids/cpu_ids associated.

All right - lets look at lazy writer, tx log writer, and resource monitor again. Now there are 4 txlog writers - that's the maximum.  The documentation previously said that SQL Server 2016 would have one txlog writer per NUMA node (up to 4), on NUMA node 0.  My little laptop has the maximum number of txlog writers possible for SQL Server 2016 or 2017 😏 Apparently the txlog writers scale with soft-NUMA nodes rather than NUMA nodes.  That's probably not optimal, given that there is a cost to coordinating transaction log write activity among the multiple txlog writers.  What's worse, though, is that all 4 txlog writers are on the same cpu_id!  All of the added cost to distribute/coordinate work among 4 txlog writers, none of the benefits of spreading the writers across multiple schedulers or cores.  That's almost certainly a net loss for efficiency.

But wait! There's more! Still got a lazy writer on cpu_id 0, too.  Imagine a high write-throughput workload.  The four txlog writers will be contending with each other for CPU time on cpu_id 0.  (That will drive up logwrite waits as the txlog writers experience wait time to get on CPU.  The lazy writer will be contending with the txlog writers for CPU time as well.

There's another session on cpu_id 0 - before even starting to look at user sessions.  There's a Resource Monitor session on cpu_id 0 as well.  Under heavy load, CPU 0 is gonna be in trouble - so too will anyone waiting on cpu_id 0. For example, queries with significant logwrite wait time will see their logwrite waits amplified.

Using soft-NUMA to distribute work among CPUs is sometimes a very important strategy.  Several txlog writer threads sharing the same CPUs is probably not a good idea without some amelioration.  One thing that can be explored is using server or resource pool affinity to keep user connections and parallel tasks away from node 0 = cpu_id = 0.

There are cases where multiple txlog writers can be beneficial - but keeping them all on the same cpu_id as in this case is almost certainly a net loss. Here's a Connect item...

All right - that's it for now.  But I'll be back with more soft-NUMA fun in a little bit...

No comments:

Post a Comment