While at PASS Summit this year (2017), I was in a discussion about distribution of SQL Server tasks and connections. It's a common theme with me - I often work with batch workloads and I want work as evenly distributed over available compute resources as possible, for as long as possible, in order to minimize elapsed workload time.
I mentioned that I was planning to set up a soft-NUMA node for each vcpu on a 16 vcpu VM, to evenly distribute incoming connections and thus DOP 1 queries over vcpus. Thomas Kejser et al used this strategy to good effect in "The Data Loading Performance Guide", which used SQL Server 2008 as a base.
My conversation partner cautioned me that leaving this soft-NUMA configuration in place after the specialized workload would result in DOP 1 queries whether I wanted them or not. The claim was, effectively, a parallel query plan generated by a connection within a soft-NUMA node would have its MAXDOP restricted by the scheduler count (if lower than other MAXDOP contributing factors). Though I wasn't able to test at the time, I was skeptical: I'd always thought that soft-NUMA was consequential to connection placement, but not to MAXDOP nor to where parallel query workers would be assigned.
I'm back home now... time to test!! This test is on Microsoft SQL Server 2017 (RC2) - 14.0.900.75, on my dual core (with HT) laptop. I've set up soft-NUMA, with one soft-NUMA node for each scheduler/logical CPU. Notice below that all of the soft-NUMA nodes (in sys.dm_os_nodes) are in a single memory node. That's actually the single physical NUMA node of my laptop.
So what happens when a parallel query is attempted on this system with soft-NUMA nodes of 1 scheduler each? It still gets DOP 4. And the actual plan shows each of the 4 parallel workers doing some work.
For the sake of completeness, I also want to show that work is being distributed across the schedulers in this setup. Let's create and populate a temp table for the tare.
Now let's run the test query 64 times.
OK, let's take the post-test measure and check the delta.
Excellent. So we've seen that even with soft-NUMA nodes of single schedulers, parallel queries are getting DOP > 1, and work is being distributed across all schedulers (although with short tasks and a single session, not very evenly). I'm putting the code I used for this test below - although the most interesting part may be setting up the soft-NUMA itself. I'll leave that for another day :-)
Ciao for now!