Tuesday, November 14, 2017

#tsql2sday 96 Reflecting on those that guided my path...

T-SQL Tuesday is always a bright spot for me, whether I write up a post myself or not.  If you read my blog but have never participated... consider it!! Its a fairly low-friction way to get in with a good group!

Today SQL Server bloggers from all over are reflecting on those that have inspired, guided and influenced them.  So I'll name a few names and share some stuff.  Over time, I'll be able to share even more names, maybe a story or a few.  These folks are heroes to me... but many of them I also consider friends. Which makes me a very blessed individual.

I got a late start working with software - after grad school and a bit of wandering.  The name CR Boardman won't be familiar in SQL Server circles, but his personality looms large in memories of my early years working with software.  He loves investigation, and has a fierce determination for rooting out technical problems.  His problem-solving approach knows no boundaries: he'd trace a problem workflow on the database side, then track down the client-side code responsible for making the RPC code calls on the database.  He refused to be siloed - although the organizational responsibility for a given code module or feature may lie within a different group, that never meant trouble-shooting and investigation had to stop at an arbitrary border.  I remember some late nights in conference rooms, working through some challenging problems.  More than any particular problem-solving approach or tool, I remember his enthusiasm and excitement.  The last time we worked together was nearly 10 years ago now - I hope my work now embodies those characteristics I admired in CR.

Andy Yun (@SQLBek) has been mentioned in a number of blog posts today, and I'll mention him too!  Andy's put together a few presentations in the last couple of years that many people have really benefited from.  He's not satisfied to simply enhance his own speaking resume, though.  Andy does a great job of encouraging folks to get involved and share their experiences, expertise and knowledge.  In my case, he's employed some friendly needling to make sure I don't crawl back into a cave to avoid blogging and speaking 😊.  Yep, sometimes I've absolutely needed that.  Now I hope to not only remain on the path of sharing my adventures through Twitter, my blog, and some presentations... I hope to encourage others to do the same.  Maybe even do some considerate needling... if I have the guts😊.

Niko Neugebauer (@NikoNeugebauer) has a lot of applied energy.  A real lot! Niko is often a catalyst - causing others around him to also start engaging with technical problems and solutions in new ways.  When you spend time with Niko, a striking devotion comes to the fore: a devotion to constant forward motion.  If there's time left today, there's time left to try something new, or learn something new for tomorrow.  Going to discuss blog or presentation ideas?  Niko will probably ask you what new contributions will be in the presentation - how are you going to keep things moving forward?  This might give some insight into how he has generated 115 blog posts to date on SQL Server columnstore indexes.  I hope my appreciation for innovation is as apparent as it is in Niko.  His energy and enthusiasm convinced me to participate in Tuga IT for the last two years.  And I plan to keep participating in the future!
Ewald Cress (@sqlOnIce) is hosting the T-SQL Tuesday posts this time around - check out his blog for a round-up of all of the posts for the November 14th 2017 adventure... and a lot of other creative SQLOS deep-dives. 
Ewald has helped to guide my approach to my blog and to speaking opportunities - though he may not know the extent.  He and I both tackle questions that tend to be off the beaten path, though from different perspectives.  He's made some pretty deep topics into creative blog posts and presentations (which I'll hopefully get to see sometime). His approach to providing details on the parts we might otherwise take for granted impresses me - and hopefully I'll be able to fill out some of my blog posts in a similar manner.

That's all I'll write for tonight... as I mentioned before there are additional names... and stories... which will almost certainly come spilling out of my over a shared meal - or maybe even on the pages of this blog.

Ciao for now!

Thursday, November 9, 2017

Windows paging file usage & SQL Server 2017 RTM: sys.dm_os_sys_memory is Tricky

OK... using my laptop SQL Server 2017 RTM CU1 instance to show something tricky about reconciling paging space usage.  The SQL Server 2016 versions I checked briefly displayed the same behavior I describe below.

A new tool was added to the belt recently - I like the idea.  Since SQL Server will respond to server level low memory messages, I like being able to see the state of server memory.  For that has Microsoft wrought sys.dm_os_memory.

sys.dm_os_sys_memory (Transact-SQL)

There's good stuff in there for investigation of OOM scenarios.

Let's see what turns up on my laptop.

Based on the column names and values above, seems natural to think:
total_page_file_kb - available_page_file_kb = used page file kb
11027476 kb - 3047668 kb = 7979808 kb

Holy cow! Is my laptop using nearly as much paging space as there is RAM on the laptop??
Weird. If something forced that much paging space use relative to RAM on the laptop... I certainly wouldn't expect system_memory_state_desc = 'Available physical memory is high'!!

Lets double-check by using wmic in an administrator cmd.exe.

OK... its clear already something is afoot.  AllocatedBaseSize is the current size of the pagefile indicated in 'Description', in mb.  So... 2880 mb is the size of the pagefile.  And only 1684 mb currently used, with a peak usage of 2752 mb.  Almost 8 gb of pagefile used as reported by SQL Server sys.dm_os_sys_memory?  Nope - no way.

So what's up?  Sometimes memory accounting in SQL Server takes some work. 😀😀😀

I stared at the numbers for a bit and had a hunch... double-checked and it tied out.

11027476 kb - 8077528 kb = 2949948 kb ≈ 2880mb


total_page_file_kb - total_physical_memory_kb =AllocatedBaseSize

Aha!  Looks like total_page_file_kb column is misnamed. Because...

total_physical_memory_kb + AllocatedBaseSize = total_page_file_kb

So total_page_file_kb is really more like 'total virtual memory kb'!

OK.   Let's look back at the definitions in BoL.

sys.dm_os_sys_memory (Transact-SQL)

The description of available_page_file_kb is wrong. Since total_page_file_kb is described as the commit limit - or the virtual memory as sum of physical memory + paging space - the available_page_file_kb is then not best defined as an available portion of the paging file but as an available portion of the commit limit or virtual memory.

All right - enough for now.  See you again soon!

Wednesday, November 8, 2017

SQL Server 2016/2017 Soft-NUMA Part I

Just what you probably wanted - another "part 1" 😂😂😂 .

I'll start looking at soft-NUMA in SQL Server 2016 & 2017 - probably 3 or 4 blog posts to get where I want to go for now.  Let's start by looking at soft-NUMA on my dual core + HT laptop on SQL Server 2017 RTM CU1.

How does the soft-NUMA and NUMA configuration look right now? Plain - like you'd expect.  A single memory_node_id, single node_id.  4 scheduler_ids, 4 cpu_ids associated with that node_id/memory_node_id.

How about the lazy writer, transaction log writer, and resource monitor sessions?  Yep, they're pretty plain.  One each. Whaddya expect out of a little laptop? 😅

Pssssst. Before I did this testing, I backed up the registry. Just in case.  Please do that if you decide to follow along - even if its on your own laptop.

OK... let's add manually configured soft-NUMA to the mix. I'll use these commands in an administrator cmd.exe...

REG DELETE "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration" /f
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node0"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node1"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node2"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node3"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node0" /v CPUMask /t REG_DWORD /d 0x001
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node1" /v CPUMask /t REG_DWORD /d 0x002
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node2" /v CPUMask /t REG_DWORD /d 0x004
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node3" /v CPUMask /t REG_DWORD /d 0x008

First command above deletes a key and all of its subkeys (clearing any previous manual soft-NUMA config for SQL Server 2017). If the NodeConfiguration key doesn't exist, the benign error indicating it couldn't be deleted is seen.

Then NodeConfiguration and Node* keys are added. Finally, CPUMask values are added to each of the Node* keys - REG_DWORD data type and hex value for the CPUMask for each soft-NUMA node.

This is what that looks like...

Now, the soft-NUMA configuration can't catch hold until the next time SQL Server starts up.  So let's stop and start SQL Server from the cmd.exe.

OK. First stop - the SQL Server error log.  Hmmm.  That looks promising.

OK, now lets look at memory_node_ids & node_ids again.  There's still a single memory node - that makes sense, because this laptop is a single NUMA node construct. But now there are 4 node_ids, each with a single scheduler_id & cpu_id associated. Rather than a single node_id with all scheduler_ids/cpu_ids associated.

All right - lets look at lazy writer, tx log writer, and resource monitor again. Now there are 4 txlog writers - that's the maximum.  The documentation previously said that SQL Server 2016 would have one txlog writer per NUMA node (up to 4), on NUMA node 0.  My little laptop has the maximum number of txlog writers possible for SQL Server 2016 or 2017 😏 Apparently the txlog writers scale with soft-NUMA nodes rather than NUMA nodes.  That's probably not optimal, given that there is a cost to coordinating transaction log write activity among the multiple txlog writers.  What's worse, though, is that all 4 txlog writers are on the same cpu_id!  All of the added cost to distribute/coordinate work among 4 txlog writers, none of the benefits of spreading the writers across multiple schedulers or cores.  That's almost certainly a net loss for efficiency.

But wait! There's more! Still got a lazy writer on cpu_id 0, too.  Imagine a high write-throughput workload.  The four txlog writers will be contending with each other for CPU time on cpu_id 0.  (That will drive up logwrite waits as the txlog writers experience wait time to get on CPU.  The lazy writer will be contending with the txlog writers for CPU time as well.

There's another session on cpu_id 0 - before even starting to look at user sessions.  There's a Resource Monitor session on cpu_id 0 as well.  Under heavy load, CPU 0 is gonna be in trouble - so too will anyone waiting on cpu_id 0. For example, queries with significant logwrite wait time will see their logwrite waits amplified.

Using soft-NUMA to distribute work among CPUs is sometimes a very important strategy.  Several txlog writer threads sharing the same CPUs is probably not a good idea without some amelioration.  One thing that can be explored is using server or resource pool affinity to keep user connections and parallel tasks away from node 0 = cpu_id = 0.

There are cases where multiple txlog writers can be beneficial - but keeping them all on the same cpu_id as in this case is almost certainly a net loss. Here's a Connect item...


All right - that's it for now.  But I'll be back with more soft-NUMA fun in a little bit...

Tuesday, November 7, 2017

Investigating OOMs on SQL Server 2016 SP1 CU4 - Part I

Today OOMs were reported on a physical server with 1.5TB of RAM, running SQL Server 2016 SP1 CU4.  I started to poke around a bit... pretty sure we'll need to set up perfmon and maybe a SQL agent job to capture numbers every 5 minutes.  Capturing the memory clerk numbers seems like the place to be.

Here's the SQL if you want to play along at home. But I'm too sleepy to do anything with it tonight.

;WITH clerk_kb 
     AS (SELECT [type],
                --roll up 'ACRUserStore%' & 'SecCtxtACRUserStore' numbers
                CASE WHEN [name] LIKE 'ACRUserStore%' THEN 'ACRUserStore' 
                     WHEN [name] LIKE 'SecCtxtACRUserStore%' THEN 'SecCtxtACRUserStore' 
                ELSE [name] END clerk_name, 
                memory_node_id, pages_kb
        FROM sys.dm_os_memory_clerks)
SELECT GETDATE() AS capture_tm, clerk_name, memory_node_id, SUM(pages_kb) pages_kb
FROM clerk_kb
--keep the two clerks per database - dbname and {objperm + dbname} - out of results 
WHERE NOT EXISTS (SELECT 1 FROM sys.databases sdb 
                  WHERE PATINDEX('%' + sdb.[name] + '%', clerk_name) > 0)
GROUP BY clerk_name, memory_node_id
HAVING SUM(pages_kb) > 40000
ORDER BY clerk_name, memory_node_id;

Monday, November 6, 2017

SQL Server 2017 - soft-NUMA limits MAXDOP? Nope.

While at PASS Summit this year (2017), I was in a discussion about distribution of SQL Server tasks and connections.  It's a common theme with me - I often work with batch workloads and I want work as evenly distributed over available compute resources as possible, for as long as possible, in order to minimize elapsed workload time.

I mentioned that I was planning to set up a soft-NUMA node for each vcpu on a 16 vcpu VM, to evenly distribute incoming connections and thus DOP 1 queries over vcpus.  Thomas Kejser et al used this strategy to good effect in "The Data Loading Performance Guide", which used SQL Server 2008 as a base.

My conversation partner cautioned me that leaving this soft-NUMA configuration in place after the specialized workload would result in DOP 1 queries whether I wanted them or not.  The claim was, effectively, a parallel query plan generated by a connection within a soft-NUMA node would have its MAXDOP restricted by the scheduler count (if lower than other MAXDOP contributing factors).  Though I wasn't able to test at the time, I was skeptical: I'd always thought that soft-NUMA was consequential to connection placement, but not to MAXDOP nor to where parallel query workers would be assigned.

I'm back home now... time to test!! This test is on Microsoft SQL Server 2017 (RC2) - 14.0.900.75, on my dual core (with HT) laptop.  I've set up soft-NUMA, with one soft-NUMA node for each scheduler/logical CPU.  Notice below that all of the soft-NUMA nodes (in sys.dm_os_nodes) are in a single memory node.  That's actually the single physical NUMA node of my laptop.

So what happens when a parallel query is attempted on this system with soft-NUMA nodes of 1 scheduler each? It still gets DOP 4. And the actual plan shows each of the 4 parallel workers doing some work.

For the sake of completeness, I also want to show that work is being distributed across the schedulers in this setup.  Let's create and populate a temp table for the tare.

Now let's run the test query 64 times.

OK, let's take the post-test measure and check the delta.

Excellent. So we've seen that even with soft-NUMA nodes of single schedulers, parallel queries are getting DOP > 1, and work is being distributed across all schedulers (although with short tasks and a single session, not very evenly).  I'm putting the code I used for this test below - although the most interesting part may be setting up the soft-NUMA itself.  I'll leave that for another day :-)

Ciao for now!

SELECT oss.scheduler_id, oss.cpu_id, osn.node_id, osn.memory_node_id, 
       osn.online_scheduler_mask, osn.online_scheduler_count 
FROM sys.dm_os_nodes osn
JOIN sys.dm_os_schedulers oss ON osn.node_id = oss.parent_node_id
WHERE osn.node_state_desc = 'ONLINE'
  AND oss.status = 'VISIBLE ONLINE';

    (capture_tm DATETIME,
     scheduler_id INT,
     total_cpu_usage_ms BIGINT
         (capture_tm, scheduler_id, total_cpu_usage_ms))

SELECT getdate() capture_tm, scheduler_id, total_cpu_usage_ms
FROM sys.dm_os_schedulers

DECLARE @ij INT = 1, @redirect INT;
WHILE @ij < 65
SELECT @redirect = SUM(1) FROM sys.dm_os_nodes os
CROSS JOIN sys.dm_os_nodes os2
OPTION (querytraceon 8649);
SET @ij = @ij +1;

SELECT getdate() capture_tm, scheduler_id, total_cpu_usage_ms
FROM sys.dm_os_schedulers WHERE status = 'VISIBLE ONLINE';

;WITH max_cte AS (SELECT scheduler_id, 
                         MAX(total_cpu_usage_ms) max_ms 
                  FROM #t GROUP BY scheduler_id),
      min_cte AS (SELECT scheduler_id, 
                         MIN(total_cpu_usage_ms) min_ms 
                  FROM #t GROUP BY scheduler_id)
SELECT max_cte.scheduler_id, 
       max_cte.max_ms - min_cte.min_ms AS delta_cpu_usage_ms
FROM max_cte 
JOIN min_cte ON max_cte.scheduler_id = min_cte.scheduler_id;

Wednesday, October 18, 2017

Oops. When is a #SQLServer maximum not really *the* maximum? (Workspace Memory)

When is a maximum not really the maximum?
When it's a maximum for an explicitly or implicitly modified default.
Whether "the definitive documentation" says so or not.

Yesterday on Twitter #sqlhelp this question came up.


Aha! I thought to myself.  For this I am purposed! To show how Perfmon and DMV data tie out!

So I responded thusly in part I...


And thusly in Part II...


To wit, I included a clever graphic in the erstwhile conclusion of my rejoinder...

Indeed the internal resource pool defies limitation by the Resource Governor; the sum of resource semaphore target_memory_kb values for that pool is the true maximum workspace memory limit for the SQL Server instance.  But is that number necessarily the same as is reported by "SQLServer:Memory Manager\Maximum Workspace Memory (KB)"?

Based on the two sources I cited, that may look to be the case.  From the Microsoft.com link above for the MemoryManager object:

But there are wrinkles.  As it turns out, the number reported in that perfmon counter is the sum of resource semaphore targets for the default pool.  Two factors can cause the default pool resource semaphores' targets to be lower than those of the internal pool: conditions in the max_memory_percent and min_memory_percent of Resource Governor pools other than the internal pool.

Consider what happens when default pool max_memory_percent is changed from 100 to 50, with min_memory_percent remaining at 0 for all other pools.  The RP_maxmem_100 pool still reports the same sum of target values as the internal pool.  Both of these pools report a total number twice as large as the default pool reports.  And its the default pool's sum of resource semaphore targets that gets reported by "SQLServer:Memory Manager\Maximum Workspace Memory (KB)".

OK, now lets bring default pool max_memory_percent back to 100.  What happens if a pool other than default has a nonzero min_memory_percent defined?  Pool RP_maxmem_100 has min_memory_percent 27 in the example below. Remember, the internal pool is unphased by design.  No other pool has a minimum memory set, so the sum of Pool RP_maxmem_100 resource semaphore targets is the as for the internal pool.  But the default pool now has a semaphore target total and "Query exec memory target" value of 100% - 27% = 73% => 73% * 192890640 = 140861040 (when rounded down to the nearest multiple of 8k).

As Paul Harvey would say:
Now you know... the rest of the story.

Ciao for now!

perfmon "SQLServer:Resource Pool Stats(*)\Query exec memory target (KB)" ties out well with sys.dm_exec_query_resource_semaphores

SELECT opc.[object_name], opc.instance_name, 
       opc.cntr_value AS [Query exec memory target (KB)], 
       SUM(qrs.target_memory_kb) total__target_memory_kb
FROM sys.dm_os_performance_counters opc
JOIN sys.dm_resource_governor_resource_pools rp ON opc.instance_name = rp.[name]
JOIN sys.dm_exec_query_resource_semaphores qrs ON rp.pool_id = qrs.pool_id
WHERE counter_name = 'Query exec memory target (KB)'
GROUP BY rp.pool_id, opc.[object_name], opc.instance_name, opc.cntr_value
ORDER BY rp.pool_id

And the results...