Monday, April 26, 2021

A Very Circuitous Answer to a Question about #SQLServer PLE

Usually, if I use PLE at all, rather than treating it as a first-tier indicator of an issue, I use it to confirm issues I’ve detected or suspected based on other evidence. Below I give some of the reasons, along with ideas for what to look at instead of PLE.

~~

If the VM has multiple vNUMA nodes, in addition to the [\NUMA Node Memory(*)\*] counters which describe memory utilization at the vNUMA node level, you’ll see [\SQLServer:Buffer Node(*)\*] and [\SQLServer:Memory Node(*)\*] counters describing activity within the SQLOS memory nodes (unless the instance has trace flag 8015 enabled, which disables #sqlserver NUMA detection).

Of the Buffer Node counters, the one I use most frequently is [\SQLServer:Buffer Node(*)\Page life expectancy].

[\SQLServer:Buffer Manager\Page life expectancy] is equal to [\SQLServer:Buffer Node(000)\Page life expectancy] if there is only 1 vNUMA node (or if NUMA detection is disabled with T8015). Otherwise, [\SQLServer:Buffer Manager\Page life expectancy] is equal to the harmonic mean of the [\SQLServer:Buffer Node(*)\Page life expectancy] values.
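As a quick sanity check, a query along these lines against sys.dm_os_performance_counters should show the instance-level number landing at, or very near, the harmonic mean of the node values. This is just a sketch; the PLE counters are point-in-time values, so a single sample is enough, and the LIKE predicates are there so it works on named instances too.

-- Compare instance-level PLE to the harmonic mean of the per-node PLEs.
-- Harmonic mean = n / sum(1/x). NULLIF guards against a zero node PLE.
SELECT
    MAX(bm.cntr_value) AS buffer_manager_ple,
    COUNT(*) * 1.0 / SUM(1.0 / NULLIF(bn.cntr_value, 0)) AS harmonic_mean_of_node_ples
FROM sys.dm_os_performance_counters AS bm
CROSS JOIN sys.dm_os_performance_counters AS bn
WHERE bm.object_name LIKE '%Buffer Manager%'
  AND bm.counter_name = 'Page life expectancy'
  AND bn.object_name LIKE '%Buffer Node%'
  AND bn.counter_name = 'Page life expectancy';

And the harmonic mean is exactly why one churning node hurts so much: two nodes at PLE 3000 and PLE 300 give a harmonic mean of about 545, nowhere near the arithmetic average of 1650.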

Here’s a blog post from 4 ½ years ago where I try to derive overall PLE as the harmonic mean of node PLEs. Plus you get to see what my graphs looked like 4 ½ years ago 😊

Harmonic mean of SQLOS Buffer Node PLEs on NUMA servers 

http://sql-sasquatch.blogspot.com/2016/10/harmonic-mean-of-sqlos-buffer-node-ples.html

The main thing is, cache churn on a single vNUMA node can have a pretty significant impact on instance-wide PLE.

And the formula for determining PLE at the SQLOS memory node/vNUMA node level isn’t public.

But, let’s think through the things that would have to go into a page life expectancy calculation.

First would be the size of the database cache. The larger the database cache with a fixed workload, the longer we’d expect a given page to last in the database cache.

Then we’d also need a database page insertion rate. Database pages can be inserted due to page reads, or due to the first row insert to a previously empty page. Improving performance of a given workload means increasing the pace of page reads, or increasing the pace of first row inserts, or both. For a given database cache size, increasing the pace of work decreases PLE. For a given database cache size, decreasing the pace of work will increase PLE. That’s about as short of an explanation as I can provide for why PLE isn’t a direct measure of performance or resource utilization.
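To make that concrete, here’s a back-of-the-napkin estimate in the same spirit: database cache pages divided by the page insertion rate. This sketch only counts page reads as insertions (first row inserts to previously empty pages aren’t in a handy counter), and [Page reads/sec] is cumulative despite its name, so it has to be sampled twice.

-- Naive PLE estimate: seconds a page survives, roughly cache pages / pages inserted per second.
DECLARE @reads_start bigint,
        @reads_end   bigint,
        @cache_pages bigint;

SELECT @reads_start = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name = 'Page reads/sec';      -- cumulative, despite the name

WAITFOR DELAY '00:00:30';                   -- 30 second sample window

SELECT @reads_end = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name = 'Page reads/sec';

SELECT @cache_pages = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name = 'Database pages';      -- current database cache size, in pages

SELECT @cache_pages * 30.0 / NULLIF(@reads_end - @reads_start, 0) AS naive_ple_estimate_seconds;

Don’t expect this to match the reported PLE; the next couple of paragraphs are about why. It’s just the shape of the calculation.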

Then there’s the change in database cache size. The database cache isn’t a fixed size. Once it has crossed 2% of [Target Node Memory] on a node, that 2% becomes its minimum size there. And SQL Server tries to keep it at least 20% of [Target Server Memory]. But for a given workload, the larger the database cache, the higher the PLE.

Now, the formula for PLE isn’t publicly documented, so this is speculation on my part. But I believe it isn’t only observed values for sizes and rates that go into calculating PLE; I believe “velocity”, or rate of change, is used as well. I’m pretty sure PLE is not only adaptive but predictive.

OK. Well, if I don’t look at PLE until I look at other stuff, what other stuff would I look at first?

I’d start with a stacked graph of [Database Cache] + [Free Memory] + [Stolen Memory]. Put in [Target Memory], too, as a line graph.
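Those slices, plus the target line, all come from the [\SQLServer:Memory Manager] counters, which are also exposed through sys.dm_os_performance_counters if perfmon isn’t handy:

-- The ingredients of that stacked graph (the DMV reports these in KB).
SELECT RTRIM(counter_name) AS counter_name,
       cntr_value / 1048576.0 AS value_gb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name IN (N'Database Cache Memory (KB)',
                       N'Free Memory (KB)',
                       N'Stolen Server Memory (KB)',
                       N'Target Server Memory (KB)');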

I’ll use some graphs from some work last year.

Especially if there is no use of paging space, I’d move on to memory grants. (If there is use of paging space, like our colleague noticed, it can be valuable to investigate whether Max Server Memory is too high, or whether another memory-consuming app is on the VM. The memory consumer could even be a SQL Server backup being sent to a backup manager like Data Domain, Commvault, etc, if using a method that doesn’t bypass file cache, aka not using unbuffered IO. Let's talk 'bout paging space another day.)
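Perfmon has [\SQLServer:Memory Manager\Memory Grants Outstanding] and [\SQLServer:Memory Manager\Memory Grants Pending] for the counts. When I want to see who is actually holding or waiting on grants, I go to the DMV; a minimal look, something like:

-- One row per session with an active or pending grant.
-- grant_time IS NULL means the session is still waiting (RESOURCE_SEMAPHORE).
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       ideal_memory_kb,
       grant_time,
       wait_order
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;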

Let’s add in CPU for the resource pool.
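Per-pool CPU shows up in perfmon as [\SQLServer:Resource Pool Stats(*)\CPU usage %]. Pulling it from the DMV takes one extra step, because it’s a fraction-type counter that has to be divided by its base counter:

-- "CPU usage %" is a fraction counter; divide by "CPU usage % base".
SELECT p.instance_name AS pool_name,
       100.0 * p.cntr_value / NULLIF(b.cntr_value, 0) AS cpu_usage_pct
FROM sys.dm_os_performance_counters AS p
JOIN sys.dm_os_performance_counters AS b
  ON  b.object_name   = p.object_name
  AND b.instance_name = p.instance_name
WHERE p.object_name LIKE '%Resource Pool Stats%'
  AND p.counter_name = 'CPU usage %'
  AND b.counter_name = 'CPU usage % base';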

The part in the blue box looks a bit suspicious. It’s got fewer active memory grants than other “busy” times.  But CPU utilization is higher.

 

Huh. If we look at overall SQL Server memory and this time bring in PLE as a line graph, that period looks pretty exceptional, too. High CPU, and growing PLE.

 

Oh. A huge amount of backup traffic.  (SQL Server backups can use a lot of CPU if they are compressed or encrypted.)   So backup traffic, some active memory grants, some pending memory grants.
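If you want to confirm the compression angle from inside SQL Server, msdb keeps the receipts; a compressed size well below the raw backup size means CPU was spent compressing. A quick sketch:

-- Recent backups: compare raw size to compressed size.
SELECT database_name,
       type,                                    -- D = full, I = differential, L = log
       backup_start_date,
       backup_finish_date,
       backup_size / 1048576            AS backup_mb,
       compressed_backup_size / 1048576 AS compressed_mb
FROM msdb.dbo.backupset
WHERE backup_start_date > DATEADD(DAY, -7, SYSDATETIME())
ORDER BY backup_start_date DESC;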

And if I look at granted memory vs the maximum allowed workspace memory, I see the dynamic that is leading to pending grants.


Max Server Memory on this instance is 960 GB. But memory grants for a given resource pool are not calculated directly against MSM. Rather, they are calculated against the maximum target for the relevant resource semaphore for the resource pool. In this SQL Server instance, no resource pools have been added beyond default (which contains all of the user workload sessions) and the internal resource pool, so the graph above works out and makes sense. (If there are multiple non-internal resource pools - e.g. default + ORM_unrealistic - that simultaneously have memory grants, a graph like this might not make sense, and info from the DMVs may be needed, because [Reserved Memory] in perfmon is global across all resource pools.)
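Here’s the kind of DMV query I mean, showing the grant accounting per pool and per semaphore:

-- target_memory_kb is the ceiling grants are measured against;
-- waiter_count is the pending-grant queue for that semaphore.
SELECT rp.name AS pool_name,
       rs.resource_semaphore_id,        -- 0 = regular, 1 = small-query semaphore
       rs.target_memory_kb,
       rs.granted_memory_kb,
       rs.grantee_count,
       rs.waiter_count
FROM sys.dm_exec_query_resource_semaphores AS rs
JOIN sys.dm_resource_governor_resource_pools AS rp
  ON rp.pool_id = rs.pool_id;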

Notice how when [Reserved Memory] is layered in front of [Granted Memory], only a small portion of [Granted Memory] is visible at the top.  Looking across all the queries and workload groups, as a whole, the queries are asking for wayyy more workspace memory than they need.  Lots of room for tuning.

Now there’s one more unusual thing about the area in the blue box.  That’s a lot of room between the granted memory and the maximum workspace memory. Why are there still so many pending memory grants for so long?  It has to do with what query is at the front of the waiting line, and how big of a grant it wants.

At least one of the workload groups at this time still had a 25% maximum grant per query.  But that difference between max workspace memory and granted memory is way more than 25%, isn’t it?

Here’s a rule about memory grants that matters when you start getting close to the max: for a grant to succeed, 1.5x the grant must be available for granting. That way, at least half the size of the most recent grant is still available for grants afterward. It’s a way to make sure a small number of very grant-hungry queries don’t permanently squeeze out queries that request smaller grants. For example, if a query at the front of the line wants a 120 GB grant, the grant only succeeds once 180 GB is available; with 150 GB free, that query keeps waiting even though its requested grant would nominally fit.

OK, that’s probably more than anyone needed to see today 😊

Just remember – PLE by itself doesn’t tell me much of what’s going on in SQL Server.

In the box below PLE is constantly rising.  But the workloads suffer from the same performance problem inside the box as outside the box – really long resource_semaphore waits for memory grants.
