Friday, October 10, 2014

IBMPower AIX Memory Accounting: Fun with vmstat... Numperm Surprise!!

I like to account for all of the memory on AIX servers.  I often use vmstat in 30 second increments to do that at a high level: tracking computation memory, filesystem cache, and free memory.
Here's the numbers from a small LPAR inhouse.
sasquatch vm-oracle:/mtn_home/sasquatch> vmstat -v
              4194304 memory pages
              3915331 lruable pages
               393725 free pages
                    2 memory pools
               773459 pinned pages
                 95.0 maxpin percentage
                  3.0 minperm percentage
                 90.0 maxperm percentage
                 28.4 numperm percentage
              1114771 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 28.4 numclient percentage
                 90.0 maxclient percentage
              1114771 client pages
                    0 remote pageouts scheduled
                    4 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2228 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
                 5330 external pager filesystem I/Os blocked with no fsbuf
                 64.0 percentage of memory used for computational pages

OK... so that's 64.0% computational memory, 28.4% numperm/numclient, and 393725 free pages.  Let's see...
393725 free pages from 4194304 total pages... that's 9.4%.
+ 28.4%
+  9.4%

Huh?!? I guess I could pass this off as a rounding error if it was between 100% and 101%.  But not if the total of 3 numbers is 101.8%!
What's going on?

If I look at the total of those numbers over a 12 hour period in 30 second increments, it doesn't get any better.

Well... if there's anything that will make me look like a stooge on a conference call, its when I can't add 3 numbers to get 100% reliably when its expected.  But, I'm not willing to fudge the numbers, either.

Lets consider just these numbers.

              4194304 memory pages
                 28.4 numperm percentage
              1114771 file pages

                 28.4 numclient percentage

1114771/4194304 = 26.58%

Not 28.4%? Oh.  That seems like a clue.  Is there some level of inaccuracy when numperm/numclient is calculated?  Nope. (It took me a long, long time to figure this out but because this is *my* blog, I get to look smart here.)

What about this number reported by vmstat?
              3915331 lruable pages
1114771/3915331 =  28.47%
Well, look at that.  The number reported as numperm is not a percentage of *total* memory pages on the server, but rather a percentage of *lruable* pages.  And, rather than rounded as one may typically expect, the number is just truncated to one decimal point. I proved that to myself by looking at a bunch of numbers that I won't insert here because I think this post is already ending up too long.

So that's one reason my stacked graphs of computational, filesystem, and free memory have had such a jagged top edge like this.  It won't look too jagged when its this small and with a 0-100 scale... but its bothered me for a while. At the very top of this stacked graph the jagged edge is the same as the jagged blue line displayed above which flits up and down between 100.5 and 101.6.


              4194304 memory pages
              3915331 lruable pages
               393725 free pages
              1114771 file pages
                 64.0 percentage of memory used for computational pages

 393725/4194304 =  9.39% free memory
1114771/4194304 = 26.58% filesystem cache memory

That leaves 64.03% for computational pages. Pretty good in comparison to the 64.0% reported by vmstat.  Looking at a few other values is enough to convince that while numperm is computed as a portion of lruable pages (and then truncated rather than rounded), computational memory is computed as a portion of total pages.  But, is the reported computational memory truncated or rounded?

Total - (Free + File) percentage of memory used for computational pages
58.57 58.6
58.58 58.6
58.58 58.6
58.95 59
58.97 59
58.93 58.9

In this case, the numbers are rounded.  So, there you have it.  Numperm is calculated with 'lruable pages' as the denominator, and truncated to one decimal point.  The reported percentage for computational memory is calculated with total pages as denominator, and is rounded as typically expected.

So... from now on to get rid of the jagged edge in my stacked graphs I'll use the page counts and convert to each to percentages with total pages as the denominator: free pages, file pages, and the remaining pages from the total as computational pages.

 You can't see it... but trust me, the top edge is now straight.  By definition :-) 

Wednesday, September 24, 2014

Trace flag 4199 impact on Optimizer Memory Use; RESOURCE_SEMAPHORE_QUERY_COMPILE wait logjam

 Its an ugly query, but I have to show it some love.

This is one of the queries responsible for the RESOURCE_SEMAPHORE_QUERY_COMPILE wait logjam I noticed a few days ago.

On a whim, I asked the performance engineer to disable trace flag 4199.  The logjam disappeared - but system workload performance wasn't great.  I didn't expect it to be: to get the query concurrency desired with the available query memory and to get efficient plans at least the join re-ordering of trace flag 4101 is needed and all the other optimizer fixes rolled into trace flag 4199 probably also (in sum) make a positive contribution.

But with the logjam now associated with trace flag 4199, I've got to figure out why T4199 is resulting in more optimizer memory consumption.

I grabbed an estimated plan from the query with the following tacked on:

Then I cleared procedure cache and grabbed an estimated plan with the same trace flag list minus T4199.

Both query plans indicated early abort of the optimizer:


With T4199, the diagnostic trace flags 2372 & 2373 spit out 89,932 lines of text, finishing at 7628 for memory.
Without trace flag 4199, they spit out 119,790 lines of text, finishing at 3652 for memory.

Up to line 55,051 the two diagnostic captures are almost identical - the numbers stay within 1 of each other.  After line 55,051 the rules and the memory amounts diverge, as indicated by the graph above.

I'm gonna need a bigger boat :-)

Hopefully I can pique Paul White's interest, and maybe he can teach me an efficient way of picking some insight out of the 29,858 text lines after the T4199 version of the query diverges.  Ultimately I'd like to get the query not to time out of the optimizer... but getting it not to cause RESOURCE_SEMAPHORE_QUERY_COMPILE wait logjam is the first order of business.