Friday, October 10, 2014

IBMPower AIX Memory Accounting: Fun with vmstat... Numperm Surprise!!

I like to account for all of the memory on AIX servers.  I often use vmstat in 30 second increments to do that at a high level: tracking computation memory, filesystem cache, and free memory.
Here's the numbers from a small LPAR inhouse.
sasquatch vm-oracle:/mtn_home/sasquatch> vmstat -v
              4194304 memory pages
              3915331 lruable pages
               393725 free pages
                    2 memory pools
               773459 pinned pages
                 95.0 maxpin percentage
                  3.0 minperm percentage
                 90.0 maxperm percentage
                 28.4 numperm percentage
              1114771 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 28.4 numclient percentage
                 90.0 maxclient percentage
              1114771 client pages
                    0 remote pageouts scheduled
                    4 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2228 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
                 5330 external pager filesystem I/Os blocked with no fsbuf
                 64.0 percentage of memory used for computational pages


OK... so that's 64.0% computational memory, 28.4% numperm/numclient, and 393725 free pages.  Let's see...
393725 free pages from 4194304 total pages... that's 9.4%.
  64.0%
+ 28.4%
+  9.4%
-------
 101.8%


Huh?!? I guess I could pass this off as a rounding error if it was between 100% and 101%.  But not if the total of 3 numbers is 101.8%!
What's going on?

If I look at the total of those numbers over a 12 hour period in 30 second increments, it doesn't get any better.


Well... if there's anything that will make me look like a stooge on a conference call, its when I can't add 3 numbers to get 100% reliably when its expected.  But, I'm not willing to fudge the numbers, either.

Lets consider just these numbers.

              4194304 memory pages
                 28.4 numperm percentage
              1114771 file pages

                 28.4 numclient percentage

1114771/4194304 = 26.58%

Not 28.4%? Oh.  That seems like a clue.  Is there some level of inaccuracy when numperm/numclient is calculated?  Nope. (It took me a long, long time to figure this out but because this is *my* blog, I get to look smart here.)

What about this number reported by vmstat?
              3915331 lruable pages
Hmmm...
1114771/3915331 =  28.47%
Well, look at that.  The number reported as numperm is not a percentage of *total* memory pages on the server, but rather a percentage of *lruable* pages.  And, rather than rounded as one may typically expect, the number is just truncated to one decimal point. I proved that to myself by looking at a bunch of numbers that I won't insert here because I think this post is already ending up too long.

So that's one reason my stacked graphs of computational, filesystem, and free memory have had such a jagged top edge like this.  It won't look too jagged when its this small and with a 0-100 scale... but its bothered me for a while. At the very top of this stacked graph the jagged edge is the same as the jagged blue line displayed above which flits up and down between 100.5 and 101.6.

 

              4194304 memory pages
              3915331 lruable pages
               393725 free pages
              1114771 file pages
                 64.0 percentage of memory used for computational pages



 393725/4194304 =  9.39% free memory
1114771/4194304 = 26.58% filesystem cache memory

That leaves 64.03% for computational pages. Pretty good in comparison to the 64.0% reported by vmstat.  Looking at a few other values is enough to convince that while numperm is computed as a portion of lruable pages (and then truncated rather than rounded), computational memory is computed as a portion of total pages.  But, is the reported computational memory truncated or rounded?


Total - (Free + File) percentage of memory used for computational pages
58.57 58.6
58.58 58.6
58.58 58.6
58.95 59
58.97 59
58.93 58.9

In this case, the numbers are rounded.  So, there you have it.  Numperm is calculated with 'lruable pages' as the denominator, and truncated to one decimal point.  The reported percentage for computational memory is calculated with total pages as denominator, and is rounded as typically expected.

So... from now on to get rid of the jagged edge in my stacked graphs I'll use the page counts and convert to each to percentages with total pages as the denominator: free pages, file pages, and the remaining pages from the total as computational pages.

 
 You can't see it... but trust me, the top edge is now straight.  By definition :-)