Thursday, April 25, 2013

AIX hpmstat utility on Power 7; Monitoring TLB misses

I believe: if it's not predictable, it's not a "best practice", it's just a "practice".

There's more love for larger memory page sizes these days... the fewer memory pages, the less expensive memory is to manage. But... allowing multiple memory page sizes increases the risk of paging memory contents to disk. Sure, there are ways to mitigate most of the risks. I'll take predictable over possibly perfect every day. If a few more TLB misses are the cost of being sure the database cache won't be paged out... in the absence of another guarantee... I'll probably take the deal.
So I'd like to evaluate the Oracle workloads I know and love with 4k memory pages only (disabling medium 64k pages), 4k/64k memory pages, 4k/64k/16mb memory pages, and 4k/16mb memory pages. I could do that with our standard workload performance tests. But in this case I don't want to measure performance alone. I also want to do some kind of comparative risk assessment: what's to gain by minimizing TLB misses, and what's to lose if paging out to disk occurs.

The hpmstat utility will be part of the stew.

The IBM hpmstat documentation indicates that groups of events can be monitored... but doesn't mention where to find the group definitions.

I poked around on the system until I found the file /usr/pmapi/lib/POWER7.gps.
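Before reading the whole file, it can help to pull out just the lines that mention an event of interest. A rough sketch, assuming only that the groups file is readable text: the function name find_event_groups is hypothetical, I haven't described the file's internal layout here, so this simply prints the raw matching lines with their line numbers.

```shell
#!/bin/sh
# Hypothetical helper: print lines in a PMU groups file that match a pattern.
# Defaults to the TLB pattern and the POWER7 groups file path from this post.
# The file's layout is not parsed; matching lines are printed as-is.
# Note: POSIX sh has no "local", so pattern/gps_file are global variables.
find_event_groups() {
  pattern=${1:-TLB}
  gps_file=${2:-/usr/pmapi/lib/POWER7.gps}
  if [ ! -r "$gps_file" ]; then
    echo "cannot read $gps_file" >&2
    return 1
  fi
  # -n prefixes each match with its line number so it can be found in context;
  # -- keeps a pattern starting with "-" from being read as an option.
  grep -n -- "$pattern" "$gps_file"
}
```

Something like find_event_groups TLB, or find_event_groups DTLB, narrows the search; the group number for each hit still has to be read from the surrounding lines of the file.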

Group 11 includes TLB misses, as below.  There are 275 other groups of events that can be counted!  Jackpot!  
The "POWER7 and POWER7+ Optimization and Tuning Guide" has this to say about hpmstat, hpmcount (not nearly as interesting to me), and tprof (I don't know a thing about it; maybe I'll learn next year):

"The POWER7 processor provides a powerful on-chip PMU that can be used to count the number of occurrences of performance-critical processor events. A rich set of events is countable; examples include level 2 and level 3 d-cache misses, and cache reloads from local, remote, and distant memory."

Wow... so I can find out about TLB misses... and later I can probably exploit this for monitoring accesses to affinitized memory.  Gold mine!

# hpmstat -g11
Execution time (wall clock time): 1.00030447 seconds

Group: 11
Counting mode: user+kernel+hypervisor
Counting duration: 24.007015394 seconds
  PM_BTAC_MISS (BTAC Mispredicted)                            :         5385572
  PM_TLB_MISS (TLB Miss (I + D))                              :          224777
  PM_DTLB_MISS (TLB reload valid)                             :          184261
  PM_ITLB_MISS (ITLB Reloaded (always zero on POWER6))        :           40516
  PM_RUN_INST_CMPL (Run_Instructions)                         :        53397847
  PM_RUN_CYC (Run_cycles)                                     :       216602899

Normalization base: time
Counting mode: user+kernel+hypervisor
  Derived metric group: General
  [   ] Run cycles per run instruction                        :           4.056
u=Unverified c=Caveat R=Redefined m=Interleaved
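The counters above can be cross-checked with a little arithmetic: PM_DTLB_MISS + PM_ITLB_MISS (184261 + 40516) equals PM_TLB_MISS (224777), and PM_RUN_CYC / PM_RUN_INST_CMPL reproduces the 4.056 derived metric. Below is a minimal shell/awk sketch that does this for saved hpmstat -g11 output, assuming the "PM_NAME (description) : count" line layout shown here. The function name tlb_summary is hypothetical, and normalizing TLB misses per 1000 instructions is just my choice of metric for comparing page-size configurations, not something hpmstat reports.

```shell
#!/bin/sh
# Hypothetical helper: summarize TLB counters from `hpmstat -g11` output on stdin.
# Assumes counter lines look like:
#   "  PM_TLB_MISS (TLB Miss (I + D))      :      224777"
# (no colons inside the description); other versions may format differently.
tlb_summary() {
  awk -F':' '
    # Only counter lines: first field begins with optional blanks then "PM_".
    $1 ~ /^[ \t]*PM_/ {
      split($1, name, " ")       # name[1] is the event name, e.g. PM_TLB_MISS
      count[name[1]] = $NF + 0   # last field is the count; +0 drops padding
    }
    END {
      tlb  = count["PM_TLB_MISS"];  dtlb = count["PM_DTLB_MISS"]
      itlb = count["PM_ITLB_MISS"]; inst = count["PM_RUN_INST_CMPL"]
      cyc  = count["PM_RUN_CYC"]
      # Without an instruction count there is nothing to normalize against.
      if (inst == 0) { print "no PM_RUN_INST_CMPL count found"; exit 1 }
      printf "I+D check: %d + %d = %d (PM_TLB_MISS = %d)\n", dtlb, itlb, dtlb + itlb, tlb
      printf "TLB misses per 1000 instructions: %.2f\n", 1000 * tlb / inst
      printf "Run cycles per run instruction: %.3f\n", cyc / inst
    }'
}
```

Fed the output above, this should report 4.21 TLB misses per 1000 instructions and 4.056 run cycles per run instruction. Saving one hpmstat -g11 capture per page-size configuration (as root) and running each through the same helper would give numbers that are at least normalized for differing instruction counts.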

But lest I get too lofty, thinking this will be just as accessible on systems as my trusty vmstat and iostat fallbacks...

$ hpmstat -g11
hpmstat ERROR - pm_set_program_mm: : Must be root to count globally.
