Friday, May 17, 2013

Should cpu_scale_memp be increased for Power7+ and Power7?

**update from sql_sasquatch 20130906**
I've decided not to tinker with cpu_scale_memp and to leave it at its default value of 8.  Yes, the cores are more capable, so I'd expect the lrud scan/steal rate to be higher than in the past.  But the systems I work with also have far more RAM per LPAR than they used to.  Doubling cpu_scale_memp from 8 to 16, as I was proposing, cuts the number of memory pools (and thus the number of lrud threads) in half.  That's fine - but on average it leaves each lrud thread with frame lists twice as large.  I want to avoid that.  When lrud does have a lot of work to do, I want it working with shorter and (hopefully) more efficient lists.
**end update from sql_sasquatch 20130906**
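
The trade-off in the update above can be put into rough numbers.  What follows is only a back-of-the-envelope sketch in Python, not anything AIX provides: the function names are made up for illustration, and it assumes the kernel creates exactly the minimum number of memory pools implied by cpu_scale_memp and that memory is split evenly across them.  A real LPAR may have more pools, and unevenly sized ones.

```python
import math

FRAME_BYTES = 4096  # 4k memory frames


def min_memory_pools(logical_cpus, cpu_scale_memp=8):
    """Fewest memory pools allowed when at most cpu_scale_memp
    logical CPUs may share one pool (assumed model, see lead-in)."""
    return math.ceil(logical_cpus / cpu_scale_memp)


def avg_frames_per_pool(ram_gib, logical_cpus, cpu_scale_memp=8):
    """Average number of 4k frames each pool's lrud is responsible for,
    assuming memory is divided evenly across the minimum pool count."""
    total_frames = ram_gib * 2**30 // FRAME_BYTES
    return total_frames // min_memory_pools(logical_cpus, cpu_scale_memp)


if __name__ == "__main__":
    # Hypothetical LPAR: 32 logical CPUs, 128 GiB of RAM.
    for ratio in (8, 16):
        pools = min_memory_pools(32, ratio)
        frames = avg_frames_per_pool(128, 32, ratio)
        print(f"cpu_scale_memp={ratio}: {pools} pools, {frames} frames per pool")
```

Under those assumptions, going from 8 to 16 halves the pool count and doubles the frames each lrud thread has to cover, which is exactly the effect I decided I'd rather not have.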


A little history before meandering and ending up at my plans to test Oracle on AIX 7.1 on Power7 with cpu_scale_memp increased from 8 to 16.

Year  OS Level     Processor
2010  AIX V7.1     Power7
2007  AIX V6.1     Power6
2004  AIX 5L 5.3   Power5
2002  AIX 5L 5.2
2001  AIX 5L 5.1   Power4
1999  AIX 4.3.3
1998  AIX 4.3.2
1998  AIX 4.3.1
1997  AIX 4.3


Prior to 2001, AIX was developed and deployed on single-core processors.  There was one logical and one physical CPU per socket on these systems.

AIX 5L 5.1 was released in 2001.  The Power4 processors introduced that year were the first dual core processors in the family.  These were the first systems in the IBM Power family with 2 logical and 2 physical CPUs per socket.

AIX 5L 5.3 was released in 2004.  The dual core Power5 processors introduced that year were the first in the family to support simultaneous multithreading (SMT).  Enabling this feature presented two logical CPUs from each core to the OS for scheduling.  These were the first Power systems with 4 logical CPUs and 2 physical cores per socket.

AIX 6.1 was released in 2007, as were the dual core Power6 processors.  With SMT enabled, these systems also presented 4 logical CPUs from the 2 physical cores per socket.

AIX 7.1 was introduced in 2010, as were Power7 processors.  Power7 processors were available with 4, 6, or 8 active cores per socket.  These processors supported SMT, and added the SMT4 feature to present 4 logical CPUs from each physical core.  So these systems could present 4, 6, 8, 12, 16, 24, or 32 logical CPUs per socket, depending on the combination of no-SMT, SMT, or SMT4 with 4, 6, or 8 active cores per socket.
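
That matrix is small enough to enumerate.  Here is a minimal Python sketch of it; the constant and function names are made up for illustration, and "SMT2" is just my label for the two-thread SMT mode.

```python
from itertools import product

# Active cores per socket and hardware threads per core on Power7.
ACTIVE_CORES = (4, 6, 8)
THREADS_PER_CORE = {"no-SMT": 1, "SMT2": 2, "SMT4": 4}


def logical_cpus_per_socket():
    """Distinct logical CPU counts one socket can present to the OS."""
    counts = {
        cores * threads
        for cores, threads in product(ACTIVE_CORES, THREADS_PER_CORE.values())
    }
    return sorted(counts)


if __name__ == "__main__":
    print(logical_cpus_per_socket())
```

Several combinations collapse to the same count (8 cores without SMT and 4 cores with two threads each both give 8), which is why nine combinations yield only seven distinct values.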

With this much history, it's easy for stuff to get lost.  Maybe cpu_scale_memp is an example.  Look around on the Intertubes - you won't see too much about this AIX kernel parameter.  I love reading Jacqui Lynch's stuff.  She is one of my "go-to" sources for AIX performance tuning.  Interestingly, one of the only tuning references for cpu_scale_memp is one of her presentations, where vmo tuning of the mempools parameter is discouraged in favor of tuning cpu_scale_memp (page 6 of the pdf below).  I don't think the mempools parameter is around anymore on Power7 AIX 7.1 systems.

http://regions.cmg.org/regions/sccmg/Presentations/2007_Fall_SCCMG_Meeting/Lynch-AIX_Performance_Tuning.pdf

The cpu_scale_memp parameter gives the maximum ratio of logical processors to memory pools in an LPAR.  The default value is 8.  So for every 8 logical processors, there will be at least one memory pool.  (The memory pools come from larger structures called vmpools or memory domains.  In turn, the memory pools are divided into framesets.)  Each memory pool has its own lrud (least recently used daemon, the page stealer).  Each lrud considers minfree and maxfree, the low and high water marks for free 4k memory frames, within its own memory pool.  When free frames drop below the low water mark (minfree), the page stealer works until maxfree frames are free in the memory pool.
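
The water mark behavior can be sketched as a tiny piece of Python.  This is a simplified model of the description above, not the actual lrud logic: the function name is hypothetical, the threshold values in the usage note are purely illustrative, and real page stealing depends on much more than the free frame count.

```python
def frames_to_steal(free_frames, minfree, maxfree):
    """How many frames a pool's page stealer would need to free,
    in a simplified model: do nothing until free frames drop below
    minfree, then keep stealing until maxfree frames are free."""
    if minfree >= maxfree:
        raise ValueError("minfree must be lower than maxfree")
    if free_frames >= minfree:
        return 0  # still above the low water mark, lrud stays idle
    return maxfree - free_frames
```

For example, with illustrative thresholds of minfree=960 and maxfree=1088, a pool with 2000 free frames needs no stealing, while a pool that has dropped to 900 free frames would have its lrud work until 188 more frames are free.  Because each memory pool has its own lrud, this check happens pool by pool rather than once for the whole LPAR.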

Nobody really talks about how or why to tune cpu_scale_memp.  But the introduction of the cpu_scale_memp parameter gives some insight.

AIX 5.2 http://www-01.ibm.com/support/docview.wss?uid=isg1IY57046
Need strict_maxclient and cpu_scale_memp tuning parameters 
"Customer sees too much paging to paging space with maxclient set high and paging of client pages before the system runs out of free pages with maxclient set low."

However, without any additional information about the issue, or how strict_maxclient and/or cpu_scale_memp address the issue, there's not a lot to go on.

Memory and process (user thread) affinity both concern stickiness to the physical cores.  For memory affinity, the system tries to allocate memory that is well placed for the current physical core.  For process affinity, it tries to keep a given user thread on the first core it executed on (or on a specified set of cores with similar desired properties).

There are lots of knobs and switches to turn to govern memory/process affinity and the VMM in AIX, but cpu_scale_memp is one of the basic determinants.

Now that there are more threads per core, and more cores per socket - why not increase the ratio of logical processors to memory pools?  If the memory pool is larger, it may balance memory usage patterns more easily.  The larger pools may absorb temporary fluctuations more easily.  If memory_affinity is disabled, then the memory pools will be evenly sized regardless of the physical placement of cores and memory that make up the LPAR.  Midstream in AIX 5.3 the list-based LRU page_steal_algorithm was introduced, increasing the efficiency of lrud.  So why not allow the system fewer larger memory pools?

IBM has further enhanced performance with memory and scheduling affinity management by the Active System Optimizer (ASO) daemon (included with AIX), and the Dynamic System Optimizer (an additional paid license module) on top of that.
http://pic.dhe.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.optimize/optimize_pdf.pdf

But it appears to me that returning to tune some of the fundamentals may lead to more predictable behavior and performance than simply adding more stuff on top.  Perhaps we'll see.  Hopefully in a round of IBM Oracle testing in the near future, I'll be able to circle back to this and include some comparative tests with cpu_scale_memp at its default and later at 16 on a Power7 or Power7+ system.  In addition to monitoring memory with tons of vmstat, we'll use kdb to monitor memp and frs... perfpmr style.  :)



3 comments:

  1. It seems that the "Dynamic System Optimizer" may have evolved into the "Dynamic Platform Optimizer". Check out page 30 of ATS presentation "Architecting Enterprise Solutions" from Tracy Smith. In this document:
    ****
    – Available at no additional charge for the new Power 770 & 780 systems
    – Also available for the Power 795 with eFW 7.6
    – Feature code EB33 must be ordered, either initially or as MES
    – Requires eFW 7.6 and is supported by all operating systems
    ****
    https://www.ibm.com/developerworks/mydeveloperworks/wikis/form/anonymous/api/library/61ad9cf2-c6a3-4d2c-b779-61ff0266d32a/document/1cb956e8-4160-4bea-a956-e51490c2b920/attachment/a2e9bb60-c5cd-4166-b699-a348f65a8c3c/media/Architecting%20Enterprise%20Solutionso%20VUG%20Dec%2018.pdf?version=1

  2. Indeed. The recent P7 Virtualization Best Practice document describes the work - and limitations - of the Dynamic Platform Optimizer on pages 15 and 16.

    https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/61ad9cf2-c6a3-4d2c-b779-61ff0266d32a/page/64c8d6ed-6421-47b5-a1a7-d798e53e7d9a/attachment/f9ddb657-2662-41d4-8fd8-77064cc92e48/media/p7_virtualization_bestpractice.doc

  3. Well, well, well.
    Looks like the Dynamic Platform Optimizer, prepackaged on some Power7 systems and available by order without extra charge, is a net new addition to the game.
    http://public.dhe.ibm.com/systems/power/community/aix/PowerVM_webinars/23_Dynamic_Platform_Optimizer.pdf
    The ASO daemon can be extended with the Dynamic System Optimizer. Here's some information about these, from Nigel Griffiths.
    http://public.dhe.ibm.com/systems/power/community/aix/PowerVM_webinars/26_ASO_DSO.pdf
