Tuesday, April 30, 2013

Continuing MSSQL FCoE investigation

No new insight into the conditions involved In the error/crash/restart with our ETL and FCoE data protocol.

But a new path of investigation: Storport trace. Will update later if a breakthrough.


Filter data:

0F 00 00 00 00 00 00 00

Friday, April 26, 2013


Just when I thought I understood IBM Power AIX memory affinity, I stumbled across this documentation on thread behavior environment variables:


AIXTHREAD_AFFINITY={default|strict|first-touch} (AIX 6.1.5 and later)

The AIXTHREAD_AFFINITY controls the placement of pthread structures, stacks, and thread-local storage on an enhanced affinity enabled system.
  • The default option will not attempt any special placement of this data, balancing it over the memory regions used by the process as determined by the system settings
  • The strict option will always place this data in memory local to the pthread; this may incur some performance penalty during the creation of the pthread as the existing data is migrated from one memory region to another, however, may improve run-time performance.
  • The first touch option is similar in placement of memory local to the pthread, however, it will not attempt to migrate any data within the memory. The in-memory pages are needed by the thread for this data (including paging in memory from paging space), and will be placed local. This option allows a balance between startup time and runtime performance.


All I wanted to do was find more recent information about AIXTHREAD_SCOPE=S.  There's some of that in there, too.  But I've had enough for now.

Thursday, April 25, 2013

Another Chart for SQL Server database on a single LUN

So, here's another chart for the SQL Server database on one lonely LUN. This chart is more interesting. There's an elbow at about 400 mb/second. Most likely this system, with all of the database files for a heavy-traffic database on a single LUN, is hitting a bandwidth limitation. Could be the front end port limit, could be back end bus limit. Could even be an FCAL limit.

Why is SQL Server PLE plummeting? Should I care?

Because the SQL Server page life expectancy is a computed value, it should earn your trust for various uses.  In fact, on NUMA servers with SQL Server NUMA support enabled (as it is by default), its not just a computed value, its an aggregation of computed values across independently managed database buffer pool cache segments, one per NUMA node.  There is some type of weighting involved in the calculation of the "general" PLE from multiple PLEs, as well.  I've looked at the numbers just enough to decide its not worth my time right now to determine how the node PLEs are weighted in figuring the general PLE.

For all of its warts, PLE is still valuable.  Within the activity profile of a given system, its a way to compare database buffer cache churn under different workloads.  Assuming the same system and the same workload, faster buffer churn typically correlates to a higher physical read load.

So - assuming the same or similar workload on a given system, a plummeting PLE can be a concern - it can drive up the physical read load.  Increasing the read load won't necessarily increase execution time, due to the high reliance of SQL Server on prefetch and readahead, which I believe was quite well implemented.  However, driving up the read load for the same workload makes the system a less friendly neighbor on shared resources such as SAN.  Even if the physical SAN disks for the SQL Server system are isolated from the disks for other systems, its possible for the increased read load to saturate front end director bandwidth (I always think of this as the link from host to SAN cache, because I believe cache is king), front end director CPU, front end director port queue depth, back end director CPU, back end bus bandwidth, and FCAL loop bandwidth if not lucky enough to be using back end SAS connections instead of arbitrated loops.  Increased read load from SQL Server can also lead to increased SAN read cache churn if SAN read cache is shared with other hosts.  SQL Sever systems generally don't care about SAN read cache much... but the other systems might be severely inconvenienced by having their contents churned out of read cache.  Enough blabbering - trust me - doing more work on the SAN than necessary is bad for other SAN consumers, and there are lots of places that badness can materialize.  If the SQL Server system is a guest on a shared virtual host, there are shared host resources that could be overburdened as well, such as the physical HBA.

Don't freak out based on PLE alone, but don't ignore the PLE either.  Please.

Are physical read operations rising while PLE is plummeting?  Does physical read bandwidth rise while PLE is plummeting?  Those are some of the other items in perfmon that I look at when I see irregularities in PLE.  And there's another condition to consider...

"Silly sasquatch - if PLE is dropping, reads MUST be increasing!  How could it be any other way?? Maybe you should go back to cobol..."

Not so fast, young warrior.  First of all, let's assume a constant database cache size.  If SQL Server is as well-coded as the databases of my youth - now considered antiques at over 30 years old - then the first write to a newly allocated user database block (at the very least, a new user database block in a new extent) should not require a physical read before written to the buffer cache, and flushed later as a write to disk.  That means... high write activity in a constant-size database cache will lead to lower PLE not solely based on reads of database blocks into the buffer cache.  Spreadsheets or graphs indicating that are left, for now, as a reader exercise.  :)

But, there's another potentially greater factor that can push down PLE, other than physical reads and logical writes: the expansion of "stolen memory" reported by perfmon.

Its important to distinguish between "stolen memory" and "granted workspace memory".  Sometimes I see a description of high "granted workspace memory", and the potentially resulting pending memory grants, as "memory pressure".  I don't think that's an entirely accurate description.  Its really something more like "promised memory pressure".  You may not believe me - you are free to try to get the numbers in perfmon to add up to something reasonable.  Or you can trust me :)  The numbers reported by perfmon of database pages, stolen pages, and free pages add up to... total pages.  You won't be able to fit granted workspace memory in there - I've tried.  Now, granted workspace memory - reserved memory should be equal or less than  "stolen memory".  That's because "stolen memory" is "active query memory", plus some other stuff.  I'll return to that in a later post.  I promise.

Anyway... assume a system that has already achieved the "max server memory" target for the buffer pool.  Assume 5 long-running queries on the system, each with query memory grants (which can be monitored with sys.dm_exec_query_memory_grants).  At the start of those queries, they may have very little use for the query memory they've been granted.  Why zero it?  Why not let the database pages cached in those buffers stay there, until the intermediate and final resultsets of the query actually need the buffers to be stored?  In fact - some queries request way, way more query memory than they will use.  Zeroing the contents of the granted query memory before the query workers are ready to fill them would needlessly age database pages out of the database cache.

But... if those queries have a good fit between their memory grant requests and the memory they'll actually use... over time they will steal memory from the database cache... up to their memory grant... and the database cache page count gets lower.  With a lower database page count, and stable rates of page insertion and retirement, the page life expectancy is lowered.  Some folks like to call the PLE concept "buffer cache hold time" - I actually think that sounds cooler, but what do I know?

Anyway... next time you look at a system and see wild changes in PLE - whether global PLE or per NUMA node PLE... consider the pace of data pages into the buffer, the pace of data page first write in the buffer, and the pace of "stolen memory" stealing as potential causes for rapid changes in PLE.  It might save you from a panic attack.

On the other hand... if you eliminate those conditions as factors in drastically changing PLE...

****  update by sql_sasquatch 26 September 2013****
This thread is good discussion of PLE issues specific to early builds of SQL Server 2012

That thread discusses these SQL Server 2012 two hotfixes.

kb2819662 SQL Server performance issues in NUMA environments

kb2845380 You may experience performance issues in SQL Server 2012 Edit

AIX hpmstat utility on Power 7; Monitoring TLB misses

I believe: if its not predictable, its not a "best practice", its just a "practice".

There's more love for larger memory page sizes these days... the fewer memory pages the less expensive they are to manage.  But... allowing multiple memory page sizes increases the risk of paging memory contents to disk.  Sure, there are ways to mitigate most of the risks.  I'll take predictable over possibly perfect every day.  If a few more TLB misses is the cost of being sure database cache won't be paged out... in the absence of another guarantee... I'll probably take the deal.
So I'd like to evaluate the Oracle workloads I know and love 4k memory pages only (disabling medium 64k pages), 4k/64k memory pages, 4k/64k/16mb memory pages, and 4k/16mb memory pages.  Could do that with our standard workload performance tests.  But in this case I don't want to measure performance alone.  I also want to do some type of comparative risk assessment - what's to gain by minimizing TLB misses and what's to lose if page out to disk is incurred. 

The hpmstat utility will be part of the stew.

The IBM hpmstat documentation above indicates that groups of events can be monitored... but doesn’t mention where to find the groups. 

I poked around on the system til I found file /usr/pmapi/lib/POWER7.gps .

Group 11 includes TLB misses, as below.  There are 275 other groups of events that can be counted!  Jackpot!  
The "POWER7 and POWER7+ Optimization and Tuning Guide" (www.redbooks.ibm.com/redbooks/pdfs/sg248079.pdf) has this to say about hpmstat, hpmcount (not nearly as interesting to me), and tprof (don't know a thing about it, maybe I'll learn next year):

"The POWER7 processor provides a powerful on-chip PMU that can be used to count the number of occurrences of performance-critical processor events. A rich set of events is countable; examples include level 2 and level 3 d-cache misses, and cache reloads from local, remote, and distant memory."

Wow... so I can find out about TLB misses... and later I can probably exploit this for monitoring accesses to affinitized memory.  Gold mine!

# hpmstat -g11
Execution time (wall clock time): 1.00030447 seconds

Group: 11
Counting mode: user+kernel+hypervisor
Counting duration: 24.007015394 seconds
  PM_BTAC_MISS (BTAC Mispredicted)                            :         5385572
  PM_TLB_MISS (TLB Miss (I + D))                              :          224777
  PM_DTLB_MISS (TLB reload valid)                             :          184261
  PM_ITLB_MISS (ITLB Reloaded (always zero on POWER6))        :           40516
  PM_RUN_INST_CMPL (Run_Instructions)                         :        53397847
  PM_RUN_CYC (Run_cycles)                                     :       216602899

Normalization base: time
Counting mode: user+kernel+hypervisor
  Derived metric group: General
  [   ] Run cycles per run instruction                        :           4.056
u=Unverified c=Caveat R=Redefined m=Interleaved

But, lest I get too lofty in my thoughts thinking this will be just as accessible on systems as my trusty vmstat and iostat fallbacks...

$ hpmstat -g11
hpmstat ERROR - pm_set_program_mm: : Must be root to count globally.

Tuesday, April 23, 2013

BusinessObjects Query Builder Queries; SI_SCHEDULE_TYPE & SI_SCHEDULE_STATUS values

Some reference info that always takes me forever to find...
Thanks to Manikandan Elumalai for the informative posts below...

SI_SCHEDULE_TYPE=0 – Once: the report is run once only
SI_SCHEDULE_TYPE=1 – Hourly: the report is run hourly.
SI_SCHEDULE_TYPE=2 – Daily: the report is run daily.
SI_SCHEDULE_TYPE=3 – Weekly: the report is run weekly.
SI_SCHEDULE_TYPE=4 – Monthly: the report is run monthly.
SI_SCHEDULE_TYPE=5 – NthDay: the report is run on the nth day of the month.
SI_SCHEDULE_TYPE=6 – 1stMonday: the report will run on the first monday of every month.
SI_SCHEDULE_TYPE=7 – LastDay: the report will run on the last day of the month.
SI_SCHEDULE_TYPE=8 – Calendar: the report will run based on calendar templates.
SI_SCHEDULE_TYPE=9 – Run schedule based on calendar plugin object templates