Monday, June 24, 2013

Busy, busy disks in AIX - wish Windows would make QFULLs this transparent

I don't get to work with storage array performance and resource utilization logs nearly as often as database host logs.

On the database host side, I wish that Windows easily afforded me the visibility that UNIX, and especially AIX, does.

The AIX implementation of iostat quickly indicates the average number of QFULL occurrences per LUN in the collection interval (I rely on iostat -DlRTV very heavily on AIX Oracle and NoSQL database servers).  The fcstat utility likewise indicates full queue conditions for the fibre channel adapter devices.  It's not nearly so easy to see full queues at the physical volume or fibre channel adapter level in Windows - as near as I can tell, a separate method may be necessary for each multipath software (EMC PowerPath, Hitachi HDLM, etc.) or HBA family.  Some SQL Server storage issues would be much easier to diagnose if QFULL conditions were easier to identify and investigate in Windows.

In addition to an easily accessible way to see QFULL conditions, AIX exposes the activity of logical volume logical partitions with the lvmstat utility.  (Windows, on the other hand... I generally avoid and advise avoiding Windows LVM striped volumes for SQL Server storage.  If I'm correct, the striping will force all read and write IOs to a 64k maximum.  Very likely just fine for an OLTP system.  But for the systems I work with, IO coalescing is a big deal - I'll take as many 512k reads and as many 256k writes as I can get, thank you :)  )

Here's a good reference for the AIX lvmstat utility if you ever have need:
If Your Disks Are Busy, Call on lvmstat
Identify the busy spots in partitions

Now, given the insight provided by the iostat utility, why would you want to add lvmstat collection?

In my investigations, I focus on three potential contributors to QFULL conditions:
1. Small read IO flood.  This could be due to small nonsequential reads.  Could also be due to small sequential reads.  If small sequential reads, gotta check to see if the maximum Oracle read IO is being prevented... maybe by too small of a maximum transfer size at the hdisk level.  Also check the LVM physical partition size.  It could easily be too big.  (Here's a hint: for batch processing systems, a 1 GB physical partition size is almost always too big.  Take the largest LUN queue depth your storage allows- or the largest you will configure.  Divide by 2.  Multiply by the largest Oracle read IO.  For my favorite workloads, a physical partition size equal to or smaller than that number should be used to eliminate the risk of QFULL.)
2. Large sequential read IO flood by multiple processes.  Most often happens under high concurrency of similar queries, or under high parallelism.  When this occurs, it's almost always because the ppsize is too big.  See my thoughts above on ppsize.
3. Large repetitive sequential reads.  Oracle (likely back at least to 11.1) really, really favors full table scan and fast full index scan via direct path read into PGA for tables larger than the small table threshold and indexes larger than the large object threshold.  These thresholds are by default based on database cache size... so as the database grows, more and more tables and indexes will be DPR into PGA.  Combine that with the AIX jfs2 filesystem cio mount option (bypass filesystem cache to prevent double buffering and remove a level of inode locking), and my workloads' propensity for similar queries to run nearly concurrently, and you've got a recipe for lots and lots of reads of the same data at nearly the same time... a very likely contributor to QFULL conditions at the physical volume level.
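The ppsize rule of thumb from item 1 is simple arithmetic; here's a quick sketch of it (the queue depth and read IO values below are illustrative assumptions, not numbers from any particular array or workload):

```python
def max_pp_size_bytes(lun_queue_depth, max_read_io_bytes):
    """Rule-of-thumb ceiling for LVM physical partition size:
    (largest LUN queue depth / 2) * largest Oracle read IO.
    Keeping ppsize at or below this makes it unlikely that sequential
    reads of a single physical partition can by themselves fill the
    LUN service queue."""
    return (lun_queue_depth // 2) * max_read_io_bytes

# Illustrative values: LUN queue depth of 64, 1 MB maximum Oracle read IO.
# (64 / 2) * 1 MB = 32 MB... so a 1 GB ppsize is way over the line,
# while a 32 MB (or smaller) ppsize fits the rule of thumb.
ceiling = max_pp_size_bytes(64, 1 * 1024 * 1024)
print(ceiling // (1024 * 1024), "MB")  # prints: 32 MB
```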

So... if sequential reads of type 1, 2, or 3 are significant contributors to QFULL conditions that show up in iostat, some of the heavy QFULL conditions for a given physical volume should correspond to periods of excessive activity for a small number of logical partitions in the logical volume which are backed by physical partitions on that physical volume.  (That would be a great place for some pictures.  Hopefully I'll come back later to draw some.)
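One rough way to check that correspondence, once the lvmstat output for an interval has been parsed into per-partition IO counts (the data structure and function here are my own sketch, not actual lvmstat output format):

```python
def busy_lp_share(lp_iocnt, top_n=3):
    """Given {logical_partition_name: io_count} for one lvmstat
    collection interval, return the fraction of IO handled by the
    top_n busiest logical partitions.  A share near 1.0 means a few
    hot partitions dominate the interval -- consistent with a
    sequential read flood concentrating on one physical volume."""
    total = sum(lp_iocnt.values())
    if total == 0:
        return 0.0
    top = sorted(lp_iocnt.values(), reverse=True)[:top_n]
    return sum(top) / total

# Hypothetical interval: three hot partitions out of many.
sample = {"lp12": 900, "lp13": 850, "lp14": 800, "lp2": 30, "lp7": 20}
print(round(busy_lp_share(sample), 2))  # prints: 0.98
```

If intervals with a share like that line up with the heavy QFULL intervals in iostat, the sequential-read explanation gets a lot more plausible.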

David's article has a great example of using lvmstat to find the busiest logical partitions in a logical volume.  My investigations start at a different point than his - my busy LUNs are typically identified from iostat by QFULL conditions - then it's time to find out why they are so busy.  But commands like David's "lvmstat -l fslv05 -c40 60 10" (fslv05 logical volume, busiest 40 logical partitions, 60 second interval, 10 collections) are a huge part of the work.
