Friday, June 28, 2013

AIX fcs* and fcsi* devices

The fcs* devices are the fibre channel adapters.
The fscsi* devices are the drivers on top of them.

[sasquatch@mtnforest] $ lsdev -l fscsi1 -F parent
[sasquatch@mtnforest] $ lsdev -p fcs0
fcnet0 Defined   00-00-01 Fibre Channel Network Protocol Device
fscsi0 Available 00-00-02 FC SCSI I/O Controller Protocol Device

[sasquatch@mtnforest] $ lspath | grep fscsi0 | wc | awk -u '{print $1 ; }'

[sasquatch@mtnforest] But, what I really, really wanted is this:
iostat -datV | grep fcs | awk -u '{print $1 ; }'

The iostat will print adapter throughput.. and the V makes sure only nonzeroes get printed.

By filtering out the zeroes... I won't get errors from failing to open the device when I run fcstat... I was getting those when running fcstat for every fcs* device in the output of:
lsdev -C -c adapter -S a | grep fcs

That's because an fcs* adapter could be available... it could even have an fscsi* driver on top of it... but if there's no disk using a path through the fscsi* driver for the fcs* adapter, you just might get an error running fcstat for the fcs* device.

Not anymore.  I'll run fcstat only for the fcs* devices that are listed in the output of iostat -datV.   

Monday, June 24, 2013

Busy, busy disks in AIX - wish Windows would make QFULLs this transparent

 I don't get to work with storage array performance and resource utilization logs nearly as often as database host logs.

On the database host side, I wish that Windows easily afforded me the visibility that UNIX, and especially AIX, does.

The AIX implementation of iostat quickly indicates the average number of QFULL occurrences per LUN in the collection interval (I rely on iostat -DlRTV very heavily on AIX Oracle and nosql database servers).  The fcstat utility likewise indicates the fibre channel adapter devices full queue conditions.  Not so easy to see full queues at the physical volume or fibre channel adapter level in Windows - as near as I can tell a separate method may be necessary for each multipath software (EMC Powerpath, Hitachi HDLM, etc) or HBA family.  Some SQL Server storage issues would be much easier to diagnose if QFULL conditions were easier to identify and investigate in Windows.

In addition to an easily accessible way to see QFULL conditions, AIX exposes the activity of logical volume logical partitions with the lvmstat utility.  (Windows, on the other hand... I generally avoid and advise avoiding Windows LVM striped volumes for SQL Server storage.  If I'm correct, the striping will force all read and write IOs to a 64k maximum.  Very likely just fine for an OLTP system.  But for the systems I work with, IO coalescing is a big deal - I'll take as many 512k reads and as many 256k writes as I can get, thank you :)  )

Here's a good reference for the AIX lvmstat utility if you ever have need:
If Your Disks Are Busy, Call on lvmstat
Identify the busy spots in partitions

Now, given the insight provided by the iostat utility, why would you want to add lvmstat collection?

In my investigations, I focus on three potential contributors to QFULL conditions:
1. Small read IO flood.  This could be due to small nonsequential reads.  Could also be due to small sequential reads.  If small sequential reads, gotta check to see if the maximum Oracle read IO is being prevented... maybe by too small of a maximum transfer size at the hdisk level.  Also check the LVM physical partition size.  It could easily be too big.  (Here's a hint: for batch processing systems, a 1 GB physical partition size is almost always too big.  Take the largest LUN queue depth your storage allows- or the largest you will configure.  Divide by 2.  Multiply by the largest Oracle read IO.  For my favorite workloads, a physical partition size equal to or smaller than that number should be used to eliminate the risk of QFULL.)
2. Large sequential read IO flood by multiple processes.  Most often happens under high concurrency of similar queries, or under high parallelism.  When this occurs, its almost always because the ppsize is too big. See my thoughts above on ppsize.
3. Large repetitive sequential reads.  Oracle (and likely back at least to 11.1) really, really favors full table scan and fast full index scan via direct path read into PGA for tables larger than the small table threshold and indexes larger than the large object threshold.  These thresholds are by default based on database cache size... so as the database grows, more and more tables and indices will be DPR into PGA.  Combine that with AIX jfs2 filesystem cio mount option (bypass filesystem cache to prevent double buffering and remove a level of inode locking), and my workload propensity for similar queries to run nearly concurrently, and you've got a recipe for lots and lots of reads of the same data at nearly the same time... a very likely contributor to QFULL conditions at the physical volume level.

So... if sequential reads of type 1, 2, or 3 are significant contributors to QFULL conditions that show up in iostat, some of the heavy QFULL conditions for a given physical volume should correspond to periods of excessive activity for a small number of logical partitions in the logical volume which are backed by physical partitions on that physical volume.  (That would be a great place for some pictures.  Hopefully I'll come back later to draw some.)

David's article has a great example of using lvmstat to find the busiest logical partitions in a logical volume.  My investigations of start at a different point than his - my busy LUNs are typically identified from iostat by QFULL conditions - then its time to find out why they are so busy.  But commands like David's "lvmstat -l fslv05 -c40 60 10" (fslv05 logical volume, busiest 40 logical partitions, 60 second interval, 10 collections) are a huge part of the work

Thursday, June 13, 2013

My pitch for async stats auto updates and... Yikes! A SQL Server stats update async memory leak!!!

FIX: Memory leak if you enable the AUTO_UPDATE_STATISTICS_ASYNC statistics option in SQL Server 2008, in SQL Server 2012 or in SQL Server 2008 R2

A few days ago, a SQL Server DBA asked whether this memory leak was significant to consider against my recommendation for enabling auto_update_statistics_async for a given system.  My answer reminds me of a lesson I learned a long time ago, cooking in a deli.

"Serve 'em up... and watch 'em real close."

Here's the deal.  Maybe async auto stats updates aren't appropriate for the primary workflows on the systems in your sphere.   But in my world, its all about minimizing batch execution, with query concurrency higher than logical CPU count, and many of the concurrent queries peeking at the same statistics objects as the work their way through the query optimizer.  And almost all of these queries are run repetitively at a given frequency - most of them daily, some weekly, a few of them monthly.  Most of the tables grow in fairly regular fashion, and I haven't yet seen query plan problems in these environments caused by skewed (as opposed to outdated - and particularly ascending-key-absent) statistics.

A wait for a synchronous auto stats update won't just bloat the elapsed time for a single query - it will bloat the elapsed time for all concurrent queries peeking at the stats object being updated.  So lets say there's a stats object that has just tripped over the threshold for auto update - first time in one month.  Takes 25 minutes to update the stats.

Normally, for a single query, you do the calculus: how much elapsed time is at risk if a plan based on the old stats is used, vs a plan based on the new stats?  Not total elapsed time... but the delta between the two query executions.  If the expected delta between the query plan execution elapsed time is LESS than the expected elapsed time for a synchronous stats update... you come out ahead by doing the async update.  If the stats update requires less elapsed time than the difference between the query plan based on outdated stats and the faster query plan based on new stats... seems better to do sync update.

But, throw in lots concurrent queries stacked up behind the synchronous stats update, and you got trouble.  Some of those concurrent queries would benefit from cache hit based on buffer contents brought in by other other concurrent queries.  And, assuming they all ran acceptably yesterday and are growing reasonably predictably - why have ALL of them wait for a sync stats update?

Ahhh... but what about this scary memory leak?  Well, maybe its not so scary.  80 bytes of memory leaked with each update stats async task (one task per object).  12,800 update async tasks before 1 mb of memory has been leaked.

How many async stats updates are expected before a SQL Server service restart clears the memory leak?  Especially until the restart AFTER the fix is installed? :)  I'm not sure, honestly.  For systems that install trace flag 2371, the answer is more than previously.  My plan: serve 'em up, and watch 'em real close.

The KB article indicates this can be done with the version specific code below.  If appropriate, take a baseline value before enabling async stats update, so that the growth can be evaluated against the baseline and the threat from the leak evaluated.  Hey... I could be making the wrong bet in this scenario.  But I don't think so.  YMMV.  Know your systems, know your primary workflows, know the business cycle.

SQL Server 2012

select [Memory Used KB] = pages_in_bytes /1024 from sys.dm_os_memory_objects where type = 'MEMOBJ_RESOURCE'

SQL Server 2008

select [Memory Used KB] = (pages_allocated_count * page_size_in_bytes)/1024 from sys.dm_os_memory_objects where type = 'MEMOBJ_RESOURCE'

Tuesday, June 11, 2013

Using an auto-execute Stored Procedure to resolve 'Recovery Pending' development database

Development environments are quite unpredictable.  Yesterday I was faced with a situation where the storage housing the transaction logs for numerous development databases filled unexpectedly to capacity, and one of the development databases ended up in 'recovery pending' status.  This particular database was the target of numerous SSIS packages, and staging operations.  Plenty of traffic from new connections, and some of those connections would execute cross-database queries.

I took care of the shortage of transaction log space, and turned my attention to the sasquatch_2012 database, in 'recovery pending' mode.  Trying to alter the database to single user mode, or even set the database offline, continued to fail due to pressure from competing connections.

I rigged a couple of loops to kill sessions based on their database ID as recorded in the DMVs, and even a loop that would kill sessions based on the resource database from the sys.dm_tran_locks dmv.  No dice.  There seem to have been sessions queued for resource locks in a manner such that even after the session was killed, it would still come up for the resource in its queuing order and could prevent alter database statements from succeeding.  I came to that conclusion after noting my SSMS session as 58, closing the SSMS session, opening another session and later seeing session 58 (with my Windows host ID and username attached) pop up as a connection in the database.

To recover the database I set the "auto-execute on start" stored procedure to an sp that would set SASQUATCH_2012 offline. 

After waring all developers and testers, I used cluster services manager to take the SQL Server service offline, and then brought it back online.  The sasquatch_2012 database was offline as expected after the SQL Server service restarted.

I set the database to online, used the database, and set to single user.  Ran dbcc checkdb with repair, and after the repair set it to multi-user as below.  Didn't forget to undo the magic 'auto start stored procedure' - I don't want the sasquatch_2012 database to be offline every time SQL Server restartds.  :)

----create the auto stored procedure
create procedure x_sasquatch_2012_offline

ALTER DATABASE sasquatch_2012 set offline

----set the stored procedure to execute on startup
sp_procoption @ProcName = 'x_sasquatch_2012_offline' , @OptionName = 'startup' , @OptionValue = 'On'

----then use cluster manager to take the service offline and back online

----SASQUATCH_2012 was offline then as I expected
USE SASQUATCH_2012;  --- important to be in the database when setting it to single user :)
----Now its single user, do dbcc checkdb
----In a development environment, I'm ok with the risk of data loss during data repair

----abbreviated results from dbcc checkdb
DBCC results for 'SASQUATCH_2012'.
Service Broker Msg 9675, State 1: Message Types analyzed: 14.
Service Broker Msg 9676, State 1: Service Contracts analyzed: 6.
Service Broker Msg 9667, State 1: Services analyzed: 3.
Service Broker Msg 9668, State 1: Service Queues analyzed: 3.
Service Broker Msg 9669, State 1: Conversation Endpoints analyzed: 0.
Service Broker Msg 9674, State 1: Conversation Groups analyzed: 0.
Service Broker Msg 9670, State 1: Remote Service Bindings analyzed: 0.
Service Broker Msg 9605, State 1: Conversation Priorities analyzed: 0.
Repair: The page (1:798380) has been deallocated from object ID 0, index ID -1, partition ID 0, alloc unit ID 72057594058899456 (type Unknown).
Repair: The page (1:798381) has been deallocated from object ID 0, index ID -1, partition ID 0, alloc unit ID 72057594058899456 (type Unknown).
Msg 2576, Level 16, State 1, Line 1
The Index Allocation Map (IAM) page (0:0) is pointed to by the previous pointer of IAM page (1:798381) in object ID 0, index ID -1, partition ID 0, alloc unit ID 72057594058899456 (type Unknown), but it was not detected in the scan.
        The error has been repaired.
CHECKDB found 1 allocation errors and 0 consistency errors not associated with any single object.
CHECKDB fixed 1 allocation errors and 0 consistency errors not associated with any single object.
CHECKDB found 1 allocation errors and 0 consistency errors in database 'SASQUATCH_2012'.
CHECKDB fixed 1 allocation errors and 0 consistency errors in database 'SASQUATCH_2012'.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

----Yay!!!!  Now we can set it back to multi-user and be done.  For now...

----Finally, undo the little magic
----set the stored procedure to NOT execute on startup
EXEC sp_procoption @ProcName = 'x_sasquatch_2012_offline' , @OptionName = 'startup' , @OptionValue = 'Off'

Friday, June 7, 2013

Got SQL Server plans forced to serial execution? --Updated for SQL Server 2016 Query Store

Originally published June 7, 2013
Scroll down from a 2017 update relevant to SQL Server 2016 Query Store

Paul White talks about items that can force a serial plan in this blog post. 

I wanted to perform a quick check while the instance was fairly idle, to see if there were many plans that had been forced to serial execution.  (Monkeying around with the entire stored plan cache on a busy system is not something I recommend.)  I cobbled this together from some queries that Jason Strate has posted.  Turned up 42 rows.  Gave me something to think about... and address with the developers.

I'm sure there are more efficient ways of finding problematic plans... would probably be a good idea to periodically check the memory grant dmv for outstanding memory grants attached to a query with current DOP of 1, for example.  But this was all I was smart enough to do today.   



     (N'@NonParallelPlanReason'N'NVARCHAR(4000)') AS [NonParallelPlanReason],

   DB_NAME(qp.dbid) AS [DBASE_NAME]

   sys.dm_exec_cached_plans AS cp


   sys.dm_exec_query_plan(cp.plan_handle) AS qp


     (N'/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan') AS t(c)


   cp.cacheobjtype = N'Compiled Plan'

   AND c.value(N'@NonParallelPlanReason', N'NVARCHAR(4000)') IS NOT NULL


   DB_NAME(qp.dbid), cp.plan_handle

By the way... it took 3:09 to return the resultset.  Query cost of 125838, executed with DOP 1.  Worker time 153633351.  Lots of logical reads, lots of logical writes. 

March 30, 2017
I wanted to update the post to make it SQL Server 2016 Query Store ready 😊

This version of the query grabs only Query Store plan_id and NonParallelPlanReason.  The CTE is to make sure that plan_XML with more than 128 levels of nesting don't result in errors.  (But means you wouldn't get back the full plan XML from that CTE.) 

This query is still kinda slow, even using the Exists() method rather than checking the value like I used to.  About 10 minutes to retrieve 3000 nonparallel plans from a Query Store of 40000 plans.

I can probably speed it up by checking for forced serial plans within each query_hash, which is what I really want.  That'll be for another day.

serial_plan_xml_CTE (plan_id, qplan)
(SELECT plan_id,
        CASE WHEN PATINDEX('%<RelOp NodeId="0"%', query_plan) > 0 THEN
        CONVERT(XML, SUBSTRING(query_plan, 1, PATINDEX('%<RelOp NodeId="0"%', query_plan) - 1) +
                     SUBSTRING(query_plan, PATINDEX('%</QueryPlan>%', query_plan), LEN(query_plan) + 1) ) 
  ELSE CONVERT(XML, query_plan) 
  END AS qplan 
  FROM sys.query_store_plan
  WHERE is_parallel_plan = 0)
SELECT qplan.plan_id, 
      ) AS [NonParallelPlanReason]
   serial_plan_xml_CTE qplan
     (N'/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan') AS t(c)
   c.exist('./@NonParallelPlanReason') = 1

MSSQL: Plan guides to address ascending key problem, or increased stats attention?

Ok... lets start with the questions: In SQL Server, the ascending key problem can be addressed with statistics trace flags (such as 2371, 2389, and 2390), or increased manual stats update... but are plan guides also an appropriate mitigation?  If you've used plan guides for this purpose, how have evaluated them over time?

Now I'll meander...

On the systems I'm concerned with, many primary keys include at least one field which consistently ascends over time.  Often this is an identity type field (an identifier for the same record/object/data in a separate system) or a field based on datetime.  At the same time, the queries on this system focus heavily on the most recent data.  Queries with a scope of one day are more frequent than those with scope of one week, which are more frequent than those with scope of one month.  Any other query scope is far less frequent than monthly.  Most data is retained for at least 7 years if not indefinitely.

in this context, and especially with some of the complex queries running, the ascending key problem is a particular vulnerability.

Others have explained this issue far better than I; the blog posts below can be consulted for a more complete description so you can evaluate whether your system is effected.  

In a nutshell - the statistics histogram only knows about key values up to the highest value as of the last stats update.  The modification threshold for auto stats update scales with table row count.  Got a table that grows by a fairly constant row count over time?  Auto updates will become less and less frequent.  With row count in the billions, you'll wait a long time for auto stats update to occur - much longer than the inter-update interval while your table had millions of rows.

And the killer: since the histogram only knows of the highest value as of last update, all the rows added since then are absent from the histogram.  Here's a Sherlock Holmes type clue: if the query plan shows 'estimated rows' of 1, and you know doggone well there's thousands in the range, you could well be looking at manifestation of the ascending key problem.  Just saw a complex query, with four tables among others in the join list, a similar pk in each of the four, and an 'estimated rows' of 1 for each.  Counting the rows after the biggest range hi key in the stats yielded over 300,000 rows in each case!  Major performance buster.

The classic solution is to update all statistics all the dog-gone time.  Just like the classic index maintenance solution is to rebuild all indexes all the dog-gone time.  The pitfall is the same: you'll end up performing lots of low value, high cost operations.  Lots of disk io wasted.  Lots of plans which will be recompiled, maybe, and maybe little benefit to most of the new plan compiles.  And.... this is the one that always breaks my heart... what if the manual stats update happened on Monday according to schedule, but the query is run on Saturday and performance SUCKS because there are 70,000 rows in the range above the last known range hi key?

Ok... one way to address this in part is with trace flags 2389, 2390, and 2371.

Trace flag 2371 introduces a sliding threshold for automatic statistics update.

What's nice about this is that it will increase auto stats updates for humongous tables that are NOT based on an ascending pk.  I got lots of those, too :) TF 2371 helps these systems out.  But... ymmv.  Some tables are such that data distribution is irregular by design, and default sampling isn't good for optimal plan selection.  Coordinating full scan stats update will be better in that case.

Trace flags 2389 and 2390 are meant specifically to address the ascending key problem.  After three statistics updates with trace flag 2389 and/or 2390 enabled , indexes are branded as ascending, stationary, etc.  With trace flag 2389 in place, quick stats updates will lead to improved row estimates for ascending key statistics.  With trace flag 2390 in place, quick stats are enabled for 'unknown' keys as well (3 stats updates haven't occurred for these).  These have been beneficial in the systems I work with... but finding a way to quantify the batch-wide benefit is tricky.

For a strong finish, I'll circle back to my questions: In SQL Server, the ascending key problem can be addressed with statistics trace flags (such as 2371, 2389, and 2390), or increased manual stats update... but are plan guides also an appropriate mitigation?  If you've used plan guides for this purpose, how have evaluated them over time?


SQL Server backup compression

Some notes about SQL Server backup compression, introduced in SQL Server 2008.
1.  Compressed backups of noncompressed, row compressed, and page compressed tables and indexes can all be expected to be smaller than noncompressed backups of the same data.
2.  Without trace flag 3042, mentioned in this article, compressed backups use a final size file estimate for the initial file size.  This is done to avoid wait time during the backup for file expansion. Since this may be larger than the needed backup file size on completion, the in-process file size may be larger than the final file size.  This may be an important consideration when sizing backup resources.  With trace flag 3042 in place, compressed backups may take longer, but the files only grow as needed.

3.  Io is greatly reduced for compressed backups, leading to higher CPU utilization during a shorter execution time just from compacting the activity time window.  The compression itself also adds cpu overhead.  Not a reason NOT to employ backup compression: if CPU becomes a concern, it can be regulated with resource governor.

Tuesday, June 4, 2013

Oracle 11gR2 PGA parameters

I primarily work with Oracle 11gR2 and beyond.  The following post is the best source I've found for 11gR2 PGA management parameters - there's lots of information out there that seems to be specific to earlier versions of Oracle.

The parameter pga_aggregate_target will control the aggregate size of PGA workareas.  This will not limit other uses of PGA, which is why the total PGA usage can be considerably higher than pga_aggregate_target.

The hidden parameter _pga_max_size is the limit of all PGA work areea size for a single process.
The hidden parameter _smm_max_size is the limit of any single PGA work area size.
The hidden parameter _smm_px_max_size is the limit of any single PGA work area for parallel processes. 

For pga_aggregate_target <= 500MB
   _pga_max_size = 200MB
   _smm_max_size = pga_aggregate_target/5
   _smm_px_max_size = pga_aggregate_target/2

For 500MB < pga_aggregate_target <= 1GB
   _pga_max_size = 200 MB
   _smm_max_size = 100MB
   _smm_px_max_size = _pga_max_size/2

For pga_aggregate_target > 1GB
   _pga_max_size = pga_aggregate_target/5
   _smm_max_size = pga_aggregate_target/10
   _smm_px_max_size = _pga_max_size/2