Sunday, August 17, 2014

SQL Server 2014, mount points, checkdb & trace flag 2549

Update - although I previously intended to add graphs to this post, I've decided instead to add a new post with the pretty graphs.  Because... well, I'm trying to increase traffic a bit :-) 
See the new post here:
http://sql-sasquatch.blogspot.com/2014/09/sql-server-2014-rtm-ignores-trace-flag.html

I'll add fancy graphs when I come back from the north woods.  But I wanted to get the word out, so here's a text-only version for now....

'DBCC CHECKDB with physical_only' read behavior is one of the few places in SQL Server where use of mount points instead of drive letters matters.

Some background reading:



The KB article introducing trace flags 2549 & 2562 explains that, by default, checkdb plans its reads per drive letter.  If the target database's data files are distributed across more drive letters, the checkdb read pace will be greater.

That makes sense: with more LUNs, the greater aggregate LUN service queue depth can accommodate more simultaneous in-flight disk IOs.  But drive letters aren't the only way to distribute SQL Server files across more LUNs.  The large size of some databases, and the desire to standardize directory structure across numerous server hosts/VMs, have made mount points increasingly popular for SQL Server.

But there's a problem with the default behavior: if data files are on multiple mount points under the same drive letter, the checkdb read pace will be the same as if all files were on a single Windows drive.  One drive letter, same pace, even if there are more LUNs underneath.  The extra IO capacity and extra aggregate queue depth will not be leveraged by checkdb.  The end result is a slower checkdb pace than is possible, and a longer elapsed time.

Enter trace flag 2549.  Use that trace flag (assuming the code base includes support for it) and checkdb will plan reads for each database file as if it were on its own LUN.  As an example of the result: given the same database file contents, checkdb with physical_only should have the same elapsed time whether each file/LUN has its own drive letter or every file sits on its own mount point under a single drive letter.
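For reference, a run with the flag in play looks something like this - the database name is made up, and the -1 enables the trace flag globally rather than just for the session:

DBCC TRACEON (2549, -1) WITH NO_INFOMSGS;
DBCC CHECKDB (BigDB) WITH PHYSICAL_ONLY, NO_INFOMSGS;
DBCC TRACEOFF (2549, -1) WITH NO_INFOMSGS;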

The really big problem: the trace flag seems to be ignored by SQL Server 2014.  (Full disclosure: I haven't tested against CU1 or CU2 yet, only RTM.)

My test database has eight files, each ~350 GB.  Each file is on its own LUN, but these LUNs are presented as mount points in the directory structure under the C: drive.

The test server has 24 physical cores across 2 sockets and 512 GB of RAM.  It's a VMware VM, with all resources given to the single VM - 24 vCPUs.

Bandwidth from this server to the SAN is limited by the SAN to 800 MB/sec.  (It's shared storage; in such cases bandwidth limits or other QoS measures such as cache partitioning or IOPs fencing are common.)

I performed a SQL Server backup to the NUL device to check the maximum read bytes/sec: just over 700 MB/sec.  The backup completed in 1 hour, 9 minutes.
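In case anyone wants to try the same kind of read-throughput baseline, it's just a plain backup pointed at NUL, so nothing actually lands on disk (database name made up again):

BACKUP DATABASE BigDB
TO DISK = N'NUL'
WITH COPY_ONLY, NO_COMPRESSION, STATS = 10;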

Checkdb with physical_only took over 4 hours!

Perfmon quickly revealed why: there was intense activity against one LUN at a time, for up to 7 minutes per LUN.  Read Bytes/sec and Current Disk Queue Length both showed this behavior.

When trace flag 2562 was enabled, so that the checks ran as a single batch (this database has well over 12,000 HoBTs), the behavior was even more exaggerated.  With T2562, there was a single intense read period of approximately 30 minutes against each individual LUN in turn, until the roughly 4 hours of checkdb had completed.

The graphs looked the same whether trace flag 2549 was enabled or not.  

After staring at the graphs for a while, I realized that trace flag 2549 was simply being ignored. No matter what, the 8 mount points were being treated by the checkdb read planning as if on a single LUN (the single drive letter that was directory parent for all of them).

To confirm my diagnosis, I detached the database, assigned a distinct drive letter (S: through Z:) to each of the mount point volumes, and reattached the database using the new drive letters in the file paths.
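The T-SQL side of that was nothing fancy - something along these lines, with made-up database, directory, and file names; the log file wasn't moved, so it doesn't need to be listed and gets picked up from its recorded path:

USE master;
ALTER DATABASE BigDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
EXEC sp_detach_db @dbname = N'BigDB';

-- (drive letters S: through Z: assigned to the mount point volumes in Windows here)

CREATE DATABASE BigDB ON
    (FILENAME = N'S:\sqldata\BigDB_1.mdf'),  -- primary data file
    (FILENAME = N'T:\sqldata\BigDB_2.ndf'),
    (FILENAME = N'U:\sqldata\BigDB_3.ndf'),
    (FILENAME = N'V:\sqldata\BigDB_4.ndf'),
    (FILENAME = N'W:\sqldata\BigDB_5.ndf'),
    (FILENAME = N'X:\sqldata\BigDB_6.ndf'),
    (FILENAME = N'Y:\sqldata\BigDB_7.ndf'),
    (FILENAME = N'Z:\sqldata\BigDB_8.ndf')
FOR ATTACH;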

Even though that was the only change made, checkdb with physical_only now completed in about half the time: 1 hour 57 minutes with trace flag 2562, and 2 hours 7 minutes without it.

For some reason, peak checkdb throughput was still under 400 MB/sec, while backup peak throughput approached the system limit by staying near 700 MB/sec.  That'll have to be another day's investigation.

I think that relegating checkdb read pace to a trace flag is risky, and belies its importance.  The largest databases tend to be the ones using mount points, and if we want their owners to be serious about checkdb, then checkdb doggone better finish! :-) 10-12 TB at the slower pace of ~200 MB/sec may take over 24 hours!

The volume stats and virtual file stats DMVs know about the mount points and LUNs, so... so does SQL Server.  No need to assume there are only as many LUNs as drive letters.  No need to assume there are as many LUNs as database files.  Time for checkdb read planning to catch up a little bit :-) Checkdb is important, let's treat it that way.
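To illustrate that point, here's a sketch (current database only) joining the two DMVs so each file shows up alongside the volume it actually lives on - nothing official, just the kind of query I use to eyeball the layout:

SELECT  mf.name                 AS logical_file_name,
        mf.physical_name,
        vs.volume_mount_point,  -- mount point path or drive letter root
        vs.logical_volume_name,
        vfs.num_of_reads,
        vfs.num_of_bytes_read
FROM    sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id
CROSS APPLY sys.dm_os_volume_stats(vfs.database_id, vfs.file_id) AS vs
ORDER BY vs.volume_mount_point, mf.name;

Eight distinct volume_mount_point values under a single drive letter is exactly the layout that default checkdb read planning doesn't account for.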





Tuesday, July 15, 2014

Hitachi VSP and Universal Replicator: Asynchronous replication is not free


Occasionally I'll hear the idea that asynchronous array-level replication does not affect primary system performance.  That's just not true.  It doesn't affect performance in the same way that synchronous replication does, but that's different from not affecting primary system performance at all*.

But why take my word for it?  EMC and Hitachi are both really good at array-level replication.  I've already quoted from EMC documentation on my blog today, so now I'll switch to Hitachi documentation :-)

****************************************************************************
Hitachi Virtual Storage Platform
Hitachi Universal Replicator User Guide

www.hds.com/assets/pdf/vsp-hitachi-universal-replicator-user-guide.pdf

Performance is affected because of journal-to-journal copying. Usage rates are also affected.
Page 1-3 

****************************************************************************

*It is possible that a database with asynchronous writes and a light enough write load may experience asynchronous - even synchronous - array replication without query completion speed being noticeably impeded.  But I'm talking about pedal-to-the-metal data warehouse loads and the like :)

EMC VMAX Virtual Provisioning: Striped Metas now generally recommended




For some reason, I find it very easy to come across REV A12 of the EMC® Symmetrix® Virtual Provisioning™ documentation referenced below compared to REV A14.  Putting up this blog post more for myself than others... perhaps this way I'll easily be able to track down REV A14.  For me REV A14 is important because it's when the general recommendation switched from concatenated metas to striped metas.  No doubt the performance benefits were the motivation, and EMC waited until enough operations on striped metas were possible to ease the administrative burden of recommending them.

I find the recommendation of striped metas especially important for Oracle redo logs and SQL Server transaction logs when SRDF synchronous replication is used, because the serialization required for replication has less of a performance drag with striped metas than with concatenated metas.  Striped metas may also help when asynchronous replication is used, because of the write cache policies on the VMAX.

************************************************************************************
Best Practices for Fast, Simple Capacity Allocation with EMC® Symmetrix® Virtual Provisioning™
REV A12


In most cases, EMC recommends using concatenated rather than striped metadevices with Virtual Provisioning.
Page 12

************************************************************************************
Best Practices for Fast, Simple Capacity Allocation with EMC® Symmetrix® Virtual Provisioning™
Document 7383 300-006-718 REV A14  (link requires EMC Account login)

In most cases, EMC recommends using striped rather than concatenated metadevices with Virtual Provisioning because they offer the following advantages:

  • With Synchronous SRDF®, Enginuity allows one outstanding write per thin device per path. With concatenated metadevices, this could cause a performance problem by limiting the concurrency of writes. 
    This limit will not affect striped metadevices in the same way because of the small size of the metavolume stripe (1 cylinder or 1920 blocks).
  • In Enginuity releases prior to 5875, the code allows eight read requests per path per thin device. This may limit the number of read requests that can be passed through to the thin pool regardless of the number of data devices it contains. This can cause slower performance in environments with a high read miss rate. A striped metadevice offers improved throughput, because each metamember is allowed to queue requests to the backend in parallel. Even with the limit removed in Enginuity 5875, there are still potential performance benefits to using striped thin metas over concatenated thin metas.
  • Symmetrix Enginuity has a logical volume write pending limit to prevent one volume from monopolizing writeable cache. Because each metamember gets a small percentage of cache, a striped meta is likely to offer more writeable cache to the metavolume.
However, concatenated thin metadevices do offer two important operational advantages over striped thin metadevices:

  • Non-metadevices can be converted to concatenated metadevices without destroying the existing data. Beginning with Enginuity 5875 and Solutions Enabler 7.2, it’s possible to convert a thin device into a concatenated meta without unmapping the device first.
  • Concatenated thin metadevices can be expanded without destroying existing data by adding metamembers to an existing metadevice. Beginning with Enginuity 5875, a striped metadevice can also be expanded while preserving existing data. 
    Pages 12-13
 


Tuesday, July 8, 2014

Before I drop all those SQL Server auto-generated column stats....

***** Update 20180628 *****
I'm leaving this post for historical reasons.  But if you want to parse plan XML for optimizer stats detail included due to trace flag 8666, please don't use the query included here.  It's ugly, has unintended Cartesian products (which the DISTINCT attempts to mitigate), and performs poorly.

Instead, use something more like the query at this new blog post:

Harvesting SQL Server Trace Flag 8666 stats from Query Plan XML

http://sql-sasquatch.blogspot.com/2018/06/harvesting-sql-server-trace-flag-8666.html

*****


***** Update 20170104 *****
Just verified that this still works on post-2016 vNext 1.1 :-)
*****

A while back I ran across an intriguing suggestion by Ami Levin to simply drop all auto-created column statistics in a database.  Read the blog post here:
http://blogs.msdn.com/b/mvpawardprogram/archive/2013/09/09/sql-server-auto-statistics-cleanup.aspx
 
I wouldn't want to drop all column statistics without an idea of which are being used and which are neglected.
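As a rough starting point, the current database's auto-created column stats can at least be inventoried, with last update dates and modification counters, to compare against whatever shows up as actually used.  A sketch only - sys.dm_db_stats_properties needs SQL Server 2008 R2 SP2 / 2012 SP1 or later:

SELECT  OBJECT_SCHEMA_NAME(s.object_id) AS schema_name,
        OBJECT_NAME(s.object_id)        AS table_name,
        s.name                          AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.auto_created = 1
  AND   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY schema_name, table_name, stats_name;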

A colleague asked me about evaluating the statistics in use on a running system.  At first all I remembered was trace flags 9204 & 9292 (see trace flag 2363 for SQL Server 2014 and beyond).  Paul White blogged about these here:
http://sqlblog.com/blogs/paul_white/archive/2011/09/21/how-to-find-the-statistics-used-to-compile-an-execution-plan.aspx
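In the spirit of Paul's post, a single-query peek looks something like this - the table and predicate are made up, trace flag 3604 routes the diagnostic output to the client, and since these flags are undocumented they could change or disappear at any time:

DBCC TRACEON (3604);

SELECT COUNT(*)
FROM dbo.SomeTable            -- hypothetical table
WHERE SomeColumn = 42
OPTION (QUERYTRACEON 9292, QUERYTRACEON 9204, RECOMPILE);

DBCC TRACEOFF (3604);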

That allows investigating an individual query (an activity I always encourage for developers and report writers)... but it's not helpful in my context.

But in the comments to Paul's blog post, Fabiano Neves Amorim showed up!  I read his blog post about a year ago... but without an immediate need it just kind of got filed away in my mind, somewhere near the plots and plot holes of old James Bond films.  This time I paid attention :-)
http://blogfabiano.com/2012/07/03/statistics-used-in-a-cached-query-plan/

So, trace flag 8666 seems to be the key for what I want to do.  It allows peeking into 'InternalInfo' in the plan XML, in order to get table names (wszTable) and stat names (wszStatName) that were used to compile plans.  Cool!  So here's the disclaimer:

Trace flag 8666 is NOT an officially documented SQL Server trace flag.  I do not recommend using this trace flag without a tight leash!  In fact, any querying of the plan cache can become quite intensive - plans can be big and pulling lots of them into RAM can flush other database contents out while other folks are trying to get work done!  I know cuz I've caused trouble for folks - so do better than me!  Also, there's absolutely no guarantee that trace flag 8666 will continue to allow this insight in the future.


But, here's a new toy of mine, for profiling stats use on performance test servers when a full batch of queries is run.

There's no doubt in my mind that some folks reading this are WAY better at handling xquery or XML in general than I am.  If there are suggestions to improve the performance of this thing, I'm all ears...

But... here's my toy... in an early incarnation, anyways...
Note that in order to limit scope somewhat, only plans for the database of execution are considered, based on the dbid attribute from sys.dm_exec_plan_attributes.

********************************************************************************************************

/* do NOT use in production unless CPU and memory utilization are well understood */
DBCC TRACEON(8666) with no_infomsgs;
;WITH 
   XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p),
   -- handles: compiled plans cached for the current database, per the dbid plan attribute
   handles (usecounts, plan_handle)
   AS
   (SELECT
       usecounts,
       plan_handle
    FROM 
       sys.dm_exec_cached_plans AS cp
    OUTER APPLY
       sys.dm_exec_plan_attributes(plan_handle) AS epa
    WHERE 
       cp.cacheobjtype = 'Compiled Plan'
       AND
       epa.attribute='dbid'
       AND
       epa.value=DB_ID())
-- DISTINCT mitigates row multiplication from the nested CROSS APPLY shredding below
SELECT DISTINCT
   TableUsed.XMLCol3.value('@FieldValue','NVarChar(500)') AS TableName
   ,StatsUsed.XMLCol5.value('@FieldValue','NVarchar(500)') AS StatsName
   ,UpdateThreshold.XMLCol6.value('@FieldValue','NVarchar(500)') AS ullThreshold
   ,handles.usecounts
   ,handles.plan_handle
FROM
   handles AS handles
CROSS APPLY
   sys.dm_exec_query_plan(handles.plan_handle) AS qp
-- shred the InternalInfo node (exposed by trace flag 8666) down to table, stat name & threshold
CROSS APPLY
   qp.query_plan.nodes('//p:InternalInfo') AS InternalInfo(XMLCol1)
CROSS APPLY
   InternalInfo.XMLCol1.nodes('p:EnvColl/p:Recompile') AS Recomp(XMLCol2)
CROSS APPLY
   Recomp.XMLCol2.nodes('p:Field[@FieldName="wszTable"]') AS TableUsed(XMLCol3)
CROSS APPLY
   Recomp.XMLCol2.nodes('p:ModTrackingInfo') AS Tinfo(XMLCol4)
CROSS APPLY
   Tinfo.XMLCol4.nodes('p:Field[@FieldName="wszStatName"]') AS StatsUsed(XMLCol5)
CROSS APPLY
   Tinfo.XMLCol4.nodes('p:Field[@FieldName="ullThreshold"]') AS UpdateThreshold(XMLCol6)
WHERE
   qp.query_plan.exist('//p:InternalInfo') = 1
   AND
   InternalInfo.XMLCol1.exist('p:EnvColl/p:Recompile') = 1
   AND
   Recomp.XMLCol2.exist('p:Field[@FieldName="wszTable"]') = 1
   AND
   Recomp.XMLCol2.exist('p:ModTrackingInfo') = 1
   AND
   Tinfo.XMLCol4.exist('p:Field[@FieldName="wszStatName"]') = 1
   AND
   TableUsed.XMLCol3.value('@FieldValue','NVarChar(500)') NOT LIKE 'sys%'
ORDER BY
   handles.usecounts DESC, handles.plan_handle DESC;
DBCC TRACEOFF(8666) with no_infomsgs;