Sunday, August 17, 2014

SQL Server 2014, mount points, checkdb & trace flag 2549

Update - although I previously intended to add graphs to this post, I've decided instead to add a new post with the pretty graphs.  Because... well, I'm trying to increase traffic a bit :-) 
See the new post here:
http://sql-sasquatch.blogspot.com/2014/09/sql-server-2014-rtm-ignores-trace-flag.html

I'll add fancy graphs when I come back from the north woods.  But I wanted to get the word out, so here's a text only....

'DBCC CHECKDB with physical_only' read behavior is one of the few places in SQL Server where use of mount points instead of drive letters matters.

Some background reading:



The kb article introducing trace flags 2549 & 2562 explains that checkdb plans reads by default per drive letter. If the checkdb target data/files are distributed across more drive letters, the checkdb read pace will be greater.

That makes sense: with more LUNs, the greater aggregate LUN service queue depth can accommodate more simultaneous inflight disk ios.  But, drive letters aren't the only way to distribute SQL Server files across more LUNs.  The large size of some databases, and the desire to standardize directory structure across numerous  server hosts/vms have made the use of mount points more popular for SQL Server.

But there's a problem with the default behavior: if data files are on multiple mount points under the same drive letter, the checkdb read pace will be the same as if all files were on a single Windows drive.  One drive letter, same pace even if there are more LUNs underneath.  The extra IO capacity and extra aggregate queue depth will not be leveraged by checkdb.  The end result is a slower pace for checkdb than possible, and a longer elapsed time.

Enter trace flag 2549.  Use that trace flag (assuming the code base includes the support for the trace flag) and checkdb will plan reads for each database file as if on own LUN.  As an example of the result: given the same contents of the same database files, checkdb with physical_only should have the same elapsed time whether one drive letter per file and LUN, or all files on own mount point all under same drive letter.

The real big problem: the trace flag seems to be ignored by SQL Server 2014.  (Full disclosure: haven't tested against cu1 or cu2 yet, only rtm.)

My test database is eight files, each ~350 gb.  Each file is on its own LUN. But these LUNs are mount points, in the directory structure under the C drive.

Test server is 24 physical cores, 2 sockets.  512 gb server ram.  Vmware vm, with all resources given to the single vm - 24 vcpu.

Bandwidth to SAN from this server is limited by the SAN to 800 mb/sec.  (Shared storage, in such cases bandwidth limits or other qos such as cache partitioning or IOPs fencing is common).

I performed sql server backup to NUL drive to check read bytes/sec maximum. Just over 700 mb/sec.  Backup completed in 1 hour, 9 minutes.

Checkdb with physical_only took over 4 hours!

Perfmon quickly revealed why: there was intense activity against one LUN at a time for up to 7 minutes at a time.  Read bytes/sec and current disk queue length both showed this behavior.

When trace flag 2562 was enabled, so that checktable ran as a single batch (this database has well over 12000 hobts), the behavior was even more exaggerated.  With T2562, there was a single intense read activity period for approximately 30 minutes against each of the individual LUNs until the 4 hours of the checkdb had completed.

The graphs looked the same whether trace flag 2549 was enabled or not.  

After staring at the graphs for a while, I realized that trace flag 2549 was simply being ignored. No matter what, the 8 mount points were being treated by the checkdb read planning as if on a single LUN (the single drive letter that was directory parent for all of them).

To confirm my diagnosis, I detached the database, and added a distinct drive letter S-Z to each of the mount points.  I reattached the database, using the new drive letters for access path.

Even though that was the only change made, checkdb with physical_only now completed in about half the time: 1 hour 57 with trace flag 2562, and 2 hours seven minutes without trace flag 2562.

For some reason, peak checkdb throughput was still under 400 mb/sec, while backup peak throughput approached the system limit by staying near 700 mb/sec.  That'll have to be another day's investigation.

I think that relegating checkdb read pace to a trace flag is risky, and belies its importance.  The largest databases will be those that tend to use mount points, and if we want them to be serious about checkdb, then checkdb doggone better finish! :-) 10-12 TB at the slower pace of ~200 mb/sec may take over 24 hours!

The volume stats/virtual file stats dmvs now know the mount points/LUNs, so... so does SQL Server.  No need to guess that there are only as few LUNs as drive letters.  No need to guess that there are as many LUNs as database files.  Time to catch up a little bit :-) Checkdb is important, let's treat it that way.





No comments:

Post a Comment