Monday, September 22, 2014

I got my eye on you: SOS_BLOCKALLOCPARTIALLIST

CPU utilization is certainly not high during the RESOURCE_SEMAPHORE_QUERY_COMPILE wait logjam*.

But... spinlock SOS_BLOCKALLOCPARTIALLIST shows a pretty high level of activity during that time range.  Neither Google nor DuckDuckGo knows much about that spinlock type.  Maybe in the future I will :-)
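
If you want to keep an eye on it too, the sys.dm_os_spinlock_stats DMV (undocumented at the time) can be snapshotted before and after the suspect time range - a minimal sketch:

    -- Counters are cumulative since instance startup; diff two snapshots across the time range.
    SELECT name, collisions, spins, spins_per_collision, backoffs
    FROM sys.dm_os_spinlock_stats
    WHERE name = N'SOS_BLOCKALLOCPARTIALLIST';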

*I describe that here...
http://sql-sasquatch.blogspot.com/2014/09/resourcesemaphorequerycompile-logjam.html

Saturday, September 20, 2014

SQL Server 2014 RTM Ignores Trace Flag 2549 - Bad News for VLDB checkdb

Update 9/30/2014: Rats - I was hoping more folks would be ranting about this one.  There is a connect item now, with two votes!!  Maybe I'll have to start reaching out to SAP folks...
https://connect.microsoft.com/SQLServer/feedback/details/982532/sql-server-2014-rtm-ignores-trace-flag-2549

Full disclosure: This post reports behavior observed on Microsoft SQL Server 2014 - 12.0.2000.8 (X64).  This is SQL Server 2014 RTM.  I believe the behavior represents a significant regression in "dbcc checkdb with physical_only", so I expect it may change soon.  When that happens, I will update this post with a link to the KB article addressing it. This post is a follow-up to a previous post found here:
http://sql-sasquatch.blogspot.com/2014/08/sql-server-2014-mount-points-checkdb.html

Running integrity checks on relational database data is important.  It's so important that I feel pretty strongly it should be included in a lot more performance and benchmark scenarios than it currently is.

Kevin Closson* often points out two things which I fully agree with - the following is a loose paraphrase:
1. A platform that can service OLTP workload at high volume with great performance can probably service almost any workload at high performance.
2. If you think you have an exclusively sequential workload at the storage device level, you're probably wrong.

Still... integrity checks are important.  Database backups are important.  Neither of them is a random workload from the database perspective - certainly not in the same way that user-driven workload can be considered random.  So... just because your system achieves super-awesome response times at high levels of transactional throughput, I'm not willing to assume that integrity checks or backups on the system are optimized.  You don't know until you know, and integrity checks and backups are both very important things to know.  OK... enough soapbox.

I'm going to present the events and graphs here in a different order than they occurred, for two reasons: 1) I think it will make the behavior and its import more clear; 2) I think I look smarter if I pretend I performed the tests in the order most easily explained, skipping the bumbling in-between that occurred as I struggled to understand exactly what was happening.

First, a SQL Server backup to NUL is often proposed as a means to profile the read behavior of a system.  I think it's a fine tool to include in a disk behavior profile of a system - and doing so can give a strong start to optimizing backups on the system.  So I got this system... 2.25 TB database, backup to NUL in 1 hour, 8 minutes, 59 seconds.
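
For reference, a sketch of the command - the database name is hypothetical, and COPY_ONLY keeps the throwaway backup from disturbing the differential base:

    BACKUP DATABASE [MyBigDatabase]  -- hypothetical name
    TO DISK = N'NUL'                 -- discard the stream; exercises the read path + backup overhead only
    WITH COPY_ONLY, STATS = 5;       -- COPY_ONLY so the differential base is untouched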

[Graph: read bytes/sec during the backup to NUL - 2.25 TB in 1:08:59]

Read more about the backup to NUL on this system here:
http://sql-sasquatch.blogspot.com/2014/08/why-read-bytessec-valleys-in-sql-server.html

There are eight equally sized database files, each file on its own LUN.  When these LUNs are associated with their own Windows drive letters, 'dbcc checkdb with physical_only' with maxdop 4 looks like this.
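
A sketch of the command behind these graphs (hypothetical database name; note that 2014 RTM DBCC CHECKDB has no MAXDOP option, so DOP was capped outside the command, for example with a Resource Governor workload group MAX_DOP setting):

    DBCC CHECKDB (N'MyBigDatabase')  -- hypothetical name
    WITH PHYSICAL_ONLY,              -- page-level checks only; skips the logical checks
         NO_INFOMSGS;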

[Graph: 'dbcc checkdb with physical_only' read throughput - eight files, each on its own drive letter]

What if this same database is attached with its files accessed via mount points rather than individual drive letters? (Windows Server 2012 allows mount points to have drive letters simultaneously associated with them.)  In this case, trace flag 2549 has been set so that each database file should be considered on its own drive path, rather than the default behavior of considering only each distinct Windows drive letter to be a separate drive path.
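
Enabling and verifying the flag globally is straightforward (it can also be set as a startup parameter):

    DBCC TRACEON (2549, -1);      -- -1 = instance-wide: treat each file as if on its own drive path
    DBCC TRACESTATUS (2549, -1);  -- confirm it shows as enabled globally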

[Graph: the same checkdb with files accessed via mount points, trace flag 2549 set]

Yikes!!!  Read throughput is way down, and elapsed time is much longer.  There's a particular stretch of time that is especially intriguing, highlighted below.

[Graph: the same run, highlighting a stretch where all reads come from a single LUN at a time]

Huhh.  For a significant period of time, all of the read bytes are coming from one LUN at a time.  Let's just skip over the hours this completely confused me, and the email I sent to my colleagues complaining that things didn't make sense anymore :-)

Let's get right to the part where I decided to run 'dbcc checkdb with physical_only' against the exact same database, without trace flag 2549 enabled.
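
That run just flips the flag back off first - same checkdb sketch as before, hypothetical database name:

    DBCC TRACEOFF (2549, -1);  -- back to default: only distinct drive letters count as separate drive paths
    DBCC CHECKDB (N'MyBigDatabase') WITH PHYSICAL_ONLY, NO_INFOMSGS;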

[Graph: the same checkdb via mount points, without trace flag 2549]

So with or without trace flag 2549, read throughput of 'dbcc checkdb with physical_only' looks nearly identical... and both look extremely different from the same operation when the database files are each accessed via their own drive letter.

It's because SQL Server 2014 RTM is ignoring trace flag 2549.  And that's a shame.  If there are ANY databases out there for which I really want checkdb to complete, it's the ones so large that they must use mount points after running out of Windows drive letters.

My plea to Microsoft is to enhance the checkdb/checktable read planning, rather than simply fixing the regression.  I think this behavior is too important to relegate to a trace flag - leave it as a trace flag and it seems vulnerable to being left off test plans.  With sys.dm_io_virtual_file_stats, there's no need for checkdb/checktable to guess which files are on separate LUNs: SQL Server already knows.
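
Something like the following (a sketch, hypothetical database name) already exposes each file's physical path and cumulative read activity - plenty of information to plan reads per drive path instead of per drive letter:

    SELECT mf.physical_name,        -- reveals the mount point / drive path for each file
           vfs.num_of_reads,
           vfs.num_of_bytes_read,
           vfs.io_stall_read_ms
    FROM sys.dm_io_virtual_file_stats(DB_ID(N'MyBigDatabase'), NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id = vfs.file_id;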


*****

So maybe you are curious about the sections highlighted in yellow above, where large amounts of data are being read from one drive at a time when the database files are accessed via mount points.  It kinda makes sense, since SQL Server checkdb is thinking "well, looks like it's all on the C drive, so I have to plan all reads as if they go through a single disk service queue."

But if that was all there was to it... wouldn't other large portions of the checkdb look similar? Well... first of all, just wait 'til I post some graphs from adding trace flag 2562!! :-)  But also, consider the highlighted portion below in the "good" graph from accessing all files via their own Windows drive letters.

[Graph: the drive-letter ("good") run again, with similar single-LUN stretches highlighted]

Recall how checkdb batches hobts during its checktable phase - taking a table with all of its hobts, then another, until the total count approaches but doesn't exceed 512.  After extensive testing with checktable against individual tables, I'm convinced all the yellow highlighted areas are parts of a batch which happens to include a highly contiguous, minimally fragmented ~500 GB table that includes a varchar(8000).  I'll leave that explanation for another day.
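
Those single-table tests were along these lines - a sketch; the table name is hypothetical:

    DBCC CHECKTABLE (N'dbo.BigContiguousTable')  -- stand-in for the ~500 GB varchar(8000) table
    WITH PHYSICAL_ONLY, NO_INFOMSGS;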

*****
*Kevin Closson is a performance engineer active in blogging (http://kevinclosson.net/) and on Twitter (@kevinclosson).

Many of his blog posts are related to Oracle, typically running on Linux x86 servers and pursuing maximum throughput at low latency ("low latency commercial computing").  But even if you primarily deal with AIX on IBM Power, or primarily deal with SQL Server... I still recommend checking out his stuff.  I read Kevin's blog and many others, even when they are not directly related to my current projects, to learn new testing and analysis methods I can apply to the platforms and projects I am working on.