Thursday, December 26, 2013

Perfmon "Current Disk Queue Length" to LUN w/single Tx Log; writes bound by 32 per txlog limit


I was going to ask this question on #sqlhelp:Are #SQLServer outstanding txlog writes limited by number of log buffers/txlog?  If so, is buffers/txlog fixed or configurable?

But by chance I found the answer:
Slide 6 of this deck from Microsoft's Ewan Fairweather addressed the question.
http://sqlbits.com/Downloads/246/Designing%20Highly%20Scalable%20OLTP%20Systems.pptx
Slide 6 says that the transaction log has 127 linked buffers and allows 32 outstanding IOs.
**Update - I've since learned that there are 128 txlog buffers per SQL Server txlog, with a maximum of 32 of them allowed for async inflight writes to the transaction log. End update** 

Ewan's presentation seems to be focused on SQL Server 2008R2/Windows 2008 R2 - but that's fine because so am I at the moment :)

Paul Randal also mentions the 32 outstanding write IOs per log file in his post "Trimming More Transaction Log Fat".
http://www.sqlperformance.com/2013/01/io-subsystem/trimming-more-transaction-log-fat

In the graph below, I want to drive up CPU utilization.  I've already made sure that the current CPU utilization is almost all SQL Server with very little other CPU consumed, and that there is a strong correlation between CPU utilized and logical reads on the system.  So to drive up the rate of database work, I've got to drive up CPU utilization - don't want to waste SQL Server licenses :) .

There are two main databases on this SQL Server instance, and each has data files spread over multiple volumes/LUNs.  Tempdb unfortunately has multiple datafiles (since the server has 48 logical CPUs) all residing on the same LUN.  So tempdb suffers from occasionally stressing its IO queue, as is evident below.  Windows will allow a total of 256 IO requests per LUN either in the HBA service queue for the LUN or in a Windows wait queue for the LUN.  Tempdb gets close to that limit :)  It actually used to hit that limit for a good chunk of time... so the HBA maximum transfer size was increased from the Windows default 512 kb to 2 mb.  That eased some queuing pressure on tempdb and on other LUNs as well.

Current Disk Queue Length and %CPU Busy



The somewhat surprising element to me was that, with the exception of a single operation which occurs at 9:35 am (I think this is a transaction log expansion though it could be the wrap-around) activity to the LUNs here artfully named "Logs Vol1" and "Logs Vol2" has a maximum value of 32 for "Current Disk Queue Length".  **update 12/30/2013** I previously thought that the peak in activity for one of the LUNs was due to transaction log activity like expansion/wraparound/shrink.  Probably not in this case.  Its more likely just that the LUN in question hosted more than just the single transaction log, and some of those outstanding IOs were for other LUN contents. **end update**

Current Disk Queue Length and %CPU Busy


That might be easier to see if I shrink the time period a bit and shrink the scale of the axis for "Current Disk Queue Length".  Of course, my choice of colors might make it unreadable instead - I'm never sure.

In this case, that happens to be the service queue depth for these LUNs.  (Its default service queue depth in most cases.)  But I suspect that SQL Server is not basing its activity on the service queue depth for the LUNs - rather I think its something like the number of uncommitted transaction log buffers allowed that is actually the limit.  Why wrestle to understand the difference?  Well... if I restricted the service queue depth to 16 instead of 32, and SQL Server is limiting its behavior based on outstanding transaction log buffers, the "Current Disk Queue Length" would continue looking the same, even though write latency measured by perfmon and overall performance would likely degrade.  On the other hand, if SQL Server really is taking the LUN service queue depth into account... if this was a SQL Server instance on top of IBM XIV storage, for example, I could probably crank the queue depth WAY up.  The XIV allows lots of outstanding IOs per LUN... although a really long service queue to an XIV may increase average service time, in my case its all about throughput.  So increasing throughput is acceptable even at the expense of average latency.

But, according to Ewan's presentation (and Paul Randal's post)... 32 outstanding writes per txlog is the limit, so I've got to find a way to bring down the write latency with that in mind.


No comments:

Post a Comment