Wednesday, February 4, 2015

SQL Server transaction log write latency, log write waits

A friend recently asked what minimum transaction log write latency I see on systems.  You know me... never one for easy answers :-)

On the systems I work with, we use a fairly long-running, query concurrency-controlled ETL each night.  Write latency to the transaction logs is a fairly volatile metric - I recommend limiting the volatility by eliminating queuing possibilities in the SQL Server physical host or guest.  Putting a single busy transaction log on a LUN means that it will have the LUN queue depth all to itself for the up-to-32 outstanding writes issued against the transaction log.  Ensuring that HBA adapter queue depth can sustain all read and write activity for the associated LUNs eliminates queuing at that level.

But, queuing at the storage controller can also effect write latency.  (So can flow control between server and storage.) Also, write cache conditions may result in throttling - especially if write cache becomes saturated.

So... here are some graphs :-)  On the systems I work with, we want to keep transaction log LUN write latency between 1 ms and 5 ms to keep everything happy.  If "write log waits" are minimized - and especially making sure that "log buffer waits" are eliminated - there is little benefit to my workload by continuing to lower the transaction log write latency.  It'd be different if I were working on high transaction volume OLTP.  But this ETL is high volume, low commit.  That can be seen below in the graphs by noticing that writes to the transaction log LUN average 60k quite regularly - the transaction log buffers are usually flushing because they are full and must be written, rather than usually being written due to transactions committing.







ps. I wondered about the peaks in write latency in the timeperiod.  Were they due to host queuing?  Didn't seem to be - checked LUN level queuing and total queuing across all LUNs during the timeperiod and not much correlation.  I suspect queuing at the SAN storage controller due to activity coming in from other servers at that time.