sql.sasquatch: SQLServer transaction log writes

I spend a lot of time thinking about high-speed ETL. I also spend a lot of time thinking about DR solutions and backups.

Below you can read how I came to the following question (I don't know the answer yet, and will update this post when I do): are SQL Server's 60k transaction log writes 64k aligned?

*****
Aha! I don't think I'll have to bust out procmon for this after all. Just create a Windows striped volume, put my txlog (and only the txlog) on it, use perfmon to monitor the physical disks/LUNs in the striped volume (Disk Writes/sec, Current Disk Queue Length, Disk Write Bytes/sec), and spin up a workload that pushes the log writer to as many in-flight 60k writes as possible.

If the average write size on the physical volumes is ~60k and the current queue length is 16 or less - awesome! That would mean striping is keeping each write intact rather than splitting it, and that the queue depth on each LUN is lower (so that replication readers, etc. have room in the queue depth of 32 to do their stuff without pushing anything into the OS wait queue).

But if the average write size is ~32k... that would mean that most 60k writes by the log writer are being split into smaller pieces because they are not aligned with the 64k stripes used by Windows striping.

I guess even if the writes aren't 64k aligned, Windows striping may still be useful for my scenarios... but I would have to stripe 4 LUNs together into a striped volume in order to lower the queue length for burdened log writer activity from 32 (with a single LUN) to about 15.
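The perfmon check described above boils down to one division and two thresholds: average write size per LUN is Disk Write Bytes/sec divided by Disk Writes/sec. A minimal sketch, assuming one sample per LUN; the function names and the 4k tolerance are my own hypothetical choices, while the ~60k, ~32k and queue-length-16 thresholds come straight from the text.

```python
KB = 1024

def avg_write_kb(write_bytes_per_sec: float, writes_per_sec: float) -> float:
    """Average write size in KB: Disk Write Bytes/sec divided by Disk Writes/sec."""
    if writes_per_sec == 0:
        return 0.0
    return write_bytes_per_sec / writes_per_sec / KB

def interpret(write_bytes_per_sec: float, writes_per_sec: float,
              current_queue_length: float, tolerance_kb: float = 4.0) -> str:
    """Classify one LUN's perfmon sample (tolerance_kb is an assumed fudge factor)."""
    size_kb = avg_write_kb(write_bytes_per_sec, writes_per_sec)
    if abs(size_kb - 60) <= tolerance_kb and current_queue_length <= 16:
        return "intact"        # 60k log writes reach the LUN whole
    if abs(size_kb - 32) <= tolerance_kb:
        return "split"         # most 60k writes are cut at 64k stripe boundaries
    return "inconclusive"      # not a 60k-write workload yet, or mixed signals

# Example: 500 writes/sec totalling 30,720,000 bytes/sec at queue length 12.
print(interpret(30_720_000, 500, 12))  # intact
```

perfmon also exposes an Avg. Disk Bytes/Write counter for the same PhysicalDisk instances, which gives the average size without doing the division by hand.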

*****

Each SQL Server transaction log can sustain up to 32 concurrent in-flight writes, each up to 60k. To get the fastest ETL, fast transaction log writes at queue length 32 are a necessity. That means... put such a transaction log on its own Windows drive/mounted partition, since the HBA LUN service queue depth is typically 32. Put other files on there, too, and the log writer's in-flight writes might end up in the OS wait queue. If writes wait behind a full service queue of reads, they'll be ESPECIALLY slow. There are other ways to make them especially slow - for example, serializing the in-flight writes through a synchronous SAN replication strategy. Anyhooo...
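The service queue vs. OS wait queue point can be illustrated with a toy model. A minimal sketch, assuming strict FIFO arrival on a single LUN and a service queue depth of 32; real HBA drivers and multipath software can behave differently, and place_ios is a hypothetical name.

```python
from typing import List, Tuple

SERVICE_QUEUE_DEPTH = 32  # typical HBA LUN service queue depth, per the text

def place_ios(arrivals: List[str], depth: int = SERVICE_QUEUE_DEPTH) -> Tuple[List[str], List[str]]:
    """Split I/Os outstanding on one LUN into (HBA service queue, OS wait queue).

    Assumes strict FIFO: the first `depth` outstanding I/Os get service queue
    slots, everything after that waits in the OS queue.
    """
    return arrivals[:depth], arrivals[depth:]

# Txlog alone on its LUN: all 32 in-flight log writes get service slots.
alone = ["log"] * 32
# Txlog sharing a LUN: 8 reads against other files got there first.
shared = ["read"] * 8 + ["log"] * 32

for label, arrivals in (("alone", alone), ("shared", shared)):
    _, os_wait = place_ios(arrivals)
    print(label, "- log writes stuck in OS wait queue:", os_wait.count("log"))
```

Under these assumptions the isolated log has no writes in the OS wait queue, while 8 reads from neighboring files push 8 log writes out of the service queue.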

In massive ETL it's not unusual for the transaction log writer to have 32 writes of 60k each in flight, and to hold off issuing the next write until one of them completes.
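With the log writer capped at 32 outstanding 60k writes, per-write latency directly caps log throughput. By Little's law (throughput = outstanding work / latency) the ceiling is easy to estimate. A minimal sketch: the latencies are hypothetical, and the bound assumes every write is a full 60k and that 32 are always outstanding; in_flight=1 stands in for writes that end up serialized, e.g. by synchronous replication.

```python
def max_log_mb_per_sec(latency_ms: float, in_flight: int = 32, write_kb: int = 60) -> float:
    """Little's law ceiling: throughput = outstanding data / per-write latency."""
    outstanding_mb = in_flight * write_kb / 1024   # 32 x 60k = 1920k = 1.875 MB
    return outstanding_mb / (latency_ms / 1000)

for latency_ms in (0.5, 1.0, 2.0, 5.0):  # hypothetical write latencies
    concurrent = max_log_mb_per_sec(latency_ms)
    serialized = max_log_mb_per_sec(latency_ms, in_flight=1)
    print(f"{latency_ms} ms: <= {concurrent:.0f} MB/s with 32 in flight, "
          f"<= {serialized:.1f} MB/s if serialized")
```

At 1 ms per write this works out to roughly 1875 MB/s with 32 writes in flight versus about 58.6 MB/s if the writes are serialized. That gap is why keeping all 32 in the service queue matters.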

Writes usually go to SAN write cache, and should be acknowledged as soon as they are received into write cache. As such, as long as the write is in the HBA service queue (rather than the OS wait queue), and neither front-end port queue depth, front-end CPU, nor SAN write cache is saturated, writes should be doggone fast already. (The overhead of wire time - or 'wait for wire time' - for synchronous SAN replication also shouldn't be overlooked when evaluating write latency.) So what can be done to improve writes that are already typically pretty fast?

I'm not a fan of using Windows striped volumes for SQL Server data files - the stripe size is fixed at 64k, which thwarts SQL Server's large readahead attempts. But for speeding up transaction log access, striped volumes may be just the thing I need. (Robert Davis - @sqlsoldier - pointed out that a Windows striped volume offers no data redundancy or protection of its own, so it is only safe when there is data protection underneath. I'm only going down this path because underneath the Windows basic disks, whether in a striped Windows volume or not, SAN storage is providing RAID10, RAID5, or RAID-DP protection.)

So... this is where the question of 64k alignment of the 60k writes comes in.

Assume 32 in-flight writes of 60k each, issued by the SQL Server log writer to a txlog that sits by itself on a Windows striped volume composed of two equally sized basic disks. If the writes are not 64k aligned like the Windows stripes, the write activity passed down through the HBA will break down like the chart below. It's painful to look at, I know; I haven't figured out a less confusing way to represent it yet. Basically, each 64k stripe on either basic disk will contain either a full 60k transaction log write plus 4k of an adjacent transaction log write, or two partial transaction log writes. All told, 32 writes get broken down into 60 writes! (By the way, this same idea is why it's important to have Windows drives partitioned so that their start aligns with the expected striping.)

Write  Basic Disk A  Basic Disk B
1      60k           -
2      4k            56k
3      52k           8k
4      12k           48k
5      44k           16k
6      20k           40k
7      36k           24k
8      28k           32k
9      28k           32k
10     36k           24k
11     20k           40k
12     44k           16k
13     12k           48k
14     52k           8k
15     4k            56k
16     60k           -
17     -             60k
18     56k           4k
19     8k            52k
20     48k           12k
21     16k           44k
22     40k           20k
23     24k           36k
24     32k           28k
25     32k           28k
26     24k           36k
27     40k           20k
28     16k           44k
29     48k           12k
30     8k            52k
31     56k           4k
32     -             60k

So by striping the txlog, instead of 1 LUN with 32 writes and 1920k of write bytes in flight, it's 2 LUNs, each with 30 writes and 960k of write bytes outstanding. That's a 50% reduction in outstanding write bytes per LUN, but only a 6% reduction in concurrent write IOs per LUN (from 32 to 30).

On the other hand, if the writes are 64k aligned, it'd be an even split: 16 writes and 960k of write bytes outstanding to each LUN, a 50% reduction in both outstanding writes and outstanding write bytes.
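The split arithmetic for both cases can be checked with a small simulation. A minimal sketch, assuming all 32 writes are in flight at once, the first write starts at the beginning of a stripe, the volume round-robins fixed 64k stripe units across the disks, and the writes are either back-to-back in the log (unaligned) or each start on their own 64k boundary (aligned). split_write and per_disk_load are hypothetical helper names.

```python
from typing import List, Tuple

STRIPE_KB = 64  # fixed Windows striped volume stripe unit

def split_write(offset_kb: int, length_kb: int, n_disks: int) -> List[Tuple[int, int]]:
    """Return (disk, kb) pieces for one write, cut at 64k stripe boundaries."""
    pieces = []
    end = offset_kb + length_kb
    while offset_kb < end:
        stripe = offset_kb // STRIPE_KB
        chunk = min(end, (stripe + 1) * STRIPE_KB) - offset_kb
        pieces.append((stripe % n_disks, chunk))  # stripes rotate round-robin over disks
        offset_kb += chunk
    return pieces

def per_disk_load(n_writes: int = 32, write_kb: int = 60, n_disks: int = 2,
                  aligned: bool = False) -> Tuple[List[int], List[int]]:
    """Outstanding I/O count and KB per disk for n_writes in-flight log writes."""
    step = STRIPE_KB if aligned else write_kb  # aligned: each write starts a new stripe
    ios, kb = [0] * n_disks, [0] * n_disks
    for i in range(n_writes):
        for disk, chunk in split_write(i * step, write_kb, n_disks):
            ios[disk] += 1
            kb[disk] += chunk
    return ios, kb

print(per_disk_load())              # ([30, 30], [960, 960])  unaligned, 2 disks
print(per_disk_load(aligned=True))  # ([16, 16], [960, 960])  64k aligned, 2 disks
print(per_disk_load(n_disks=4))     # ([16, 16, 14, 14], [512, 512, 448, 448])
```

With four LUNs the unaligned case comes out to 16, 16, 14 and 14 outstanding pieces per LUN, an average of 15, which is in line with the four-LUN estimate earlier.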

So unless someone knows the answer, I guess we'll be busting out procmon and tracking transaction log write offsets once we crank the workload up to consistently hit 60k writes. If they are 64k aligned, I'll be happy - I can blog in the near future about Windows striped volumes getting me out of a few jams. If not... it'll probably be back to the drawing board.
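Once procmon is capturing the log writer's writes to the log file, the alignment check itself is simple arithmetic on each write's offset and length. A minimal sketch, assuming the offsets and lengths have already been pulled out of the capture into (offset, length) byte pairs. That input shape and the alignment_report name are hypothetical, and parsing the capture is not shown. These are file offsets. They only say something about stripe alignment if the log file itself starts on a 64k boundary of the striped volume.

```python
from typing import Iterable, Tuple

BOUNDARY = 64 * 1024  # 64k stripe unit, in bytes

def alignment_report(writes: Iterable[Tuple[int, int]]) -> dict:
    """Count writes that start on a 64k boundary and writes that cross one."""
    total = aligned = crossing = 0
    for offset, length in writes:
        total += 1
        if offset % BOUNDARY == 0:
            aligned += 1
        # Crosses a boundary if the first and last byte land in different 64k units.
        if offset // BOUNDARY != (offset + length - 1) // BOUNDARY:
            crossing += 1
    return {"writes": total, "start_64k_aligned": aligned, "cross_64k_boundary": crossing}

# Hypothetical sample: four back-to-back 60k writes starting at file offset 0.
sample = [(i * 61440, 61440) for i in range(4)]
print(alignment_report(sample))
# {'writes': 4, 'start_64k_aligned': 1, 'cross_64k_boundary': 3}
```

If most 60k writes land in the cross_64k_boundary bucket, the striped volume is splitting them, as in the chart above.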


Friday, April 4, 2014

SQLServer transaction log writes - 64k aligned?
