Stop me if you’ve heard this one before. :-)
I keep searching for a clear but complete way
to tell this story – if you read enough of my posts you'll probably see various attempts to explain it – all looking for something that gives folks an 'aha!' moment.
Below are side-by-side comparisons of two 10-minute periods on
the same system, a SQL Server VMware VM using fibre channel SAN storage. The graphs below are from the perfmon
LogicalDisk(_Total) counters, captured at a 15 second interval.
The first time period sees between 200 and 450 MB/second of
disk throughput, with average read latency remaining below 6 ms. That's great.
The second time period, though, doesn't look healthy: less than 60 MB/second of disk throughput,
with average read latency ranging from 400 ms to just over 2 full seconds. What's the deal?
Maybe the second time period is overloaded with IOPS? Nope – not according to the comparison
below. The healthy first time period
remains above 1600 IOPS for the most part, while the second time period never
goes above 1600 IOPS.
Weird, huh? The same system averaging about 2000 IOPS and
over 300 MB/second with read latency below 6 ms in the morning is struggling to do
1000 IOPS and 60 MB/second.
The second time period has more write activity – the first time
period has almost no write activity.
Maybe the system is penalized for writes? Synchronous replication could do that. Insufficient write cache could do that. Copy-on-write could do that. Bypassing write cache could do that
(especially if RAID5 or RAID6). But all
of that requires insight into the underlying SAN configuration and
activity. Anything else to look at in
the VM guest before descending to the physical host and the SAN? One avenue would be to split out metrics for
each Windows guest logical drive – perhaps there are significant differences
among them? I won't include those graphs
here – I did look, and although there were differences, they were not the most
significant issue on this system.
I like to refer to 'outstanding IO' as a missing dimension
in SQL Server performance analysis and system design. I hope examples like this explain why.
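Perfmon isn't the only way to see outstanding IO, by the way – SQL Server will happily report its own pending IO requests. Here's a minimal sketch (just a starting point): joining the pending IO DMV to the virtual file stats function puts a database and file on each outstanding request.

```sql
-- Outstanding IO as SQL Server itself sees it: one row per pending
-- asynchronous IO request, joined to file stats to identify the file.
SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.file_id,
       pio.io_type,
       pio.io_pending,           -- 1 = still waiting on the OS
       pio.io_pending_ms_ticks   -- how long it has been outstanding
FROM sys.dm_io_pending_io_requests AS pio
INNER JOIN sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
        ON pio.io_handle = vfs.file_handle;
```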
LogicalDisk perfmon counters allow us to look at 'Current
Disk Queue Length'. Many folks believe that,
since SAN LUNs are composed of many underlying storage devices, the 'Current
Disk Queue Length' metric is not very meaningful.
I hope to disabuse performance investigators of that notion. :-)
At a total disk queue length under 16, the system seemed to
perform great. But with a queue length
above 100 – and up to nearly 800 – the system struggled.
The low queue length of the first time period is
expected: the first time period was a SQL
Server backup.
If you stare at perfmon or other monitoring of SQL Server
backup and checkdb long enough, some interesting patterns arise. Basically, both backup and checkdb use a
fixed maximum read queue length as they progress. For the SQL Server backup, the maximum number
of outstanding reads is determined by the number of backup buffers. DBCC CHECKDB uses the number of Windows drives
that the database data files occupy to plan its read queue length (unless trace
flag 2549 is enabled globally – then it’s the number of files). Tricky, huh?
Want to speed up backup or checkdb with physical_only? Often that can be done by increasing their read
queue length! Give backup more buffers,
or use more files (and potentially more LUNs) for the database contents for
checkdb. I wasn't even planning to go
on that tangent – but since I'm here, a rough sketch of both knobs is below.
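(The database name, backup path, and numbers are placeholders – tune them for your own system.)

```sql
-- More backup buffers = more outstanding reads during the backup.
BACKUP DATABASE [MyBigDatabase]
TO DISK = N'X:\backup\MyBigDatabase.bak'
WITH BUFFERCOUNT = 128,          -- raises the backup's read queue length
     MAXTRANSFERSIZE = 4194304;  -- 4 MB per IO, the maximum allowed

-- Trace flag 2549: checkdb plans its reads per database file
-- rather than per Windows drive letter.
DBCC TRACEON (2549, -1);         -- -1 = enable globally
DBCC CHECKDB (N'MyBigDatabase') WITH PHYSICAL_ONLY;
```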
We’ll pick up with more stuff about backup and checkdb performance
another time.
The second time period – wow! Certainly not backup or checkdb. If you run a system like mine, which
typically is engaged by more parallel queries than cores, with each query
fairly data-hungry, this can be important: each SQL Server worker thread can perform its own reads, and each worker thread can be responsible for
more than one asynchronous read. Four
vCPUs? The maximum outstanding read liability
isn't directly tied to the number of vCPUs.
It is more directly tied to the number of active workers. And SQL Server Enterprise Edition always has
at least 512 workers by default.
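Want to see that ceiling on your own instance? It's one query away – this reports the configured 'max worker threads' value, whether set explicitly or calculated by SQL Server:

```sql
-- The worker thread ceiling for this instance.
-- With the default setting of 0, SQL Server calculates this value
-- from the CPU count and architecture.
SELECT max_workers_count
FROM sys.dm_os_sys_info;
```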
At this point, it was time to examine the guest vHBA adapter
configuration. I've mentioned before
that the LSI adapter does not handle high queue length well – the pvscsi vHBA
is much better at that. The LSI adapter
can submit IOs up to 32 MB in size – but here's a secret that isn't too well
kept: SQL Server won't do any disk IOs that large. I've heard rumors that SQL Server can perform
8 MB reads – but the largest read I've
ever seen from SQL Server was 4 MB.
Unfortunately, the pvscsi driver will only allow disk IO up to
512 KB. But in my workload, the larger
the IO, the less frequent it is, due to fragmentation and interleaving of tables and
indexes. So sacrificing 1, 2, and 4 MB
reads still has a potential performance cost – but in every case I've worked
on with DSS workloads, the trade from the LSI adapter to pvscsi was a significant
net gain.
In this case, the C: boot drive was left assigned to the LSI
adapter, and the database data file and transaction log volumes were assigned
to a pvscsi vHBA that was added to the VM.
A Windows registry value was used to increase the pvscsi vHBA adapter
queue depth from its default of 256 to 1024 (the DriverParameter value described in the VMware KB linked below).
Look at that! The target workflow (I had to go earlier in the
morning to get a 10 minute sample, because it was finishing WAY faster now :-) ) used to have less
throughput than the backup. Now it's got
more for part of the 10 minute period!
(This implies that adding more backup buffers would probably
speed the backup up, too.)
Read latency has decreased incredibly – it now stays
comfortably below 100 ms. This workflow
is reasonably tolerant of read latency up to 100 ms, as long as throughput is
high.
Wow – look at that jump in IOPS for the target
workflow! Under 1600 previously, IOPS
now peak over 9000.
I’m glad we increased the pvscsi vHBA queue depth from 256
to 1024 – even with the performance improvement, queue length nearly reaches
800.
Below are some of my favorite references when it's time to add a pvscsi vHBA to a VM.
*****
Retrofit a VM with the VMware Paravirtual SCSI Driver
Technobabble by Klee
David Klee
http://www.davidklee.net/2014/01/08/retrofit-a-vm-with-the-vmware-paravirtual-scsi-driver/
Large-scale workloads with intensive I/O patterns might require queue depths significantly greater than Paravirtual SCSI default values (2053145)
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2053145