Tuesday, April 21, 2015

3rd Try Charm: Why does pvscsi rather than LSI matter so much for SQL Server on VMware?


I've written two blog posts recently on the performance benefits of the VMware pvscsi vHBA over the LSI vHBA.

March 23, 2015
SQL Server on VMware: LSI vHBA vs pvscsi vHBA
http://sql-sasquatch.blogspot.com/2015/03/sql-server-on-vmware-lsi-vhba-vs-pvscsi.html

April 7, 2015
Another SQL Server VMware LSI vs pvscsi vHBA blog post
http://sql-sasquatch.blogspot.com/2015/04/another-sql-server-vmware-lsi-vs-pvscsi.html

These posts give details from different systems (although running a similar workload), and claim a roughly tenfold improvement in peak throughput and peak disk response times from switching the SQL Server LUNs from the default LSI vHBA to the pvscsi vHBA.  That sounds a little fishy, doesn't it?  Especially if you are familiar with...

Achieving a Million I/O Operations per Second from a Single VMware vSphere® 5.0 Host
http://www.vmware.com/files/pdf/1M-iops-perf-vsphere5.pdf

Page 10 of the performance study above includes the following text.
"… a PVSCSI adapter provides 8% better throughput at 10% lower CPU cost."


That is a much more modest (and probably more believable) claim than a 10x performance benefit.  What gives?

Here are a few details about the mechanics:
The LSI vHBA has an adapter queue depth of 128, which cannot be increased. Its LUN queue depth cannot be increased from the default of 32.
The pvscsi vHBA has a default adapter queue depth of 256 and a default LUN queue depth of 64.  With Windows registry settings, the adapter queue depth can be increased to 1024 and the LUN queue depth to 256.
http://www.pearsonitcertification.com/articles/article.aspx?p=2240989&seqNum=3
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2053145


And here's a detail about the testing that I just happened to come across in the Longwhiteclouds blog.


"Maximum queue depth supported by LSI driver (in guest) cannot be changed. So to keep the aggregate outstanding I/Os per controller lower than the max queue depth we had to use 16 OIOs per vDisk. To have a fair comparison between LSI and pvSCSI, second test also had 16 OIOs per vDisk for pvSCSI as well. " Chethan Kumar, author of the VMware paper, as quoted on Longwhiteclouds.

So, comparison testing was done within the queue depth constraints of the LSI vHBA.  But, in the case of these Enterprise Edition SQL Server workloads, the number of outstanding IOs would often exceed 600 and microbursts as high as 1000 outstanding IOs occurred.  That's well outside the LSI adapter queue depth, and the queuing penalty can be high.  Even with 4 LSI adapters in a VM, the aggregate adapter queue depth would be only 512.
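To put rough numbers on that, here is a minimal Python sketch of the arithmetic. The queue depths and burst sizes are the ones quoted in this post; the function names and the assumption that outstanding IOs spread evenly across adapters are mine, purely for illustration.

```python
# Adapter queue depths as quoted in this post.
LSI_ADAPTER_QUEUE_DEPTH = 128        # fixed, cannot be increased
PVSCSI_ADAPTER_QUEUE_DEPTH = 256     # default
PVSCSI_ADAPTER_QUEUE_DEPTH_MAX = 1024  # after the Windows registry change


def aggregate_queue_depth(adapters: int, depth_per_adapter: int) -> int:
    """Total IOs the adapters in a VM can hold in their queues at once."""
    return adapters * depth_per_adapter


def overflow(outstanding_ios: int, adapters: int, depth_per_adapter: int) -> int:
    """Outstanding IOs that do not fit in the adapter queues.

    Assumption (hypothetical simplification): IOs are spread evenly,
    so only the aggregate depth matters.
    """
    return max(0, outstanding_ios - aggregate_queue_depth(adapters, depth_per_adapter))


if __name__ == "__main__":
    configs = [
        ("4 x LSI", 4, LSI_ADAPTER_QUEUE_DEPTH),
        ("4 x pvscsi (default)", 4, PVSCSI_ADAPTER_QUEUE_DEPTH),
        ("1 x pvscsi (max)", 1, PVSCSI_ADAPTER_QUEUE_DEPTH_MAX),
    ]
    for label, adapters, depth in configs:
        for burst in (600, 1000):
            waiting = overflow(burst, adapters, depth)
            print(f"{label}: burst of {burst} IOs leaves {waiting} waiting")
```

With four LSI adapters, a burst of 600 outstanding IOs leaves 88 with nowhere to queue at the adapter level, and a burst of 1000 leaves 488. Under the same simplification, four pvscsi adapters at their default depth hold either burst.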

If a SQL Server workload doesn't burst above 32 outstanding IOs per LUN or 128 outstanding IOs per vHBA, the change to pvscsi would most likely bring rather modest performance benefits, along the lines of the 8% better throughput at 10% lower CPU cost indicated in the whitepaper.  In fact, at that low level of outstanding IO... maybe there would be a slight performance decline.  That's because the LSI vHBA can allow an IO request up to 32 MB in size.  SQL Server won't (yet) perform disk IO that large.  The largest disk IO I've seen from SQL Server has been 4 MB.*  The pvscsi vHBA currently allows a maximum disk IO size of 512 KB.
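A small Python sketch of why IO size could matter here. It assumes, and this is my simplification rather than anything documented in the sources above, that a request larger than the adapter's maximum is simply split into pieces no larger than that maximum. The size limits are the ones quoted in this post; the function name is hypothetical.

```python
KB = 1024
MB = 1024 * KB

# Maximum IO sizes as quoted in this post.
LSI_MAX_IO_BYTES = 32 * MB
PVSCSI_MAX_IO_BYTES = 512 * KB


def requests_needed(io_bytes: int, max_io_bytes: int) -> int:
    """Number of adapter-level requests for one IO.

    Assumption: an oversized IO is split into chunks of at most
    max_io_bytes (ceiling division).
    """
    return -(-io_bytes // max_io_bytes)


if __name__ == "__main__":
    for size_mb in (4, 8):
        io_bytes = size_mb * MB
        lsi = requests_needed(io_bytes, LSI_MAX_IO_BYTES)
        pvscsi = requests_needed(io_bytes, PVSCSI_MAX_IO_BYTES)
        print(f"{size_mb} MB IO: LSI {lsi} request(s), pvscsi {pvscsi} request(s)")
```

Under that assumption, a 4 MB IO stays as one request on LSI but becomes 8 on pvscsi, and an 8 MB read becomes 16.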

However, really large disk IOs from SQL Server are in my experience fairly rare, and high aggregate queue length is more common.  For that reason, I heartily recommend using the pvscsi vHBA for SQL Server VMs.  Retaining the LSI vHBA for the boot drive is common, even when pvscsi vHBAs are added for database LUNs.  I've got nothing against that approach. But it's important to ensure that a SQL Server VM can handle the outstanding IO generated by its workload.  CPUs are hungry - feed them lots of data quickly :-).


*But Niko has shown that columnstore will do read IO up to 8 MB :-)
Clustered Columnstore Indexes – part 50 ("Columnstore IO")
http://www.nikoport.com/2015/04/04/clustered-columnstore-indexes-part-50-columnstore-io/       
 

2 comments:

  1. These three articles have been extremely valuable and enlightening! I am building a production SQL VM today that will do large ETL workloads, so of course I want to build it as you suggest. 2 questions - 1 - How can I put load on this server artificially before it is in production so I can show that the pvscsi adapters are working and making a difference, and 2 - why is it always such a battle for you to get VMware Administrators to give you pvscsi adapters? Is there a cost to them of which I am unaware? Thanks so much! Building today so I am hoping for a fast response if possible... Thanks again!

    Replies
    1. Sorry - buried under landscaping chores :-)
      Q: How can I put load on this server artificially?
      A: either sqlio or the relatively new diskspd utility

      Q: Why such a battle?
      A: I think 3 reasons:
      1) LSI is the default, requires energy to change defaults :-) Especially since pvscsi typically requires install from VMware tools.
      2) Not much official documentation out there, and the WP references relatively small numbers in terms of the CPU overhead and throughput - although the WP tests use very low total queue length.
      3) Not much discussion in the SQL Server community yet of how the LSI vs pvscsi differences affect different workloads: low total queue length vs high total queue length.
