Tuesday, April 18, 2017

How many vcpus make the cores on this ESXi server oversubscribed?



The CPU Scheduler in VMware vSphere® 5.1 
http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vsphere-cpu-sched-performance-white-paper.pdf

Best Practices for Oversubscription of CPU, Memory, and Storage in vSphere Virtual Environments
How far can oversubscription be taken safely?
https://communities.vmware.com/servlet/JiveServlet/downloadBody/34283-102-2-46887/Dell%20%20%20Best%20Practices%20for%20Oversubscription%20of%20CPU%20%20Memory%20and%20Storage%20in%20vSphere%20Virtual%20Environments_0%20(1).pdf

The most common answer is that oversubscription begins when vcpu count is greater than core count: if the sum of vcpus across all VMs is less than or equal to the core count, the physical server's cores are typically not considered oversubscribed.
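That rule of thumb reduces to simple arithmetic. Here's a minimal Python sketch of the comparison; the host core count and per-VM vcpu counts below are made up for illustration.

```python
# Toy oversubscription check: sum the vcpus across all VMs on a host
# and compare to the physical core count. (Hypothetical numbers.)

physical_cores = 24
vm_vcpus = {"vm-sql01": 8, "vm-app01": 4, "vm-web01": 4, "vm-web02": 4}

total_vcpus = sum(vm_vcpus.values())
ratio = total_vcpus / physical_cores

print(f"total vcpus: {total_vcpus}, cores: {physical_cores}, ratio: {ratio:.2f}")
print("oversubscribed by the common definition" if total_vcpus > physical_cores
      else "not oversubscribed by the common definition")
```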

But the vcpus for a VM are not the only CPU scheduling needs on the ESXi server - or even for the VM itself.  The hypervisor needs some CPU resources of its own; these are among the system worlds on the ESXi server.  In addition, as of ESXi 6 each VM has 4 non-vcpu worlds that must occasionally be scheduled on physical cores.  These worlds take action on behalf of the VM, but their work is not in the guest context of the VM: stuff like handling IO after it's been handed off by the guest.
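Extending the arithmetic above to count those extra worlds looks something like this. The 4 non-vcpu worlds per VM come from the ESXi 6 behavior just described; the system world count is a placeholder I made up, not a measured number.

```python
# Extending the count above: vcpus aren't the only worlds competing for cores.
# The 4 non-vcpu worlds per VM are per the ESXi 6 scheduler; the system world
# count below is a placeholder, NOT a measured number.

physical_cores = 24
vm_vcpus = {"vm-sql01": 8, "vm-app01": 4, "vm-web01": 4, "vm-web02": 4}
NON_VCPU_WORLDS_PER_VM = 4          # per VM, as of ESXi 6
system_worlds = 20                  # hypervisor worlds; placeholder value

schedulable = (sum(vm_vcpus.values())
               + NON_VCPU_WORLDS_PER_VM * len(vm_vcpus)
               + system_worlds)
print(f"{schedulable} schedulable worlds contending for {physical_cores} cores")
```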

Imagine an ESXi server with 24 physical cores and a single 24-vcpu VM.  Let's keep all 24 guest vcpus very busy with a SQL Server workload - a very demanding ETL with data coming in over the network.  The vcpus are bound to the cores most of the time since they've got runnable SQL Server threads.  Those SQL Server threads are handling data coming in from the network, and also issuing reads and writes to the disk subsystem.

Sooner or later the hypervisor has stuff it's gotta do: when the hypervisor takes time on any core, that's time denied to a guest VM vcpu.  The non-vcpu worlds for the VM have stuff to do as well: time they spend on a physical core is time denied to a guest vcpu.  Even with relaxed co-scheduling, it's still possible for the skew between the lead and lag vcpus of the VM to exceed the threshold and result in throttling the VM's vcpus with co-stop time.
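Here's a toy model of that skew-and-co-stop interaction. To be clear, this is not the actual ESXi scheduler algorithm, and the threshold value is invented; it just illustrates how one vcpu falling behind can get the leading vcpus stopped.

```python
# Toy model of relaxed co-scheduling (NOT the real ESXi algorithm; the
# threshold is an invented value).  Track accumulated run time per vcpu;
# if the lead vcpu gets too far ahead of the lag vcpu, co-stop the lead
# until the lag vcpu catches up.

SKEW_THRESHOLD_MS = 3.0                    # hypothetical threshold

run_time_ms = [50.0, 50.0, 46.0, 50.0]     # vcpu 2 keeps losing its core to
                                           # hypervisor/non-vcpu world activity
skew = max(run_time_ms) - min(run_time_ms)
if skew > SKEW_THRESHOLD_MS:
    lead = run_time_ms.index(max(run_time_ms))
    lag = run_time_ms.index(min(run_time_ms))
    print(f"skew {skew:.1f} ms > {SKEW_THRESHOLD_MS} ms: "
          f"co-stop vcpu {lead} until vcpu {lag} catches up")
```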

The idea of oversubscribing the cores of a server (and implementing co-scheduling policy, VM fairness policy, etc.) is to drive up utilization.  The question serving as subtitle to the Dell paper above should be a clue that this approach to achieving maximum resource utilization can be antithetical to achieving the highest level of performance.

Excluding the question of whether the instructions being executed are necessary or efficient (fundamental application- and database-level questions), and the question of how much utilization is management overhead rather than meaningful work (typically a database-level evaluation), a remaining gold mine is whether high resource utilization is a higher-priority goal than limiting wait time for the resource.  (This is one huge reason that goals are extremely important to any conversation about performance and scalability.)

Even at one busy vcpu per core on the ESXi server, hypervisor and non-vcpu worlds can result in %ready time for the VM's vcpus: time when there is a runnable thread within the guest, dispatched to the vcpu, but the vcpu is waiting for time on a physical core.  And co-scheduling policy can amplify that under prolonged heavy demand.
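For a sense of scale, the widely used conversion from vCenter's CPU ready summation counter to a percentage is ready milliseconds divided by the sample interval in milliseconds. A quick sketch, with invented numbers:

```python
# Converting a ready-time summation into a percentage, in the style of the
# well-known formula for vCenter's "CPU ready" summation counter:
#   %ready = ready_ms / (sample_interval_s * 1000) * 100
# The ready_ms value below is invented for illustration.

sample_interval_s = 20        # vCenter realtime charts sample every 20 seconds
ready_ms = 400                # summed ready time for one vcpu in that interval

pct_ready = ready_ms / (sample_interval_s * 1000) * 100
print(f"%ready = {pct_ready:.1f}%")   # 2.0% here: waiting, even at 1 vcpu/core
```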

So, typically, no one will raise an eyebrow if the number of vcpus on an ESXi server is equal to the number of cores.  In fact, the numbers in the 2012 whitepaper above - that 1 to 3 times the number of cores is typically not a problem - are not too uncommon out there.  And in cases where high utilization is the primary goal, that's not bad.

But if the primary goal is squeezing as much as possible out of a certain number of vcpus (think a SQL Server EE VM licensed per vcpu), don't be too surprised if someone like me comes along and starts scrutinizing wait time for CPU at the SQL Server and ESXi levels - and maybe trying to talk someone into lowering the vcpu count to something below the core count, or using a full reservation for that VM's vcpus while making sure there's always enough time for the VM's non-vcpu worlds... or trying to get the VM marked as "latency sensitivity = high" 😀
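If you're wondering what "full reservation" works out to: it's one core's worth of MHz per vcpu. A back-of-the-envelope sketch; the vcpu count and clock speed here are hypothetical.

```python
# Back-of-the-envelope full CPU reservation for a VM: one core's worth of
# MHz per vcpu.  Clock speed and vcpu count are hypothetical examples.

vcpus = 24
core_mhz = 2600               # nominal clock of the host's cores

full_reservation_mhz = vcpus * core_mhz
print(f"full reservation: {full_reservation_mhz} MHz "
      f"({vcpus} vcpus x {core_mhz} MHz)")
```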
