2011 was a year in which, despite the economic constraints, everything 'Big' was seemingly good: Big Data, Big Clouds, Big VMs and so on. Caught up in the industry's lust for this excess, 2011 was also the year I lost count of how many overprovisioned 'Big' production VMs I witnessed. More often than not this was a typical reaction from System Admins trying to alleviate their fears of potential performance problems on important VMs. It was the year where I began to hear justifications such as "yes, we are overprovisioning our production VMs... but apart from the cost savings, overallocating our available underlying resources to a VM isn't a bad thing, in fact it allows it to be scalable". Despite this, 2011 was also the year where I lost count of the number of times I had to point out that sometimes overprovisioning a VM does lead to performance problems, specifically when dealing with virtual CPUs.
VMware refers to CPUs as pCPUs and vCPUs. A pCPU, or 'physical' CPU, in its simplest terms refers to a physical CPU core, i.e. a physical hardware execution context (HEC), if hyper-threading is unavailable or disabled. If hyper-threading has been enabled then a pCPU constitutes a logical CPU, because hyper-threading enables a single processor core to act like two processors, i.e. logical processors. So for example, if an ESX 8-core server has hyper-threading enabled it would have 16 threads that appear as 16 logical processors, and that would constitute 16 pCPUs.
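As a quick illustration of that arithmetic, here is a minimal standalone Python sketch (nothing VMware-specific is assumed) showing how the pCPU count follows from the core count and the hyper-threading setting:

```python
def pcpu_count(physical_cores: int, hyperthreading: bool) -> int:
    """Return the number of pCPUs (hardware execution contexts) an ESX host sees.

    With hyper-threading enabled each core exposes two logical
    processors, so the pCPU count doubles.
    """
    return physical_cores * 2 if hyperthreading else physical_cores


# The 8-core example from the text:
print(pcpu_count(8, hyperthreading=False))  # 8 pCPUs
print(pcpu_count(8, hyperthreading=True))   # 16 pCPUs
```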
As for a virtual CPU (vCPU), this refers to a virtual machine's virtual processor and can be thought of in the same vein as the CPU in a traditional physical server. vCPUs run on pCPUs and, by default, virtual machines are allocated one vCPU each. However, VMware have an add-on software module named Virtual SMP (symmetric multi-processing) that allows virtual machines to have access to more than one CPU and hence be allocated more than one vCPU. The great advantage of this is that virtualized multi-threaded applications can now be deployed on multi-vCPU VMs to support their numerous processes. So instead of being constrained to a single vCPU, SMP enables an application to use multiple processors to execute multiple tasks concurrently, thereby increasing throughput. With such a feature and all the excitement of being 'Big', many easily assumed that provisioning additional vCPUs could only ever be beneficial. If only it were that simple.
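To illustrate why the application side of this matters, the sketch below (plain Python, with a made-up CPU-bound task standing in for an application's work) contrasts a single-threaded loop, which can only ever keep one vCPU busy, with a parallelized version that can spread the same work across every vCPU the guest can see:

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def busy_work(n: int) -> int:
    """A purely CPU-bound task standing in for one application thread."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    chunks = [2_000_000] * 8          # eight independent units of work
    vcpus = os.cpu_count()            # processors visible inside the guest

    start = time.time()
    for c in chunks:                  # single-threaded: one vCPU used
        busy_work(c)
    print(f"serial:   {time.time() - start:.2f}s on 1 of {vcpus} vCPUs")

    start = time.time()
    with ProcessPoolExecutor() as pool:   # parallel: all vCPUs usable
        list(pool.map(busy_work, chunks))
    print(f"parallel: {time.time() - start:.2f}s across {vcpus} vCPUs")
```

The parallel speed-up only materialises if the workload can genuinely be split; a single-threaded application gains nothing from the extra vCPUs.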
The typical examples I faced entailed performance problems that were being blamed on the Storage or the SAN rather than on CPU constraints, especially as overall CPU utilization for the ESX server that hosted the VMs would be reported as low. Using Virtual Instruments' VirtualWisdom I was able to quickly conclude that the problem was not at all related to the SAN or Storage but to the hosts themselves. By being able to historically trend and correlate the vCenter, SAN and Storage metrics of the problematic VMs on a single dashboard, it was apparent that the high number of vCPUs assigned to each VM was the cause. This was indicated by a high reading of what is termed the 'CPU Ready' metric.
To elaborate, CPU Ready is a metric that measures the amount of time a VM is ready to run against the pCPU, i.e. how long a vCPU has to wait for an available core when it has work to perform. So while CPU utilization may not be reported as high, if the CPU Ready metric is high then your performance problem is most likely related to CPU. In the instances that I saw, this was caused by customers assigning four, and in some cases eight, vCPUs to each virtual machine.
VirtualWisdom dashboard indicating high CPU Ready
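As an aside on reading the metric itself: vCenter reports CPU Ready as a summation in milliseconds per sample interval, so it is usually converted to a percentage before drawing conclusions. A rough Python sketch of that conversion follows, assuming the standard 20-second real-time sample interval; the sample value is hypothetical and the idea that anything beyond a few percent deserves attention is a rule of thumb rather than an official figure:

```python
def cpu_ready_percent(ready_summation_ms: float,
                      interval_seconds: int = 20) -> float:
    """Convert a CPU Ready summation (ms per sample) into a percentage.

    vCenter's real-time charts sample every 20 seconds (20,000 ms),
    so ready% = ready_ms / (interval_s * 1000) * 100.
    """
    return ready_summation_ms / (interval_seconds * 1000) * 100


# Hypothetical sample: 2,400 ms of ready time in a 20 s interval
print(f"{cpu_ready_percent(2400):.1f}% CPU Ready")  # 12.0% -> cause for concern
```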
So why was this happening? Well, firstly, the hardware and its physical CPU resource is still shared. Coupled with this, the ESX server itself also requires CPU to process storage requests, network traffic and so on. Then add the situation that, sadly, most organizations still suffer from the 'silo syndrome', and hence there still isn't a clear dialogue between the System Admin and the Application owner. The consequence is that while multiple vCPUs are great for workloads that support parallelization, the same is not true for applications that don't have built-in multi-threaded structures. A VM with 4 vCPUs requires the ESX server to wait for 4 pCPUs to become available, and on a particularly busy ESX server with other VMs this could take significantly longer than if the VM in question only had a single vCPU.
To explain this further, let's take the example of a four-pCPU host that has four VMs: three with 1 vCPU and one with 4 vCPUs. At best only the three single-vCPU VMs can be scheduled concurrently. In such an instance the 4-vCPU VM would have to wait for all four pCPUs to be idle. In this example the excess vCPUs actually impose scheduling constraints and consequently degrade the VM's overall performance, typically indicated by low CPU utilization but a high CPU Ready figure. With the ESX server scheduling and prioritising workloads according to what it deems most efficient to run, the consequence is that smaller VMs will tend to run on the pCPUs more frequently than the larger overprovisioned ones. So in this instance overprovisioning was in fact proving detrimental to performance as opposed to beneficial. Now, in more recent versions of vSphere, the scheduling of different vCPUs and de-scheduling of idle vCPUs is not as contentious as it used to be. Despite this, the VMkernel still has to manage every vCPU, a complete waste if the VM's application doesn't use them!
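To make that scheduling penalty concrete, here is a deliberately simplified Python simulation of the strict 'all vCPUs must be placed at once' behaviour described in the four-pCPU example. The busy probability, quantum count and scheduling order are all invented for illustration, and real ESX scheduling (particularly the relaxed co-scheduling of later vSphere releases) is far more sophisticated:

```python
import random

random.seed(1)

PCPUS = 4               # physical cores on the host
TICKS = 10_000          # scheduler quanta to simulate
BUSY_PROB = 0.7         # chance a small VM wants CPU in a given quantum

# VM name -> vCPU count: three small VMs and one 'Big' 4-vCPU VM
vms = {"small-1": 1, "small-2": 1, "small-3": 1, "big": 4}
ready_ticks = {name: 0 for name in vms}
run_ticks = {name: 0 for name in vms}

for _ in range(TICKS):
    # Which VMs have work this quantum? The big VM is always busy.
    wants_cpu = {name for name in vms
                 if name == "big" or random.random() < BUSY_PROB}

    free = PCPUS
    # Smallest VMs are placed first, mimicking how easy they are to fit.
    for name in sorted(wants_cpu, key=lambda n: vms[n]):
        if vms[name] <= free:          # strict co-scheduling: the VM needs
            free -= vms[name]          # ALL of its vCPUs at the same time
            run_ticks[name] += 1
        else:
            ready_ticks[name] += 1     # ready to run, but forced to wait

for name in vms:
    total = run_ticks[name] + ready_ticks[name]
    print(f"{name:8s} spent {ready_ticks[name] / total:6.1%} of its busy "
          f"quanta waiting (CPU Ready)")
```

Even in this toy model the single-vCPU VMs almost never wait, while the 4-vCPU VM spends the vast majority of its quanta ready but unable to run, exactly the low-utilization, high-CPU-Ready signature described above.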
To ensure your vCPU to pCPU ratio is at its optimal level, and that you reap the benefits of this great feature, there are some straightforward considerations to make. Firstly, there needs to be dialogue between the silos to fully understand the application's workload prior to VM resource allocation. In the case of applications where the workload may not be known, it's key not to overprovision virtual CPUs but rather to start with a single vCPU and scale out as and when necessary. Having a monitoring platform that can historically trend the performance and workloads of such VMs is also highly beneficial in determining such factors. As mentioned earlier, CPU Ready is a key metric to consider, as well as CPU utilization. Correlating these with Memory and Network statistics, as well as SAN I/O and Disk I/O metrics, enables you to proactively avoid any bottlenecks, correctly size your VMs and hence avoid overprovisioning. This can also be extended to considering how many VMs you allocate to an ESX server and to ensuring that its physical CPU resources are sufficient to meet the needs of your VMs. As businesses' key applications become virtualized, it's imperative that the correct vCPU to pCPU ratio is allocated, whether they are old legacy single-threaded workloads or new multi-threaded workloads. In this instance size isn't always everything; it's what you do with your CPU that counts.
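To finish with something practical, below is a hedged sketch of the kind of right-sizing check that follows from this advice: flag VMs whose trended CPU Ready stays high while utilization stays low, the classic signature of too many vCPUs. The VM names, metric values and thresholds are purely illustrative; in practice they would come from your monitoring platform's trended data:

```python
# Hypothetical trended averages per VM; in practice these would come
# from vCenter or a monitoring platform such as VirtualWisdom.
vm_stats = {
    #  name            vCPUs  avg CPU util %  avg CPU Ready %
    "db-prod-01":   {"vcpus": 8, "util_pct": 12.0, "ready_pct": 14.5},
    "app-prod-02":  {"vcpus": 4, "util_pct": 55.0, "ready_pct": 2.1},
    "web-prod-03":  {"vcpus": 1, "util_pct": 70.0, "ready_pct": 0.8},
}

READY_THRESHOLD_PCT = 5.0   # illustrative rule of thumb, not a VMware number
UTIL_THRESHOLD_PCT = 40.0

for name, s in vm_stats.items():
    oversized = (s["ready_pct"] > READY_THRESHOLD_PCT
                 and s["util_pct"] < UTIL_THRESHOLD_PCT)
    if oversized:
        print(f"{name}: {s['vcpus']} vCPUs, ready {s['ready_pct']}%, "
              f"util {s['util_pct']}% -> candidate for fewer vCPUs")
```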