IOPS is commonly recognized as a standard measurement of performance, whether measuring a storage array's backend drives or the performance of the SAN. In its most basic terms, IOPS is the number of I/O operations issued per second, whether reads, writes or other commands, and admins will typically use their storage array tools or applications such as Iometer to monitor it.
IOPS will vary based on a number of factors, including a system's balance of read and write operations; whether the traffic is sequential, random or mixed; the storage drivers; the OS background operations; or even the I/O block size.
Block size is usually determined by the application, with different applications using different block sizes in different circumstances. For example, Oracle will typically use block sizes of 2 KB or 4 KB for online transaction processing, and larger block sizes of 8 KB, 16 KB or 32 KB for decision support system workloads. Exchange 2007 may use an 8 KB block size, SQL Server a minimum of 8 KB, and SAP 64 KB or even more.
IOPS and MB/s both need to be considered |
Additionally, it is standard practice that when IOPS is considered as a measurement of performance, the throughput (that is to say, MB/s) is also examined. This is due to the different impact each has on performance. For example, an application with only 100 MB/s of throughput but 20,000 IOPS may not cause bandwidth issues, but with so many small commands the storage array is put under significant strain, as its front-end processors have an immense workload to deal with. Alternatively, if an application has a low number of IOPS but significant throughput, such as long sustained reads, then the strain falls upon the SAN's links.
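These two contrasting cases follow from the basic relationship that throughput equals IOPS multiplied by I/O size. A minimal sketch of that arithmetic (the workload numbers below are illustrative, not measurements from any real array):

```python
def throughput_mb_s(iops, io_size_kb):
    """Throughput in MB/s given IOPS and the I/O block size in KB."""
    return iops * io_size_kb / 1024

# Many small commands: heavy front-end processor load, modest bandwidth
small_io = throughput_mb_s(20_000, 4)    # 20,000 IOPS at 4 KB
# Few large sustained reads: light command load, heavy link utilization
large_io = throughput_mb_s(400, 256)     # 400 IOPS at 256 KB

print(f"20,000 x 4 KB I/O   -> {small_io:.1f} MB/s")
print(f"   400 x 256 KB I/O -> {large_io:.1f} MB/s")
```

Note that the large-I/O workload moves more data per second with fifty times fewer commands, which is why the two profiles stress different parts of the infrastructure.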
Despite this, MB/s and IOPS are still not a good enough measure of performance if you don't also take frames per second into consideration. To elaborate, referring back to the FC frame, a standard FC frame has a data payload of 2112 bytes, i.e. roughly a 2 KB payload. So in the example below, where an application has an 8 KB I/O block size, four FC frames are required to carry that data portion. In this instance one I/O operation equates to four frames, and 100 IOPS would therefore equate to 400 frames per second. Hence, to get a true picture of utilization, looking at IOPS alone is not sufficient, because I/O sizes differ enormously between applications, ranging from 2 KB to as much as 256 KB, with some applications such as backups using even larger I/O sizes and hence more frames per operation.
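The frame arithmetic above can be made explicit. A minimal sketch, assuming each frame carries up to the full 2112-byte data payload:

```python
import math

FC_PAYLOAD_BYTES = 2112  # data payload of a standard FC frame

def frames_per_io(io_size_bytes):
    """Number of FC frames needed to carry one I/O's data portion."""
    return math.ceil(io_size_bytes / FC_PAYLOAD_BYTES)

def frames_per_second(iops, io_size_bytes):
    """Frame rate implied by a given IOPS figure and I/O block size."""
    return iops * frames_per_io(io_size_bytes)

# The 8 KB example from the text: 4 frames per I/O, 400 frames at 100 IOPS
print(frames_per_io(8 * 1024))            # -> 4
print(frames_per_second(100, 8 * 1024))   # -> 400
# A 256 KB backup-style I/O needs far more frames per operation
print(frames_per_io(256 * 1024))          # -> 125
```

This is why two applications reporting identical IOPS can generate frame loads that differ by two orders of magnitude.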
Frames per second give better insight into demand and throughput |
Looking at a metric such as the ratio of MB/s to frames/s, as displayed below, we actually get a better picture and understanding of the environment and its performance. To elaborate, the MB/s to frames/s ratio behaves differently from the IOPS metric. With reference to this graph of the MB/s to frames/s ratio, the line should never fall below 0.2 on the y-axis, i.e. the 2 KB data payload.
If the ratio falls below this, say to the 0.1 level, we can identify that data is not being passed efficiently, even though the throughput (MB/s) is being maintained.
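This threshold test can be sketched mechanically. In the sketch below the ratio is expressed as average payload per frame in KB, so the graph's 0.2 and 0.1 levels correspond to roughly 2 KB and 1 KB per frame; the sample readings are invented for illustration:

```python
FULL_PAYLOAD_KB = 2112 / 1024  # ~2.06 KB max data payload per FC frame

def payload_per_frame_kb(mb_per_s, frames_per_s):
    """Average data payload carried per frame, in KB."""
    return mb_per_s * 1024 / frames_per_s

def is_inefficient(mb_per_s, frames_per_s, floor_kb=2.0):
    """Flag intervals where frames are moving but carrying little data."""
    return payload_per_frame_kb(mb_per_s, frames_per_s) < floor_kb

# Healthy: 100 MB/s over 50,000 frames/s -> ~2 KB per frame
print(is_inefficient(100, 50_000))   # -> False
# Degraded: same frame rate but half the data -> ~1 KB per frame
print(is_inefficient(50, 50_000))    # -> True
```

Run over monitoring samples, a check like this surfaces exactly the condition the graph shows: sustained frame traffic whose data content has collapsed.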
In a situation where you have the common problem of slow draining devices, the case that MB/s and IOPS alone are not sufficient becomes even more compelling, as you can actually be misled when monitoring performance. To explain, slow draining devices are devices that request more information than they can consume, and hence cannot cope with the incoming traffic in a timely manner.
This is usually because the device, such as an HBA, has a slower link rate than the rest of the environment, or because the server or device is overloaded in terms of CPU or memory and thus has difficulty dealing with the data it requested. To avoid performance problems it is imperative to proactively identify slow draining devices before they impact the application layer and consequently spread to the business's operations.
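One of the precursors just mentioned, a device negotiating a slower link rate than the rest of the fabric, can be checked mechanically from a port inventory. A minimal sketch; the port names and speeds below are hypothetical, and a real fabric would pull them from the switch:

```python
# Hypothetical inventory: negotiated link speed per port, in Gb/s
port_speeds = {
    "host1_hba0": 4,
    "host2_hba0": 8,
    "array_fe0": 8,
    "isl_0": 8,
}

def slow_drain_candidates(speeds):
    """Ports running below the fabric's prevailing link rate --
    classic candidates for slow-draining devices."""
    if not speeds:
        return []
    fabric_rate = max(speeds.values())
    return sorted(p for p, s in speeds.items() if s < fabric_rate)

print(slow_drain_candidates(port_speeds))  # -> ['host1_hba0']
```

A mismatched link rate is only one cause; an overloaded server can slow-drain at full link speed, which is why the frames-per-second ratio below remains the more general signal.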
Slow Draining devices - requesting more information than they can consume |
In such a situation, looking again at the MB/s to frames/s ratio graph below, we can now see that the ratio is at the 0.1 level; in other words, we are seeing high throughput but minimal payload. This enables you to proactively identify that a large number of management frames are being passed instead of data, as they busily report on the physical device errors that are occurring.
Management Frames being passed can mislead |
So to conclude, without taking frames per second into consideration and having insight into this ratio, it is an easy trap to falsely believe that everything is fine and data is being passed, because you see lots of traffic as represented by MB/s, when in actuality all you are seeing are management frames reporting a problem.
Here's an animated video to further explain the concept: