Love it or hate it, ITIL and Change Management will always
be an integral part of any IT setup, with regulations such as Basel II, FISMA,
SOX (Sarbanes-Oxley) and HIPAA constantly breathing down the neck and conscience
of organization leaders. Having once had a “purple badge” wearing ITIL guru for
a manager, it always fascinated me how he’d advocate the framework as the
solution to all our IT problems. While he’d harp on about defining repeatable
and verifiable IT processes, it always ended up being theoretical as opposed to
practical, a point often underlined by his own IT competency: “Err, Archie, how do I
save this Word document and what on Earth is that SAN thing you keep going on
about?”
Several years later, after moving from being a customer to a
technical consultant, my impression of the effectiveness of the Change Advisory
Board (CAB) failed to improve. Midweek and late in the day in the customer’s data center with their
SAN Architect, I’d pointed out that they had cabled up the wrong ports in their
SAN switches and that this would require a change to be raised. “No need for
that,” replied the SAN architect, “I’m one of the CAB members.” He then, to my
shock and in true Del Boy fashion, duly proceeded to pull out and swap the FC
cables to his production hosts with a big grin on his face. Several minutes
later his phone rang, to which he replied, “It’s okay, I’ve resolved it. There
was a power failure on some servers.” Then with a cheeky grin, a swing of the
head and a wink of an eye, he turned to me and said, “There you go sorted,
lovely jubbly!”
While my initial skepticism about ITIL’s practicality was
centered around my personal experiences, it was only reinforced by the number
of long white bearded external auditors that would supposedly check whether
proper controls existed within the many firefighting and cowboy organizational
procedures I witnessed. Like a classroom of kids hearing the teacher coming up
the corridor and scurrying to get to their desk to present a fabricated
impression of discipline and order, I never ceased to be astounded by the last
minute changes and running around of our compliance folk to ensure we successfully
passed our audits. Despite having more daily Priority 1s than the canteen was
serving decent hot meals, we still inexplicably passed every audit with flying
colours, which in turn emboldened the rogue “under the radar” operational
practices that served to keep the lights on.
So with such a tarnished experience of ITIL, it was with
great curiosity and interest that I looked closer at the movement and
initiative of ITPI’s Visible Ops. While still mapping its ideas to ITIL
terminology, the focus of Visible Ops is on increasing service levels,
decreasing costs and increasing security and auditability. In simplest terms,
Visible Ops is a fast track / jumpstart exercise to an efficient operating
model that replicates the researched processes of high-performing organizations
in just four steps.
To summarise, the first of these four steps is what is
termed Phase 1 or "Stabilize the Patient". With the understanding
that almost 80% of outages are self-inflicted, any changes outside of scheduled
maintenance windows are quickly frozen. It then becomes mandatory for problem
managers to have any change-related information at hand so that when that 80%
of “unplanned work” is initiated, a full understanding of the root cause is
quickly established. This phase starts at the systems and business processes
that are responsible for the greatest amount of firefighting with the aim that
once they are resolved they would free up work cycles to initiate a more secure
and measured route for change.
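To make the freeze a little less theoretical, here is a minimal Python sketch of the Phase 1 rule: a change is only allowed through if it falls inside a scheduled maintenance window. The window times and the `review_change` helper are my own illustrative assumptions, not anything prescribed by Visible Ops.

```python
from datetime import datetime, time

# Hypothetical weekly maintenance windows: (weekday, start, end), Monday = 0.
MAINTENANCE_WINDOWS = [
    (5, time(22, 0), time(23, 59)),  # Saturday 22:00-23:59
    (6, time(0, 0), time(4, 0)),     # Sunday 00:00-04:00
]


def in_maintenance_window(when: datetime) -> bool:
    """Return True if `when` falls inside any scheduled window."""
    return any(
        when.weekday() == day and start <= when.time() <= end
        for day, start, end in MAINTENANCE_WINDOWS
    )


def review_change(when: datetime) -> str:
    """Approve changes inside a window; freeze everything else."""
    return "approved" if in_maintenance_window(when) else "frozen"
```

Under these assumed windows, a midweek, late-in-the-day cable swap would come back as "frozen", CAB member or not.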
Phase 2, which is termed “Catch & Release” and “Find
Fragile Artifacts”, is related to the infrastructure itself with the
understanding that it cannot be repeatedly replicated. With an emphasis on
gaining an accurate inventory of assets, configurations and services, the objective
is to identify the “artifacts” with the lowest change success rates, highest
MTTR and highest business downtime costs. By capturing all these assets, what
they’re running, the services that depend upon them and those responsible for
them, an organization ends up in a far more secure position prior to a Priority
1 firefighting session.
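As a rough illustration of how such a ranking might look, the Python sketch below scores each artifact from the three measures mentioned above. The `Artifact` fields and the `fragility` formula (failure rate multiplied by MTTR and hourly downtime cost) are assumptions of mine for the sake of example; Visible Ops does not prescribe a particular formula.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    name: str
    changes_attempted: int
    changes_succeeded: int
    mttr_hours: float
    downtime_cost_per_hour: float

    @property
    def change_success_rate(self) -> float:
        # Assumption: an artifact with no recorded changes has no failures.
        if self.changes_attempted == 0:
            return 1.0
        return self.changes_succeeded / self.changes_attempted

    @property
    def fragility(self) -> float:
        # Illustrative score: rough expected cost of a failed change.
        failure_rate = 1 - self.change_success_rate
        return failure_rate * self.mttr_hours * self.downtime_cost_per_hour


def most_fragile(artifacts, top=3):
    """Return the `top` artifacts with the highest fragility score."""
    return sorted(artifacts, key=lambda a: a.fragility, reverse=True)[:top]
```

Feeding in the inventory and calling `most_fragile(inventory)` gives a shortlist of where to look first before the next Priority 1 arrives.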
Phase 3 or “Establish Repeatable Build Library” is focused
on implementing an effective release management process. Using the previous
phases as a stepping stone, this phase documents repeatable builds of the most
critical assets and services, making it more cost-effective to rebuild them
than to repair them. In a process that leads to an efficient
mass-production of standardized builds, senior IT operations staff can transform
from a reactive to a proactive release management delivery model. This is
achieved by operating early in the IT operations lifecycle by consistently
working on software and integration releases prior to their deployment into
production environments. At the same time a reduction in unique production
configurations is pushed for, consequently increasing the configuration
lifespans prior to their replacement or change which in turn leads to an
improvement in manageability and reduction in complexity. Eventually the
output of these repeatable builds is a set of "golden" images that have been
tried, tested, planned and approved prior to production. Therefore, when new
applications, patches and upgrades are released for integration, these golden
builds or images merely need updating.
Phase 4, entitled “Enable Continuous
Improvement” is pretty self explanatory in that it deals with building a closed
loop between the release, control and resolution processes. By completing the
previous three phases, metrics for the three key process areas (release,
controls and resolution) are focused on, specifically those that can facilitate
quick decision making and provide accurate indicators of the work and its
success in relation to the operational process. Drawing on ITIL’s resolution
process metrics of Mean Time Between Failures (MTBF) and Mean Time to Repair
(MTTR), this phase looks at Release by measuring how efficiently and
effectively infrastructure is provisioned. Controls are measured by how
effectively the change decisions that are made keep production infrastructure
available, predictable and secure, while Resolution is quantified by how
effectively issues are identified and resolved.
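For the resolution metrics, a minimal Python sketch of the usual arithmetic is shown below: MTTR as total downtime divided by the number of incidents, and MTBF as total uptime divided by the number of incidents. The function name and the input shape (a list of outage start and end times within an observation period) are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def mttr_and_mtbf(incidents, period_start, period_end):
    """Compute (MTTR, MTBF) as timedeltas for an observation period.

    `incidents` is a list of (outage_start, outage_end) datetimes that
    fall inside the period. Returns (None, None) if there were none.
    """
    if not incidents:
        return None, None
    downtime = sum((end - start for start, end in incidents), timedelta())
    uptime = (period_end - period_start) - downtime
    count = len(incidents)
    return downtime / count, uptime / count
```

Tracked month on month, the two numbers give a simple indicator of whether the closed loop is actually improving anything.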
So while these four concise and particular phases look great
on paper what really differentiates them from potentially just being another
theoretical process that fails to be delivered comprehensively in practical
reality? If the manner in which IT is procured, designed, configured, validated
and implemented remains the same, there is little if any chance for Visible Ops
to succeed any further than the Purple Badge lovers of ITIL did. But what if
the approach to IT and more specifically its infrastructure was to change from
the traditional buy your own, bolt it together and pray that it works method
and instead transferred to a more sustainable and predictable model? What if
the approach to infrastructure was a greenfield deployment of, or seamless
migration to, a pretested, pre-validated, pre-integrated, prebuilt and
preconfigured product i.e. a true Converged Infrastructure? What impact could
that possibly have on the success of Visible Ops and the aforementioned four
phases?
If we look at phase 1 and “stabilizing the patient” this can
be immediately achieved with a Vblock
where an organisation no longer has to spend time investigating and worrying
about the risk and impact of change. By having a standardized product based
approach as opposed to a bunch of components bundled together, thousands of
hours of QA testing and analysis work can be performed by VCE for each new
patch, firmware upgrade or update on a like for like product that is owned by
the customer. With this acting as the premise of a semi-annual release
certification matrix that updates all of the components of the Converged
Infrastructure as a comprehensive whole, risks typically associated with the
change process are eliminated. Furthermore, as changes are dictated by this
pre-tested and pre-validated process and need to adhere to this release
certification matrix to remain within support, it helps eradicate any rogue
changes as well as keeping problem managers fully informed of the
necessary changes and updates. Ultimately, phase 1’s objective of stabilization
is immediately achieved via the risk mitigation that comes with implementing a
pre-engineered, pre-defined and pre-tested upgrade path.
Figure: IDC's research on VCE Vblock customers found a significant reduction in unplanned downtime.
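The idea of staying within a release certification matrix can be sketched in a few lines of Python: compare what is installed against what a given release certifies and report any drift. The component names and version numbers below are invented for illustration and are not taken from any actual VCE matrix.

```python
# Hypothetical certified versions for one release of the matrix.
CERTIFIED = {
    "compute_firmware": "2.1",
    "san_switch_os": "6.2",
    "hypervisor": "5.5",
}


def drift_from_matrix(installed: dict, certified: dict = CERTIFIED) -> dict:
    """Return {component: (installed, certified)} for every mismatch."""
    return {
        name: (installed.get(name), version)
        for name, version in certified.items()
        if installed.get(name) != version
    }
```

An empty result means the system matches the certified release; anything else is a candidate rogue change for the problem managers to chase.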
The challenge of phase 2, which in essence equates to
an eventual full inventory of the infrastructure, is a painful process at the
best of times especially as new kit from various vendors is constantly being
purchased and bolted on to existing kit. Moving to a Vblock simplifies this challenge as it’s a
single product and hence a single SKU at procurement. Akin to purchasing an
Apple MacBook that is made up of many components, e.g. a hard drive, processor,
CD-ROM etc., the Converged Infrastructure’s components are formulated as a
whole to provide the customer a product. The parts of the product and all of
their details are known to the manufacturer i.e. VCE and can easily be
transferred as a single bill of materials to the customer with serial numbers
etc. thus ensuring an up to date and accurate inventory and consequently
simplified asset management process. When patches, upgrades and additions of
new parts and components are required they are automatically added to the
inventory list of the single product, thus ensuring up to date asset
management.
The Release Management requirement of Phase 3 offers a
challenge that is not only fraught with risk but also takes up a significant
amount of staff and management time cycles to ensure that technology and
infrastructure remain up to date. This entails the rigmarole of downloading,
testing and resolving interoperability issues of component patches and releases
and relies heavily on the information sharing of silos as well as the success
of regression tests. The unique approach of a Vblock meets this challenge immediately by
making pre-tested, validated software and firmware upgrades available for the
end user enabling them to locate releases that are applicable for their
Converged Infrastructure system. With regard to the rebuild as opposed to
repair approach stipulated in phase 3, because a Vblock can be deployed and up and running in
only 30 days, the ability to have a like for like standardized infrastructure
for new and upcoming projects is a far easier process compared to the usual
build it yourself infrastructure model. On a more granular level, by having a
management and orchestration stack with a self service portal, golden image VMs
can be immediately deployed with a billing and chargeback model as well as
integration with a CMDB. The result is a quick and successful attainment of
phase 3 of the Visible Ops model via a unified release and configuration
management methodology that is highly predictable and enhances availability by
reducing interoperability issues.
Figure: IDC's research also found Vblock customers gained significant operational savings.
Measuring the success of metrics such as MTTR and MTBF as
detailed in Phase 4 is ultimately linked to the success of the monitoring and
support model that’s in place for your infrastructure. With a product based
approach to infrastructure the support model will also be better equipped to
ensure continuous improvement. Having an escalation response process that is
based on a product, regardless of whether resolving a problem requires consultation
with multiple experts or component teams, ultimately means a seamless and
single point of contact for all issues. This end-to-end accountability for an
infrastructure’s support, maintenance and warranty makes the tracking of issue
resolution and availability a much simpler model to measure and monitor.
Furthermore with open APIs that enable integration with comprehensive
monitoring and management software platforms, the Converged Infrastructure can
be monitored for utilization, performance and capacity management as well as
potential issues that can be flagged proactively to support.
Figure: The Vblock 700MX.
As IT operational efficiency becomes more of an imperative
for businesses across the globe, the theoretical practices that have failed to
deliver are either being assessed, questioned or in some cases continued with.
What is often being overlooked is that one of the key and inherent problems is
the traditional approach to building and managing IT infrastructure. Even a
radical and well researched approach and framework such as Visible Ops will
eventually suffer and at worst fail to succeed if the IT infrastructure that
the framework is based on was built by the same mode of thinking that created
the problems. Fundamentally whether the Visible Ops model is a serious
consideration for your environment or not, by adopting the framework with a Vblock, the ability to stabilize,
standardize and optimise your IT infrastructure and its delivery of services to
the business becomes a lot more practical and consequently a lot less
theoretical.