
Do you have an Escalation Plan?

What does your IT organization do when a mission-critical event takes place in your company?

Does the appropriate IT support component spring into action to minimize the risk imposed by the problem? Or have you even sat down to determine what these issues are and what you should do if they occur?

Sadly, many IT organizations wait until a major problem occurs before thinking about it. Unfortunately, this is a terrible time to start analyzing what you would do in the event of a major problem.

Major issues can occur in any industry, although some are unique to a particular business or industry. Here are some situations to think about:

  • Server or network failure
  • Remote office loses connectivity
  • Data interface between applications or outside entities goes down
  • Anything that endangers patient care in a hospital
  • Anything that puts employee safety at risk
  • Issues that can cause financial risk to the company
  • Things that significantly jeopardize client satisfaction

It’s important for your IT support team to respond quickly when major problems like the examples above occur. To do this, you need some type of high-alert process that causes your team to take action when key events happen.

The response will be much more effective when your employees know what causes an escalation event, know what their action steps need to be, have the knowledge and tools to troubleshoot and resolve the problem, and even know who the escalation owner will be to manage and close out the response activities.

You want escalation to take place automatically, so think about these things now. The middle of a problem is not a good time to start figuring it all out.
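One way to make escalation "automatic" is to write the plan down as data before anything breaks. The sketch below is only an illustration: the triggers come from the situations listed earlier, while the rule keys, owners, and action steps are hypothetical placeholders you would replace with your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationRule:
    trigger: str         # what causes the escalation event
    owner: str           # who manages and closes out the response
    action_steps: tuple  # ordered first-response steps


# Hypothetical plan entries; owners and steps are placeholders, not recommendations.
ESCALATION_PLAN = {
    "server_or_network_failure": EscalationRule(
        trigger="Server or network failure",
        owner="Infrastructure manager",
        action_steps=(
            "Confirm the outage",
            "Notify affected users",
            "Start troubleshooting",
        ),
    ),
    "remote_office_offline": EscalationRule(
        trigger="Remote office loses connectivity",
        owner="Network administrator",
        action_steps=(
            "Check the office router",
            "Contact the connectivity vendor",
        ),
    ),
}


def escalate(event_type: str) -> EscalationRule:
    """Return the pre-agreed rule for an event, or fail loudly if none exists."""
    try:
        return ESCALATION_PLAN[event_type]
    except KeyError:
        raise LookupError(
            f"No escalation rule defined for {event_type!r}; add one to the plan"
        ) from None


rule = escalate("server_or_network_failure")
print(f"Escalation owner: {rule.owner}")
for number, step in enumerate(rule.action_steps, start=1):
    print(f"  {number}. {step}")
```

An event with no rule raises an error instead of being silently ignored, which surfaces gaps in the plan while you still have time to fill them.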

Risk #5: Downtime

The fifth risk listed in the Six Key Risks a CIO Must Avoid post is downtime.

This risk is actually two things: downtime, plus what I like to call “lack of systems availability,” when Users can’t access technology needed to do their job.

Downtime is straightforward: a server has crashed, a printer has broken, or a remote office router has failed. Something isn’t working, so we have downtime.

System unavailability can mean the systems and network are all working properly but something prevents a User from accessing a system. An example might be when the IT organization freezes a server to perform an upgrade or maintenance.

In both situations, the User sees it as downtime. “I can’t work so something must be broken.”

A CIO must create a stable and reliable technology environment. Nothing will get you fired quicker than managing an IT organization that experiences lots of downtime. It is simply unacceptable.

Downtime is unacceptable because it costs the company so much in so many ways:

  • Loss of productivity
  • Morale issues
  • Client satisfaction problems
  • Troubleshooting and resolution expense
  • Loss of revenue

Effective CIOs understand, “UPTIME IS KING!”

It’s important for a CIO to create a stable systems and network environment. To do this, the CIO should put in place a few key things:

  • Reliable hardware and network components – It goes without saying that an environment made up of old, dilapidated systems and network components is going to have failures. Understand where your Achilles’ heels are and upgrade as needed to improve the stability of your technology environment.
  • Infrastructure support staff – Your infrastructure support can be staffed in-house or outsourced, but the staff must be capable and qualified to support the technologies used by the company. This team must also be positioned to respond quickly to problems.
  • Reliable support vendors – You need vendors you can count on, the type that provide reliable and responsive support.
  • Change management processes – Implementing processes to control changes made to networks and systems will help ensure thoroughness and quality of upgrade projects.
  • Monitoring systems – One of the best tools an infrastructure manager can have is an early warning of an impending failure. Good monitoring systems help you anticipate need.
  • Escalation procedures – When a system or network component goes down, you need to fix the problem as quickly as possible. This will be handled faster and more effectively when you have sound escalation procedures to follow.

Two additional things the CIO should understand are:

  1. How much downtime the company is experiencing
  2. The cost of downtime

When I joined a small company, I knew we were having downtime issues, but with no Help Desk I couldn’t get a good handle on what kind of issues we were having. To gain a better understanding of our downtime situation, I created a simple spreadsheet and started tracking every downtime event we encountered. Within a couple of months I had a very good sense of what was going on, which helped me develop our strategy to stabilize our technology environment.

The other thing I’m a big advocate of is understanding the cost of downtime.

You can do this very easily for any component in your technology environment, from a large server to a remote office router or even a desktop PC. Take a look at an ITLever post I wrote about this and download the Cost of Downtime tool. There is a link to a 20 Minute IT Manager training session that explains it all.
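If you prefer a script to a spreadsheet, the same tracking idea can be sketched in a few lines. This is an illustration only: the column names (`date`, `component`, `minutes_down`, `users_affected`) and the sample events are made up, not taken from the original spreadsheet.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DowntimeEvent:
    date: str            # ISO date of the event
    component: str       # what failed or was unavailable
    minutes_down: int    # how long Users could not work
    users_affected: int  # how many Users were affected


def summarize(events):
    """Return (component, stats) pairs sorted by total minutes down, worst first."""
    totals = defaultdict(lambda: {"events": 0, "minutes": 0})
    for event in events:
        totals[event.component]["events"] += 1
        totals[event.component]["minutes"] += event.minutes_down
    return sorted(totals.items(), key=lambda item: item[1]["minutes"], reverse=True)


# Made-up sample log for illustration.
LOG = [
    DowntimeEvent("2024-01-03", "mail server", 45, 60),
    DowntimeEvent("2024-01-09", "remote office router", 120, 15),
    DowntimeEvent("2024-01-17", "mail server", 90, 60),
]

for component, stats in summarize(LOG):
    print(f"{component}: {stats['events']} event(s), {stats['minutes']} minutes down")
```

Sorting by total minutes puts the least stable components at the top, which is usually where a stabilization strategy should start.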
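As a rough illustration of the idea, here is a minimal estimate built from three of the cost categories listed earlier: lost productivity, lost revenue, and troubleshooting expense. This is not the Cost of Downtime tool itself; the function name, parameters, formula, and figures are all assumptions made for the sketch.

```python
def downtime_cost(
    hours_down,
    users_affected,
    hourly_cost_per_user,
    productivity_loss=1.0,      # assumed fraction of work lost (1.0 = Users can do nothing)
    revenue_lost_per_hour=0.0,  # assumed revenue that cannot be earned while down
    repair_cost=0.0,            # assumed troubleshooting and resolution expense
):
    """Estimate the cost of one downtime event under the assumptions above."""
    lost_productivity = (
        hours_down * users_affected * hourly_cost_per_user * productivity_loss
    )
    lost_revenue = hours_down * revenue_lost_per_hour
    return lost_productivity + lost_revenue + repair_cost


# Made-up example: a remote office router is down for 2 hours, affecting
# 25 Users who cost $40 per hour and lose half their productivity, plus $300 in repairs.
cost = downtime_cost(2, 25, 40.0, productivity_loss=0.5, repair_cost=300.0)
print(f"Estimated cost of the outage: ${cost:,.2f}")
```

Even a crude estimate like this gives you a dollar figure to weigh against the price of an upgrade or a support contract. Morale and client satisfaction are real costs too, but they are harder to quantify and are left out here.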

Reducing downtime should be a key focus of any CIO.