How to Plan for Major Incidents in ITSM

Major Incident Management and Preparedness

A “major incident” can be a nightmare for any business; that much seems agreed upon. In practice, however, it is harder to find consensus on what a major incident is and how to respond to one effectively. Ironically, agreement is itself one of the necessary steps in defining a major incident for the purposes of major incident management.

Do you know what a “major incident” is and how your organization should best handle one of the many scenarios that may constitute one? Read on to learn more about how to deal with major incidents, as well as the importance of having a plan in place.

When an Incident Constitutes a Major Incident

Panic, a flood of calls to customer service, management in crisis mode: the hallmarks of a major incident are pretty difficult to miss. But according to ITIL and ISO 20000, what denotes a “major incident” is rigidly defined, and so is the response required. At its essence, what denotes a major incident must be agreed upon by both management and IT.

A major incident requires its own procedure, separate from ongoing efforts to address the problem(s) that led to the incident. The primary distinction between incidents and problems is that problems lead to incidents, and addressing problems lessens the occurrence of incidents. The major incident procedure requires defining responsibilities and reviewing what led to the incident, as well as how to prevent it from occurring in the future.

Preparing for Major Incident Management

Advance planning can make a tremendous difference when it comes time for major incident management. Putting together a major incident team or management group, or at the very least identifying an Incident Manager, is a great first step. Defining what denotes a major incident for your organization is also key to the process, as is defining how major incidents will be identified and communicated to the appropriate departments throughout your company.
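To make the idea of an agreed-upon definition concrete, here is a minimal sketch in Python of how such criteria might be written down and applied consistently. Every threshold, field name, and notification group below is a hypothetical placeholder rather than anything prescribed by ITIL or ISO 20000; the real values are whatever your management and IT teams agree on.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds: illustrative assumptions only.
# Real criteria should be agreed upon by management and IT.
MAX_AFFECTED_USERS = 100
MAX_OUTAGE_MINUTES = 30


@dataclass
class Incident:
    summary: str
    affected_users: int
    outage_minutes: int
    business_critical_service: bool


def is_major_incident(incident: Incident) -> bool:
    """Return True when the incident meets the agreed-upon criteria."""
    # A long outage of a business-critical service always qualifies.
    if (incident.business_critical_service
            and incident.outage_minutes >= MAX_OUTAGE_MINUTES):
        return True
    # Otherwise, qualify on the number of users affected.
    return incident.affected_users >= MAX_AFFECTED_USERS


def departments_to_notify(incident: Incident) -> List[str]:
    """List the (hypothetical) groups to alert for this incident."""
    if is_major_incident(incident):
        return ["Incident Manager", "Service Desk", "Management"]
    return ["Service Desk"]
```

Writing the criteria down in one place, whether in code, a runbook, or your ITSM tool, means the service desk does not have to debate in the middle of a crisis whether an incident qualifies.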

Designing and implementing agreed-upon emergency procedures for major incident management is another key advance measure. Finally, designing and implementing an incident management test scenario, with regular practice exercises, is also an integral component of preparing for major incident management.

Shifting Roles in a Major Incident Scenario

As mentioned above, one key aspect of major incident preparedness is forming and training an incident team or teams to execute agreed-upon major incident protocols. Depending on the size of your organization, this could be a single Incident Manager or, in the case of large organizations, several incident management teams. When a major incident occurs, all team members should have clearly defined roles and responsibilities while the incident is being resolved. A post-incident plan is also necessary to make sure that the problem was handled appropriately and that steps are put in place to prevent it from recurring.

Roles to consider assigning include: Incident Manager, Root Cause Analyst (or Problem Manager), and Major Incident Investigator (or Investigation Team). Additionally, protocols need to be established for the service desk team, service level or account managers, and other teams during a major incident to ensure that communication around service interruptions and customer issues is handled according to established policy.
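One simple way to verify that every role is covered before an incident occurs is to keep a roster and check it for gaps. The Python sketch below is illustrative only: the role names come from the list above, while the roster format and the `unassigned_roles` helper are assumptions made for the example.

```python
from typing import Dict, List

# Roles taken from the list above; adjust to match your own protocols.
REQUIRED_ROLES = (
    "Incident Manager",
    "Root Cause Analyst",
    "Major Incident Investigator",
)


def unassigned_roles(assignments: Dict[str, str]) -> List[str]:
    """Return the required roles that have no person or team assigned."""
    return [
        role for role in REQUIRED_ROLES
        if not assignments.get(role, "").strip()
    ]


# Example roster with a hypothetical, partially filled set of assignments.
roster = {"Incident Manager": "On-call IT lead", "Root Cause Analyst": ""}
print(unassigned_roles(roster))
```

Running a check like this as part of your regular practice exercises helps surface missing assignments before a real incident does.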

Major Incident Management in the Wake of a Serious Issue

Following a major incident, it can be tempting to return to business as usual as quickly as you can. That temptation should be resisted. Restoring optimal service levels quickly is ideal, but three follow-up activities are equally crucial: problem management in the wake of the incident, analysis of the organization’s response, and revisiting your incident management test scenario for improvements.

To learn more about incident management from an ITIL perspective, watch our webinar.


Nathan Riley is a Sales Director, ITSM at SolarWinds. He has nine years of experience in the industry and has had a front-row seat for the evolution of service management as a platform for the entire organization. He helps organizations ranging from SMB to Fortune 500 bring customized service to employees. Nathan proudly served the United States Armed Forces in the United States Marine Corps.