When thinking about problem management in your IT organization, it’s a good idea to first understand how ITIL differentiates problems and incidents within the service management framework. Problems are defined as the cause of one or more incidents, where incidents are simply disruptions in service.
In this post, I’ll break down what exactly problem management is, as well as some related terminology.
What Is Problem Management?
Problem management is a term used in the IT service management world for the process of identifying, analyzing, and fixing the root causes (problems) behind recurring incidents to prevent future disruption.
In essence, problem management involves not only finding and fixing errors that may have led to an influx of opened tickets, but also understanding relationships between incidents, configuration items, tasks, and other systems over time.
By taking a step back and looking at related records, such as changes in your configuration management database (CMDB) or a recent change that went through your change management process, IT pros can better understand where vulnerabilities exist and prevent issues in the future.
The Importance of Problem Management in the Service Desk
Implementing an IT change management or asset management process may be a priority for some IT pros, but problem management is critically important for minimizing downtime whenever disruptions happen.
To be proactive, it’s a good idea to create a problem management process within your organization along with any change management or incident management processes you may already have in place.
Implementing a Problem Management Process
A problem management process owner within your IT organization can create a general problem management process that outlines exactly what happens when a problem is identified.
The problem management process may be a list of tasks, such as creating a problem record, submitting a request for change (RFC) to fix an error, and documenting workarounds, along with other problem management-related tasks.
One key component of problem management is being quick to respond to problems, from creating a problem record to kicking off a checklist of tasks in the event a piece of your infrastructure goes down. But it’s more than simply being reactive.
Equally important is being proactive: developing a process for predicting and preventing these problems, so you can provide better service to employees.
Understanding Problem, Incident, and Change Management
IT problem management processes are closely aligned with IT incident management tasks and encompass many activities aimed at problem discovery.
These activities include trend analysis and detecting patterns in historical incident data. Often, diagrams such as cause-and-effect, tree, or fishbone charts can help map these relationships.
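As a rough illustration of trend analysis, the sketch below counts how often the same configuration item and category appear together in past incidents and flags combinations that recur. The incident fields (`ci`, `category`), the sample data, and the threshold of three are assumptions made for this example, not part of any particular service desk tool.

```python
from collections import Counter

# Hypothetical incident records; the field names are illustrative assumptions.
incidents = [
    {"id": "INC-1", "ci": "mail-server", "category": "outage"},
    {"id": "INC-2", "ci": "mail-server", "category": "outage"},
    {"id": "INC-3", "ci": "vpn-gateway", "category": "login failure"},
    {"id": "INC-4", "ci": "mail-server", "category": "outage"},
    {"id": "INC-5", "ci": "vpn-gateway", "category": "slow connection"},
]


def find_problem_candidates(records, threshold=3):
    """Return ((ci, category), count) pairs seen in at least `threshold` incidents."""
    counts = Counter((r["ci"], r["category"]) for r in records)
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


# Recurring combinations are candidates for a problem record, not proof of a root cause.
candidates = find_problem_candidates(incidents)
for (ci, category), n in candidates:
    print(f"{ci} / {category}: {n} incidents")
```

A simple count like this only surfaces where to look; the actual investigation still happens in the problem record.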
Some conflate IT problem management with incident management. IT incident management addresses the handling of individual issues that have already occurred, where the goal is to resolve the incident and restore service to normal as soon as possible. With problem management, the focus is on understanding the root cause of known problems and correcting it to prevent future incidents from taking place.
Once problems have been identified, formal procedures for recording, categorization, investigation, diagnosis, escalation (when needed), and resolution should be set and followed.
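One way to make such a procedure explicit is to model the stages as a small state machine that rejects skipped steps. The sketch below is a minimal example under assumptions: the stage names come from the list above, but the allowed transitions between them are illustrative and should be adapted to your own process.

```python
from enum import Enum


class Stage(Enum):
    RECORDED = "recorded"
    CATEGORIZED = "categorized"
    INVESTIGATING = "investigating"
    DIAGNOSED = "diagnosed"
    ESCALATED = "escalated"
    RESOLVED = "resolved"


# Assumed transitions; escalation is optional and can return to investigation.
ALLOWED = {
    Stage.RECORDED: {Stage.CATEGORIZED},
    Stage.CATEGORIZED: {Stage.INVESTIGATING},
    Stage.INVESTIGATING: {Stage.DIAGNOSED, Stage.ESCALATED},
    Stage.ESCALATED: {Stage.INVESTIGATING, Stage.DIAGNOSED},
    Stage.DIAGNOSED: {Stage.RESOLVED, Stage.ESCALATED},
    Stage.RESOLVED: set(),
}


def is_allowed(current, target):
    """Return True if the process permits moving from `current` to `target`."""
    return target in ALLOWED[current]


def advance(current, target):
    """Move a problem to `target`, raising ValueError on a skipped or invalid step."""
    if not is_allowed(current, target):
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


stage = Stage.RECORDED
for nxt in (Stage.CATEGORIZED, Stage.INVESTIGATING, Stage.DIAGNOSED, Stage.RESOLVED):
    stage = advance(stage, nxt)
```

Encoding the transitions in one table keeps the procedure easy to review and change as the process matures.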
Workarounds are a concept you will likely encounter in ITSM problem management and should be familiar with. A workaround is a temporary fix used as part of your problem management process when a permanent solution is not yet achievable. It is a good way to avoid SLA breaches while a permanent resolution is still being worked on.
Workarounds must also be documented and shared with other members of the IT support team and, when appropriate, with employees (end users).
Resolving major problems will often require significant structural or architectural changes, which closely links IT problem management procedures to change management processes.
Tasks and Root Cause Analysis
Since finding a solution to the underlying root cause is the main goal, you’ll want to create tasks within a problem management dashboard and assign them to the people responsible for investigating each problem area.
This ensures teams have visibility into who is responsible for what in terms of identifying the error causing problems, and what may be required to achieve resolution.
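To make the idea of task ownership concrete, here is a minimal sketch of a problem record that tracks related incidents, a workaround, and tasks with named owners. The class names, fields, and sample values are hypothetical and do not reflect the data model of any specific service desk product.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    description: str
    owner: str
    done: bool = False


@dataclass
class ProblemRecord:
    title: str
    related_incidents: list = field(default_factory=list)
    tasks: list = field(default_factory=list)
    workaround: str = ""

    def assign(self, description, owner):
        """Create a task for `owner` and attach it to this problem."""
        task = Task(description, owner)
        self.tasks.append(task)
        return task

    def open_tasks_by_owner(self):
        """Group unfinished task descriptions by the person responsible."""
        summary = {}
        for task in self.tasks:
            if not task.done:
                summary.setdefault(task.owner, []).append(task.description)
        return summary


# Hypothetical example data.
problem = ProblemRecord(
    title="Recurring mail server outages",
    related_incidents=["INC-1", "INC-2", "INC-4"],
    workaround="Restart the mail service when the queue stalls",
)
logs = problem.assign("Review server logs around each outage", "alice")
problem.assign("Check recent changes to the mail server", "bob")
logs.done = True

print(problem.open_tasks_by_owner())
```

A view like `open_tasks_by_owner` is the kind of summary a dashboard would surface so everyone can see who is responsible for what.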
By creating a root cause analysis (RCA) process, your organization will have a basic framework for getting to the bottom of problems, including defining, understanding, researching, and implementing a good solution.
Root cause analysis is a repeatable process for working through what’s going on (to prevent future problem occurrences) and can be customized to the needs of your organization.
The Big Picture
While problem management can sometimes seem like a vague or complex process, it’s just an element of your service management strategy enabling service desk agents to resolve, manage, and prevent problems to limit disruption.
Since problem management leaders typically work with a wide range of employees from technical leaders to other change management and incident management owners, it’s helpful to understand problem management in relation to these various systems and personnel using a flow chart or diagram.
A modern problem management solution within your service desk can provide IT organizations with increased visibility into what’s going on, making it easier to view related incidents, tasks, or workarounds.