ITSM basics: Problem or incident?


Posted by
Philip Bertholle

July 13, 2017

Amongst the ITIL processes (incident, problem, service request, change), two collide: incident and problem. In an English dictionary the two words are near synonyms, but in the IT world the confusion between the two concepts goes beyond mere semantics. Being able to tell an incident from a problem means being able to follow the proper process for resolution.

So what’s the difference? The ITIL definitions of problem vs. incident

An incident is an unplanned disruption or degradation of service. It’s an event which is not part of the standard operation of a service and which causes or may cause disruption to or a reduction in the quality of services and customer productivity.

The purpose of flagging an incident is to alert the IT team that operations are experiencing trouble serious enough to impact the service delivered to customers (a customer can be either a person or another IT system that relies on the faulty system).

Consequently, the incident is strongly service-driven by nature. It is created to cope with an emergency and strives to recover an acceptable level of service from the customers’ point of view, whatever it takes. Quite often, this means that whatever achieves the end goal (restoring service ASAP to meet SLAs) is far from being a definitive solution.

 For incidents, you’ll want to record the following information:

  • Name
  • Contact details
  • Employee ID
  • Asset tag
  • VIP/critical user status
  • Location
  • Status
  • Type
  • Priority
  • Category
  • Title
  • Description
  • Service affected
  • Assigned teams
  • Resolution details
  • Fix details
  • Related problem record 
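As a rough illustration, the fields above could be captured in a simple record type. This is a minimal sketch, not how JIRA Service Desk or any other tool models incidents: the class name `Incident`, the field names, the default values ("open", "medium") and the `resolve` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    # Who is affected
    name: str
    contact_details: str
    employee_id: str
    # What happened
    title: str
    description: str
    service_affected: str
    # Optional context about the user and their equipment
    asset_tag: Optional[str] = None
    vip_user: bool = False
    location: Optional[str] = None
    # Triage fields (the allowed values are illustrative, not prescribed by ITIL)
    status: str = "open"
    incident_type: str = "incident"
    priority: str = "medium"
    category: Optional[str] = None
    assigned_teams: List[str] = field(default_factory=list)
    # Filled in as the incident is worked
    resolution_details: Optional[str] = None
    fix_details: Optional[str] = None
    related_problem: Optional[str] = None

    def resolve(self, resolution_details: str, fix_details: Optional[str] = None) -> None:
        """Record how service was restored and mark the incident resolved."""
        self.resolution_details = resolution_details
        self.fix_details = fix_details
        self.status = "resolved"


# Hypothetical example data
incident = Incident(
    name="A. User",
    contact_details="a.user@example.com",
    employee_id="E-0001",
    title="Cannot log in to the intranet",
    description="Login page returns an error since 09:00.",
    service_affected="intranet",
    priority="high",
)
incident.resolve("Restarted the authentication service.")
```

Note that `related_problem` stays empty here: restoring service closes the incident even if nobody yet knows why the login failed.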

Conversely, a problem is the cause of one or more incidents, where the root cause is not yet known.

In some cases, problems may be identified because multiple incidents exhibit common characteristics. Problems can also be identified from a single significant incident, indicative of a single error, for which the cause is unknown. In other cases, problems will be proactively identified before any related incidents occur.
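The first case, spotting a problem because several incidents share common characteristics, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `find_candidate_problems`, the choice of (service, category) as the shared characteristics, the threshold of three and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_candidate_problems(
    incidents: List[dict], threshold: int = 3
) -> Dict[Tuple[str, str], List[str]]:
    """Group incidents by (service, category) and keep groups that recur.

    Any group with at least `threshold` incidents is returned as a
    candidate for a problem record.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for incident in incidents:
        key = (incident["service_affected"], incident["category"])
        groups[key].append(incident["id"])
    return {key: ids for key, ids in groups.items() if len(ids) >= threshold}


# Hypothetical example data
sample_incidents = [
    {"id": "INC-1", "service_affected": "email", "category": "login"},
    {"id": "INC-2", "service_affected": "email", "category": "login"},
    {"id": "INC-3", "service_affected": "vpn", "category": "timeout"},
    {"id": "INC-4", "service_affected": "email", "category": "login"},
]
candidates = find_candidate_problems(sample_incidents)
```

With this sample data, the three email login incidents are flagged together, while the single VPN incident is not. In practice the grouping key and threshold would depend on how your service desk categorises tickets.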

Whichever way a problem is identified, it is necessary to follow a problem management workflow, wherein the resolution comes from finding and addressing the root cause. A problem may not have the same time constraints as an incident, which allows the Problem Manager to appoint an ad hoc task force with more time and resources to investigate until they find a permanent solution. The goal is to proactively and efficiently solve the underlying causes of one or more incidents over the long term.

 For problems, you’ll want to record the following information:

  • Description of issue
  • Service affected and business impact
  • Downtime
  • Priority
  • Remedial actions to date
  • Support team details
  • Root cause analysis
  • Meeting minutes
  • Next steps
  • Related incidents
  • Related changes
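A problem record built from the fields above might look like the following. Again, this is a minimal sketch under assumptions: the class name `Problem`, the field names, the use of minutes for downtime, and the `link_incident` and `root_cause_known` helpers are invented for the example and do not reflect any particular tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Problem:
    description: str
    service_affected: str
    business_impact: str
    priority: str = "medium"
    downtime_minutes: int = 0
    remedial_actions: List[str] = field(default_factory=list)
    support_team: Optional[str] = None
    # Stays empty until the investigation identifies the root cause
    root_cause_analysis: Optional[str] = None
    meeting_minutes: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)
    related_incidents: List[str] = field(default_factory=list)
    related_changes: List[str] = field(default_factory=list)

    def link_incident(self, incident_id: str) -> None:
        """Attach an incident to this problem, ignoring duplicates."""
        if incident_id not in self.related_incidents:
            self.related_incidents.append(incident_id)

    @property
    def root_cause_known(self) -> bool:
        return self.root_cause_analysis is not None


# Hypothetical example data
problem = Problem(
    description="Repeated login failures on the email service",
    service_affected="email",
    business_impact="Staff cannot read or send email during outages",
)
for incident_id in ("INC-1", "INC-2", "INC-1"):
    problem.link_incident(incident_id)
```

The point of the sketch is the relationship: one problem collects many incidents, and it remains open, with no root cause recorded, long after each individual incident has been resolved.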

Why does it matter? ITSM in practice

Following various JIRA Service Desk and ITSM deployments, I realised that from one company to another an incident is either called by a completely different name or is in fact a problem. According to ITIL, in a nutshell, an incident is an outage; if you cannot identify its cause it becomes a problem, which in turn becomes a major problem when no solution is found. In real life, in a majority of companies, the problem manager is also the incident manager, and often the service desk manager as well.

Let’s look at the ecosystem a bit more closely to see the problem (mind you, not the incident (wink)):

Today, ITSM functions at the enterprise level, which requires companies to have a comprehensive help desk platform to effectively manage modern IT processes and organisations. Companies need to be highly reactive to their internal and external customers, often with increasingly demanding SLAs. Recurring challenges include growing and distributed teams. Teams need easy access to information and support to meet their deadlines and maintain business efficiency (this is where Atlassian’s JIRA Service Desk steps in to collect, service and report for optimal IT service management).

What is concerning is that, in 2016, between 40% and 50% of customers contacted a help desk after they couldn’t find answers to their question via self-service. As nearly half of customers cannot represent an isolated incident, this example should be tagged as a problem.

Yet if the incident management and problem management workflows collide, the result is confusion, which goes against the ITSM objective of improving service efficiency and simplicity!

As we discussed, the objective of the Incident Management Lifecycle is to restore the service as quickly as possible to meet Service Level Agreements, usually targeted at the user level. The emphasis of Problem Management, by contrast, is to resolve the root cause of errors and to find permanent solutions; by the time a collection of incidents reaches this point, the issue has escalated to the enterprise level.

In short, managing a problem like an incident is the equivalent of trying to put out a forest fire with one bucket of water.

If you need help getting your incident and problem management on track, Valiantys has extensive experience helping companies focus on the processes that matter.