This is an excerpt; more detailed descriptions can be found in the book.
Many IT organizations find it difficult to distinguish between incidents and problems. Technical analysts and applications analysts are inquisitive by nature, which often leads them to investigate and resolve the underlying problem behind every incident. Because each investigation requires resources, and resources cost money, in the worst case several problems get resolved that should not have been, since from a broader perspective those resources would have been better prioritized elsewhere.
Problems that are investigated by individuals are usually resolved through personal knowledge and experience. Unfortunately, this knowledge and experience can become a limitation when the problem is complex and several different areas are involved. What is then needed is a more thorough investigation and a proven method for finding the fundamental cause. Otherwise there is a risk that the problem will remain unresolved for a substantial period. Running Problem Management as a separate process gives the IT organization control over existing problems, lets it prioritize which of them should be investigated, and ensures that proven tools and methods are used for complex problems.
A problem is defined in this book as "the unknown underlying cause of something that has occurred".
The main purpose of the Problem Management process is to reduce the number of recurrent incidents in the IT environment and the negative impact they have.
The purpose of Problem Management is achieved through:
- Preventing incidents from arising by rectifying their underlying cause
- Minimizing the negative impact on the business of incidents that cannot be prevented
A problem is the unknown cause of something that has happened, so it is not restricted to the IT environment. Because the process provides the necessary tools, it can also be used to resolve problems that have arisen elsewhere in the IT organization.
The Problem Management process comprises the tools and activities required to find the fundamental cause of something that has occurred, and to define the measures required to rectify the problem.
The process also comprises implementing the solution according to the relevant procedures, specifically those included in the Change Management and Release Management processes. The scope also includes providing the rest of the IT organization with a list of known errors and temporary solutions.
Reactive and proactive problems
Reactive problem records are incidents that first- or second-line support has not succeeded in resolving. In other words, there is an ongoing disturbance in the IT environment, so this type of problem remains urgent until a temporary solution has been found and the incident is resolved.
Proactive problems are all other problem records. They are usually registered on the basis of recurrent incidents whose cause is unknown, but they can also be individual events whose cause needs to be identified. Resolving proactive problems costs time and money, so they must be prioritized together with all other issues within the IT organization. This means that they should be managed by the function responsible for Continual Service Improvement.
Known errors and temporary solutions
Not all problems in the IT environment have to be resolved. There may be various reasons for this: for example, a new release planned for the near future will resolve the problem anyway, or the cost of investigating or resolving the error is so high that the business considers it not worth it. Regardless of the reason, the error should be described and stored in a database or a document so that future issues can be linked to an error that is already known.
There should be a temporary solution for each known error. This usually consists of instructions that enable users to work around the error in the IT environment and thus continue working. In certain cases, the temporary solution may simply be to inform the user that there is an error and that the customer has decided not to do anything about it. Examples include minor bugs in an application that are detected during deployment but that the customer has nevertheless accepted.
Used optimally, the list of known errors is a complete record of all errors in the IT environment. The list is an invaluable aid for the Service Desk, ensuring that no time is wasted on errors that have already been identified. The list also serves as input to Continual Service Improvement of the services.
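To make the idea of a list of known errors more concrete, here is a minimal sketch of how such a list might be held in code. It assumes a hypothetical record schema (`KnownError` with `error_id`, `description`, `temporary_solution`, `keywords`, `linked_incidents`) and uses simple keyword matching to link a new incident to an existing known error. The schema, the names and the matching rule are illustrative assumptions, not a format prescribed by the book or by any service management tool. The sketch also enforces that every known error carries a temporary solution.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lower-case word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class KnownError:
    """One entry in the list of known errors (hypothetical schema)."""
    error_id: str
    description: str
    temporary_solution: str                      # workaround given to users
    keywords: set[str] = field(default_factory=set)
    linked_incidents: list[str] = field(default_factory=list)


class KnownErrorList:
    """In-memory stand-in for the known error database or document."""

    def __init__(self) -> None:
        self._errors: dict[str, KnownError] = {}

    def register(self, error: KnownError) -> None:
        # Assumption: a known error is only accepted together with a workaround.
        if not error.temporary_solution.strip():
            raise ValueError(f"{error.error_id}: a temporary solution is required")
        error.keywords = {k.lower() for k in error.keywords}
        self._errors[error.error_id] = error

    def link(self, incident_id: str, incident_text: str) -> KnownError | None:
        """Link an incident to the best-matching known error, if any."""
        words = _tokens(incident_text)
        scored = [(len(e.keywords & words), e) for e in self._errors.values()]
        scored = [(score, e) for score, e in scored if score > 0]
        if not scored:
            return None  # not a known error: handle it as a new incident
        _, best = max(scored, key=lambda pair: pair[0])
        best.linked_incidents.append(incident_id)
        return best
```

A Service Desk lookup could then look like this: an incident mentioning a timeout on export is linked to `KE-001`, and its temporary solution can be passed straight to the user.

```python
kel = KnownErrorList()
kel.register(KnownError(
    error_id="KE-001",
    description="Report export times out for very large date ranges",
    temporary_solution="Export the report one month at a time.",
    keywords={"export", "timeout"},
))
hit = kel.link("INC-1042", "Timeout when I export the yearly report.")
print(hit.temporary_solution if hit else "no known error")
```

In practice the matching would more likely be done by the analyst or by the search function of the service management tool. The point of the sketch is only that each linked incident is recorded against the known error, which is what later makes the list useful as input to Continual Service Improvement.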