Risk Assessment #

TLDR: If you choose to only assess risk once, make sure it is after the implementation is completed.
Rationale: To make sure we are aware of the risks of a change and can mitigate them.

When handling Change Management, the size of the tasks should factor into what you assess and where in the process you assess it. Preferably, each change is divided into smaller tasks that are assessed individually. This makes the risk assessment as accurate as possible and makes change management more effective and easier to handle.

No risk #

If a task is assessed to have no risk, it can be handled without any further action. Just note on the task that the assessment has been done and that it has no known risks, for example: “Risk assessment done and found no risk”.

Risk assessment #

There are a few things to consider when assessing risk.

When in the Change Management routine should we assess risk? #

The requirement is to do risk assessment. Whether you choose one or both of the following is up to you.

There are two main times when we can/should assess risk:

  • When the change is ready for implementation
  • When the change is implemented

The first assessment happens when you specify the change and break it into tasks. This is when you might detect risks that you can mitigate before the change is implemented.

The latter happens when the change has been implemented. Implementation might reveal new risks that you didn’t know about before. It is almost never as simple as specified. Implementation might have altered parts of the system that might introduce new risks.

You can choose to assess risk at one of these times or both. You can use the same templates (found below) to document the risk assessment.

If you choose to only assess risk once, make sure it is after the implementation is completed.

If you choose to assess risk at both times, then you should link them in the ## Meta section.

Systems affected #

  • Which parts of the system are affected by the change?
    • Is it a small part of the system or is it a large part of the system?
    • Does it affect several systems?
  • How important is that part and how much does it affect the system as a whole?
    • Does an error in that system affect other systems?

Why should we consider systems affected?

It helps us assess the impact, the probability, and the mitigation.
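The question of whether an error in one system affects other systems can be approached by walking a dependency map. The sketch below is only an illustration: the `DEPENDENTS` map, the system names in it, and the `affected_systems` helper are all hypothetical and would need to be replaced with your own system landscape.

```python
from collections import deque

# Hypothetical dependency map: each key is a system, each value lists the
# systems that depend on it directly. The names are illustrative only.
DEPENDENTS = {
    "kpi-service": [],
    "flow": ["case-manager", "reporting"],
    "case-manager": ["reporting"],
    "reporting": [],
}


def affected_systems(changed, dependents):
    """Return every system that directly or transitively depends on `changed`."""
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            # Skip systems already visited and the changed system itself.
            if dependent != changed and dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(affected_systems("flow", DEPENDENTS)))         # ['case-manager', 'reporting']
print(sorted(affected_systems("kpi-service", DEPENDENTS)))  # []
```

A system with many transitive dependents is a candidate for a higher impact rating, while a standalone system with none usually is not.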

Impact #

The change can be new functionality, extending existing functionality, bug fixes, or refactoring.

The impact of the change is dependent on the type of change. A new feature can have a large or small impact on the system, depending on how it is built.

If it is modular with few dependents, then the impact is low, as it is decoupled and would mostly not affect other systems if it fails. If it is built or retrofitted so that other systems depend on it, then the impact is high, as breaking it would break the others.

When refactoring, the impact depends on the level and quality of the tests. If the tests and their coverage are good, then the impact can and should be low. If the tests are bad or non-existent, then the impact is most likely high.

Bug fixes are generally low impact, but it depends on the bug. If the bug fix affects critical or high-value parts of the system, then the impact is high.

Why should we consider impact?

It helps us see risk level more clearly.

Risk level #

Levels are used to determine the seriousness of the risk.

There are 3 levels of risk:

  • High
  • Medium
  • Low
Each level indicates the seriousness of the risk: use High, Medium, or Low when the impact of the risk happening is high, medium, or low, respectively.

Probability #

Probability describes how likely the risk is to happen. It is determined by assessing the systems affected and the change itself. This, alongside the risk level, helps us determine the mitigation and its priority.

As with risk level, there are 3 levels of probability:

  • High
  • Medium
  • Low
Each level indicates how likely the risk is: use High, Medium, or Low when the probability of the risk happening is high, medium, or low, respectively.
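One possible way to turn risk level and probability into a mitigation priority is a simple score matrix. The sketch below is an assumption and not part of the routine: the numeric weights, the thresholds, and the `mitigation_priority` function are hypothetical and should be adjusted to your team's own judgment.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def mitigation_priority(risk_level, probability):
    """Combine risk level and probability into a priority (hypothetical scoring)."""
    score = risk_level * probability  # possible values: 1, 2, 3, 4, 6, 9
    if score >= 6:
        return Level.HIGH
    if score >= 3:
        return Level.MEDIUM
    return Level.LOW


print(mitigation_priority(Level.MEDIUM, Level.MEDIUM).name)  # MEDIUM
print(mitigation_priority(Level.HIGH, Level.LOW).name)       # MEDIUM
print(mitigation_priority(Level.MEDIUM, Level.LOW).name)     # LOW
```

With these thresholds, a high-level risk never drops below medium priority, even when its probability is low.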

Mitigation #

Mitigation describes how to handle the risk or reduce the chance of it happening. It almost always involves taking extra care of something, changing something else, or making the risk have less impact. Mitigations can range from code changes to testing, monitoring, documentation, communication, training, meetings, or anything else that helps mitigate the risk.

Example #

Description #

Introduce KPI capture during origination. The KPIs will be used to determine the quality of the origination process. See more on (ticket)[]

Systems affected #

  • Flow
  • KPI service (standalone)

Impact #

High impact if we allow the updating of KPIs to stop the process. It should be fire and forget.

Risks #

| Description | Risk level | Probability | Mitigation | Comment |
|-------------|------------|-------------|------------|---------|
| Service errors stopping process | Medium | Medium | Use fire and forget when sending data. | Catch and continue |
| Dirty task handlers | Medium | Low | Use dedicated event tasks for sending data to KPI database | Beware of merge conflicts |
| Redundancy on state | High | Low | Create dedicated transformer function for sending data. | Keep state clean |
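
The "fire and forget" and "catch and continue" mitigations in this example could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the actual implementation: `send_kpi`, `capture_kpi`, and `originate` are hypothetical names, and the sender here always fails in order to show that the main process still completes.

```python
import logging
import threading

logger = logging.getLogger("kpi")


def send_kpi(payload):
    """Hypothetical stand-in for the call to the KPI service; here it always fails."""
    raise ConnectionError("KPI service unavailable")


def capture_kpi(payload, sender=send_kpi):
    """Send KPI data on a background thread and never raise to the caller."""

    def _run():
        try:
            sender(payload)
        except Exception:
            # Catch and continue: a KPI failure is logged, not propagated.
            logger.exception("KPI capture failed for %s", payload)

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread


def originate(application_id):
    """Hypothetical origination step that records a KPI without waiting for it."""
    capture_kpi({"application_id": application_id, "event": "originated"})
    return "completed"


print(originate("A-1"))  # completed
```

The trade-off of this design is that a failed or unfinished send is only visible in the logs, and a daemon thread may be cut off if the process exits first, so monitoring those logs becomes part of the mitigation.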

Templates: #

There are two templates you can use to document your risk assessment.

Table template #

## Risks

| Description | Risk level | Probability | Mitigation | Comment |
|-------------|------------|-------------|------------|---------|
|             |            |             |            |         |
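
If risks are tracked as structured data, the table template can be filled in programmatically. The sketch below is optional and purely illustrative: the `Risk` dataclass and `render_risk_table` function are hypothetical helpers, and the row reuses a risk from the example above.

```python
from dataclasses import dataclass

HEADERS = ["Description", "Risk level", "Probability", "Mitigation", "Comment"]


@dataclass
class Risk:
    description: str
    risk_level: str
    probability: str
    mitigation: str
    comment: str = ""


def render_risk_table(risks):
    """Render a list of Risk records as a Markdown table under a '## Risks' heading."""
    lines = [
        "## Risks",
        "",
        "| " + " | ".join(HEADERS) + " |",
        "|" + "|".join("---" for _ in HEADERS) + "|",
    ]
    for risk in risks:
        cells = [
            risk.description,
            risk.risk_level,
            risk.probability,
            risk.mitigation,
            risk.comment,
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(render_risk_table([
    Risk(
        "Dirty task handlers",
        "Medium",
        "Low",
        "Use dedicated event tasks for sending data to KPI database",
        "Beware of merge conflicts",
    ),
]))
```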

Page template #

# Risk assessment of {title}
## Description
Briefly describe the change.
### Systems affected
- System A
- System B
- etc
### Impact
Describe the impact this change would have if something went wrong.
## Risks
For each risk, fill out the following template:
### Risk
### Risk level
High/medium/low
Also describe why it is high/medium/low
### Probability
High/medium/low
Also describe why it is high/medium/low
### Mitigation
- Mitigation 1
- Mitigation 2
- etc
### Comment
Comments can be used for things that don't fit in the other categories or for other things that are important to note.

© Stacc 2024, all rights reserved