Point and Call

2021-09-14

It’s 2AM. You’re paged to respond to a failing set of components that you are the Subject Matter Expert (SME) for. Sleepy, you load up the playbook for when the SplineReticulatorBlocked alert has gone off, and start executing. The Incident Commander (IC) is vaguely aware of what you are doing, and checks in now and then.

Unfortunately, you’ve run out of Club-Mate. Under-caffeinated, you mistype some commands, and not only have splines stopped reticulating, but now QuigleyMatrixInsufficientlyObfuscated is firing as well! Your SEV2 has graduated to a SEV0. Executives have joined the call. How could this have been prevented?

Background

This is a silly example, of course, but it does happen. Incidents and complex systems don’t mix well, especially when commands have to be typed by hand into a console or similar, rather than run through sufficient (semi-)automation.

I’ve been an IC and SME in incidents, and in organisations that are not very operationally mature, there is a sense of “get out of the way of the responders”. This can be fine, but it can obfuscate actions taken by responders from the IC. This makes it harder to satisfy a Conditions-Actions-Needs report – especially Actions.

Additionally, responders may not be at full cognitive capacity – there is plenty of literature on how sleep deficits result in worse decision making, especially during emergencies. To mitigate mistakes made by responders, we need to adopt a system that improves responder alertness and keeps ICs informed of what is being done (or about to be done).

This is the pointing and calling method.

Calling Out Actions

Pointing and calling is straightforward – “point” at an important indicator and “call” its status. For train drivers, this is typically done with signals and passenger doors. For system operators, this can be a number of things.

Most commonly, you want to “point” to a metric, alarm, dashboard, etc., and “call” out the action you are going to take. It is important that this is done before the action is taken.

As an example, here is a chat-like interface sample:

[2021-09-14 21:31:47] <alice> quig matrix obfuscation % is at 12%. It should be at 95%.
[2021-09-14 21:31:48] <alice> in the quig matrix console, I am going to run `QuigleyMatrix.obfuscate_more!`

The IC can then reply, acknowledging the action by repeating, then allowing them to go ahead:

[2021-09-14 21:31:49] <bethany> thanks alice, you are going to run `QuigleyMatrix.obfuscate_more!` in the qmc.
[2021-09-14 21:31:50] <bethany> please go ahead with this action

Alternatively, the IC may decide that they need more information about the action before it is taken:

[2021-09-14 21:31:49] <bethany> thanks alice, before you go ahead, what does this command do, and what impact are we expecting?

This is an important step for an IC to take. They may feel the action is not sufficiently described, or it may not be clear what the causal link is between the action and remediating the issue. In high-pressure situations, clarity of communication is important.

Finally, the responder can execute the action, and update the condition.

[2021-09-14 21:31:50] <alice> i have run `QuigleyMatrix.obfuscate_more!` in the qmc.
[2021-09-14 21:31:52] <alice> quig matrix obfuscation % is now at 33% and rising.
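
If you wanted tooling to nudge people into this flow, the exchange above could be modelled as a small state machine: an action is called, the IC either reads it back and says go or holds it with a question, and only an acknowledged action may run. The Python below is a minimal sketch of that idea. `CalledAction`, `State`, and every method name are hypothetical, invented for this illustration, and the exact-match read-back is an assumption about how strict you would want to be.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class State(Enum):
    CALLED = "called"              # responder has announced the action
    HELD = "held"                  # IC has asked for more information
    ACKNOWLEDGED = "acknowledged"  # IC has read it back and said go
    EXECUTED = "executed"          # responder has run it and reported back


@dataclass
class CalledAction:
    """Hypothetical record of one point-and-call exchange."""
    responder: str
    indicator: str  # what is being "pointed" at
    command: str    # what is being "called"
    state: State = State.CALLED
    log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Point first, then call, both before anything is run.
        self.log.append(f"<{self.responder}> {self.indicator}")
        self.log.append(f"<{self.responder}> I am going to run `{self.command}`")

    def hold(self, ic: str, question: str) -> None:
        # The IC wants more information before giving a go signal.
        self.state = State.HELD
        self.log.append(f"<{ic}> before you go ahead: {question}")

    def acknowledge(self, ic: str, read_back: str) -> None:
        # Assumption: the read-back must match the called command exactly.
        if read_back != self.command:
            raise ValueError("read-back does not match the called command")
        self.state = State.ACKNOWLEDGED
        self.log.append(f"<{ic}> you are going to run `{read_back}`. Please go ahead.")

    def execute(self, run: Callable[[], str]) -> str:
        # Only an acknowledged action may be executed.
        if self.state is not State.ACKNOWLEDGED:
            raise PermissionError(f"cannot execute while action is {self.state.value}")
        result = run()
        self.state = State.EXECUTED
        self.log.append(f"<{self.responder}> I have run `{self.command}`. {result}")
        return result
```

A side effect of the strict read-back is that a mistyped command in the acknowledgement is rejected rather than waved through, which is the same kind of slip the method is meant to catch. In practice the humans in the channel are the real gate; a sketch like this only makes the order of the steps explicit.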

Review

By pointing and calling actions during an incident, we create an inflection point where another person can give us a stop or go signal, much like on the railways where this methodology was invented. It improves clarity of communication during an incident, while also improving responder alertness.

It is not universally appropriate, of course – in cases where time is of the essence, it may be necessary to execute first and document later. However, this should not be standard procedure – in the majority of cases, there is time to stop, point and call, and get some peer review on the action you wish to take.

Thanks to Raul Murciano and Kenny Parnell for proofreading!