Developing error tolerant workplace systems

What is error tolerance?

Error tolerance refers to the ability of a system to remain functional even after an error. In other words, it is the ability of a system, or a component within it, to continue normal operations despite the presence of erroneous inputs.

An error tolerant system is one where the results of committing errors are relatively harmless. Despite evident errors, the intended result may still be achieved with minimal or no corrective action by the user or operator. One example of building error tolerance is a scheduled inspection and maintenance program for electrical transmission line poles, which allows multiple opportunities to identify fatigue cracks, pole sag and similar defects before they become critical.

Individuals are amazingly error tolerant, even when physically damaged. We are extremely flexible, robust, creative, and skilled at finding explanations and meanings from partial and noisy evidence. The same properties that lead to such robustness and creativity also produce errors. The natural tendency to interpret partial information can cause operators to misinterpret system behaviour in such a plausible way that the misinterpretation is difficult to discover. Systems designed to predict and capture error, in other words systems that contain multiple layers of defences, are therefore more likely to prevent accidents that result from human error.

The typical features of an error tolerant system include:

  • Open and transparent error reporting programs not focused on culpability and blame.
  • Human factors training with the specific application of error identification, capture and management.
  • Non jeopardy based observational auditing programs that examine threat and error management skills of safety critical workers.
  • Strict adherence to standard operating procedures (SOPs) and standard communication phraseology.
  • Human centered design of equipment.
  • Systems to continually learn the lesson of previous incidents.
  • Using automation where possible, particularly for routine and monotonous tasks that overly rely on operator vigilance.

 Group Discussion:

  1. Examine the Error Tolerance Checklist below and answer the questions as they apply to any workplace you are familiar with.
  2. How does the organisation rate regarding the typical features of an error tolerant system? What is present and what is missing?

Table 1. Error Tolerance Checklist

ERROR TOLERANCE CHECKLIST

10 typical features of an Error Tolerant System:

Examine the following questions and apply them to your organisation in regard to the degree of implementation (Is it present?) and the degree of effectiveness (Does it work?). Rate each question twice, once on each of the scales below.

Level of Implementation: Full 5 4 3 2 1 Nothing
Level of Effectiveness: Full 5 4 3 2 1 Nothing

  1. Are there open and transparent error reporting programs not focused on culpability and blame, where people feel free to honestly report any errors made?
  2. Are there formal Human Factors/Non Technical Skills training programs with a particular applied focus on teaching individuals practical error identification, capture and management techniques?
  3. Is there formal non jeopardy based assessment of the Human Factors/Non Technical Skills applied by safety critical workers as part of scheduled monitoring programs (safety observations, audits etc)?
  4. Is there strict adherence to standard operating procedures (SOPs) and standard communication phraseology?
  5. Is there a formal process to ensure that any newly procured equipment, plant or facilities to be used by safety critical workers follow human centred design principles?
  6. Are there formal systems to continually learn the lessons of previous incidents so that they are not repeated?
  7. Is there a commitment to use automation where possible, particularly for routine and monotonous tasks that overly rely on operator vigilance?
  8. Are there installed systems of authentication and authorisation so that people can't do things without a specifically granted permission (Access Permits, System Authority etc)? For example, a new employee without system authority cannot accidentally delete your customer database.
  9. Is critical equipment/plant designed with automatic shutdown modes that turn it off if the user does something unsafe?
  10. Is critical equipment designed with constraints that prevent mistakes? For example, a battery unit that cannot be installed incorrectly due to its shape.

Sub Total (Level of Implementation):
Sub Total (Level of Effectiveness):
Total Score: / 100

Score Interpretation

  • 20-30: Jurassic Park!
  • 31-40: You are very vulnerable.
  • 41-60: Not bad, but there is a long way to go.
  • 61-85: You are in good shape but don't forget to be uneasy.
  • 86-100: As error tolerant as any organisation can be, but no one is perfect!
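
For groups that record their ratings electronically, the scoring arithmetic can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the checklist itself: the function name `score_checklist` and the `SCORE_BANDS` table are assumptions made for this example, and it assumes each of the 10 questions is given a whole-number rating from 1 to 5 on both scales, so that the two sub totals add up to a total between 20 and 100.

```python
# Illustrative sketch only: the names below are hypothetical and not part
# of the checklist. The score bands are copied from the Score Interpretation.
SCORE_BANDS = [
    (20, 30, "Jurassic Park!"),
    (31, 40, "You are very vulnerable."),
    (41, 60, "Not bad, but there is a long way to go."),
    (61, 85, "You are in good shape but don't forget to be uneasy."),
    (86, 100, "As error tolerant as any organisation can be, but no one is perfect!"),
]


def score_checklist(implementation, effectiveness):
    """Return (implementation sub total, effectiveness sub total, total, interpretation)."""
    # Assumption: exactly 10 ratings per scale, each a whole number from 1 to 5.
    for name, ratings in (("implementation", implementation),
                          ("effectiveness", effectiveness)):
        if len(ratings) != 10:
            raise ValueError(f"{name}: expected 10 ratings, got {len(ratings)}")
        if any(r not in (1, 2, 3, 4, 5) for r in ratings):
            raise ValueError(f"{name}: every rating must be a whole number from 1 to 5")

    sub_implementation = sum(implementation)
    sub_effectiveness = sum(effectiveness)
    total = sub_implementation + sub_effectiveness

    # Look up the band that contains the total score.
    for low, high, text in SCORE_BANDS:
        if low <= total <= high:
            return sub_implementation, sub_effectiveness, total, text
    raise AssertionError("unreachable: valid ratings always total between 20 and 100")


if __name__ == "__main__":
    # Made-up ratings for a fictional workplace, questions 1 to 10 in order.
    implementation = [4, 3, 2, 5, 3, 4, 2, 5, 3, 3]
    effectiveness = [3, 2, 2, 4, 3, 3, 2, 4, 3, 2]
    print(score_checklist(implementation, effectiveness))
```

With the made-up ratings shown, the sub totals are 34 and 28, giving a total of 62, which falls in the 61-85 band. Rejecting missing or out-of-range ratings up front is itself a small example of designing a constraint that prevents a scoring mistake.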

Key Points

  • Error tolerance recognises the “normalisation of error” principle.
  • Training teams and individuals to manage error has proven to be an effective strategy in many safety critical industries.
  • A non-punitive approach to error is crucial.
  • Positive examples of how errors are detected and managed should be conveyed to ensure the maintenance of a learning culture.

Want to know more?

For more in-depth information about how to develop error tolerant systems in your workplace, contact Leading Edge Safety Systems. We are a group of highly qualified and experienced human factors experts, with second-to-none experience in a range of industries and a proven track record of providing practical solutions to key safety, risk and human factors challenges in the workplace.
