www.panelsoft.com

PanelSoft

User Interfaces and Usability for Embedded Systems


Safety Critical Devices

Safer Systems Through Better User Interfaces appeared in Embedded Systems Programming magazine, and probably summarises my best writing on the topic of safety.


The World Wide Web Virtual Library page on Safety-Critical Systems gives a huge range of links to other safety related resources.


The Ariane 5 explosion as seen by a software engineer offers an interesting insight into the loss of an unmanned rocket, including a discussion of the formal methods used and of the floating-point to integer conversion whose overflow caused the explosion.

A more detailed discussion of that incident is given in ARIANE 5 Flight 501 Failure Report by the Inquiry Board.
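
A narrowing conversion of this kind can be guarded by checking the range before converting. The following is a minimal Python sketch, not the Ariane flight code; the function names, the 16-bit signed target and the two failure policies (reject or saturate) are assumptions chosen for illustration.

```python
import math

# Range of a 16-bit signed integer (assumed target type for this sketch).
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value: float) -> int:
    """Truncate a float toward zero, refusing values that do not fit in 16 bits."""
    if not math.isfinite(value):
        raise OverflowError(f"{value!r} is not a finite number")
    truncated = math.trunc(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Alternative policy: clamp out-of-range values to the nearest limit."""
    if math.isnan(value):
        raise ValueError("cannot saturate NaN")
    return int(max(INT16_MIN, min(INT16_MAX, value)))
```

Which policy is appropriate depends on the system: rejecting forces the caller to handle the fault explicitly, while saturating keeps the computation running with a bounded value.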


Another Space Story. This insightful article discusses some of the people factors involved when a team has to produce ultra-reliable software. Burning the midnight oil might get a product out fast, but the slow burn is more important when the product has to be perfect. Read http://www.fastcompany.com/online/06/writestuff.html for a description of the kind of people you will find writing software for the space shuttle.


If you are using Commercial Off The Shelf (COTS) software, such as an RTOS or graphics library, in medical devices that are to be submitted to the FDA, then you may be interested in a document the FDA has released called Guidance for Off The Shelf Software Use in Medical Devices.


The Therac-25 Medical Device

The Therac-25 was a cancer irradiation device whose faulty operation led to a number of deaths. One of the safety features in the original design was that all of the settings for the device had to be entered through a terminal as well as on a control panel. This was seen as redundant by users of a prototype. It was "redundant" in the best sense of the word, but this was not appreciated by the users, who assumed that the safety of the equipment was beyond doubt. The design was changed before release so that the settings could be entered on the terminal alone. Once the settings were accepted by hitting the return key, the user was asked to confirm that the settings were those that were actually required. This confirmation was performed by pressing the return key again. This extra step was considered a replacement for the original second interface.

Unfortunately, users soon learned to press the return key twice in succession, since they knew that they would always be asked for confirmation. The two presses, similar to a double-click on a mouse, became a single action in the mind of the user, and no actual review of the settings was performed. Due to a bug in the software, some of the settings were occasionally not properly recorded. The bug was a race condition, created because access to the shared data was not protected by proper resource locking. Since the cross-check of having the settings entered twice had been removed, the fault was not detected. This was a case where the design was altered to favor usability, but the safety of the device was compromised.
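
The kind of resource locking referred to above can be sketched as follows. This is a minimal Python illustration, not a reconstruction of the Therac-25 software; the `Settings` fields and the `SharedSettings` class are hypothetical names invented for the example. The idea is that the task editing the settings and the task reading them never see a half-updated record.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields, for illustration only.
    mode: str
    dose: float


class SharedSettings:
    """Holds the operator's settings behind a lock."""

    def __init__(self, initial: Settings) -> None:
        self._lock = threading.Lock()
        self._current = initial

    def update(self, **changes) -> None:
        # The editing task swaps in a complete new record while holding the lock.
        with self._lock:
            self._current = replace(self._current, **changes)

    def snapshot(self) -> Settings:
        # The reading task gets one complete, immutable record.
        with self._lock:
            return self._current


shared = SharedSettings(Settings(mode="mode_a", dose=1.0))
shared.update(mode="mode_b", dose=0.2)
current = shared.snapshot()  # mode and dose always belong to the same update
```

Making `Settings` immutable means a snapshot cannot change after it has been taken, so the reader can work with it without holding the lock.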

It is fair to say that if the rest of the design had been sound then removing the second set of inputs would not have been significant, but the whole point of having a safety infrastructure in place is to allow for the times when something does go wrong.

Another point to note from this example is that the later design was also more susceptible to the simple user error of entering a wrong value. If the user has to enter the value twice on two different displays, the chance of the same wrong value being entered both times is slim. The software would have detected the mismatch and not applied either set of settings.

It is often the case that safety measures serve the dual purpose of protecting against both device error and user error. In intensive care medical ventilators, the pressure rise in the patient's lung is a function of the volume of the lung and the volume of gas added. There is a pressure valve which opens at a fixed pressure limit. Once the valve is open, an alarm sounds and the patient is exposed to room air pressure in the fail-safe state. This protects the patient against an electronic or software fault which may cause a large volume to be delivered. It also protects the patient from a user setting 3.0 liters rather than the intended 0.3 liters.

This brief description only touches on one aspect of the failures of the Therac-25. See An Investigation of the Therac-25 Accidents by Nancy Leveson and Clark S. Turner for a full description of the accidents and their causes.
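
Returning to the double-entry cross-check discussed in this section, the comparison itself is simple to express. The sketch below is a hypothetical Python illustration, not code from any real device; the function name `cross_check`, the setting name `volume_l` and the `tolerance` parameter are all assumptions. It applies neither set of settings unless the two independently entered sets agree.

```python
from typing import Mapping, Optional


def cross_check(terminal: Mapping[str, float],
                panel: Mapping[str, float],
                tolerance: float = 0.0) -> Optional[dict]:
    """Return the settings to apply, or None if the two entries disagree."""
    # Both interfaces must supply exactly the same set of settings.
    if terminal.keys() != panel.keys():
        return None
    # Every value must match within the allowed tolerance.
    for name, value in terminal.items():
        if abs(value - panel[name]) > tolerance:
            return None
    return dict(terminal)


# A slip on one interface (3.0 instead of 0.3) is caught by the other entry.
rejected = cross_check({"volume_l": 3.0}, {"volume_l": 0.3})  # None
accepted = cross_check({"volume_l": 0.3}, {"volume_l": 0.3})  # {"volume_l": 0.3}
```

The same check catches a value corrupted by the device after entry, which is why a single measure can protect against both device error and user error.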


Aviation Safety

The Aviation Safety Reporting System Home Page contains a huge amount of information related to aviation safety in the United States. Their newsletter, DirectLine, summarises many of the findings. The March 1997 issue provides sometimes amusing accounts of safety hazards caused by uncooperative or unfortunate passengers. The NASA Aviation Operations Branch site contains a number of aviation safety related publications, including "OOPS, IT DIDN'T ARM." - A CASE STUDY OF TWO AUTOMATION SURPRISES.


Comp.Risks

The newsgroup comp.risks carries a moderated discussion of safety and risk related issues. Areas as diverse as aviation safety, invasion of privacy, computer viruses and fraud are discussed in a reasonably technically sophisticated manner. Crucial reading if the systems you are working on may cause risk to life, money or the fabric of society. Peter Neumann is the group moderator, and author of 'Computer-Related Risks'. Peter's homepage is also a mine of information on this area.

