An Uber fatality and the limitations of automation -- and the amazing powers of your human operators

March 22, 2018 - Reading time: 6 minutes

I'm Jason Firth.

Recently, there was a fatality in the news, as an Uber automated vehicle hit a pedestrian who was crossing the road.

The circumstances seem to be that the pedestrian, carrying a bike, crossed the road without seeing the car, and the car drove right into her.

A lot of people seemed shocked that the car didn't recognise the woman was there, and didn't immediately brake or swerve. One person invoked "fail safety", the idea that equipment should always default to the safest state.

This case is, in my estimation, more complicated than you'd think. It's true you want things to fail safe, but it isn't always clear what a fail safe state is.

I'll give you an example.

In a commonly told but apocryphal story, boiler maintenance was being done at the paper mill in Fort Frances, Ontario, Canada. (You've heard this story from me before.) The design of a boiler (at least a recovery boiler like this one) is that tubes filled with water and steam surround a combustion chamber. Usually, you'll have a drum called the mud drum that contains a certain level of water, and if that level gets too low, it's normally considered an emergency. In this case, the maintenance they were doing required the mud drum to be empty, and they were still firing the boiler.

The story goes that a new operator came on shift, saw the mud drum was empty, and panicked. The operator threw the water valves wide open (what would normally be considered 'fail safe'), and the boiler exploded.

Why did that happen? The boiler tubes were red hot and virtually unpressurised. When cold water hit them, it flashed into steam so violently that the boiler blew apart. While the involvement of a person is unusual, boilers routinely experience explosions due to water valve problems like this. If the boiler had been running under normal conditions, perhaps dumping water into the tubes would have been a safe option -- cooling everything and getting it to a zero-energy state faster.

So even though opening the valves is what you'd normally consider the 'fail safe' action, in this case it was dangerous.

Let's assume for a moment that both the car and the driver had perfect vision in that moment, and saw the pedestrian long before the moment of impact.

What is the safest action to take if you see someone crossing the street? Everyone here is immediately saying "obviously slam the brakes and swerve!", but let's think about that for a second. Most people are not going to walk directly into the path of an oncoming vehicle. Even if they're crossing, you'd expect a person to stop, so you can't necessarily use the fact that there's a person there to predict what's going to happen. By contrast, what happens if you slam the brakes and swerve every time you see someone crossing the street a little too close? If there's a car near you, it could cause an accident. If the person was going to stop anyway, your evasive action could put them in the car's new path when they'd otherwise have been fine. The driver or passengers in the car might be injured -- probably for nothing, because 99 times out of 100 the person will stop before the car reaches them. Often, the safest act is to do nothing.

Here's where there is a divergence between the powers of an AI and the powers of a human. An AI sees object 15 -- perhaps even a human-on-bike type object -- travelling at a certain speed along a certain vector. It has to figure out what it can from relatively limited information. By contrast, a human sees a sketchy-looking lady walking in a strange way, not paying attention. The AI might not recognise there's a threat, whereas the human might recognise something isn't right and take some of those more aggressive defensive manoeuvres in this isolated case. It isn't just object types and vectors; it's a vivid world of information and context.
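
To make that concrete, here's a minimal sketch (purely hypothetical -- not Uber's actual perception stack) of roughly what a tracked object looks like to an automated driving system. The field names are invented for illustration; the point is how little of the scene survives in the record the planner actually reasons over.

```python
# Hypothetical sketch of a perception system's output record.
from dataclasses import dataclass

@dataclass
class TrackedObject:
    track_id: int            # e.g. "object 15"
    label: str               # "pedestrian", "cyclist", "unknown", ...
    label_confidence: float  # classifier confidence, 0..1
    position_m: tuple        # (x, y) relative to the vehicle, in metres
    velocity_mps: tuple      # (vx, vy) estimated velocity, metres/second

# Everything the planner decides has to be derived from records like this.
# The "sketchy-looking lady walking in a strange way, not paying attention"
# that a human driver picks up on simply isn't in the data.
obj15 = TrackedObject(15, "cyclist", 0.62, (30.0, -1.5), (0.0, 1.2))
print(obj15)
```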

Our powers of intuition, empathy, and deduction are much greater than we give ourselves credit for. We know more than any purpose-built AI, and can make connections that no purpose-built AI presently can. Humans aren't perfect, but there are reasons why we still have humans involved in even the most high-tech processes.

It's ironic to say this as an automation guy, but the world is about to realise the limitations of automation as it comes closer and closer to our personal lives.

As interesting as this story is on its own, I feel it also shows the limitations of raw automation in the industrial context. Sometimes, operations asks for a system that reacts to something the human knows but the machine does not. If you're not careful, you end up with false positives: the system reacts dramatically, based on assumptions, to situations that don't exist, causing more problems than it prevents.

One example I saw for a while was an operator pointing to a spike on a graph and saying, "That's because of [event], we need to prevent that." Then you'd go down the graph, find another spike, and ask, "Is this [event]?" They'd say no. You'd go a little further and ask, "How about this? Is this [event]?", and they'd say no again. It turns out that the reason the operator knows what's going on is that the operator is a human with eyes and ears and an incredibly versatile mind that can understand things far beyond a series of numbers plotted on a graph. Short of dramatic changes to the process, the PLC can't know that [event] has occurred with any sort of certainty.
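
As a rough illustration of why that request is harder than it sounds, here's a minimal sketch of the kind of logic you'd end up writing, with an invented signal and threshold. A detector like this fires on every spike of the right size, because the trend data alone doesn't carry the context the operator has.

```python
# Hypothetical spike detector -- the threshold and data are made up.
def detect_event(samples, threshold=50.0):
    """Flag any sample-to-sample jump bigger than the threshold."""
    alarms = []
    for i in range(1, len(samples)):
        if abs(samples[i] - samples[i - 1]) > threshold:
            alarms.append(i)   # looks like [event]... or a pump swap,
                               # a calibration, a grab sample, a bump...
    return alarms

# Every index returned here would trigger the dramatic reaction,
# whether or not [event] actually happened.
print(detect_event([100, 102, 101, 180, 103, 99, 175, 104]))
```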

Thanks for reading!

 


Therac-25, a study in the potential risks of software bugs

December 6, 2016 - Reading time: 4 minutes

I'm Jason Firth.

 

It's unfortunately common to find that people don't appreciate the risks involved with software, as if the fact that the controls are managed by bits and bytes changes the lethal consequences of failure.

A counterpoint to this is the Therac-25, a radiation therapy machine produced by Atomic Energy of Canada Limited -- AECL, for short.

The system had a number of modes, and while switching modes, the operator could continue entering information into the system. If the operator switched modes too quickly, then key steps would not take place, and the system would not be physically prepared to safely administer a dose of radiation to a patient.
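
As a purely hypothetical sketch of that class of bug (not the actual Therac-25 code, which was written in assembly), consider what happens when a fast operator finishes editing before a slow hardware change-over completes, and the software never re-checks the hardware before firing:

```python
# Hypothetical race condition between operator input and hardware setup.
import threading, time

hardware_in_position = False

def reconfigure_hardware():
    """Simulates the slow physical change-over between modes."""
    global hardware_in_position
    hardware_in_position = False
    time.sleep(2.0)              # turntables and filters take real time to move
    hardware_in_position = True

def operator_confirms_treatment():
    """Simulates a fast operator finishing their edits and pressing 'go'.
    The software accepts the command without waiting on the hardware."""
    time.sleep(0.1)
    print("Beam on. Hardware in position:", hardware_in_position)

threading.Thread(target=reconfigure_hardware).start()
operator_confirms_treatment()    # prints: Beam on. Hardware in position: False
```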

Previous models had hardware interlocks which would prevent radiation from being administered if the machine was not physically in the correct position. This newer model relied solely on software interlocks to prevent unsafe conditions.

There were at least six accidents involving the Therac-25. Some of these accidents permanently crippled the patients or resulted in the need for surgical intervention, and several resulted in deaths from radiation poisoning or radiation burns. One patient had their brain and brainstem burned by radiation, resulting in their death soon after.

There were a number of contributing factors in this tragedy: poor development practices, lack of code review, lack of testing, and of course the bugs themselves. However, rather than focus on the specifics of what caused the tragedy, what I want to show is that what we do is not just computers -- it's where the rubber meets the road, where what happens in our computers meets reality. People who would never dream of opening a relay cabinet and starting to rewire things would think nothing of opening a PLC programming terminal and starting to 'play'.

Secondly, part of the problem was people who didn't realise that they were controlling a real physical device. There are things to remember when dealing with physical devices: for example, that no matter how quick your control system is, valves can only open and close so fast, motors can only turn so fast, and your amazing control system is only as good as the devices it controls. Because the programmer forgot that these are real devices, they failed to take that into account, and people died as a result. This holistic knowledge is why journeyman instrument technicians and certified engineering technologists in the field of instrumentation engineering technology are so valuable. They don't just train on how to use the PLC; they train on how the measurements work, how the signalling works, how the controllers work (whether they are digital or analog in nature), how final control elements work, and how processes work.
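
Here's a small sketch of that point, with made-up numbers: no matter what the controller demands, a real valve only travels so fast, and any logic that assumes the output is instantly wherever it was commanded to be is assuming something the physical device can't deliver.

```python
# Hypothetical valve with a limited travel speed (slew rate).
def move_valve(current_pct, commanded_pct, max_slew_pct_per_s, dt_s):
    """Return the valve position after one scan, limited by its travel speed."""
    max_step = max_slew_pct_per_s * dt_s
    step = commanded_pct - current_pct
    step = max(-max_step, min(max_step, step))   # clamp to achievable travel
    return current_pct + step

pos = 0.0
for scan in range(5):
    # Controller demands 100% immediately; valve slews at 10%/s, 0.5 s scans.
    pos = move_valve(pos, 100.0, max_slew_pct_per_s=10.0, dt_s=0.5)
    print(f"scan {scan}: valve at {pos:.1f}%")   # 5, 10, 15, 20, 25 ...
```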

When it comes to control systems, just because you're playing with pretty graphics on the screen doesn't mean you aren't dealing with something very real, and something that can be very lethal if it's not treated with respect.

Another point that's near and dear to my heart comes from one of the details of the failures: when there was a problem, the HMI would display "MALFUNCTION" followed by a number. A major problem with this is that no operator documentation existed saying what each malfunction number meant. For a long time, my response to people who say "the operator should know their equipment" has been that we as controls professionals ought to make the information available for them to know their equipment. If we don't, we can't expect them to know what's going on under the surface. If the programmer had properly documented the code and the user interface, there might have been a chance operators would have understood the problem earlier, preventing lethal consequences.
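
For illustration, here's a minimal sketch of what that documentation could look like in code, with invented fault codes and messages. A lookup table like this costs the programmer almost nothing, and it's often the operator's only window into what the software thinks is wrong.

```python
# Hypothetical fault-code table -- the codes and wording are invented.
FAULT_DESCRIPTIONS = {
    12: "Measured dose disagrees with commanded dose -- do not re-treat; "
        "call engineering.",
    41: "Beam-shaping hardware not in commanded position -- treatment inhibited.",
}

def describe_fault(code):
    """Turn a bare fault number into something the operator can act on."""
    return FAULT_DESCRIPTIONS.get(
        code, f"MALFUNCTION {code}: undocumented fault -- stop and escalate.")

print(describe_fault(12))
print(describe_fault(99))   # an undocumented code still gets a safe default
```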

 

Thanks for reading!

 
