Jason K. Firth, C.E.T.

Instrumentation, Control, and Automation

An Uber fatality and the limitations of automation -- and the amazing powers of your human operators

Mar 22, 2018

I'm Jason Firth.

Recently, there was a fatality in the news, as an Uber automated vehicle hit a pedestrian who was crossing the road.

The circumstances seem to be that the pedestrian, carrying a bike, crossed the road without seeing the car, and the car drove right into her.

A lot of people seemed shocked that the car didn't recognise the young woman was there, and didn't immediately brake or swerve. One person invoked "fail safety", the idea that equipment should always default to the safest state.

This case is, in my estimation, more complicated than you'd think. It's true you want things to fail safe, but it isn't always clear what a fail safe state is.

I'll give you an example.

In a commonly told but apocryphal story, boiler maintenance was being done at the paper mill in Fort Frances, Ontario, Canada. (You've heard this story from me before.) The design of a boiler (at least a recovery boiler like this one) is that you have tubes filled with water and steam surrounding a combustion chamber. Usually, you'll have a drum called the mud drum that contains a certain level of water. If that level is too low, that's normally considered an emergency situation. In this case, the maintenance being done required the mud drum to be empty, and the boiler was still being fired.

The story goes that a new operator came on shift, saw the mud drum was empty, and panicked. The operator threw the water valves wide open (what would normally be considered 'fail safe'), and the boiler exploded.

Why did that happen? The boiler tubes were red hot and virtually unpressurised. When cold water hit them, it flashed instantly to steam, and that violent release of energy destroyed the boiler. While the involvement of a person is unusual, boilers routinely suffer explosions because of water-valve problems like this. Had the boiler been running under normal conditions, dumping water into the tubes might well have been the safe option -- cooling everything and bringing the plant to a zero-energy state faster.

So despite opening the valve being what you'd normally consider the 'fail safe' action, in this case it was a dangerous one.

Let's assume for a moment that both the car and the driver had perfect vision in that moment, and saw the pedestrian long before the moment of impact.

What is the safest action to take if you see someone crossing the street? Everyone here is immediately saying "obviously slam the brakes and swerve!", but let's think about that for a second. Most people are not going to walk directly into the path of an oncoming vehicle. Even if someone is crossing, you'd expect them to stop, so the mere presence of a person doesn't tell you what's going to happen. By contrast, what happens if you slam the brakes and swerve every time someone crosses the street a little too close? If there's a car near you, you could cause an accident. If the person was going to stop anyway, your swerve could be the very thing that puts them in the car's path. The driver or passengers might be injured -- probably for nothing, because 99 times out of 100, the person will stop before the car reaches them. Often, the safest act is to do nothing.

Here's where there is a divergence between the powers of an AI and the powers of a human. An AI sees object 15 -- perhaps even a 'human on bike'-type object -- travelling at a certain speed along a certain vector. It has to figure out what it can from relatively limited information. By contrast, a human sees a sketchy-looking lady walking strangely and not paying attention. The AI might not recognise there's a threat, whereas the human might sense something isn't right and choose that isolated moment to take the more aggressive defensive manoeuvres. It isn't just object types and vectors; it's a vivid world of information and context.

Our powers of intuition, empathy, and deduction are much greater than we give ourselves credit for. We know more than any purpose-built AI, and can make connections that no purpose-built AI presently can. Humans aren't perfect, but there are reasons why we still have humans involved in even the most high-tech processes.

It's ironic to say this as an automation guy, but the world is about to realize the limitations of automation, as it comes closer and closer to our personal lives.

As interesting as this story is on its own, I feel it also illustrates the limitations of raw automation in the industrial context. Sometimes operations asks for a system that reacts to something the human knows but the machine does not. If you're not careful, you end up with false positives: the system reacts dramatically, on the strength of assumptions, to situations that don't exist, causing more problems than it prevents.

One example I saw for a while was an operator pointing to a spike on a graph and saying, "That's because of [event], we need to prevent that." Then you'd go down the graph, find another spike, and ask, "Is this [event]?" They'd say no. You'd go a little further and ask, "How about this one? Is this [event]?" No again. It turns out the reason the operator knows what's going on is that the operator is a human with eyes and ears and an incredibly versatile mind that can understand things far beyond a series of numbers plotted on a graph. Short of dramatic changes to the process, the PLC can't know that [event] has occurred with any certainty.
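
To make that concrete, here's a minimal sketch in Python (with invented numbers) of why a naive threshold rule can't substitute for operator knowledge: every excursion past the limit trips the rule, whether or not [event] actually happened.

```python
# A minimal sketch with invented data: a naive spike detector trips on every
# excursion, not just the one the operator knows was the real [event].

readings = [50, 51, 49, 93, 50, 52, 95, 50, 48, 94, 51]
REAL_EVENT_INDEX = 3   # only the operator knows this spike was [event]
THRESHOLD = 85

for i, value in enumerate(readings):
    if value > THRESHOLD:
        # The PLC sees only the number; all three spikes look identical to it.
        label = "the real [event]" if i == REAL_EVENT_INDEX else "a false positive"
        print(f"sample {i}: tripped at {value} -- {label}")
```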

Thanks for reading!

 

Blue skies, green fields

Aug 24, 2017

I'm Jason Firth.

One commonality I notice when people ask me to help solve a problem is that quite often they explicitly limit the solutions to "what sort of control systems can we install?"-type queries.

I immediately force myself to ignore the question as presented, because of the limits it puts on the creativity we can use to solve problems.

Occasionally, we can introduce a new and innovative control system to solve a problem, but just as often we need to take a step back and re-examine the problem. Sometimes we can solve it by providing more data to operators, or by making it easier to follow procedure with their current user interface. Sometimes we need to inform rather than control. Sometimes we need to analyze in a new way. Sometimes it's a maintenance issue, and fixing a chronic fault will help. Sometimes there's no problem at all, and things must be operated a certain way for safety or operational reasons.

By looking at problems outside of their ostensible technical scope, we can see the systems involved. We can ask questions we might not have asked otherwise: systems involve processes, equipment, operators, procedures, user interfaces, and control systems. Sometimes the answer comes from looking at the whole picture rather than a small piece.

Looking at problems this way also opens new opportunities. A few years back, I was asked to investigate problems a certain historian was having gathering process-critical data. What I discovered was that we were asking the historian to do something incompatible with its design. Historians consist of dozens of working parts, all of which need to function for data to be saved and retrieved. Instead of fighting the historian to conform, we created a new system consisting of a single simple program with one purpose. Instead of requiring dozens of components to work, suddenly we only needed two: retrieval and storage. Once we created this new system, we were able to extend it to automatically produce files for regulatory reporting -- an unexpected boon which saved the site time and increased accuracy.
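
As a rough illustration (not the actual site code, and with an invented tag name, endpoint, and interval), the replacement amounted to something about this simple:

```python
# A rough sketch of the idea, not the actual program: poll one data source,
# append to simple local storage. Tag name, URL, and interval are invented.

import csv
import time
import urllib.request

TAG = "FLOW_101"
SOURCE_URL = "http://dataserver.example/tags/FLOW_101"  # hypothetical endpoint

def read_value() -> str:
    # Retrieval: one request to the data source.
    with urllib.request.urlopen(SOURCE_URL, timeout=5) as response:
        return response.read().decode().strip()

def store_value(value: str) -> None:
    # Storage: one append to a plain CSV file -- easy to audit, and easy to
    # post-process into regulatory reports.
    with open("history.csv", "a", newline="") as f:
        csv.writer(f).writerow([time.time(), TAG, value])

while True:
    store_value(read_value())
    time.sleep(60)  # one sample per minute
```

With only two moving parts, there's far less that can silently fail between the process and the stored record.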

This provides new opportunities for a shop. Many people want their shop to limit its influence to "what control systems can we install", but a strategy that embraces increased responsibility and increased work in service to other groups opens up new opportunities, because it's all connected.

Everyone wants to find a new and innovative and cool control system, but sometimes you need to step back from that well-trodden lot and look at the areas nobody is looking at, where there are blue skies and green fields waiting for someone.

Thanks for reading!

All you need to know about PID controllers

Feb 27, 2017


I'm Jason Firth.

 

I recently commissioned this article explaining the function of a PID controller by freelance writer Sophia O'Connor. It's one of a few pieces I've commissioned recently. It's partially a test to see how well commissioning freelancers can work, and partially a public service to get some stuff written about some basic concepts. Enjoy!

 

A proportional-integral-derivative (PID) controller is an instrument used mainly in industrial control applications. A PID controller combines three control actions -- proportional (P), integral (I), and derivative (D) -- to produce a single control signal. The main purpose of a PID controller is to control speed, temperature, pressure, flow, and other process variables. It can be installed near the final control elements, and it is often monitored through a SCADA system.

How a PID controller works:

As explained above, a PID controller combines three different control actions that work together. The main purpose of installing one is to control a process. For simple processes, a machine with plain on/off control can do the job; when it comes to something more complex, however, a PID controller is the tool to use, because it offers much finer control over the overall system.

A PID controller is responsible for controlling the output, continuously adjusting it to drive the process toward the desired value. The three basic control actions each play their own role within the PID controller, and they work together toward a common goal. Each is explained below:

Functions of the proportional (P) controller:

The proportional action produces an output proportional to the current error value. It works by comparing the desired setpoint with the actual value -- the value obtained through feedback from the process. If the error is zero, the proportional output is also zero. On its own, proportional control leaves a steady-state offset, which is why it traditionally required a manual reset.

Functions of the integral (I) controller:

The proportional controller has a key limitation -- the steady-state offset -- which the integral action addresses. It provides the corrective effort needed to eliminate the steady-state error: it integrates the error over time, so any persistent error accumulates and drives the output until the error reaches zero.

Functions of the derivative (D) controller:

A control system also needs to anticipate future behaviour, which neither the proportional nor the integral action can do. The derivative action solves this problem: its output depends on the rate of change of the error over time. It acts as a kick-start for the output when the error is changing quickly, improving the system's response.

 

All of these actions work together to form a complete controller suitable for a wide range of process control applications.
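
To make the three actions concrete, here is a minimal sketch of a discrete-time PID loop in Python. The gains, timestep, and toy first-order process are invented for illustration; a real controller would also need anti-windup, derivative filtering, and output limits.

```python
# A minimal discrete PID sketch. Gains, timestep, and the toy process model
# are invented; real controllers add anti-windup, derivative filtering,
# output limits, and bumpless transfer.

KP, KI, KD = 2.0, 0.5, 0.1   # proportional, integral, derivative gains
DT = 0.1                     # loop interval in seconds

setpoint = 50.0
process_value = 20.0         # toy process starts below the setpoint
integral = 0.0
previous_error = setpoint - process_value

for step in range(600):
    error = setpoint - process_value

    integral += error * DT                       # I: accumulate error over time
    derivative = (error - previous_error) / DT   # D: rate of change of error

    output = KP * error + KI * integral + KD * derivative
    previous_error = error

    # Toy first-order process: the controller output pushes the measurement
    # up, while the process naturally decays toward zero.
    process_value += (output - 0.5 * process_value) * DT

print(f"final process value: {process_value:.2f} (setpoint {setpoint})")
```

Note how each term maps onto the sections above: the P term reacts to the present error, the I term removes the lingering offset, and the D term responds to how fast the error is changing.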

 

 

Thanks for reading!

Therac-25, a study in the potential risks of software bugs

Dec 6, 2016


I'm Jason Firth.

 

It's unfortunately common to find that people don't appreciate the risks involved with software, as if the fact that the controls are managed by bits and bytes changes the lethal consequences of failure.

A counterpoint to this is the Therac-25, a radiation therapy machine produced by Atomic Energy of Canada Limited -- AECL, for short.

The system had a number of modes, and while switching modes, the operator could continue entering information into the system. If the operator switched modes too quickly, then key steps would not take place, and the system would not be physically prepared to safely administer a dose of radiation to a patient.

Previous models had hardware interlocks which would prevent radiation from being administered unless the machine was physically in a safe configuration. This newer model relied solely on software interlocks to prevent unsafe conditions.
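
As a rough illustration of the hazard class (this is not AECL's actual code; everything here is invented), the danger appears when physical setup and operator input run concurrently and the only interlock is a software flag:

```python
# A hypothetical sketch of the hazard class, not the Therac-25's actual code.
# Physical motion takes time; operator input and software state do not.

import threading
import time

class TherapyMachine:
    def __init__(self):
        self.hardware_in_position = False

    def reconfigure(self):
        # Moving the hardware into place takes real time.
        self.hardware_in_position = False
        time.sleep(2.0)                 # hardware still travelling...
        self.hardware_in_position = True

    def fire_beam(self):
        # A software-only interlock: a flag, not a hardware limit switch.
        # If this check is skipped, corrupted, or raced past -- as happened
        # on the Therac-25 -- the beam fires against stale physical setup.
        if not self.hardware_in_position:
            raise RuntimeError("UNSAFE: beam requested before hardware in position")
        print("dose delivered")

machine = TherapyMachine()
threading.Thread(target=machine.reconfigure).start()
time.sleep(0.1)          # a fast operator confirms before the motion completes
try:
    machine.fire_beam()
except RuntimeError as exc:
    print(exc)           # the flag caught it this time; the Therac-25's didn't
```

A hardware interlock closes that window physically: no limit switch, no beam, regardless of what the software believes.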

There were at least six accidents involving the Therac-25. Some of these accidents permanently crippled patients or required surgical intervention, and several resulted in deaths from radiation poisoning or radiation burns. One patient had their brain and brainstem burned by radiation, resulting in their death soon after.

There were a number of contributing factors in this tragedy: poor development practices, lack of code review, lack of testing, and of course the bugs themselves. However, rather than focus on the specifics of what caused the tragedy, what I want to show is that what we do is not just computers -- it's where the rubber meets the road, where what happens in our computers meets reality. People who would never dream of opening a relay cabinet and rewiring things will think nothing of opening a PLC programming terminal and starting to 'play'.

Secondly, part of the problem was people who didn't realise that they were controlling a real physical device. There are things to remember when dealing with physical devices: no matter how quick your control system, valves can only open and close so fast, motors can only turn so fast, and your amazing control system is only as good as the devices it controls. Because the programmer forgot these were real devices, he failed to take their physical constraints into account, and people died as a result. This holistic knowledge is why journeyman instrument technicians and certified engineering technologists in the field of instrumentation engineering technology are so valuable. They don't just train on how to use the PLC; they train on how the measurements work, how the signalling works, how the controllers work (whether digital or analog in nature), how final control elements work, and how processes work.
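
Here's a minimal sketch of what respecting one such constraint looks like in code -- slew-limiting a valve command so the software never pretends the valve can move faster than it physically can. The travel rate and loop interval are invented numbers.

```python
# A minimal sketch: the controller can command any value instantly, but the
# physical valve can only travel so fast. Numbers are invented.

MAX_TRAVEL_PER_SECOND = 5.0   # percent of valve stroke per second
DT = 0.1                      # loop interval in seconds

def move_toward(actual: float, commanded: float) -> float:
    """Slew-limit the valve: it moves toward the command at a finite rate."""
    max_step = MAX_TRAVEL_PER_SECOND * DT
    step = max(-max_step, min(max_step, commanded - actual))
    return actual + step

position = 0.0
for _ in range(20):                    # controller demands 100% immediately...
    position = move_toward(position, 100.0)
print(f"valve position after 2 s: {position:.1f}%")   # ...but it's only at 10%
```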

When it comes to control systems, just because you're playing with pretty graphics on the screen doesn't mean you aren't dealing with something very real, and something that can be very lethal if it's not treated with respect.

Another point that's near and dear to my heart comes from one of the details of the failures: when there was a problem, the HMI would display "MALFUNCTION" followed by a number. A major problem with this is that no operator documentation existed saying what each malfunction number meant. I've long said, in response to people who say "the operator should know their equipment", that we as control professionals ought to make the information available for them to know their equipment. If we don't, we can't expect them to know what's going on under the surface. If the programmer had properly documented his code and properly documented the user interface, there might have been a chance operators would have understood the problem earlier, preventing lethal consequences.
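
The fix costs almost nothing. As a hypothetical sketch (these codes and messages are invented, not the Therac-25's), every fault number the software can raise should map to a documented, operator-readable explanation:

```python
# A hypothetical sketch -- codes and messages invented for illustration.
# Every fault number the software can raise gets a documented meaning.

FAULT_CODES = {
    12: ("DOSE RATE HIGH", "Measured dose rate exceeded the setpoint; beam inhibited."),
    54: ("POSITION MISMATCH", "Turntable position disagrees with the selected mode."),
}

def describe_fault(code: int) -> str:
    name, detail = FAULT_CODES.get(
        code, ("UNKNOWN FAULT", "No documentation exists for this code."))
    return f"MALFUNCTION {code}: {name} -- {detail}"

print(describe_fault(54))   # the operator sees why, not just a number
```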

 

Thanks for reading!

 
