Software Update Shuts Down Nuclear Power Plant
Garabito writes "Hatch Nuclear Power Plant near Baxley, Georgia was forced into a 48-hour emergency shutdown when a computer on the plant's business network was rebooted after an engineer installed a software update. The Washington Post reports, 'The computer in question was used to monitor chemical and diagnostic data from one of the facility's primary control systems, and the software update was designed to synchronize data on both systems. According to a report filed with the Nuclear Regulatory Commission, when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs that cool the plant's radioactive nuclear fuel rods. As a result, automated safety systems at the plant triggered a shutdown.' Personally, I don't think letting devices on a critical control system accept data values from the business network is a good idea."
Misreading of the Article (Score:5, Interesting)
I wonder if they were using something like EPICS. I worked on a large experiment which used EPICS to control the system. Rebooting a machine would sometimes expose a problem with resources not being freed, eventually leading to a situation where data channels would read the 'INVALID/MISSING' value. The solution, as anyone who has worked on this sort of experiment will know, was to reboot more machines until the thing worked.
(I don't mean to complain about EPICS. It is very powerful and flexible... it's just that the version we used had these occasional hiccups.)
Terminal Error (Score:2, Interesting)
The problem is the update - not business network (Score:5, Interesting)
This is why... (Score:4, Interesting)
I've had a whole plant lose view of its system because some well-meaning retard in IT decided to push updates onto a SCADA system without qualifying the updates... never had it KILL the control side of things though... well done whoever you were, you've done well.
Re:Misreading of the Article (Score:1, Interesting)
It actually said that the rebooting triggered a problem in which values could not be read.
I feel so fucking vindicated [slashdot.org]:
Looks like necrogram or somebody with his attitude is responsible for this.
Business Network? (Score:5, Interesting)
From the summary: If it's monitoring the primary control system then it seems to me like the machine would have to be on the control network. The real issue is why did the primary control system accept a reset from a monitoring system. It sounds like there's more than one bug to track down.
Re:Only the biz machine was updated. Why trouble? (Score:4, Interesting)
Such systems already exist. (Score:4, Interesting)
Re:Install Complete... (Score:2, Interesting)
Oh wait, you can't; they were all blowed up.
Re:Water (Score:1, Interesting)
Also, if the water level is too high in the steam generator (or pressure vessel, in the case of a BWR), you will get water droplets mixed in with the steam going to the turbines. This is a good way to damage turbine blades.
Third, if you're concerned about maintaining a BWR subcritical, you shouldn't let the water level get too high. The water surrounding the core acts as a reflector, decreasing neutron leakage. So, higher water level leads to increased reactivity. In fact, my recollection is that, in some cases, the emergency operating procedures suggest lowering the water level in order to control reactivity.
On a different note, the reason this incident is somewhat concerning (to me, at least) is that the logic for the reactor protection system is supposed to be not only fail-safe but also fault-tolerant. There are typically four independent channels, and the logic to actually get a scram is ((A || B) && (C || D)). So the question is, how did one computer failure cause multiple, supposedly-independent channels to indicate a scram condition?
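For anyone who wants to poke at that voting logic, here is a minimal Python sketch. The channel names A through D come from the expression above; the function names and the plain boolean "tripped" inputs are made up for illustration, and this is not meant to represent any real plant's protection system.

```python
from itertools import product


def scram_demanded(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Return True when the voting logic (A or B) and (C or D) calls for a scram.

    Each argument is True if that (hypothetical) channel is tripped.
    """
    return (a or b) and (c or d)


def tripping_combinations():
    """List every combination of the four channel states that yields a scram."""
    return [
        combo
        for combo in product([False, True], repeat=4)
        if scram_demanded(*combo)
    ]


def min_channels_to_scram() -> int:
    """Smallest number of simultaneously tripped channels that causes a scram."""
    return min(sum(combo) for combo in tripping_combinations())


if __name__ == "__main__":
    # A single tripped channel is not enough on its own.
    print(scram_demanded(True, False, False, False))
    # One channel from each pair is enough.
    print(scram_demanded(True, False, True, False))
    print(min_channels_to_scram())
```

No single tripped channel satisfies the expression; at least one channel from each pair has to trip. That is why a single computer taking out the plant looks odd, unless the bad data fed more than one channel.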
Lastly, given the many statements suggesting that the electrical and software systems are on a hair-trigger, it's worthwhile to note that many mechanical failures don't require the plant to shut down immediately. The tech specs [nrc.gov] have the details. For example, the Hope Creek plant has been operating since Wednesday morning with one of its Emergency Core Cooling Systems declared inoperable. That's right, they do not currently have a safety-rated system capable of injecting water when the reactor is at operating pressure. And they're allowed, by law, to operate like this for two weeks.