Software Power Security

Software Update Shuts Down Nuclear Power Plant 355

Posted by Soulskill
from the we-have-safety-systems-because-we-are-very-stupid dept.
Garabito writes "Hatch Nuclear Power Plant near Baxley, Georgia was forced into a 48-hour emergency shutdown when a computer on the plant's business network was rebooted after an engineer installed a software update. The Washington Post reports, 'The computer in question was used to monitor chemical and diagnostic data from one of the facility's primary control systems, and the software update was designed to synchronize data on both systems. According to a report filed with the Nuclear Regulatory Commission, when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs that cool the plant's radioactive nuclear fuel rods. As a result, automated safety systems at the plant triggered a shutdown.' Personally, I don't think letting devices on a critical control system accept data values from the business network is a good idea."
This discussion has been archived. No new comments can be posted.
  • by Anonymous Coward on Friday June 06, 2008 @07:02PM (#23689357)
    Must restart reactor to complete software installation.

    [Yes] [No] [OMFG!]
  • by Anonymous Coward on Friday June 06, 2008 @07:03PM (#23689371)
    I'd rather it shut itself down then suffer major failure.
    • by xlv (125699) on Friday June 06, 2008 @07:44PM (#23689711)

      I'd rather it shut itself down then suffer major failure.
      Personally, I'd rather it not suffer a major failure at all, whether after a shutdown or not. Oh, you meant "than" and not "then"... never mind.
  • by Enderandrew (866215) <.enderandrew. .at. .gmail.com.> on Friday June 06, 2008 @07:04PM (#23689375) Homepage Journal
    Adds a whole new meaning to "Critical Update".
  • Fail-Safe (Score:5, Insightful)

    by lobiusmoop (305328) on Friday June 06, 2008 @07:07PM (#23689405) Homepage
    Personally, I am reassured that these reactors are designed to shut down at the drop of a hat. This is not a situation where fuck-ups should be masked; any discontinuity, however minor, really needs to be highlighted and dealt with immediately.
    • Re: (Score:2, Funny)

      by Sitnalta (1051230)
      Yeah, but you don't want the reactor shutting down because the computer system is shit. That is most definitely not reassuring to me.
      • Re:Fail-Safe (Score:5, Insightful)

        by snkline (542610) on Friday June 06, 2008 @07:23PM (#23689545)
        Umm, yes you do. If something in the system is shit, you don't want the reactor ON!
      • Re:Fail-Safe (Score:4, Insightful)

        by NMerriam (15122) <NMerriam@artboy.org> on Friday June 06, 2008 @07:25PM (#23689569) Homepage

        Yeah, but you don't want the reactor shutting down because the computer system is shit. That is most definitely not reassuring to me.


        On the contrary, shutting down because the system is shit sounds like a much better option than continuing to run despite the shittiness of the computer monitoring everything.

        Of course, the ideal situation would be to have good computers that only get updated in scheduled, planned ways so that you don't have the issue at all. But shutting everything down when something is amiss is the only sensible response.
    • by Drenaran (1073150) on Friday June 06, 2008 @07:48PM (#23689745)
      The problem here is that the system didn't shut down because it detected an error in the data collection system; instead, it incorrectly detected a problem that did not in fact exist and then proceeded to take action. While the engineer in me is fairly certain that the system is designed to always fail to a safe state (as in, any automatic emergency response couldn't accidentally make things worse - at least not without raising all sorts of alarms), it is still concerning that internal control systems can be so affected by external servers.

      In the article they mention that the system wasn't designed for security (since it was meant to be internal) - but this isn't a security issue at all! Any sort of system that relies upon other systems should be designed to assume failure can and will occur in those systems. That is not to say that it needs to verify/evaluate incoming data to make sure it is "good", but rather that it can tell the difference between receiving data (such as current water levels) and receiving no data at all (system failure). Once it has that, it can ideally switch automatically to a backup system, or do what it did here and enter a fail-safe state (the difference being that it does so while pointing out the actual problem and not an incorrectly perceived problem in a different part of the system).
      • Re: (Score:3, Insightful)

        by Dachannien (617929)

        instead it incorrectly detected a problem that did not in fact exist

        This might be splitting hairs, but I'd say it correctly detected a data inconsistency and responded appropriately. There could be a dangerous condition that is indistinguishable to the failsafe system from what actually happened - and it could be a condition that nobody's ever thought of before. It's far better to trigger the failsafe when a data inconsistency has occurred than to make a potentially incorrect automated judgment concerning the cause of the inconsistency leading to a more severe problem do

    • Re: (Score:3, Insightful)

      by distantbody (852269)
      The problem isn't that it shut down-- that's fine; the problem is that a software update for a nuclear power plant was actually allowed to produce an unexpected/unplanned event!
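The distinction drawn in the subthread above, telling "no data" apart from a genuinely low reading, can be sketched in a few lines. This is a hypothetical illustration only, not the plant's actual logic; the `Reading` type, the trip threshold, and the 5-second staleness window are all assumptions:

```python
import time
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    OK = "ok"
    LOW_WATER = "low water: trip reactor"
    SENSOR_FAULT = "sensor fault: trip reactor, flag instrumentation"

@dataclass
class Reading:
    level_m: float    # reservoir level in metres
    timestamp: float  # time.time() when the sample arrived

LOW_LIMIT_M = 2.0    # trip threshold (assumed value)
STALE_AFTER_S = 5.0  # no fresh sample for this long => sensor fault

def classify(reading, now=None):
    """Distinguish a real low-level reading from missing or stale data."""
    now = time.time() if now is None else now
    if reading is None or now - reading.timestamp > STALE_AFTER_S:
        # Absence of data is NOT evidence of an empty reservoir:
        # report an instrumentation failure instead of "low water".
        return Status.SENSOR_FAULT
    if reading.level_m < LOW_LIMIT_M:
        return Status.LOW_WATER
    return Status.OK
```

Note that both fault branches still trip the reactor, so the fail-safe behaviour is preserved; the difference is that operators are shown the true cause instead of a phantom coolant drop.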
  • To me it sounds much more like they have a bad system design if it's impossible to reboot one of the machines, or if the plant can't run with one of them offline. That's not something to blame on the software update (shouldn't such things be expected anyway?)

    I guess "software update" may have been used to bash Microsoft a little, since it doesn't say Windows Update - or maybe the poster hates all kinds of software updates?
    • by RiotingPacifist (1228016) on Friday June 06, 2008 @07:20PM (#23689515)
      The only safe way to update a system is a reboot. Sure, you CAN do some stuff on Linux, BSD, etc. to avoid having to reboot (hell, this was probably running some Unix derivative, so it was probably possible to do the update without rebooting), but you wouldn't want to run the risk of introducing an unchecked bug by doing a live update. When your choices are:
      a) a high chance of accidentally shutting down a reactor harmlessly
      b) a small chance of fucking up a nuclear reactor
      you'll always go for (a), if you're sane.
  • by Anonymous Coward on Friday June 06, 2008 @07:12PM (#23689461)

    "Personally, I don't think letting devices on a critical control system accept data values from the business network is a good idea."
    The article did not say that the data values were being read from the machine that was rebooted. It actually said that the rebooting triggered a problem in which values could not be read.

    I wonder if they were using something like EPICS. I worked on a large experiment which used EPICS to control the system. Rebooting a machine would sometimes expose a problem with resources not being freed, eventually leading to a situation where data channels would read the 'INVALID/MISSING' value. The solution, as anyone who has worked on this sort of experiment will know, was to reboot more machines until the thing worked. ;-)

    (I don't mean to complain about EPICS. It is very powerful and flexible... it's just that the version we used had these occasional hiccups.)
    • The article did not say that the data values were being read from the machine that was rebooted. It actually said that the rebooting triggered a problem in which values could not be read.

      No, actually, the summary says "when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs"... That doesn't really have much to do with the reboot itself (causing the computer to be unreachable or whatever) but th

    • Re: (Score:3, Funny)

      by beakerMeep (716990)
      So you're saying it was an EPIC auto-fail?
  • Terminal Error (Score:2, Interesting)

    Reminds me of Terminal Error [yahoo.com].
  • by mathfeel (937008) on Friday June 06, 2008 @07:13PM (#23689467)
    did it run Windows?
  • EULA! (Score:5, Funny)

    by bluephone (200451) * <grey@burntel[ ]rons.org ['ect' in gap]> on Friday June 06, 2008 @07:20PM (#23689519) Homepage Journal
    It says right in the EULA that it's not to be used in a nuclear power plant!
  • This was Good (Score:4, Insightful)

    by snkline (542610) on Friday June 06, 2008 @07:21PM (#23689531)
    While perhaps the system should be designed to behave differently, what happened here was a good thing. When things went wrong, rather than the reactor systems freaking out and doing random crap, they were properly designed to shift to a known safe state (i.e. Shut the hell down).
  • by markdj (691222) on Friday June 06, 2008 @07:21PM (#23689537)
    I write this type of software for a living, so I know that having a computer on the business network connected to the control computers is a risk, but that risk can be managed. The problem here is that the software update wiped out the nuclear control system data. This exposes two bad problems. First, customers are always asking why they can't update their system while it is still running. We liken that to changing your tire while driving down the road. Secondly, the software update did not respect the data in the nuclear control system and synchronized it to the new initial data in the update on the other system! Not a good idea. In critical safety systems, you always practice an update before actually doing one.
    • by dissy (172727) on Friday June 06, 2008 @07:30PM (#23689617)

      First customers are always asking why they can't update their system while it is still running. We liken that to changing your tire while driving down the road.
      Oh sure, NOW you think of a debian slogan ;}

      • We liken that to changing your tire while driving down the road.

        Oh sure, NOW you think of a debian slogan ;}


        Good thing it wasn't written in Smalltalk. The slogan there is building the rest of the boat while underway.
    • by Ungrounded Lightning (62228) on Friday June 06, 2008 @07:46PM (#23689725) Journal
      Secondly the software update did not respect the data in the nuclear control system and synchronized it to new initial data in the update on the other system! Not a good idea. In critical safety systems, you always practice an update before actually doing one.

      I have no problem with a computer on the process control subnet reporting information to a computer on the business subnet.

      I have a BIG problem with a computer on the business subnet being able to modify and corrupt data in a computer on the process control subnet.

      "I can't dump data to the business side" is a reason to make a log entry and maybe sound a minor alarm. It's not a reason to shut down the reactor (unless the data is needed for regulatory compliance and the process control side isn't able to buffer it until the business side is working correctly.)

      But if a business subnet computer can tamper with something as critical as a process control machine's idea of the level of coolant in a reservoir, it rings my "design flaw" alarms.

      Is it ONLY able to reset it to "empty" as a poorly-designed part of a communication restart sequence? Or could it also make the process control machine think the level was nominal when it WAS empty?

      IMHO this should be examined more closely. It may have exposed a dangerous flaw in the software design.

      Security flaws don't care whether they're exercised by mischance or malice. If nothing else, this is a way to DoS a nuclear plant through a break-in on the business side of the net.
      • by Platinumrat (1166135) on Friday June 06, 2008 @08:35PM (#23690021) Journal

        I agree with the previous post. In railway signalling (at least outside of the USA), formal safety processes must be followed in software design and configuration. Part of that is a formal hazard analysis. There are various Safety Integrity Levels (SIL) applied to different control and monitoring components (SIL-0 being the lowest, up to SIL-4 for stuff that can kill people if it goes wrong). There is no condition under which it is acceptable for a business system to feed vital sensor data to the control system. A hazard analysis should always be performed when making any changes to a control system, at which point this sort of thing should have been detected.
        • Re: (Score:3, Informative)

          by Anonymous Coward
          There are such requirements in the US, whether for SIL ratings, performing haz-op reviews, etc. - particularly in nuclear apps.

          In a plant, not all control systems are SIL rated, but the safety backups usually are... though more and more operators are buying or upgrading to SIL-qualified systems and extending SIL beyond just the safety and protection backups.

          In this case, the engineers were probably asleep at the wheel and didn't realize the changes they made to the control software impacted the trip
  • The thing I'm a bit puzzled about: if this system has data which is so important that the whole plant must be SHUT DOWN for two days if it fails, then why aren't there *at least* TWO of them (I'd say there's a good argument for 3 or 4, but...)? That way, you can take one out of the loop for updates, verify the update didn't hose your data, sync the data from the 'live' system, put it back online, take the other one offline, and complete the update on it.

      If I were the power co owning this plant, I'd be
    • The problem here is that the software update wiped out the nuclear control system data.
      Maybe it didn't pass WGA?
  • by layer3switch (783864) on Friday June 06, 2008 @07:25PM (#23689575)
    "... The move to SCADA systems boosts efficiency at utilities because it allows workers to operate equipment remotely."

    Another proof that Homer Simpson was truly ahead of his time. [wikipedia.org]

    Are you mad, woman? You never know when an old calendar might come in handy. Sure, it's not 1985 now, but who knows what tomorrow will bring? -Homer
  • by BlueParrot (965239) on Friday June 06, 2008 @07:25PM (#23689581)
    The chemical diagnostic data is damn important because it may determine things like corrosion rates, the amount of impurities circulating in the water, the potential for clogs, etc. As with all other software, errors occasionally occur, and the appropriate way to respond when they do is to shut down and blow some whistles to ensure that the reactor is brought into a safe state before something else goes wrong. This is one of those cases where "better safe than sorry" is a really rather good motto.
  • I'm gonna have to agree with that last statement in the summary. Basically, under these circumstances, you take out the switch and you take out the plant, and I doubt they guard the network closet as well as the reactor core. Plus the whole hacking thing. You really don't need to watch YouTube videos and check your e-mail from a control computer, and you can bring any actually needed updates and files to it manually via USB drive.
  • The summary said: when a computer on the plant's business network was rebooted after an engineer installed a software update

    We all know what really happened. Dude rebooted the computer so that Windows automatic update reminder to reboot wouldn't interrupt his Solitaire game every 10 minutes.
  • This is why... (Score:4, Interesting)

    by rat7307 (218353) on Friday June 06, 2008 @07:47PM (#23689729) Homepage
    This is why you keep the IT nerds away from the process network.

    I've had a whole plant lose view of its system because some well-meaning retard in IT decided to push updates onto a SCADA system without qualifying the updates... never had it KILL the control side of things, though... well done, whoever you were, you've done well.
    • by rat7307 (218353)
      ...although, after re-reading the story, it's a little vague... was he updating some random PC, or was he actually updating the SCADA/process control software/firmware?

      If it's the latter, I feel for him :-) , but you have to do your homework before going all patch crazy!
  • From TFA

    In June 1999, a steel gas pipeline ruptured near Bellingham, Wash., killing two children and an 18-year-old, and injuring eight others. A subsequent investigation found that a computer failure just prior to the accident locked out the central control room operating the pipeline, preventing technicians from relieving pressure in the pipeline.

    Huh? I've read the NTSB report on that accident - and nowhere in it (IIRC) are computers implicated. The accident occurred due to damage to the pipes from con

  • by Anonymous Coward
    Before there are too many retarded "OMG why was it on the business network!!!?LOL!??!" comments, I'll cover that right here:

    It says the software is supposed to sync data between the control system and the business network. Obviously it has to be connected to both sides somehow. I'm not a power plant designer, but there's probably a good reason why people might need access to that data from the control system, and thus some kind of system acting as a safe bridge between the two rather than allowing unrestric
    • by datajack (17285)
      Hmm .. not sure why the ops network would have to rely on such data sent from the business network. Monitoring of levels of important stuff is an ops function to my mind.

      I'll admit that I'm too drunk to read TFA at the mo, so may have missed some detail :)
  • by dindi (78034) on Friday June 06, 2008 @08:00PM (#23689825) Homepage
    At least it did not turn into a meltdown, so the safety features in the software worked.

    That is definitely a glass half full, as opposed to half empty.

  • most freakouts surrounding nuclear power are based on 1960s technology. modern reactor designs, such as pebble bed reactors [wikipedia.org], are designed to be passively safe. that is, you can just walk away from them, doing nothing, and they will not release gas, go china syndrome, or anything else unsafe. older nuke tech requires active safety management: someone must always be on the job, making sure nothing f***s up. designing safety into nuclear reactor design from the philosophical ground up is the way of the future
    • by dbIII (701233) on Friday June 06, 2008 @08:32PM (#23690001)
      While that may be true, the first full-scale prototypes of pebble bed reactors are yet to go online - however, construction of several in China is at an advanced stage. As Superphoenix showed with fast breeders, you really need a full-scale prototype to identify all of the problems (it was economic issues that killed fast breeders, not safety ones).

      India's accelerated thorium idea is also very promising.

      The major problem I see with US nuclear power is the assumption that it is a solved problem; almost nothing has been spent on R&D for decades. The "new generation" of reactors from Westinghouse and others is little more than 1960s white elephants painted green.

  • At the nuc site I worked at, there were two networks: the business network and the ops network. Data flowed from the ops network to the business network for statistics gathering only. The single thing the business network did that affected operations and safety (regardless of my boss' attempt to justify budget) was the generation of work-orders. A total failure of the business network would - at worst - result in a routine observation job being missed, which would cause the systems on the ops network to
  • by PPH (736903) on Friday June 06, 2008 @08:49PM (#23690111)

    ... enter 4, 8, 15, 16, 23, 42.

    Or else all hell breaks loose.

  • by ZombieEngineer (738752) on Friday June 06, 2008 @09:50PM (#23690451)
    Something is not right here...

    Yes, the safety system kicking in is "a good thing".

    Pulling data from another computer system for a safety related control system is not a bright idea (the weakest link problem).

    Historically, for a safety control system in an Oil & Gas environment, all the inputs to the safety system are either hardwired or pulled from another safety-system controller which has the appropriate level of redundancy (CPU boards and communication paths with communication watchdog timers).

    Even transmitters in some circumstances cannot be trusted, hence the 2-out-of-3 voting systems (take three transmitters measuring the same value and pick the middle of the three; if one of the transmitters fails high or low, your choice will still be the safe option).

    Someone needs a serious think about where this plant is getting data for its safety shutdown system.

    ZombieEngineer
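The 2-out-of-3 voting scheme described above reduces to taking the median of three redundant readings. A minimal sketch (hypothetical, not any particular vendor's implementation):

```python
def vote_2oo3(a: float, b: float, c: float) -> float:
    """2-out-of-3 voter: return the middle of three redundant readings.

    If any single transmitter fails high or low, its outlier value is
    discarded automatically, because the median always lies between
    the two readings that still agree.
    """
    return sorted((a, b, c))[1]

# One transmitter fails high: the two healthy readings bracket the result.
assert vote_2oo3(4.98, 5.02, 9999.0) == 5.02
# One fails low (e.g. a dead current loop reading zero):
assert vote_2oo3(0.0, 4.98, 5.02) == 4.98
```

The scheme tolerates any single transmitter fault without tripping on it, which is exactly why it is preferred over trusting a lone sensor for shutdown decisions.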

"If truth is beauty, how come no one has their hair done in the library?" -- Lily Tomlin

Working...