When Computers Go Wrong 250

Posted by samzenpus
from the best-of-intentions dept.
Barence writes "PC Pro's Stewart Mitchell has charted the world's ten most calamitous computer cock-ups. They include the Russians' stealing software that resulted in their gas pipeline exploding, the Mars Orbiter that went missing because the programmers got their imperial and metric measurements mixed up, the Soviet early-warning system that confused the sun for a missile and almost triggered World War III, plus the Windows anti-piracy measure that resulted in millions of legitimate customers being branded software thieves."
  • by adosch (1397357) on Sunday December 12, 2010 @10:48AM (#34528618)
    TFA article should have been named the 'World's ten most calamitous logic cock-ups' instead, because in the end, malformed, ill-tested or unforeseen logic compensation(s) caused those issues, not computers themselves.
    • Re: (Score:2, Interesting)

      by Anonymous Coward

      Maybe "World's nine most calamitous logic cock-ups and that Intel FPU bug" then?

      • by hairyfeet (841228) <bassbeast1968@NOsPAM.gmail.com> on Sunday December 12, 2010 @12:38PM (#34529116) Journal

        Yeah there really wasn't much computer related there. If you wanted computer related I would have added WinME, aka "what idiot thought mixing WDM and VXD drivers was a good idea?" along with Vista Capable, aka "We've got to let the OEMs dump their crappers on Best Buy, so pretend it runs, okay?" and finally the early Athlon without thermal monitoring aka "Heat problem? What heat problem?".

        And of course if you wanted some real old time badness there was Bonzi Buddy, also known as "Kill that GODDAMNED MONKEY DEAD!!" and Geocities with the ever popular "WTF? Why is there a pocketwatch hanging off my mouse like a ball of snot and who thought pink OMG Ponies! text on a lime green background with sparkles and GIFs was tasteful?" and of course MSFT Bob, an OS made for the clueless that needed a fricking gamer rig just to run and spawned the electronic son of Satan known as Clippy.

        Finally on the hardware side I'd add the Pentium 4, also known as "Mr Piggy Super Space Heater", and the Geforce 5xxx Hoover Edition, which was famous for not only filling your PC with the sounds of sucking but, thanks to cheating by Nvidia on rendering, actually gave you REAL sucking as well! Quite an accomplishment, that. Then there's the Seagate "I hope you didn't actually NEED your data for anything" bug in the early 1.5TB drives, the early Phenom "watch this patch suck away your performance" TLB bug, the iPhone 4 which gave us lovely phrases such as "WTF do you mean I'm holding it wrong?" and finally, to show they can still make incredible mistakes, the Nvidia bumpgate, also known as "We do NOT have a problem with our GPUs, it's a power saving feature! See it makes your computer shut down and everything!!". These I think would have been a little more computer centric than stolen code and a screwed battery on a Volvo.

        • by rarel (697734) on Sunday December 12, 2010 @12:54PM (#34529194) Homepage
          it's okay, you can breathe now.
        • and of course MSFT Bob, an OS made for the clueless that needed a fricking gamer rig just to run and spawned the electronic son of Satan known as Clippy.

          Bob was an interface, not an OS. It ran on top of Windows. The rest, however, is true.

        • by Machtyn (759119)
          You forgot the floating point error from the original Pentium 66s, or "0.99997 really does equal 1.0000!"
        • by Tynin (634655)
          I'm remembering when 3 of my 5 IBM Deskstars [slashdot.org] took a crap. Out of all the drives to choose for my first at-home RAID 5, I accidentally bought the Titanic.
        • by cstacy (534252)

          And of course if you wanted some real old time badness there was Bonzi Buddy, also known as "Kill that GODDAMNED MONKEY DEAD!!" and Geocities with the ever popular "WTF? Why is there a pocketwatch hanging off my mouse like a ball of snot and who thought pink OMG Ponies! text on a lime green background with sparkles and GIFs was tasteful?" and of course MSFT Bob, an OS made for the clueless that needed a fricking gamer rig just to run and spawned the electronic son of Satan known as Clippy.

          It looks like you're writing a Slashdot flame. Would you like help?

          • Get help writing the flame
          • Just type the flame without help
          • Don't show me this tip again
      • by TheLink (130905) on Sunday December 12, 2010 @01:33PM (#34529394) Journal

        It's not confirmed that the gas pipeline blowup was due to computers going wrong.

        http://en.wikipedia.org/wiki/Siberian_pipeline_sabotage#Hoax_Theory [wikipedia.org]

        Here are a few more "logic cock ups":

        http://en.wikipedia.org/wiki/Ariane_5_Flight_501 [wikipedia.org]
        http://it.slashdot.org/article.pl?sid=07/02/25/2038217 [slashdot.org]

        And Wired's list: http://www.wired.com/software/coolapps/news/2005/11/69355 [wired.com]

    • by the_humeister (922869) on Sunday December 12, 2010 @11:52AM (#34528896)

      I'm surprised they didn't mention incidents where people actually died, such as the Therac-25 [wikipedia.org] incident.

      • by crunchygranola (1954152) on Sunday December 12, 2010 @12:45PM (#34529146)

        And of course there is the Patriot missile software clock issue - that led to a failure to engage a SCUD on February 25, 1991 at Dhahran, Saudi Arabia, killing 28 soldiers.

        This failure is rather similar to the Soviet defense and NORAD errors mentioned in the article in that it was a weakness designed into the system that did not account for the full range of operational conditions. In the Petrov incident it was a natural condition; in the NORAD case, an easy-to-make operator error; in the Dhahran barracks Patriot incident, a failure to consider that a unit might be operated for weeks without a restart.
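For anyone curious how a clock error like that can grow with uptime, here is a minimal sketch. The specifics are assumptions taken from commonly cited accounts of the incident, not from this thread: time counted in tenths of a second, the constant 0.1 stored with 23 binary fractional digits (chopped, not rounded), and roughly 100 hours since the last restart. The function names are made up for illustration.

```python
from fractions import Fraction

FRACTION_BITS = 23   # assumption: binary fractional digits kept for 0.1
UPTIME_HOURS = 100   # assumption: hours of continuous operation


def truncated_tenth(bits: int) -> Fraction:
    """Return 0.1 chopped (not rounded) to `bits` binary fractional digits."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours: int, bits: int) -> Fraction:
    """Accumulated clock error after `hours` of counting 0.1 s ticks."""
    ticks = hours * 3600 * 10                      # ten ticks per second
    error_per_tick = Fraction(1, 10) - truncated_tenth(bits)
    return ticks * error_per_tick


drift = clock_drift_seconds(UPTIME_HOURS, FRACTION_BITS)
print(f"drift after {UPTIME_HOURS} h: {float(drift):.4f} s")
```

Under those assumptions the drift comes to about a third of a second. Each tick is off by less than a ten-millionth of a second, but nothing resets the error until the unit restarts.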

        • by Opportunist (166417) on Sunday December 12, 2010 @01:40PM (#34529436)

          To be fair, the PATRIOT manufacturer didn't think it would stay assembled for weeks without falling apart, thus requiring a restart.

        • Feb 25th is my birthday, I was watching the television here in the UK before going out with friends. I remember well the footage of 'incoming' as they were broadcast live on the BBC. I've always been curious about this tale though. What I saw was not a Scud coming down (pretty unlikely) but a number of Patriots launching and one of them suddenly veering off-course and smashing into the adjoining part of the base. It was in the air for approximately 1/2 a second before it turned left (on my screen) and wa

      • by Anonymous Coward on Sunday December 12, 2010 @08:49PM (#34531222)

        I'm surprised they didn't mention incidents where people actually died, such as the Therac-25 [wikipedia.org] incident.

        Radiation dosage mistakes like this make you wonder how well and how often airport body scanners will be calibrated as machines remain in service for years.

    • Now for most of these, you are correct, they were fuckups of input. Computers got the wrong data or had the wrong code written and screwed up. However computers can and do fuck up. The Pentium FDIV bug is an example. Yes I realize the silicon was doing what its transistors dictated, but at that level it is still the computer fucking up. You could write perfect code and get the wrong result in spite of that.
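A minimal sketch of the kind of check people ran at the time. The operand pair below is the one widely reported to trigger the flawed divider (that pair is an assumption from those reports, not something stated in this thread), and `division_error` is a made-up helper name. It compares the FPU's quotient against exact rational arithmetic.

```python
from fractions import Fraction

# Operands widely reported to expose the flawed Pentium divider (assumption).
NUMERATOR = 4195835
DENOMINATOR = 3145727


def division_error(x: int, y: int) -> float:
    """Gap between the hardware float quotient and the exact rational quotient."""
    exact = Fraction(x, y)
    hardware = Fraction(x / y)   # exact value of the double the FPU returned
    return float(abs(hardware - exact))


print(division_error(NUMERATOR, DENOMINATOR))
```

On a correct FPU the gap is down at the level of double-precision rounding. The flawed chips reportedly went wrong around the fifth significant digit for this pair, which is what "perfect code, wrong result" looks like in practice.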

      • by fyngyrz (762201)

        The FDIV bug, however, was the direct consequence of a person at Intel screwing up. Everything after that was just more crap rolling downhill.

        It is very seldom indeed that a computer makes an actual error. It happens - RAM bits flip, gamma rays arrive and cock up what was perfectly operating circuitry for a cycle... but FDIV = 100% human error.

        • by hitmark (640295)

          garbage in, garbage out...

    • by mooingyak (720677)

      TFA article

      I believe you mean "The TFA article"

    • by jc42 (318812) on Sunday December 12, 2010 @01:36PM (#34529414) Homepage Journal

      Another aspect to this is a common property of most "digital" computations. I've seen it expressed as "Digital errors have no order of magnitude". Another phrasing is "Getting one bit wrong is generally indistinguishable from randomizing all of memory". So when a digital calculation goes wrong, a tiny, inconsequential error is just about as likely as a total meltdown of the entire system.

      Programmers tend to get familiar with this phenomenon very early in their career. They write a small chunk of code that does a simple calculation, and the result is orders of magnitude wrong. When they investigate, they discover it was caused by a one-character typo, perhaps an "off by one" error such as using '<' instead of '<=', or vice-versa. This quickly leads to what many "normal" people consider the major character failure of software geeks, the insistence that everything be exactly right, no matter what, and the willingness to spend long hours discussing insignificant minutiae as if they mattered. In their work, it's usually such insignificant minutiae that bring the whole house of cards tumbling down.

      If you're unwilling to take the difference between a comma and a semicolon seriously, you have no future as a software developer. This is often why something goes badly wrong and we have events like those described in this story.

      OTOH, it is interesting that, despite all the software disasters like the metric/imperial-units story, the software world has never insisted that programming languages include units as part of variables' values. It's not like this is anything difficult, and it has been done in a number of languages. But none of the common languages have such a feature. It is a bit bizarre that we can get into long discussions of complex, obscure concepts such as type checking or class inheritance, when our calculations are all susceptible to unchecked unit mismatches (without even a warning from the compiler or interpreter). There's a lot of poor logic when the topic is the relative importance of various sources of bogus calculations.
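To make the "units as part of the value" idea concrete, here is a minimal sketch. Everything in it is hypothetical (the `Quantity` class, the `TO_SI` table, the handful of units chosen); it is not how any particular language or library does it, just an illustration of a value that refuses to be added to something of a different dimension.

```python
from dataclasses import dataclass

# Hypothetical table: unit -> (dimension, factor to the SI unit of that dimension).
TO_SI = {
    "N": ("force", 1.0),
    "lbf": ("force", 4.4482216152605),
    "m": ("length", 1.0),
    "ft": ("length", 0.3048),
}


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    def __post_init__(self):
        if self.unit not in TO_SI:
            raise ValueError(f"unknown unit: {self.unit}")

    @property
    def dimension(self) -> str:
        return TO_SI[self.unit][0]

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if self.dimension != other.dimension:
            raise TypeError(f"cannot add {self.unit} to {other.unit}")
        # Convert the right-hand side into the left-hand side's unit.
        factor = TO_SI[other.unit][1] / TO_SI[self.unit][1]
        return Quantity(self.value + other.value * factor, self.unit)


print(Quantity(1.0, "N") + Quantity(1.0, "lbf"))   # converted, not silently mixed
```

Adding newtons to pound-force converts first; adding newtons to metres raises `TypeError` instead of producing a plausible-looking number.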

      • by hitmark (640295) on Sunday December 12, 2010 @02:11PM (#34529570) Journal

        https://secure.wikimedia.org/wikipedia/en/wiki/Ada_(programming_language) [wikimedia.org]

        I think the problem is that most of the hobby, and perhaps even commercial, programming happens on a "scratch itch" basis. Once it does what the programmer set out to do, the job is done no matter how nasty the code may look. And any language that allows the programmer to get there quickly gets instant love. Then there are situations, mostly on the bare metal level tho, where doing things in crazy ways is the only way to get it done.

      • by sjames (1099)

        What surprises me is that we have no proper first class fractional numbers, everything is done in decimals and suffers rounding error eventually. A system using proper fractions can actually get exactly the right answer every time OR it will overflow and we will know for a fact the answer isn't exact. Sure, you can technically abort on rounding in IEEE floats, but you won't get very far that way.

        I can well understand why we didn't do it 10 or 20 years ago, but these days our biggest problem is getting memor

        • by kennykb (547805) on Sunday December 12, 2010 @05:40PM (#34530526)

          A system using proper fractions can actually get exactly the right answer every time OR it will overflow and we will know for a fact the answer isn't exact.

          What theory of numeration are you using, that has all numbers rational? I'm sorry, but even the humble square root is something I don't want to give up, to say nothing of transcendental functions. The theory of exact arithmetic on the reals is not all that well developed. Bill Gosper [plover.com] makes a start, and a handful of researchers take it somewhat further, but actually using exact arithmetic for everything you'd want to do remains a mirage.

          • by sjames (1099) on Sunday December 12, 2010 @07:19PM (#34530900) Homepage

            None at all. I just presumed it was understood that my statement applied to rational numbers.

            You could take it to the next step and handle irrational numbers symbolically, but that's probably best left to software rather than hardware. You could keep a hardware function called squareish root though if you like that returns a fraction matching the current approximation. You won't actually lose anything that way.

            I'm pretty sure we will at least be improving matters by not losing on simple division.
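A minimal sketch of the rational-arithmetic point, using Python's standard `fractions` module in software rather than the hardware support discussed above. The helper names are made up for illustration.

```python
from fractions import Fraction


def float_total(n: int) -> float:
    """Add 1/10 to itself n times using binary floating point."""
    return sum([0.1] * n)


def exact_total(n: int) -> Fraction:
    """Add 1/10 to itself n times using exact rational arithmetic."""
    return sum([Fraction(1, 10)] * n, Fraction(0))


def harmonic(n: int) -> Fraction:
    """Exact sum 1/1 + 1/2 + ... + 1/n; the denominator grows as n does."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


print(float_total(10) == 1.0)   # False: the rounding error has accumulated
print(exact_total(10) == 1)     # True: no rounding anywhere
print(harmonic(20))
```

The rational version stays exact on division, which is the gain claimed above. The price shows up in `harmonic`: numerators and denominators keep growing, so a fixed-width hardware format would eventually have to overflow or round.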

      • by kennykb (547805) on Sunday December 12, 2010 @05:50PM (#34530578)
        "Units are parts of variables" usually comes along with systems in which there is no escape. Dimensional analysis is fine up to a point, but when you get into weird quantities like dBm/sqrt(Hz) (seriously: ten times the log-base-10 of a quantity measured in milliwatts, over the square root of another quantity measured in hertz), the systems that enforce units tend to fall apart, and often it turns out that they simply lack the notation you need. (By the way, "dBm per root hertz" was a unit that I used in daily work at an earlier time in my life. And I still use weirdness like neper-coulomb per square micron.)
      • Another phrasing is "Getting one bit wrong is generally indistinguishable from randomizing all of memory".

        Not necessarily. It depends on where the one bit went wrong - if it's in a system that has redundancy, the system could recover from the error. If it's in a piece of text, it could result in a spelling error. If it's in a kernel module, it could freeze the system. An application could crash, etc...
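A small sketch of both sides of this exchange: the size of the damage from one flipped bit depends entirely on which bit it is and what it belongs to. The helper names are made up for illustration.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return `value` with the given bit position inverted."""
    return value ^ (1 << bit)


def flip_bit_in_text(text: str, index: int, bit: int) -> str:
    """Flip one bit of one ASCII character in `text`."""
    data = bytearray(text.encode("ascii"))
    data[index] = flip_bit(data[index], bit)
    return data.decode("ascii", errors="replace")


print(flip_bit(1000, 0))              # 1001: off by one
print(flip_bit(1000, 30))             # 1073742824: off by about a billion
print(flip_bit_in_text("cat", 0, 0))  # bat: a spelling error
```

Same kind of fault, three very different outcomes, and the hardware has no notion of which one is "small".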

    • by Yvanhoe (564877)
      Your comment makes me wonder if we ever had computer deadly problems that were really caused by a malfunction in a computer instead of a programming error.
    • by msauve (701917)
      Actually, naming it "World's worst article" wouldn't be entirely incorrect. It's filled with misinformation and off-the-wall comments. WTF is a "doughnut-munching controller"? Why is reflected sunlight "stronger than normal due to the autumn equinox" (huh???)? Is "squeaky bum time" really a Soviet idiom? Do programs have emotions, and close "down in a huff"? I know Brits and Americans differ over milliard/billion/trillion, but what's a "bilion," and did the author mean to claim that Airbus spent $6,0
  • Wow ! (Score:2, Insightful)

    by Anonymous Coward

    I can't imagine the well known and documented story of U.S. exploding the gas pipeline could be put in such a backward way.

    Next in news: U.S. thoughtful placement of Manhattan skyscrapers dealt a heavy blow to international terrorism, two terrorist planes down.

    K.L.M.

    • by jc42 (318812)

      I can't imagine the well known and documented story of U.S. exploding the gas pipeline could be put in such a backward way.

      Oh, I dunno; I thought this definition was at least equally ignorant:

      floating-point numbers (numbers too large to be represented as integers)

      This pretty much tells us what we need to know about the author's depth of mathematical understanding. In general, there's a lot in TFA to make your average geek go "WTF?" and wonder if the rest is worth reading.

  • Inaccurate title (Score:2, Interesting)

    Title would have been accurate if the computers had fully autonomous AI, and then messed up.
    As of now, it's just the logic they were programmed with that is being executed.
    • by Anonymous Coward

      Your suggestion would be accurate if the title was implying that the computers themselves were responsible, something like "Computers' biggest failures" or something. But it's not. It essentially means "world's ten most calamitous cock-ups INVOLVING computers as their primary feature". There are worse problems with the article than the title.

    • Title would have been accurate if the computers had fully autonomous AI, and then messed up.
      As of now, it's just the logic they were programmed with that is being executed.

      I agree, but shit flows downhill.

      You're right about the mistakes being made by humans, but the poor helpless computers will get blamed.

      Our propensity to leave the low man on the totem pole holding the ball is what may ultimately cause the revolt of the fully autonomous AI against us.

  • therac 25 (Score:5, Informative)

    by Anonymous Coward on Sunday December 12, 2010 @10:57AM (#34528674)

    List fails without the therac 25

    • Re:therac 25 (Score:5, Informative)

      by cratermoon (765155) on Sunday December 12, 2010 @11:18AM (#34528746) Homepage
      The list fails for many reasons. Too many reasons to calculate accurately on a Pentium. On the first page, while describing the bug on said Intel CPU, the author defines floating-point numbers as "numbers too large to be represented as integers".
      • Re: (Score:2, Informative)

        by Anonymous Coward

        Somebody did their research using Wikipedia? From the first line of the floating point article [wikipedia.org] as it currently stands:

        In computing, floating point describes a system for representing numbers that would be too large or too small to be represented as integers.

      • This bothered me as well. Even "numbers with decimals" would have been much better.
      • On the first page, while describing the bug on said Intel CPU, the author defines floating-point numbers as "numbers too large to be represented as integers".

        I believe they meant a number that would take a crapload (or infinite amount) of screen space to display, not the other kind of "large".

    • by martyros (588782)
      Yeah, I was thinking the same thing. The Therac 25 [wikipedia.org] is definitely the most insidious computer failure I've ever heard of.
    • by owlstead (636356)

      These kinds of lists always fail, period. Just see it as an interesting collection of failing software.

  • Imperial - Metric (Score:5, Interesting)

    by Anonymous Coward on Sunday December 12, 2010 @11:03AM (#34528696)

    Due to the imperial-metric mash-up, the sums were so far askew that when Ground Control initiated boosters to secure the pod in orbit, all they succeeded in doing was firing it closer to the planet, where it burnt up in the atmosphere.

    When I see the Imperial-Metric confusion shit, I just want to slap the shit out of someone. All that waste because some engineers are incapable of using Metric or some vendor just doesn't want to spend the money to modernize their machinery. I know of an aerospace contractor that is using machinery from the 50s - yep, they're constantly being recalibrated and sometimes they don't notice - ooopsie!

    And when I see that we, the US, are one of two countries still on Imperial - the other is some Third World non-industrial country - I want to barf.

    And then, when I have to buy two sets of tools to work on a car, I wish for the entire US auto industry to go bankrupt and be replaced with some modern companies.

    I love Metric. It makes measurements and calculations much easier - quick! What is the mass of 329 mL of water? You'd need a calculator to do something similar in Imperial.

    • Re:Imperial - Metric (Score:4, Interesting)

      by Rob the Bold (788862) on Sunday December 12, 2010 @12:14PM (#34529016)

      Due to the imperial-metric mash-up, the sums were so far askew that when Ground Control initiated boosters to secure the pod in orbit, all they succeeded in doing was firing it closer to the planet, where it burnt up in the atmosphere.

      When I see the Imperial-Metric confusion shit, I just want to slap the shit out of someone. All that waste because some engineers are incapable of using Metric or some vendor just doesn't want to spend the money to modernize their machinery. I know of an aerospace contractor that is using machinery from the 50s - yep, they're constantly being recalibrated and sometimes they don't notice - ooopsie!

      And when I see that we, the US, are one of two countries still on Imperial - the other is some Third World non-industrial country - I want to barf.

      And then, when I have to buy two sets of tools to work on a car, I wish for the entire US auto industry to go bankrupt and be replaced with some modern companies.

      I love Metric. It makes measurements and calculations much easier - quick! What is the mass of 329 mL of water? You'd need a calculator to do something similar in Imperial.

      I'd prefer to slap someone for saying "Imperial vs. Metric" when they're talking about US standards vs the SI -- which one certainly is when talking about the Mars spacecraft failure. After all, the US system -- while derived from the Imperial System -- is not the same thing. Quick: how many l in a gal? Well, it depends, doesn't it? Did you mean Imperial gallon or US gallon? How many m^2 in an acre? What's the mass of a ton(ne)? And as I like to point out to people -- because I'm a pedantic nerd like everyone else here -- the US system is a metric system . . . see what I did there? I didn't use a capital "M" or say SI there?

      • by jbengt (874751)

        And as I like to point out to people -- because I'm a pedantic nerd like everyone else here -- the US system is a metric system . . . see what I did there? I didn't use a capital "M" or say SI there?

        And because I'm a pedant, too, I'd like to point out that the US system of weights and measures is officially based on SI units - units like yards and pounds are legally defined by the USA government in terms of SI units.

        • And because I'm a pedant, too, I'd like to point out that the US system of weights and measures is officially based on SI units - units like yards and pounds are legally defined by the USA government in terms of SI units.

          Defined in SI units now, yes . . .

    • Re:Imperial - Metric (Score:5, Interesting)

      by Sycraft-fu (314770) on Sunday December 12, 2010 @12:25PM (#34529060)

      Well I have to support part of what you've said, and contradict part.

      I support you in that it is stupid NASA uses Imperial ever, anywhere. Metric is the method for science and with good reason. So it is stupid that they wouldn't use it 100% of the time. Any chemistry or physics class I ever took was all metric all the time. It wasn't even a "We do this to make you learn it," kind of thing, it was just the way it was, it was assumed.

      However I have to contradict you on the "OMG the US is so stupid for not going Metric," thing. It doesn't really matter. What matters to normal people in everyday life is having a feel for what a unit is, not inter-unit conversions. Your example is something people do not do. The ability to do fast conversions on units of volume does not matter; what matters is that you have a feeling for what they are. You can stick with a system that is not neat and regular and it works just fine.

      Also if you think metric rules all in other countries you've just not looked. I have the occasion to visit Canada once a year and the imperial system is alive and well, lurking in the shadows. In some cases it is explicit, you find various food items sold in pounds, rather than kilograms. In some cases it is more hidden. Soda is sold in 12 ounce cans. Yes, they say 355mL on them as well (as they do in the US) but it is a 12 ounce can. 355mL was not the unit used to design it, 12 oz was. Sometimes people don't even know it. Alcohol is sold in units frequently referred to as "fifths". It is 750mL but why the term? Because it is a fifth of a gallon (well, 5.04 if you want to get technical).

      That is why there's the apathy in forcing a change. You really gain very little for most people in everyday operation. I'm not saying it would be a bad thing for a change to happen, but there isn't the incentive many geeks seem to think there is.

      I work comfortably in both systems. I've done plenty of science so I've no problem with any metric units, but I also bake which is extremely imperial dominated. Doesn't matter to me. I can even work in both at the same time. If a recipe calls for 3 cups of bread flour, I know my chosen flour is 155 grams per cup. So when I weigh it out on my scale I weigh out 465 grams. I could do ounces instead; it wouldn't matter, my scale just reads grams. Likewise it wouldn't matter if the recipe instead called for 700mL of flour. Metric doesn't make it any easier because the nice "all units are 1" factor only applies to water. My flour converts volume to weight at about 0.664, of course that depends on how dense it gets packed. That conversion factor is no more or less convenient than 155.

      Really, working in the screwy imperial system just isn't a big deal to normal people. You don't do anything that needs inter-unit conversion which is where metric shines.

      • by julesh (229690)

        If a recipe calls for 3 cups of bread flour, I know my chosen flour is 155 grams per cup. So when I weigh it out on my scale I weigh out 465 grams. I could do ounces instead; it wouldn't matter, my scale just reads grams. Likewise it wouldn't matter if the recipe instead called for 700mL of flour.

        In my experience metric recipes don't specify flour by volume, but by weight (unless for small volumes, e.g. tablespoons).

        Really, working in the screwy imperial system just isn't a big deal to normal people. You don't

        • Well if they are by weight only, then that would make sense as to why imperial still rules the roost in cooking. Most people don't have a scale for food preparation. I do because I approach baking as a science and I require precision (in fact my scale isn't precise enough for things like yeast and will be replaced with a chemical scale soon). Outside of baking the precision offered by a scale is not necessary at all and even in baking only the hard core (or the geeky) do it by weight. Volume is much easier,

          • by jeremyp (130771)

            Well if they are by weight only, then that would make sense as to why imperial still rules the roost in cooking. Most people don't have a scale for food preparation.

            You think? This must be a USA thing because here in the UK I'd be almost as surprised to walk into a kitchen and not see a set of scales as to walk into a kitchen and not see an oven.

          • Re:Imperial - Metric (Score:4, Informative)

            by lahvak (69490) on Sunday December 12, 2010 @07:45PM (#34531042) Homepage Journal

            Yes, getting a decent set of kitchen scales in the US is a pain. In Europe, every reasonably equipped kitchen has a set of kitchen scales on the counter.

            On the other hand, measuring certain ingredients by volume is better. For example, the specific weight of flour changes quite a bit with humidity, while volume stays pretty much the same.

        • by russotto (537200)

          I find doing middle-advanced DIY tasks that I'm _regularly_ doing inter-unit conversions. Just yesterday, I had to work out how much water was in my central heating system. Measure the radiators in metres, estimate length of 15mm diameter pipe, quick calculations, quite easy. If I was working in feet and had to convert to gallons it would have been trickier.

          I'd think the irrational factor of pi would be more of a problem than the 231 inch^3/gallon, or the factor of 12 for feet to inches.

      • Re:Imperial - Metric (Score:4, Interesting)

        by Reziac (43301) * on Sunday December 12, 2010 @02:31PM (#34529678) Homepage Journal

        My college physics and chemistry classes went as you describe -- for classwork, metric was assumed and no one thought anything of it. For everything else, Imperial was used. So you might hear something like (making up absurd example to shoehorn it all into one sentence) "I had to move my desk twenty feet just to get a measurement of less than one millimeter!" and it sounded perfectly natural to us. We're measurement-bilingual. ;)

  • by Anonymous Coward on Sunday December 12, 2010 @11:05AM (#34528700)

    It isn't smart to assign a 64 bit floating point to a 16 bit integer - unless you want to crash your first flight of the heavy Ariane 5 rocket... (http://en.wikipedia.org/wiki/Ariane_5#Notable_launches)
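A minimal sketch of why that narrowing is risky and what a guarded version might look like. This is an illustration of the general hazard only, not a reconstruction of the flight software; both function names are made up, and the unchecked version shows just one way such a conversion can misbehave (silent wrap-around).

```python
INT16_MIN, INT16_MAX = -32768, 32767


def wrap_to_int16(x: float) -> int:
    """Unchecked narrowing: truncate, keep the low 16 bits, reinterpret as signed."""
    raw = int(x) & 0xFFFF
    return raw - 0x10000 if raw >= 0x8000 else raw


def checked_to_int16(x: float) -> int:
    """Checked narrowing: refuse values that do not fit in 16 signed bits."""
    n = int(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in a 16-bit signed integer")
    return n


print(wrap_to_int16(123.9))      # 123: fine while the value is small
print(wrap_to_int16(40000.7))    # -25536: large positive input, negative result
```

The checked version turns the same input into an explicit error, which still has to be handled sensibly by whoever calls it.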

    • by owlstead (636356) on Sunday December 12, 2010 @03:27PM (#34530006)

      Actually, those kinds of conversions should be banned from any managed programming environment. It's fine that you need to work with bytes, shorts etc. or heck maybe even machine words, but let's only do that when absolutely required, shall we?

      It amazes me that so many programming languages still don't define acceptable ranges, accept null pointers, and use wrap-around two's-complement numbers etc. It's just asking for errors just like these. Sure they have their uses for lower level functions, but I would certainly like to have something better for APIs and general use business logic. They are just another pointer arithmetic or GOTO waiting to be erased from mainstream programming (and for sure, in many newer languages, they indeed are).

  • by Anonymous Coward on Sunday December 12, 2010 @11:27AM (#34528786)

    As a fellow programmer I worked with years ago was fond of saying, "Computers don't make mistakes. They do, however, execute yours VERY carefully."

    • by fyngyrz (762201)

      Under-rated

    • by TheRaven64 (641858) on Sunday December 12, 2010 @01:55PM (#34529500) Journal
      Unfortunately, they don't just execute your mistakes, they execute the mistakes of everyone involved in the toolchain. If you want to write bug-free software, then you also need a bug-free compiler, bug-free libraries, and a bug-free OS. The most you can say about most software is that it doesn't contain any bugs that are both serious and obvious.
    • by jc42 (318812) on Sunday December 12, 2010 @02:13PM (#34529584) Homepage Journal

      "Computers don't make mistakes. They do, however, execute yours VERY carefully."

      That's a good way of phrasing it. But it does miss the fact that not all "computer errors" are due to software mistakes.

      One example, of course, is the Pentium FDIV failure. That was a hardware failure, "programmed" into the CPU by Intel's experts in solid-state hardware design. There wasn't a whole lot that any software developer could do to defend against that failure.

      Another, more subtle one, came up when I was a grad student back in the 1970s. At that time, most of the campus research computing was done on the big mainframe in the campus Computer Center. After discovering a number of (published ;-) results that turned out to be wrong, some researchers investigated, and found that they were due to undetected overflows in the calculations. Yes, the hardware could and did test for overflows, and set a status bit when they occurred. Almost all this calculating was done in Fortran, and the Fortran compiler had a run-time flag that could turn the status-bit checking on or off. It defaulted to OFF. They did a bit of analysis, and concluded that about half the runs of Fortran programs on that machine produced output that included numbers that were incorrect due to undetected overflow.

      So why didn't they make the overflow-detection flag default to ON? Well, they did a little survey of the users. They found that the overwhelming response was that, if enabling overflow checking made the program run slower, then overflow checking shouldn't be done. Somewhere around 90% of the people asked said this. They weren't mathematically ignorant people; they were the people using the Fortran compiler for the data in their professional publications.

      This told us a lot about the way such things are done. Since I left academia and worked in what passes for the Real World, I've found that this is a nearly universal attitude. Faster and cheaper is always preferable to correct. This is still true even when we have computers in commercial aircraft and hospital operating rooms. And you can't call this sort of thing a "human error". People don't decide to disable overflow checking by accident; they do it knowing full well what the effect will be. When the computer fails in such cases, it wasn't executing a human's mistake; it was doing what the human wanted it to do.
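For anyone who wants to see the trade-off concretely, here is a minimal sketch of what "checking OFF" versus "checking ON" means for integer addition. The 32-bit signed word size and the function names are assumptions made for illustration; the comment above does not say what the mainframe's word size was or how its Fortran runtime implemented the check.

```python
# Illustrative only: simulates fixed-width integer addition in Python.
# The 32-bit width is an assumption, not a detail from the thread.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce an integer to the signed 32-bit two's-complement range."""
    return (value + 2**31) % 2**32 - 2**31


def add_unchecked(a, b):
    """Checking OFF: the result silently wraps around on overflow."""
    return wrap_int32(a + b)


def add_checked(a, b):
    """Checking ON: one extra comparison per operation, but no silent garbage."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 32 bits")
    return result
```

With checking off, `add_unchecked(INT32_MAX, 1)` returns `INT32_MIN` and the program carries on with a wrong number; with checking on, the same addition raises `OverflowError`. The extra comparison is the run-time cost the surveyed users did not want to pay.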

      • by owlstead (636356)

        And of those 90% of people, somewhere between 80% and 100% probably made a mistake that caused an overflow, even though they would swear that they would never make that mistake. The idea of the perfect programmer still lives on. At my company, though, I've made sure that Checkstyle and FindBugs are used for newer Java projects. It's amazing to turn them loose on your own older libraries, that's for sure.

        It's funny to see outside programmers that have turned them off from the start, only to find out that there are over 70

      • by cpghost (719344)

        One example, of course, is the Pentium FDIV failure. That was a hardware failure, "programmed" into the CPU by Intel's experts in solid-state hardware design.

        Are you sure it was an error in silicon, and not merely a software bug [intel.com] in the microcode of the ALU?

        4.2 The Underlying Cause After the quantized P-D plot (lookup table) was numerically generated as in Figure 4-1 , a script was written to download the entries into a hardware PLA (Programmable Lookup Array). An error was made in this script that resulte

    • I once ran a small IT department for a mid-size company and had proudly written on my office wall "To err is human. To blame it on a computer, even more so."
  • by madprof (4723)

    Haven't we seen this posted on /. before?

  • by Anonymous Coward on Sunday December 12, 2010 @11:55AM (#34528904)

    The "Switchboard meltdown" problem sounds like the incident which led to the creation of the EFF.

    Basically, someone forgot to include a ";" in a C program, which led to the problems at AT&T. Originally, they thought it was due to "hackers", and called in the Secret Service.

    The Secret Service in turn busted a gaming outfit called "Steve Jackson Games", which was completely innocent, of course, but that has never mattered to the Secret Service when they need to look like they are actually useful. The SS confiscated the computers, all illegally.

    The ACLU refused to get involved, so John Gilmore (formerly of Sun, and who worked with Richard Stallman to get out an open operating system around that time) created the EFF to fight the unconstitutional raid on Steve Jackson Games. The EFF trounced the Secret Service in court, and was thus born. I believe if you google for "Steve Jackson Games", you can still find the original story around.

    So, in a way, you can say that the EFF was created due to the single misplacement of a semicolon in a C program. Would that all of our bugs have such results. :)

    • by Anonymous Coward

      Also try searching for "The Hacker Crackdown" which tells the whole story.

    • by DamonHD (794830)

      I'm the result of an integer overflow in a Fortran electron-orbitals program (with attendant flashing error light on the console) so far as I know. Programmer (f) meet researcher (m), cue music, flashing lights (oh, already had that), music (possibly Teletypes and card readers for percussion), ..., profit.

      Does that count? B^>

      Rgds

      Damon

  • by flyingfsck (986395) on Sunday December 12, 2010 @12:45PM (#34529148)
    We got to commend MS for the most expensive computer cock-up.
  • by John Pfeiffer (454131) on Sunday December 12, 2010 @01:06PM (#34529250) Homepage

    "...Soviet early-warning system that confused the sun for a missile and almost triggered World War III..."

    Yeah, file this under 'shit I never want to know.' I have enough stupid crap in my head without having to worry about 'The time a computer error could have wiped out the whole of human existence.'

  • by mike449 (238450) on Sunday December 12, 2010 @01:14PM (#34529294)

    The Soviet pipeline explosion seems to be an urban legend, traced to a single source: At the Abyss: An Insider's History of the Cold War, by Thomas C. Reed.
    There is no mention of this explosion anywhere else, either in Russian or Western sources. If you can read Russian, some debunking is here:

    link [wikipedia.org]
    One of the facts mentioned there is that there was no SCADA on Soviet pipelines until the late '80s. All control was still pneumatic in 1982, with no software involved.

    • by Reziac (43301) *

      Translation (I don't know how accurate it is, but it's readable enough):
      http://tinyurl.com/2d8eyto [tinyurl.com]

      English page:
      http://en.wikipedia.org/wiki/Siberian_pipeline_sabotage [wikipedia.org]

      Interesting to compare the two.

  • the tactic was the sort of copyright protection the record industry would kill for

    No kidding. DRM these days looks pathetic by comparison. :P

  • by Locutus (9039) on Sunday December 12, 2010 @01:39PM (#34529428)
    Two comments. First, I thought the deal with the big blackout was that the network (TCP/IP) was flooded by a Windows virus infection, and if you know TCP/IP, it's not very good with lots of traffic. There was so much traffic that the computer (a UNIX box) sending status messages to the control room display system could not get messages out of its buffers. TCP/IP does this thing where the message isn't put on the network if there's going to be a collision, and it waits a bit before trying again. With the network flooded by Windows-based computers trying to infect each other, the warning messages were stuck in the UNIX box, and eventually the buffers filled up as more and more warning messages queued up. They seem to be blaming the UNIX box software because it ended up crashing when they didn't catch the situation where the buffers overflowed. IMO, that was caused by Windows and its ability to be a great petri dish for viruses, and by the idiots who keep putting Windows systems on critical networks.
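The "didn't catch the situation where the buffers overflowed" part can be sketched in a few lines. This is not the alarm software from the blackout, whose internals are not described in this thread; it is a hypothetical bounded queue (the function name `enqueue_alarm` and the `dropped` list are made up for illustration) showing the difference between handling a full buffer and letting it bring the program down.

```python
import queue


def enqueue_alarm(alarm_queue, message, dropped):
    """Try to queue a warning; record it as dropped instead of crashing.

    alarm_queue: a queue.Queue created with a fixed maxsize.
    dropped: a list that collects messages that did not fit.
    Returns True if the message was queued, False if the buffer was full.
    """
    try:
        alarm_queue.put_nowait(message)
        return True
    except queue.Full:
        # The buffer is full because nothing is draining it. Surface that
        # fact rather than letting the exception propagate unhandled.
        dropped.append(message)
        return False
```

With `queue.Queue(maxsize=2)` and nothing reading from it, the third call returns `False` and the message lands in `dropped`, which at least gives an operator something to notice. What a real control system should do when its alarm buffer fills is a design decision this sketch does not settle.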

    The second comment I have on this is that the list misses the LAX communications system software crash, which caused multiple near misses on the tarmac and in the air when air traffic controllers could not communicate with pilots. The cause of the crash was that a UNIX system had been replaced with a Windows-based system which had a known flaw: the OS could not run for more than 39 days, no matter what was running on it. The system and software were still approved and put in place, with a maintenance instruction to reboot the computer every 30 days. In comes a new employee who sees things are working fine, so he/she doesn't reboot the computer, and 9 days later the system crashes. The backup does the same, both are unable to recover, and it takes hours to get the system back up and running. That should have been in the list IMO.
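The comment doesn't say why the system had a hard uptime limit. One common cause of limits like this, offered here purely as an assumption, is an elapsed-time counter kept in milliseconds in a fixed-width integer. A 32-bit counter of that kind wraps after roughly 49.7 days, which does not match the 39 days quoted above, so take the figures below as an illustration of the mechanism and not as an account of that incident.

```python
# Illustrative sketch of a wrapping millisecond tick counter.
# The 32-bit width is an assumption, not a detail from the thread.
MS_PER_DAY = 24 * 60 * 60 * 1000
COUNTER_MODULUS = 2**32


def days_until_wrap(bits=32):
    """Days a millisecond counter of the given width runs before wrapping."""
    return 2**bits / MS_PER_DAY


def tick_count(elapsed_ms):
    """Value a 32-bit tick counter would show after elapsed_ms of uptime."""
    return elapsed_ms % COUNTER_MODULUS


def elapsed_naive(start_tick, now_tick):
    """Plain subtraction: goes negative once the counter has wrapped."""
    return now_tick - start_tick


def elapsed_wrap_safe(start_tick, now_tick):
    """Modular subtraction: correct across a single wrap of the counter."""
    return (now_tick - start_tick) % COUNTER_MODULUS
```

If a timer starts one second before the wrap and is read half a second after it, `elapsed_naive` reports a large negative interval while `elapsed_wrap_safe` reports 1500 ms. A scheduled reboot only papers over the naive version, which is why it fails the first time someone skips it.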

    There was also the CSX Railway situation, when lots of its signals went offline because they were run by Windows and the Windows computers got a virus.

    It would be nice to see a more complete and more accurate list of these kinds of computer software failures.

    LoB
