
When Computers Go Wrong

Posted by samzenpus
from the best-of-intentions dept.
Barence writes "PC Pro's Stewart Mitchell has charted the world's ten most calamitous computer cock-ups. They include the Russians' stealing software that resulted in their gas pipeline exploding, the Mars Orbiter that went missing because the programmers got their imperial and metric measurements mixed up, the Soviet early-warning system that confused the sun for a missile and almost triggered World War III, plus the Windows anti-piracy measure that resulted in millions of legitimate customers being branded software thieves."


  • by adosch (1397357) on Sunday December 12, 2010 @10:48AM (#34528618)
    TFA should have been named "The world's ten most calamitous logic cock-ups" instead, because in the end, malformed, ill-tested, or unforeseen logic caused those issues, not the computers themselves.
  • Wow ! (Score:2, Insightful)

    by Anonymous Coward on Sunday December 12, 2010 @10:49AM (#34528628)

    I can't believe the well-known and documented story of the U.S. blowing up the gas pipeline could be told in such a backward way.

    Next in news: U.S. thoughtful placement of Manhattan skyscrapers dealt a heavy blow to international terrorism, two terrorist planes down.


  • by jc42 (318812) on Sunday December 12, 2010 @01:36PM (#34529414) Homepage Journal

    Another aspect to this is a common property of most "digital" computations. I've seen it expressed as "Digital errors have no order of magnitude". Another phrasing is "Getting one bit wrong is generally indistinguishable from randomizing all of memory". So when a digital calculation goes wrong, a tiny, inconsequential error is just about as likely as a total meltdown of the entire system.

    Programmers tend to get familiar with this phenomenon very early in their careers. They write a small chunk of code that does a simple calculation, and the result is orders of magnitude wrong. When they investigate, they discover it was caused by a one-character typo, perhaps an "off by one" error such as using '<' instead of '<=', or vice versa. This quickly leads to what many "normal" people consider the major character flaw of software geeks: the insistence that everything be exactly right, no matter what, and the willingness to spend long hours discussing insignificant minutiae as if they mattered. In their work, it's usually such insignificant minutiae that bring the whole house of cards tumbling down.

    If you're unwilling to take the difference between a comma and a semicolon seriously, you have no future as a software developer. That kind of carelessness is often why something goes badly wrong and we get events like those described in this story.
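    A minimal sketch of that off-by-one failure mode (the function names are illustrative, not from any real codebase):

    ```python
    def sum_through(n):
        """Sum the integers 1..n inclusive (the intended behaviour)."""
        total = 0
        i = 1
        while i <= n:
            total += i
            i += 1
        return total

    def sum_below(n):
        """The same loop with '<' instead of '<=': a one-character
        typo that silently drops the final term."""
        total = 0
        i = 1
        while i < n:
            total += i
            i += 1
        return total

    print(sum_through(100))  # 5050
    print(sum_below(100))    # 4950
    ```

    One character, and the answer is off by 100, with no warning from anything.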

    OTOH, it is interesting that, despite all the software disasters like the metric/imperial-units story, the software world has never insisted that programming languages include units as part of variables' values. It's not like this is anything difficult, and it has been done in a number of languages. But none of the common languages have such a feature. It is a bit bizarre that we can get into long discussions of complex, obscure concepts such as type checking or class inheritance, when our calculations are all susceptible to unchecked unit mismatches (without even a warning from the compiler or interpreter). There's a lot of poor logic when the topic is the relative importance of various sources of bogus calculations.
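    Units-as-types are easy to sketch in any language with user-defined types. Here is a hypothetical Python version; the class names and the thrust scenario are illustrative, not from any real flight-software library:

    ```python
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Newtons:
        value: float

    @dataclass(frozen=True)
    class PoundsForce:
        value: float

        def to_newtons(self) -> Newtons:
            # 1 lbf is exactly 4.4482216152605 N
            return Newtons(self.value * 4.4482216152605)

    def total_impulse(a: Newtons, b: Newtons) -> Newtons:
        # Refuse to add anything that isn't explicitly in newtons.
        if not (isinstance(a, Newtons) and isinstance(b, Newtons)):
            raise TypeError("impulse must be expressed in newtons")
        return Newtons(a.value + b.value)

    # The Mars orbiter failure mode: one side reports lbf, the other N.
    ground = PoundsForce(100.0)
    flight = Newtons(50.0)
    ok = total_impulse(ground.to_newtons(), flight)  # explicit conversion
    # total_impulse(ground, flight) would raise TypeError instead of
    # silently producing a number that is wrong by a factor of ~4.45.
    ```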

  • by mikael_j (106439) on Sunday December 12, 2010 @01:37PM (#34529418)

    From your post it sounds like you've been living somewhere that used to belong to the British Empire; people there still tend to think of their weight in "stones" and various other oddball measurements, but there are definitely countries where imperial units are barely used.

    Here in Sweden the only people who use imperial units seem to be carpenters, who call a 5x10 cm piece of wood a "tvåtumfyra" ("two-inch-four"), but even they don't assume its actual size is 5.08x10.16 cm; it's just that "tvåtumfyra" is faster to say than "fem gånger tio centimeter" ("five by ten centimetres").

    As for degrees, most people tend to use them in everyday conversation (when it comes up), but degrees are not an "imperial" measurement; they predate most imperial units by centuries. And most people I've met who have taken "advanced" high-school-level math or college-level math tend to use radians when actually doing any kind of math involving angles.

    Also, tell someone here in Scandinavia that you're 5'10" tall and weigh 176 lbs and they're likely to either not understand you or go "So, a foot is like, 30 cm, right? And how many inches are there in a foot? I know it's not ten but like, fifteen or something, right? And a pound's like, 0.5 kg? Or was it less? Maybe more? And aren't there two types of pound? Or was that pints?"

    Basically, if you tell someone around here that something is "n <imperial unit>" they will have no clue no matter how "natural" you think it is because you happened to grow up with it.

    Also, as for easy unit conversions, people do use them, just not in the uncommon ways you described. Most people just aren't familiar with some of the less common prefixes, but milli-, centi-, deci-, hecto-, and kilo- are all commonly used (and most people know that mega- and giga- mean millions and billions; they just don't have much use for them, so rather than saying 1.5 megametres you say 1500 kilometres).

  • by Locutus (9039) on Sunday December 12, 2010 @01:39PM (#34529428)
    Two comments. First, I thought the deal with the big blackout was that the network was flooded by a Windows virus infection, and a flooded network doesn't cope well with lots of traffic. There was so much traffic that the computer (a UNIX box) sending status messages to the control room display system could not get messages out of its buffers. Ethernet does this thing where a frame isn't put on the wire if there's going to be a collision, and the sender backs off before trying again. With the network flooded by Windows machines trying to infect each other, the warning messages were stuck in the UNIX box, and the buffers eventually filled up as more and more warnings queued. They seem to be blaming the UNIX box software, because that software crashed when it didn't handle the case where the buffers overflowed. IMO, the real cause was Windows, its ability to be a great petri dish for viruses, and the idiots who keep putting Windows systems on critical networks.
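    The overflow scenario described above, alarms queuing up until the sender's buffers fill and the process dies, can be sketched with a bounded queue that sheds load instead of crashing. This is an illustrative sketch, not the actual alarm system's design:

    ```python
    from collections import deque

    class AlarmSender:
        """Buffer outgoing alarms; drop the oldest when the network stalls."""

        def __init__(self, max_buffered: int = 1000):
            # deque(maxlen=...) discards the oldest entry when full, so a
            # saturated network degrades the alarm feed instead of killing it.
            self.buffer = deque(maxlen=max_buffered)
            self.dropped = 0

        def enqueue(self, alarm: str) -> None:
            if len(self.buffer) == self.buffer.maxlen:
                self.dropped += 1  # count losses so operators can see them
            self.buffer.append(alarm)

    sender = AlarmSender(max_buffered=3)
    for i in range(5):             # simulate a stalled network: nothing drains
        sender.enqueue(f"alarm {i}")
    # buffer now holds the 3 newest alarms; 2 were dropped, but we survived
    ```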

    The second comment I have is about missing the LAX communications system software crash, which caused multiple near misses on the tarmac and in the air when air traffic controllers could not communicate with pilots. The cause was that a UNIX system had been replaced with a Windows-based system with a known flaw: the OS could not run for more than about 39 days, no matter what was running on it. The system was approved and put in place anyway, with a maintenance instruction to reboot the computer every 30 days. In comes a new employee who sees things working fine and doesn't reboot, and days later the system crashes. The backup does the same, both are unable to recover, and it takes hours to get the system running again. That should have been on the list, IMO.
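    Whatever the exact cause at LAX, the classic source of "crashes after a fixed number of days" bugs is a 32-bit millisecond uptime counter, which wraps after 2^32 ms, about 49.7 days; naive elapsed-time arithmetic then breaks. A sketch:

    ```python
    MS_WRAP = 2**32  # a 32-bit millisecond counter wraps after ~49.7 days

    def elapsed_naive(start: int, now: int) -> int:
        # Breaks at wraparound: 'now' can appear to precede 'start'.
        return now - start

    def elapsed_safe(start: int, now: int) -> int:
        # Modular subtraction stays correct across a single wrap.
        return (now - start) % MS_WRAP

    start = MS_WRAP - 500      # 500 ms before the counter wraps
    now = 1500                 # 2000 ms of real time later, after the wrap
    print(elapsed_naive(start, now))  # huge negative nonsense
    print(elapsed_safe(start, now))   # 2000
    ```

    Code that feeds the naive result into a timeout or watchdog check works perfectly until the day the counter wraps, which is exactly the "runs fine for N days, then dies" signature.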

    There was also the CSX Railway situation, when lots of its signals went offline because they were run by Windows and the Windows computers got a virus.

    It would be nice to see a more complete and more accurate list of these kinds of computer software failures.

  • by TheRaven64 (641858) on Sunday December 12, 2010 @01:55PM (#34529500) Journal
    Unfortunately, they don't just execute your mistakes, they execute the mistakes of everyone involved in the toolchain. If you want to write bug-free software, then you also need a bug-free compiler, bug-free libraries, and a bug-free OS. The most you can say about most software is that it doesn't contain any bugs that are both serious and obvious.
  • by owlstead (636356) on Sunday December 12, 2010 @03:27PM (#34530006)

    Actually, those kinds of conversions should be banned from any managed programming environment. It's fine that you sometimes need to work with bytes, shorts, etc., or heck, maybe even machine words, but let's only do that when absolutely required, shall we?

    It amazes me that many programming languages still don't define acceptable ranges, still accept null pointers, and still use wrap-around two's-complement numbers, etc. That's just asking for errors like these. Sure, they have their uses for lower-level functions, but I would certainly like something better for APIs and general business logic. They are just another pointer arithmetic or GOTO waiting to be erased from mainstream programming (and in many newer languages, they indeed are).
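    The wrap-around behaviour being complained about can be emulated in Python (whose own integers are unbounded) alongside the checked alternative the parent would prefer as the default; a minimal sketch:

    ```python
    INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

    def add_i32_wrapping(a: int, b: int) -> int:
        """Emulate 32-bit two's-complement addition: silently wraps."""
        s = (a + b) & 0xFFFFFFFF
        return s - 2**32 if s >= 2**31 else s

    def add_i32_checked(a: int, b: int) -> int:
        """Fail loudly instead of wrapping."""
        s = a + b
        if not (INT32_MIN <= s <= INT32_MAX):
            raise OverflowError(f"{a} + {b} does not fit in 32 bits")
        return s

    print(add_i32_wrapping(INT32_MAX, 1))  # -2147483648: a silent sign flip
    # add_i32_checked(INT32_MAX, 1) raises OverflowError instead
    ```

    The silent sign flip is the dangerous half: downstream code happily keeps computing with a value that is wrong by roughly four billion.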

  • Re:Not always (Score:1, Insightful)

    by maxwell demon (590494) on Sunday December 12, 2010 @05:42PM (#34530554) Journal

    No, you obviously didn't, as you show again. The point is that it's as much a truism as saying that the failure of anything man-made can ultimately be explained as the failure of a human. That's no reason not to call it a computer error, just as you wouldn't replace the term "human error" with "physics error" merely because human behaviour is ultimately the behaviour of a physical system following the laws of physics.

  • by kennykb (547805) on Sunday December 12, 2010 @06:03PM (#34530606)
    One way to calibrate your respect for the press: listen to them when they're talking about something you know about. Assume that they have the same depth of understanding when they're talking about something you don't know about.

It's a poor workman who blames his tools.