Math Technology

Why Computers Suck At Math 626

antdude writes "This TechRadar article explains why computers suck at math, and how simple calculations can be a matter of life and death, like in the case of a Patriot defense system failing to take down a Scud missile attack: 'The calculation of where to look for confirmation of an incoming missile requires knowledge of the system time, which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register — as used in the Patriot system — it's out by a tiny amount. But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m. The radar looked in the wrong place to receive a confirmation and saw no target. Accordingly no missile was launched to intercept the incoming Scud — and 28 people paid with their lives.'"
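
For readers who want to check the summary's arithmetic, here is a minimal back-of-the-envelope sketch in C. It assumes the commonly cited account from the GAO report (IM-92-26): the 0.1 s constant was effectively truncated to 23 fractional bits, giving a per-tick error of about 9.5e-8 s, and a Scud closes at roughly 1,676 m/s. Neither figure appears in the summary itself.

    /* Back-of-the-envelope check of the drift quoted in the summary.
     * Assumptions: 23 fractional bits kept for the 0.1 s constant and a
     * Scud closing speed of ~1676 m/s, per the usual GAO-based account. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double exact  = 0.1;
        double stored = floor(0.1 * (1 << 23)) / (double)(1 << 23); /* truncated constant */
        double per_tick_error = exact - stored;        /* ~9.5e-8 s per tick */

        long ticks   = 100L * 3600L * 10L;             /* 100 hours of 0.1 s ticks = 3,600,000 */
        double drift = ticks * per_tick_error;         /* ~0.3433 s */

        printf("per-tick error : %.2e s\n", per_tick_error);
        printf("drift at 100 h : %.4f s\n", drift);
        printf("range gate off : %.0f m (assuming ~1676 m/s)\n", drift * 1676.0);
        return 0;
    }

With that assumed closing speed the range-gate error comes out near 575 m rather than the article's 687 m; the exact distance depends on which Scud speed you assume.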
  • Poor QA (Score:5, Insightful)

    by slifox ( 605302 ) * on Saturday October 31, 2009 @08:17AM (#29933547)
    It's pretty pathetic and negligent that software controlling explosive missiles was not tested for over 100 hours of continuous operation. That's a standard Quality Assurance procedure for even the simplest low-budget hardware...

    It's also pretty pathetic that the system designers implemented a broken design and did not foresee this problem. High-resolution timekeeping has been accomplished pretty successfully already...

    I wonder how much time and money was spent on research and development for this thing.
    It doesn't seem like we're getting a quality product for the likely huge sum that was paid for it...
  • by Carewolf ( 581105 ) on Saturday October 31, 2009 @08:19AM (#29933553) Homepage

    Use decimal floating point or simply switch to fixed point. Fixed point isn't used as often as it should be, and many developers don't realize how tricky ordinary floating point really is. (See the sketch below.)
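    A minimal sketch of the fixed-point approach suggested above: keep elapsed time as an integer count of 0.1 s ticks and convert to seconds only at the edges. The names and the 64-bit counter are illustrative, not taken from any real system.

        /* Fixed-point timekeeping sketch: store elapsed time as an integer number
         * of 0.1 s ticks, so 0.1 s is represented exactly and no error accumulates. */
        #include <stdint.h>
        #include <stdio.h>

        typedef uint64_t ticks_t;           /* 1 tick == 0.1 s (deciseconds) */

        static ticks_t uptime_ticks = 0;    /* incremented by the timer interrupt */

        static void timer_tick(void) { uptime_ticks++; }

        /* Convert only at the edges, e.g. for display or interop. */
        static double uptime_seconds(void) { return uptime_ticks / 10.0; }

        int main(void) {
            for (long i = 0; i < 100L * 3600L * 10L; i++)  /* simulate 100 hours */
                timer_tick();
            printf("after 100 h: %llu ticks = %.1f s (exact)\n",
                   (unsigned long long)uptime_ticks, uptime_seconds());
            return 0;
        }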

  • by Big_Mamma ( 663104 ) on Saturday October 31, 2009 @08:30AM (#29933591)
    Use fixed point numbers? You know, in financial apps you never store amounts as floating point; you use cents or 1/1000ths of a dollar instead!

    Computers don't suck at math; those programmers do. You can get arbitrary-precision mathematics on even 8-bit processors, and most of the time the compiler will figure everything out for you just fine. If you really have to use a 24-bit counter with 0.1 s precision, you *know* that your timer will wrap around every 466 hours, so just issue a warning to reboot every 10 days, or auto-reboot when it overflows (see the sketch below).
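    A quick check of the 466-hour figure above, plus the sort of wrap guard the comment describes; the headroom threshold and the warning text are made up for illustration.

        /* How long does a 24-bit counter of 0.1 s ticks last, and a simple wrap guard. */
        #include <stdint.h>
        #include <stdio.h>

        #define TICK_SECONDS 0.1
        #define COUNTER_MAX  ((1u << 24) - 1u)   /* 16,777,215 ticks */

        int main(void) {
            double hours_to_wrap = (COUNTER_MAX + 1.0) * TICK_SECONDS / 3600.0;
            printf("24-bit counter wraps after %.0f hours (~%.1f days)\n",
                   hours_to_wrap, hours_to_wrap / 24.0);      /* ~466 h, ~19.4 days */

            /* Illustrative guard: warn well before the counter wraps. */
            uint32_t ticks = 16000000u;                       /* pretend current count */
            if (ticks > COUNTER_MAX - 24u * 3600u * 10u)      /* < 24 h of headroom left */
                puts("WARNING: tick counter close to wrap; schedule a reboot");
            return 0;
        }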
  • by hellfire ( 86129 ) <deviladv@[ ]il.com ['gma' in gap]> on Saturday October 31, 2009 @08:31AM (#29933599) Homepage

    Translation: computers are only as smart as the people programming them... and there's plenty of stupid people out there.

    We knew this. This is no great revelation. So why is this news?

  • What?! (Score:5, Insightful)

    by jointm1k ( 591234 ) on Saturday October 31, 2009 @08:31AM (#29933601)

    of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register

    All they had to do was use integers, where a value of 1 represents 0.1 s.

  • by RichardJenkins ( 1362463 ) on Saturday October 31, 2009 @08:32AM (#29933607)
    Indeed, this seems more like naive design decisions than computers sucking at math.
  • by DarkOx ( 621550 ) on Saturday October 31, 2009 @08:34AM (#29933621) Journal

    Yeah, because "the missile countermeasures failed to fire because the system was doing its scheduled reboot" is so much better than "the missile countermeasures failed to fire because of timer precision".

  • Re:"User error"? (Score:5, Insightful)

    by betterunixthanunix ( 980855 ) on Saturday October 31, 2009 @08:42AM (#29933673)
    So they designed a system that accumulated rounding errors over time, and their solution was to ask the system's users to reboot the system every so often? Somehow, that does not add to my sympathy for these programmers...
  • by frovingslosh ( 582462 ) on Saturday October 31, 2009 @08:43AM (#29933679)
    It is absurd to blame the computer (or worse, all computers) for what is bad programming. Computers can store 1/10 of a second perfectly accurately, as long as it is stored in a variable that counts tenths of seconds rather than seconds. It can easily be stored as an integer that way, avoiding any floating-point rounding errors.

    There certainly are cases of bad math in computers, particularly Intel computers. But this isn't such an example. This is just a lazy and stupid programmer who didn't understand what he was really doing, and he is the one who should take the blame for the failure that killed people, not the computer.

  • by Herger ( 48454 ) on Saturday October 31, 2009 @08:46AM (#29933703) Homepage

    This is not an example of computers sucking at math.

    This is an example of engineers and developers failing to draw up valid requirements, failing to develop to specification, and failing to test against real-world use cases.

    Management undoubtedly shares an equal if not greater portion of the blame here. This is typical military-industrial complex, lowest-bidder contractor mentality at work, just another form of corporate welfare if the government doesn't turn around and punish shortfalls like this.

  • by david duncan scott ( 206421 ) on Saturday October 31, 2009 @08:53AM (#29933749)

    Regardless, what isn't possible is to design a system that can accurately track and shoot down missiles in flight. As the Patriot defence system so patently demonstrated.

    You're right. Just as the failure of Samuel Langley's aircraft demonstrated that man would never fly, the failure of an anti-aircraft missile to destroy only half of the ballistic missiles (targets moving at what, twice the speed of the targets it was designed to destroy?) demonstrates that ABMs will never work.

  • Re:Poor QA (Score:4, Insightful)

    by Rising Ape ( 1620461 ) on Saturday October 31, 2009 @08:54AM (#29933755)

    Seriously, what programmer has not heard of floating point errors?

    I had a similar issue with some code of mine for physics analysis. While I had heard of floating-point errors, they're a lot more subtle than they first appear, and I ended up falling victim to one. Fortunately I discovered it before it actually led to any serious problems; it just resulted in wasted time.

    Not everyone with a need for programming has a CS background and enough experience to be aware of all the potential problems. You'd hope that someone working on a missile system would have, though.

  • by noidentity ( 188756 ) on Saturday October 31, 2009 @08:57AM (#29933777)

    Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register -- as used in the Patriot system -- it's out by a tiny amount.

    Sorry, 0.1 seconds can be represented EXACTLY in such a system. It doesn't even need floating-point. Here is how such a system could represent the durations of 0.1 seconds, 25.7 seconds, and 123.4 seconds: 1, 257, and 1234. So like you say, fixed-point works here. No need for anything beyond integers in this case.

  • by Interoperable ( 1651953 ) on Saturday October 31, 2009 @08:58AM (#29933783)
    The article contains some interesting examples, but all of them have been in programming texts and courses for years. I'm not really sure why it's on /.
  • by noidentity ( 188756 ) on Saturday October 31, 2009 @09:07AM (#29933837)

    2.0/2.0 != 1 on almost all FPU's today.

    Say what? Citations please. Methinks one of those 2.0 values isn't really 2.0. Hint: printing a value isn't a good way to see its actual value, because the printing function most likely rounds it to fewer digits than are actually stored.
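    A short demonstration of the point: in IEEE 754 arithmetic 2.0/2.0 is exactly 1.0, while ten additions of 0.1 are not, and the default print precision hides the difference.

        /* 2.0/2.0 is exactly 1.0 in IEEE 754; 0.1 summed ten times is not,
         * and printing with few digits hides the difference. */
        #include <stdio.h>

        int main(void) {
            printf("2.0/2.0 == 1.0 ? %s\n", (2.0 / 2.0 == 1.0) ? "yes" : "no");

            double sum = 0.0;
            for (int i = 0; i < 10; i++) sum += 0.1;          /* accumulates rounding error */

            printf("sum printed to 1 digit  : %.1f\n", sum);  /* looks like 1.0 */
            printf("sum printed to 17 digits: %.17f\n", sum); /* not quite 1.0 */
            printf("sum == 1.0 ? %s\n", (sum == 1.0) ? "yes" : "no");
            return 0;
        }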

  • Re:Poor QA (Score:5, Insightful)

    by dbIII ( 701233 ) on Saturday October 31, 2009 @09:09AM (#29933847)
    Oh really? The problem with these systems is that they have never worked in anything other than rigged tests and are just silicon snake oil.
    I remember having this same discussion when there was a story here about some sort of Israeli space lasers that could apparently even shoot down artillery shells. Only a few months after that, a very large number of thirty-year-old rockets, dumped at a discount price by Iran for being obsolete, came flying over the border from Lebanon. Since then a lot of even slower rockets have come out of Gaza. The success rate of this amazing new space toy matches that of the Patriot: zero.
  • Re:Poor QA (Score:5, Insightful)

    by OeLeWaPpErKe ( 412765 ) on Saturday October 31, 2009 @09:13AM (#29933861) Homepage

    Mod parent up! This idiotic article blames computers for programmers using numerical approximation algorithms ill-advisedly.

    which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register — as used in the Patriot system — it's out by a tiny amount. But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m. The radar looked in the wrong place to receive a confirmation and saw no target. Accordingly no missile was launched to intercept the incoming Scud — and 28 people paid with their lives.

    So in a system that should have clocks synchronized to less than a microsecond, nobody bothered to run "ntpdate" even once in a hundred days? And surely the military has better clock sync than a stupid home PC? This is stupidity, also known as "human error", causing those deaths. It's a case of "the correct answer to the wrong question".

    What is always brought up as a "computer problem" is the crash in Paris of a jet due to infighting between the human pilot and the autopilot. Of course, the ultimate mistake there was the pilot's: he had forgotten to turn off the autopilot to land. It was set for cruising altitude (3 km) while the pilot was trying to land. This resulted in ever more desperate attempts by the autopilot to make the plane gain height, which eventually resulted in a total loss of lift, and the plane hit the ground nose-down in a big fireball. The computer did exactly as instructed; it's just that the pilot's (unintentionally given) instructions were stupid, and it took the pilot over 3 minutes to realize just how stupid he had been.

  • Re:"User error"? (Score:5, Insightful)

    by Joce640k ( 829181 ) on Saturday October 31, 2009 @09:25AM (#29933935) Homepage

    I'm calling "Horsepoo" on the whole story.

    a) If they knew enough about it to put "reboot every 36 hours" in the manual, they knew enough to fix it.

    b) According to the summary, 36 hours of uptime would still produce a complete miss (a third of 687 m is still 229 m).

    c) A fixed point integer (32 bits) can mark tenths of seconds with complete accuracy for over 13 years.

    d) Leaving aside a, b and c, the story still doesn't make any sense. The system would start the calculation the moment it saw the missile, not 100 hours before it appeared on the radar.

    Now... at the speed of a Scud missile (Mach 5 if Google serves me), it may be that an accuracy of 1/10th of a second isn't enough to compute the trajectory accurately enough to intercept it. At that speed you might need 1/10,000th-second resolution or whatever. *That* would be believable (but unlikely: the designers would have to be complete idiots).

    The rest of the article? Yawn. It's the same old recycled story we've been seeing since the 1970s (those of us who are old enough).
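    A quick check of points (b) and (c) in the comment above. The 9.5e-8 per-tick error is the figure usually quoted from the GAO report, and 1,676 m/s is an assumed Scud speed; both are used here only to reproduce the comment's rough numbers.

        /* Check of points (b) and (c) above. */
        #include <stdio.h>

        int main(void) {
            /* (c) a 32-bit counter of 0.1 s ticks lasts ~13.6 years with no error */
            double years = 4294967296.0 * 0.1 / (3600.0 * 24.0 * 365.25);
            printf("32-bit tenth-of-a-second counter lasts %.1f years\n", years);

            /* (b) 36 hours of the quoted drift is still a large miss */
            double drift_36h = 36.0 * 3600.0 * 10.0 * 9.5e-8;   /* ticks * per-tick error */
            printf("drift after 36 h: %.3f s (~%.0f m at an assumed 1676 m/s)\n",
                   drift_36h, drift_36h * 1676.0);
            return 0;
        }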

  • by SpinyNorman ( 33776 ) on Saturday October 31, 2009 @09:26AM (#29933949)

    It's the reporting that's garbage. It makes no sense at all. A system tracking missiles travelling at Mach 3 is keeping track of time to 0.1 sec accuracy?! Do you really believe that? Wanna buy a bridge?

    0.1 sec at Mach 3 is about 100 m, so you wouldn't have a hope in hell of ever hitting a 3 m long target.

    The problem isn't the people working for the defence company, who are hard-core PhDs with some very serious domain knowledge. The problem is people like yourself who are so math-illiterate as to be unable to fact-check a piece-of-shit story!

  • Re:Poor QA (Score:5, Insightful)

    by Hal_Porter ( 817932 ) on Saturday October 31, 2009 @09:34AM (#29934005)

    There is a good GAO report on this.

    This one?

    http://www.fas.org/spp/starwars/gao/im92026.htm [fas.org]

    Wow. People complain about the US government. Still, look at the transparency. The GAO wrote a very readable report for the House of Representatives and now we can all read it on the web. It's not unreasonable to think that the US's vast military superiority over everyone else on the planet is at least in part due to this sort of thing. I don't think any other government would do this - mistakes in the military would just get covered up as state secrets and anyone who tried to talk about them would get locked up or worse.

  • Re:Poor QA (Score:5, Insightful)

    by Shinobi ( 19308 ) on Saturday October 31, 2009 @09:36AM (#29934019)

    To be honest, from working in two specialist fields (HPC system-level programming and embedded applications, particularly sensor work), I've found that CompSci grads are more likely than CompEng or EE grads to make errors like this. A large part of it is simply that CompSci nowadays is too high-level and abstract; many graduates don't know very much about how computers ACTUALLY work other than as a theoretical model.

    A common remark is "Why should I need to know that? The compiler will take care of it better than I will anyway", completely forgetting that the compiler is only as smart as the programmer who coded it. So you get situations like the one I ran into with an odd appliance based around the SH-4 processor, which I was hired to fix some performance problems with. It ran fixed-point integer and decimal math, and was ported over from ARM. But it only reached about 25% of maximum theoretical performance, while the ARM version reached around 80%. It turned out GCC was at fault, using a generic method that wasn't suitable for the Super-H architecture. And the CompSci guy had no clue about such things.

  • by pz ( 113803 ) on Saturday October 31, 2009 @09:53AM (#29934109) Journal

    Well, in this specific instance a decimal system would have been ok, but it isn't a general answer. The general answer is "make sure your increments are divisible into your number base" . . .

    Close. Very close. The general answer is: no matter what base you select for time, distance, or any other metric that might accumulate errors, be certain to (a) perform a careful error analysis, and (b) include some additional safeguard to control the error if there are potentially large downstream effects.

    Just because these computers counted in, say INT/10, and therefore could represent 0.1 seconds exactly does not mean, for example, that the timebase used to drive that counting was accurate and stable. Errors could still accumulate, although probably in a different modality.

    Kids, long-term error analysis is HARD. Errors creep in through unlikely paths, even when you think you've been super careful as suggested by the parent post. While selecting a good numeric representation helps in controlling error accumulation, it is not a panacea.

  • Re:Poor QA (Score:3, Insightful)

    by Alef ( 605149 ) on Saturday October 31, 2009 @10:05AM (#29934215)
    Even if a flawed design would have worked in the intended usage scenarios, as you speculate: given the choice between writing a correct program and an incorrect one with no significant difference in effort, why would you ever consciously choose the broken solution from the start? This sounds more like plain and simple incompetence to me.
  • Re:Poor QA (Score:5, Insightful)

    by Alef ( 605149 ) on Saturday October 31, 2009 @10:23AM (#29934367)

    I don't think any other government would do this - mistakes in the military would just get covered up as state secrets and anyone who tried to talk about them would get locked up or worse.

    Eh. Forgive me, but do you have any basis whatsoever for this claim, or are you just being arrogant?

  • by mybecq ( 131456 ) on Saturday October 31, 2009 @10:42AM (#29934519)

    A Patriot missile travels at about Mach 3 (~1000 m/sec) so a rounding error of 0.05, even without any error accumulation, means you'd be off by 50m in position.

    Perhaps the tracking radar has a 500 m field of view at a range of X km (enough distance to launch a Patriot missile). It doesn't look at the target through a keyhole; it just has to be pointed in the general vicinity to detect/confirm the incoming Scud.

    How about if you realized that there are two systems in this story?
    1) Radar (0.1 s accuracy)
    2) Patriot missile (launched after target confirmation by Radar)

  • Re:Poor QA (Score:3, Insightful)

    by Jeppe Salvesen ( 101622 ) on Saturday October 31, 2009 @10:52AM (#29934589)

    I think the guy has a point (although he's being a bit nationalistic about it): transparency is key in order to learn from mistakes. You can say many different things about the US of A, but the US of A is good at open hearings.

  • by publiclurker ( 952615 ) on Saturday October 31, 2009 @11:20AM (#29934761)
    Actually, the main purpose is a cost-plus-fixed-profit contract for the weapons manufacturer. Even if no one ever dies on either side of the gun, it's still a success to them.
  • Re:Poor QA (Score:5, Insightful)

    by Jeremi ( 14640 ) on Saturday October 31, 2009 @11:34AM (#29934859) Homepage

    The computer did exactly as instructed; it's just that the pilot's (unintentionally given) instructions were stupid, and it took the pilot over 3 minutes to realize just how stupid he had been.

    Sounds like a user interface problem to me. Given the potential consequences of that particular user error, the fact that the autopilot was still engaged should have been made more obvious to the pilot (e.g. when the plane's computer sees that a struggle is going on between the autopilot and the manual controls, it should trigger a loud, unmaskable synthesized voice shouting "THE AUTOPILOT IS ENGAGED, YOU IDIOT!").

  • Re:Poor QA (Score:4, Insightful)

    by Anonymous Coward on Saturday October 31, 2009 @11:44AM (#29934927)

    So in a system that should have clocks synchronized to less than a microsecond, nobody bothered to run "ntpdate" even once in a hundred days?

    Do you want to be the one to explain to the generals why their stand-alone, truck-based mobile air protection system needs a hard-line network connection to work?

    The real idiocy is here:

    Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register

    Taken charitably, the article writer has oversimplified to the point of obscuring the point. It's perfectly possible to represent a 0.1-second tick in a 24-bit register. There's an overflow about once every 19 days. The problem is doing calculations *with* that number, and that takes knowing what the hell you're doing. Given the problem the system designers were trying to solve with Patriot, this should not have been a problem.

    And surely the military has better clock sync than a stupid home PC?

    You'd be surprised how hard clock accuracy is to get right, *especially* under military conditions. A drift of 0.3433 seconds over 100 hours works out to an accuracy of about 1 part in a million, give or take. Besides, the problem here wasn't clock drift, so it's irrelevant.

  • Re:Poor QA (Score:1, Insightful)

    by Anonymous Coward on Saturday October 31, 2009 @11:55AM (#29934987)

    nobody bothered to run "ntpdate" even once in a hundred days?

    If I understand this correctly, running ntpdate or something similar would not have helped: the data type used to store the time since system power-up is floating point, and the smallest representable increment just gets bigger the longer the system is up. After 100 hours, things apparently get so bad that the best you can get is 0.34 s, and I suspect that the error would approach 1 s if you let the system just sit there for a year. I suspect that at some point, the system clock would just (appear to) stop, because the increments are below the representable precision.
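    For what it's worth, the effect speculated about above is real if absolute time is kept in a single-precision float and advanced by adding 0.1: past a certain magnitude the increment is smaller than the float's precision and the clock stops moving. A minimal demonstration (not the Patriot's actual representation):

        /* If absolute time is kept in a 32-bit float and advanced by adding 0.1,
         * the clock eventually stops: the increment falls below the float's
         * precision at that magnitude. */
        #include <stdio.h>

        int main(void) {
            float t = 2097152.0f;     /* 2^21 seconds, roughly 24 days of uptime */
            float before = t;
            t += 0.1f;                /* 0.1 is smaller than the spacing (0.25) at this magnitude */
            printf("before: %.2f  after += 0.1: %.2f  changed: %s\n",
                   before, t, (t != before) ? "yes" : "no");
            return 0;
        }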

  • by ToasterMonkey ( 467067 ) on Saturday October 31, 2009 @12:02PM (#29935033) Homepage

    FTFA:
    "So computers might suck at maths, but there's always a solution available to circumvent their inherent weaknesses. And in that case, it's probably more accurate to say that computer programmers suck at maths - or at least some of them do."

    Thank you, come again.

    So in a system that should have clocks synchronized to less than a microsecond, nobody bothered to run "ntpdate" even once in a hundred days?

    Yes, obviously they just needed to ssh into their Patriot missile air defense system, edit a few lines in /etc/inet/ntp.conf and svcadm restart ntp.

    The obvious problem in the article, if you read it, is computers' finite precision, and how it is dealt with. By 'computer', the author could have easily included the system libraries that are actually doing all the rounding and overflows instead of implementing arbitrary precision in software.

    Everyone defending the way 'computers' is used in this article, and conflating it with 'processor' is a complete idiot.

  • Re:Poor QA (Score:2, Insightful)

    by quickOnTheUptake ( 1450889 ) on Saturday October 31, 2009 @12:03PM (#29935045)
    No, he didn't. Had you finished the article, you might have seen these lines:

    But all of today's computers are universal computing machines, which means that they can solve any problem involving logic and maths.
    So if a processor's internal instructions can't operate on large enough integers or on floating point numbers with sufficient precision, it's always possible for the programmer to implement arithmetic routines that will.

    So computers might suck at maths, but there's always a solution available to circumvent their inherent weaknesses. And in that case, it's probably more accurate to say that computer programmers suck at maths – or at least some of them do.

  • Re:Poor QA (Score:1, Insightful)

    by russotto ( 537200 ) on Saturday October 31, 2009 @12:16PM (#29935111) Journal

    In any case, the only way that can be done in any permanent manner is to not give Hezbollah any reason to fire rockets in the first place. Not "occupation".

    Yeah, like Hezbollah needs a reason to fire rockets. Not much the Israelis can do -- besides cease to exist -- to eliminate reasons for Hezbollah to fire rockets at them.

  • by david duncan scott ( 206421 ) on Saturday October 31, 2009 @01:09PM (#29935441)
    OK, we'll go with 0% success. My point is that the failure of any one implementation does not invalidate the concept. Edison tried hundreds of wrong ways to make a light bulb, none of which demonstrated that the light bulb was unworkable.

    Oh, and the Scud hunting in Gulf One was largely an air exercise, as I recall, and of course they went after the launchers. It's always preferable to destroy the enemy on the ground (or in harbor, or asleep in barracks) than when they're incoming. The Japanese didn't bomb Pearl Harbor because it's impractical to sink ships at sea; it's just easier to hit slow- or non-moving targets.

  • by sochdot ( 864131 ) on Saturday October 31, 2009 @01:20PM (#29935531) Journal
    I'd just like to point out here that the 28 people were not killed by the failure of the intercept system. They were killed by the nice folks who launched the missile in the first place.
  • Re:Poor QA (Score:3, Insightful)

    by Entropius ( 188861 ) on Saturday October 31, 2009 @01:21PM (#29935543)

    The US's "vast military superiority over everyone else on the planet" is due to us spending an equally vast amount of cash on our military.

  • Re:Poor QA (Score:4, Insightful)

    by OeLeWaPpErKe ( 412765 ) on Saturday October 31, 2009 @01:34PM (#29935601) Homepage

    You missed the third option, which is for the motivation behind the firing of rockets to be removed.

    http://www.youtube.com/watch?v=iNrCMdFoZqQ [youtube.com]

    So who do we allow to settle there ?

    The "kingdom of Egypt" (the state of the Farao's) ? (exterminated to the last man by muslims)
    The Hittite Emptre ? (exterminated by the Greeks, Romans, Persians)
    The kingdom of Israel ?
    The Assyrian Empire ?

    Which of these do we restore ? (note that the palestinians, or to be more exact, the arabs only come into play about 4500 years after the Assyrian Empire)

    Which do we restore ? And why do they have more rights than all the others who conquered that piece of land ?

    Note the obvious truth : the Jews controlled Israel about 4300 years before the arabs even left their tiny province ...

    What if some Greek starts firing rockets at the Arabs ? Will you tell them to leave ? He has at least as much right to Israel as they do ? What if the Jews start firing rockets into Jordan (territory that was part of the kingdom of Israel) ?

    And of course, you shouldn't count out yourself. You're an Indo-European living in America. It seems hypocritical in the extreme to tell others to leave conquered lands. Your province of origin is northwestern Iran, every other place on this earth indoeuropeans live (including Europe), is obviously conquered from someone else.

    So when will you give the good example ?

  • Re:"User error"? (Score:4, Insightful)

    by Sir_Lewk ( 967686 ) <sirlewk@gmail. c o m> on Saturday October 31, 2009 @02:02PM (#29935785)

    Integer arithmetic does not accumulate error; only floating point does that. Now, they may have been using floating point, but his point is that they should have been using integer arithmetic.

    Had they been doing so, it could have run for 13 years with absolutely no accumulated error.

  • Re:Poor QA (Score:3, Insightful)

    by Jeremy Erwin ( 2054 ) on Saturday October 31, 2009 @02:17PM (#29935905) Journal

    Are you sure that the computer was even capable of IEEE floating point? Wikipedia [wikipedia.org] suggests that the computer used a 24-bit word.

    Although an IEEE 754 single-precision float uses a 23-bit mantissa, an 8-bit exponent and a sign bit, the WCC might well have used a proprietary scheme.

  • Re:Poor QA (Score:4, Insightful)

    by danlip ( 737336 ) on Saturday October 31, 2009 @02:44PM (#29936081)

    The computer did exactly as instructed; it's just that the pilot's (unintentionally given) instructions were stupid, and it took the pilot over 3 minutes to realize just how stupid he had been.

    Sounds like a user interface problem to me. Given the potential consequences of that particular user error, the fact that the autopilot was still engaged should have been made more obvious to the pilot (e.g. when the plane's computer sees that a struggle is going on between the autopilot and the manual controls, it should trigger a loud, unmaskable synthesized voice shouting "THE AUTOPILOT IS ENGAGED, YOU IDIOT!").

    Or, if the pilot is pushing hard on the stick, the autopilot should disengage (with loud alarms; see the sketch below).
    If I tap the brakes in my car, the cruise control disengages; it does not fight me.
    - Dan
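    A sketch of the behaviour the comment asks for, with made-up names and thresholds: disengage the autopilot on sustained pilot override and raise an alarm, rather than fighting the pilot.

        /* Hypothetical sketch: disengage the autopilot when the pilot applies
         * sustained force against it, and raise a loud alarm, instead of fighting. */
        #include <stdbool.h>
        #include <stdio.h>

        #define OVERRIDE_FORCE_N 50.0   /* illustrative threshold, newtons */
        #define OVERRIDE_TICKS   5      /* must persist this many control cycles */

        static bool autopilot_engaged = true;
        static int  override_count = 0;

        void control_cycle(double pilot_stick_force_n) {
            if (!autopilot_engaged) return;

            if (pilot_stick_force_n > OVERRIDE_FORCE_N)
                override_count++;
            else
                override_count = 0;

            if (override_count >= OVERRIDE_TICKS) {
                autopilot_engaged = false;
                puts("ALARM: AUTOPILOT DISENGAGED (pilot override detected)");
            }
        }

        int main(void) {
            for (int i = 0; i < 10; i++)
                control_cycle(80.0);   /* pilot pushing hard against the autopilot */
            return 0;
        }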

  • by Jane Q. Public ( 1010737 ) on Saturday October 31, 2009 @03:27PM (#29936343)

    The obvious problem in the article, if you read it, is computers' finite precision, and how it is dealt with. By 'computer', the author could have easily included the system libraries that are actually doing all the rounding and overflows instead of implementing arbitrary precision in software.

    Not at all, since correcting an inappropriate hardware design with software is like fixing an automobile that was designed with square wheels by manually sawing off the corners to make them octagonal instead. You could create a recursive software routine to continue sawing until the wheels were a good approximation of round, but that's an awful lot of sawing to fix something that should have been right in the first place.

    The clock in modern systems is nothing but a hardware register that gets incremented periodically (as correctly described in the article). The ONLY rounding error introduced by software is in converting that number to decimal. But rounding had nothing to do with the problem described. The appropriate solution is a better hardware design, not attempting to patch or correct it in software.

    The problem was error accumulated in the clock register itself due to the imprecision of the clock, and overflows due to the inappropriately small size of the register. Both are hardware issues and represent bad design decisions. The way to fix them is to design the hardware properly in the first place so that it is appropriate for the job at hand.

  • by jbolden ( 176878 ) on Saturday October 31, 2009 @03:43PM (#29936479) Homepage

    By 'computer', the author could have easily included the system libraries that are actually doing all the rounding and overflows instead of implementing arbitrary precision in software.

    Everyone defending the way 'computers' is used in this article, and conflating it with 'processor' is a complete idiot.

    This is a programmers' blog. We don't conflate those sorts of things. If a program is using the wrong library, that's not a problem of "computers sucking at math" but a problem of "programmers not understanding arithmetic libraries very well". The topic of computer arithmetic and the issues with the various representations is covered as standard in undergraduate programming classes. In other words, these problems happened because:

    1) They picked the wrong programmers
    2) They didn't do QC
    3) They didn't have test libraries that tested their systems correctly.

    etc...

    Computers don't suck at math. Nurses can do 98% of what a doctor does, and most of it more quickly than the doctor. It is the other 2% that is the difference between the doctor's education and the nurse's.

  • Re:Poor QA (Score:2, Insightful)

    by OeLeWaPpErKe ( 412765 ) on Saturday October 31, 2009 @03:57PM (#29936581) Homepage

    If you knew more about the Middle Ages, maybe you'd understand that the population did not move so much as changed religion over the centuries.

    No offence, but you really should read a bit about Arab history, and pay attention to just how many ethnic cleansings these people committed. The population of Europe, you are correct, did indeed merely change religion ("mostly", as there was certainly no shortage of armed conflicts, though they declined over time. Slowly). The population of the Middle East was eradicated, several times in fact. Everywhere, Muslims have always created conflicts along ethnic lines, even with "fellow Muslims" (google "Sudan" or "Darfur", and note just how racist any brotherhood Islam supposedly provides really is. And to tell the truth, just walk into a European city, look for a few Turks and a few Moroccans, and note how much they like each other. See for yourself).

    After researching arab/islamic history, any reasonable person would seriously ask himself what exactly is so terribly remarkable about this German guy from WWII (and don't google "aymin al-husseini", it will not improve your view of these people).

  • by Jane Q. Public ( 1010737 ) on Saturday October 31, 2009 @10:30PM (#29938819)
    I'm obviously not a hardware designer? That's funny. I am not the clueless one here. How about some simple math? Maybe you would learn something.

    A 24-bit register, with clock ticks every 0.1 second, would overflow in less than 20 days. And if the clock ticks were faster, then it would overflow even sooner. No wonder they recommended rebooting the system every few days.

    Of course I do not recommend an infinitely large register. Simply one that is large enough for the job at hand. This one obviously isn't. Further, a 0.1-second resolution clock is obviously not adequate to a job requiring this kind of precision.

    If the hardware clock is off (not overflowed but INACCURATE, which was the real situation here), no amount of software tweaking will properly fix the problem. The article did not state but implied -- incorrectly -- that the clock register was accumulating rounding errors; that is not the case. Nobody makes system clocks that way, nor did they in the 90s or even the 80s. The system clock is nothing but a counter that is incremented every clock tick. The actual problem was that the clock ticks were not sufficiently precise, so over time the count was off. Math libraries and rounding errors played no part whatsoever in that error.

    Finally, I would like to point out that today's standard PC-type system clocks are large enough that they won't overflow for 100 years or so; that is the obvious and proper solution to the overflow problem. The problem of clock ticks that are sufficiently precise for timing of missile navigation, as far as I know, has not been addressed on standard PCs, however, and they do not try to correct for that in software because the adequate precision in the clock simply does not exist. It would amount to tilting at windmills. Keeping a count in software of the number of times the register overflows is also NOT an appropriate solution for a system clock, nor is any software tweak, because software by definition is volatile while the hardware clock is not. In other words, nobody does it that way, dude, because it's just plain the wrong answer.

    As for your final comment, most Unix programmers know what epoch time is, when it started (00:00:00 UTC on 1 January 1970), and when that date will roll over in a signed 32-bit counter (approximately 68 years later, in 2038, so it isn't much of an issue here). Nobody is arguing that we should make a missile system that needs to last, unmodified, for over 68 years. But proper hardware design in the first place, which was certainly possible at that time using ASICs if not straight-up custom chips, would have eliminated the problem.
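    A quick check of the rollover arithmetic mentioned above, assuming the traditional signed 32-bit Unix time counter:

        /* When does a signed 32-bit Unix time counter roll over?
         * 2^31 - 1 seconds after 1970-01-01 00:00:00 UTC lands in January 2038. */
        #include <stdio.h>
        #include <time.h>

        int main(void) {
            long long max_seconds = 2147483647LL;               /* 2^31 - 1 */
            double years = max_seconds / (365.25 * 24 * 3600);  /* ~68 years */
            printf("signed 32-bit seconds counter lasts ~%.1f years\n", years);

            time_t t = (time_t)max_seconds;
            struct tm *utc = gmtime(&t);
            if (utc)
                printf("rollover moment (UTC): %04d-%02d-%02d\n",
                       utc->tm_year + 1900, utc->tm_mon + 1, utc->tm_mday);
            return 0;
        }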
