Why Computers Suck At Math
antdude writes "This TechRadar article explains why computers suck at math, and how simple calculations can be a matter of life and death, like in the case of a Patriot defense system failing to take down a Scud missile attack: 'The calculation of where to look for confirmation of an incoming missile requires knowledge of the system time, which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register — as used in the Patriot system — it's out by a tiny amount. But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m. The radar looked in the wrong place to receive a confirmation and saw no target. Accordingly no missile was launched to intercept the incoming Scud — and 28 people paid with their lives.'"
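The 0.3433-second figure quoted in the summary can be reproduced with a few lines of exact arithmetic. This is only a sketch: it assumes that 0.1 was truncated (not rounded) to 23 bits after the binary point inside the 24-bit register, which is a common reading of the incident but is not stated in the summary itself. All names below are illustrative.

```python
from fractions import Fraction

# Assumption: 23 fractional bits, value chopped rather than rounded.
FRACTION_BITS = 23

tenth = Fraction(1, 10)
scale = 2 ** FRACTION_BITS

# Fixed-point value actually stored for 0.1 seconds (int() truncates).
stored_tenth = Fraction(int(tenth * scale), scale)

# Error introduced on every 0.1-second tick.
error_per_tick = tenth - stored_tenth

# About 100 hours of uptime, counted in 0.1-second ticks.
ticks = 100 * 60 * 60 * 10

total_error = error_per_tick * ticks
print(f"error per tick: {float(error_per_tick):.3e} s")
print(f"total error after {ticks} ticks: {float(total_error):.4f} s")
```

Under that assumption the per-tick error is roughly 9.5e-8 seconds, and 3,600,000 ticks of it add up to about 0.3433 seconds, matching the number in the summary.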
Re:Fixed point numbers? (Score:3, Interesting)
Yeah, because "the missile countermeasures failed to fire because the system was doing its scheduled reboot" is so much better than "the missile countermeasures failed to fire because of timer precision".
The OP's suggestion for scheduled reboots could be solved by having redundant systems, no? System X comes up at 0 hour mark, System Y comes up at 233 hour mark. System X switches to System Y and reboots at 466 hour mark; System Y only has 233 hours uptime.
Re:Didn't read TFA but... (Score:4, Interesting)
Because military computers are 20 years out of date to start with. Heck, even the awesome modern Land Warrior hardware is 10 years out of date. They could probably shave 5 pounds off the hardware by using modern chips and displays.
Military spec is only good at rugged. When it comes to being up to date with the best, it is far behind.
Re:Poor QA (Score:5, Interesting)
Hindsight is almost 20/20. Except that the original purpose of the Patriot was to shoot down much slower aircraft, flying parallel to the earth, not ballistic missiles. This new use for Patriot was essentially experimental and had been rushed to war, and in war you run into a lot of unexpected circumstances. For example, conventional doctrine in the 1980s required Patriots to move constantly on the battlefield to avoid air attack. The clock would then reset when repositioned. No one expected a Patriot in air defense mode to stay stationary for 10 hours, let alone 100. But in a missile defense role they did. There is a good GAO report on this.
They do exactly that! (Score:2, Interesting)
Seriously flawed reporting (Score:4, Interesting)
There's no way a real-time missile tracking system is going to be dealing with time at an accuracy of 0.1 sec.
A Patriot missile travels at about Mach 3 (~1000 m/sec), so a rounding error of 0.05 sec, even without any error accumulation, means you'd be off by 50m in position.
Who knows what the real story is vs the garbage that was reported, but even if there was a cumulative error, that's the fault of the programmer rather than a lack of a computer's ability to do math. You do your error analysis and use whatever accuracy is needed to keep the errors in a tolerable range.
The part about the system running for 100 hours was pure gibberish. Yes, we can all divide that by 0.1 sec, but what on earth does that have to do with a real-time tracking system tracking a target it acquired a few minutes ago?!
A better title for the story rather than "computers can't do math" would be "we can't do tech reporting".
Re:Poor QA (Score:5, Interesting)
The Iron Dome [wikipedia.org] system works perfectly. It's just not capable of protecting any kind of large area. It can, however, make a military base invulnerable to rocket fire, and they're working on making the system mobile, to protect tanks. The only real problem left for doing this is the power requirements.
For ships, another such system exists, and it protected the ships perfectly well from those same rockets fired by Hizbullah. Its "protection range"? In the largest deployment, about 200 square meters.
There is also the problem that a downed missile presents. What is a "downed missile"? Well, it's a large collection of very high-speed pieces of metal, heated up by a large explosion, that is about to crash into the ground. So far so good.
So what is "the ground" in the case of a Hizbullah or Hamas missile launch? Well, it's the center of the city that's controlled by the terrorists. It's their human shields. Markets, schools, you name it. So a successful missile intercept is reported in the press as "Israel fires a rocket into a Palestinian kindergarten". That is, by the way, the literal truth, even if the rather important detail of a rocket's presence above said kindergarten is left out. In the deployed missile intercept installations "the ground" is chosen to be something else, like the ocean surface.
Missile intercept systems are no solution for terrorism. Most unfortunately, the only solution for those rocket attacks is preventing them from being fired in the first place. Which obviously requires that either Palestinians police their own terrorists, or someone does it for them (that's called "occupation").
These systems work, they are deployed successfully in the field. They're no silver bullets, and any bullet that's fired, whether a missile or a missile-intercept-missile, will eventually hit the ground at rather high speeds. Which makes their use above urban environments result in civilian casualties.
Re:Curse of binary floating point (Score:3, Interesting)
Or just keep track of things in increments that make sense in binary. 0.1 seconds is arbitrarily chosen to be a nice number in decimal. They should have chosen an arbitrary time interval that is a nice interval in binary, the base they were actually using.
This article isn't about how computers suck at math, it's about how people suck at math.
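To make that point concrete, here is a small sketch comparing a decimal-friendly tick of 1/10 second with a binary-friendly tick of 1/16 second. The 23 fractional bits and the 1/16-second interval are assumptions picked for illustration; nothing here reflects the actual Patriot code.

```python
from fractions import Fraction

def truncation_error(interval, fraction_bits=23):
    """Error left after chopping `interval` to a binary fixed-point value."""
    scale = 2 ** fraction_bits
    stored = Fraction(int(interval * scale), scale)
    return interval - stored

decimal_tick = Fraction(1, 10)  # nice in base 10, repeating in base 2
binary_tick = Fraction(1, 16)   # hypothetical alternative, exact in base 2

print("1/10 s tick error:", float(truncation_error(decimal_tick)))
print("1/16 s tick error:", float(truncation_error(binary_tick)))
```

The 1/16-second tick stores with no error at all, so there is nothing to accumulate no matter how long the system runs.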
Re:Poor QA (Score:3, Interesting)
Agreed completely!
Why did this thing not get designed with continuous feedback on position instead of an open loop with cumulative errors?
Also, it's not the computer that sucks at math. It's the guy who decided a cheaper programmer was more cost effective than a good one. Turned out not to be a very wise decision.
Re:Poor QA (Score:4, Interesting)
1. The Patriot version used in the Gulf War (round 1) was not designed to be used against Tactical Ballistic Missiles (like SCUDs), but against opposition aircraft. A fighter isn't going to be flying as fast, and thus the error is going to be much smaller, which means the missile would probably still find the plane.
2. The Patriot has a quite good record against SCUDs (after the software upgrades). Much better than the Soviet SA-2s did against B-52 raids in Vietnam.
3. Systems don't always work right the first time, and if you do a full on test to start with, and something goes wrong, it's a lot harder to find where the error is than if you test one part at a time.
Re:Curse of binary floating point (Score:3, Interesting)
I believe that the problem was not that 0.1s could not be represented. After all, the article states that there were 0.1s ticks and they likely counted ticks as integers. No problem there.
However, I guess that 0.1s was not an integer multiple of the system clock period. If for example the tick should occur after 6,666,666.67 clock cycles, the system likely emitted a tick after 6,666,667 clock cycles. Such a system would accumulate 3.3 clock cycles of error each second.
The solution is to keep an explicit error term: Use Bresenham's line drawing algorithm. Imagine drawing a line where X are the clock cycles and Y are the ticks. Minimum error integer algorithms have been known for decades for this problem and Bresenham is a very elegant one.
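A rough sketch of that error-term idea, in the spirit of Bresenham rather than a literal copy of the line-drawing algorithm. It assumes a hypothetical clock where one tick is exactly 20,000,000/3 cycles (the 6,666,666.67 from the example above); the function and constant names are made up for illustration.

```python
def tick_periods(cycles_num, cycles_den, n_ticks):
    """Yield an integer cycle count for each tick.

    The ideal period is cycles_num / cycles_den cycles. An integer error
    term carries the leftover fraction forward, so the running total never
    drifts more than one cycle from the ideal.
    """
    base, remainder = divmod(cycles_num, cycles_den)
    error = 0
    for _ in range(n_ticks):
        error += remainder
        if error >= cycles_den:
            error -= cycles_den
            yield base + 1  # stretch this tick by one cycle
        else:
            yield base

CYCLES_NUM, CYCLES_DEN = 20_000_000, 3  # assumed: 6,666,666.67 cycles per tick
N_TICKS = 30                            # three seconds of 0.1 s ticks

naive_total = N_TICKS * 6_666_667       # always round the period up
corrected_total = sum(tick_periods(CYCLES_NUM, CYCLES_DEN, N_TICKS))
ideal_total = N_TICKS * CYCLES_NUM // CYCLES_DEN

print("naive drift in cycles:", naive_total - ideal_total)
print("corrected drift in cycles:", corrected_total - ideal_total)
```

Over three seconds the naive version is 10 cycles off (the 3.3 cycles per second mentioned above), while the version with the error term lands exactly on the ideal count.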
Re:Poor QA (Score:4, Interesting)
I ran into this when someone was using my library with DirectX. I was initializing a filter kernel and using double-precision calculations, but apparently DirectX put the processor in single-precision mode, so all my double-precision calculations weren't done as such. Same compiled code, just a run-time difference. I took the opportunity to improve the algorithm to work even with single-precision floats, which was probably good to do anyway.
Re:And this is why... (Score:4, Interesting)
I could see designing the system to synchronize both launch times and observations with a timer tick (it wouldn't be surprising if the whole system was driven by the timer interrupt), and then you're not going to have an error due to the spacing between ticks.
I am a bit more dubious about the 24 bit thing, though. Was it fixed-point or floating-point?
I don't think it was a float. What would that be? Maybe 16 bit mantissa, 1 bit sign and 7 bit exponent would seem to be the likeliest bet for a 24 bit float. If so, then after about two hours doing t += 0.1 would stop changing t, and the error would be much bigger.
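The "about two hours" estimate can be checked with a small simulation. This is a sketch under loudly stated assumptions: a 16-bit significand with no hidden bit, round-to-nearest, and exponent range ignored. The helper names are invented here, and a real 24-bit format could differ in all three respects.

```python
import math

def round_to_bits(x, bits):
    """Round x to `bits` significant binary digits (round to nearest)."""
    if x == 0:
        return 0.0
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent
    scale = 2 ** bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)

def stall_time(bits=16, step=0.1):
    """Return the value of t at which t += step stops changing t."""
    t = 0.0
    while True:
        nxt = round_to_bits(t + step, bits)
        if nxt == t:
            return t
        t = nxt

stalled_at = stall_time()
print(f"t stops advancing at {stalled_at} s ({stalled_at / 3600:.2f} hours)")
```

With these assumptions the counter freezes at 8192 seconds, a little over two hours. A format that truncates instead of rounding would stall even earlier.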
So presumably it was fixed point. But if you're doing it fixed point, instead of storing x, you store nx in an int, for some appropriate scaling factor n. But if you're going to do that, surely you'll choose n in a smart way, and in this case the obvious choice, as pointed out by many posters, is n=10. This is not only the obvious choice because it gets you more precision, but it's the obvious choice because the easiest, most obvious and most standard way of coding timers is to just increment a register with each tick. It would be silly, for instance, to let n=2^8, and then increment a register with 0.1*2^8 = 25.6, rounded to 0x1A. It would be a very unlikely assembly language programmer who would have put an add reg,1Ah opcode in interrupt handler code when inc reg would have worked.
Now maybe at some point the timer value would get converted to a float for computations. But that surely wouldn't be a 24-bit float.
So maybe the article has mangled things and it was not a 24-bit register, but a 32-bit float, with 24-bit mantissa, 7 bit exponent and 1 bit sign, and the "24" in the article came from the mantissa. That's a much more realistic choice. Still, the standard way to handle timers is to just increment a timer variable. So what I could see happening is this. There is a timer system variable t at full 0.1 second precision incremented on interrupt. (That's how PCs used to work--maybe still do--except the timer resolution was about 1/18 sec.) Then for their launch calculations, they do: (float32)t / 10. And now they're going to get nasty roundoff errors as the mantissa gets filled up. At the 100 hour point, t is already about 22 bits long. So when you do a float divide by 10, you'll certainly have roundoff problems. But you're still not going to be more than one tick (0.1 sec) off, because each tick still adjusts the mantissa, while the article says they were 0.34 seconds off.
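The claim that an integer tick counter converted to a 32-bit float stays within one tick can be tested directly. This sketch emulates IEEE 754 single precision through Python's struct module, which is an assumption: the format described above (24-bit mantissa, 7-bit exponent) is not IEEE single precision, so treat the numbers as indicative only. Function names are illustrative.

```python
import struct

def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

def seconds_from_ticks(ticks):
    """Compute (float32)ticks / 10 with the result rounded to float32."""
    return to_float32(to_float32(ticks) / to_float32(10.0))

# Tick counts covering the last 100 seconds before the 100-hour mark.
start, stop = 3_599_000, 3_600_000
worst = max(abs(seconds_from_ticks(t) - t / 10) for t in range(start, stop + 1))
print(f"worst conversion error near 100 hours: {worst:.6f} s")
```

Near the 100-hour mark the conversion error stays well below 0.1 seconds, which supports the point that this scheme alone would not produce an error of a third of a second.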
So I think something got mangled in the article. Or we had a really unlikely assembly language programmer who had floating point code executed with every tick of a timer interrupt. But even if the interrupt is only at 10 Hz, that's just completely contrary to the instincts of an assembly language programmer. And this would have been done back in the heyday of assembly language programming, when one would try to optimize every clock cycle one could. (And, yes, I've worked with timer interrupt handlers, both on the Z80 and the 8086.)
Re:"User error"? (Score:3, Interesting)
When you write programs which deal with time like this, you never use floating point math. If your required precision is 1/10 of a second, your units are in 1/10 of a second. You do not resort to floating point. I'd probably use 1/100 or go to 64 bit and use 1/10000 of a second. With a high level language, there are better ways to do it of course.
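A minimal sketch of the difference between counting in integer ticks and accumulating a floating-point 0.1. It uses Python's double-precision floats, so the drift is far smaller than it would be in a short register, but the effect is the same in kind. The names are illustrative.

```python
TICKS_PER_SECOND = 10
n_ticks = 1_000_000

float_seconds = 0.0  # accumulate 0.1 in floating point on every tick
tick_count = 0       # count whole ticks as an integer

for _ in range(n_ticks):
    float_seconds += 0.1
    tick_count += 1

# Convert once, at the point of use, instead of on every tick.
exact_seconds = tick_count / TICKS_PER_SECOND
drift = float_seconds - exact_seconds

print("integer ticks:", exact_seconds)
print("float accumulation:", float_seconds)
print("drift:", drift)
```

The integer counter gives exactly 100000.0 seconds, while the accumulated float has picked up a small nonzero drift that keeps growing with uptime.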
The reboot hack is a reasonable workaround in the field, as long as the downtime is documented and understood by leadership, and as somebody mentioned, the severity of the problem needs to be communicated to the field. Ship an alarm clock with the launcher, with clear instructions to reboot it and reset the unit when the alarm clock says so.
The *requirement* of this kind of field maintenance from overstressed people in the field is a bad idea. When writing disaster recovery instructions for fieldwork in normal systems, I like to remind my coworkers...
"...these instructions are for the *least* qualified admin, three years from now, at 2:00 in the morning, on Christmas, to be able to do this without assistance, with second line management yelling at them, while everyone else is on vacation, partying, or utterly unreachable. They need to be able to find the instructions, and execute them, with a minimum of stress or doubt as to the accuracy of the documentation."
I've never done military work, but I can just imagine...
...the new guy doing shift rotation on the Patriot system at 2am on Christmas, he never got proper training, isn't sure if the last guy rebooted the system, realizes his cell is dead... now there's been talk of heightened awareness. An alarm goes off. There's a sticker next to it. "For the love of all that is holy, Press this button when this alarm goes off!" Does he hit the button?
Still alive and well (Score:4, Interesting)
Crap like this was alive and well when I was in uni and it's still alive and well.
Witness: Limits to Growth written by Meadows et al: http://en.wikipedia.org/wiki/The_Limits_to_Growth [wikipedia.org]
Consider that book was written in 1972. I was programming computers in 1972. I actually did a course in numerical analysis in 1972 and just re-read the first 10 pages or so. I happen to have read a masters thesis that came out of the Colorado School of Mines where the author stated Meadows' Runge Kutta Numerical Integrations did not converge.
Yet that book is still often quoted. It's been flawed from the get go. So consider something else! How fast were the machines that Meadows used? How big? What would be the MOST SOPHISTICATED model he could use at the time? How could _anyone_ take seriously predictions made by a primitive model run on such a machine?
Witness: The current discussion about Global Warming and Climate Change. The change in CO2 over the last 100 years is about 100 ppm if you can believe the data. This is 100/1,000,000 = 0.0001. Now the thing is this. A 32 bit float holds about 6.9 digits of precision. Let's call it 7 digits. If one were to add a whole number of some kind to the fractional change of the CO2 as measured relative to the total gases in the atmosphere then one has 7-4 = 3 digits or less to work with.
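The digit-counting argument can be checked numerically. This sketch emulates a 32-bit IEEE 754 float with Python's struct module (an assumption about the format in question) and looks at how much of 0.0001 survives after it is added to 1.0. It says nothing about how any real model stores its data.

```python
import struct

def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

small = 1e-4                       # the fractional change, 100 ppm
stored = to_float32(1.0 + small)   # whole number plus small fraction
recovered = stored - 1.0           # what is left of the fraction
relative_error = abs(recovered - small) / small

print("recovered fraction:", recovered)
print(f"relative error: {relative_error:.1e}")
```

The recovered fraction is good to only about three or four significant digits, in line with the 7-4 estimate above.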
Of course one can use a double precision float. That isn't my point. One has to be an EXPERT in order to avoid huge problems with propagating rounding errors.
It's not just about pretending computers use base 10 when they don't, it's about knowing the actual properties of a number of type float and what the consequences are when we use it.
In the case of that rocket I suspect the rounding error can be solved by normalizing everything so the time line is not in seconds but is actually in clock ticks... as accurately as they can be determined of course.
But in my career I have seen so few programmers who can do this that I've never even needed to look at a finger or a toe for something to count on. Nada - never met one.
I'll give another example. More than one project team that I worked with had no idea how floats even work! To sit there and try to use floats for their Accounts Payable and Accounts Receivable and then say they can't understand why nothing will balance? Arrghh! IMHO it's downright incompetence. They needed to use comp, which COBOL supported and which is base 10, or normalize all their money into pennies and handle the decimal when the data was read in and printed.
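A short sketch of why the ledgers would not balance, and of the two fixes mentioned above. Python's decimal module stands in for a base-10 numeric type here; it is not COBOL, and the amounts are made up.

```python
from decimal import Decimal

# Binary floating point: 0.10 and 0.20 have no exact representation.
float_total = 0.10 + 0.20

# Fix 1: normalize money into integer pennies.
cents_total = 10 + 20

# Fix 2: use a base-10 numeric type.
decimal_total = Decimal("0.10") + Decimal("0.20")

print("float balances:", float_total == 0.30)
print("cents balance:", cents_total == 30)
print("decimal balances:", decimal_total == Decimal("0.30"))
```

Only the float version fails the comparison. The integer and decimal versions balance exactly, and the decimal point only matters when the data is read in or printed.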
Re:Poor QA (Score:3, Interesting)
The problem with the Palestine / Israel issue is that Israel is not working towards any solution. What is Israel's long term solution? Have sovereign absolute rule over a few million people in a prison that their citizens can, at will, and with army backing, snatch up pieces for settlement? Oh yeah, that is going to work out. Palestinians either need to be sovereign or citizens of Israel. Israel needs to pick one because the keeping a ghetto of nationless people method isn't working.
Don't get me wrong. I am sympathetic to Israel in many regards, but they have fucked up the Palestinian issue with epic skills since 1967 onwards. Instead of immediately developing and executing a plan to 'deal with' the conquered land, either through integration or by creating a sovereign democracy, they opted to basically imprison a few million people from now until the end of time. It should come as a "no shit Sherlock" that 40+ years later these nationless people are pissed.
Israel needs to rip a page out of the American handbook on imperial power. If you flatten another nation you have three options.
1) You can integrate them into your nation as citizens and give them some level of enfranchisement as they did to Native Americans, Hawaiians, and Mexicans. This is not a basket of roses method, but as pissed as Hawaiians might occasionally be, I haven't seen many draw weapons or plant bombs.
2) You could commit to rebuilding a conquered nation more or less in your own image, as the US did in Germany, Japan, South Korea, and Bosnia. This is expensive, but when it works everyone leaves the table more or less happy.
3) Leave, stop trying to kick over their government with bombs, and accept the fact that these people are going to hate you for what you have done for a while and that only time is going to heal. This was done with Mexico, Vietnam, Haiti, North Korea, Lebanon, half of South America, etc.
Re:Poor QA (Score:5, Interesting)
The last fight between them happened in 2006. Hezbollah kidnapped a few SOLDIERS to trade for PoWs (a common thing since Israel has a shit ton of prisoners).
Israel responded by sending in an army many hundreds of times larger than Lebanon's. They bombed many buildings, including hospitals, schools, UN bunkers and apartment buildings. Hezbollah fired rockets back to show resistance.
In the end Israel killed 1200 civilians and 300 soldiers, and destroyed a significant percentage of the country's economy. Hezbollah killed 120 soldiers and 40 civilians. Notice the fucking difference in ratios. Oh, and the whole time Hezbollah conducted rescue missions, gave out food and helped transport people to safety. So fuck off.
Also: "Hezbollah is now also a major provider of social services, which operate schools, hospitals, and agricultural services for thousands of Lebanese Shiites, and plays a significant force in Lebanese politics.".
Also, Hezbollah states that they distinguish between Zionists and Jews. Their stated reason for firing rockets is continued resistance against Israeli attacks and to put an end to any colonial entity within Lebanon. NOT to kill Jews.
How the fuck parent got modded up is beyond me. Every single point is a verifiable falsehood.
Re:Poor QA (Score:3, Interesting)