Data-Sifting For Timely Intelligence Still an Elusive Goal
gyrogeerloose writes "Although there was evidence to suggest that the Japanese navy was up to something in December 1941, that information was scant and came too late. Today's intelligence agencies have another problem altogether — more information than they can deal with, and computers aren't helping as much as one might expect for reasons that will be familiar to Slashdot readers: computers can crunch numbers faster and more accurately than humans, but they're still easily baffled by language as it is commonly used in the real world. Metaphor, slang and simple figures of speech can confuse the best algorithm and, as quoted in the linked article in the San Diego Union-Tribune, 'A system that takes a week to discover a bombing that will occur in a day isn't very useful.'"
The Real Issue (Score:3, Insightful)
No More Data Needed (Score:3, Insightful)
Today's intelligence agencies have another problem altogether — more information than they can deal with . . .
This is the ultimate argument against those defending increased surveillance activities to fight terrorism (or any other crime). Intelligence agencies already have way more information than they can deal with just from public sources. 99.999% of it is the noise of people going about their normal lives. Getting out the interesting bits is a hard problem, and adding more is only going to slow you down. It can help if you've already nailed down a good list of suspects and therefore have a small, targeted list of people to watch. But if that's the case, what's the big deal about getting a warrant?
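To put rough numbers on that noise argument, here is a back-of-the-envelope sketch. Only the 99.999% figure comes from the comment above; the item count and the filter's hit and false-alarm rates are made-up assumptions, and `flagged_breakdown` is a hypothetical helper, not anything an agency is known to use.

```python
# Base-rate arithmetic for a filter run over mostly-noise data.
# Only noise_fraction comes from the comment above; all other numbers are assumptions.

def flagged_breakdown(total_items, noise_fraction, hit_rate, false_alarm_rate):
    """Return (true_hits, false_alarms, precision) for a filter run over total_items."""
    interesting = total_items * (1 - noise_fraction)
    noise = total_items * noise_fraction
    true_hits = interesting * hit_rate          # interesting items correctly flagged
    false_alarms = noise * false_alarm_rate     # ordinary items wrongly flagged
    precision = true_hits / (true_hits + false_alarms)
    return true_hits, false_alarms, precision

true_hits, false_alarms, precision = flagged_breakdown(
    total_items=1_000_000_000,  # assumed: one billion collected items
    noise_fraction=0.99999,     # from the comment: 99.999% is noise
    hit_rate=0.99,              # assumed: filter catches 99% of interesting items
    false_alarm_rate=0.01,      # assumed: filter wrongly flags 1% of noise
)
print(f"real hits:    {true_hits:,.0f}")
print(f"false alarms: {false_alarms:,.0f}")
print(f"share of flags that are real: {precision:.4%}")
```

Under these assumed rates, roughly one flag in a thousand points at something real. Collecting twice as much data at the same noise level doubles the pile of false alarms without improving that ratio.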
Re:Any statistician could have told them that (Score:4, Insightful)
Isn't that kind of begging the question? The problem here is, as you said, not being able to discriminate between useful and useless data. So how do we know what's relevant (a.k.a. useful)? Do we only collect data by having humans interpret the data? If so, then the role of the computer is much diminished. Do we automate the process by having computers discriminate between useful and useless data? Well, that's exactly the problem - we can't figure out how to do that yet. Even if we only have relevant data, how do we assign semantic value to the data in order for the computer to properly parse the data and give us semantically useful results?
It's not as simple as just collecting relevant data. Even if it were, that in and of itself is a major hurdle.
The problem is misunderstood... (Score:2, Insightful)
The meaning of a piece of communication involves not just the text, but the specific context (who is the source, who is the recipient), the social context, and the cultural context.
For an example of the first - an 8-year-old who says "I'm going to shoot her" (especially if the context is a game of cops and robbers) should be understood differently from an adult who says the same thing. And the meaning also varies depending on whether the adult is a photographer or not, and whether 'her' refers to a model or an ex-wife. None of these things may be made explicit anywhere in any intercepted communication.
As another example, a description of a gory murder by a wild animal carries a very different meaning if the text starts with the words "Once upon a time".
You can't separate text, meaning and culture and consciousness. Which is why the problem of interpreting natural language is so hard; harder than even the article author seems to acknowledge.
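A minimal sketch of why plain text matching falls short here. The watch list, the messages, and the `is_flagged` helper are all invented for illustration and are not taken from the article or any real system.

```python
# A deliberately naive keyword filter. It sees only the words, never the context.
WATCH_WORDS = {"shoot", "murder"}  # hypothetical watch list

def is_flagged(text):
    """Flag a message if any word, stripped of punctuation and lowercased, is on the watch list."""
    words = (w.strip(".,!?'\"").lower() for w in text.split())
    return any(w in WATCH_WORDS for w in words)

messages = [
    "I'm going to shoot her",                            # child playing cops and robbers
    "I'm going to shoot her at the studio at noon",      # photographer talking about a model
    "Once upon a time, a wolf committed a gory murder",  # opening of a fairy tale
]
flags = [is_flagged(m) for m in messages]
print(flags)
```

All three messages get flagged, because who is speaking, to whom, and in what genre are not in the text the filter reads.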
Re:America forced Japan's hand (Score:4, Insightful)
You're missing the point: denying the Japanese access to steel and other resources during wartime was, for all intents and purposes, an act of war. Without those resources, Japan wouldn't have been able to hold the ground they had already taken, let alone continue advancing. When the US cut off access to critical war resources, Japan had only two choices: End the war almost immediately and retreat back to Japan proper, or take control of the resources by force. For political and ideological reasons, the former option wasn't much of an option at all.
Imagine if the US were fighting a major war (against a powerful, conventional enemy) and OPEC said "No more oil exports for a while". You don't think the US govt would see that as an act of war?
Easy Fix (Score:3, Insightful)
It's a cryptography problem. There's information stored in codes. Sometimes the code is regular language, sometimes slang, sometimes coded language, but it's all decoding meaning from words. Problem solvers are better at solving the problem than having someone program a solution when no one actually figured out the solution, or having some linguists come up with direct matches that miss a large portion of what they want and get huge numbers of false positives.
But hiring someone that doesn't know what they are doing and training them is anti-American. We'll import our labor at a higher cost than actually train someone for the position. So I don't think anyone will ever do it. Google proved me wrong, but it doesn't seem anyone else is following their lead.
A week? A day? Whatever. (Score:2, Insightful)
Re:For example (Score:2, Insightful)
Are you a TSA agent?
Re:The Real Issue (Score:3, Insightful)
I would argue, probably nothing. In hindsight, it is always easy to second-guess why the system "didn't work," but in fact, all these same clues occur in thousands of other cases where nothing ever comes of it. So, much of this disappointment comes from people unconsciously (and always in retrospect) holding systems up to an impossible standard - not only where an infinite number of KGB agents have an infinite amount of time to track everybody, not only where a super-intelligent AI acts in the most rational possible manner, but beyond that, demanding prediction of the future given some information which is relevant, but nevertheless insufficient.