The Media Technology

How Journalists Data-Mined the Wikileaks Docs 59

meckdevil writes "Associated Press developer-journalist extraordinaire Jonathan Stray gives a brilliant explanation of the use of data-mining strategies to winnow and wring journalistic sense out of massive numbers of documents, using the Iraq and Afghanistan war logs released by Wikileaks as a case in point. The concepts for focusing on certain groups of documents and ignoring others are hardly new; they underlie the algorithms used by the major Web search engines. Their use in a journalistic context is on a cutting edge, though, and it raises a fascinating quandary: By choosing the parameters under which documents will be considered similar enough to pay attention to, journalist-programmers actually choose the frame in which a story will be told. This type of data mining holds great potential for investigative revelation — and great potential for journalistic abuse."

This discussion has been archived. No new comments can be posted.

  • by MimeticLie ( 1866406 ) on Sunday June 12, 2011 @11:34PM (#36422094)
    Isn't that one of the major reasons we have journalism? To synthesize and contextualize information? If the contextualized (or perhaps editorialized, depending on your point of view) information was the only kind available, then yes that is an issue. But with Wikileaks, the data is there for anyone who wants to parse it.

    This strikes me as being similar to when Anderson Cooper was criticized for calling Mubarak a liar. Or the behavior that Colbert mocked the White House press corps for at the correspondents' dinner. Pretending that journalists are free of bias doesn't make it so, and saying that they should just regurgitate facts and talking points verbatim is counter-productive. Reasoned analysis should be encouraged.
  • Re:Not Newsworthy (Score:3, Insightful)

    by Anonymous Coward on Monday June 13, 2011 @12:22AM (#36422284)

    I think you miss the point - that it was used in a journalistic context most certainly *is* newsworthy: the AP guy was going to great lengths to stress evidence-based reporting, and uncovering associations, vice pre-supposing those things and backfitting the data.

    Data mining - like stats - allows bias to creep in quite readily, and once a study, a number, a story is out there, it's very difficult to pull it back, even when it's demonstrably wrong, biased or fabricated.

  • by buchner.johannes ( 1139593 ) on Monday June 13, 2011 @12:44AM (#36422374) Homepage Journal

    Worked miracles after I've gotten around the ugly HTML format they use to release all those INFORMATIONS. Still, there was very little new or worthwhile in the heap of those news clips and rumour aggregations. Frankly, the more I grep it, the less it looks like the "largest leak in history", and the more it seems like "the largest controlled release of information" in history.

    / takes off conspiracy theory hat // flame on

    When you use grep you have to know what you grep for. You can not stumble upon a search keyword with grep.
    Clustering allows that, if you let it build the clusters itself. Perhaps you are missing out on the interesting bits.

  • by Anonymous Coward on Monday June 13, 2011 @01:02AM (#36422432)

    Life's a bitch. Perhaps the US shouldn't be doing things that it has to keep so secret. That's just a consequence of empire-building. Preach one value to the masses, do something else in practice.

    Is it more important to prop up the current system to keep a few agents of the empire safe from harm or is it more important to try to bring some sanity to the whole entire thing and do some longer-term good by shedding light on things people are afraid of showing to even our own public?

    Whether one agrees with the leaks or not, it's quite obvious from the cables that we're doing some rather unsettling things that I don't want to be associated with. I'm more concerned about the long term effects of that than the leaks themselves.

    It's a sorta philosophical debate... it's not a crime if you don't get caught, I guess. But now we're caught... what now? Pretend these things didn't happen?

  • by MimeticLie ( 1866406 ) on Monday June 13, 2011 @01:19AM (#36422498)
    That's part of the point of the video, using data mining techniques to broaden analytical tools beyond a simple keyword search and the preconceptions it can reinforce (the reporter mentions seeing a cluster of tanker truck incidents that was bigger than his organization was previously aware). He ends by noting that the way one writes the algorithm can determine what trends pop out and thus how the story is framed, which seems like a perfectly reasonable statement. Then someone (either the submitter or the Slashdot editors) transforms that into a "great potential for journalistic abuse."

    I don't have an issue with the methodology portrayed in the video. But to then take the presenter's words and twist them to support a "just the facts, ma'am" style of journalism seems dishonest and unproductive.
  • by wvmarle ( 1070040 ) on Monday June 13, 2011 @02:52AM (#36422802)

    If you know what's in the documents, then life gets easy of course. The trouble is that usually you do not know what's in the documents without reading them. And if there's nothing new, that's a pity. But anyway the fact that one could say "there is nothing but local newspaper clips and gossip" in a set of documents indicates that they actually went through them all.

    And for sure there's a lot of noise in the WikiLeaks documents. The same will be true of the Palin e-mail trove. Finding the interesting bits in that enormous noise is what journalists are for, and no journalist will know beforehand what those interesting bits are - which is exactly why they are interesting.

  • by Dails ( 1798748 ) on Monday June 13, 2011 @04:36AM (#36423100)

    we're doing some rather unsettling things that I don't want to be associated with

    And that's why you're not some sort of government agent doing those things. This attitude bothers me for the same reason the "No blood for oil" types bother me. You don't get how important that sort of thing is. No blood for oil? Then what will you shed blood for? Losing oil supplies will so vastly change your way of life that you would argue it impossible if someone accurately showed you. If you think shady goings-on are an endeavor unique to America, you need to wake up. Every country (EVERY country - if you're not an American, believe that your country does it, too) does that. Even if only to stay in power and not out of a desire to provide for the people, every government strives to provide a certain lifestyle or quality of life to the people, and this is the price. If you don't like it, stop doing anything that requires oil (drive a car, use electricity, buy processed foods, etc). Don't get upset at the government for doing what it has to to provide you with something you'd complain about losing (probably here).

  • by cold fjord ( 826450 ) on Monday June 13, 2011 @04:44AM (#36423118)

    Perhaps the US shouldn't be doing things that it has to keep so secret. That's just a consequence of empire-building. Preach one value to the masses, do something else in practice.

    Is it the duty of the United States government to serve the interests of the United States, as opposed to say, Iran? Is it the duty of the United States government to care for and protect its people, as opposed to say, the people of Venezuela? If so, then it must differentiate between different sets of interests, American, and those of others.

    If American citizens have been taken prisoner unlawfully by pirates, the United States government could try to negotiate with the pirates. If the pirates want $1,000,000, but the US is willing to pay $20,000,000, should the government go in and up front announce the maximum amount they are willing to pay instead of trying to pay the least amount? Wouldn't that be a fundamentally stupid bargaining tactic? But to do that, they would need to keep secrets from the pirates. Well, not just pirates, they would need to keep it secret from the media, since there are many media outlets that would gladly publish it, and force the US to pay $20,000,000 instead of $1,000,000. So, do you think the US should keep the maximum bid a secret and serve American interests, or announce it and serve pirate interests by undermining the government's own negotiating position?

    Let's say negotiations with the pirates are going badly, they heard in the media that the government is willing to pay $20,000,000 but they got greedy and now think they can get $50,000,000. The US Government isn't willing to pay that much, and decides to use a commando raid to rescue the hostages while stalling in negotiations. Military actions are generally at least twice as effective over short periods of time when the attacking force attains surprise. Even if the pirates think it is possible, they don't really know if, when, how, who, or where they will come from. Should the US Government announce to the pirates that it has given up negotiations, and that it is going to use military force to free its citizens? If not, that would mean keeping a secret from the pirates - do you oppose that? Of course, it will also have to keep the rescue plan secret from the media as well or it will be published, the pirates will see it, and will be prepared to defeat it. Should the government tell the next of kin that it is going to try a military rescue? They might tell the media, or their kin being held by the pirates, and either the media or the prisoners might tell the pirates. So, it looks like we can't tell the pirates, the media, or the next of kin. What about other people in the United States? Same problem.

    As part of the planning for the rescue mission, it appears that it would be really helpful to refuel some aircraft in a country near where the pirates are holding the American captives. This third country has a government that is friendly to the United States, but much of the population is hostile as they are being influenced by religious extremists from outside their country. The government of this third country agrees to the refueling operation at one of their island military bases, but demands that it be kept secret to avoid agitating their citizens. Since it helps the mission of recovering Americans held hostage, shouldn't the US make use of the island for refueling? What about the request to keep it secret? Should the US stir up problems in the country by making it known, despite the request of the government? If the use of the island is revealed, it could hurt diplomatic relations, and perhaps even generate civil unrest, getting people killed. Shouldn't this be kept a secret? From the pirates? From the media?

    During the flight to the pirate locations, and on the ground, US forces will be using radios for command and control, and various flight operations. Should the US inform the pirates about the radio frequencies it uses? What about the media, who might listen in? Suppose a

  • by Archtech ( 159117 ) on Monday June 13, 2011 @05:50AM (#36423328)

    Mark Twain summed up the central problem of journalism with his epigram, "Get your facts first... then you can distort 'em as much as you please". But, amusing as it is, this completely misses the point! In the very process of "getting your facts" you have the opportunity - indeed, the obligation - of selecting them from among the infinite number of facts that you could choose. Having selected the facts that you think are most important, there is no longer the slightest need to distort them. The work is already done.

    Suppose you are the New York Times, and you are reporting on events in Afghanistan. You have a certain amount of space, so do you write up the IED explosion which killed a couple of NATO soldiers and put a few more in hospital - or do you describe the NATO helicopter raid that killed a dozen villagers and wounded another few dozen? Well, your readers are far more interested in the fate of NATO people (especially if they are from the USA); moreover, they don't particularly want to read about how their glorious forces have accidentally (or otherwise) killed a lot of civilians. So it's a no-brainer - you write up the IED event. After a few years of such a policy, consistently followed, readers get the idea that all that happens in Afghanistan is that NATO soldiers occasionally get blown up. Yes the NYT has accurately reported the facts. It hasn't reported all of them, but its editors could argue that such an attempt would be physically impossible. The only practical way of giving a more balanced impression would be to read, as well as the NYT, a newspaper that takes an anti-NATO, pro-Afghan point of view. But no such newspaper can survive commercially in the US market, because it wouldn't sell enough copies (even if it were allowed to go on operating for long).

    Indeed, the Wikileaks documents currently under discussion are subject to such a filtering effect too. Remember, all those documents were written by American officials, for US government consumption. You won't find many mentions in there of atrocities by our forces - even if the US authorities in Afghanistan or Washington were aware of such atrocities, they wouldn't put them into messages with such a low level of security. What you can expect to find is a fairly high level of unguarded opinions - either honest or carefully angled to make a particular desired impression.
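
The contrast drawn earlier in the thread between grep and clustering, and the summary's point that the similarity parameters shape the story, can be illustrated with a small sketch. This is not the method from the talk: it is a generic TF-IDF plus cosine-similarity grouping written for illustration, and the sample reports, the function names, and the threshold values are all made up. Unlike a keyword search, it needs no search term up front, but the threshold alone decides which reports end up grouped together.

```python
import math
from collections import Counter

# Hypothetical one-line incident reports, invented for this example.
REPORTS = [
    "fuel tanker truck attacked on highway",
    "tanker truck fire near fuel depot",
    "checkpoint detained two suspects",
    "suspects detained at checkpoint after search",
]

def tokenize(text):
    """Lowercase the text and keep purely alphabetic words."""
    return [w for w in text.lower().split() if w.isalpha()]

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {t: c * (math.log(n / doc_freq[t]) + 1.0) for t, c in counts.items()}
        )
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def cluster(docs, threshold):
    """Group documents whose pairwise similarity reaches the threshold.

    Single-link grouping via union-find: two documents share a cluster
    if a chain of sufficiently similar pairs connects them.
    """
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

if __name__ == "__main__":
    for threshold in (0.3, 0.4, 0.5):
        print(threshold, cluster(REPORTS, threshold))
```

On these made-up reports, a threshold of 0.3 groups the two tanker reports and the two checkpoint reports, 0.4 keeps only the checkpoint pair together, and 0.5 leaves every report on its own. The data never changes; only the analyst's choice of parameter does.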
