Technology

Software Sorts Electronic Evidence 85

securitas writes: "The New York Times has a very interesting article about the legal industry using new search software to sort through electronic evidence such as e-mail, documents and recovered files, and the process that they go through to make the evidence usable. It has spawned an industry."
This discussion has been archived. No new comments can be posted.

  • grep (Score:5, Funny)

    by astafas ( 232064 ) on Tuesday September 04, 2001 @01:26PM (#2252270)
    Lawyers discover grep?
  • The legal industry probably generates 50% of the Fed Ex traffic ;P Shipping all those reams of paper back and forth.

    Software to sort intelligently has been around for quite some time; Google's "I'm Feeling Lucky" link is probably a great example of this. The number of times I've hit what I'm looking for with a few judicious '+'s using Google is unbelievable.

    Sounds almost like they are reinventing the wheel...
  • policy to the rescue (Score:2, Interesting)

    by Anonymous Coward
    This is why it is critical to have a retention schedule so old emails don't come back to bite you. I haven't seen employees take this issue seriously even when there is a clear policy.
  • I thought the most interesting part was: "Responding to a request for documents during a merger review by the Federal Trade Commission took one company nearly a year because the electronic documents were kept in offices all over the world and in all manner of different formats". But the twist there is: would you like to have everything easy to find (and if so, how do you do that, what with all the fragmentation between office 97, office 2000, not to mention different file servers and such)? Or are you better off making it inconvenient for other litigants to get at the data?
    • I believe one of the services of these companies, in particular Ontrack, is to put all the data in one format and provide you with a "viewer" (with an outlook feel) to view the data they re-generated. This way you can have different formats, but if you pay enough you can get it uniform...
  • Wow, someone actually included the "archive" link within the article. Yay!

    *golf clap*

    -- Brett

  • by M_Talon ( 135587 ) on Tuesday September 04, 2001 @01:34PM (#2252321) Homepage
    Wonder if they would be kind enough to run that against my email box and sort out all the spammers for me? Then I could take it to court to request compensation for the bandwidth consumption as well as "emotional damages" because of all the pron spam :)
    • by Anonymous Coward
      Wasn't sniffing through e-mail done by something called Carnivore, which everyone here bitched about as an invasion of privacy? Bleh, be consistent about this stuff, if you don't want something looking through your e-mail, stick with that opinion.
      • Just to clarify, there's a difference between this stuff and Carnivore. Carnivore is/was basically a wiretap, a way to monitor ongoing communications both incoming and outgoing. What's being discussed here is a way to organize and sift through information that is archived and already subpoenaed. Apples and oranges, my friends.
    • I knew a guy that used to write code for the defense (electronic) industry. He told me that it could cost up to about $75 million to trace someone's email or telnet sessions if they did it smart enough. The problem comes from trying to get so many companies and foreign countries to cooperate with you enough to look at their logs. He said it was because there isn't much goodwill towards the US in many parts of the world.
  • For the convenience of everyone who likes using the registration link, here ya go [nytimes.com].

    ;)

    -- Brett
  • by wiredog ( 43288 ) on Tuesday September 04, 2001 @01:42PM (#2252358) Journal
    The continuing investigations into the Clinton presidency ran into a major problem. Every email sent to, and within, the administration was saved. But not indexed. Apparently there are terabytes of the stuff. On tape. Sorting through it looking for evidence is likely to take years. And then it has to go to the national archives. And that's just the email. Add in all the other digital content they generated, all of which had to be saved, and there's a major problem.
    • On the bright side, archiving this ever increasing amount of material is very expensive, and the size of the archive is increasing faster than the size and sophistication of our army. Mankind will indeed live in a machine imposed era of peace, except instead of holding us hostage with our own weapons, we will be forced to use the resources formerly dedicated to making war for the preservation of the machine records.

  • Biased Searches? (Score:3, Insightful)

    by Robber Baron ( 112304 ) on Tuesday September 04, 2001 @01:49PM (#2252385) Homepage
    I can see a potential for even more widespread abuse. Couldn't searching for keywords give some bloodthirsty prosecutor the ability to present a biased, subjective, out-of context version of what was communicated? We already know of several instances where a lack of understanding of the technology coupled with a lack of understanding of the context under which a message was communicated has led to abuse by those in positions of authority.
    • Sure. That's why we have defense lawyers as well.
      Waiting for my 20 seconds to expire....
    • What you're worried about is nothing new; lawyers have done this using relatively (or extremely) out-of-context quotes found on paper or remembered from conversation since courts were invented. :-) Besides, both sides can use similar software, and communications that might be used to present a different perspective could be obtained just as easily.
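    The out-of-context worry in this sub-thread can be made concrete with a small sketch. It assumes nothing about any real e-discovery product: `keyword_in_context` and the sample message are hypothetical, and the idea is only that a search tool can return the surrounding text with every hit so that either side can see what was actually said.

```python
import re

def keyword_in_context(text, keyword, width=40):
    """Return each hit of `keyword` with up to `width` characters on either side."""
    hits = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        hits.append(text[start:end])
    return hits

# Hypothetical message, invented for illustration only.
message = ("Nobody here would ever destroy the records; "
           "counsel told us to keep everything.")

# A bare keyword match shows only the alarming phrase.
bare = re.findall("destroy the records", message, flags=re.IGNORECASE)

# The same match with context shows the sentence says the opposite.
in_context = keyword_in_context(message, "destroy the records", width=30)

print(bare)
print(in_context)
```

    Whether a given tool shows context like this is a design choice of that tool; the sketch only illustrates why the same hit list can read very differently depending on how much of the message comes with it.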
  • Link (Score:3, Funny)

    by sydb ( 176695 ) <michael@@@wd21...co...uk> on Tuesday September 04, 2001 @01:50PM (#2252389)
    Here's the correct link [nytimes.com] for those who'd like to register before they read the article.
  • when lawyers, attorneys and judges have been replaced by computer programs.

    Lawyer Eliza: (Walks up to the witness) So how are you today?
    Witness: I am fine thank you.
    L Eliza: How long have you been fine thank i?
    Witness: I don't understand the question...
    L Eliza: Don't you really understand the question?
    Witness: That's right.
    L Eliza: Is it really that right?
    Prosecutor Eliza: Objection Your Honor! She is harassing the witness!
    Judge Eliza: Why are you concerned about my honour she is harassing the witness?

    and so on...
  • by nm42 ( 310685 )
    I'm not looking to be a karma whore, but could anyone see for sure what the product is? Or what the company is?
    The only company I see is OnTrack Data International. Anyone?
    I'm interested here cuz I do tech/litigation support in a law firm.

    The article does miss one important little detail. The first level of sorting is done by clerks or paralegals. Associates do the law-related grunt work, but that's AFTER someone making $10-$15/hour has gone through and sorted out the pr0n (trust me, lawyers get A LOT!), and other pointless crap.
  • by Minupla ( 62455 ) <minupla@gmail . c om> on Tuesday September 04, 2001 @02:01PM (#2252429) Homepage Journal
    I worked for one of the state level governments in N.A. and had access, and "da-buck-stops-here" responsibility for the IT side of "Archives". Archives is legislatively required to hold in permanent storage, "All materials relating to the ongoing business of the government". This caused some real problems:

    1) we had a case of an outgoing elected official low level formatting their HDD on the way out the door. Had to be sent out to a special data recovery lab. (they can do some amazing things with scanning electron microscopes on half tracks and such)

    2) there are stacks and stacks of 8" floppy disks, in formats like IBM DisplayWriter, and other chunks of physical hardware that haven't been seen by mortal man in 20 yrs.

    Finding a chunk of info is damn tricky, but after you find it, you have to find something that can read the punchcard/papertape/magtape/floppydisk/harddisk in question. And due to a quirk in how the original act was written (keeping in mind that these things were written back when data was carved on rock slates and format wasn't a big consideration) we were required to keep it in its original form.

    I feel for someone with my job in 50 yrs. I ran away from govt work after that. It was scary!

    One plus side. EMP has a hard time taking out papertape!
    • there are stacks and stacks of 8" floppy disks, in formats like IBM DisplayWriter, and other chunks of physical hardware that haven't been seen by mortal man in 20 yrs.

      Hell, just try to find a 5 1/4" drive these days.

      • I used to have a cassette tape drive for a Vic 20. Whenever I had to load a program, I had to rewind the tape to a point before I knew the file was saved. The computer would respond with "Press play to continue...". I also had to be sure to keep them separate from my regular audio cassettes. It sounded like a cat fight from Hell if I mistook one for a music cassette.

        Those were the days.
  • This process has been around for years, and is still being refined. It is referred to as "text-mining" and there is some spectacular software out there to accomplish these tasks.

    The leader in the industry is a Company called Megaputer [megaputer.com], and their clients included the US government, Boeing, the CDC, and many large companies.

  • I know the tendency for unix enthusiasts to believe that you invented every useful technology in the 1970's, but it's simply not the case. Grep simply isn't suitable for interactive searches of gigabytes of data broken up into millions of files.

    To efficiently work with several years' worth of email, more advanced techniques are required. Specifically, you need a text indexing program tied to a relational database. While this doesn't give you any more power than recursive searches using the grep and find combo, it's much much faster as your keyword and message attributes can use b-tree index lookups and a cost based optimizer to reduce disk reads.

    That being said, it's still not that impressive of software. I'm certain that I could build the search component in a couple of weeks using Microsoft SQL Server (with the neat full text indexing feature) and a moderately adept gui developer could hammer out a decent interface in the same amount of time.

    Still, there's a difference between "trivial to implement with any decent RDBMS" and "I can do it with a 2 line bash script". You would do well to remember it.

    Your friend,
    --Shoeboy
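    The grep-versus-index argument above can be sketched in a few lines. This is not the SQL Server setup the comment describes: it swaps in a plain in-memory inverted index (a dictionary from word to message ids) to show the same idea, namely that the index is built once and each query then becomes a lookup instead of a re-read of every message. The function names and sample messages are made up for illustration.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return TOKEN.findall(text.lower())

def build_index(messages):
    """Build once: map each word to the set of message ids containing it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for word in tokenize(body):
            index[word].add(msg_id)
    return index

def search(index, query):
    """Indexed query: intersect the id sets of the query words."""
    words = tokenize(query)
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)

def linear_scan(messages, query):
    """grep-style query: re-read every message on every search."""
    words = set(tokenize(query))
    if not words:
        return set()
    return {msg_id for msg_id, body in messages.items()
            if words <= set(tokenize(body))}

# Hypothetical mailbox, invented for illustration only.
messages = {
    1: "Please shred the merger documents before Friday.",
    2: "Lunch on Friday?",
    3: "The merger review documents are attached.",
}

index = build_index(messages)
hits = search(index, "merger documents")
print(sorted(hits))
```

    Both functions return the same answer; the difference is that `linear_scan` touches every message per query, while `search` touches only the entries for the query words. A database-backed full text index adds persistence and query planning on top of that same trade-off.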
  • I am all for the government writing any kind of search program, block program, security program or what have you for whatever purpose it wants.

    If we really want the Internet to permeate into our lives, then it should go into our lives as they really are. Perhaps some people will be less wary about leaving evidentiary data lying about on the Net.

    When we decry censorware, or searchware or whatever, we are decrying a social use of technology and not the technology itself. Rather than stifling the development of search technologies or other supposedly "authoritarian" tech, we should be adding to the debate about what kind of a society we live in.

    I will be writing a variant of this for a controversial website [adequacy.org] soon, in support of rigidly restricted appliance computers and limited-access proprietary content AOL style networks developing alongside the open Internet. In this society we have prisons, in which the prisoners can't use the Internet much because the software and hardware that would allow them to use it within prison rules (reliably, monitored by non-technical prison officials) does not exist.

    I would rather the educational and self-betterment resources available on the Net be extended to prisoners with the blessing of prison officials, so prisons which have lost their education budgets can restore these services cheaply.

  • by D3TH ( 15279 ) on Tuesday September 04, 2001 @02:12PM (#2252470) Homepage
    In reality, the biggest difference between grep and so-called "forensics" software is the emphasis on examining the data without modifying it and maintaining the chain of custody and audit trail. In fact, many experienced computer investigators do their jobs with little more than DD, grep, and various other Unix utilities. Most of the digital forensics software out there simply attempts to make this functionality more accessible to your less tech savvy investigator. (The problems caused by inexperienced/unqualified investigators performing this type of analysis are beyond the scope of this response.)

    I am currently the designer and project lead for a cross-platform open source (GPL) digital evidence processing suite. It is intended to bring together the various functionalities required to perform this type of work, and (ideally) operate on whatever platform the investigator desires. Our primary development platform is RedHat 7.1.

    There are currently software packages out there that attempt to do this, including EnCase [guidancesoftware.com] and The Forensic Toolkit [accessdata.com] in the commercial arena and The Coroner's Toolkit [porcupine.org] in the open source arena; however, they lack the broad filesystem support and/or true ease of use to make them usable by everyone. The other barrier is price as EnCase, for example, costs thousands of dollars per copy.

    We're well funded, and have already done a significant amount of work. We have some of our core components functional and plan on starting beta testing and releasing our first code drop later this year. If this field interests you and you'd like more information, or you work in the investigative field and have thoughts on what you'd like to see in such a tool, I'd love to hear from you.
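    The point above about examining data without modifying it and keeping an audit trail can be sketched as follows. This is a minimal illustration and not how EnCase, The Forensic Toolkit, The Coroner's Toolkit or the poster's suite work: the helper names, the log format and the sample evidence are all assumptions. The file is opened read-only, hashed before and after the search, and each search is recorded in a log entry.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks, opening it read-only."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def search_with_audit(path, needle, examiner, log):
    """Find byte offsets of `needle` and append an audit entry to `log`.

    Reads the whole file into memory, so it only suits small samples.
    """
    before = sha256_of(path)
    with open(path, "rb") as handle:
        data = handle.read()
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    after = sha256_of(path)
    log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "examiner": examiner,            # hypothetical field
        "needle": needle.decode("latin-1"),
        "hits": len(offsets),
        "sha256_before": before,
        "sha256_after": after,
        "unchanged": before == after,    # evidence intact after the search
    })
    return offsets

# Hypothetical evidence file, created here only so the sketch runs.
sample = b"memo: shred the files\nmemo: keep the files\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    evidence_path = tmp.name

log = []
offsets = search_with_audit(evidence_path, b"memo", "examiner-1", log)
os.remove(evidence_path)

print(offsets, log[0]["unchanged"])
```

    In practice investigators usually work on a bit-for-bit copy (the `dd` step mentioned above) rather than on the original media, and the matching hashes are what tie the copy and the results back to the original.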
  • It seems to me that this could also bring about another niche for programmers... software that goes through a company's systems and completely wipes out data. I'm sure this would cause havoc with a company's ISO procedures, but with more and more companies getting caught because of e-mail or other data that is archived on their tapes, I could see the need to make sure that they have control over the "evidence" that exists on their own systems.

    I'm not saying it's the right thing to do, or even that it's legal, but since most companies are not even aware of what data they do have backed up, and what is retrievable and what isn't, I could see this happening, if it hasn't already.
    • Obligatory Netscape story [jwz.org].

      Obligatory text to avoid the postercomment compression filter. I'm beginning to think that the trolls are right about Taco not being able to code; considering how much ASCII art I've seen in the last couple weeks, it's amazing that my little bit of HTML won't fly...

  • Wow... that is about all that I have to say about this. Why are they even worried about this?

    Admin - www.newspad.org [newspad.org]
    NewsPAD - the daily news source for geeks!
  • by cyberdonny ( 46462 ) on Tuesday September 04, 2001 @03:14PM (#2252685)
    If this software is so good at finding "hot" (i.e. incriminating or embarrassing) documents, how long before the virus writers will "discover" the same techniques? Rather than just SIRCAM'ing out a random file out of the My Documents folder, spider the whole hard disk, and all reachable network drives, and selectively mail out those items that score high on a "hotness" scale. This would make opening those SIRCAM attachments (using a Linux office suite, for safety...) much more rewarding...
  • About a half dozen lawyers spent weeks at a time in the period leading up to the trial in 1998 wading through many thousands of pages of printed electronic documents. "It was a lot of paper," said one former government lawyer who worked on the case.

    Which explains a lot about why litigation is so expensive! :) What I find humorous, being in the jury pool for our county's Superior Court, is that we (as a society) can afford to pay a half dozen lawyers to sit around poring over printed e-mails, but can't afford to validate parking for jurors. Assuming you can prove the validity of your evidence, I can see how a method to automate this would be very attractive.

  • by raresilk ( 100418 ) <raresilk AT mac DOT com> on Tuesday September 04, 2001 @04:12PM (#2252965)
    One criticism of the NYT article is that it makes it sound like the legal profession just yesterday caught on to digital discovery and forensics. Although there are always some Luddites out there, lawyers who do major commercial or product liability litigation have been using digital techniques for years.

    As far as user-friendly interfaces for forensic-ware, and other suggestions by comment-posters for improving the technologies, don't forget that in order to be useful to a lawyer, digital forensic evidence must be admissible in a court of law. Nobody is going to settle a lawsuit based on some damning piece of deleted email recovered from their hard drive, unless you convince them that the jury trying their case is going to see a big blow-up poster of all the bad things they said in it. In order to get that recovered data into evidence (at least in the USA), the lawyer must "lay a foundation" that the evidence has some reliability. An eyewitness to an event, for example, can testify about things she was able to see or hear from her particular location, but her testimony about what might have been happening out of her eye-earshot is not admissible in court. Another way to lay a foundation is through a qualified expert opinion, for example, an accident reconstruction expert who measures the skid marks and applies a scientific method to determine whether the car was speeding before the accident. The point being, even if I as a lawyer could read up on the relationship between skid marks and vehicle speed, make those measurements on my own, and perform the calculations just as accurately, that would not do me a bit of good. I would still have to go out and retain someone with considerable expertise in such matters in order to get the court to admit the results of the calculations into evidence, or I never get to put them on my blow-up poster for the jury. And this is not just a gimme. Especially in federal court, there are specific criteria for the qualifications that an expert must have, and the demonstrated reliability of the expert's method, before the results can be admitted in court.

    So for those of you who are devising tomorrow's user-friendly forensics - a warning. No matter how point-and-clicky you make them, my lawyer colleagues and I will likely never touch them. Even though I am technically literate enough to grep anything you can grep, I'll keep on hiring one of you technical experts when I need some digital forensics done, because I need your experience, credentials and signature to convince a court that the results are reliable and not just wishful computer hokey-pokey by a lawyer who wants her client to win. (Also, lawyers don't testify in their own cases, as a rule, for various reasons.) This is especially true with things that *sound* somewhat unreliable, like recovering from low-level formats and such. The more extrapolation and guesswork is involved in the "recovery," the less likely it is to get into evidence.

    And if you're developing a search method, or some other new technique for data recovery, keep in mind that in order to qualify yourself and the technique as proper expert testimony, you're likely going to have to disclose quite a bit about how you did it in order to lay the foundation for admissibility. You can just throw those valuable little trade secrets and patentable methods out the window. That's another reason why legal tech forensic shops tend to rely on things like grep and dd rather than innovating - where's the big payoff? Now if you don't care about admissibility, and are just mining the hard drives of your ex-employees (or ex-spouses, or whatever) for business reasons, maybe that's a different story. But most people don't think they're about to get into a lawsuit until it happens, so I wouldn't be so sure.
  • Now that is a hit load of links.
  • What is now Autonomy [autonomy.com], the knowledge management company, started about ten years ago when Mike Lynch's PhD research was sponsored by the police in the UK, to find ways to scan the mass of witness statements that are gathered in a major incident enquiry (often inconsistent, with varying content and terminology), and to automatically identify important features and cross-reference them.

    From that original start, they then (allegedly) gained the interest of the intelligence services, and then the media companies and dot coms, to become the players they are today.

  • Oh, this just screams for people to pull a NSA Line Eater [tuxedo.org]-like filler into their mail, if just to PO the lawyer at the other side of the litigation and render their indexing useless.
  • Sounds to me like electronically sifting through evidence might require breaking software protections or interpreting custom file formats. Isn't that against the law now in the USA?

    I don't think the DMCA is a good idea. In fact, I think it sucks. The best way to defeat that trash legislation is to hold EVERYONE, especially the legal community, to the letter of the language.
