Most Digital Content Not Stable

Posted by Zonk on Tue Mar 20, 2007 11:55 AM
from the define-stable dept.
brunes69 writes "The CBC is running an article profiling the problems with archiving digital data in New Brunswick's provincial archives. Quote from the story: 'I've had audio tape come into the archives, for example, that had been submerged in water in floods and the tape was so swollen it went off the reel, and yet we were able to recover that. We were able to take that off and dry it out and play it back. If a CD had one-tenth of one per cent of the damage on one of those reels, it wouldn't play, period. The whole thing would be corrupted'. Given the difficulties with preserving digital data, is it really the medium we should be using for archival purposes?"
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by iamacat (583406) on Tuesday March 20 2007, @11:57AM (#18416771)
    That content can not be preserved at all. We'll be a civilization without written history, like American Indians.
    • And if they didn't insist on DRM in their smoke signals, they might still be a pretty formidable group today.
        • Re: (Score:3, Informative)

          While whites did enough evil, like stealing the whole country, American Indian writing systems were actually developed by missionaries.

          I think that was the point behind "depending on how you define American" -- the GP was referring to the urbanized cultures of Mexico, Central and South America that had writing systems that they were forced to give up along with the rest of their culture.
            • by saforrest (184929) on Tuesday March 20 2007, @01:03PM (#18417971) Homepage Journal
              If by forced you mean they lost the war then yes, they were forced. If somebody tried to claim your land would you ever stop fighting? I know I would stop when I was dead. They were just pussies. If they had any conviction we would be at war with them today.


              Ballsy words for an Anonymous Coward. Hopefully you'd stick to them if your hometown were invaded.
              • Re: (Score:3, Informative)

                To them, it was impossible to 'claim' their land -- since they didn't consider it 'their' land.

                Best summed up by Chief Seattle, in 1854: "This we know: the earth does not belong to man, man belongs to the earth. All things are connected like the blood that unites us all. Man did not weave the web of life, he is merely a strand in it. Whatever he does to the web, he does to himself."
        • Re: (Score:3, Insightful)

          While whites did enough evil, like stealing the whole country

          Well, I'm 1/8th Native American (but 7/8ths White) if that counts for anything, but this is always overblown. Whites/Europeans came in and conquered the land. That's what people have done throughout all of recorded history. The Romans conquered the Greeks, the Normans conquered the Saxons, etc. The list goes on and on. The case has ALWAYS been that if some other nation wanted your land and you couldn't stand up to them in a mil
          • The tragedy of what was done to the Native Americans isn't that Europeans came in and conquered them. It's the way they were treated afterwards. I don't think anyone can read about the Trail of Tears and not feel something. You can't confuse war with murder. There is a difference.

            That being said. What's done is done. It should be remembered so we learn from those horrible mistakes. It shouldn't be a constant source of guilt to be used against people that had no part in it. The same goes for slavery, genocid
          • by vertinox (846076) on Tuesday March 20 2007, @01:25PM (#18418377)
            The Romans Conquered the Greeks, the Normans conquered the Saxons, etc. The list goes on and on. The case has ALWAYS been that if some other nation wanted your land and you couldn't stand up to them in a military confrontation, then you were gonna lose that land.

            As a person who loves to study European antiquity I would point out some flaws in this thinking...

            1. When the Romans conquered the Greeks they actually adopted Greek culture and didn't kill off the Greeks.
            2. When the Normans conquered the Saxons they didn't kill off the Saxons, nor did they really conquer their land so much as intermarry with them (hence Anglo-Saxon culture).

            The only wholesale genocides that history can come up with are the Crusaders' massacre of Jerusalem (which wasn't so much hatred of Muslims as it was starving Europeans killing off everyone in the city regardless of religion, out of rage at having had to starve in the desert for several months) and the Mongol sack of Baghdad, which wasn't so much over land as out of spite for the execution of Mongol diplomats (considering they burned and salted the lands, the "take your lands" point of conquering was sort of a non-issue).

            Genocide and seizure of lands on this scale was never really seen until the colonization of the Americas. It wasn't so much that the Indians could not defend them as it was that the westerners thought they were subhuman.

            Which sadly we saw again in the European theatre in WW2.
            • Re: (Score:3, Insightful)

              It wouldn't be such a big deal if history books and text books didn't lie so badly about it to make us "feel good" about it. Lies like the natives were "uncivilized", in various forms mostly, when in fact the early settlers here learned an amazing amount from the natives, including some fundamental concepts of democracy. When you have textbooks teaching that the Boston Tea Party perpetrators dressed up as natives to "disguise" themselves, and the actual reason was that the native american was a symbol of f
  • by WinterSolstice (223271) on Tuesday March 20 2007, @11:57AM (#18416781)
    Isn't that the point of digital? Lossless copies are possible (depending on format obviously). Why have one plastic cylinder that can be lost when you can have it in 5 or 10 locations?
    • by t00le (136364) on Tuesday March 20 2007, @12:02PM (#18416877)
      Any good backup strategy will have multiple media types, so CD/DVD should not be your primary backup media type. If you prefer to have a medium for fast access, then it is still viable. As long as it is not your primary media type, which should be something with tried-and-true longevity.

    • by elrous0 (869638) * on Tuesday March 20 2007, @12:07PM (#18416969)
      Yes, analog tape is durable. But let's take it and that "CD" and put them in front of a large electromagnet and see how each fares.
    • I don't know what it is with /. but it seems this kind of infopocalypse story comes up at least once every 6 months in regards to digital data. I can only think one thing in each case: This is fucking retarded.

      As you said, the great thing about digital data is that it can be replaced cheaply, perfectly, and spread around. Its resilience isn't in the one copy lasting 1000 years, it is in having copies everywhere, so no event short of nuclear war can eliminate them all, and maybe not even then.

      This also is th
      • This also is the response to the other big cry-wolf thing, "What happens when the data is in a format that's too old???!!11one" The answer is we just keep copying it to new formats. I have digital copies of papers that I wrote in high school. They were written on an old copy of Works for Windows 3.1 and usually saved to floppy. I don't have a floppy any more but it isn't a problem. I long ago transferred them to a harddrive and I just keep transferring them to new drives when I get them. I also periodically load the old documents into whatever my current word processor is, convert them, and re-save them as a new format.

        I think you're missing an important element here. As you move along in time, the volume of data that must be converted to the format du jour only gets bigger and bigger.

        For a single person, it's probably not too bad. I, too, have pretty much everything I ever wrote since I first got a computer, and every few years I've committed to rolling the whole thing onto new media. So I've gone from offline backups on floppies, to Zip disks (in retrospect a mistake), to CDs, to DVD-R, and now to DVD+R (the -R discs were crappy and I've since heard that +R is a superior format anyway). This isn't much trouble, because the amount of data I have to backup hasn't really grown that much faster than the data density of available media. I'm probably up to a couple of DVDs for the stuff I really, really care about, maybe a binder if I include all the photos and video.

        But what's a basic Saturday-afternoon copy-and-burn job for an individual is a Sisyphean task for a large government agency or library, particularly one who is constantly generating new content. I've seen places that could barely keep up with archiving the stuff they were producing, much less roll their vast archives forward onto new media. So they'd have vaults of hard drives, sitting next to DLT cassettes, next to IBM 3480, next to racks of old half-inch open-reel tapes. Probably back in some dark corner there were piles of punched cards; it really wouldn't surprise me. The problem of data loss due to unreadable formats isn't some abstract 'maybe,' it's already happened in a lot of places (but nobody really wants to talk about it, so it mostly gets buried and whatever's on the tapes gets written off).

        The reason why there's so much interest in preservable formats is because while it may not be strictly impossible to constantly roll old backups and archives forward, it's very hard, and requires vast amounts of effort and expense. If you have a backup that's being written into a format that you know is going to be readable for a long time, even if it's more expensive to write initially, you can save a lot of money and time down the road by not having to copy it forward as often.

        People may get a little shrill when they're talking about these issues, but they're quite real.
    • by gad_zuki! (70830) on Tuesday March 20 2007, @12:19PM (#18417187)
      The cost of multiple backups is very real. The real issue here is that this is a frivolous complaint. First off, wet tape being readable is an artifact of the medium. The Rosetta Stone in the British Museum is pretty readable but we aren't exactly throwing out our modern media to go back to stone. Also, let's consider a reel to reel tape is about 90 minutes (7inch). 650 megabytes on a standard disc at encoding similar to the quality you get out of a reel to reel tape is something like 1,500 minutes. And it's smaller. So let's not go a little too crazy with idealizing the past.

      Also I'm certain for every analog horror story there is a digital lucky story (and vice versa). Not to mention digital encodings usually have some kind of redundancy. A small scratch does nothing but the same scratch on an LP forever destroys some part of the track. I won't even go into the magic of data restoration (which the author ignores). There's really no 'tough medium for the ages' out there that can do it all. Just complaints and blind-luck stories.
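For anyone who wants to sanity-check the 650 MB vs. 1,500 minutes figure, here is a rough back-of-the-envelope sketch. The assumptions are mine, not the poster's: 1 MB is taken as 10**6 bytes, and "CD quality" means uncompressed 44.1 kHz, 16-bit stereo PCM. The function names are made up for illustration.

```python
def minutes_of_audio(capacity_bytes, bitrate_bits_per_s):
    """How many minutes of audio fit in capacity_bytes at a given bitrate."""
    return capacity_bytes * 8 / bitrate_bits_per_s / 60


def implied_bitrate(capacity_bytes, minutes):
    """Bitrate (bit/s) needed to fit `minutes` of audio into capacity_bytes."""
    return capacity_bytes * 8 / (minutes * 60)


CAPACITY = 650 * 10**6       # assumption: 1 MB = 10**6 bytes
CD_PCM = 44100 * 16 * 2      # uncompressed CD audio: 1,411,200 bit/s

# Uncompressed PCM: roughly an hour of audio in 650 MB of data.
print(round(minutes_of_audio(CAPACITY, CD_PCM), 1))         # 61.4

# Fitting 1,500 minutes means a compressed stream of roughly 58 kbit/s.
print(round(implied_bitrate(CAPACITY, 1500) / 1000, 1))     # 57.8
```

So the 1,500-minute number only works with fairly heavy lossy compression; whether that matches reel-to-reel quality is a judgment call the arithmetic can't settle.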
    • Exactly! Why store it on plastic at all?

      What I do is take files I care about, encrypt them, rename the file to something tempting like "Cheerleader Sex Orgy XXXIV.avi," note the MD5 (sticky note on the next of the monitor), and share it on a P2P network.

      Instant distributed backup! 8D
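The "note the MD5" step is the part that makes this joke actually work as a backup check: when a copy comes back from wherever, you re-hash it and compare against the sticky note. A minimal sketch using Python's standard `hashlib`; the helper names `file_digest` and `matches_note` are hypothetical, and MD5 is used only because the comment mentions it (it catches accidental corruption, not deliberate tampering).

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_note(path, noted_digest, algorithm="md5"):
    """True if the file's digest equals the digest written down earlier."""
    return file_digest(path, algorithm) == noted_digest.strip().lower()
```

Usage would be something like `matches_note("retrieved_copy.bin", "5d41402a...")`, with the full digest from the note; a mismatch means the retrieved copy is not the file you shared.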
      • "Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it " --- Linus Torvalds
      • Re: (Score:3, Insightful)

        You assume though that the digital format you've chosen will be readable decades later. The details of the encoding method may be forgotten or even hidden behind DRM laws and the physical means of reading them may be lost as the technology changed. How many 5.25" floppy drives do you still see? I think NASA has faced this issue with old Apollo data from the 60s.
      • by eno2001 (527078) on Tuesday March 20 2007, @12:39PM (#18417533) Homepage Journal
        Digital media is OK, it's the storage that sucks. That's your basic point. But I have to disagree with you on the ubiquity of CD-ROM and DVD-ROM drives. Trust me... of all those devices that exist today, you'll only find less than 1% in a serviceable state in another 75 years. What we really need is a self-replicating storage system that builds copies of itself. I propose that for proper storage of digital information, we should really be looking at systems that can store the data in a sequential chemical form (to represent the bits). These systems should be very compact and only contain a limited set of data + the ability to copy that data to neighboring units. (Death by a thousand paper cuts sort of thing) These small systems would be contained within larger systems whose sole responsibility would be acquiring the necessary physical resources (complex matter that could be broken down into the base chemicals needed by the smaller storage systems).

        The larger systems could also provide mirroring by interfacing with each other as directed by chemical interactions in order to preserve original data as well as integrate new data that may be useful in assuring that future units are even more resilient to any sorts of flaws or possible malfunction caused by inappropriate chemical input. The key to all of this is going to be to make sure that the larger units are impelled to continue the duplication and exchange of data ad infinitum. To do that, there should be some sort of mutual benefit that the engaged units acquire from the mirroring. Multiple levels of mutual benefit would likely be more successful than just one level. So I propose that at a base level, the units should be programmed with routines that make them feel more or less successful whenever a mirroring connection is attempted. I know that sounds strange, but it should be a pretty simple subroutine and will at least get the units to attempt mirroring.

        The next level would also be an expansion of the data mirroring to the actual manufacture of a tertiary (or even more) unit that contains selected data from both origination units. As part of the mutual benefit relationship between units, the origination units should be programmed to protect the manufactured unit in order to safeguard its data as it would be the freshest copy (chemically speaking) and therefore more viable. So the relationship between origination units and next generation manufactured units would be that of security and stability from the origination units as applied to the next generation.

        Another aspect to all of this that would add even more value would be to provide the larger units with various sensors that would store ANY and ALL possible forms of energy radiation and chemical exposure to the environment. This would assure that the units would not only contain the originally stored data, but would be constantly gathering the data in a parallel fashion in every corner of the world where the units are deployed.

        As you can see, this would ensure after several generations, that all the original data is intact and could simply be retrieved by reading all units' chemical stores simultaneously and reassembling the original data as well as newly stored information. Imagine that... a sensor array that spans the planet with historical functions as well. And all self-sustaining and chemically based.
        • All of which indicates that digital is not a preferable mechanism for recording, but only for working copies and transmission. The very process of converting from analog to digital automatically results in tremendous data loss the moment you do it when you get right down to it.

          You're assuming the source is analog... what about material that is no different in digital than in analog... if I write a book, or an application, what if the source is a picture, video or audio but one that was originally created o

  • by IckySplat (218140) on Tuesday March 20 2007, @11:58AM (#18416785)
    Stone tablets. Just drill a hole for a zero and you're away and laughing
    Now we just need a large enough area to store them :)
    • Re: (Score:3, Insightful)

      Rather glib, but a very important point. The biggest problem is data density. The higher the data density, the less damage it takes to destroy it. The other upside to digital data is the ability to build in fault tolerance. CDs, for example, are fault tolerant. They can accommodate a certain number of scratches and bad blocks and still produce 100% accurate output. On the other hand, this tolerance comes at the expense of (wait for it) data density. The upside to analog data, is that damage distorts w
    • This is a dual problem:

      1) Digital data needs to be moved about once every 5 years onto a new physical store, disk, whatever. Think of the amount of data sitting around on floppy disks that is being lost as we speak.

      2) Data has to be recorded in a way that presumes whatever software you use to create it will not exist in the future. Anyone who saved their life's work in some ancient binary word processor file will know what I mean. For most computer-based data storage that requires data be stored s

  • oh, just (Score:3, Interesting)

    by superwiz (655733) on Tuesday March 20 2007, @12:00PM (#18416817) Journal
    let's play it all by memory. Seriously, do we really have a choice? The more densely we pack the information the more of a chance it has for corruption. The "CD" mentioned by the article has effectively 700 minutes of music of the same quality as the 60 minute tape.
  • 3.5" (Score:5, Funny)

    by otacon (445694) on Tuesday March 20 2007, @12:00PM (#18416825) Homepage
    At the enterprise level we use 3.5" 1.44MB Floppy drives in an elaborate redundant array. It consists of roughly 70,000 Disks, each changed nightly. We haven't had any problems yet. Hopefully the rest of the world will play catch up soon.
  • by Anonymous Coward on Tuesday March 20 2007, @12:00PM (#18416827)
    Ridiculous. It's not the fact that content is digital, it's the fact that the media being used to store the information (CDs etc) is fragile. If these mythical audio tapes had been digital tapes, recovering the signal from them would have been just as easy.
    • Re: (Score:3, Insightful)

      But it *is* that the content is digital.

      Those audio tapes were "recoverable", but I bet they didn't sound all that great. Good enough to be understood, but nowhere near the original quality. An analog signal that is "garbled" is still usable.

      If there had been *digital* data on those tapes, then it's pretty likely that enough of the data had been corrupted that the files would have been *unusable*. Once the bits are gone, they're gone. Throw in the fact that there's no guarantee that the encoding and file for
      • Re: (Score:3, Informative)

        Nothing is wrong with digital it all depends on the medium.

        It depends on the content as well. Content that is inherently analog tends to be more 'robust' in analog form. For instance, in the military they say [wired.com], "A computer with a bullet in it is just a paperweight. A map with a bullet in it is still a map."
  • by dave420 (699308) on Tuesday March 20 2007, @12:00PM (#18416839)
    ... wasn't *exactly* what you put on. You have the appearance of stability, that you can retrieve something off a damaged tape, but the truth is something different. That's the beauty of analogue. The same simplicity and fault-tolerance of the format also means the format will naturally degrade over time. The contents may be retrievable, but they've degraded, and as such are not the same contents as when first written. Digital fails, but when it doesn't fail, you have exactly the same content as you did when you started. Archivists will not run from digital - their techniques will improve instead. or something.
  • First (Score:3, Interesting)

    by WormholeFiend (674934) on Tuesday March 20 2007, @12:00PM (#18416841)
    we need to realise that nothing lasts forever.

    Then, we can figure out the most cost-effective medium to record stuff on, with determined re-archival cycles.
  • by webword (82711) on Tuesday March 20 2007, @12:02PM (#18416871) Homepage
    Shouldn't it be possible to take all the media and just crush it? You know, like throw it into a Mega Power 3000 Digital Garbage Collector (TM) and crush it into a diamond or something? Let future generations figure out how to decompress it.
  • by Red Flayer (890720) on Tuesday March 20 2007, @12:04PM (#18416905) Journal

    If a CD had one-tenth of one per cent of the damage on one of those reels, it wouldn't play, period.
    That's because you're trying to optically read through the damaged part. It is possible to recover data from damaged discs, as long as only the coating (and not the reflective surface) is damaged. It is quite possible to polish the surface and read the data, or even to fill in some of the damage and repolish for reading.

    Just because it's harder to recover the data doesn't mean it's impossible.

    Of course, anyone using CDs or DVDs for large data backup must have a lot of interns to do the disc swapping.
    • by Criffer (842645) on Tuesday March 20 2007, @12:16PM (#18417129)
      Exactly. If you try to put a bent CD into a CD drive, you're obviously not going to be able to read it. But that doesn't mean it's not recoverable.

      To recover data from a CD, you can simply photograph it at high enough resolution. Even with huge scratches, even with parts of the disc physically missing, you can recover the data exactly as it was encoded. How? Reed Solomon code [wikipedia.org] .
      Quoth wikipedia:

      The result is a CIRC that can completely correct error bursts up to 4000 bits, or about 2.5 mm on the disc surface. This code is so strong that most CD playback errors are almost certainly caused by tracking errors that cause the laser to jump track, not by uncorrectable error bursts
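To make the burst-error point concrete, here is a toy sketch of the general idea behind cross-interleaved coding. To be clear about what it is not: it does not implement Reed-Solomon or the real CIRC used on CDs. It swaps in a much simpler Hamming(7,4) code (which fixes one flipped bit per 7-bit codeword) plus a basic block interleaver, purely to show why spreading codewords across the medium lets a contiguous "scratch" be corrected. All function names are made up for illustration.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits [d1, d2, d3, d4] as the 7 bits p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3      # 1-based position of the bad bit, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def interleave(codewords):
    """Emit bit 0 of every codeword, then bit 1 of every codeword, and so on."""
    return [cw[i] for i in range(7) for cw in codewords]


def deinterleave(stream, count):
    """Inverse of interleave() for `count` codewords."""
    return [[stream[i * count + k] for i in range(7)] for k in range(count)]


def protect(nibbles):
    return interleave([hamming74_encode(n) for n in nibbles])


def recover(stream, count):
    return [hamming74_decode(cw) for cw in deinterleave(stream, count)]


nibbles = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1], [1, 0, 0, 0]]
stream = protect(nibbles)

# Simulate a scratch: flip 5 consecutive bits on the "medium".
damaged = list(stream)
for pos in range(12, 12 + len(nibbles)):
    damaged[pos] ^= 1

# Each codeword took at most one hit, so every nibble is recovered exactly.
assert recover(damaged, len(nibbles)) == nibbles
```

With five codewords interleaved, any run of up to five adjacent damaged bits lands on five different codewords, one bit each, which is exactly what the inner code can repair. The real CD scheme follows the same principle with far stronger codes and much deeper interleaving.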
  • by Waffle Iron (339739) on Tuesday March 20 2007, @12:04PM (#18416913)
    The CD wouldn't play with an off-the-shelf CD player. That doesn't mean that a special "archaeological" CD player can't be built that would perform extensive microscopic image analysis of the disk surface in order to read the data in the face of extensive corruption.

    Some analog technologies, like old color films, have also degraded and need image enhancement to recover the original content.

    • a special "archaeological" CD player
      I believe they exist already - just as there exist devices for reading fragments of shattered hard drives. Forensic data recovery experts have some pretty funky kit at their disposal.
  • by phantomfive (622387) on Tuesday March 20 2007, @12:10PM (#18417029) Homepage Journal
    Have people already forgotten the advantage of digital? If you have an analog tape, every time you make a copy of it, the quality will be degraded. But with digital, you can make a million copies and the final copy will be the byte by byte equivalent of the original. So what if CDs only last 10 years before becoming unusable? You can make another copy! So what if this guy wouldn't have been able to recover after physical damage to his media....if it was important, he should have had digital offsite backups! And those backups would have been 100% equivalent to the originals.
  • 1% = Total Loss? (Score:4, Interesting)

    by JesseL (107722) on Tuesday March 20 2007, @12:12PM (#18417057) Homepage Journal
    If losing 1% of the data on a CD means the data is a total loss, doesn't that say to you that you should be using a file system and data formats with more redundancy and parity?

    Of course for the ultimate in durable electronically readable storage you should be burning everything to PROMs [wikipedia.org].
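As a rough illustration of the "more redundancy and parity" idea, here is a minimal sketch of single-block XOR parity, the same basic trick RAID-style layouts use. It assumes equal-sized blocks and that you know which block went missing; the function names are hypothetical, and a real archival format would want something stronger than one parity block.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def add_parity(blocks):
    """Return the data blocks followed by one parity block."""
    return list(blocks) + [xor_blocks(blocks)]


def rebuild(stored, missing_index):
    """Reconstruct the block at missing_index from all the other stored blocks."""
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return xor_blocks(survivors)


data = [b"ARCH", b"IVES", b"1854"]
stored = add_parity(data)

# Pretend block 1 is unreadable; the other blocks plus parity give it back.
assert rebuild(stored, 1) == b"IVES"
```

The cost is one extra block of storage for every group of data blocks, which is the density-for-durability trade that keeps coming up in this thread.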
  • Apples and Oranges (Score:3, Informative)

    by Overzeetop (214511) on Tuesday March 20 2007, @12:16PM (#18417133) Journal
    The tape had analog data on it. Analog, as we all know from years of television and radio, is very forgiving of damage. CDs are digital data. There is error correction, but for normal playback/reading devices there is a limit beyond which they simply give up trying. The data is perfect or it's gone for those machines.

    Sad to say, tape dies too.

    What is more interesting is the use of compression (and rights management, though if your originals are encrypted you deserve to get screwed - physical security comes first). With analog and simple stream encoding of time domain data (such as audio recordings) much data can be recovered using an external benchmark for the time code. Compress that data and lose your parity and you're totally hosed.

    I've never been a proponent of compressed or encoded backups. Sure they save space and add a layer of "security", but that comes at the cost of flexibility should damage occur.

    Of course, as has certainly already been mentioned - with digital data, you have the luxury of making multiple perfect copies as well as the ability to perform automated checks of that data, mostly possible without user interaction necessary.
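The "automated checks" part might look something like the sketch below: hash every copy and flag whichever ones disagree with the majority. This is only an illustration under my own assumptions (copies small enough to hold in memory, at least three copies so a majority exists); `audit_copies` and the site names are invented.

```python
import hashlib
from collections import Counter


def audit_copies(copies):
    """Given {name: bytes}, return (majority_digest, sorted names that disagree)."""
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in copies.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    suspects = sorted(name for name, d in digests.items() if d != majority)
    return majority, suspects


copies = {
    "site_a": b"minutes of the 1854 council",
    "site_b": b"minutes of the 1854 council",
    "site_c": b"minutes of the 1854 counci\x00",   # one corrupted byte
}
majority, suspects = audit_copies(copies)
print(suspects)   # ['site_c']
```

A flagged copy can then be overwritten from one of the copies that agree, which is the no-human-needed repair loop the comment is getting at.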

    Otherwise, stone tablets have the best track record so far, though the storage density is a bit on the light (or should I say heavy?) side.
  • ...the solution is simple. We need a way to take a quantum snapshot of the whole of the Earth at least once every 24 hours and then to send that data out into space as a broadcast in all directions. To retrieve the quantum structure, we'd simply pop out of a wormhole near where the data is passing and retrieve it, then retransmit it back to here and reconstruct the Earth as it was before catastrophe struck. The nice thing about this is that if we can find another M class star like Usolia (our sun), we don't even have to beam the data through the wormhole. We could just intercept it near the star and start the assembly process there. Point-in-time restores for the whole of the planet. Imagine that. You're welcome.
    • Point in time restoration, brings back all the bugs and vulnerabilities too. Unless you could apply all the security patches released after you have check pointed Earth, it will be pwned in no time.
    • We need a way to take a quantum snapshot of the whole of the Earth at least once every 24 hours and then to send that data out into space as a broadcast in all directions. To retrieve the quantum structure, we'd simply pop out of a wormhole near where the data is passing and retrieve it, then retransmit it back to here and reconstruct the Earth as it was before catastrophe struck.

      That service is already available [magrathea.px]. However, only the ultra-rich can afford it, and what with the whole galaxy in a bit of a rec

  • by hopbine (618442) on Tuesday March 20 2007, @12:23PM (#18417245)
    In the 1980s they digitized the Domesday Book. Trouble was the format they used is now obsolete. The good news (apart from still having the original) is that they have re-invented the wheel. See http://news.bbc.co.uk/2/hi/technology/2534391.stm [bbc.co.uk] for details.
  • Umm.. (Score:5, Insightful)

    by phasm42 (588479) on Tuesday March 20 2007, @12:25PM (#18417287)
    If a CD had been submerged in water, it would've been fine. There's no point in making the comparison if it wouldn't have been damaged in the first place. They need to find a better example.
  • by zuki (845560) on Tuesday March 20 2007, @12:28PM (#18417335) Journal
    There is much that has already been documented and guidelines exist [cdpheritage.org] to guarantee somehow the short to medium-term preservation of digital assets; this particular link is for audio-related digital assets, but data is all the same...!

    A combination of multiple sets of magneto-optical and tape backups maintained in separate locations, all temperature and humidity-controlled environments should easily yield 25~30 years shelf life, which guarantees that by then we'll hopefully have found better long-term options to transfer these to.

    I am transferring most of my 15 to 20-year old audio DAT tapes digitally with no problems. Good brand-name CD-Rs (like Taiyo Yuden) kept out of the light and at a steady temperature seem fairly resilient so far, but there have been batches which over time have developed 'rot' or layer oxidation, which sometimes renders them partially or wholly unusable.

    DLT tapes are so far the most trouble-free type of media I have encountered, but with only 10 years to go back on, not sure that is accurate.

    Z.
  • by mihalis (28146) on Tuesday March 20 2007, @12:39PM (#18417543) Homepage

    I know I'm offtopic, injecting facts into this debate, but I thought it might be interesting to bring up the VXA tape format. It allegedly survives all kinds of abuse like freezing, see Freezing Test [exabyte.com]

    I have never tried these drives, and would love to hear from someone independent who has.

  • by jeevesbond (1066726) on Tuesday March 20 2007, @12:41PM (#18417569) Homepage

    Chappies in New Brunswick:

    'I've had audio tape come into the archives, for example, that had been submerged in water in floods and the tape was so swollen it went off the reel, and yet we were able to recover that. We were able to take that off and dry it out and play it back.

    From an earlier /. article:

    No problem. You reach for your back up tapes only to find out that the information on the tapes is unreadable.

    Quick someone tell the author of: 'So You've Lost a $38 Billion File [slashdot.org]' that everything is alright! New Brunswick had data that was submerged in water, tape so swollen it was off the reel; they still managed to recover it.

    And don't come out with that: 'Polar Bear ate the backup tape' excuse again!