Vint Cerf: Data That's Here Today May Be Gone Tomorrow

dcblogs writes "Vinton Cerf is warning that digital things created today — spreadsheets, documents, presentations as well as mountains of scientific data — may not be readable in the years and centuries ahead. Cerf illustrates the problem in a simple way. He runs Microsoft Office 2011 on Macintosh, but it cannot read a 1997 PowerPoint file. 'It doesn't know what it is,' he said. 'I'm not blaming Microsoft,' said Cerf, who is Google's vice president and chief Internet evangelist. 'What I'm saying is that backward compatibility is very hard to preserve over very long periods of time.' He calls it a 'hard problem.'" We're at an interesting spot right now, where we're worried that the internet won't remember everything, and also that it won't forget anything.

  • I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.

    • Re: (Score:2, Insightful)

      by Nerdfest ( 867930 )

      The same applies to any *open* format.

      • by KGIII ( 973947 )

        Hell, even the non-open formats are pretty easy to get to a readable level of functionality. They won't contain the markup necessarily and certain features won't be available but, frankly, if we're able to decode all the other ancient languages I'm pretty sure someone will be able to decode these as well.

        Speaking of ancient... Err.. When did Vint go to Google? That's kind of cool that he has but that is, in itself, news to me. I must have missed the announcement as I'm sure there was one.

    • In to a usable document from scratch? Pretty hard. Ever looked at the XML of a moderately complex document?

    • I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.

      Yes, the problem is not "data" but "data in proprietary formats" ... and even that is becoming less of a problem. A converter to/from almost anything is usually just a google search away. With VMs and emulators, even proprietary binary programs are easier than ever to deal with. I can run any CP/M or C64 program on my desktop Linux computer using free emulators. This was indeed a "hard problem", but today it is mostly solved.

    • I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.

      Binary formats were standard for everything up through Office 2003. Office 2007 (or 2003 with the optional converter pack and some weird bugs) could output something XML-based, though I have a vague memory from the OpenDocument/Office Open XML slugfest that 2007 produced something that deviated from the theoretical ideal of OOXML in some respects, and that full conformity happened with 2010 or 2013. I might be remembering that wrong, but anything before 2003, and a lot from 2003, was definitely binary.

    • Maybe. (Score:5, Insightful)

      by MrEricSir ( 398214 ) on Tuesday June 04, 2013 @11:03PM (#43910927) Homepage

      XML doesn't magically solve everything in this regard. If there's no good documentation for the format, it's unlikely you'll be able to display everything exactly as intended. Likewise, if the format is hideously complex (see: Microsoft Office Open XML) or there's bugs in the de-facto implementation, it's going to be tricky to reverse engineer.

      I'd also point out that MS Office spits out compressed XML. I believe it's based on ZIP, which is very well documented, but that's yet another hurdle to cross. And then you have to deal with the binary format of the XML itself -- ASCII, UTF8, etc.

      • The ZIP format is documented. The character encoding of the XML file is specified in the XML file itself, as all XML files should do.

        From what I've used so far the Open XML formats aren't hideously complex, although I've only been working with XLSX files.
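The two layers mentioned here (a ZIP container, then XML that declares its own encoding) can be peeled apart with standard-library tools. The following is only a minimal sketch: the member name `doc.xml` and its contents are invented for illustration and are not the layout of a real Office file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Build a tiny stand-in archive in memory. The member name "doc.xml" and the
# payload are hypothetical; real Office documents use different part names.
xml_text = '<?xml version="1.0" encoding="ISO-8859-1"?><note>caf\u00e9</note>'
xml_bytes = xml_text.encode("iso-8859-1")

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("doc.xml", xml_bytes)

# Layer 1: undo the ZIP compression to get the raw XML bytes back.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    raw = zf.read("doc.xml")

# Layer 2: hand the bytes to the parser, which reads the encoding
# from the XML declaration rather than being told it separately.
root = ET.fromstring(raw)
print(root.tag, root.text)
```

Getting at the text this way says nothing about interpreting the document's structure, which is the harder part raised elsewhere in the thread.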

        • ZIP format is documented.

          Right now it is. What about the ragtag bunch of misfit librarians who are all that's left after the zombie apocalypse?

          They burned all the books for warmth and to keep the zombies away.

    • Both have published specifications, so reverse engineering shouldn't be necessary. However, Microsoft's XML includes things that are not defined in the specification. That was one of the objections to giving it status as an open standard.
    • Re:XML? (Score:5, Insightful)

      by gweihir ( 88907 ) on Wednesday June 05, 2013 @01:03AM (#43911563)

      Have you seen what some people (and MS) do with XML? And what convoluted structures they use? Coded in binary? With compression and other eminently hard to understand stuff? Most of these things will be readable just as long as the applications that created them are around, but not longer.

      Forget XML. Forget Unicode as well. Plain ASCII is the only thing that works. Simple PDF or PostScript will work also, because the standards and open-source tools to read them will still be around. But nothing as complicated as a MS office document will survive. LibreOffice formats may have a chance, because LibreOffice may still be compilable and runnable (being FOSS), but only because of that and I would not bet on it.

      Incidentally, all my decades-old LaTeX documents still compile and can also be read directly. So can my 20-year-old ASCII-coded measurement data.

    • Re:XML? (Score:5, Informative)

      by Dr_Barnowl ( 709838 ) on Wednesday June 05, 2013 @04:13AM (#43912299)

      Not even Microsoft can implement their Office XML "standard" ; from examination it's pretty much a direct name-for-name serialization of their internal binary structs, with some of the more obvious gaffes like explicitly saying "do this like this old version of Word" hastily renamed to placate ISO. It needs you to implement a whole bunch of specific behaviours if you want it to work in the MS software (things like "if you update this bit, you also have to update this other bit just so or it won't work"), but these aren't documented.

      You've got more of a chance, sure, just because the structs are marked and you don't have to infer where their boundaries are, but it's a far cry from ODF which was designed from the outset to be an open XML format rather than just hastily being bunged together to permit large purchasing bodies (like governments) to tick the "Open format" box on their form.

  • by drinkypoo ( 153816 ) on Tuesday June 04, 2013 @10:14PM (#43910607) Homepage Journal

    My data will be readable because I use bog-standard formats. If I get really froggy I use HTML, and you can just strip the tags and read that.

    If his data won't be readable, that's his problem. Anything you want to save for posterity, export it now.
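As a rough illustration of "just strip the tags and read that", here is a minimal sketch using Python's standard `html.parser`. The class name `TextOnly`, the helper `strip_tags`, and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text nodes of an HTML document, dropping every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML string with all markup removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


sample = "<html><body><h1>Notes</h1><p>Saved in <b>1997</b>.</p></body></html>"
print(strip_tags(sample))
```

The words survive but the layout does not: the heading and paragraph run together, which is the trade-off of treating HTML as plain text.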

    • by Bremic ( 2703997 ) on Tuesday June 04, 2013 @10:59PM (#43910897)

      Until HTML includes DRM and half the stuff you create ends up being unreadable.

      Well, really we are probably good for anything that can be opened in a text editor for a long long while; but the point is there. Anything can be lost to data format shifts.

      As someone who had to re-type an 80-page document because the company stopped using the software the document was created in, didn't have a licence for it, and no converter found online worked, I can say this does happen.

      How many people are going to shell out $600 for software to open something they want to make an edit on? How many are going to just give up and find someone to rekey it, or just give it up as a loss?

      With more and more systems including format locks, in 50+ years historians will likely have a lot of trouble finding out details from today. Kind of like it is now when we go to look at archival film from WWII and find it's all faded into obscurity. We have the same problems, just with different causes. Then it was lack of preservation of a medium with a limited lifespan. Now it's storing stuff in formats that will go away as they are improved upon, blocked, or just forgotten about.

      Sure, if you're in your 20s, or even 30s, you probably haven't realized that the copies of your grandfather's photos are sitting on a floppy disk in a proprietary format. But when you get older you may encounter these issues.

      • by Nutria ( 679911 ) on Tuesday June 04, 2013 @11:18PM (#43911031)

        Or NASA data from deep space probes that's stored in now-unknown formats on mag tapes from long, long, long gone manufacturers.

        • Well, there's the problems with the medium itself, then there's the format, as you say (ought to be right up a cryptanalyst's alley, tho), then there's the real blocker: number of tracks, head design, and the circuitry that goes with it. Unless there are good documents for the machine's design and building, or one can be found in working order in a museum, you're SOL. It's a big problem that doesn't get much exposure.

      • From a 2002 slashdot story:

        mccalli writes :
        "Thought people might find this amusing. In 1986, the UK compiled an electronic [copy of the] domesday book. They used BBC Master computers to do it, and the result was put on laserdisc. I actually used this project whilst at school. This article states that nothing can now read these merely 15-year old discs. The original, written approx. 1086, is still doing fine thank you very much."
        Sounds like a good candidate for Bruce Sterling's Dead Media Project. (Speaking

        • In fairness they did manage to transfer the stuff off the discs and put the stuff without copyright issues online.

      • Why didn't you OCR it and then make the edits? There are numerous OCR options that would have fit that need, no?
      • "How many people are going to shell out $600 for software to open something they want to make an edit on?"

        The upside to this is that when somebody wants to update that nifty company Flash web site and discovers that Flash now costs an arm and a leg, the site gets re-written in html.

  • by smash ( 1351 ) on Tuesday June 04, 2013 @10:17PM (#43910619) Homepage Journal
    Support emulator/VM developers! Encapsulate your entire machine in a VM and you can run the entire software stack if necessary. Anything you need convenient access to, export to CSV, XML or some other standard format.
    • and when Unicode and ASCII are replaced?

      • There's a pretty good chance LaTeX will still support them. There's a reason the TeX distribution is like 2 Gigs.....
      • Unicode is a bit sprawling, but ASCII is only 128 characters (unless dealing with the wonderful world of nonstandardized non-latin extensions or ad-hoc 8-bit extensions-of-convenience is your problem, in which case I'd advise shirking your duties and drinking heavily), making preserving the whole thing even by chiselling it into stone monuments or other archaic methods potentially viable.

      • Honestly, reverse engineering ASCII plain text files would be trivial. Not to the average person, but to somebody with a bit of background:
        A) We have software that can use something called frequency analysis to decipher something encoded that has a 1-1 correspondence to something we know (i.e., the English alphabet).

        B) Ignoring software, frequency analysis is something that could be (and before the days of computers, was) done by hand. Hell, some things could be picked out by eye. For one, all files would h
        • by mrsurb ( 1484303 )
          ASCII is even easier than that - because 0-9, a-z and A-Z are represented by sequential binary numbers.
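The point about sequential codes is easy to check. The sketch below only verifies that the three runs are contiguous and shows one consequence; the helper name `digit_value` is invented for the example.

```python
import string

# In ASCII, each of these runs occupies consecutive code points.
for run in (string.digits, string.ascii_uppercase, string.ascii_lowercase):
    codes = [ord(ch) for ch in run]
    # Every code is exactly one more than the previous one.
    assert codes == list(range(codes[0], codes[0] + len(run)))
    print(run[0], "-", run[-1], ":", codes[0], "to", codes[-1])


def digit_value(ch):
    """Value of an ASCII digit, using only its offset from '0'."""
    return ord(ch) - ord("0")
```

Someone who had worked out the code for a single digit or letter would get the rest of that run for free, which is what makes the encoding so easy to rediscover.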
    • How do you encapsulate the VM so it will still work 20 years in the future?

  • by Anonymous Coward on Tuesday June 04, 2013 @10:19PM (#43910633)

    We're in a difficult spot right now because for years we ignored the warnings about 'proprietary file formats'.

    I'm not blaming Microsoft either. We let Microsoft do this to us of our own free ignorance.

    • by Lehk228 ( 705449 )
      what's this "we" shit all my files are odt, ods, html, tex or txt files. they will be just as accessible in 100 years as they are now.
      • by Tr3vin ( 1220548 )
        Without developers maintaining editors/viewers, open formats are only slightly more usable than proprietary ones. 100 years is a really long time from now as far as technology goes. I wouldn't be so quick to say that open formats will still be easily accessible.
    • And we're not doing it now with Apple products?
  • by Narcocide ( 102829 ) on Tuesday June 04, 2013 @10:24PM (#43910655) Homepage

    Yes, you're right I have this ASCII text file created in 1997 and I can't find anything to read it...


    Stop gargling Microsoft's balls so much and wipe off your chin. Proprietary data formats are THE PROBLEM. Stop trying to redirect public discourse with this thinly veiled bullshit.

    • by Nerdfest ( 867930 ) on Tuesday June 04, 2013 @10:36PM (#43910749)

      Odds are that you don't need to convince Vint Cerf or Google in general about the advantages of open formats.

    • But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.

      • by PPH ( 736903 )

        Just Googled "ebcdic to ascii converter"

        About 123,000 results.

      • But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.

        How deep are your pockets?

        *IBM Consulting*

      • by Anonymous Coward on Tuesday June 04, 2013 @11:31PM (#43911091)

        But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.

        $ printf "\xC5\xC2\xC3\xC4\xC9\xC3\x25" | iconv -f ebcdic-us -t ascii
        $ dpkg -S `which iconv`
        libc-bin: /usr/bin/iconv
        $ apt-cache show libc-bin | grep -e Essential -e Priority
        Essential: yes
        Priority: required

        So we got a program that can convert from EBCDIC-US to ASCII (or UTF-8 or whatever you want) and that program is in an Essential/Required package on any Debian-based system and for some reason you say that "aren't commonplace"?

        Are you on crack?
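The same conversion can be sketched without `iconv`. This assumes Python's `cp037` codec (IBM code page 037) is an acceptable stand-in for iconv's `ebcdic-us`; the two agree on the letters used here, but that equivalence is an assumption rather than something stated in the post.

```python
# The six letter bytes from the printf command above (the trailing 0x25 is omitted).
ebcdic_bytes = bytes([0xC5, 0xC2, 0xC3, 0xC4, 0xC9, 0xC3])

# "cp037" is Python's name for IBM code page 037, assumed here to match
# iconv's "ebcdic-us" for these characters.
text = ebcdic_bytes.decode("cp037")
print(text)
```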

    • ...everything except that Zip Drive I saved it on.

  • by Neo-Rio-101 ( 700494 ) on Tuesday June 04, 2013 @10:31PM (#43910709)

    A perfect example of this is basically the issue of old video games. (I may as well bring this up because it's going to come up)

    Recently, the Internet Archive stored a whole pile of TOSEC collections of games from various old systems (thanks to their DMCA exemption of being an archival repository so that they can legally do this). Data and information that would have otherwise been completely lost into a digital black hole, if it weren't for the fans of the system, and the dedicated teams of people collecting and amassing this software as a hobby.... in breach of copyright.

    The problem with DRM is that without dedicated crackers and pirates, unless the original rights holders are around long enough to resell old titles for that long (which most aren't), old games will simply disappear into a digital copyright black hole and never be seen again. This happens once the computer/console system is old, not sold anymore, and forgotten about, and the media degrades and isn't backed up in some form (in breach of EULA). If people aren't able to collect the software and hang on to it, preserving/duplicating the media while still in copyright, it's going to vanish. Culturally important games of significance will be lost forever, and that, if anything, is as much a crime as it is to pirate software in the first place.
    It's only due to the efforts of an army of swappers/crackers, etc, that most of the old games on old systems were even preserved.

    The steam model on PC is quite good though as it makes a few compromises where you can actually make backups and go offline if you want.
    For old computers and consoles however, this doesn't apply,.... and with some more restrictive attempts to squash the used game market, and force internet-always-connected authentication on upcoming consoles to even play the game... one has to wonder if the game companies deliberately want to squish all traces of their old work, let it disappear into the ether, and to resell you this year's football game which is just like last year's. I fear that this is where we are headed (if we aren't there already)

  • Don't forget DRM (Score:5, Insightful)

    by onyxruby ( 118189 ) <onyxruby&comcast,net> on Tuesday June 04, 2013 @10:33PM (#43910721)

    We're living in what could well be a future dark age for archeologists / historians. Hardly anything is put into a nice hard format (stone is incredibly rare and metal gets stolen) for someone to find. What's left suffers from incompatible file formats, acid based paper that decomposes, bit rot, cryptography, incompatible technology for data storage and worst of all DRM. With DRM you have active measures that try to prevent something from being usable.

    In the old days people stopped use with armed guards, obfuscation and primitive crypto. Today we have servers that are required for operational functionality for many products. With the advent of the cloud you have reasons for storing things where you have a dependency on a third party. How many services that are cloud / server based have come about and gone tits up?

    Even having a large well known brand name doesn't protect you from having a server shut down. Just think of Microsoft's PlaysForSure service that lasted less than a decade. Having a license and a physical disk isn't that helpful when the DRM requires an authentication server that doesn't exist. With the movement to put more and more DRM into the cloud or with SSL certificates (again dependent upon servers and naturally time bombed) this is going to be a problem that will only grow worse.

    Learning to break DRM is far more critical than file formats which require nothing more than a conversion tool.

    • Learning to break DRM is far more critical than file formats which require nothing more than a conversion tool.

      That is utterly a waste of time. It makes me sick to think of how much good effort is wasted jailbreaking the iPhone, when Apple could merely write a few lines of code and none of that would have been necessary. The entire jailbreak community around Apple is compensating for a few lines of code.

      I say that with complete respect for the jailbreakers, but it could be so much better.......

  • by MrBandersnatch ( 544818 ) on Tuesday June 04, 2013 @10:34PM (#43910729)

    Digital archival is one of the HARD problems. Over the last 40 years we have already lost more cultural artifacts than were created in the entirety of human history. A great deal of that is useless garbage of course but the original moon landing tape? 1000s of government emails revealing exactly what was going on at pivotal times in history?

    The truth is, we need systems for hardcopy; digital is too transient; emulators are a useful stopgap measure but don't protect against the kinds of catastrophic failures that we will likely see over the longer time frame; and we need indexing because someone at some point will want to wade through our digital detritus.

    • I've been part of archival problem planning. We went with DVD. Now that I am not there, I suspect they are thinking DVD sucks and are moving "forward" when the DVD was more than good enough and those plastic discs will last a century. MPEG-2 files will have open source decoders. Now physical readers will still be a problem... the only solution is to wait as long as possible and then switch to the next long-lasting format - but not necessarily the newest one at that time. (which is why moving to Blu-ray is a w

  • by flogger ( 524072 ) <non@nonegiven> on Tuesday June 04, 2013 @10:43PM (#43910791) Journal
    This has been true of all technology in the past and will continue into the future. Just look at film. How many preserved films from 1915 are still around? Just the ones that were recorded into a new format of film, then a newer format of film, then onto VHS, then LaserDisc, then DVD, then Blu-ray... (Metropolis, I am looking at you.)

    Within arm's reach, I have floppy disks that contain files created in the AMI Pro word processor.... When I say floppy, I am talking about the 5 1/4 inch floppies.
    Technology hardware and software is not stagnant... It will always continue to develop and progress (ignore windows 8). Data that is worth keeping will get converted. Data that isn't will get left behind. I would not be surprised that in about 25 years, there will be "classic" software as there is Classic literature...

    Too much typing.. going back to drinking.....
  • by PPH ( 736903 ) on Tuesday June 04, 2013 @10:45PM (#43910811)

    The IRS wants to audit me, going back several years. I kept the records as required but they are unreadable now.

    Thanks Microsoft!

  • That people in the far future would be getting smarter to accomplish this - probably a tossup - and apart from it, it's very questionable if a far future for humanity even exists, the way "humanity" is behaving these days/years/decades/centuries/millennia....

    Maybe there are smarter robots by then babysitting...
  • For open source. Save your files in open and/or openly defined, standardized formats and there will always be software that can deal with it.

    But I guess it's difficult for people to hear you explain that to them with their head up their ass.

    • Argument for open standards, yes. Open source, no. You don't need open source for open standards. And open source does not necessarily mean open standards.

  • I would solve this by installing a Windows XP VM with a copy of Office XP. Now that I solved Google's hard problem they must now see I am qualified to work there. Google is on a FUD rampage of which the likes I haven't seen since the great Microsoft FUD storms.
  • MS removed the PowerPoint 4.0/95 converters completely with Office 2007 for Windows and later, and disabled them by default in Office 2003 SP3. And the PowerPoint 4.0 converter (but not 95) was disabled by default instead of fixed with MS09-017.

    On the Mac, they removed them even earlier, when they ported Office to Carbon.

    IMO it would be a good idea for MS to package PP4X32 and PP7X32 from PowerPoint 2003 separately, along with a utility to call the converters of course.

  • Uh, hello? (Score:5, Funny)

    by DogDude ( 805747 ) on Tuesday June 04, 2013 @11:10PM (#43910979)
    For a supposedly smart guy, he seems a bit silly:

    He could've just downloaded MS's PowerPoint 97 viewer
  • I remember over two decades ago there was talk of making data objects, that is, data that knew how to present an object interface to get at its information. The data would contain its own reader in some ubiquitous language. But wait, we never got a ubiquitous language. Perhaps JavaScript today? But if you want to solve this problem then this is how to solve it. Or perhaps you could just package a converter to convert format XYZ to BSON as being good enough or at least better than today's breakage.

    One thi

  • Some are glass plate Daguerreotypes. Somehow, I am not too confident that my digital pictures will be legible 150 years from now, unless I make a good quality print on archival paper. Digital files are too easily corrupted and made totally useless. Media formats will change. 8" floppies anyone?
    • by AK Marc ( 707885 )
      I have gifs and jpegs from much older than the document in question, and have no trouble with any of them (and BMPs from before that). I was listening to MP3 in the 1990s. They work fine now.

      What digital picture standard (not raw) do you think you'll have any trouble reading, and roughly when do you think you'd have any trouble with it?

      Media formats will change. 8" floppies anyone?

      So you are worried that your MFM HD from the 1980s won't work, not worried about the .BMP on it not being readable, if you found some way to spin it up? I can't even tell

      • by jafac ( 1449 )

        yes - this is a real issue - and ARCHIVED data that is important DOES need to be "spun up" and refreshed to new media.

        If it's hard drives, yes. If it's optical media. . . well that depends. Because some optical media just plain degrades over time. Some is written in special proprietary formats (like Apple's early implementations of CD+R) that you're going to have a hard time reading with CURRENT equipment.

        If your data is archived to tape, and more than 10 years old, I'm afraid you're fucked.

  • by HockeyPuck ( 141947 ) on Tuesday June 04, 2013 @11:34PM (#43911109)

    We're still able to restore cars from the 80s and earlier as the cars were fully mechanical or hydraulic. No computers.

    Fast forward to 20 years from now, nobody's going to be carrying the computer boards for a 2004 Toyota Prius or a 2013 Tesla.

    However, you'll still be able to restore your grandfather's '57 Chevy...

  • by michaelmalak ( 91262 ) on Tuesday June 04, 2013 @11:35PM (#43911113) Homepage

    I presented a solution to this long-standing problem last year to the Denver HTML5 Meetup.

    Code should never be separated from data. This is possible with HTML5, JavaScript, and open source.

    In the presentation, I steal and repurpose Hofstadter's analogy of DNA to an LP vinyl record, which is an information bearer, but useless without its information retriever (the record player). Like the cell of an animal, which contains both DNA and the means to "play" it, I ask why not the same with software?

    My maxim is: data should always carry the code with it to play itself. It was inspired from the field I've spent 50% of my career in: non-destructive testing where, for example, X-Rays and ultrasounds are performed on safety-critical industrial parts with 50-year service lives. If one of those parts fails and kills someone, you're going to want to go back into the old data and find the earliest indication of the flaw or fault and reinspect every other part in the world like it that is still in service. And maybe you need to go back 50 years. Under such a context, not providing the code with the data could be considered an act of gross neglect.

    In my presentation, I use the 1990's era trick of embedding XSL into an XML file, with the addition of the XSL now being able to use HTML5/JavaScript. Sadly, I've only gotten it to work with Firefox -- the other browsers consider it a security violation.

    • From a future data recovery standpoint, how is the "code" any more useful than data? You'd still need to be able to figure out how to execute the code itself --- the code is just an especially complex and capable file format (which likely makes it very difficult to figure out if you've lost the execution instructions). Some file formats are already complete programming languages --- like PostScript. Do you think you could make much sense of a PostScript representation of a document if you started without a

      • It's an issue of installed base.

        The installed base of any given NDT system is typically less than Qty. 100, often much less. The installed base of HTML5 interpreters is on the order of a billion. The installed base of PowerPoint 97 at its peak was in the tens of millions. To be honest, I think Vint Cerf is complaining a bit much. Anyone (including him) could download the appropriate VMs, archival operating systems, and archival Microsoft Office systems to read PowerPoint 97 and even convert it to a modern f

    • by lahvak ( 69490 ) on Wednesday June 05, 2013 @12:41AM (#43911455) Homepage Journal

      No! Fail! You don't get it!

      1) Code is data
      2) Code is data that is especially hard to interpret
      3) One of the main reasons of all this mess is that in all those proprietary formats, data is intermixed with code, and the whole mess is very hard to parse.

      Data should be kept completely isolated, as far away from code as possible. That way, if you cannot interpret the code any more, you will still be able to analyze and parse the data. You know, it is not that hard to construct a record player.

  • I do blame Microsoft (Score:5, Informative)

    by Darinbob ( 1142669 ) on Tuesday June 04, 2013 @11:59PM (#43911255)

    Seriously, why would Vinton Cerf not blame Microsoft? They have an extremely poor track record with backwards compatibility, and I don't think they even know what forwards compatibility is. If you design the data formats correctly then you can keep things usable for decades (or centuries). Guess what, twenty year old TeX documents still work, and yet Word X won't work with Word X-2. I've pulled runoff documents off of 70's versions of Unix that can still be printed. That says to me that one can deal with compatibility issues.

    This is all intentional on Microsoft's part too. They make money when customers buy new copies of software, so it is in their best financial interests to make sure that customers have significant pressure to upgrade. I remember the solution to an acknowledged bug for Word 97 was to make sure that everyone who was going to read your document had the appropriate Word 97 plug in in their older version of Word. I completely blame Microsoft here.

    This is not that hard a problem, IF the company pays attention to it and gives it even a small amount of priority.

    • To say that MS has a poor record of backwards compatibility is, well, ridiculous. It's only just about *the* most important thing for them, because the majority of their business is with businesses, and if their FooBar app doesn't run, then they don't upgrade.

      No other OS has near the level of compatibility that the MS sequence does.

      • No other OS has near the level of compatibility that the MS sequence does.

        Somebody's been drinking the kool-aid.

        There's a small, little known company called IBM selling a type of computer called a "mainframe" which might beg to disagree. You can buy a modern mainframe which will still run your unmodified programs which you wrote on an original System 360. In 1964.

        Microsoft have not even existed as long as that chain of backwards compatibility, and you try getting the original digger to run on Windows 8 (or

      • The OS stays compatible in some ways (Windows is not at all unique here). However the Microsoft applications have serious problems in this regard. Maybe some of the competition is not so great either but it's no excuse when Word can't even be compatible with itself. They have changed the file format in Word in fundamental ways several times.

      • No other OS has near the level of compatibility that the MS sequence does.

        It's called ANSI C on Unix. Pick up a copy of The UNIX Programming Environment and you can still use the examples verbatim on a Linux machine today. And you can even still use Motif apps, if we're talking about GUI programs. They still work just like they did when they were new, except a hell of a lot faster.

        Oh, you want backwards compatibility for closed-source software? Guess what? Plenty of software craps itself when it does anything interesting on the wrong version of windows. In reality, there's only o

  • This problem isn't new to anyone. If it's new to you, then you need to get involved in the digital preservation movement.

  • "hard problem" (Score:5, Insightful)

    by macraig ( 621737 ) on Wednesday June 05, 2013 @12:29AM (#43911387)

    Vint, that's bullshit and you know it. It's nothing more than preserving syntaxes, grammar, file formats. That's not hard, and it only requires someone to create a format conversion ONCE to solve the problem at each stage of the evolution.

    The real problem here is proprietary non-public formats and structures. When the structure of data has been a closely guarded secret and requires reverse engineering that may not even yield a perfect result, THAT is hard.

  • by gweihir ( 88907 ) on Wednesday June 05, 2013 @12:41AM (#43911459)

    My first LaTeX publications from 20 years back and all my human-readable ASCII scientific data can still be read and used without any problem. Human-readable file formats in the UNIX tradition completely solve this problem.

    This problem is only hard if the people making the data formats are either stupid or do not want their formats to be easily accessible to other applications, as Microsoft does. Of course, others are creating just as fundamentally broken formats for either of the same reasons.

    • Just hex print the MS 97 file and you have a human readable format:

      00007b0 5f00 675f 6f6d 5f6e 7473 7261 5f74 005f
      00007c0 696c 6362 732e 2e6f 0036 5f5f 7270 6e69
      00007d0 6674 635f 6b68 6500 6978 0074 6573 6c74
      00007e0 636f 6c61 0065 626d 7472 776f 0063 706f
      00007f0 6974 646e 7300 7274 636e 7970 7000 7475
      0000800 0073 6177 6e72 0078 5f5f 7473 6361 5f6b
      0000810 6863 5f6b 6166 6c69 6900 7773 7270 6e69
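For anyone curious what such a listing actually holds, here is a minimal sketch that turns one dump line back into bytes. It assumes the listing is in the style of `od -x` on a little-endian machine, where each four-digit group is a 16-bit word with its two bytes swapped; the function name `words_to_bytes` is invented for the example.

```python
def words_to_bytes(dump_line):
    """Turn one dump line (offset followed by 4-hex-digit words) into bytes.

    Assumes each word is a 16-bit little-endian value, so the second
    hex pair of every group is the first byte in the file.
    """
    fields = dump_line.split()
    out = bytearray()
    for word in fields[1:]:          # fields[0] is the offset column
        value = int(word, 16)
        out.append(value & 0xFF)     # low byte comes first in the file
        out.append(value >> 8)       # high byte follows
    return bytes(out)


line = "00007b0 5f00 675f 6f6d 5f6e 7473 7261 5f74 005f"
raw = words_to_bytes(line)
print(raw)
```

Under that assumption the first line contains the text `__gmon_start__` between two zero bytes, so the dump is recoverable as characters even though it is no more readable as a document.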
  • by stenvar ( 2789879 ) on Wednesday June 05, 2013 @12:49AM (#43911497)

    You can get emulators for just about every machine you can imagine: PDP-10, PDP-11, DOS, Atari, Amiga, C64, microcontroller, etc. You can get hardware emulators with FPGAs if you like. Almost any important format is documented or has been reverse engineered. Yes, you can easily read 1997 PowerPoint files, even if his weird choice of Office on Mac can't. And that's only with current technology. Give it a few decades and all that can happen behind the scenes: computers will just automatically perform even the most complicated data conversions. "Computer, scan the 1997 floppy and put the data on screen."

  • by hduff ( 570443 ) on Wednesday June 05, 2013 @08:04AM (#43913197) Homepage Journal

    The best safeguard is the abandonment of all existing proprietary formats to freedom (so anybody can write conversion software) and the proliferation of open formats on an ongoing basis.
