The Internet Data Storage

Vint Cerf: Data That's Here Today May Be Gone Tomorrow 358

dcblogs writes "Vinton Cerf is warning that digital things created today — spreadsheets, documents, presentations as well as mountains of scientific data — may not be readable in the years and centuries ahead. Cerf illustrates the problem in a simple way. He runs Microsoft Office 2011 on Macintosh, but it cannot read a 1997 PowerPoint file. 'It doesn't know what it is,' he said. 'I'm not blaming Microsoft,' said Cerf, who is Google's vice president and chief Internet evangelist. 'What I'm saying is that backward compatibility is very hard to preserve over very long periods of time.' He calls it a 'hard problem.'" We're at an interesting spot right now, where we're worried that the internet won't remember everything, and also that it won't forget anything.

  • What's a Macintosh (Score:0, Insightful)

    by Anonymous Coward on Tuesday June 04, 2013 @10:15PM (#43910611)

    What's a Macintosh?

    Whatever it is, I bet if he used LaTeX+Beamer he wouldn't have this problem. Whether it was authored in 1997 or 2011, it almost certainly would still work on a "Macintosh". Maybe he could learn a thing or two from Donald Knuth and Leslie Lamport, and stop playing around with the rugrats at Google.

  • by Anonymous Coward on Tuesday June 04, 2013 @10:19PM (#43910633)

    We're in a difficult spot right now because for years we ignored the warnings about 'proprietary file formats'.

    I'm not blaming Microsoft either. We let Microsoft do this to us of our own free ignorance.

  • Re:So? (Score:5, Insightful)

    by MrBandersnatch ( 544818 ) on Tuesday June 04, 2013 @10:23PM (#43910649)

    I think you will find that there's a little known branch of academia called "history" which sometimes takes a curious interest in even the most trivial of past information.....

  • by Narcocide ( 102829 ) on Tuesday June 04, 2013 @10:24PM (#43910655) Homepage

    Yes, you're right I have this ASCII text file created in 1997 and I can't find anything to read it...

    OH WAIT ACTUALLY FUCKING *EVERYTHING* STILL READS IT.

    Stop gargling Microsoft's balls so much and wipe off your chin. Proprietary data formats are THE PROBLEM. Stop trying to redirect public discourse with this thinly veiled bullshit.

  • Don't forget DRM (Score:5, Insightful)

    by onyxruby ( 118189 ) <onyxrubyNO@SPAMcomcast.net> on Tuesday June 04, 2013 @10:33PM (#43910721)

    We're living in what could well be a future dark age for archeologists / historians. Hardly anything is put into a nice hard format (stone is incredibly rare and metal gets stolen) for someone to find. What's left suffers from incompatible file formats, acid-based paper that decomposes, bit rot, cryptography, incompatible technology for data storage and, worst of all, DRM. With DRM you have active measures that try to prevent something from being usable.

    In the old days people stopped use with armed guards, obfuscation and primitive crypto. Today we have servers that are required for operational functionality for many products. With the advent of the cloud you have reasons for storing things where you have a dependency on a third party. How many services that are cloud / server based have come about and gone tits up?

    Even having a large well known brand name doesn't protect you from having a server shut down. Just think of Microsoft's play4sure service that lasted less than a decade. Having a license and a physical disk isn't that helpful when the DRM requires an authentication server that doesn't exist. With the movement to put more and more DRM into the cloud or with SSL certificates (again dependent upon servers and naturally time bombed) this is going to be a problem that will only grow worse.

    Learning to break DRM is far more critical than file formats which require nothing more than a conversion tool.

  • Re:XML? (Score:2, Insightful)

    by Nerdfest ( 867930 ) on Tuesday June 04, 2013 @10:34PM (#43910737)

    The same applies to any *open* format.

  • by Bremic ( 2703997 ) on Tuesday June 04, 2013 @10:59PM (#43910897)

    Until HTML includes DRM and half the stuff you create ends up being unreadable.

    Well, really we are probably good for anything that can be opened in a text editor for a long long while; but the point is there. Anything can be lost to data format shifts.

    As someone who had to re-type an 80-page document because the company stopped using the software the document was created on and didn't have a licence for it, and no converter found online worked - I can say this does happen.

    How many people are going to shell out $600 for software to open something they want to make an edit on? How many are going to just give up and find someone to rekey it, or just give it up as a loss?

    With more and more systems including format locks, in 50+ years historians will likely have a lot of trouble finding out details from today. Kind of like it is now when we go to look at archival film from WWII and find it's all faded into obscurity. We have the same problems, just with different causes. Then it was lack of preservation of a medium with a limited lifespan. Now it's storing stuff in formats that will go away as they are improved upon, blocked, or just forgotten about.

    Sure, if you're in your 20s, or even 30s, you probably haven't realized the copies of your grandfather's photos are sitting on a floppy disk in a proprietary format. But when you get older you may encounter these issues.

  • Maybe. (Score:5, Insightful)

    by MrEricSir ( 398214 ) on Tuesday June 04, 2013 @11:03PM (#43910927) Homepage

    XML doesn't magically solve everything in this regard. If there's no good documentation for the format, it's unlikely you'll be able to display everything exactly as intended. Likewise, if the format is hideously complex (see: Microsoft Office Open XML) or there are bugs in the de facto implementation, it's going to be tricky to reverse engineer.

    I'd also point out that MS Office spits out compressed XML. I believe it's based on ZIP, which is very well documented, but that's yet another hurdle to cross. And then you have to deal with the character encoding of the XML itself -- ASCII, UTF-8, etc.
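
The two hurdles mentioned above (a ZIP container, then the XML's text encoding) can be sketched with Python's standard library alone. This is only an illustration under stated assumptions: the member name `content.xml` and the `<doc>`/`<p>` element names are hypothetical and are not the layout of any real Office format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_sample_archive():
    """Build a tiny ZIP in memory holding one UTF-8 XML member.

    The member name and element names are made up for this sketch.
    """
    xml_bytes = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<doc><p>caf\u00e9</p></doc>"
    ).encode("utf-8")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", xml_bytes)
    return buf.getvalue()


def read_paragraphs(archive, member="content.xml"):
    """Hurdle 1: decompress the member. Hurdle 2: decode and parse the XML."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        raw = zf.read(member)  # raw bytes, still encoded
    # Given bytes, the parser reads the encoding from the XML declaration.
    root = ET.fromstring(raw)
    return [p.text or "" for p in root.iter("p")]


if __name__ == "__main__":
    print(read_paragraphs(build_sample_archive()))
```

Getting the text out is the easy part when the container and encoding are documented; knowing what each element is supposed to mean is the part that needs the format's documentation.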

  • MS removed the PowerPoint 4.0/95 converters completely with Office 2007 for Windows and later, and disabled them by default in Office 2003 SP3 [microsoft.com]. And the PowerPoint 4.0 converter (but not 95) was disabled by default instead of fixed [microsoft.com] with MS09-017.

    On the Mac, they removed them even earlier, when they ported Office to Carbon [microsoft.com].

    IMO it would be a good idea for MS to package PP4X32 and PP7X32 from PowerPoint 2003 separately, along with a utility to call the converters of course.

  • by Anonymous Coward on Tuesday June 04, 2013 @11:31PM (#43911091)

    But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.

    $ printf "\xC5\xC2\xC3\xC4\xC9\xC3\x25" | iconv -f ebcdic-us -t ascii
    EBCDIC
    $ dpkg -S `which iconv`
    libc-bin: /usr/bin/iconv
    $ apt-cache show libc-bin | grep -e Essential -e Priority
    Essential: yes
    Priority: required

    So we've got a program that can convert from EBCDIC-US to ASCII (or UTF-8 or whatever you want), that program is in an Essential/Required package on any Debian-based system, and for some reason you say the tools "aren't commonplace"?

    Are you on crack?
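
The same conversion can be sketched without iconv, using the codecs bundled with Python. One assumption to flag: Python has no codec named `ebcdic-us`, so this uses `cp037` (IBM EBCDIC US/Canada) as a stand-in. The two agree on the uppercase letters used here, but they are not guaranteed to match for every byte.

```python
def ebcdic_to_text(data, codec="cp037"):
    """Decode EBCDIC bytes to a Python string.

    cp037 is an assumed substitute for iconv's ebcdic-us code page.
    """
    return data.decode(codec)


def text_to_ebcdic(text, codec="cp037"):
    """Encode a string back to EBCDIC bytes (inverse of ebcdic_to_text)."""
    return text.encode(codec)


if __name__ == "__main__":
    # Same letter bytes as the printf example above, without the trailing 0x25.
    sample = bytes.fromhex("C5C2C3C4C9C3")
    print(ebcdic_to_text(sample))
```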

  • "hard problem" (Score:5, Insightful)

    by macraig ( 621737 ) <mark.a.craig@gmaFREEBSDil.com minus bsd> on Wednesday June 05, 2013 @12:29AM (#43911387)

    Vint, that's bullshit and you know it. It's nothing more than preserving syntaxes, grammar, file formats. That's not hard, and it only requires someone to create a format conversion ONCE to solve the problem at each stage of the evolution.

    The real problem here is proprietary non-public formats and structures. When the structure of data has been a closely guarded secret and requires reverse engineering that may not even yield a perfect result, THAT is hard.

  • by lahvak ( 69490 ) on Wednesday June 05, 2013 @12:41AM (#43911455) Homepage Journal

    No! Fail! You don't get it!

    1) Code is data
    2) Code is data that is especially hard to interpret
    3) One of the main reasons for all this mess is that in all those proprietary formats, data is intermixed with code, and the whole mess is very hard to parse.

    Data should be kept completely isolated, as far away from code as possible. That way, if you cannot interpret the code any more, you will still be able to analyze and parse the data. You know, it is not that hard to construct a record player.

  • Re:XML? (Score:5, Insightful)

    by gweihir ( 88907 ) on Wednesday June 05, 2013 @01:03AM (#43911563)

    Have you seen what some people (and MS) do with XML? And what convoluted structures they use? Coded in binary? With compression and other eminently hard to understand stuff? Most of these things will be readable just as long as the applications that created them are around, but not longer.

    Forget XML. Forget Unicode as well. Plain ASCII is the only thing that works. Simple PDF or PostScript will work also, because the standards and open-source tools to read them will still be around. But nothing as complicated as a MS office document will survive. LibreOffice formats may have a chance, because LibreOffice may still be compilable and runnable (being FOSS), but only because of that and I would not bet on it.

    Incidentally, all my decades-old LaTeX documents still compile and can also be read directly. So can my 20-year-old ASCII-coded measurement data.
