Vint Cerf: Data That's Here Today May Be Gone Tomorrow

Posted by Soulskill on Tuesday June 04, 2013 @10:08PM from the tell-that-to-my-stack-of-punchcards dept.

dcblogs writes "Vinton Cerf is warning that digital things created today — spreadsheets, documents, presentations as well as mountains of scientific data — may not be readable in the years and centuries ahead. Cerf illustrates the problem in a simple way. He runs Microsoft Office 2011 on Macintosh, but it cannot read a 1997 PowerPoint file. 'It doesn't know what it is,' he said. 'I'm not blaming Microsoft,' said Cerf, who is Google's vice president and chief Internet evangelist. 'What I'm saying is that backward compatibility is very hard to preserve over very long periods of time.' He calls it a 'hard problem.'" We're at an interesting spot right now, where we're worried that the internet won't remember everything, and also that it won't forget anything.

This discussion has been archived. No new comments can be posted.

What's a Macintosh (Score:0, Insightful)

by Anonymous Coward writes: on Tuesday June 04, 2013 @10:15PM (#43910611)

What's a Macintosh?
Whatever it is, I bet if he used LaTeX+Beamer he wouldn't have this problem. Whether it was authored in 1997 or 2011, it almost certainly would still work on a "Macintosh". Maybe he could learn a thing or two from Donald Knuth and Leslie Lamport, and stop playing around with the rugrats at Google.

We should have listened (Score:5, Insightful)

by Anonymous Coward writes: on Tuesday June 04, 2013 @10:19PM (#43910633)

We're in a difficult spot right now because for years we ignored the warnings about 'proprietary file formats'.
I'm not blaming Microsoft either. We let Microsoft do this to us of our own free ignorance.

Re:So? (Score:5, Insightful)

by MrBandersnatch ( 544818 ) writes: on Tuesday June 04, 2013 @10:23PM (#43910649)

I think you will find that there's a little known branch of academia called "history" which sometimes takes a curious interest in even the most trivial of past information.....

Yes, backwards compatibility, blah blah blah... (Score:5, Insightful)

by Narcocide ( 102829 ) writes: on Tuesday June 04, 2013 @10:24PM (#43910655) Homepage

Yes, you're right, I have this ASCII text file created in 1997 and I can't find anything to read it...
OH WAIT ACTUALLY FUCKING *EVERYTHING* STILL READS IT.
Stop gargling Microsoft's balls so much and wipe off your chin. Proprietary data formats are THE PROBLEM. Stop trying to redirect public discourse with this thinly veiled bullshit.

Don't forget DRM (Score:5, Insightful)

by onyxruby ( 118189 ) writes: <onyxrubyNO@SPAMcomcast.net> on Tuesday June 04, 2013 @10:33PM (#43910721)

We're living in what could well be a future dark age for archeologists / historians. Hardly anything is put into a nice hard format (stone is incredibly rare and metal gets stolen) for someone to find. What's left suffers from incompatible file formats, acid-based paper that decomposes, bit rot, cryptography, incompatible technology for data storage and worst of all DRM. With DRM you have active measures that try to prevent something from being usable.
In the old days people stopped use with armed guards, obfuscation and primitive crypto. Today we have servers that are required for operational functionality for many products. With the advent of the cloud you have reasons for storing things where you have a dependency on a third party. How many services that are cloud / server based have come about and gone tits up?
Even having a large well known brand name doesn't protect you from having a server shut down. Just think of Microsoft's PlaysForSure service that lasted less than a decade. Having a license and a physical disk isn't that helpful when the DRM requires an authentication server that doesn't exist. With the movement to put more and more DRM into the cloud or with SSL certificates (again dependent upon servers and naturally time bombed) this is going to be a problem that will only grow worse.
Learning to break DRM is far more critical than file formats which require nothing more than a conversion tool.

Re:XML? (Score:2, Insightful)

by Nerdfest ( 867930 ) writes: on Tuesday June 04, 2013 @10:34PM (#43910737)

The same applies to any *open* format.

Re:My data will be readable (Score:5, Insightful)

by Bremic ( 2703997 ) writes: on Tuesday June 04, 2013 @10:59PM (#43910897)

Until HTML includes DRM and half the stuff you create ends up being unreadable.
Well, really we are probably good for anything that can be opened in a text editor for a long long while; but the point is there. Anything can be lost to data format shifts.
As someone who had to re-type an 80-page document because the company stopped using the software the document was created on and didn't have a licence for it, and no converter found online worked - I can say this does happen.
How many people are going to shell out $600 for software to open something they want to make an edit on? How many are going to just give up and find someone to rekey it, or just give it up as a loss?
With more and more systems including format locks, in 50+ years historians will likely have a lot of trouble finding out details from today. Kind of like it is now when we go to look at archival film from WWII and find it's all faded into obscurity. We have the same problems, just with different causes. Then it was lack of preservation of a medium with a limited lifespan. Now it's storing stuff in formats that will go away as they are improved upon, blocked, or just forgotten about.
Sure, if you're in your 20s, or even 30s, you probably haven't realized the copy of your grandfather's photos is sitting on a floppy disk in a proprietary format. But when you get older you may encounter these issues.

Maybe. (Score:5, Insightful)

by MrEricSir ( 398214 ) writes: on Tuesday June 04, 2013 @11:03PM (#43910927) Homepage

XML doesn't magically solve everything in this regard. If there's no good documentation for the format, it's unlikely you'll be able to display everything exactly as intended. Likewise, if the format is hideously complex (see: Microsoft Office Open XML) or there are bugs in the de-facto implementation, it's going to be tricky to reverse engineer.
I'd also point out that MS Office spits out compressed XML. I believe it's based on ZIP, which is very well documented, but that's yet another hurdle to cross. And then you have to deal with the character encoding of the XML itself -- ASCII, UTF-8, etc.
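To make the "ZIP, then XML, then encoding" layering concrete, here is a rough Python sketch. It assumes nothing about the real Office part layout: the archive is built in memory, and the member name doc/content.xml and its contents are made up purely for illustration. It only shows that each layer can be peeled off with standard-library tools when the container and the encoding are documented.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_sample_package() -> bytes:
    """Build a tiny ZIP in memory that stands in for a compressed-XML document."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical member name and content, not a real Office part.
        zf.writestr(
            "doc/content.xml",
            '<?xml version="1.0" encoding="UTF-8"?>'
            "<slides><slide>Hello from 1997</slide></slides>",
        )
    return buf.getvalue()


def read_xml_parts(package: bytes) -> dict:
    """Return the parsed root element of every .xml member in the archive."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            if name.endswith(".xml"):
                # Passing raw bytes lets the parser honour the encoding
                # named in the XML declaration.
                parts[name] = ET.fromstring(zf.read(name))
    return parts


parts = read_xml_parts(build_sample_package())
slide_texts = [slide.text for slide in parts["doc/content.xml"].iter("slide")]
print(slide_texts)
```

Getting this far is the easy part. Knowing what the elements mean is where the missing documentation bites.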

On the PowerPoint 4.0/95 converters... (Score:5, Insightful)

by yuhong ( 1378501 ) writes: <yuhongbao_386 AT hotmail DOT com> on Tuesday June 04, 2013 @11:07PM (#43910965) Homepage

MS removed the PowerPoint 4.0/95 converters completely with Office 2007 for Windows and later, and disabled them by default in Office 2003 SP3 [microsoft.com]. And the PowerPoint 4.0 converter (but not 95) was disabled by default instead of fixed [microsoft.com] with MS09-017.
On the Mac, they removed them even earlier, when they ported Office to Carbon [microsoft.com].
IMO it would be a good idea for MS to package PP4X32 and PP7X32 from PowerPoint 2003 separately, along with a utility to call the converters of course.

Re:Yes, backwards compatibility, blah blah blah... (Score:5, Insightful)

by Anonymous Coward writes: on Tuesday June 04, 2013 @11:31PM (#43911091)

But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.
$ printf "\xC5\xC2\xC3\xC4\xC9\xC3\x25" | iconv -f ebcdic-us -t ascii
EBCDIC
$ dpkg -S `which iconv`
libc-bin: /usr/bin/iconv
$ apt-cache show libc-bin | grep -e Essential -e Priority
Essential: yes
Priority: required
So we've got a program that can convert from EBCDIC-US to ASCII (or UTF-8 or whatever you want), that program is in an Essential/Required package on any Debian-based system, and for some reason you say the tools "aren't commonplace"?
Are you on crack?
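
The same conversion can be sketched outside the shell. This is a minimal Python version, assuming the standard-library cp037 codec (EBCDIC US/Canada) is a fair stand-in for iconv's ebcdic-us for these particular bytes. That equivalence is an assumption and not something the session above establishes.

```python
# The same seven bytes that were piped into iconv above.
raw = bytes([0xC5, 0xC2, 0xC3, 0xC4, 0xC9, 0xC3, 0x25])

# Assumption: cp037 matches ebcdic-us for these letters.
text = raw.decode("cp037")

# strip() drops the trailing line terminator so only the letters print.
print(text.strip())

# Encoding back to cp037 reproduces the original bytes.
round_trip = text.encode("cp037")
```

No extra packages are involved. The codec ships with the interpreter.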

"hard problem" (Score:5, Insightful)

by macraig ( 621737 ) writes: <mark.a.craig@gmaFREEBSDil.com minus bsd> on Wednesday June 05, 2013 @12:29AM (#43911387)

Vint, that's bullshit and you know it. It's nothing more than preserving syntaxes, grammar, file formats. That's not hard, and it only requires someone to create a format conversion ONCE to solve the problem at each stage of the evolution.
The real problem here is proprietary non-public formats and structures. When the structure of data has been a closely guarded secret and requires reverse engineering that may not even yield a perfect result, THAT is hard.

Re:Code should NEVER accompany data! (Score:5, Insightful)

by lahvak ( 69490 ) writes: on Wednesday June 05, 2013 @12:41AM (#43911455) Homepage Journal

No! Fail! You don't get it!
1) Code is data
2) Code is data that is especially hard to interpret
3) One of the main reasons for all this mess is that in all those proprietary formats, data is intermixed with code, and the whole mess is very hard to parse.
Data should be kept completely isolated, as far away from code as possible. That way, if you cannot interpret the code any more, you will still be able to analyze and parse the data. You know, it is not that hard to construct a record player.

Re:XML? (Score:5, Insightful)

by gweihir ( 88907 ) writes: on Wednesday June 05, 2013 @01:03AM (#43911563)

Have you seen what some people (and MS) do with XML? And what convoluted structures they use? Coded in binary? With compression and other eminently hard to understand stuff? Most of these things will be readable just as long as the applications that created them are around, but not longer.
Forget XML. Forget Unicode as well. Plain ASCII is the only thing that works. Simple PDF or PostScript will work also, because the standards and open-source tools to read them will still be around. But nothing as complicated as an MS Office document will survive. LibreOffice formats may have a chance, because LibreOffice may still be compilable and runnable (being FOSS), but only because of that and I would not bet on it.
Incidentally, all my decades-old LaTeX documents still compile and can also be read directly. So can my 20-year-old ASCII-coded measurement data.
