Vint Cerf: Data That's Here Today May Be Gone Tomorrow
dcblogs writes "Vinton Cerf is warning that digital things created today — spreadsheets, documents, presentations as well as mountains of scientific data — may not be readable in the years and centuries ahead. Cerf illustrates the problem in a simple way. He runs Microsoft Office 2011 on Macintosh, but it cannot read a 1997 PowerPoint file. 'It doesn't know what it is,' he said. 'I'm not blaming Microsoft,' said Cerf, who is Google's vice president and chief Internet evangelist. 'What I'm saying is that backward compatibility is very hard to preserve over very long periods of time.' He calls it a 'hard problem.'"
We're at an interesting spot right now, where we're worried that the internet won't remember everything, and also that it won't forget anything.
My data will be readable (Score:4, Informative)
My data will be readable because I use bog-standard formats. If I get really froggy I use HTML, and you can just strip the tags and read that.
If his data won't be readable, that's his problem. Anything you want to save for posterity, export it now.
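The "just strip the tags" claim is easy to check. Below is a minimal sketch using only Python's standard-library `html.parser`. It assumes plain, reasonably well-formed HTML. The names `TextExtractor` and `strip_tags` are hypothetical, chosen for this illustration. It also assumes that `<script>` and `<style>` contents are not prose worth keeping.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, dropping all tags."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities like &amp; are decoded.
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>, which aren't prose

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_tags(html: str) -> str:
    """Return the readable text of `html` with tags removed (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text chunks with spaces so block elements don't run together,
    # then collapse the leftover whitespace.
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    sample = "<html><body><h1>Notes</h1><p>Tea &amp; biscuits.</p></body></html>"
    print(strip_tags(sample))
```

This discards all structure, including headings, links, and tables. That loss is the trade-off being accepted here: the words survive even if nothing else does.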
Re:Yes, backwards compatibility, blah blah blah... (Score:5, Informative)
Odds are that you don't need to convince Vint Cerf or Google in general about the advantages of open formats.
Re:My data will be readable (Score:5, Informative)
Or NASA data from deep space probes that's stored in now-unknown formats on mag tapes from long, long, long gone manufacturers.
I do blame Microsoft (Score:5, Informative)
Seriously, why would Vint Cerf not blame Microsoft? They have an extremely poor track record with backwards compatibility, and I don't think they even know what forwards compatibility is. If you design the data formats correctly, you can keep things usable for decades (or centuries). Guess what: twenty-year-old TeX documents still work, and yet Word version X won't open documents from version X-2. I've pulled runoff documents off 1970s versions of Unix that can still be printed. That says to me that compatibility issues can be dealt with.
This is all intentional on Microsoft's part, too. They make money when customers buy new copies of software, so it is in their best financial interest to make sure that customers are under significant pressure to upgrade. I remember that the solution to an acknowledged Word 97 bug was to make sure everyone who was going to read your document had the appropriate Word 97 plug-in installed in their older version of Word. I completely blame Microsoft here.
This is not that hard a problem, IF the company pays attention to it and gives it even a small amount of priority.
Re:No different than cars (Score:4, Informative)
Re:XML? (Score:5, Informative)
Not even Microsoft can implement their Office XML "standard"; from examination, it's pretty much a direct name-for-name serialization of their internal binary structs, with some of the more obvious gaffes (like explicitly saying "do this like this old version of Word") hastily renamed to placate ISO. It needs you to implement a whole bunch of specific behaviours if you want it to work in the MS software (things like "if you update this bit, you also have to update this other bit just so, or it won't work"), but these aren't documented.
You've got more of a chance, sure, just because the structs are marked and you don't have to infer where their boundaries are, but it's a far cry from ODF which was designed from the outset to be an open XML format rather than just hastily being bunged together to permit large purchasing bodies (like governments) to tick the "Open format" box on their form.
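To illustrate the "structs are marked" point, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It pulls paragraph text out of a WordprocessingML fragment, the XML dialect used for the body of a `.docx` (`word/document.xml` inside the zip container). The sample document is hand-written and heavily simplified. It is an assumption for illustration, not real Word output, which carries far more markup such as run and paragraph properties. The helper name `paragraph_texts` is hypothetical. The sketch shows only the easy part, where element boundaries are explicit. It does nothing about the undocumented behaviours the parent comment complains about.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in .docx document parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(document_xml: str) -> list[str]:
    """Return the plain text of each w:p paragraph (hypothetical helper)."""
    root = ET.fromstring(document_xml)
    out = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; concatenate every w:t in order.
        out.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return out


# Simplified, hand-written sample: an assumption, not actual Word output.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

if __name__ == "__main__":
    for line in paragraph_texts(SAMPLE):
        print(line)
```

With a legacy binary format, even this much requires reverse-engineering where each record starts and ends. With XML, the parser hands you the boundaries for free.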