
Vint Cerf: Data That's Here Today May Be Gone Tomorrow 358

Posted by Soulskill on Tuesday June 04, 2013 @10:08PM from the tell-that-to-my-stack-of-punchcards dept.

dcblogs writes "Vinton Cerf is warning that digital things created today — spreadsheets, documents, presentations as well as mountains of scientific data — may not be readable in the years and centuries ahead. Cerf illustrates the problem in a simple way. He runs Microsoft Office 2011 on Macintosh, but it cannot read a 1997 PowerPoint file. 'It doesn't know what it is,' he said. 'I'm not blaming Microsoft,' said Cerf, who is Google's vice president and chief Internet evangelist. 'What I'm saying is that backward compatibility is very hard to preserve over very long periods of time.' He calls it a 'hard problem.'" We're at an interesting spot right now, where we're worried that the internet won't remember everything, and also that it won't forget anything.

This discussion has been archived. No new comments can be posted.

XML? (Score:2)

by AlphaWolf_HK ( 692722 ) writes:

I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.
- Re: (Score:2, Insightful)
  
  by Nerdfest ( 867930 ) writes:
  
  The same applies to any *open* format.
  - Re: (Score:2)
    
    by KGIII ( 973947 ) writes:
    
    Hell, even the non-open formats are pretty easy to get to a readable level of functionality. They won't contain the markup necessarily and certain features won't be available but, frankly, if we're able to decode all the other ancient languages I'm pretty sure someone will be able to decode these as well.
    Speaking of ancient... Err.. When did Vint go to Google? That's kind of cool that he has but that is, in itself, news to me. I must have missed the announcement as I'm sure there was one.
- Re: (Score:3)
  
  by cheater512 ( 783349 ) writes:
  
  Into a usable document from scratch? Pretty hard. Ever looked at the XML of a moderately complex document?
  - - Re: (Score:3)
      
      by Half-pint HAL ( 718102 ) writes:
      
      Holy shit, yeah, you're right - it's totally impossible to strip out the XML tags and be left with readable plain text content!
      I bet nobody could ever decode it!
      You seem to be assuming a flat-text file with predictable order. Strip the XML out of anything in a tabular format (eg a spreadsheet -- see TFS) and you lose vital data. Blank cells are lost and the tabulated data no longer lines up.
      It gets worse in a filetype with unstructured formatting, eg DTP and slideware. You've got a collection of elements that are only ordered by their metadata. The explanatory labels you want to overlay on top of that image? They're no longer linked to it and you've no way of k
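A minimal Python sketch of the spreadsheet point above. The markup is a made-up toy (the element names `sheet`, `row`, `c` and the `r` cell-reference attribute are assumptions for illustration, not real OOXML): stripping the tags keeps the words but silently drops the blank cell, while reading the position metadata keeps the columns aligned.

```python
import xml.etree.ElementTree as ET

# Toy spreadsheet (hypothetical markup): the second row has no B cell at all.
SHEET = """<sheet>
  <row><c r="A1">Name</c><c r="B1">Phone</c><c r="C1">City</c></row>
  <row><c r="A2">Ann</c><c r="C2">Leeds</c></row>
</sheet>"""

def strip_tags(xml_text):
    """Naive approach: keep only the text of each cell, row by row."""
    root = ET.fromstring(xml_text)
    return [[c.text for c in row if c.text] for row in root]

def keep_positions(xml_text, columns="ABC"):
    """Use the cell-reference attribute to put each value back in its column."""
    root = ET.fromstring(xml_text)
    table = []
    for row in root:
        cells = {c.get("r")[0]: c.text for c in row}  # column letter -> value
        table.append([cells.get(col, "") for col in columns])
    return table

print(strip_tags(SHEET)[1])      # the blank cell is gone, "Leeds" slides left
print(keep_positions(SHEET)[1])  # the gap survives as an empty string
```

With the tags stripped, "Leeds" ends up under "Phone"; that is the misalignment being described.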
- Re: (Score:3)
  
  by ShanghaiBill ( 739463 ) writes:
  
  I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.
  Yes, the problem is not "data" but "data in proprietary formats" ... and even that is becoming less of a problem. A converter to/from almost anything is usually just a google search away. With VMs and emulators, even proprietary binary programs are easier than ever to deal with. I can run any CP/M or C64 program on my desktop Linux computer using free emulators. This was indeed a "hard problem", but today it is mostly solved.
  - Re: (Score:2)
    
    by Hamsterdan ( 815291 ) writes:
    
    The problem is not just related to the format, but the medium it's stored on. I can still read C64 floppies because I have some drives, but everything I have for my Apple ][ is considered lost until I find both a drive and a working machine.
  - - see Windows 1250 and 1251 (Score:3)
      
      by Doug Merritt ( 3550 ) writes:
      
      Windows 1250 and 1251 do, and possibly others. It sounds familiar, but my memory is fuzzy, so I just looked around.
      https://en.wikipedia.org/wiki/Windows-1250 [wikipedia.org]
- Re: (Score:3)
  
  by fuzzyfuzzyfungus ( 1223518 ) writes:
  
  I think that given MS office and LibreOffice are in XML, it shouldn't be difficult at all to reverse engineer in the future.
  Binary formats were standard for everything up through Office 2003. Office 2007 (or 2003 with the optional converter pack and some weird bugs) could output something XML-based, though I have a vague memory from the OpenDocument/Office Open XML slugfest that 2007 produced something that deviated from the theoretical ideal of OOXML in some respects, and that full conformity happened at 2010 or 2013. I might be remembering that wrong; but anything before 2003, and a lot from 2003, was definitely binary.
  - Re: (Score:2)
    
    by Why2K ( 29813 ) writes:
    
    They are binary, but at least they are documented: http://msdn.microsoft.com/en-us/library/cc313105(v=office.12).aspx [microsoft.com]
- Maybe. (Score:5, Insightful)
  
  by MrEricSir ( 398214 ) writes: on Tuesday June 04, 2013 @11:03PM (#43910927) Homepage
  
  XML doesn't magically solve everything in this regard. If there's no good documentation for the format, it's unlikely you'll be able to display everything exactly as intended. Likewise, if the format is hideously complex (see: Microsoft Office Open XML) or there are bugs in the de facto implementation, it's going to be tricky to reverse engineer.
  I'd also point out that MS Office spits out compressed XML. I believe it's based on ZIP, which is very well documented, but that's yet another hurdle to cross. And then you have to deal with the character encoding of the XML itself: ASCII, UTF-8, etc.
  
  - Re: (Score:2)
    
    by viperidaenz ( 2515578 ) writes:
    
    The file being in ZIP format is documented. The character encoding of the XML file is specified in the XML file itself, as it should be in all XML files.
    From what I've used so far the Open XML formats aren't hideously complex, although I've only been working with XLSX files.
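A rough Python sketch of the two layers discussed here: a ZIP container, and inside it XML that declares its own encoding. It builds a tiny stand-in archive in memory rather than opening a real Office file, so the part content and the helper names `make_toy_package` and `read_part` are assumptions for illustration only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

PART_NAME = "xl/worksheets/sheet1.xml"  # path used here only as a familiar-looking example

def make_toy_package():
    """Build a tiny ZIP in memory holding one XML part (not a valid XLSX)."""
    xml_part = ('<?xml version="1.0" encoding="UTF-8"?>'
                '<sheet><c r="A1">caf\u00e9</c></sheet>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(PART_NAME, xml_part.encode("utf-8"))
    return buf.getvalue()

def read_part(package_bytes, name):
    """Layer 1: unzip the part. Layer 2: let the XML parser read the declared encoding."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        raw = zf.read(name)
    return ET.fromstring(raw)  # bytes in, so the encoding declaration is honored

root = read_part(make_toy_package(), PART_NAME)
print(root[0].get("r"), root[0].text)
```

Both hurdles are handled by standard-library code today; whether that holds decades from now is the open question in this thread.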
    - Re: (Score:3)
      
      by wonkey_monkey ( 2592601 ) writes:
      
      ZIP format is documented.
      Right now it is. What about the ragtag bunch of misfit librarians who are all that's left after the zombie apocalypse?
      They burned all the books for warmth and to keep the zombies away.
- Re: (Score:3)
  
  by belmolis ( 702863 ) writes:
  
  Both have published specifications, so reverse engineering shouldn't be necessary. However, Microsoft's XML includes things that are not defined in the specification. That was one of the objections to giving it status as an open standard.
- Re:XML? (Score:5, Insightful)
  
  by gweihir ( 88907 ) writes: on Wednesday June 05, 2013 @01:03AM (#43911563)
  
  Have you seen what some people (and MS) do with XML? And what convoluted structures they use? Coded in binary? With compression and other eminently hard to understand stuff? Most of these things will be readable just as long as the applications that created them are around, but not longer.
  Forget XML. Forget Unicode as well. Plain ASCII is the only thing that works. Simple PDF or PostScript will work also, because the standards and open-source tools to read them will still be around. But nothing as complicated as an MS Office document will survive. LibreOffice formats may have a chance, because LibreOffice may still be compilable and runnable (being FOSS), but only because of that and I would not bet on it.
  Incidentally, all my decades-old LaTeX documents still compile and can also be read directly. So can my 20-year-old ASCII-coded measurement data.
  
- Re:XML? (Score:5, Informative)
  
  by Dr_Barnowl ( 709838 ) writes: on Wednesday June 05, 2013 @04:13AM (#43912299)
  
  Not even Microsoft can implement their Office XML "standard" ; from examination it's pretty much a direct name-for-name serialization of their internal binary structs, with some of the more obvious gaffes like explicitly saying "do this like this old version of Word" hastily renamed to placate ISO. It needs you to implement a whole bunch of specific behaviours if you want it to work in the MS software (things like "if you update this bit, you also have to update this other bit just so or it won't work"), but these aren't documented.
  You've got more of a chance, sure, just because the structs are marked and you don't have to infer where their boundaries are, but it's a far cry from ODF which was designed from the outset to be an open XML format rather than just hastily being bunged together to permit large purchasing bodies (like governments) to tick the "Open format" box on their form.
  
My data will be readable (Score:4, Informative)

by drinkypoo ( 153816 ) writes: <drink@hyperlogos.org> on Tuesday June 04, 2013 @10:14PM (#43910607) Homepage Journal

My data will be readable because I use bog-standard formats. If I get really froggy I use HTML, and you can just strip the tags and read that.
If his data won't be readable, that's his problem. Anything you want to save for posterity, export it now.
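"Strip the tags and read that" can be sketched in a few lines of Python using only the standard library. The class name `TextOnly` and the sample markup are made up for illustration; real pages with scripts or styles would need more care.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect character data and ignore every tag."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_tags(html):
    """Return the readable text of an HTML fragment with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())

sample = "<h1>Notes</h1>\n<p>Export it <b>now</b>, R&amp;D team.</p>"
print(strip_tags(sample))
```

Even without a parser, the same text is visible in any editor, which is the underlying argument for tag-based plain-text formats.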

- Re:My data will be readable (Score:5, Insightful)
  
  by Bremic ( 2703997 ) writes: on Tuesday June 04, 2013 @10:59PM (#43910897)
  
  Until HTML includes DRM and half the stuff you create ends up being unreadable.
  Well, really we are probably good for anything that can be opened in a text editor for a long long while; but the point is there. Anything can be lost to data format shifts.
  As someone who had to re-type an 80-page document because the company stopped using the software the document was created on, didn't have a licence for it, and no converter found online worked, I can say this does happen.
  How many people are going to shell out $600 for software to open something they want to make an edit on? How many are going to just give up and find someone to rekey it, or just give it up as a loss?
  With more and more systems including format locks, in 50+ years historians will likely have a lot of trouble finding out details from today. Kind of like it is now when we go to look at archival film from WWII and find it's all faded into obscurity. We have the same problems, just with different causes. Then it was lack of preservation of a medium with a limited lifespan. Now it's storing stuff in formats that will go away as they are improved upon, blocked, or just forgotten about.
  Sure, if you're in your 20s, or even 30s, you probably haven't realized the copy of your grandfather's photos is sitting on a floppy disk in a proprietary format. But when you get older you may encounter these issues.
  
  - Re:My data will be readable (Score:5, Informative)
    
    by Nutria ( 679911 ) writes: on Tuesday June 04, 2013 @11:18PM (#43911031)
    
    Or NASA data from deep space probes that's stored in now-unknown formats on mag tapes from long, long, long gone manufacturers.
    
    - Re: (Score:3)
      
      by kermidge ( 2221646 ) writes:
      
      Well, there's the problems with the medium itself, then there's the format, as you say (ought to be right up a cryptanalyst's alley, tho), then there's the real blocker: number of tracks, head design, and the circuitry that goes with it. Unless there are good documents for the machine's design and building, or one can be found in working order in a museum, you're SOL. It's a big problem that doesn't get much exposure.
  - Re: (Score:3)
    
    by starburst ( 63061 ) writes:
    
    From a 2002 slashdot story:
    mccalli writes :
    "Thought people might find this amusing. In 1986, the UK compiled an electronic [copy of the] domesday book. They used BBC Master computers to do it, and the result was put on laserdisc. I actually used this project whilst at school. This article states that nothing can now read these merely 15-year old discs. The original, written approx. 1086, is still doing fine thank you very much."
    Sounds like a good candidate for Bruce Sterling's Dead Media Project. (Speaking
    - Re: (Score:3)
      
      by geniice ( 1336589 ) writes:
      
      In fairness they did manage to transfer the stuff off the discs and put the stuff without copyright issues online.
  - Re: (Score:3)
    
    by ganjadude ( 952775 ) writes:
    
    Why didn't you OCR it and then make the edits? There are numerous OCR options that would have fit that need, no?
  - Re: (Score:3)
    
    by Concerned Onlooker ( 473481 ) writes:
    
    "How many people are going to shell out $600 for software to open something they want to make an edit on?"
    The upside to this is that when somebody wants to update that nifty company Flash web site and discovers that Flash now costs an arm and a leg, the site gets re-written in html.
emulation / virtualization (Score:3)

by smash ( 1351 ) writes: on Tuesday June 04, 2013 @10:17PM (#43910619) Homepage Journal

Support emulator/VM developers! Encapsulate your entire machine in a VM and you can run the entire software stack if necessary. Anything you need convenient access to, export to CSV, XML or some other standard format.

- Re: (Score:2)
  
  by lister king of smeg ( 2481612 ) writes:
  
  and when Unicode and ASCII are replaced?
  - Re: (Score:2)
    
    by phantomfive ( 622387 ) writes:
    
    There's a pretty good chance LaTeX will still support them. There's a reason the TeX distribution is like 2 Gigs.....
  - Re: (Score:2)
    
    by fuzzyfuzzyfungus ( 1223518 ) writes:
    
    Unicode is a bit sprawling, but ASCII is only 128 characters (unless dealing with the wonderful world of nonstandardized non-Latin extensions or ad hoc 8-bit extensions-of-convenience is your problem, in which case I'd advise shirking your duties and drinking heavily), making preserving the whole thing even by chiselling it into stone monuments or other archaic methods potentially viable.
  - Re: (Score:3)
    
    by Mitchell314 ( 1576581 ) writes:
    
    Honestly, reverse engineering ASCII plain text files would be trivial. Not to the average person, but to somebody with a bit of background:
    A) We have software that can use something called frequency analysis to decipher something encoded that has a 1-1 correspondence to something we know (i.e. the English alphabet).
    
    B) Ignoring software, frequency analysis is something that could be (and before the days of computers, was) done by hand. Hell, some things could be picked out by eye. For one, all files would h
    - Re: (Score:2)
      
      by mrsurb ( 1484303 ) writes:
      
      ASCII is even easier than that - because 0-9, a-z and A-Z are represented by sequential binary numbers.
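Both observations in this sub-thread (frequency analysis, and the sequential codes for digits and letters) can be checked with a short Python sketch. The helper names are made up for illustration, and `cp037` is assumed here as a representative EBCDIC code page for contrast, since its letters are split into several separate runs.

```python
from collections import Counter
import string

def is_contiguous(codes):
    """True if the integer codes form one unbroken ascending run."""
    codes = list(codes)
    return codes == list(range(codes[0], codes[0] + len(codes)))

def most_common_byte(data):
    """Return the byte value that occurs most often in data."""
    return Counter(data).most_common(1)[0][0]

ascii_lower = string.ascii_lowercase.encode("ascii")
ebcdic_lower = string.ascii_lowercase.encode("cp037")  # assumed EBCDIC variant

print(is_contiguous(ascii_lower))   # a-z is one run in ASCII
print(is_contiguous(ebcdic_lower))  # a-z is broken into separate runs in EBCDIC

sample = b"the quick brown fox jumps over the lazy dog and the cat"
print(hex(most_common_byte(sample)))  # the word separator dominates English text
```

A future decoder who spots one very frequent byte and three unbroken runs of 10, 26 and 26 codes has most of the table already.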
- Re: (Score:2)
  
  by cheater512 ( 783349 ) writes:
  
  How do you encapsulate the VM so it will still work 20 years in the future?
  - Re:emulation / virtualization (Score:5, Funny)
    
    by Anonymous Coward writes: on Tuesday June 04, 2013 @10:46PM (#43910815)
    
    You're very clever, young man, very clever - but it's VMs all the way down!
    
    - Re: (Score:3)
      
      by geniice ( 1336589 ) writes:
      
      There are a few industrial setups where that is pretty much what has happened.
  - - - Re:emulation / virtualization (Score:4, Interesting)
        
        by smash ( 1351 ) writes: on Wednesday June 05, 2013 @02:24AM (#43911903) Homepage Journal
        
        err... plus DosBox is running x86 software I have from 198x...which is 30+ years now.
        
We should have listened (Score:5, Insightful)

by Anonymous Coward writes: on Tuesday June 04, 2013 @10:19PM (#43910633)

We're in a difficult spot right now because for years we ignored the warnings about 'proprietary file formats'.
I'm not blaming Microsoft either. We let Microsoft do this to us of our own free ignorance.

- Re: (Score:2)
  
  by Lehk228 ( 705449 ) writes:
  
  what's this "we" shit all my files are odt, ods, html, tex or txt files. they will be just as accessible in 100 years as they are now.
  - Re: (Score:2)
    
    by Tr3vin ( 1220548 ) writes:
    
    Without developers maintaining editors/viewers, open formats are only slightly more usable than proprietary ones. 100 years is a really long time from now as far as technology goes. I wouldn't be so quick to say that open formats will still be easily accessible.
    - - Re: (Score:3)
        
        by plover ( 150551 ) writes:
        
        Actually, languages have been consolidating and standardizing rapidly with the advent of the printing press, effective and affordable transportation, broadcast media like TV and radio, and the Internet. Diversity of language is rapidly disappearing.
        The way things are going now, there will be only a few dozen languages left at the end of this century, and possibly only a handful after the hundred years that follow.
        Although it's entirely possible that technology will preserve native languages, too. If machine
- Re: (Score:2)
  
  by Wolfling1 ( 1808594 ) writes:
  
  And we're not doing it now with Apple products?
Yes, backwards compatibility, blah blah blah... (Score:5, Insightful)

by Narcocide ( 102829 ) writes: on Tuesday June 04, 2013 @10:24PM (#43910655) Homepage

Yes, you're right I have this ASCII text file created in 1997 and I can't find anything to read it...
OH WAIT ACTUALLY FUCKING *EVERYTHING* STILL READS IT.
Stop gargling Microsoft's balls so much and wipe off your chin. Proprietary data formats are THE PROBLEM. Stop trying to redirect public discourse with this thinly veiled bullshit.

- Re:Yes, backwards compatibility, blah blah blah... (Score:5, Informative)
  
  by Nerdfest ( 867930 ) writes: on Tuesday June 04, 2013 @10:36PM (#43910749)
  
  Odds are that you don't need to convince Vint Cerf or Google in general about the advantages of open formats.
  
  - - Re: (Score:2)
      
      by Nerdfest ( 867930 ) writes:
      
      Yes, the Talk XMPP shutdown and Google Reader are a little disturbing. We're as far as we are with the ubiquity of the internet because of open formats enabling intercommunication and competition between products and services by different providers. That seems to be going away again in favour of platform lock-in with things like iMessage, FaceTime, etc. Google's Hangouts are at least cross platform, but that's really only a mild improvement. You still need to use Google's implementation. I'm just happy I c
- Re: (Score:2)
  
  by cheater512 ( 783349 ) writes:
  
  But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.
  - Re: (Score:3)
    
    by PPH ( 736903 ) writes:
    
    Just Googled "ebcdic to ascii converter"
    About 123,000 results.
    - Re: (Score:3)
      
      by felipekk ( 1007591 ) writes:
      
      Just Googled "oranges to apples converter"
      About 4,780,000 results
  - Re: (Score:3)
    
    by fuzzyfuzzyfungus ( 1223518 ) writes:
    
    But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.
    How deep are your pockets?
    *IBM Consulting*
    - apt-cache search EBCDIC (Score:2)
      
      by Burz ( 138833 ) writes:
      
      Yields 4 results in Ubuntu. You can search reputable open source archives on the web, too.
      How deep are your pockets?
      *IBM Consulting*
      Um, really???
  - Re:Yes, backwards compatibility, blah blah blah... (Score:5, Insightful)
    
    by Anonymous Coward writes: on Tuesday June 04, 2013 @11:31PM (#43911091)
    
    But your EBCDIC documents are absolute rubbish now and the tools to convert them aren't commonplace any more.
    $ printf "\xC5\xC2\xC3\xC4\xC9\xC3\x25" | iconv -f ebcdic-us -t ascii
    EBCDIC
    $ dpkg -S `which iconv`
    libc-bin: /usr/bin/iconv
    $ apt-cache show libc-bin | grep -e Essential -e Priority
    Essential: yes
    Priority: required
    So we've got a program that can convert from EBCDIC-US to ASCII (or UTF-8 or whatever you want), that program is in an Essential/Required package on any Debian-based system, and for some reason you say such tools "aren't commonplace"?
    Are you on crack?
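The same conversion can be sketched in Python without any external tool. This assumes the standard-library `cp037` codec is close enough to iconv's `ebcdic-us` table for the letters involved; the two tables are not guaranteed to be identical for every byte, and the function name is made up for illustration.

```python
def ebcdic_to_text(raw, codec="cp037"):
    """Decode EBCDIC bytes to a Python string (cp037 is an assumed code page)."""
    return raw.decode(codec)

# Same six letter bytes as in the printf example above, minus the trailing newline byte.
print(ebcdic_to_text(b"\xC5\xC2\xC3\xC4\xC9\xC3"))
```

Going the other way is just `text.encode("cp037")`, so round-tripping old records is a one-liner either direction.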
    
- Re: (Score:2)
  
  by aaarrrgggh ( 9205 ) writes:
  
  ...everything except that Zip Drive I saved it on.
Comment removed (Score:5, Interesting)

by account_deleted ( 4530225 ) writes: on Tuesday June 04, 2013 @10:31PM (#43910709)

Comment removed based on user account deletion

- Re:DRM and the digital black hole (Score:5, Interesting)
  
  by jeffasselin ( 566598 ) writes: <cormacolinde@gmai[ ]om ['l.c' in gap]> on Tuesday June 04, 2013 @10:40PM (#43910771) Journal
  
  What about online-only games? Will historians in 100 years be able to play WoW and see what the game was like?
  
  - Re: (Score:3)
    
    by Mitchell314 ( 1576581 ) writes:
    
    Luckily for them, no.
  - Re: (Score:3)
    
    by timeOday ( 582209 ) writes:
    
    Nor will they be able to join in World War II to see what that was like. However there is more recorded footage of WoW than WWII for future historians to study.
Don't forget DRM (Score:5, Insightful)

by onyxruby ( 118189 ) writes: <[onyxruby] [at] [comcast.net]> on Tuesday June 04, 2013 @10:33PM (#43910721)

We're living in what could well be a future dark age for archeologists / historians. Hardly anything is put into a nice hard format (stone is incredibly rare and metal gets stolen) for someone to find. What's left suffers from incompatible file formats, acid-based paper that decomposes, bit rot, cryptography, incompatible technology for data storage and, worst of all, DRM. With DRM you have active measures that try to prevent something from being usable.
In the old days people stopped use with armed guards, obfuscation and primitive crypto. Today we have servers that are required for operational functionality for many products. With the advent of the cloud you have reasons for storing things where you have a dependency on a third party. How many services that are cloud / server based have come about and gone tits up?
Even having a large well known brand name doesn't protect you from having a server shut down. Just think of Microsoft's play4sure service that lasted less than a decade. Having a license and a physical disk isn't that helpful when the DRM requires an authentication server that doesn't exist. With the movement to put more and more DRM into the cloud or with SSL certificates (again dependent upon servers and naturally time bombed) this is going to be a problem that will only grow worse.
Learning to break DRM is far more critical than file formats which require nothing more than a conversion tool.

- Re: (Score:2)
  
  by phantomfive ( 622387 ) writes:
  
  Learning to break DRM is far more critical than file formats which require nothing more than a conversion tool.
  That is utterly a waste of time. It makes me sick to think of how much good effort is wasted jailbreaking the iPhone, when Apple could merely write a few lines of code and none of that would have been necessary. The entire jailbreak community around Apple is compensating for a few lines of code.
  
  I say that with complete respect for the jailbreakers, but it could be so much better.......
*sigh* (Score:3)

by MrBandersnatch ( 544818 ) writes: on Tuesday June 04, 2013 @10:34PM (#43910729)

Digital archival is one of the HARD problems. Over the last 40 years we have already lost more cultural artifacts than were created in the entirety of human history before that. A great deal of that is useless garbage of course, but the original moon landing tape? Thousands of government emails revealing exactly what was going on at pivotal times in history?
The truth is, we need systems for hardcopy; digital is too transient; emulators are a useful stopgap measure but don't protect against the kinds of catastrophic failures that we will likely see over the longer time frame; and we need indexing because someone at some point will want to wade through our digital detritus.

- real problem is: FEATURE CREEP (Score:3)
  
  by bussdriver ( 620565 ) writes:
  
  I've been part of archival problem planning. We went with DVD. Now that I am not there, I suspect they are thinking DVD sucks and are moving "forward" when the DVD was more than good enough and those plastic discs will last a century. MPEG-2 files will have open source decoders. Now physical readers will still be a problem... the only solution is to wait as long as possible and then switch to the next long-lasting format, but not necessarily the newest one at that time. (which is why moving to Blu-ray is a w
This is news? Nope. Not new... (Score:3)

by flogger ( 524072 ) writes: <non@nonegiven> on Tuesday June 04, 2013 @10:43PM (#43910791) Journal

This has been true of all technology in the past and will continue into the future. Just look at film. How many preserved films from 1915 are still around? Just the ones that were recorded into a new format of film, then a newer format of film, then onto VHS, then onto LaserDisc, then DVD, then Blu-ray... (Metropolis, I am looking at you.)

Within arm's reach, I have floppy disks that contain files created in the Ami Pro word processor.... When I say floppy, I am talking about the 5 1/4 inch floppies.
Technology hardware and software is not stagnant... It will always continue to develop and progress (ignore windows 8). Data that is worth keeping will get converted. Data that isn't will get left behind. I would not be surprised that in about 25 years, there will be "classic" software as there is Classic literature...

Too much typing.. going back to drinking.....

- - Re: (Score:3)
    
    by Dogtanian ( 588974 ) writes:
    
    You put your finger on it. I'd just add what I had planned on saying- that, in general, it's not always obvious what's going to be "useful" and "of interest" to future generations when it isn't practical to keep everything.
    
    In fact, a lot of things that would be of interest to us- i.e. everyday, mundane life- was never recorded at all, back when film and equipment were quite expensive and the effort and cost would have been saved for documenting "important" occasions. Even at a personal level, if I'd known
Tax Records (Score:3)

by PPH ( 736903 ) writes: on Tuesday June 04, 2013 @10:45PM (#43910811)

The IRS wants to audit me, going back several years. I kept the records as required but they are unreadable now.
Thanks Microsoft!

- Re: (Score:2)
  
  by yuhong ( 1378501 ) writes:
  
  If it is Word/Excel, try disabling the file blocks using the registry or in 2010 or later using the UI in the Trust Center.
  See http://support.microsoft.com/kb/922849 [microsoft.com]
One would think (Score:2)

by no-body ( 127863 ) writes:

That people in the far future would be getting smarter to accomplish this - probably a tossup - and apart from it, it's very questionable if a far future for humanity even exists, the way "humanity" is behaving these days/years/decades/centuries/millennia....

Maybe there are smarter robots by then babysitting...
Another argument... (Score:2)

by FuzzNugget ( 2840687 ) writes:

For open source. Save your files in open and/or openly defined, standardized formats and there will always be software that can deal with it.
But I guess it's difficult for people to hear you explain that to them with their head up their ass.
- Re: (Score:2)
  
  by wvmarle ( 1070040 ) writes:
  
  Argument for open standards, yes. Open source, no. You don't need open source for open standards. And open source does not necessarily mean open standards.
Google hire me, I solved this problem in 3 seconds (Score:2)

by dicobalt ( 1536225 ) writes:

I would solve this by installing a Windows XP VM with a copy of Office XP. Now that I solved Google's hard problem they must now see I am qualified to work there. Google is on a FUD rampage of which the likes I haven't seen since the great Microsoft FUD storms.
On the PowerPoint 4.0/95 converters... (Score:5, Insightful)

by yuhong ( 1378501 ) writes: <{moc.liamtoh} {ta} {683_oabgnohuy}> on Tuesday June 04, 2013 @11:07PM (#43910965) Homepage

MS removed the PowerPoint 4.0/95 converters completely with Office 2007 for Windows and later, and disabled them by default in Office 2003 SP3 [microsoft.com]. And the PowerPoint 4.0 converter (but not 95) was disabled by default instead of fixed [microsoft.com] with MS09-017.
On the Mac, they removed them even earlier, when they ported Office to Carbon [microsoft.com].
IMO it would be a good idea for MS to package PP4X32 and PP7X32 from PowerPoint 2003 separately, along with a utility to call the converters of course.

Uh, hello? (Score:5, Funny)

by DogDude ( 805747 ) writes: on Tuesday June 04, 2013 @11:10PM (#43910979)

For a supposedly smart guy, he seems a bit silly:

He could've just downloaded MS's Powerpoint 97 viewer [microsoft.com]

Wasn't this solved ages ago? (Score:2)

by samantha ( 68231 ) * writes:

I remember over two decades ago there was talk of making data objects, that is, data that knew how to present an object interface to get at its information. The data would contain its own reader in some ubiquitous language. But wait, we never got a ubiquitous language. Perhaps JavaScript today? But if you want to solve this problem then this is how to solve it. Or perhaps you could just package a converter to convert format XYZ to BSON as being good enough or at least better than today's breakage.
One thi
I have legible pictures over 150 years old (Score:3)

by the_rajah ( 749499 ) * writes: on Tuesday June 04, 2013 @11:21PM (#43911051) Homepage

Some are glass plate Daguerreotypes. Somehow, I am not too confident that my digital pictures will be legible 150 years from now, unless I make a good quality print on archival paper. Digital files are too easily corrupted and made totally useless. Media formats will change. 8" floppies anyone?

- Re: (Score:2)
  
  by AK Marc ( 707885 ) writes:
  
  I have gifs and jpegs from much older than the document in question, and have no trouble with any of them (and BMPs from before that). I was listening to MP3 in the 1990s. They work fine now.
  
  What digital picture standard (not raw) do you think you'll have any trouble reading, and roughly when do you think you'd have any trouble with it?
  Media formats will change. 8" floppies anyone?
  So you are worried that your MFM HD from the 1980s won't work, not worried about the .BMP on it not being readable, if you found some way to spin it up? I can't even tell
  - Re: (Score:3)
    
    by jafac ( 1449 ) writes:
    
    yes - this is a real issue - and ARCHIVED data that is important DOES need to be "spun up" and refreshed to new media.
    If it's hard drives, yes. If it's optical media. . . well that depends. Because some optical media just plain degrades over time. Some is written in special proprietary formats (like Apple's early implementations of CD+R) that you're going to have a hard time reading with CURRENT equipment.
    If your data is archived to tape, and more than 10 years old, I'm afraid you're fucked.
No different than cars (Score:5, Interesting)

by HockeyPuck ( 141947 ) writes: on Tuesday June 04, 2013 @11:34PM (#43911109)

We're still able to restore cars from the 80s and earlier as the cars were fully mechanical or hydraulic. No computers.
Fast forward to 20yrs from now, nobody's going to be carrying the computer boards for a 2004 Toyota Prius or a 2013 Tesla.
However, you'll still be able to restore your grandfather's '57 Chevy...

- Re:No different than cars (Score:4, Informative)
  
  by AK Marc ( 707885 ) writes: on Wednesday June 05, 2013 @12:21AM (#43911347)
  
  You'll just have to run the Prius ROM on an emulator on your phone, and plug in your phone to drive your car. Easy.
  
Code should accompany data (Score:5, Interesting)

by michaelmalak ( 91262 ) writes: <michael@michaelmalak.com> on Tuesday June 04, 2013 @11:35PM (#43911113) Homepage

I presented a solution [blogspot.com] to this long-standing problem last year to the Denver HTML5 Meetup.
Code should never be separated from data. This is possible with HTML5, JavaScript, and open source.
In the presentation, I steal and repurpose Hofstadter [wikipedia.org]'s analogy of DNA to an LP vinyl record, which is an information bearer, but useless without its information retriever (the record player). Like the cell of an animal, which contains both DNA and the means to "play" it, I ask why not the same with software?
My maxim is: data should always carry the code with it to play itself. It was inspired by the field I've spent 50% of my career in: non-destructive testing where, for example, X-rays and ultrasounds are performed on safety-critical industrial parts with 50-year service lives. If one of those parts fails and kills someone, you're going to want to go back into the old data and find the earliest indication of the flaw or fault and reinspect every other part in the world like it that is still in service. And maybe you need to go back 50 years. In such a context, not providing the code with the data could be considered an act of gross neglect.
In my presentation, I use the 1990s-era trick of embedding XSL into an XML file, with the addition of the XSL now being able to use HTML5/JavaScript. Sadly, I've only gotten it to work with Firefox -- the other browsers consider it a security violation.
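One way to picture the "data carries its player" idea without the XSL trick is to write the data and a small viewer script into a single HTML file. This is a rough Python sketch and not the method from the presentation; `bundle_as_html` and the record layout are invented for illustration, and the viewer does nothing more than draw a table.

```python
import html
import json


def bundle_as_html(title: str, records: list) -> str:
    """Return one HTML document holding both the data and a tiny viewer."""
    # Escape "</" so the JSON cannot terminate the <script> element early.
    payload = json.dumps(records, indent=2).replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>
<body>
<table id="view"></table>
<script id="data" type="application/json">
{payload}
</script>
<script>
  // Viewer: read the embedded JSON and draw one table row per record.
  var rows = JSON.parse(document.getElementById("data").textContent);
  var table = document.getElementById("view");
  rows.forEach(function (rec) {{
    var tr = table.insertRow();
    Object.keys(rec).forEach(function (key) {{
      tr.insertCell().textContent = key + ": " + rec[key];
    }});
  }});
</script>
</body>
</html>
"""
```

The payload is kept as plain JSON in its own element on purpose: if the viewer script ever stops running, the data can still be pulled out with a text editor.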

- Re: (Score:2)
  
  by femtobyte ( 710429 ) writes:
  
  From a future data recovery standpoint, how is the "code" any more useful than data? You'd still need to be able to figure out how to execute the code itself --- the code is just an especially complex and capable file format (which likely makes it very difficult to figure out if you've lost the execution instructions). Some file formats are already complete programming languages --- like PostScript. Do you think you could make much sense of a PostScript representation of a document if you started without a
  - Re: (Score:2)
    
    by michaelmalak ( 91262 ) writes:
    
    It's an issue of installed base.
    The installed base of any given NDT system is typically less than Qty. 100, often much less. The installed base of HTML5 interpreters is on the order of a billion. The installed base of PowerPoint 97 at its peak was in the tens of millions. To be honest, I think Vint Cerf is complaining a bit much. Anyone (including him) could download the appropriate VMs, archival operating systems, and archival Microsoft Office systems to read PowerPoint 97 and even convert it to a modern f
- Re:Code should NEVER accompany data! (Score:5, Insightful)
  
  by lahvak ( 69490 ) writes: on Wednesday June 05, 2013 @12:41AM (#43911455) Homepage Journal
  
  No! Fail! You don't get it!
  1) Code is data
  2) Code is data that is especially hard to interpret
  3) One of the main reasons for all this mess is that in all those proprietary formats, data is intermixed with code, and the whole mess is very hard to parse.
  Data should be kept completely isolated, as far away from code as possible. That way, if you cannot interpret the code any more, you will still be able to analyze and parse the data. You know, it is not that hard to construct a record player.
  
Comment removed (Score:5, Informative)

by account_deleted ( 4530225 ) writes: on Tuesday June 04, 2013 @11:59PM (#43911255)

Comment removed based on user account deletion

- Re: (Score:3)
  
  by mhotchin ( 791085 ) writes:
  
  To say that MS has a poor record of backwards compatibility is, well, ridiculous. It's only just about *the* most important thing for them, because the majority of their business is with businesses, and if their FooBar app doesn't run, then they don't upgrade.
  No other OS has near the level of compatibility that the MS sequence does.
  http://www.youtube.com/watch?v=vPnehDhGa14 [youtube.com]
  http://blogs.msdn.com/b/oldnewthing/archive/2006/11/06/999999.aspx [msdn.com]
  http://blogs.msdn.com/b/oldnewthing/archive/2003/08/28/54719.aspx [msdn.com]
  - Re: (Score:3)
    
    by serviscope_minor ( 664417 ) writes:
    
    No other OS has near the level of compatibility that the MS sequence does.
    Somebody's been drinking the kool-aid.
    There's a small, little known company called IBM selling a type of computer called a "mainframe" which might beg to disagree. You can buy a modern mainframe which will still run your unmodified programs which you wrote on an original System 360. In 1964.
    Microsoft have not even existed as long as that chain of backwards compatibility, and you try getting the original digger to run on Windows 8 (or
  - Re: (Score:3)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
  - Re: (Score:3)
    
    by drinkypoo ( 153816 ) writes:
    
    No other OS has near the level of compatibility that the MS sequence does.
    It's called ANSI C on Unix. Pick up a copy of The UNIX Programming Environment and you can still use the examples verbatim on a Linux machine today. And you can even still use Motif apps, if we're talking about GUI programs. They still work just like they did when they were new, except a hell of a lot faster.
    Oh, you want backwards compatibility for closed-source software? Guess what? Plenty of software craps itself when it does anything interesting on the wrong version of windows. In reality, there's only o
This problem isn't new to anyone (Score:2)

by kriston ( 7886 ) writes:

This problem isn't new to anyone. If it's new to you, then you need to get involved in the digital preservation movement.
http://en.wikipedia.org/wiki/Digital_obsolescence [wikipedia.org]
"hard problem" (Score:5, Insightful)

by macraig ( 621737 ) writes: <`moc.liamg' `ta' `giarc.a.kram'> on Wednesday June 05, 2013 @12:29AM (#43911387)

Vint, that's bullshit and you know it. It's nothing more than preserving syntaxes, grammar, file formats. That's not hard, and it only requires someone to create a format conversion ONCE to solve the problem at each stage of the evolution.
The real problem here is proprietary non-public formats and structures. When the structure of data has been a closely guarded secret and requires reverse engineering that may not even yield a perfect result, THAT is hard.
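The "one conversion per stage" idea can be sketched as a chain of upgrade functions, each written once when a format revision ships. The version numbers and field names below are hypothetical and only show the shape of the approach:

```python
# Each function upgrades a document from version N to version N + 1.
def v1_to_v2(doc: dict) -> dict:
    # v2 (hypothetical) renamed the "name" field to "title".
    out = {k: v for k, v in doc.items() if k != "name"}
    out["title"] = doc["name"]
    out["version"] = 2
    return out


def v2_to_v3(doc: dict) -> dict:
    # v3 (hypothetical) stores slides as a list instead of one string.
    out = dict(doc)
    out["slides"] = doc["slides"].split("\n")
    out["version"] = 3
    return out


UPGRADES = {1: v1_to_v2, 2: v2_to_v3}
LATEST = 3


def migrate(doc: dict) -> dict:
    """Apply upgrades one stage at a time until the document is current."""
    while doc["version"] < LATEST:
        doc = UPGRADES[doc["version"]](doc)
    return doc
```

A new revision adds one function and one table entry, and every older file gets there by walking the chain. That only holds when each stage's format is documented well enough to write the converter, which is exactly where closed formats get in the way.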

He should be blaming Microsoft (Score:3)

by gweihir ( 88907 ) writes: on Wednesday June 05, 2013 @12:41AM (#43911459)

My first LaTeX publications from 20 years back and all my human-readable ASCII scientific data can still be read and used without any problem. Human-readable file formats in the UNIX tradition completely solve this problem.
This problem is only hard if the people making the data formats are either stupid or do not want their formats to be easily accessible to other applications, as Microsoft does. Of course, others are creating just as fundamentally broken formats for either of the same reasons.

- Re: (Score:3)
  
  by hcs_$reboot ( 1536101 ) writes:
  
  Just hex print the MS 97 file and you have a human readable format:
  00007b0 5f00 675f 6f6d 5f6e 7473 7261 5f74 005f
  00007c0 696c 6362 732e 2e6f 0036 5f5f 7270 6e69
  00007d0 6674 635f 6b68 6500 6978 0074 6573 6c74
  00007e0 636f 6c61 0065 626d 7472 776f 0063 706f
  00007f0 6974 646e 7300 7274 636e 7970 7000 7475
  0000800 0073 6177 6e72 0078 5f5f 7473 6361 5f6b
  0000810 6863 5f6b 6166 6c69 6900 7773 7270 6e69
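A hex dump is only "readable" once you know how the bytes are packed. As a rough sketch, assuming the listing above is default `hexdump`-style output (an offset column followed by little-endian 16-bit words), this Python turns such lines back into bytes and pulls out the NUL-separated ASCII strings. `decode_hexdump` and `ascii_strings` are names invented here.

```python
def decode_hexdump(text: str) -> bytes:
    """Rebuild raw bytes from lines of: offset, then 16-bit hex words."""
    out = bytearray()
    for line in text.strip().splitlines():
        fields = line.split()
        for word in fields[1:]:  # fields[0] is the offset column
            # Assumption: words were printed little-endian, low byte first in the file.
            out += int(word, 16).to_bytes(2, "little")
    return bytes(out)


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return NUL-separated runs of printable ASCII of at least min_len bytes."""
    return [
        chunk.decode("ascii")
        for chunk in data.split(b"\x00")
        if len(chunk) >= min_len and all(32 <= b < 127 for b in chunk)
    ]


sample = """
00007b0 5f00 675f 6f6d 5f6e 7473 7261 5f74 005f
00007c0 696c 6362 732e 2e6f 0036 5f5f 7270 6e69
"""
print(ascii_strings(decode_hexdump(sample)))
```

Fed the first two rows of the dump, this prints `['__gmon_start__', 'libc.so.6', '__prin']`, which look like symbol names from a compiled program rather than slide text. Guess the byte order wrong and the same digits turn into noise, so the dump by itself preserves very little.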
the man is out of touch (Score:3)

by stenvar ( 2789879 ) writes: on Wednesday June 05, 2013 @12:49AM (#43911497)

You can get emulators for just about every machine you can imagine: PDP-10, PDP-11, DOS, Atari, Amiga, C64, microcontroller, etc. You can get hardware emulators with FPGAs if you like. Almost any important format is documented or has been reverse engineered. Yes, you can easily read 1997 PowerPoint files, even if his weird choice of Office on Mac can't. And that's only with current technology. Give it a few decades and computers will just automatically perform even the most complicated data conversions behind the scenes. "Computer, scan the 1997 floppy and put the data on screen."

best safeguard (Score:3)

by hduff ( 570443 ) writes: <hoytduff@gmail.cEEEom minus threevowels> on Wednesday June 05, 2013 @08:04AM (#43913197) Homepage Journal

The best safeguard is the abandonment of all existing proprietary formats to freedom (so anybody can write conversion software) and the proliferation of open formats on an ongoing basis.

- Re:So? (Score:5, Insightful)
  
  by MrBandersnatch ( 544818 ) writes: on Tuesday June 04, 2013 @10:23PM (#43910649)
  
  I think you will find that there's a little known branch of academia called "history" which sometimes takes a curious interest in even the most trivial of past information.....
  
  - Re: (Score:3)
    
    by fuzzyfuzzyfungus ( 1223518 ) writes:
    
    I think you will find that there's a little known branch of academia called "history" which sometimes takes a curious interest in even the most trivial of past information.....
    Even if you don't care about the historians, I'm sure the lucky people who have the pleasure of handling property deeds at your local governance hive can tell you a story from within the last week or two about needing to pull some rather seriously dusty documents to allow a present-day transaction to go through without incident.
    Many data will, indeed, be of no interest at all, or the same historical interest that neolithic refuse dumps are; but data in the nontrivial-number-of-decades range are still live i
- Re: (Score:2)
  
  by Mitchell314 ( 1576581 ) writes:
  
  Man, fuck the future (that's right you historians-not-yet-born). They have all the flying cars and meal-in-a-pill's and immortality clinics and shit. The hell have they done for us to deserve our sympathy? If that means we can make them have to work that much harder to see how life was now, I say do it.
  
  Now back to my zombie virus work. Anybody got a decent time capsule for me to use?
  - - Re: (Score:2)
      
      by ArhcAngel ( 247594 ) writes:
      
      *spoilers*
- Re: (Score:2)
  
  by codepigeon ( 1202896 ) writes:
  
  Ok, so how do you retrieve your photos that you stored on that 8inch floppy disk... 10 years from now?
  
  That is a gross exaggeration but is an analogy to the point of the article. Without proper protections, all the information, notes, white papers, studies, etc. will be useless if no technology exists that can read it.
  
  In a worst case scenario how would humankind rebuild and not forget what was previously learned (e.g. dark ages we already experienced).
  - Re: (Score:2)
    
    by ganjadude ( 952775 ) writes:
    
    He specifically stated that he re-backs up every year. I don't go that far, but I have data going back as far as the early 90s that started on large floppies, then migrated to smaller floppies, then to CD-Rs, and now lives on external hard drives. It isn't too hard to keep formats alive. (Also note that on the hard drives I keep VMs with older OSes able to read the formats I have not found a way to convert, which isn't many.)
- Re: (Score:2)
  
  by yuhong ( 1378501 ) writes:
  
  I think the user was either using PowerPoint 4.0 for Mac or did not upgrade to Office 97 immediately.
  - Re: (Score:2)
    
    by 0123456 ( 636235 ) writes:
    
    Quite likely. I had some old Word for Mac documents of scientific papers I wrote in the 90s, and the only way I was able to recover them a few years ago was to install a Windows 3.1-era copy of Word for Windows.
    - Re: (Score:3)
      
      by yuhong ( 1378501 ) writes:
      
      Have you tried disabling the file blocks [microsoft.com] first? At least Word for Mac 4.x and 5.x can be read this way.
    - - Re: (Score:2)
        
        by yuhong ( 1378501 ) writes:
        
        WordBasic, actually. What is fun BTW is to unblock Word 6.0/95 formats in 2010 and later and open a file with WordBasic like SCANPROT.DOT.
- Re: (Score:2)
  
  by yuhong ( 1378501 ) writes:
  
  Note they also sometimes drop support for old formats too:
  https://bugs.freedesktop.org/show_bug.cgi?id=59902 [freedesktop.org]
- Re: (Score:2)
  
  by Prof.Phreak ( 584152 ) writes:
  
  By the time YOU care to convert a file and can't... there's no app, and NOBODY but you gives a damn about that file you got.

