Software Microsoft

Office 2007 Fails OOXML Test With 122,000 Errors

I Don't Believe in Imaginary Property writes "Groklaw is reporting that some people have decided to compare the OOXML schema to actual Microsoft Office 2007 documents. It won't surprise you to know that Office 2007 failed miserably. If you go by the strict OOXML schema, you get a 17 MiB file containing approximately 122,000 errors, and 'somewhat less' with the transitional OOXML schema. Most of the problems reportedly relate to the serialization/deserialization code. How many other fast-tracked ISO standards have no conforming implementations?"
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Monday April 21, 2008 @04:01PM (#23149736)
    This would be a great thread to put some more negative karma in twitter's sock puppets.
  • by eldavojohn ( 898314 ) * <eldavojohn@gma[ ]com ['il.' in gap]> on Monday April 21, 2008 @04:01PM (#23149738) Journal
    If you can change a vote of "no with comments [slashdot.org]" to "yes" I don't see why you couldn't change "fails with 122,000 errors" to "passes." I mean, when your standard passes through sheer lobbying and politics with little technical analysis, it's going to take a lot to surprise me with how epically it fails.
  • Technical Details (Score:5, Insightful)

    by Enderandrew ( 866215 ) <enderandrew&gmail,com> on Monday April 21, 2008 @04:03PM (#23149776) Homepage Journal
    Technical details mean absolutely nothing in this discussion. I thought we established this.
  • Stop using MiB (Score:3, Insightful)

    by hedleyroos ( 817147 ) on Monday April 21, 2008 @04:06PM (#23149824)
    Men in Black? What happened to good old megabytes? The article says 17MB!
  • Without a reference implementation, how do you know a standard is valid?
  • by Nom du Keyboard ( 633989 ) on Monday April 21, 2008 @04:11PM (#23149916)
    OOXML is such a fraud that it's disgusting that we continue to waste such time on it. If it could win on the merits it wouldn't need such underhanded tactics by its (very few) supporters. It's clearly intended as an ODF-killer by creating an unnecessary parallel "standard".
  • Impressive (Score:5, Insightful)

    by rumith ( 983060 ) on Monday April 21, 2008 @04:12PM (#23149930)

    While it's hardly unexpected that the Office 2007 document format isn't *cough* ISO compliant, 122k errors for a 60 MB file works out to a remarkable ~500 bytes of markup per error.

    I really do not understand where Microsoft is heading. They've rammed their miserable OOXML format through - supposedly so they could advertise their product as ISO compliant. But what's their advantage now that their product is shown to be so horribly incompatible?
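The per-error figure in this comment can be checked with a few lines of Python. This is only a sketch: it assumes the 60-megabyte file size quoted in the comment is meant in decimal megabytes, and the function name is made up for illustration.

```python
# Figures as quoted in the comment; reading the size as decimal megabytes is an assumption.
FILE_SIZE_BYTES = 60 * 1000 * 1000
ERROR_COUNT = 122_000


def bytes_per_error(file_size_bytes: int, error_count: int) -> float:
    """Average number of bytes of markup per reported error."""
    return file_size_bytes / error_count


if __name__ == "__main__":
    print(f"{bytes_per_error(FILE_SIZE_BYTES, ERROR_COUNT):.0f} bytes per error")
```

Reading the size as binary megabytes instead moves the result slightly above 500, so the "~500 bytes" estimate holds either way.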

  • by msh104 ( 620136 ) on Monday April 21, 2008 @04:15PM (#23149992)
    I don't want to destroy the mood that the Slashdot editor wanted to create by posting this sensational piece of propaganda, but this is not 122.000 bugs, is it? This is a parser generating 122.000 error results. Sure, it's bad, but anyone who has ever tried to make code W3C-compatible or debug any piece of code will know that just one error can result in many, many, many error results. Thus this (despite my wish for it to be so) does not really give you much insight into Microsoft's compatibility with its own standard.
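The distinction between error results and underlying bugs can be illustrated by grouping validator messages. The sketch below is hypothetical throughout: the sample messages are invented placeholders, not entries from the real error log, and a real analysis would also need to account for errors that cascade from one another.

```python
from collections import Counter

# Hypothetical validator output as (line number, message) pairs.
# These are made-up placeholders, not taken from the actual log.
SAMPLE_RESULTS = [
    (10, "attribute 'x' not allowed here"),
    (25, "attribute 'x' not allowed here"),
    (40, "attribute 'x' not allowed here"),
    (41, "unexpected element 'y'"),
    (97, "unexpected element 'y'"),
]


def summarize(results):
    """Return (total error results, number of distinct messages, per-message counts)."""
    counts = Counter(message for _, message in results)
    return len(results), len(counts), counts


if __name__ == "__main__":
    total, distinct, counts = summarize(SAMPLE_RESULTS)
    print(f"{total} error results, {distinct} distinct messages")
    for message, count in counts.most_common():
        print(f"{count:>3}  {message}")
```

Here five error results collapse into two distinct messages, which is the kind of ratio the comment suggests might hide behind a six-figure error count.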
  • by HetMes ( 1074585 ) on Monday April 21, 2008 @04:17PM (#23150018)
    ... it's actually worse. We're all agreeing here; what counts is who comes up with the most ludicrous comparison or the most disturbing details about the case. So, the question is: What can any of us do about this?
  • by jollyreaper ( 513215 ) on Monday April 21, 2008 @04:17PM (#23150028)

    you get a 17 MiB file
    This whole mebibyte thing seems like an April Fool's prank that's been carried on for too many years. I can't believe people are actually using it now.
  • Re:Impressive (Score:3, Insightful)

    by SatanicPuppy ( 611928 ) * <SatanicpuppyNO@SPAMgmail.com> on Monday April 21, 2008 @04:20PM (#23150058) Journal
    If the open standard is bloated and buggy, then people will keep using the closed formats.

    Microsoft has zero percentage in having a good, workable, open format.
  • by dvice_null ( 981029 ) on Monday April 21, 2008 @04:22PM (#23150102)
    > Wha? Valid in what respects?

    Valid as in possible to implement. How could a standard not be possible to implement you ask? Well that is simple. E.g. write a program that follows this standard:
    1. It must print "1" on exit
    2. It must print "2" on exit

    As you can see, it would not be possible to implement a program according to that standard. That is why someone would need to write a reference application implementing the standard to notice errors like this. Before the standard is given to the whole world to be implemented.

    It is better that only one has to wonder the errors of the standards, rather than the whole world.
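The two-rule "standard" in this comment can be written down as a conformance check. This is a sketch under an assumed reading: in the strict mode each rule means "the complete output on exit is exactly this string", while the loose mode only requires each string to appear somewhere in the output. The function name and the candidate list are illustrative.

```python
def conforms(output: str, strict: bool = True) -> bool:
    """Check an exit output against both rules of the toy standard.

    strict=True:  each rule demands the whole output equal its string.
    strict=False: each rule only demands its string appear in the output.
    """
    if strict:
        return output == "1" and output == "2"
    return "1" in output and "2" in output


if __name__ == "__main__":
    candidates = ["", "1", "2", "12", "21"]
    print("strict:", [c for c in candidates if conforms(c)])
    print("loose: ", [c for c in candidates if conforms(c, strict=False)])
```

Under the strict reading no output can ever conform, which is exactly the kind of defect a reference implementation would expose. Under the loose reading some outputs pass, which shows how much depends on the standard's wording being unambiguous.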
  • Re:Stop using MiB (Score:3, Insightful)

    by Digi-John ( 692918 ) on Monday April 21, 2008 @04:24PM (#23150148) Journal

    I see a lot of this happening in Wikipedia articles lately, too. Someone let the hyperpedantic nerds out of their basements to confuse every normal person on the fucking planet.

    Similar to the new prevalence of BCE and CE vs. BC and AD. Come on, you must admit that "Anno Domini" is far cooler than "Current/Christian Era". Up next, we change "Wednesday" to "Threeday", because references to Odin are just far too Euro-centric. That is, assuming we stick with that Judeo-Christian concept about Sunday being the seventh day.

  • Re:Stop using MiB (Score:4, Insightful)

    by Yvan256 ( 722131 ) on Monday April 21, 2008 @04:25PM (#23150174) Homepage Journal
    Just because people have been using SI prefixes to redefine that "kilo means 1024" for 40+ years doesn't mean they're right.

    Also, "octet" is the french word for "byte", so it's also 8-bit. :P
  • Re:Impressive (Score:2, Insightful)

    by daveime ( 1253762 ) on Monday April 21, 2008 @04:26PM (#23150192)
    This IS XML we are talking about ... even transmitting a boolean yes or no which should in principle take 1 bit becomes :-

    <xml schema="http:fuckingxml.com">
    <myboolean>
    TRUE
    </myboolean>
    </xml>

    On that basis, 500 bytes per error probably equates to around 1.152 bits of "useful" error information.

    Rather than standardize even more bloated crap, on this occasion I applaud MS for committing OOXML to the early grave it deserves, by failing to even pass the tests on a standard they effectively created (and paid a lot of money) to get approved.
  • Re:Impressive (Score:5, Insightful)

    by PitaBred ( 632671 ) <slashdot@pitabre d . d y n d n s .org> on Monday April 21, 2008 @04:28PM (#23150234) Homepage
    Except that open standards are usually government mandated. Microsoft would have otherwise ignored it completely, going with the lock-in you describe since they "own" the office landscape. They submitted OOXML because they didn't want to be locked out of new gov't initiatives requiring more accessible data formats, so they forced their crap through trying to call it open, while not really being so.
  • by Schraegstrichpunkt ( 931443 ) on Monday April 21, 2008 @04:29PM (#23150238) Homepage

    And why is that an issue? The job of ISO is to develop the standard in an implementable fashion. Top down.

    That explains why OSI is such a trainwreck compared to IP.

    Not a bottom up

    So why was ODF approved, then? Or ISO C?

    adopt the lowest common denominator of whats already out there

    "Lowest common denominator" is not equivalent to bottom-up design.

  • Up with mebibytes! (Score:5, Insightful)

    by JustinOpinion ( 1246824 ) on Monday April 21, 2008 @04:37PM (#23150372)
    Ha!

    Then there are those of us who think the prank is the people who refuse to use it (and who trot out the tired "hard drive manufacturers are stealing my disk space" myth/meme).

    Seriously, the one thing we can agree on is that there is often confusion regarding whether someone meant "1000" or "1024" when they used a prefix. The difference in approach between the two camps is:
    1. Stick with the status quo (where one tries to guess the convention being used based on context). That is, just accept with the confusion/inaccuracy.
    2. Use SI units in the original SI sense (powers of 10) and use new binary prefixes when you really mean it (power of 2). That is, create a convention and adhere to it.

    Interesting that in a discussion about standards (and failures thereof) you would argue that a standard meant to reduce confusion is a prank! I agree, by the way, that "mebibyte" sounds kinda silly... but who cares? It gets the job done. ("Quark" was a silly name, but it's now deeply ingrained in science and no one thinks twice about it.)

    For what it's worth, many software products now use the binary prefix [wikipedia.org] notation (e.g. Konqueror).
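The two conventions described in this comment can be made concrete with a small converter. This is a minimal sketch; the table and function names are invented for illustration, and the 17 MiB value is the error-file size from the story summary.

```python
# SI prefixes are powers of 10; IEC binary prefixes are powers of 2.
UNIT_BYTES = {
    "kB": 10**3, "MB": 10**6, "GB": 10**9,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30,
}


def to_bytes(value: int, unit: str) -> int:
    """Convert a whole number of units to bytes."""
    return value * UNIT_BYTES[unit]


def convert(value: int, from_unit: str, to_unit: str) -> float:
    """Convert between any two units in UNIT_BYTES."""
    return to_bytes(value, from_unit) / UNIT_BYTES[to_unit]


if __name__ == "__main__":
    print(to_bytes(17, "MiB"), "bytes in 17 MiB")
    print(to_bytes(17, "MB"), "bytes in 17 MB")
    print(f"17 MiB = {convert(17, 'MiB', 'MB'):.2f} MB")
```

With explicit units the roughly 5% gap between "17 MB" and "17 MiB" is visible rather than left to context.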
  • Not a bottom up, adopt the lowest common denominator of whats already out there
    Sure, the ISO does that a lot, and it's a fine approach. But that takes time, which is why the fast-track process was designed for standards which have already been implemented.
  • Re:HTML (Score:5, Insightful)

    by pembo13 ( 770295 ) on Monday April 21, 2008 @04:47PM (#23150538) Homepage
    And see how well that turned out.
  • by davidkv ( 302725 ) on Monday April 21, 2008 @04:51PM (#23150576)
    There's a fundamental difference between the IETF and ISO. IETF makes standards of stuff that has been proven to work (or at least be implementable), whereas ISO wants to write specs to tell people what should work.

    A bit like comparing tcp/ip and whatsitsname (x400?). It doesn't really matter how nice something looks on paper if there's no good implementation of it.
  • Re:Curiousity (Score:3, Insightful)

    by Feyr ( 449684 ) on Monday April 21, 2008 @04:55PM (#23150624) Journal
    microsoft was more than happy to play that game,
    until some governments stepped in and said any documents submitted to them in the coming years has to be an open standard.

    so they bought their way to one and voila. their documents still dont conform in practice, but in theory it's an open standard
  • by Anonymous Coward on Monday April 21, 2008 @04:55PM (#23150634)
    "In theory, theory and practice are the same. In practice, they are not."
  • Re:HTML (Score:4, Insightful)

    by Schraegstrichpunkt ( 931443 ) on Monday April 21, 2008 @04:57PM (#23150664) Homepage

    The current HTML specs are trainwrecks for the same reason. That's what HTML 5 is attempting to fix.

    Incidentally, the W3C specs are actually called "Recommendations". There's probably a reason for that.

  • Re:Curiousity (Score:2, Insightful)

    by Fast Thick Pants ( 1081517 ) <fastthickpants@gmail . c om> on Monday April 21, 2008 @04:58PM (#23150676)

    You've fallen victim to Microsoft's water-muddying strategy -- They gave their new file spec the ridiculous name of "Office Open XML" (abbreviated OOXML) just so it would be conflated with the OpenOffice.org's software and file formats.

    So this is not a case of a third-party compliance test like the Acid tests for web browsers; this is Microsoft failing to conform to their own standard.

  • Re:Stop using MiB (Score:4, Insightful)

    by psychodelicacy ( 1170611 ) <bstcbn@gmail.com> on Monday April 21, 2008 @05:06PM (#23150782)

    To be fair, we don't use "hour" to mean "sixty minutes" in every context except computing, where it means "fifty-eight and a half minutes". The rationality lies in the removal of confusion, as much as in the units themselves.

  • by Xtifr ( 1323 ) on Monday April 21, 2008 @05:07PM (#23150806) Homepage

    Does anyone know if Open Office is compliant with the Open Document Format? Just curious.
    I don't know, but if none of the multiple (big difference already) vendors behind ODF have implemented it properly yet, then that just means that it shouldn't have been on the fast-track.

    Oh wait! It wasn't!

    The fast-track is for de-facto standards which are already so widespread (i.e. supported by multiple vendors) and consistent that there's little point in trying to push a divergent standard out, even though a divergent standard might be better. Something like TCP/IP would be a good example of the sort of thing where the fast track might be appropriate. ODF wasn't fast-tracked, so the standards committee came up with the best standard, irrespective of what might actually be out there in the wild. Now it's up to the vendors to catch up. That's the usual way this is done (i.e. the C++ standard, where most vendors took a few years to catch up, or the C standard where most vendors took a few months to catch up, and MS took a few years).

    Of course, if MSOOXML had gone through the regular track, it probably would have taken years to finish (since it's so large, complex, and poorly defined), and MS couldn't afford to wait. So instead they bought themselves a standards committee or twelve.
  • Re:Stop using MiB (Score:5, Insightful)

    by Yvan256 ( 722131 ) on Monday April 21, 2008 @05:16PM (#23150932) Homepage Journal
    If language is defined by usage, does that mean that copyright infringement now equals theft? ;-)

    You have never seen the confusion of metric users entering the CS field, have you? Ever seen a teacher struggle with the very same point we're having right now?

    As I said, in the rest of the world, kilo means 1000, not 1024. And here you're saying it becomes something else because a particular field has abused it for 40 years?

    Also note that both hard drive manufacturers and digital telecommunications, in a computing context, use 1000 for kilo.

    So your argument becomes "if you're in a computing context BUT not talking about hard drives OR telecommunications, then kilo means 1024"...

    I'd rather use KiB=1024, thank you very much. :-)
  • Re:Curiousity (Score:5, Insightful)

    by MightyMartian ( 840721 ) on Monday April 21, 2008 @05:17PM (#23150948) Journal
    And that's what's been going on. However, a lot of governments and other organizations are now realizing that leveraging all that data they've been gathering for the better part of two decades on a closed, proprietary standard could lead to disaster. That's the whole point of trying to get an internationally recognized open standard that anyone can implement. ODF is supposed to fulfill the function of a published, implementable office document standard so that, theoretically, in 2100AD, when someone needs to open a document created in 2010, it's in an openly available format that, at the very worst, someone has to reimplement, but at least has clear, concise documentation that isn't thousands of pages long and doesn't include references to proprietary standards.

    The problem with that is that an open document format standard is a direct threat to Microsoft's near-monopoly in the office app department. If anyone can implement a document format that's cross-compatible, then they can easily implement a competitor to Office, and if they decide to undercut Office or (as with OO.org) give the damn thing away, then Microsoft's monopoly is one breath from collapse, and believe me, if Microsoft loses Office, they're in serious, serious trouble within five years. So, OOXML, a "standard" that not even Microsoft can implement, is pushed through the ISO using all sorts of peculiar and ultimately nefarious methods now means Microsoft and its partners can go around telling Small Town, USA that Office saves in an ISO standard, but in reality, the poor bastard in 2100AD who needs to open this file is going to be spending many months trying to figure out this monster, which is in direct violation of the whole notion of an open standard.

    That you have no problems is irrelevant. That's not what the point of an open standard is.
  • by walterbyrd ( 182728 ) on Monday April 21, 2008 @05:23PM (#23151030)
    I thought the idea behind the fast-track was to have a less fussy way of ratifying standards, when those standards were already widely used.

    If that is correct, then how does the MSOOXML standard qualify? This is a "standard" that is used by absolutely nobody, not even the creator of the standard uses this standard.

    Do I not understand the idea behind the fast-track process?
  • by inTheLoo ( 1255256 ) * on Monday April 21, 2008 @05:29PM (#23151142) Journal

    The point of the article is that there are no conforming implementations. There never will be a conforming implementation and everyone knows it.

  • by CodeBuster ( 516420 ) on Monday April 21, 2008 @05:40PM (#23151264)
    which is why it doesn't really matter. The standards which can actually be implemented and have an open source reference implementation, such as the Open Document Format (ODF) [wikipedia.org], will become the de-facto standards at least for archive and long term storage. Also, there will be tremendous pressure on Microsoft to at least implement ODF for their Office products and probably to make that the default save format as well. However, it would be nice if the standards could allow for optional extensions which are not required (I believe that the TIFF format for images allows this) but could be used by programs which want to add enhancements, but allow readability and editing in other programs which only meet the minimum standards. Perhaps this is already a feature or could someone with more detailed knowledge about ODF comment?
  • Re:Stop using MiB (Score:4, Insightful)

    by gbjbaanb ( 229885 ) on Monday April 21, 2008 @06:44PM (#23152074)

    Also note that both hard drive manufacturers and digital telecommunications, in a computing context, use 1000 for kilo.
    Also note that both hard drive manufacturers and digital telecommunications, in a marketing context, use 1000 for kilo.

    There, fixed that for you.
  • Re:Stop using MiB (Score:1, Insightful)

    by Anonymous Coward on Monday April 21, 2008 @06:53PM (#23152190)
    No, it doesn't, and didn't.

    True, the problem is not with the people who use the data; the problem is that the SI prefixes kilo, mega, etc. were here first and apply to device construction. The problem is in networking and to a degree hard drives, not software. If you ask me to build a fibre system using a laser that transmits at x KB/s, I'm clear on that, it's 1000 bytes. Not 1024, because I was trained as a physical scientist, and hardware as made by engineers and scientists requires clear specification. Everyone knows that when asked to put 44K of fuel into a plane you don't put 44K pounds of fuel in an airplane, you put 44K kilograms because the world is on SI, except the US and Liberia. So some idiot puts 44K pounds of fuel in a plane and it has to glide into the Azores rather than crossing the Atlantic. So you have to clearly specify.

    This isn't, and wasn't, particularly a problem when we were working in kilobytes or even really megabytes, and getting kibibytes confused with kilobytes is a relatively small error. It becomes a problem when you're talking about 'petabytes', where 1.126 * 10^15 and 1.0 * 10^15 are two rather different things, and which one is your hardware using?

    As to why it was never clear, the Wikipedia article is remarkably helpful: the 1.44 MB floppy, in which the 'M' as used is neither mebibytes nor megabytes but one * the other. Confused yet? Good, that's why we have international bodies to standardize language, so you won't be anymore.
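The growth of the gap with each prefix, and the mixed "M" of the floppy, can both be worked out directly. This is a sketch with illustrative names; it takes the floppy's capacity to be 1440 blocks of 1024 bytes, which is what "1.44 * 1000 * 1024" amounts to.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]


def binary_excess(power: int) -> float:
    """Ratio of the binary reading (1024**power) to the SI reading (1000**power)."""
    return 1024**power / 1000**power


# The "1.44 MB" floppy: 1.44 * 1000 * 1024 bytes, i.e. 1440 KiB.
FLOPPY_BYTES = 1440 * 1024

if __name__ == "__main__":
    for power, name in enumerate(PREFIXES, start=1):
        print(f"{name:>4}: binary reading is {binary_excess(power):.4f}x the SI reading")
    print(f"floppy: {FLOPPY_BYTES} bytes "
          f"= {FLOPPY_BYTES / 10**6:.3f} MB = {FLOPPY_BYTES / 2**20:.3f} MiB")
```

The discrepancy is 2.4% at kilo but about 12.6% at peta, and the floppy comes out as neither 1.44 MB nor 1.44 MiB.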
  • by danskal ( 878841 ) on Monday April 21, 2008 @07:42PM (#23152664)

    "B) Economy Stunting Taxation ... "
    BZZZZZ.... wrong!!! There's nothing stunted about the scandinavian economies (other than the US economy & subprime crisis dragging them down slightly at the moment), and they have some of the highest tax rates in the world.
    If tax money is used to lubricate the wheels of commerce, by ensuring a fit, well-educated, flexible, motivated work force, and by ensuring that infrastructure just works, that monopolies aren't abused, etc., then there is no reason for taxation, within reason, to be a problem. I guess the logic is that sometimes, an intelligent government, voted for by the people and working for the people, can spend/invest the people's money more wisely than they can themselves.
  • by Anonymous Coward on Monday April 21, 2008 @08:07PM (#23152846)
    The compat. pack wont install on my BeOS desktop, what am I doing wrong?
  • by Kalriath ( 849904 ) * on Monday April 21, 2008 @08:43PM (#23153144)
    I've heard elsewhere in this Slashdot discussion that apparently there is a point where OO.o blatantly violates the specification - using the exact opposite value for hidden text as it's meant to. So it's almost valid.
  • by fbjon ( 692006 ) on Monday April 21, 2008 @08:44PM (#23153158) Homepage Journal
    The real problem is not with how much taxes are collected, it's the "intelligent government" part. I think a part of the problem is that the larger the government or governing structure is (in terms of people and country size, not legislation), the more it becomes an inefficient sieve rather than funnel.


    On one hand, a person should indeed be free to live as one sees fit, including spending. But on the other hand, people are stupid, so electing smart people and raising taxes seems like a win to me. That just leaves the "election" part, then. Now what to do about that.....

  • by Anonymous Coward on Monday April 21, 2008 @09:04PM (#23153294)
    Bah, why not just print "12"? The standard didn't say the 1 and 2 had to be printed on their own did it? ;)
  • by walterbyrd ( 182728 ) on Monday April 21, 2008 @09:31PM (#23153476)
    WTF does the acid3 test have to do with any of this?

    However firefox does with the acid3 has nothing to do with ISO corruption, does it?
  • Re:HTML (Score:3, Insightful)

    by AdamKG ( 1004604 ) <slashdot&adamgomaa,com> on Monday April 21, 2008 @10:07PM (#23153688) Homepage

    Okay, help me out here. Do you mean "how well that turned out" in the sense that HTML has been a huge success (you know, what with being the medium that we're using to display our comments right now ...) or in the sense of being a huge disaster?

    I mean, I can sympathize with both views. I'm just wondering which one I should sympathize with in the context of your post.

  • by Allador ( 537449 ) on Tuesday April 22, 2008 @12:32AM (#23154734)

    You either conform to a standard or you don't.
    Thats a nice theory but not really practical. ISO OOXML (strict version) was created between MS product releases.

    How long should it have taken for MS to release a version that matched ISO OOXML strict? One hour? One day? One year? More?

    Companies dont have the magical ability to instantly create a released product the day that the standards group settles on something. Thats just absurd.

    A standard that allows non conforming versions is no standard.
    Standards dont allow or disallow implementations. Thats not how it works. Standards exist. Implementations try to be compliant to them.

    According to TFA, Office 2007 OOXML is very conformant to ISO OOXML Transitional. But its not very conformant to ISO OOXML Strict.

    This should not be a surprise. For example, the Strict version removes VML as a vector graphics markup. But MS has a decade or more of investment in VML, and their currently released products use VML. It will take a while for MS to change Office to not use VML (assuming they do choose to).

    If it would take 2 to 4 years for M$ to properly implement and document their crappy little standard, it should take 2 to 4 years for people to believe they had a standard worthy of ISO approval.
    I agree that it shouldnt have been fast tracked. That was a bit of an abomination. But lets be clear that MS didnt create a new standard, and then implement it. They just continued to develop their existing implementation, and documented what they already had. The OOXML is not a fresh creation ... its a documentation of something that has existed and been evolving for 10-15 years.

    Standards that come from mature, crufty old de-facto standards (ie, OOXML) are always going to be uglier than standards that were created to be a standard from day one (ie, ODF). Thats just reality. Expecting it to be clean and pretty is not reasonable.

    But the world where OOXML and the previous binary .doc .xls, etc formats are documented (ie, the world we're in now) is better than the one we were in before, where none of it was documented.

    PS, thank you Twitter for being reasonably coherent and making a post that, littered with the M$ nonsense that it is, at least was a reasonable discussion.
  • Comment removed (Score:4, Insightful)

    by account_deleted ( 4530225 ) on Tuesday April 22, 2008 @01:25AM (#23155006)
    Comment removed based on user account deletion
  • by h4rm0ny ( 722443 ) on Tuesday April 22, 2008 @04:00AM (#23155636) Journal

    I'd be happy to continue this via e-mail.

    Please continue things like this on Slashdot. Many of us come here mainly so we can read the debates that go on and it's a shame if an interesting one retires to private email discussion. That was a fascinating post.
  • by aug24 ( 38229 ) on Tuesday April 22, 2008 @06:53AM (#23156234) Homepage

    in case anyone claims that ODF doesn't have the same sort of problem

    FFS, ODF isn't a fast-track ('multiply implemented and widespread') standard. It's perfectly acceptable for a proposed standard to be ahead of current implementations - it's only proposed after all. Implementations should be expected to be playing catch-up.

    OOXML on the other hand is claimed to be already implemented and widespread and thus eligible for fast track. So it is a big deal if it turns out it isn't. Not to mention that you're selectively pointing out that the transitional version nearly works, blithely ignoring the fact (in the same blog) that strict is well fucked. So the strict version of the 'standard' should be thrown out even harder than the transitional.

    I'm beginning to wonder if this concept is just too hard to grasp for many slashdotters or if there're just too many people drinking Norway brand Kool-aid.

    Justin.
