
Office 2007 Fails OOXML Test With 122,000 Errors

I Don't Believe in Imaginary Property writes "Groklaw is reporting that some people have decided to compare the OOXML schema to actual Microsoft Office 2007 documents. It won't surprise you to know that Office 2007 failed miserably. If you go by the strict OOXML schema, you get a 17 MiB file containing approximately 122,000 errors, and 'somewhat less' with the transitional OOXML schema. Most of the problems reportedly relate to the serialization/deserialization code. How many other fast-tracked ISO standards have no conforming implementations?"

  • by eldavojohn ( 898314 ) * <eldavojohn@@@gmail...com> on Monday April 21, 2008 @04:01PM (#23149738) Journal
    If you can change a vote of "no with comments [slashdot.org]" to "yes" I don't see why you couldn't change "fails with 122,000 errors" to "passes." I mean, when your standard passes through sheer lobbying and politics with little technical analysis, it's going to take a lot to surprise me with how epically it fails.
    • by Finallyjoined!!! ( 1158431 ) on Monday April 21, 2008 @04:09PM (#23149888)
      OOXML: "The best Standard money can buy"
      • by CodeBuster ( 516420 ) on Monday April 21, 2008 @05:40PM (#23151264)
        which is why it doesn't really matter. The standards which can actually be implemented and have an open source reference implementation, such as the Open Document Format (ODF) [wikipedia.org], will become the de-facto standards at least for archive and long term storage. Also, there will be tremendous pressure on Microsoft to at least implement ODF for their Office products and probably to make that the default save format as well. However, it would be nice if the standards could allow for optional extensions which are not required (I believe that the TIFF format for images allows this) but could be used by programs which want to add enhancements, but allow readability and editing in other programs which only meet the minimum standards. Perhaps this is already a feature or could someone with more detailed knowledge about ODF comment?
  • by notaprguy ( 906128 ) * on Monday April 21, 2008 @04:02PM (#23149760) Journal
    the Open Document Format? Just curious.
    • by EmbeddedJanitor ( 597831 ) on Monday April 21, 2008 @04:05PM (#23149792)
      You just use this conversion tool called Open Office
    • by Xtifr ( 1323 ) on Monday April 21, 2008 @05:07PM (#23150806) Homepage

      Does anyone know if Open Office is compliant with the Open Document Format? Just curious.
      I don't know, but if none of the multiple (big difference already) vendors behind ODF has implemented it properly yet, then that just means that it shouldn't have been on the fast-track.

      Oh wait! It wasn't!

      The fast-track is for de-facto standards which are already so widespread (i.e. supported by multiple vendors) and consistent that there's little point in trying to push a divergent standard out, even though a divergent standard might be better. Something like TCP/IP would be a good example of the sort of thing where the fast track might be appropriate. ODF wasn't fast-tracked, so the standards committee came up with the best standard, irrespective of what might actually be out there in the wild. Now it's up to the vendors to catch up. That's the usual way this is done (e.g. the C++ standard, where most vendors took a few years to catch up, or the C standard, where most vendors took a few months to catch up, and MS took a few years).

      Of course, if MSOOXML had gone through the regular track, it probably would have taken years to finish (since it's so large, complex, and poorly defined), and MS couldn't afford to wait. So instead they bought themselves a standards committee or twelve.
  • Technical Details (Score:5, Insightful)

    by Enderandrew ( 866215 ) <[enderandrew] [at] [gmail.com]> on Monday April 21, 2008 @04:03PM (#23149776) Homepage Journal
    Technical details mean absolutely nothing in this discussion. I thought we established this.
  • Stop using MiB (Score:3, Insightful)

    by hedleyroos ( 817147 ) on Monday April 21, 2008 @04:06PM (#23149824)
    Men in Black? What happened to good old megabytes? The article says 17MB!
    • by Anonymous Coward on Monday April 21, 2008 @04:11PM (#23149914)

      Men in Black? What happened to good old megabytes? The article says 17MB!
      Maybe, but I make this shit look GOOD.
    • by Richard Steiner ( 1585 ) <rsteiner@visi.com> on Monday April 21, 2008 @04:14PM (#23149964) Homepage Journal
      Shh... The submitter is trying to impose those trendy "base 2" SI prefixes on us in spite of 40+ years of prior art to the contrary. Another case of ivory tower types not being sophisticated enough to grok current industry usage, methinks...

      And don't even get me started on folks who assume a byte is always eight (b) bits. There's a reason folks in the Real World use the term "octet", people. Really.

      Sheesh! :-) :-)
      • Re:Stop using MiB (Score:4, Insightful)

        by Yvan256 ( 722131 ) on Monday April 21, 2008 @04:25PM (#23150174) Homepage Journal
        Just because people have been redefining the SI prefix so that "kilo means 1024" for 40+ years doesn't mean they're right.

        Also, "octet" is the French word for "byte", so it's also 8-bit. :P
        • Re:Stop using MiB (Score:5, Informative)

          by Richard Steiner ( 1585 ) <rsteiner@visi.com> on Monday April 21, 2008 @04:46PM (#23150522) Homepage Journal
          Language is typically defined by usage, not the other way around. Unless you're the French, perhaps. :-)

          Remember that "kilo" *did* (and does) mean 1024 in a computing context. Everybody understood that who was involved on a technical level. Everybody. There was no miscommunication in the general case ... except when it came to laypeople who largely didn't understand what was described in the first place. When that happened, we just told them that bigger is better and moved on...

          Your comment about octet confuses and annoys me. Go away. :-)
          • Re:Stop using MiB (Score:5, Insightful)

            by Yvan256 ( 722131 ) on Monday April 21, 2008 @05:16PM (#23150932) Homepage Journal
            If language is defined by usage, does that mean that copyright infringement now equals theft? ;-)

            You have never seen the confusion of metric users entering the CS field, have you? Ever seen a teacher struggle with the very same point we're having right now?

            As I said, in the rest of the world, kilo means 1000, not 1024. And here you're saying it becomes something else because a particular field has abused it for 40 years?

            Also note that both hard drive manufacturers and digital telecommunications, in a computing context, use 1000 for kilo.

            So your argument becomes "if you're in a computing context BUT not talking about hard drives OR telecommunications, then kilo means 1024"...

            I'd rather use KiB=1024, thank you very much. :-)
          • Re:Stop using MiB (Score:5, Interesting)

            by benwaggoner ( 513209 ) <ben.waggoner@m[ ... m ['icr' in gap]> on Monday April 21, 2008 @07:16PM (#23152420) Homepage
            Except "computing" isn't a clear-cut domain. Take my field, compression: does that count as "computing" (power of 2) or telecommunications (power of 10)? It's unclear.

            So, we had a problem where different tools and formats defined it in different ways. For a number of years, QuickTime used K=1024, while Windows Media and RealMedia used K=1000. Unless you were using Sorenson Squeeze, which "corrected" its Windows Media and RealMedia values by 1.024 so they matched the QuickTime file sizes!


            Fortunately, the compression world has standardized on power-of-10 numbers, since that's what the MPEG standards and, well, all the professionals use.

            So, now we have to deal with complaints about the mismatch between encoding a file that should be "4 GB" but doesn't fill up "4 GB" of drive space...

            Sorry, 1024's got to be a KiB. No other feasible solution at this point, unless we decide to stop having computers talk to each other...
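The "4 GB" mismatch described above is plain arithmetic. Here is a minimal sketch, assuming the encoder means decimal gigabytes while the file manager divides by 2**30 and still prints "GB" (both are assumptions; the comment does not name the tools involved):

```python
# Assumption: the encoder's "4 GB" means 4 * 10**9 bytes (power of 10),
# while the file manager divides by 2**30 but still labels the result "GB".
GB = 10**9    # SI gigabyte
GIB = 2**30   # IEC gibibyte

def as_gib(size_bytes: int) -> float:
    """Express a byte count in gibibytes."""
    return size_bytes / GIB

encoded = 4 * GB
print(f"{encoded} bytes = {as_gib(encoded):.3f} GiB")
```

The same byte count reads as 4 in one convention and roughly 3.7 in the other, which is the whole complaint.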
            • Re: (Score:3, Interesting)

              by fbjon ( 692006 )
              In fact, do we even need to express filesizes in powers of 2 at all? Is there any reason to continue this practice other than tradition?
      • Re: (Score:2, Funny)

        by hardburn ( 141468 )

        More like fixing 40+ years of hard drive manufacturers lying to us about storage space.

      • Note to self: b != 8. :-)
      • Re:Stop using MiB (Score:5, Informative)

        by Schraegstrichpunkt ( 931443 ) on Monday April 21, 2008 @04:53PM (#23150604) Homepage

        40+ years of prior art to the contrary

        "1 MW" has always meant 1,000,000 watts. "9.6 kbps" has always meant 9,600 bits per second. A "500 GB" hard drive still means 500,000,000,000 bytes.

        There are relatively few places where this is screwed up, most of which fall into these categories:

        • RAM or things derived from RAM (e.g. page sizes) where the physical layout implies powers of 2
        • Microsoft

        The latter doesn't even get it consistent. "1.44 MB" floppies are actually 1440 * 1024 bytes.
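To make the floppy example concrete, here is a short sketch that takes the 1440 * 1024 figure from the comment and expresses it in both conventions (the variable names are illustrative only):

```python
KB = 1000    # SI kilo
KIB = 1024   # IEC kibi

# Byte count of a "1.44 MB" floppy as stated above: 1440 * 1024.
floppy = 1440 * KIB

decimal_mb = floppy / KB**2    # size in SI megabytes
binary_mib = floppy / KIB**2   # size in IEC mebibytes

print(floppy, decimal_mb, binary_mib)
```

Neither result is 1.44: the label mixes one decimal factor with one binary factor.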

        Another case of ivory tower types not being sophisticated enough to grok current industry usage, methinks...

        "Current industry usage" is to be ambiguous; 17 MB means "somewhere between 16 and 18 megabytes". The people you call "ivory tower types", including the IEC [www.iec.ch], are trying to use more precise language.

        And don't even get me started on folks who assume a byte is always eight (b) bits. There's a reason folks in the Real World use the term "octet", people.

        The term "octet" does exactly the same thing that the binary prefixes do: They indicate more precisely what is being talked about.

        As someone else in this thread said, "just because some people made the mistake, decades ago, of choosing to equal kilo to 1024 doesn't mean they were right."

  • by llamafirst ( 666868 ) * on Monday April 21, 2008 @04:06PM (#23149832)

    In a blog posting this week, Alex Brown, leader of the International Organization for Standardization (ISO) group in charge of maintaining the Office Open XML (OOXML) standard, revealed that Microsoft Office 2007 documents do not meet the latest specifications of the ISO OOXML draft standard. "Word documents generated by today's version of Microsoft Office 2007 do not conform to ISO/IEC 29500," said Brown in a blog post recounting the process of testing a document against the "strict" and "transitional" schema defined in the standard.

    Ahem. Let me be the first to say:
    Brownie, you're doing a heck of a job! [wikipedia.org]

  • Duh (Score:4, Funny)

    by Arreez ( 1252440 ) on Monday April 21, 2008 @04:09PM (#23149874)
    Seriously......anyone not see it coming? Office 2007 being submitted to this test is like submitting to a "Will it float?" test with your hands tied and the good ol' cement shoes strapped on.
    • Re: (Score:2, Funny)

      by jnik ( 1733 )
      Great. Now I just want to know...will it blend?
    • by Dunbal ( 464142 )
      Well, that all depends on the density of the liquid you plan on throwing it in...? You can't change the laws of physics!

      OK ok I agree, usually it's water, and no it doesn't float; but on slashdot, who knows!
  • by voislav98 ( 1004117 ) on Monday April 21, 2008 @04:09PM (#23149884)
    which is that it's the standard that's deficient. I'm sure that the standard will soon be "improved" so it conforms with Office 2007
  • by Nom du Keyboard ( 633989 ) on Monday April 21, 2008 @04:11PM (#23149916)
    OOXML is such a fraud that it's disgusting that we continue to waste such time on it. If it could win on the merits it wouldn't need such underhanded tactics by its (very few) supporters. It's clearly intended as an ODF-killer by creating an unnecessary parallel "standard".
    • Re: (Score:3, Interesting)

      by BearRanger ( 945122 )
      No. It's intended to sway governments that have passed laws requiring all documents to be created using open standards. This is all about Microsoft being able to sell Office to European countries and (soon) California.
  • Impressive (Score:5, Insightful)

    by rumith ( 983060 ) on Monday April 21, 2008 @04:12PM (#23149930)

    While it's hardly unexpected that the Office 2007 document format isn't *cough* ISO compliant, 122k errors for a 60Mb file works out to a remarkable ~500 bytes of markup per error.
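The ~500 bytes figure can be checked quickly. A rough sketch, assuming "60Mb" means 60 megabytes (either decimal or binary) rather than megabits, and using the 122,000 error count from the summary:

```python
ERRORS = 122_000  # error count quoted in the story summary

def bytes_per_error(size_bytes: int, errors: int = ERRORS) -> float:
    """Average number of bytes of document per reported error."""
    return size_bytes / errors

# Assumption: "60Mb" is read as megabytes, under both prefix conventions.
for label, size in (("60 MB", 60 * 10**6), ("60 MiB", 60 * 2**20)):
    print(f"{label}: about {round(bytes_per_error(size))} bytes per error")
```

Either reading lands close to the ~500 bytes quoted above.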

    I really do not understand where Microsoft is heading. They've rammed their miserable OOXML format through - supposedly so they could advertise their product as ISO compliant. But what's their advantage now that their product is shown to be so horribly incompatible?

    • Re: (Score:3, Insightful)

      If the open standard is bloated and buggy, then people will keep using the closed formats.

      Microsoft has zero percentage in having a good, workable, open format.
    • by Yvan256 ( 722131 )
      What if their goal was to promote "open formats" as being incredibly difficult to be compatible with, that all open format documents (and their content) were at risk, and that closed, controlled proprietary formats were the only sane choice?

    • Re: (Score:2, Insightful)

      by daveime ( 1253762 )
      This IS XML we are talking about ... even transmitting a boolean yes or no which should in principle take 1 bit becomes :-

      <xml schema="http:fuckingxml.com">

      On that basis, 500 bytes per error probably equates to around 1.152 bits of "useful" error information.

      Rather than standardize even more bloated crap, on this occasion I applaud MS for committing OOXML to the early grave it deserves, by failing to even pass the tests on a standard they effectively c
  • HTML (Score:5, Interesting)

    by WK2 ( 1072560 ) on Monday April 21, 2008 @04:12PM (#23149942) Homepage
    It's not a fast-tracked ISO standard, but HTML and CSS have no conforming implementations. I'm not sure, but links might conform to HTML.
  • by msh104 ( 620136 ) on Monday April 21, 2008 @04:15PM (#23149992)
    I don't want to destroy the mood that the Slashdot editor wanted to create by posting this sensational piece of propaganda, but this is not 122,000 bugs, is it? This is a parser generating 122,000 error results. Sure, it's bad, but anyone who has ever tried to make code W3C-compliant or debug any piece of code will know that just one error can result in many, many, many error results. Thus, despite my wish for it to be so, this does not really give you much insight into Microsoft's compatibility with its own standard.
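One way to see how far a raw error count overstates the number of underlying problems is to group messages by kind. A minimal sketch follows; the message list is hypothetical and invented for illustration (including the `w:example` element), not taken from the real error log:

```python
from collections import Counter

# Hypothetical validator messages as (element, problem) pairs. They are
# invented for illustration and are not taken from the real error log.
messages = [
    ("m:degHide", "val='on' is not one of true|false"),
    ("m:degHide", "val='on' is not one of true|false"),
    ("m:degHide", "val='on' is not one of true|false"),
    ("w:example", "unexpected attribute"),
]

def distinct_problems(msgs):
    """Count how many times each distinct (element, problem) pair occurs."""
    return Counter(msgs)

summary = distinct_problems(messages)
print(f"{len(messages)} messages, {len(summary)} distinct problems")
for (element, problem), count in summary.most_common():
    print(f"{count:>4}  {element}: {problem}")
```

Run against a real log, the number of distinct kinds would be a more honest measure than the total line count.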
  • ... it's actually worse. We're all agreeing here; what counts is who comes up with the most ludicrous comparison or the most disturbing details about the case. So, the question is: What can any of us do about this?
  • you get a 17 MiB file
    This whole mebibyte thing seems like an April Fool's prank that's been carried on for too many years. I can't believe people are actually using it now.
    • Up with mebibytes! (Score:5, Insightful)

      by JustinOpinion ( 1246824 ) on Monday April 21, 2008 @04:37PM (#23150372)

      Then there are those of us who think the prank is the people who refuse to use it (and who trot out the tired "hard drive manufacturers are stealing my disk space" myth/meme).

      Seriously, the one thing we can agree on is that there is often confusion regarding whether someone meant "1000" or "1024" when they used a prefix. The difference in approach between the two camps is:
      1. Stick with the status quo (where one tries to guess the convention being used based on context). That is, just accept the confusion/inaccuracy.
      2. Use SI units in the original SI sense (powers of 10) and use new binary prefixes when you really mean it (power of 2). That is, create a convention and adhere to it.

      Interesting that in a discussion about standards (and failures thereof) you would argue that a standard meant to reduce confusion is a prank! I agree, by the way, that "mebibyte" sounds kinda silly... but who cares? It gets the job done. ("Quark" was a silly name, but it's now deeply ingrained in science and no one thinks twice about it.)

      For what it's worth, many software products now use the binary prefix [wikipedia.org] notation (e.g. Konqueror).
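The second approach is easy to follow in code. Here is a sketch of a size formatter that uses SI prefixes for powers of 10 and binary prefixes for powers of 2; the function name and its `binary` flag are made up for this example and do not come from any particular product:

```python
def format_size(n: int, binary: bool = False) -> str:
    """Format a byte count with SI (kB, MB) or IEC (KiB, MiB) prefixes."""
    base = 1024 if binary else 1000
    units = (["B", "KiB", "MiB", "GiB", "TiB"] if binary
             else ["B", "kB", "MB", "GB", "TB"])
    value = float(n)
    for unit in units:
        # Stop at the first unit that fits, or at the largest unit we know.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base

size = 17 * 2**20  # the 17 MiB file from the summary
print(format_size(size, binary=True))
print(format_size(size))
```

The same file prints with two different numbers, but each one is labelled with a prefix that says which convention produced it.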
      • by menace3society ( 768451 ) on Monday April 21, 2008 @05:36PM (#23151226)
        You're forgetting one thing: people have already adapted to the "old" usage. Dictionaries already exist saying that "mega-" can mean a factor of 1048576 units of computer data. If we change the system now, what will not happen is that everything disambiguates itself, and the hard disk companies stop lying to customers. What will happen is that

        1) Seagate et al. will continue to market their products in terms of GB and TB.
        2) Users will be outraged that their 232GiB hard disk only has 231 or so GiBs of usable space due to formatting, thus leaving the problem unsolved.
        3) People will lose good slang abbreviations like Meg and Gig to Kib, Mib, Gib (or Jib), Tib, and Pib, which not only sound stupid but will also be hard to distinguish in normal conversation.
        4) PHBs will misuse the binary-only versions as if they were base ten, especially if it catches on that "mebi-" is more than "mega-".
        Techie: Hey boss we've got new computers with 100 mebibytes of L1 cache.
        PHB: How much is a mebibyte?
        Techie: 1048576 bytes.
        PHB: Oh, so it's about a million then. Cool.
        Next Day
        PHB: Hey guys, we shipped nearly 2 mebi-units of dongles this quarter.
        Board: What's mebi-units?
        PHB: Well, it's.... Proceed into incorrect explanation that convinced Board of Directors that Boss is "with it"
        5) As a corollary to 4), people will start using those prefixes to refer to everything in a computer. The new chip is 3.2 GiHz, it draws 25 kiW of power, it weighs 21 Kig, etc.
        6) People will always think you are a douchebag.

        And that's not even getting into the confusion caused by having two different sets of prefixes for slightly different multipliers, maybe, during the transition.

        Ask any Brit: How much is a trillion?
  • I Remember When... (Score:3, Interesting)

    by Nom du Keyboard ( 633989 ) on Monday April 21, 2008 @04:20PM (#23150078)
    I remember when back in the good old days of the IBM EGA (640x350 6-bit color) adapter, when semi-clone cards were made they were all rounded up and tested against the IBM "standard". The IBM card had a couple flaws at the time, two of the bottom scan lines were interchanged, and it interfered with the computer's (IBM PC) ability to Warm Boot. Each card was given a percentage rating of how well it compared to the IBM Standard, and comments on whether or not the bugs in the original were fixed, or kept for compatibility reasons. Also, for less money, all of the clone cards came with the maximum 256KB of memory, while the IBM EGA only had 64KB standard, with the rest able to be added through a daughter card.

    What most made me smile was that the IBM EGA card was included in the matrix of results, showing a rating of 100% compatibility with itself.

  • by dominator ( 61418 ) on Monday April 21, 2008 @04:21PM (#23150084) Homepage
    Speaking as an OOX implementer, this is pretty bad. But it's not quite as bad as the headline makes it seem - the meat of the story [griffinbrown.co.uk] is linked a few blogs deep:

    The expectation is therefore that an MS Office 2007 document should be pretty close to valid according to the TRANSITIONAL schema.

    Sure enough (again) the result is as expected: relatively few messages (84) are emitted and they are all of the same type.

    <m:degHide m:val="on"/> where "val's" values are supposed to be "true|false".


    Making them conform to the TRANSITIONAL will require less of the same sort of surgery (since they're quite close to conformant as-is)

    In other words, if you're validating against the TRANSITIONAL spec, the OOX documents aren't horribly far off. And it's wrong in such a way that's easy to compensate for in code (i.e. check for "true|on" for a truth value). That's a markedly different situation than described by the headline's "'somewhat less' with the transitional OOXML schema" claim.
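The "check for true|on" compensation could look roughly like this. It is only a sketch: the function name is hypothetical, and accepting "off" as the counterpart of "false" is an assumption made for symmetry, not something stated above:

```python
# "true" and "on" come from the comment above; accepting "off" as the
# counterpart of "false" is an assumption made for symmetry.
TRUE_VALUES = {"true", "on"}
FALSE_VALUES = {"false", "off"}

def parse_onoff(val: str) -> bool:
    """Map a lenient on/off attribute value to a Python bool."""
    if val in TRUE_VALUES:
        return True
    if val in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognised truth value: {val!r}")

print(parse_onoff("on"), parse_onoff("false"))
```

Raising on anything else keeps the reader lenient about the known deviation without silently accepting arbitrary values.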

    And in case anyone claims that ODF doesn't have the same sort of problem, I refer you to AbiWord bug 11359 [abisource.com]/OpenOffice bug 64237 [openoffice.org]. This one is a show-stopper.
    • Re: (Score:3, Insightful)

      by aug24 ( 38229 )

      in case anyone claims that ODF doesn't have the same sort of problem

      FFS, ODF isn't a fast-track ('multiply implemented and widespread') standard. It's perfectly acceptable for a proposed standard to be ahead of current implementations - it's only proposed after all. Implementations should be expected to be playing catch-up.

      OOXML on the other hand is claimed to be already implemented and widespread and thus eligible for fast track. So it is a big deal if it turns out it isn't. Not to mention that you

  • It would be ironic if it were not completely expected. I think it would be interesting to see M$ try to spin this one, that at least one of two things must be true: 1) OOXML sucks 2) their software sucks because it can't even follow a standard they themselves created. Probably something along the lines of: "the standard is a significant improvement over Office 2007 which we will implement in our new version." or "We tried to make OOXML a great standard but our efforts were thwarted by outside forces" [i
  • hmmm... 122k errors (Score:3, Interesting)

    by SlshSuxs ( 1089647 ) on Monday April 21, 2008 @04:28PM (#23150222)
    After the first error, are the remaining errors meaningful (i.e. false positives)? I believe most errors after the first are false positives relative to the first error.
  • We have met the enemy and he is us. -- Pogo (Walt Kelly)
  • uhhhhh (Score:3, Funny)

    by niteice ( 793961 ) <icefragment@gmail.com> on Monday April 21, 2008 @04:59PM (#23150688) Journal

    Most of the problems reportedly relate to the serialization/deserialization code.

    Isn't that what file formats do?
  • by walterbyrd ( 182728 ) on Monday April 21, 2008 @05:23PM (#23151030)
    I thought the idea behind the fast-track was to have a less-fussy way of ratifying standards, when those standards were already widely used.

    If that is correct, then how does the MSOOXML standard qualify? This is a "standard" that is used by absolutely nobody, not even the creator of the standard uses this standard.

    Do I not understand the idea behind the fast-track process?
  • well... (Score:4, Interesting)

    by sentientbrendan ( 316150 ) on Monday April 21, 2008 @05:58PM (#23151496)
    >How many other fast-tracked ISO standards have no conforming implementations?


    Try out the "export" keyword next time you write any C++.
  • At least one other (Score:4, Informative)

    by ribuck ( 943217 ) on Monday April 21, 2008 @06:15PM (#23151668) Homepage

    How many other fast-tracked ISO standards have no conforming implementations?

    ISO 25436 describes a version of the Eiffel programming language that has never been fully implemented. The standard contains lots of "blue-sky" "would-be-nice-to-have" sections which are planned to be implemented in the future.

    ECMA gives the document author a lot of control, so things can become ECMA standards that would not become ISO standards. But then the fast track ISO process (for existing ECMA standards) makes it easier for them to become ISO standards.

  • by flyingfsck ( 986395 ) on Monday April 21, 2008 @06:16PM (#23151686)
    I don't see any problem. Under the standard, proprietary extensions are allowed...
