News for nerds, stuff that matters

Office 2007 Fails OOXML Test With 122,000 Errors

Posted by ScuttleMonkey on Monday April 21, @04:00PM
from the money-greases-the-wheels dept.
I Don't Believe in Imaginary Property writes "Groklaw is reporting that some people have decided to compare the OOXML schema to actual Microsoft Office 2007 documents. It won't surprise you to know that Office 2007 failed miserably. If you go by the strict OOXML schema, you get a 17 MiB file containing approximately 122,000 errors, and 'somewhat less' with the transitional OOXML schema. Most of the problems reportedly relate to the serialization/deserialization code. How many other fast-tracked ISO standards have no conforming implementations?"

  • If you can change a vote of "no with comments [slashdot.org]" to "yes" I don't see why you couldn't change "fails with 122,000 errors" to "passes." I mean, when your standard passes through sheer lobbying and politics with little technical analysis, it's going to take a lot to surprise me with how epically it fails.
  • the Open Document Format? Just curious.
  • Technical details mean absolutely nothing in this discussion. I thought we established this.
  • by llamafirst (666868) * on Monday April 21, @04:06PM (#23149832)

    In a blog posting this week, Alex Brown, leader of the International Organization for Standardization (ISO) group in charge of maintaining the Office Open XML (OOXML) standard, revealed that Microsoft Office 2007 documents do not meet the latest specifications of the ISO OOXML draft standard. "Word documents generated by today's version of Microsoft Office 2007 do not conform to ISO/IEC 29500," said Brown in a blog post recounting the process of testing a document against the "strict" and "transitional" schema defined in the standard.

    Ahem. Let me be the first to say:
    Brownie, you're doing a heck of a job! [wikipedia.org]

  • by voislav98 (1004117) on Monday April 21, @04:09PM (#23149884)
    which is that it's the standard that's deficient. I'm sure that the standard will soon be "improved" so it conforms with Office 2007.
  • by Nom du Keyboard (633989) on Monday April 21, @04:11PM (#23149916)
    OOXML is such a fraud that it's disgusting that we continue to waste such time on it. If it could win on the merits it wouldn't need such underhanded tactics by its (very few) supporters. It's clearly intended as an ODF-killer by creating an unnecessary parallel "standard".
  • Impressive (Score:5, Insightful)

    by rumith (983060) on Monday April 21, @04:12PM (#23149930)

    While it's hardly unexpected that the Office 2007 document format isn't *cough* ISO compliant, 122k errors for a 60 MB file works out to a remarkable ~500 bytes of markup per error.
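
A quick check of that arithmetic, taking the 60 MB and 122,000 figures from the comment at face value. The function name is made up for illustration, and whether "MB" here means 10^6 or 2^20 bytes is an assumption either way, so both are shown.

```python
def bytes_per_error(file_bytes: int, error_count: int) -> float:
    """Average number of bytes of markup per reported validation error."""
    return file_bytes / error_count


ERRORS = 122_000  # error count quoted in the story

# Decimal reading: 60 MB = 60 * 10^6 bytes
decimal_estimate = bytes_per_error(60 * 10**6, ERRORS)

# Binary reading: 60 MiB = 60 * 2^20 bytes
binary_estimate = bytes_per_error(60 * 2**20, ERRORS)

print(f"decimal: {decimal_estimate:.0f} bytes/error")
print(f"binary:  {binary_estimate:.0f} bytes/error")
```

Either reading lands within a few percent of the ~500 bytes quoted above.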

    I really do not understand where Microsoft is heading. They've rammed their miserable OOXML format through - supposedly so they could advertise their product as ISO compliant. But what's their advantage now that their product is shown to be so horribly incompatible?

      • Except that open standards are usually government mandated. Microsoft would have otherwise ignored it completely, going with the lock-in you describe since they "own" the office landscape. They submitted OOXML because they didn't want to be locked out of new gov't initiatives requiring more accessible data formats, so they forced their crap through trying to call it open, while not really being so.
  • HTML (Score:5, Interesting)

    by WK2 (1072560) on Monday April 21, @04:12PM (#23149942)
    It's not a fast-tracked ISO standard, but HTML and CSS have no conforming implementations. I'm not sure, but links might conform to HTML.
  • by msh104 (620136) on Monday April 21, @04:15PM (#23149992)
    I don't want to destroy the mood that the Slashdot editor wanted to create by posting this sensational piece of propaganda, but this is not 122,000 bugs, is it? This is a parser generating 122,000 error results. Sure, it's bad, but anyone who has ever tried to make code W3C-compliant, or to debug any piece of code, will know that just one error can result in many, many error results. So this (despite my wish for it to be so) does not really give you much insight into Microsoft's compatibility with its own standard.
  • Speaking as an OOX implementer, this is pretty bad. But it's not quite as bad as the headline makes it seem - the meat of the story [griffinbrown.co.uk] is linked a few blogs deep:

    The expectation is therefore that an MS Office 2007 document should be pretty close to valid according to the TRANSITIONAL schema.

    Sure enough (again) the result is as expected: relatively few messages (84) are emitted and they are all of the same type.

    <m:degHide m:val="on"/> where "val's" values are supposed to be "true|false".

    [snip]

    Making them conform to the TRANSITIONAL will require less of the same sort of surgery (since they're quite close to conformant as-is)


    In other words, if you're validating against the TRANSITIONAL spec, the OOX documents aren't horribly far off. And it's wrong in such a way that's easy to compensate for in code (i.e. check for "true|on" for a truth value). That's a markedly different situation than described by the headline's "'somewhat less' with the transitional OOXML schema" claim.
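
A minimal sketch of the "check for true|on" workaround, for a consumer that wants to read these documents anyway. The helper name is hypothetical, and accepting "off" as the false counterpart of "on" is an assumption; the quoted post only mentions "on", "true" and "false".

```python
# Hypothetical helper, not taken from any OOXML library.
TRUE_SPELLINGS = {"true", "on"}     # "on" is what Office 2007 reportedly writes
FALSE_SPELLINGS = {"false", "off"}  # "off" is assumed by symmetry


def parse_onoff(value: str) -> bool:
    """Parse a boolean attribute, tolerating on/off alongside true/false."""
    normalized = value.strip().lower()
    if normalized in TRUE_SPELLINGS:
        return True
    if normalized in FALSE_SPELLINGS:
        return False
    raise ValueError(f"unrecognised boolean attribute value: {value!r}")


# The attribute value from the <m:degHide m:val="on"/> example
print(parse_onoff("on"))
```

Anything outside the known spellings raises rather than guessing, so a genuinely malformed document still gets flagged.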

    And in case anyone claims that ODF doesn't have the same sort of problem, I refer you to AbiWord bug 11359 [abisource.com]/OpenOffice bug 64237 [openoffice.org]. This one is a show-stopper.
    • Without a reference implementation, how do you know a standard is valid?
        • > Wha? Valid in what respects?

          Valid as in possible to implement. How could a standard not be possible to implement you ask? Well that is simple. E.g. write a program that follows this standard:
          1. It must print "1" on exit
          2. It must print "2" on exit

          As you can see, it would not be possible to implement a program according to that standard. That is why someone needs to write a reference application implementing the standard, to notice errors like this before the standard is given to the whole world to be implemented.

          It is better that only one implementer has to puzzle over the errors in the standard, rather than the whole world.
        • You need at least one coded reference implementation or else you'll end up with something in the standard which is difficult/impossible to implement. Especially in a 6,000+ page standard.

          ISO would be well advised to take the method the IETF uses, which is to have two independent teams implement the standard based on the documentation before an RFC can reach a Draft Standard status. I suspect ODF would have only benefited from this process by cutting down its rough edges, while OOXML would have been so cumbersome that it would be simply dropped.

    • by EmbeddedJanitor (597831) on Monday April 21, @04:17PM (#23150026)
      Developing a standard without having a working example is very foolish. Stuff that looks cool in a standard often does not work out well in real life (theory != practice). Technically, it is far better to survey the landscape for things that work well and standardise those. There are problems with this approach: the companies that have implemented the winning standards often have a competitive advantage, lobbying can wreck the process, and the standards might be burdened with patents (and standards users need to pay royalties to the patent holders).

      For one example where this has worked well, consider vehicle networking. Bosch invented/designed the Controller Area Network (CAN). This was standardised by SAE as part of the in-vehicle networking specification. ISO then just adopted the SAE stuff and extended it in some new areas. The stuff all works well and is based on proven technology (i.e. the technology existed before the standards).

    • > And why is that an issue? The job of ISO is to develop the standard in an implementable fashion. Top down.

      That explains why OSI is such a trainwreck compared to IP.

      > Not a bottom up

      So why was ODF approved, then? Or ISO C?

      > adopt the lowest common denominator of whats already out there

      "Lowest common denominator" is not equivalent to bottom-up design.

    • > Not a bottom up, adopt the lowest common denominator of whats already out there

      Sure, the ISO does that a lot, and it's a fine approach. But that takes time, which is why the fast-track process was designed for standards which have already been implemented.
    • Up with mebibytes! (Score:5, Insightful)

      by JustinOpinion (1246824) on Monday April 21, @04:37PM (#23150372)
      Ha!

      Then there are those of us who think the prank is the people who refuse to use it (and who trot out the tired "hard drive manufacturers are stealing my disk space" myth/meme).

      Seriously, the one thing we can agree on is that there is often confusion regarding whether someone meant "1000" or "1024" when they used a prefix. The difference in approach between the two camps is:
      1. Stick with the status quo (where one tries to guess the convention being used based on context). That is, just accept the confusion/inaccuracy.
      2. Use SI units in the original SI sense (powers of 10) and use new binary prefixes when you really mean it (powers of 2). That is, create a convention and adhere to it.

      Interesting that in a discussion about standards (and failures thereof) you would argue that a standard meant to reduce confusion is a prank! I agree, by the way, that "mebibyte" sounds kinda silly... but who cares? It gets the job done. ("Quark" was a silly name, but it's now deeply ingrained in science and no one thinks twice about it.)

      For what it's worth, many software products now use the binary prefix [wikipedia.org] notation (e.g. Konqueror).
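
To make the 1000-versus-1024 gap concrete, here is a small sketch that prints one byte count under both conventions. The function name is made up for illustration; the 17 MiB input is the error-report size from the story summary.

```python
def describe_size(num_bytes: int) -> str:
    """Render a byte count with SI (powers of 10) and binary (powers of 2) prefixes."""
    megabytes = num_bytes / 10**6   # SI prefix: 1 MB = 1,000,000 bytes
    mebibytes = num_bytes / 2**20   # binary prefix: 1 MiB = 1,048,576 bytes
    return f"{num_bytes} bytes = {megabytes:.2f} MB = {mebibytes:.2f} MiB"


print(describe_size(17 * 2**20))
# prints: 17825792 bytes = 17.83 MB = 17.00 MiB
```

The gap is about 5% at the mega scale and widens with each larger prefix, which is where most of the "missing disk space" complaints come from.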
        • Re:Stop using MiB (Score:5, Informative)

          by Richard Steiner (1585) <rsteiner@visi.com> on Monday April 21, @04:46PM (#23150522) Homepage Journal
          Language is typically defined by usage, not the other way around. Unless you're the French, perhaps. :-)

          Remember that "kilo" *did* (and does) mean 1024 in a computing context. Everybody understood that who was involved on a technical level. Everybody. There was no miscommunication in the general case ... except when it came to laypeople who largely didn't understand what was described in the first place. When that happened, we just told them that bigger is better and moved on...

          Your comment about octet confuses and annoys me. Go away. :-)