Office 2007 Fails OOXML Test With 122,000 Errors

Posted by ScuttleMonkey on Mon Apr 21, 2008 03:00 PM
from the money-greases-the-wheels dept.
I Don't Believe in Imaginary Property writes "Groklaw is reporting that some people have decided to compare the OOXML schema to actual Microsoft Office 2007 documents. It won't surprise you to know that Office 2007 failed miserably. If you go by the strict OOXML schema, you get a 17 MiB file containing approximately 122,000 errors, and 'somewhat less' with the transitional OOXML schema. Most of the problems reportedly relate to the serialization/deserialization code. How many other fast-tracked ISO standards have no conforming implementations?"
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by eldavojohn (898314) * <my/.username@@@gmail.com> on Monday April 21 2008, @03:01PM (#23149738) Homepage Journal
    If you can change a vote of "no with comments [slashdot.org]" to "yes" I don't see why you couldn't change "fails with 122,000 errors" to "passes." I mean, when your standard passes through sheer lobbying and politics with little technical analysis, it's going to take a lot to surprise me with how epically it fails.
    • by Finallyjoined!!! (1158431) on Monday April 21 2008, @03:09PM (#23149888)
      Repost.
      OOXML: "The best Standard money can buy"
        • by Bu11etmagnet (1071376) on Monday April 21 2008, @05:38PM (#23152000)

          The standards which can actually be implemented and have an open source reference implementation, such as the Open Document Format (ODF), will become the de-facto standards at least for archive and long term storage.
          I find your lack of realism...disturbing
      • Diebold voting machines run Windows CE.
        • All I have to say is that it's a good thing Microsoft isn't running the 2008 Presidential Election!
          Diebold voting machines run Windows CE.
          Please press any key to start voting!

          >> [Enter]

          Are you sure you want to vote today?
          (Allow/Deny)

          >> Allow

          *An anthropomorphic paper clip appears*
          "Hi! I'm Clippy, I see you're trying to vote!"
          "Let me help you with that! Which of these do you enjoy the most:"
          A) Fear Mongering
          B) Economy Stunting Taxation ...

          Yeah, I can't wait to vote this year ...
            • by fbjon (692006) on Monday April 21 2008, @07:44PM (#23153158) Homepage Journal
              The real problem is not with how much taxes are collected, it's the "intelligent government" part. I think a part of the problem is that the larger the government or governing structure is (in terms of people and country size, not legislation), the more it becomes an inefficient sieve rather than funnel.


              On one hand, a person should indeed be free to live as one sees fit, including spending. But on the other hand, people are stupid, so electing smart people and raising taxes seems like a win to me. That just leaves the "election" part, then. Now what to do about that.....

  • by notaprguy (906128) * on Monday April 21 2008, @03:02PM (#23149760) Journal
    the Open Document Format? Just curious.
    • by EmbeddedJanitor (597831) on Monday April 21 2008, @03:05PM (#23149792)
      You just use this conversion tool called Open Office
    • by Xtifr (1323) on Monday April 21 2008, @04:07PM (#23150806) Homepage

      Does anyone know if Open Office is compliant with the Open Document Format? Just curious.
      I don't know, but if none of the multiple (big difference already) vendors behind ODF have implemented it properly yet, then that just means that it shouldn't have been on the fast-track.

      Oh wait! It wasn't!

      The fast-track is for de facto standards which are already so widespread (i.e. supported by multiple vendors) and consistent that there's little point in trying to push a divergent standard out, even though a divergent standard might be better. Something like TCP/IP would be a good example of the sort of thing where the fast track might be appropriate. ODF wasn't fast-tracked, so the standards committee came up with the best standard, irrespective of what might actually be out there in the wild. Now it's up to the vendors to catch up. That's the usual way this is done (e.g. the C++ standard, where most vendors took a few years to catch up, or the C standard, where most vendors took a few months to catch up, and MS took a few years).

      Of course, if MSOOXML had gone through the regular track, it probably would have taken years to finish (since it's so large, complex, and poorly defined), and MS couldn't afford to wait. So instead they bought themselves a standards committee or twelve.
        • by makomk (752139) on Monday April 21 2008, @05:25PM (#23151824) Journal
          As far as I know, Open Office produces valid ODF documents (with the odd extension for things like spelling and grammar checker options that are application-dependent), but it doesn't necessarily implement 100% of the latest version of the ODF spec. (In fact, IIRC sometimes other word processors add support for new ODF features before it does.) Since ODF is a committee-developed standard not based on what any one word processor does, this really shouldn't be surprising.
  • Technical Details (Score:5, Insightful)

    by Enderandrew (866215) <enderandrew@@@gmail...com> on Monday April 21 2008, @03:03PM (#23149776) Homepage Journal
    Technical details mean absolutely nothing in this discussion. I thought we established this.
  • by llamafirst (666868) * on Monday April 21 2008, @03:06PM (#23149832)

    In a blog posting this week, Alex Brown, leader of the International Organization for Standardization (ISO) group in charge of maintaining the Office Open XML (OOXML) standard, revealed that Microsoft Office 2007 documents do not meet the latest specifications of the ISO OOXML draft standard. "Word documents generated by today's version of Microsoft Office 2007 do not conform to ISO/IEC 29500," said Brown in a blog post recounting the process of testing a document against the "strict" and "transitional" schema defined in the standard.

    Ahem. Let me be the first to say:
    Brownie, you're doing a heck of a job! [wikipedia.org]

  • Duh (Score:4, Funny)

    by Arreez (1252440) on Monday April 21 2008, @03:09PM (#23149874)
    Seriously......anyone not see it coming? Office 2007 being submitted to this test is like submitting to a "Will it float?" test with your hands tied and the good ol' cement shoes strapped on.
  • by voislav98 (1004117) on Monday April 21 2008, @03:09PM (#23149884)
    which is that it's the standard that's deficient. I'm sure that the standard will soon be "improved" so it conforms with Office 2007
  • by Nom du Keyboard (633989) on Monday April 21 2008, @03:11PM (#23149916)
    OOXML is such a fraud that it's disgusting that we continue to waste such time on it. If it could win on the merits it wouldn't need such underhanded tactics by its (very few) supporters. It's clearly intended as an ODF-killer by creating an unnecessary parallel "standard".
  • Impressive (Score:5, Insightful)

    by rumith (983060) on Monday April 21 2008, @03:12PM (#23149930)

    While it's hardly unexpected that the Office 2007 document format isn't *cough* ISO compliant, 122k errors for a 60 MB file works out to a remarkable ~500 bytes of markup per error.
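The ~500 bytes figure can be checked with a quick back-of-the-envelope sketch. This assumes the "60Mb" above means 60 megabytes (not megabits) and that "122k" is exactly 122,000; the helper name `bytes_per_error` is made up for illustration.

```python
# Rough check of the "bytes of markup per error" estimate.
# Assumptions: 60 megabytes of markup, 122,000 reported errors.

def bytes_per_error(size_bytes: int, error_count: int) -> float:
    """Average number of bytes of markup for each reported error."""
    return size_bytes / error_count

ERRORS = 122_000
SIZE_DECIMAL = 60 * 1000**2   # 60 MB, power-of-10 reading
SIZE_BINARY = 60 * 1024**2    # 60 MiB, power-of-2 reading

print(round(bytes_per_error(SIZE_DECIMAL, ERRORS)))  # 492
print(round(bytes_per_error(SIZE_BINARY, ERRORS)))   # 516
```

Either reading of the prefix lands close to 500 bytes per error, so the estimate holds regardless of which convention was meant.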

    I really do not understand where Microsoft is heading. They've rammed their miserable OOXML format through - supposedly so they could advertise their product as ISO compliant. But what's their advantage now that their product is shown to be so horribly incompatible?

      • Re:Impressive (Score:5, Insightful)

        by PitaBred (632671) <slashdot@@@pitabred...dyndns...org> on Monday April 21 2008, @03:28PM (#23150234) Homepage
        Except that open standards are usually government mandated. Microsoft would have otherwise ignored it completely, going with the lock-in you describe since they "own" the office landscape. They submitted OOXML because they didn't want to be locked out of new gov't initiatives requiring more accessible data formats, so they forced their crap through trying to call it open, while not really being so.
  • HTML (Score:5, Interesting)

    by WK2 (1072560) on Monday April 21 2008, @03:12PM (#23149942) Homepage
    It's not a fast-tracked ISO standard, but HTML and CSS have no conforming implementations. I'm not sure, but links might conform to HTML.
  • by msh104 (620136) on Monday April 21 2008, @03:15PM (#23149992)
    I don't want to destroy the mood that the Slashdot editor wanted to create by posting this sensational piece of propaganda, but this is not 122,000 bugs, is it? This is a parser generating 122,000 error results. Sure, it's bad, but anyone who has ever tried to make code W3C-compliant or debug any piece of code will know that just one error can result in many, many error results. Thus this (despite my wish for it to be so) does not really give you much insight into Microsoft's compatibility with its own standard.
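One way to see the gap between "error results" and distinct problems is to group validator messages by kind. The sketch below is illustrative only: the message format ("kind: detail") and the sample messages are invented, not taken from any real validator output.

```python
from collections import Counter

def summarize(messages):
    """Count validator messages by kind (the text before the first ':').

    The 'kind: detail' layout is a hypothetical format for this sketch.
    """
    return Counter(m.split(":", 1)[0].strip() for m in messages)

# Invented sample output: four messages, but only two distinct kinds.
sample = [
    "invalid-attribute-value: val='on' at element 17",
    "invalid-attribute-value: val='on' at element 42",
    "invalid-attribute-value: val='off' at element 98",
    "unexpected-element: child not allowed here",
]

summary = summarize(sample)
print(len(sample), "messages,", len(summary), "distinct kinds")
```

A raw message count says how noisy the validator was; the number of distinct kinds is a closer (though still imperfect) proxy for how many things actually need fixing.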
  • by dominator (61418) on Monday April 21 2008, @03:21PM (#23150084) Homepage
    Speaking as an OOX implementer, this is pretty bad. But it's not quite as bad as the headline makes it seem - the meat of the story [griffinbrown.co.uk] is linked a few blogs deep:

    The expectation is therefore that an MS Office 2007 document should be pretty close to valid according to the TRANSITIONAL schema.

    Sure enough (again) the result is as expected: relatively few messages (84) are emitted and they are all of the same type.

    <m:degHide m:val="on"/> where "val's" values are supposed to be "true|false".

    [snip]

    Making them conform to the TRANSITIONAL will require less of the same sort of surgery (since they're quite close to conformant as-is)


    In other words, if you're validating against the TRANSITIONAL spec, the OOX documents aren't horribly far off. And it's wrong in such a way that's easy to compensate for in code (i.e. check for "true|on" for a truth value). That's a markedly different situation than described by the headline's "'somewhat less' with the transitional OOXML schema" claim.
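A minimal sketch of that kind of compensation, assuming a reader that receives the attribute value as a plain string. The function name `parse_bool_attr` is hypothetical; accepting "1" and "0" as well is an extra assumption, borrowed from the XML Schema boolean lexical forms.

```python
# Tolerant parsing of a boolean-like attribute value such as m:val.
# "on"/"off" is what the quoted Office 2007 output uses; "true"/"false"
# is what the schema expects according to the quoted blog post.
_TRUE_VALUES = {"true", "on", "1"}
_FALSE_VALUES = {"false", "off", "0"}

def parse_bool_attr(value: str) -> bool:
    """Map an attribute string to a bool, accepting both spellings."""
    v = value.strip()
    if v in _TRUE_VALUES:
        return True
    if v in _FALSE_VALUES:
        return False
    raise ValueError(f"not a recognised boolean value: {value!r}")

print(parse_bool_attr("on"))     # True
print(parse_bool_attr("false"))  # False
```

Anything outside the two sets raises instead of being guessed at, so genuinely malformed values still surface as errors.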

    And in case anyone claims that ODF doesn't have the same sort of problem, I refer you to AbiWord bug 11359 [abisource.com]/OpenOffice bug 64237 [openoffice.org]. This one is a show-stopper.
  • by walterbyrd (182728) on Monday April 21 2008, @04:23PM (#23151030)
    I thought the idea behind the fast-track was to have a less-fussy way of ratifying standards, when those standards were already widely used.

    If that is correct, then how does the MSOOXML standard qualify? This is a "standard" that is used by absolutely nobody, not even the creator of the standard uses this standard.

    Do I not understand the idea behind the fast-track process?
    • Without a reference implementation, how do you know a standard is valid?
        • by dvice_null (981029) on Monday April 21 2008, @03:22PM (#23150102)
          > Wha? Valid in what respects?

          Valid as in possible to implement. How could a standard not be possible to implement you ask? Well that is simple. E.g. write a program that follows this standard:
          1. It must print "1" on exit
          2. It must print "2" on exit

          As you can see, it would not be possible to implement a program according to that standard. That is why someone would need to write a reference application implementing the standard to notice errors like this. Before the standard is given to the whole world to be implemented.
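The contradiction can be made concrete with a small sketch. It assumes each rule is read as "the program's entire output on exit is exactly this string"; the rule functions and the candidate outputs are invented for illustration.

```python
# Each rule is a predicate over the program's complete output on exit.
def rule_1(output: str) -> bool:
    return output == "1"

def rule_2(output: str) -> bool:
    return output == "2"

def conforming(candidates, rules):
    """Return the candidate outputs that satisfy every rule."""
    return [c for c in candidates if all(rule(c) for rule in rules)]

candidates = ["", "1", "2", "12", "21"]

print(conforming(candidates, [rule_1]))          # ['1']
print(conforming(candidates, [rule_1, rule_2]))  # []
```

Under that strict reading no output can pass both rules, which is the kind of problem a reference implementation would hit on day one.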

          It is better that only one has to wonder about the errors of the standards, rather than the whole world.
        • You need at least one coded reference implementation or else you'll end up with something in the standard which is difficult/impossible to implement. Especially in a 6,000+ page standard.

          ISO would be well advised to take the method the IETF uses, which is to have two independent teams implement the standard based on the documentation before an RFC can reach a Draft Standard status. I suspect ODF would have only benefited from this process by cutting down its rough edges, while OOXML would have been so cumbersome that it would be simply dropped.

          • by davidkv (302725) on Monday April 21 2008, @03:51PM (#23150576)
            There's a fundamental difference between the IETF and ISO. IETF makes standards of stuff that has been proven to work (or at least be implementable), whereas ISO wants to write specs to tell people what should work.

            A bit like comparing tcp/ip and whatsitsname (x400?). It doesn't really matter how nice something looks on paper if there's no good implementation of it.
    • by EmbeddedJanitor (597831) on Monday April 21 2008, @03:17PM (#23150026)
      Developing a standard without having a working example is very foolish. Stuff that looks cool in a standard often does not work out well in real life (theory != practice). Technically, it is far better to survey the landscape for things that work well and standardise those. There are problems with this approach: the companies that have implemented the winning standards often have a competitive advantage, lobbying can wreck the process and the standards might be burdened with patents (and standards users need to pay royalties to the patent holders).

      For one example where this has worked well, consider vehicle networking. Bosch invented/designed the Controller Area Network (CAN). This was standardised by SAE as part of the in-vehicle networking specification. ISO then just adopted the SAE stuff and extended it in some new areas. The stuff all works well and is based on proven technology (i.e. the technology existed before the standards).

    • And why is that an issue? The job of ISO is to develop the standard in an implementable fashion. Top down.

      That explains why OSI is such a trainwreck compared to IP.

      Not a bottom up

      So why was ODF approved, then? Or ISO C?

      adopt the lowest common denominator of whats already out there

      "Lowest common denominator" is not equivalent to bottom-up design.

    • Not a bottom up, adopt the lowest common denominator of whats already out there
      Sure, the ISO does that a lot, and it's a fine approach. But that takes time, which is why the fast-track process was designed for standards which have already been implemented.
    • by Anonymous Coward on Monday April 21 2008, @03:11PM (#23149914)

      Men in Black? What happened to good old megabytes? The article says 17MB!
      Maybe, but I make this shit look GOOD.
    • by Richard Steiner (1585) <rsteiner@visi.com> on Monday April 21 2008, @03:14PM (#23149964) Homepage Journal
      Shh... The submitter is trying to impose those trendy "base 2" SI prefixes on us in spite of 40+ years of prior art to the contrary. Another case of ivory tower types not being sophisticated enough to grok current industry usage, methinks...

      And don't even get me started on folks who assume a byte is always eight (8) bits. There's a reason folks in the Real World use the term "octet", people. Really.

      Sheesh! :-) :-)
      • Re:Stop using MiB (Score:4, Insightful)

        by Yvan256 (722131) on Monday April 21 2008, @03:25PM (#23150174) Homepage Journal
        Just because people have been using SI prefixes to redefine that "kilo means 1024" for 40+ years doesn't mean they're right.

        Also, "octet" is the french word for "byte", so it's also 8-bit. :P
        • Re:Stop using MiB (Score:5, Informative)

          by Richard Steiner (1585) <rsteiner@visi.com> on Monday April 21 2008, @03:46PM (#23150522) Homepage Journal
          Language is typically defined by usage, not the other way around. Unless you're the French, perhaps. :-)

          Remember that "kilo" *did* (and does) mean 1024 in a computing context. Everybody understood that who was involved on a technical level. Everybody. There was no miscommunication in the general case ... except when it came to laypeople who largely didn't understand what was described in the first place. When that happened, we just told them that bigger is better and moved on...

          Your comment about octet confuses and annoys me. Go away. :-)
          • Re:Stop using MiB (Score:5, Insightful)

            by Yvan256 (722131) on Monday April 21 2008, @04:16PM (#23150932) Homepage Journal
            If language is defined by usage, does that mean that copyright infringement now equals theft? ;-)

            You have never seen the confusion of metric users entering the CS field, have you? Ever seen a teacher struggle with the very same point we're having right now?

            As I said, in the rest of the world, kilo means 1000, not 1024. And here you're saying it becomes something else because a particular field has abused it for 40 years?

            Also note that both hard drive manufacturers and digital telecommunications, in a computing context, use 1000 for kilo.

            So your argument becomes "if you're in a computing context BUT not talking about hard drives OR telecommunications, then kilo means 1024"...

            I'd rather use KiB=1024, thank you very much. :-)
          • Except "computing" isn't a clear-cut domain. For example, in my field of compression. Does that count as "computing" (power of 2) or telecommunications (power of 10)? Unclear?

            So, we had a problem where different tools and formats defined it different ways. For a number of years, QuickTime used K=1024, while Windows Media and RealMedia used K=1000. Unless you were using Sorenson Squeeze, which "corrected" its Windows Media and RealMedia values by 1.024 so they matched the QuickTime files sizes!

            Horrible.

            Fortunately, the compression world has standardized on power-of-10 numbers, since that's what the MPEG standards and, well, all the professionals use.

            So, now we have to deal with complaints about the mismatch between encoding a file that should be "4 GB" but doesn't fill up "4 GB" of drive space...

            Sorry, 1024's got to be a KiB. No other feasible solution at this point, unless we decide to stop having computers talk to each other...
      • Re:Stop using MiB (Score:5, Informative)

        by Schraegstrichpunkt (931443) on Monday April 21 2008, @03:53PM (#23150604) Homepage

        40+ years of prior art to the contrary

        "1 MW" has always meant 1,000,000 watts. "9.6 kbps" has always meant 9,600 bits per second. A "500 GB" hard drive still means 500,000,000,000 bytes.

        There are relatively few places where this is screwed up, most of which fall into these categories:

        • RAM or things derived from RAM (e.g. page sizes) where the physical layout imply powers of 2
        • Microsoft

        The latter doesn't even get it consistent. "1.44 MB" floppies are actually 1440 * 1024 bytes.
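The numbers above are easy to check. A short sketch, using only the figures quoted in this comment (a "500 GB" drive and a "1.44 MB" floppy); the constant names are just labels for this example.

```python
# Decimal (SI) versus binary (IEC) multipliers.
MB, GB = 1000**2, 1000**3
MiB, GiB = 1024**2, 1024**3

# A "500 GB" drive, expressed in binary gibibytes.
drive_bytes = 500 * GB
print(round(drive_bytes / GiB, 2))  # 465.66

# A "1.44 MB" floppy: 1440 * 1024 bytes, which matches neither convention.
floppy_bytes = 1440 * 1024
print(floppy_bytes)        # 1474560
print(floppy_bytes / MB)   # 1.47456
print(floppy_bytes / MiB)  # 1.40625
```

The floppy comes out as 1.47 in decimal megabytes and 1.41 in mebibytes; "1.44" only appears when a decimal 1000 and a binary 1024 are mixed in the same product.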

        Another case of ivory tower types not being sophisticated enough to grok current industry usage, methinks...

        "Current industry usage" is to be ambiguous; 17 MB means "somewhere between 16 and 18 megabytes". The people you call "ivory tower types", including the IEC [www.iec.ch], are trying to use more precise language.

        And don't even get me started on folks who assume a byte is always eight (8) bits. There's a reason folks in the Real World use the term "octet", people.

        The term "octet" does exactly the same thing that the binary prefixes do: They indicate more precisely what is being talked about.

        As someone else in this thread said, "just because some people made the mistake, decades ago, of choosing to equal kilo to 1024 doesn't mean they were right."

    • Up with mebibytes! (Score:5, Insightful)

      by JustinOpinion (1246824) on Monday April 21 2008, @03:37PM (#23150372)
      Ha!

      Then there are those of us who think the prank is the people who refuse to use it (and who trot out the tired "hard drive manufacturers are stealing my disk space" myth/meme).

      Seriously, the one thing we can agree on is that there is often confusion regarding whether someone meant "1000" or "1024" when they used a prefix. The difference in approach between the two camps is:
      1. Stick with the status quo (where one tries to guess the convention being used based on context). That is, just accept with the confusion/inaccuracy.
      2. Use SI units in the original SI sense (powers of 10) and use new binary prefixes when you really mean it (power of 2). That is, create a convention and adhere to it.

      Interesting that in a discussion about standards (and failures thereof) you would argue that a standard meant to reduce confusion is a prank! I agree, by the way, that "mebibyte" sounds kinda silly... but who cares? It gets the job done. ("Quark" was a silly name, but it's now deeply ingrained in science and no one thinks twice about it.)

      For what it's worth, many software products now use the binary prefix [wikipedia.org] notation (e.g. Konqueror).
      • by menace3society (768451) on Monday April 21 2008, @04:36PM (#23151226)
        You're forgetting one thing: people have already adapted to the "old" usage. Dictionaries already exist saying that "mega-" can mean a factor of 1048576 units of computer data. If we change the system now, what will not happen is that everything disambiguates itself, and the hard disk companies stop lying to customers. What will happen is that

        1) Seagate et al. will continue to market their products in terms of GB and TB.
        2) Users will be outraged that their 232GiB hard disk only has 231 or so GiBs of usable space due to formatting, thus leaving the problem unsolved.
        3) People will lose good slang abbreviations like Meg and Gig to Kib, Mib, Gib (or Jib), Tib, and Pib, which not only sound stupid but will also be hard to distinguish in normal conversation.
        4) PHBs will misuse the binary-only versions as if they were base ten, especially if it catches on that "mebi-" is more than "mega-".
        Techie: Hey boss we've got new computers with 100 mebibytes of L1 cache.
        PHB: How much is a mebibytes?
        Techie: 1048576 bytes.
        PHB: Oh, so it's about a million then. Cool.
        Next Day
        PHB: Hey guys, we shipped nearly 2 mebi-units of dongles this quarter.
        Board: What's mebi-units?
        PHB: Well, it's.... Proceed into incorrect explanation that convinced Board of Directors that Boss is "with it"
        5) As a corollary to 4), people will start using those prefixes to refer to everything in a computer. The new chip is 3.2 GiHz, it draws 25 kiW of power, it weighs 21 Kig, etc.
        6) People will always think you are a douchebag.

        And that's not even getting into the confusion caused by having two different sets of prefixes for slightly different multipliers, maybe, during the transition.

        Ask any Brit: How much is a trillion?
    • by Ungrounded Lightning (62228) on Monday April 21 2008, @03:59PM (#23150696) Journal
      The referenced article claims that "the English had imposed GMT on the rest of the world by force when Britain was a big colonial power", which is bogus.

      The English had a major sea trading infrastructure, at a time when improvements in clocks finally made accurate determination of longitude by celestial navigation practical for trans-Atlantic voyages.

      They established an observatory at a major port (Greenwich) to provide a time-hack for ships in port (both military and commercial) to set their clocks, and distributed navigational charts with that observatory's longitude as the basis for the coordinate system (thus simplifying navigational calculations).

      This quickly became the de facto standard on a voluntary basis among commercial shipping, along with the cities that grew up around major seaports (with offsets to approximate local noon, typically multiples of an hour, sometimes of a half- or quarter-hour), just as the coordinate system became the standard for shoreline mapping in other locations (to simplify navigation near shores by ships using the Greenwich meridian for their ocean charts). Then, when railroads drove time standardization, it spread from the seaport cities to inland locations.

      Of course the empire's military and government used it internally. But the rest of the world adopted it voluntarily.
      • Re:Curiousity (Score:5, Insightful)

        by MightyMartian (840721) on Monday April 21 2008, @04:17PM (#23150948) Journal
        And that's what's been going on. However, a lot of governments and other organizations are now realizing that leveraging all that data they've been gathering for the better part of two decades on a closed, proprietary standard could lead to disaster. That's the whole point of trying to get an internationally recognized open standard that anyone can implement. ODF is supposed to fulfill the function of a published, implementable office document standard so that, theoretically, in 2100AD, when someone needs to open a document created in 2010, it's in an openly available format that, at the very worst, someone has to reimplement, but at least has clear, concise documentation that isn't thousands of pages long and doesn't include references to proprietary standards.

        The problem with that is that an open document format standard is a direct threat to Microsoft's near-monopoly in the office app department. If anyone can implement a document format that's cross-compatible, then they can easily implement a competitor to Office, and if they decide to undercut Office or (as with OO.org) give the damn thing away, then Microsoft's monopoly is one breath from collapse, and believe me, if Microsoft loses Office, they're in serious, serious trouble within five years. So OOXML, a "standard" that not even Microsoft can implement, gets pushed through the ISO using all sorts of peculiar and ultimately nefarious methods, which now means Microsoft and its partners can go around telling Small Town, USA that Office saves in an ISO standard. In reality, the poor bastard in 2100AD who needs to open this file is going to be spending many months trying to figure out this monster, which is in direct violation of the whole notion of an open standard.

        That you have no problems is irrelevant. That's not what the point of an open standard is.
          • by Tony Hoyle (11698) <tmh@nodomain.org> on Monday April 21 2008, @04:26PM (#23151090) Homepage
            Do ya think?

            Governments started demanding documents in open formats.. that threatened their monopoly, so they paid to get their XML schema called one.. now governments go back to buying exclusively Office again... MS Wins.

            End users don't give a shit about open. Governments do but only on paper.. once it comes down to the buying decision all they need is a checkmark on a list. It doesn't actually have to mean anything (cf. Posix compatibility in NT4.. damned near useless but it was a requirement at the time).
              • by Allador (537449) on Monday April 21 2008, @05:49PM (#23152134)
                What is this 'upgrade treadmill' you're referring to?

                Most .gov orgs at least here in the US that I've seen are using everything from Office 97 to Office 2003, but none are using Office 2007.

                That suggests to me that there is no 'forced upgrade' or 'upgrade treadmill'.

                What is it that you're seeing that indicates otherwise to you?
              • by Allador (537449) on Monday April 21 2008, @05:46PM (#23152110)
                I wouldn't agree with your statement.

                The point of the article is that MS Office isn't conformant to the STRICT version. This shouldn't come as a surprise, as the change from the original OOXML to the strict version happened, but no new versions of MS Office have been released. The best thing anyone could reasonably expect of a company is that they would update it in the next Office 2007 service pack.

                Office comes in a 2-4 year release cycle, and the change in ISO from the transitional version to the strict version happened after Office 2007 SP1 was already done.

                How could MS have known in advance the changes that would happen to the standard? They can't see into the future.

                Don't forget here that the STRICT version is NOT representative of what any version of Office produces. We already knew that.

                It was an ISO evolution of the submitted version (the transitional one). The vendor would need some time and a release cycle to adapt their products to it.

                What _will_ be interesting is how/when/if MS does conform to the strict format.

                On the other hand, the MS Word conformance to the transitional format seems reasonable. TFA only noted one problem, where an attribute value was using on/off rather than true/false. This is minor and easily fixed and/or recorded as a known issue.