Software Microsoft

Office 2007 Fails OOXML Test With 122,000 Errors (430 comments)

I Don't Believe in Imaginary Property writes "Groklaw is reporting that some people have decided to compare the OOXML schema to actual Microsoft Office 2007 documents. It won't surprise you to know that Office 2007 failed miserably. If you go by the strict OOXML schema, you get a 17 MiB file containing approximately 122,000 errors, and 'somewhat less' with the transitional OOXML schema. Most of the problems reportedly relate to the serialization/deserialization code. How many other fast-tracked ISO standards have no conforming implementations?"
This discussion has been archived. No new comments can be posted.

  • by notaprguy ( 906128 ) * on Monday April 21, 2008 @04:02PM (#23149760) Journal
    the Open Document Format? Just curious.
  • HTML (Score:5, Interesting)

    by WK2 ( 1072560 ) on Monday April 21, 2008 @04:12PM (#23149942) Homepage
    It's not a fast-tracked ISO standard, but HTML and CSS have no conforming implementations. I'm not sure, but links might conform to HTML.
  • I Remember When... (Score:3, Interesting)

    by Nom du Keyboard ( 633989 ) on Monday April 21, 2008 @04:20PM (#23150078)
    I remember back in the good old days of the IBM EGA (640x350 6-bit color) adapter, when semi-clone cards were made, they were all rounded up and tested against the IBM "standard". The IBM card had a couple of flaws at the time: two of the bottom scan lines were interchanged, and it interfered with the computer's (IBM PC) ability to Warm Boot. Each card was given a percentage rating of how well it compared to the IBM Standard, and comments on whether the bugs in the original were fixed or kept for compatibility reasons. Also, for less money, all of the clone cards came with the maximum 256KB of memory, while the IBM EGA only had 64KB standard, with the rest able to be added through a daughter card.

    What most made me smile was that the IBM EGA card was included in the matrix of results, showing a rating of 100% compatibility with itself.

  • by dominator ( 61418 ) on Monday April 21, 2008 @04:21PM (#23150084) Homepage
    Speaking as an OOX implementer, this is pretty bad. But it's not quite as bad as the headline makes it seem - the meat of the story [griffinbrown.co.uk] is linked a few blogs deep:

    The expectation is therefore that an MS Office 2007 document should be pretty close to valid according to the TRANSITIONAL schema.

    Sure enough (again) the result is as expected: relatively few messages (84) are emitted and they are all of the same type.

    <m:degHide m:val="on"/> where "val's" values are supposed to be "true|false".

    [snip]

    Making them conform to the TRANSITIONAL will require less of the same sort of surgery (since they're quite close to conformant as-is)


    In other words, if you're validating against the TRANSITIONAL spec, the OOX documents aren't horribly far off. And it's wrong in such a way that's easy to compensate for in code (i.e. check for "true|on" for a truth value). That's a markedly different situation than described by the headline's "'somewhat less' with the transitional OOXML schema" claim.

    And in case anyone claims that ODF doesn't have the same sort of problem, I refer you to AbiWord bug 11359 [abisource.com]/OpenOffice bug 64237 [openoffice.org]. This one is a show-stopper.
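
The "true|on" compensation described in the comment above can be sketched in a few lines. This is only an illustration of the idea, not code from any real OOXML implementation: the function name `parse_lenient_bool` and the decision to also accept "off" as the counterpart of "on" are assumptions made for the example.

```python
# Hypothetical helper: accept the schema's "true"/"false" values and also
# the "on"/"off" spellings that the quoted validation messages complain about.
# The name and the accepted-value sets are assumptions for this sketch.
_TRUE_VALUES = {"true", "on"}
_FALSE_VALUES = {"false", "off"}


def parse_lenient_bool(value: str) -> bool:
    """Map a boolean-like attribute value to a Python bool."""
    token = value.strip()
    if token in _TRUE_VALUES:
        return True
    if token in _FALSE_VALUES:
        return False
    # Anything else is rejected rather than guessed at.
    raise ValueError(f"not a recognised boolean value: {value!r}")
```

A reader that consumes something like `<m:degHide m:val="on"/>` could pass the attribute value through such a helper instead of comparing against "true" directly. A strict validator would still flag the document; this only makes the consumer tolerant.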
  • You need at least one coded reference implementation or else you'll end up with something in the standard which is difficult/impossible to implement. Especially in a 6,000+ page standard.

    ISO would be well advised to take the method the IETF uses, which is to have two independent teams implement the standard based on the documentation before an RFC can reach a Draft Standard status. I suspect ODF would have only benefited from this process by cutting down its rough edges, while OOXML would have been so cumbersome that it would be simply dropped.

  • hmmm... 122k errors (Score:3, Interesting)

    by SlshSuxs ( 1089647 ) on Monday April 21, 2008 @04:28PM (#23150222)
    After the first error, are the remaining errors meaningful, or are they false positives? I believe most errors after the first are false positives caused by the first error.
  • by Trails ( 629752 ) on Monday April 21, 2008 @05:25PM (#23151060)
    It was also encouraged in some part by the fact that the first clock which worked reliably on a ship, which you refer to, was invented by the Englishman John Harrison. This book [amazon.com], which discusses the inventor and his invention, is quite interesting and worth a read.

    The original prime meridian went through Paris, but shifted to Greenwich as a result of the aforementioned invention, and the naval political pull the British earned as a result.

  • Yes, I think so. (Score:3, Interesting)

    by inTheLoo ( 1255256 ) * on Monday April 21, 2008 @05:44PM (#23151312) Journal

    ODF is the tip of a very big iceberg [theregister.co.uk]. It's an important and public facing tip but it is a small part of both government and business wasting money on the upgrade treadmill and all the intentional waste of M$ Office. It's all downhill from here.

  • well... (Score:4, Interesting)

    by sentientbrendan ( 316150 ) on Monday April 21, 2008 @05:58PM (#23151496)
    >How many other fast-tracked ISO standards have no conforming implementations?

    C++?

    Try out the "export" keyword next time you write any C++.
  • by BearRanger ( 945122 ) on Monday April 21, 2008 @06:03PM (#23151552)
    No. It's intended to sway governments that have passed laws requiring all documents to be created using open standards. This is all about Microsoft being able to sell Office to European countries and (soon) California.
  • by makomk ( 752139 ) on Monday April 21, 2008 @06:25PM (#23151824) Journal
    As far as I know, Open Office produces valid ODF documents (with the odd extension for things like spelling and grammar checker options that are application-dependent), but it doesn't necessarily implement 100% of the latest version of the ODF spec. (In fact, IIRC sometimes other word processors add support for new ODF features before it does.) Since ODF is a committee-developed standard not based on what any one word processor does, this really shouldn't be surprising.
  • by Allador ( 537449 ) on Monday April 21, 2008 @06:46PM (#23152110)
    I wouldn't agree with your statement.

    The point of the article is that MS Office isn't conformant to the STRICT version. This shouldn't come as a surprise, as the change from the original OOXML to the strict version happened, but no new versions of MS Office have been released. The best thing anyone could reasonably expect of a company is that they would update it in the next Office 2007 service pack.

    Office comes in a 2-4 year release cycle, and the change in ISO from the transitional version to the strict version happened after Office 2007 SP1 was already done.

    How could MS have known in advance the changes that would happen to the standard? They can't see into the future.

    Don't forget here that the STRICT version is NOT representative of what any version of Office produces. We already knew that.

    It was an ISO evolution of the submitted version (the transitional one). The vendor would need some time and a release cycle to adapt their products to it.

    What _will_ be interesting is how/when/if MS does conform to the strict format.

    On the other hand, the MS Word conformance to the transitional format seems reasonable. TFA only noted one problem, where an attribute value was using on/off rather than true/false. This is minor and easily fixed and/or recorded as a known issue.
  • Re:Stop using MiB (Score:5, Interesting)

    by benwaggoner ( 513209 ) <ben,waggoner&microsoft,com> on Monday April 21, 2008 @07:16PM (#23152420) Homepage
    Except "computing" isn't a clear-cut domain. Take my field, compression: does that count as "computing" (power of 2) or telecommunications (power of 10)? Unclear.

    So, we had a problem where different tools and formats defined it different ways. For a number of years, QuickTime used K=1024, while Windows Media and RealMedia used K=1000. Unless you were using Sorenson Squeeze, which "corrected" its Windows Media and RealMedia values by 1.024 so they matched the QuickTime files sizes!

    Horrible.

    Fortunately, the compression world has standardized on power-of-10 numbers, since that's what the MPEG standards and, well, all the professionals use.

    So, now we have to deal with complaints about the mismatch between encoding a file that should be "4 GB" but doesn't fill up "4 GB" of drive space...

    Sorry, 1024's got to be a KiB. No other feasible solution at this point, unless we decide to stop having computers talk to each other...
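
The "4 GB" mismatch and the 1.024 correction factor mentioned in the comment above come down to simple arithmetic. The sketch below is only an illustration; the function names are made up for the example and do not come from any of the tools named in the thread.

```python
# Illustrative arithmetic only; all names here are hypothetical.
GB = 10**9    # decimal gigabyte (power of 10)
GIB = 2**30   # binary gibibyte (power of 2)


def gb_to_gib(gigabytes: float) -> float:
    """Convert a size in decimal GB to binary GiB."""
    return gigabytes * GB / GIB


def k1000_to_k1024(value: float) -> float:
    """Re-express a value counted in units of 1000 as units of 1024.

    Equivalent to dividing by 1.024, the factor mentioned in the comment.
    """
    return value * 1000 / 1024


if __name__ == "__main__":
    # A file encoded to "4 GB" in power-of-10 terms, shown in GiB.
    print(f"4 GB = {gb_to_gib(4):.3f} GiB")
    # A figure of 1024 in K=1000 units, shown in K=1024 units.
    print(f"1024 (K=1000) = {k1000_to_k1024(1024):.1f} (K=1024)")
```

The first conversion shows why a file sized at 4 GB in power-of-10 units reads as roughly 3.7 when a tool reports power-of-2 units under the same "GB" label.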
  • Re:Stop using MiB (Score:3, Interesting)

    by fbjon ( 692006 ) on Monday April 21, 2008 @09:14PM (#23153370) Homepage Journal
    In fact, do we even need to express filesizes in powers of 2 at all? Is there any reason to continue this practice other than tradition?
  • Re:Really? (Score:3, Interesting)

    by willyhill ( 965620 ) <pr8wak@gm[ ].com ['ail' in gap]> on Tuesday April 22, 2008 @01:41AM (#23155076) Homepage Journal
    For someone with a 1.2M+ UIN and a grand total of five posts, you sure are versed in Slashdot lore.

    You created this account as a clever variation on westlake [slashdot.org], just like your Mactrope [slashdot.org] troll account was intended to be confused with Macthorpe [slashdot.org].

    That makes it six sockpuppet accounts so far. To repeat what I've been asking you [slashdot.org], how long do you figure this can go on?

  • Re:Stop using MiB (Score:3, Interesting)

    by Jesus_666 ( 702802 ) on Tuesday April 22, 2008 @07:58AM (#23156516)
    Where does 1024 follow from a byte having eight bits? 1024 is not a power of eight. It's divisible by eight, but so are more reasonable numbers like 512, 4096 or 32768.

    But still, if we assume a byte to be one unit we can as well use powers of ten.


    Of course you could argue that the tendency of certain things (like RAM chips) to have sizes that are powers of two might imply using a power of two in language usage. But then again, lots of other things don't use power of two (e.g. most storage media and almost everything transmission-related). Who prevails? Do we follow RAM usage and have non-fitting storage and transmission? Do we follow storage/transmission and have non-fitting RAM? Do we follow xkcd and settle on 1012 bytes per kilobyte?

    Or, of course, we just use unambiguous prefixes so people know which base we use. If you don't like "kibibyte" you can lobby IEC to instead adopt "computer science (not storage) kilobyte (CSkB)" and "general standard kilobyte (GSkB)".
  • by QuoteMstr ( 55051 ) <dan.colascione@gmail.com> on Tuesday April 22, 2008 @09:22AM (#23157098)
    Thank you for your fascinating post. I find myself wondering though, why you are a "hardcore libertarian" despite your solid grasp of the economics. Clearly, a very high top tax rate, strong corporate regulation, and an extensive public welfare system lead to an equitable society. What is the downside, and why would you oppose these kinds of regulations?
