
Office 2007 Fails OOXML Test With 122,000 Errors

I Don't Believe in Imaginary Property writes "Groklaw is reporting that some people have decided to compare the OOXML schema to actual Microsoft Office 2007 documents. It won't surprise you to know that Office 2007 failed miserably. If you go by the strict OOXML schema, you get a 17 MiB file containing approximately 122,000 errors, and 'somewhat less' with the transitional OOXML schema. Most of the problems reportedly relate to the serialization/deserialization code. How many other fast-tracked ISO standards have no conforming implementations?"

  • by EmbeddedJanitor ( 597831 ) on Monday April 21, 2008 @04:17PM (#23150026)
    Developing a standard without having a working example is very foolish. Stuff that looks cool in a standard often does not work out well in real life (theory != practice). Technically, it is far better to survey the landscape for things that work well and standardise those. There are problems with this approach: the companies that have implemented the winning standards often have a competitive advantage, lobbying can wreck the process, and the standards might be burdened with patents (and standards users need to pay royalties to the patent holders).

    For one example where this has worked well, consider vehicle networking. Bosch invented/designed the Controller Area Network (CAN). This was standardised by SAE as part of its in-vehicle networking specification. ISO then just adopted the SAE stuff and extended it in some new areas. The stuff all works well and is based on proven technology (i.e. the technology existed before the standards).

  • by bhtooefr ( 649901 ) <[gro.rfeoothb] [ta] [rfeoothb]> on Monday April 21, 2008 @04:24PM (#23150142) Homepage Journal
    Diebold voting machines run Windows CE.
  • Re:Stop using MiB (Score:5, Informative)

    by Richard Steiner ( 1585 ) <rsteiner@visi.com> on Monday April 21, 2008 @04:46PM (#23150522) Homepage Journal
    Language is typically defined by usage, not the other way around. Unless you're the French, perhaps. :-)

    Remember that "kilo" *did* (and does) mean 1024 in a computing context. Everybody involved on a technical level understood that. Everybody. There was no miscommunication in the general case ... except when it came to laypeople who largely didn't understand what was described in the first place. When that happened, we just told them that bigger is better and moved on...

    Your comment about octet confuses and annoys me. Go away. :-)
  • Re:Stop using MiB (Score:5, Informative)

    by Schraegstrichpunkt ( 931443 ) on Monday April 21, 2008 @04:53PM (#23150604) Homepage

    40+ years of prior art to the contrary

    "1 MW" has always meant 1,000,000 watts. "9.6 kbps" has always meant 9,600 bits per second. A "500 GB" hard drive still means 500,000,000,000 bytes.

    There are relatively few places where this is screwed up, most of which fall into these categories:

    • RAM or things derived from RAM (e.g. page sizes) where the physical layout implies powers of 2
    • Microsoft

    The latter doesn't even do it consistently. "1.44 MB" floppies are actually 1440 * 1024 bytes.
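    The floppy arithmetic is easy to check. A minimal Python sketch (the constant and function names are illustrative, not from the thread) showing that 1440 * 1024 = 1,474,560 bytes is about 1.47 MB (SI) or about 1.41 MiB (IEC), and comes out to exactly 1.44 only in a mixed unit of 1000 * 1024 bytes:

```python
# Byte multipliers for decimal (SI) and binary (IEC) prefixes.
KB, MB = 1000, 1000**2
KIB, MIB = 1024, 1024**2

# The comment's figure: a "1.44 MB" floppy holds 1440 * 1024 bytes.
FLOPPY_BYTES = 1440 * KIB


def describe(n_bytes: int) -> str:
    """Show one byte count in SI megabytes, IEC mebibytes, and 1000*1024 'MB'."""
    return (
        f"{n_bytes:,} bytes = {n_bytes / MB:.3f} MB (SI) = "
        f"{n_bytes / MIB:.3f} MiB (IEC) = "
        f"{n_bytes / (1000 * KIB):.3f} 'floppy' MB (1000 * 1024 bytes)"
    )


print(describe(FLOPPY_BYTES))
```

    Neither standard prefix reproduces "1.44"; only the hybrid unit does.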

    Another case of ivory tower types not being sophisticated enough to grok current industry usage, methinks...

    "Current industry usage" is to be ambiguous; 17 MB means "somewhere between 16 and 18 megabytes". The people you call "ivory tower types", including the IEC [www.iec.ch], are trying to use more precise language.

    And don't even get me started on folks who assume a byte is always eight (8) bits. There's a reason folks in the Real World use the term "octet", people.

    The term "octet" does exactly the same thing that the binary prefixes do: They indicate more precisely what is being talked about.

    As someone else in this thread said, "just because some people made the mistake, decades ago, of choosing to equal kilo to 1024 doesn't mean they were right."
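    To put a number on the ambiguity: 17 MiB is 17,825,792 bytes, about 4.9% more than 17,000,000 bytes. A small hypothetical formatter, assuming the usual SI symbols (kB, MB, ...) and IEC symbols (KiB, MiB, ...), shows the same byte count under both conventions:

```python
def format_bytes(n: int, binary: bool) -> str:
    """Format a byte count using IEC (binary=True) or SI (binary=False) prefixes."""
    base = 1024 if binary else 1000
    prefixes = ["KiB", "MiB", "GiB", "TiB"] if binary else ["kB", "MB", "GB", "TB"]
    if n < base:
        return f"{n} B"
    value, unit = float(n), "B"
    for prefix in prefixes:
        if value < base:
            break  # value is now small enough for the current unit
        value /= base
        unit = prefix
    return f"{value:.2f} {unit}"


size = 17 * 1024**2  # 17 MiB, the figure in the story summary
print(format_bytes(size, binary=True))   # 17.00 MiB
print(format_bytes(size, binary=False))  # 17.83 MB
```

    The convention is a single explicit parameter, which is the point the binary prefixes make: the reader never has to guess which one was meant.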

  • by Ungrounded Lightning ( 62228 ) on Monday April 21, 2008 @04:59PM (#23150696) Journal
    The referenced article claims that "the English had imposed GMT on the rest of the world by force when Britain was a big colonial power", which is bogus.

    The English had a major sea trading infrastructure, at a time when improvements in clocks finally made accurate determination of longitude by celestial navigation practical for trans-Atlantic voyages.

    They established an observatory at a major port (Greenwich) to provide a time-hack for ships in port (both military and commercial) to set their clocks, and distributed navigational charts with that observatory's longitude as the basis for the coordinate system (thus simplifying navigational calculations).

    This quickly became the de facto standard on a voluntary basis among commercial shipping, along with the cities that grew up around major seaports (with offsets chosen to approximate local noon: typically whole hours, sometimes half or quarter hours), just as the coordinate system became the standard for shoreline mapping in other locations (to simplify navigation near shores by ships using the Greenwich meridian for their ocean charts). Then, when railroads drove time standardization, it spread from the seaport cities to inland locations.

    Of course the empire's military and government used it internally. But the rest of the world adopted it voluntarily.
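    The offset scheme described above amounts to adding a fixed delta to Greenwich time. A minimal Python sketch using the standard `datetime` module, treating UTC as a stand-in for GMT and ignoring daylight saving; `local_clock` is a hypothetical helper, and the three offsets are illustrative whole, half and quarter-hour zones, not ones named in the comment:

```python
from datetime import datetime, timedelta, timezone


def local_clock(utc_instant: datetime, offset: timedelta) -> str:
    """Wall-clock time at a fixed offset from Greenwich (no daylight saving)."""
    return utc_instant.astimezone(timezone(offset)).strftime("%H:%M")


greenwich_noon = datetime(2008, 4, 21, 12, 0, tzinfo=timezone.utc)

for label, offset in [
    ("whole hour, UTC-05:00", timedelta(hours=-5)),
    ("half hour, UTC+05:30", timedelta(hours=5, minutes=30)),
    ("quarter hour, UTC+05:45", timedelta(hours=5, minutes=45)),
]:
    print(label, local_clock(greenwich_noon, offset))
```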
  • Re:Stop using MiB (Score:1, Informative)

    by Anonymous Coward on Monday April 21, 2008 @05:03PM (#23150740)
    Pedantry, but it's Anno Domini ('in [the] year of [the] Lord'). Anno Domine means 'In [the] year, O Lord'.
  • Re:Stop using MiB (Score:2, Informative)

    by k33l0r ( 808028 ) on Monday April 21, 2008 @05:03PM (#23150742) Homepage Journal

    Uh, to quote Wikipedia (all hail its omniscience):

    In Jewish and Christian tradition, the first day of the seven day week is Sunday.
  • Re:Stop using MiB (Score:3, Informative)

    by BKX ( 5066 ) on Monday April 21, 2008 @05:19PM (#23150970) Journal
    Two mistakes:

    1. It's "Common Era".
    2. Replace Judeo-Christian with Christian. The Jews aligned their calendar with the Romans such that the Sabbath (Saturday) fell on the last day of the week (which, according to the Romans, was Saturday). The Christians decided that they would celebrate their new prophet on the first day of the week. (Since most early Christians were originally of the Roman religion rather than Jewish, they equated Jesus with Sol, their Sun god, who was worshipped weekly on Sunday.) Later that celebration merged with the Sabbath concept, but the day of Sunday stuck, only now most people erroneously think of it as the end of the week.
  • by Tony Hoyle ( 11698 ) <tmh@nodomain.org> on Monday April 21, 2008 @05:26PM (#23151090) Homepage
    Do ya think?

    Governments started demanding documents in open formats. That threatened Microsoft's monopoly, so they paid to get their XML schema called one. Now governments go back to buying exclusively Office again... MS wins.

    End users don't give a shit about open. Governments do, but only on paper. Once it comes down to the buying decision, all they need is a checkmark on a list. It doesn't actually have to mean anything (cf. POSIX compatibility in NT4: damned near useless, but it was a requirement at the time).
  • by willyhill ( 965620 ) <`moc.liamg' `ta' `kaw8rp'> on Monday April 21, 2008 @05:49PM (#23151372) Homepage Journal
    Anyone posting on this thread should be aware that "inTheLoo", "gnutoo" and "westbake" are sockpuppet accounts of the person who posted the original troll comment, twitter.

    twitter now has six known accounts on Slashdot, three of which have negative or near-zero karma.

  • At least one other (Score:4, Informative)

    by ribuck ( 943217 ) on Monday April 21, 2008 @06:15PM (#23151668)

    How many other fast-tracked ISO standards have no conforming implementations?

    ISO 25436 describes a version of the Eiffel programming language that has never been fully implemented. The standard contains lots of "blue-sky" "would-be-nice-to-have" sections which are planned to be implemented in the future.

    ECMA gives the document author a lot of control, so things can become ECMA standards that would not become ISO standards. But then the fast track ISO process (for existing ECMA standards) makes it easier for them to become ISO standards.

  • by dominator ( 61418 ) on Monday April 21, 2008 @06:15PM (#23151674) Homepage
    I'm aware of at least one - see AbiWord bug 11359 [abisource.com]/OpenOffice bug 64237 [openoffice.org].

    Both AbiWord and OpenOffice.org support hidden text. According to the ODF spec, if you say 'text:display="true"', you're supposed to see the text. However, OpenOffice.org uses "true" to mean "hide the text" and "none" to mean "show the text": the inverse of its correct meaning (or of what you'd expect from the CSS && specs). This will supposedly be corrected in OO.o 3.0, which is due out soonish. However, this leaves a bunch of documents that won't render "as intended" (either by the user or by the ODF spec).
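    The two readings of the attribute can be made concrete. A minimal sketch, assuming the attribute sits on an ODF `text:section` element in the standard ODF text namespace; `is_hidden` and its `ooo_inverted` flag are hypothetical names, and the mapping encodes only what the comment reports (spec: "true" shows, "none" hides; OpenOffice.org: the reverse):

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DISPLAY = f"{{{TEXT_NS}}}display"  # Clark notation for text:display


def is_hidden(display_value: str, ooo_inverted: bool = False) -> bool:
    """Return True if the text should be hidden.

    Spec reading (as described in the comment): "true" shows, "none" hides.
    OpenOffice.org reading (as described in the comment): the inverse.
    """
    if display_value not in ("true", "none"):
        raise ValueError(f"unhandled text:display value: {display_value!r}")
    hidden_per_spec = display_value == "none"
    return not hidden_per_spec if ooo_inverted else hidden_per_spec


snippet = f'<text:section xmlns:text="{TEXT_NS}" text:name="S1" text:display="true"/>'
value = ET.fromstring(snippet).get(DISPLAY)
print(is_hidden(value))                     # False: shown, per the spec
print(is_hidden(value, ooo_inverted=True))  # True: hidden, as OOo reads it
```

    The same attribute value yields opposite results, which is exactly why existing documents won't render "as intended" once one side changes behaviour.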
  • Re:well... (Score:5, Informative)

    by david_thornley ( 598059 ) on Monday April 21, 2008 @06:41PM (#23152032)

    C++ wasn't fast-tracked.

  • by Allador ( 537449 ) on Monday April 21, 2008 @06:49PM (#23152134)
    What is this 'upgrade treadmill' you're referring to?

    Most .gov orgs I've seen, at least here in the US, are using everything from Office 97 to Office 2003, but none are using Office 2007.

    That suggests to me that there is no 'forced upgrade' or 'upgrade treadmill'.

    What is it that you're seeing that indicates otherwise to you?
  • Facts? (Score:4, Informative)

    by argent ( 18001 ) <peter@slashdot . ... t a r o nga.com> on Monday April 21, 2008 @07:09PM (#23152360) Homepage Journal
    Facts? Try this fact: this is not an external standard that Microsoft is supposed to bring their software into line with; this standard was presented by Microsoft as accurately describing what their software actually did. That's the whole reason it was "fast tracked": it was supposed to be a description of a conforming implementation.

    If it's not, then it shouldn't have been "fast tracked"; it should have gone through the same process as current HTML standards... you know, the ones Acid3 is testing...

    That is, the issue is not whether Office conforms to the standard, but that Microsoft lied about its status.
  • by Allador ( 537449 ) on Monday April 21, 2008 @07:43PM (#23152674)
    Easy solution.

    1. Tell them to send you .doc or .pdf.

    or

    2. Install the free, simple, easy to install compat pack [microsoft.com] from MS.

    Nothing you've said here translates into MS forcing you to upgrade. In fact they've given you tools that make it easy and simple to NOT upgrade, and made them free to download.

    This is not to suggest that MS doesn't WANT you to upgrade; of course they do. But many, many businesses and orgs are still running quite successfully on older versions of MS Office.
  • by Erris ( 531066 ) * on Tuesday April 22, 2008 @12:51AM (#23154852) Homepage Journal

    According to TFA, Office 2007 OOXML is very conformant to ISO OOXML Transitional. But it's not very conformant to ISO OOXML Strict.

    You must have read a different article. The one I read was quoted in the summary,

    Office 2007 failed miserably. If you go by the strict OOXML schema, you get a 17 MiB file containing approximately 122,000 errors, and 'somewhat less' with the transitional OOXML schema.

    If you consider somewhat less of a miserable failure to be "very conformant", you might think OOXML is a worthy standard. The rest of the world thinks it's a train wreck.

  • by Allador ( 537449 ) on Tuesday April 22, 2008 @01:15AM (#23154962)
    Your quote was from the GrokLaw summary, which used some creative editing of the original blog posting to create drama and brouhaha. It's important to go to the actual article that GrokLaw was quoting and get the information from the source.

    Based on the actual root article, the results for the transitional version were nearly perfect, with 84 instances of the same (very minor) class of error.

    From TFA [griffinbrown.co.uk]:

    The TRANSITIONAL conformance model is quite a bit closer to the original Ecma 376. Countries at the BRM (rather more than Ecma, as it happened) were very keen to keep compatibility with Ecma 376 and to preserve XML structures at which legacy Office features could be targeted. The expectation is therefore that an MS Office 2007 document should be pretty close to valid according to the TRANSITIONAL schema.

    Sure enough (again) the result is as expected: relatively few messages (84) are emitted and they are all of the same type complaining e.g. of the element:

    <m:degHide m:val="on"/>
    since the allowed attribute values for val are now "true", "false", etc. This was one of the many tidying-up exercises performed at the BRM.
    This is a simple (and very common in this sort of thing) error, and not too surprising or worrisome. It's basically a very minor erratum.
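    The error class quoted above is mechanical enough that a document could be patched before validation. A minimal Python sketch using `xml.etree.ElementTree`, assuming the OMML math namespace `http://schemas.openxmlformats.org/officeDocument/2006/math` and assuming, as the quoted article suggests, that the revised schema wants "true"/"false" where Office 2007 writes "on"/"off"; `normalize_on_off` is a hypothetical helper, not part of any Microsoft tool:

```python
import xml.etree.ElementTree as ET

M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
VAL = f"{{{M_NS}}}val"  # Clark notation for m:val

# Assumption: the BRM-revised schema accepts "true"/"false" for boolean
# m:val attributes but no longer the legacy "on"/"off" spellings.
LEGACY_TO_REVISED = {"on": "true", "off": "false"}


def normalize_on_off(root: ET.Element) -> int:
    """Rewrite m:val="on"/"off" to "true"/"false" in place; return the count.

    Only values that are exactly "on" or "off" are touched, so m:val
    attributes holding other (non-boolean) values are left alone.
    """
    changed = 0
    for elem in root.iter():
        value = elem.get(VAL)
        if value in LEGACY_TO_REVISED:
            elem.set(VAL, LEGACY_TO_REVISED[value])
            changed += 1
    return changed


snippet = (
    f'<m:rad xmlns:m="{M_NS}">'
    '<m:radPr><m:degHide m:val="on"/></m:radPr>'
    "</m:rad>"
)
root = ET.fromstring(snippet)
print(normalize_on_off(root))  # 1
print(root.find(f"{{{M_NS}}}radPr/{{{M_NS}}}degHide").get(VAL))  # true
```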

    This is actually quite impressive, given that the transitional version is not the same as what MS originally proposed, and so there was also little expectation that a document format created in the past would be conformant. It looks like the groups went to some effort to make sure that the transitional version was nearly 100% compatible with what MS Office 2007 actually emits.

    And it shouldn't be surprising to anyone that Office 2007 doesn't conform to the strict version. The strict version was semi-major surgery on what MS proposed, and it was developed long after Office 2007 was released.

    More from TFA:

    Validating against the STRICT model

    The STRICT conformance model is quite a bit different from Ecma 376, essentially because most of that format's most notorious features (non ISO dates, compatibility settings like autospacewotnot, VML, etc.) have been removed. Thus the expectation is that existing Office 2007 documents might be some distance away from being valid according to the strict schemas.

    Sure enough, jing emitted 17MB (around 122,000) of invalidity messages when validating in this scenario. Most of them seem to involve unrecognised attributes or attribute values: I would expect a document which exercised a wider range of features to generate a more diverse set of error message.
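    Roughly 122,000 messages only become readable once grouped. A minimal Python sketch that counts validator messages by shape, assuming one message per line in a `file:line:col: level: text` layout (an assumption; check the actual validator's output) and masking quoted names and values so that messages differing only in element or attribute names collapse together. The sample lines are made up for illustration and are not real jing output:

```python
import re
from collections import Counter

# Assumed message format, one per line: "<file>:<line>:<col>: <level>: <text>"
MESSAGE_RE = re.compile(r"^(.+?):(\d+):(\d+): (\w+): (.*)$")
QUOTED_RE = re.compile(r'"[^"]*"')


def summarize(lines):
    """Count validator messages by shape, masking quoted names and values."""
    counts = Counter()
    for line in lines:
        match = MESSAGE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip anything that is not a located message
        counts[QUOTED_RE.sub('"_"', match.group(5))] += 1
    return counts


# Hypothetical, made-up messages in the assumed format (not real jing output).
sample = [
    'doc.xml:3:41: error: bad value "on" for attribute "m:val"',
    'doc.xml:9:12: error: bad value "off" for attribute "m:val"',
    'doc.xml:12:5: error: unexpected attribute "w:foo"',
]
for shape, n in summarize(sample).most_common():
    print(n, shape)
```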
    Again, to restate: the strict version of ISO OOXML (what causes all the errors in validation) is NOT based on the current version of MS Office 2007. Therefore there is no reason to expect that Office 2007 docs would be fully compliant. The strict version did not exist when Office 2007 was created, so it was not possible for them to be conformant to it.

    To do so would have required them to predict into the future the path that ISO would take.

    Now the interesting question will be whether MS aligns with the strict ISO OOXML in a future Office 2007 Service Pack, or even if they clean up that one minor issue found here (on/off vs. true/false in attributes).

    The strict version breaks a lot of backwards compatibility with legacy documents that were created in much older versions of Office and forward converted. Given that, I'll be interested to see what MS does with this over the next year or two as their releases catch up to the ISO standards.

  • by Knuckles ( 8964 ) <knuckles@@@dantian...org> on Tuesday April 22, 2008 @04:54AM (#23155850)
    Of course the compat pack only covers features that are shared between the different Office versions. If someone sends you an *.xlsm file with 66,000 rows, you are out of luck even with the compat pack.
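    The row limits behind that example are fixed: an Excel 97-2003 worksheet has 65,536 (2^16) rows, while Excel 2007 allows 1,048,576 (2^20). A small Python sketch of the arithmetic, with a hypothetical helper name:

```python
# Worksheet row limits: the legacy .xls grid vs the Office 2007 grid.
XLS_MAX_ROWS = 2**16   # 65,536 rows in Excel 97-2003 workbooks
XLSX_MAX_ROWS = 2**20  # 1,048,576 rows in Excel 2007 workbooks


def rows_beyond_legacy_grid(row_count: int) -> int:
    """Rows that cannot fit in a 65,536-row legacy worksheet."""
    if not 0 <= row_count <= XLSX_MAX_ROWS:
        raise ValueError("row count outside the Excel 2007 grid")
    return max(0, row_count - XLS_MAX_ROWS)


print(rows_beyond_legacy_grid(66_000))  # 464
```

    So a 66,000-row sheet has 464 rows that the older grid cannot hold, no matter what converter sits in front of it.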
  • by GauteL ( 29207 ) on Tuesday April 22, 2008 @05:04AM (#23155888)

    Next, the middle class does not have more money than the top 5%. You are falsely stating this as fact. In fact, the top 1% holds 33% of all wealth and the top 20% holds 51% of all wealth. The middle and lower class, the 80% of the country, hold just 16% of the wealth.
    I want to preempt anyone complaining about your maths. What you mean is that the "rest of the top 20%, apart from the top 1%" holds 51% of all wealth. Otherwise you'd be very wrong in adding the 33% to the 51% to get 84%.

    But the figures I assume you cite (*) do indeed support that the bottom 80% owns only 16%.

    (*) Edward N. Wolff at New York University (2004) [ucsc.edu].
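    Reading the figures as suggested above, the three shares partition all wealth; if the 51% instead described the whole top 20%, the shares would sum to only 67%. A quick check:

```latex
\underbrace{33\%}_{\text{top }1\%} + \underbrace{51\%}_{\text{next }19\%} + \underbrace{16\%}_{\text{bottom }80\%} = 100\%,
\qquad 51\% + 16\% = 67\% \neq 100\%
```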

    In my opinion democracy is an illusion as long as 20% of the people own 84% of the wealth.

    The bottom 80% simply have no way of making informed opinions based on sources that aren't owned by the top 20%.
