Slashdot
Office 2007 Fails OOXML Test With 122,000 Errors
Posted by
ScuttleMonkey
on Mon Apr 21, 2008 03:00 PM
from the money-greases-the-wheels dept.
I Don't Believe in Imaginary Property writes "Groklaw is reporting that some people have decided to compare the OOXML schema to actual Microsoft Office 2007 documents. It won't surprise you to know that Office 2007 failed miserably. If you go by the strict OOXML schema, you get a 17 MiB file containing approximately 122,000 errors, and 'somewhat less' with the transitional OOXML schema. Most of the problems reportedly relate to the serialization/deserialization code. How many other fast-tracked ISO standards have no conforming implementations?"
Related Stories
Submission: Office 2007 Fails OOXML Test With 122,000 Errors by Anonymous Coward
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
What's the Problem? (Score:5, Insightful)
Re:What's the Problem? (Score:5, Funny)
OOXML: "The best Standard money can buy"
Parent
Re:What's the Problem? (Score:5, Funny)
Parent
Re:What's the Problem? (Score:5, Informative)
Parent
Re:What's the Problem? (Score:5, Funny)
>> [Enter]
Are you sure you want to vote today?
(Allow/Deny)
>> Allow
*An anthropomorphic paper clip appears*
"Hi! I'm Clippy, I see you're trying to vote!"
"Let me help you with that! Which of these do you enjoy the most:"
A) Fear Mongering
B) Economy Stunting Taxation
Yeah, I can't wait to vote this year
Parent
Re:What's the Problem? (Score:5, Insightful)
On one hand, a person should indeed be free to live as one sees fit, including spending. But on the other hand, people are stupid, so electing smart people and raising taxes seems like a win to me. That just leaves the "election" part, then. Now what to do about that.....
Parent
Does anyone know if Open Office is compliant with (Score:5, Interesting)
So are most MS Word files (Score:5, Funny)
Parent
ODF wasn't fast-tracked (Score:5, Insightful)
Oh wait! It wasn't!
The fast-track is for de-facto standards which are already so widespread (i.e. supported by multiple vendors) and consistent that there's little point in trying to push a divergent standard out, even though a divergent standard might be better. Something like TCP/IP would be a good example of the sort of thing where the fast track might be appropriate. ODF wasn't fast-tracked, so the standards committee came up with the best standard, irrespective of what might actually be out there in the wild. Now it's up to the vendors to catch up. That's the usual way this is done (e.g. the C++ standard, where most vendors took a few years to catch up, or the C standard, where most vendors took a few months to catch up, and MS took a few years).
Of course, if MSOOXML had gone through the regular track, it probably would have taken years to finish (since it's so large, complex, and poorly defined), and MS couldn't afford to wait. So instead they bought themselves a standards committee or twelve.
Parent
Re:Does anyone know if Open Office is compliant wi (Score:5, Interesting)
Parent
Technical Details (Score:5, Insightful)
A heck of a job, Brownie! (Score:5, Funny)
In a blog posting this week, Alex Brown, leader of the International Organization for Standardization (ISO) group in charge of maintaining the Office Open XML (OOXML) standard, revealed that Microsoft Office 2007 documents do not meet the latest specifications of the ISO OOXML draft standard. "Word documents generated by today's version of Microsoft Office 2007 do not conform to ISO/IEC 29500," said Brown in a blog post recounting the process of testing a document against the "strict" and "transitional" schema defined in the standard.
Ahem. Let me be the first to say:
Brownie, you're doing a heck of a job! [wikipedia.org]
Duh (Score:4, Funny)
You're missing the point... (Score:5, Funny)
OOXML is such a Fraud! (Score:5, Insightful)
Impressive (Score:5, Insightful)
While it's hardly unexpected that the Office 2007 document format isn't *cough* ISO compliant, 122k errors for a 60 MB file works out to a remarkable ~500 bytes of markup per error.
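The arithmetic behind that per-error figure, as a quick sketch. It assumes the file is 60 million bytes and uses the rounded 122,000 error count from the summary, so the result is only approximate.

```python
# Rough estimate of how much markup there is per reported error.
# Assumption: the file size is 60 million bytes (decimal megabytes).
FILE_SIZE_BYTES = 60 * 1000**2
ERROR_COUNT = 122_000  # rounded count quoted in the story summary

bytes_per_error = FILE_SIZE_BYTES / ERROR_COUNT
print(f"about {bytes_per_error:.0f} bytes of markup per error")
```

Reading the size as binary megabytes instead would only nudge the figure up by a few percent, so the "roughly 500 bytes" conclusion holds either way.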
I really do not understand where Microsoft is heading. They've rammed their miserable OOXML format through - supposedly so they could advertise their product as ISO compliant. But what's their advantage now that their product is shown to be so horribly incompatible?
Re:Impressive (Score:5, Insightful)
Parent
HTML (Score:5, Interesting)
Re:HTML (Score:5, Insightful)
Parent
Re:HTML (Score:4, Insightful)
The current HTML specs are trainwrecks for the same reason. That's what HTML 5 is attempting to fix.
Incidentally, the W3C specs are actually called "Recommendations". There's probably a reason for that.
Parent
122,000 errors sure but... (Score:5, Insightful)
Validates better against the TRANSITIONAL spec (Score:5, Interesting)
In other words, if you're validating against the TRANSITIONAL spec, the OOX documents aren't horribly far off. And it's wrong in such a way that's easy to compensate for in code (i.e. check for "true|on" for a truth value). That's a markedly different situation than described by the headline's "'somewhat less' with the transitional OOXML schema" claim.
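A minimal sketch of that kind of compensation in a reader, assuming the only deviation is the spelling of the truth value. The helper name and the accepted spellings are hypothetical and chosen for illustration; they are not taken from the OOXML specification.

```python
# Hypothetical lenient parser: accepts the usual boolean spellings plus
# the on/off spellings. The accepted sets are an assumption for this sketch.
TRUE_VALUES = {"true", "on", "1"}
FALSE_VALUES = {"false", "off", "0"}


def parse_lenient_bool(value: str) -> bool:
    """Map an attribute value to a bool, tolerating on/off as well as true/false."""
    text = value.strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognised boolean value: {value!r}")


print(parse_lenient_bool("on"), parse_lenient_bool("false"))
```

Raising on anything unrecognised keeps the reader tolerant of the known quirk without silently accepting arbitrary garbage.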
And in case anyone claims that ODF doesn't have the same sort of problem, I refer you to AbiWord bug 11359 [abisource.com]/OpenOffice bug 64237 [openoffice.org]. This one is a show-stopper.
What is the idea behind the "fast track" process? (Score:4, Insightful)
If that is correct, then how does the MSOOXML standard qualify? This is a "standard" that is used by absolutely nobody, not even the creator of the standard uses this standard.
Do I not understand the idea behind the fast-track process?
Re:You're missing the point of an ISO standard (Score:5, Insightful)
Parent
Re:You're missing the point of an ISO standard (Score:5, Insightful)
Valid as in possible to implement. How could a standard not be possible to implement you ask? Well that is simple. E.g. write a program that follows this standard:
1. It must print "1" on exit
2. It must print "2" on exit
As you can see, it would not be possible to implement a program according to that standard. That is why someone would need to write a reference application implementing the standard to notice errors like this. Before the standard is given to the whole world to be implemented.
It is better that only one implementer has to puzzle over the errors in the standard, rather than the whole world.
Parent
Re:You're missing the point of an ISO standard (Score:5, Funny)
1. It must print "1" on exit
2. It must print "2" on exit
print("1");
print("2");
}
What's so hard about that?
Parent
Re:You're missing the point of an ISO standard (Score:5, Funny)
Actually, what am I saying. A M$ program exiting cleanly.... ha ha
Parent
Re:You're missing the point of an ISO standard (Score:5, Interesting)
You need at least one coded reference implementation or else you'll end up with something in the standard which is difficult/impossible to implement. Especially in a 6,000+ page standard.
ISO would be well advised to take the method the IETF uses, which is to have two independent teams implement the standard based on the documentation before an RFC can reach a Draft Standard status. I suspect ODF would have only benefited from this process by cutting down its rough edges, while OOXML would have been so cumbersome that it would be simply dropped.
Parent
Re:You're missing the point of an ISO standard (Score:5, Insightful)
A bit like comparing tcp/ip and whatsitsname (x400?). It doesn't really matter how nice something looks on paper if there's no good implementation of it.
Parent
You're doubly missing the point (Score:5, Informative)
For one example where this has worked well, consider vehicle networking. Bosch invented/designed the Controller Area Network (CAN). This was standardised by SAE as part of the in-vehicle networking specification. ISO then just adopted the SAE stuff and extended it in some new areas. The stuff all works well and is based on proven technology (i.e. the technology existed before the standards).
Parent
Re:You're missing the point of an ISO standard (Score:5, Insightful)
That explains why OSI is such a trainwreck compared to IP.
So why was ODF approved, then? Or ISO C?
"Lowest common denominator" is not equivalent to bottom-up design.
Parent
Re:You're missing the point of an ISO standard (Score:5, Insightful)
Parent
Re:Stop using MiB (Score:4, Funny)
Parent
Re:Stop using MiB (Score:4, Funny)
And don't even get me started on folks who assume a byte is always eight (8) bits. There's a reason folks in the Real World use the term "octet", people. Really.
Sheesh!
Parent
Re:Stop using MiB (Score:4, Insightful)
Also, "octet" is the French word for "byte", so it's also 8-bit.
Parent
Re:Stop using MiB (Score:5, Informative)
Remember that "kilo" *did* (and does) mean 1024 in a computing context. Everybody understood that who was involved on a technical level. Everybody. There was no miscommunication in the general case.
Your comment about octet confuses and annoys me. Go away.
Parent
Re:Stop using MiB (Score:5, Insightful)
You have never seen the confusion of metric users entering the CS field, have you? Ever seen a teacher struggle with the very same point we're having right now?
As I said, in the rest of the world, kilo means 1000, not 1024. And here you're saying it becomes something else because a particular field has abused it for 40 years?
Also note that both hard drive manufacturers and digital telecommunications, in a computing context, use 1000 for kilo.
So your argument becomes "if you're in a computing context BUT not talking about hard drives OR telecommunications, then kilo means 1024"...
I'd rather use KiB=1024, thank you very much.
Parent
Re:Stop using MiB (Score:5, Interesting)
So, we had a problem where different tools and formats defined it different ways. For a number of years, QuickTime used K=1024, while Windows Media and RealMedia used K=1000. Unless you were using Sorenson Squeeze, which "corrected" its Windows Media and RealMedia values by 1.024 so they matched the QuickTime file sizes!
Horrible.
Fortunately, the compression world has standardized on power-of-10 numbers, since that's what the MPEG standards and, well, all the professionals use.
So, now we have to deal with complaints about the mismatch between encoding a file that should be "4 GB" but doesn't fill up "4 GB" of drive space...
Sorry, 1024's got to be a KiB. No other feasible solution at this point, unless we decide to stop having computers talk to each other...
Parent
Re:Stop using MiB (Score:5, Informative)
"1 MW" has always meant 1,000,000 watts. "9.6 kbps" has always meant 9,600 bits per second. A "500 GB" hard drive still means 500,000,000,000 bytes.
There are relatively few places where this is screwed up, most of which fall into these categories:
The latter doesn't even get it consistent. "1.44 MB" floppies are actually 1440 * 1024 bytes.
"Current industry usage" is to be ambiguous; 17 MB means "somewhere between 16 and 18 megabytes". The people you call "ivory tower types", including the IEC [www.iec.ch], are trying to use more precise language.
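The figures in this comment can be checked with a few lines of arithmetic. This is a small sketch; the constant names are made up for illustration, and the only inputs are the numbers quoted above.

```python
# Decimal (SI) and binary (IEC) multipliers.
MB, GB = 1000**2, 1000**3
MIB, GIB = 1024**2, 1024**3

# The "1.44 MB" floppy mixes both conventions: 1440 * 1024 bytes.
floppy_bytes = 1440 * 1024
print(f"floppy: {floppy_bytes} bytes = {floppy_bytes / MB:.3f} MB = {floppy_bytes / MIB:.3f} MiB")

# "17 MB" is ambiguous: read one way it is short of 17 MiB, the other way it exceeds 17 MB.
print(f"17 MB  = {17 * MB / MIB:.2f} MiB")
print(f"17 MiB = {17 * MIB / MB:.2f} MB")

# A "500 GB" drive holds 500,000,000,000 bytes, which is fewer than 500 GiB.
print(f"500 GB = {500 * GB / GIB:.1f} GiB")
```

The floppy comes out as neither 1.44 MB nor 1.44 MiB, and the two readings of "17 MB" land on either side of 17, which is the "somewhere between 16 and 18" ambiguity described above.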
The term "octet" does exactly the same thing that the binary prefixes do: They indicate more precisely what is being talked about.
As someone else in this thread said, "just because some people made the mistake, decades ago, of choosing to equal kilo to 1024 doesn't mean they were right."
Parent
Re:Stop using MiB (Score:4, Insightful)
To be fair, we don't use "hour" to mean "sixty minutes" in every context except computing, where it means "fifty-eight and a half minutes". The rationality lies in the removal of confusion, as much as in the units themselves.
Parent
Re:122,000 errors... (Score:4, Funny)
Parent
Up with mebibytes! (Score:5, Insightful)
Then there are those of us who think the prank is the people who refuse to use it (and who trot out the tired "hard drive manufacturers are stealing my disk space" myth/meme).
Seriously, the one thing we can agree on is that there is often confusion regarding whether someone meant "1000" or "1024" when they used a prefix. The difference in approach between the two camps is:
1. Stick with the status quo (where one tries to guess the convention being used based on context). That is, just accept the confusion/inaccuracy.
2. Use SI units in the original SI sense (powers of 10) and use new binary prefixes when you really mean it (power of 2). That is, create a convention and adhere to it.
Interesting that in a discussion about standards (and failures thereof) you would argue that a standard meant to reduce confusion is a prank! I agree, by the way, that "mebibyte" sounds kinda silly... but who cares? It gets the job done. ("Quark" was a silly name, but it's now deeply ingrained in science and no one thinks twice about it.)
For what it's worth, many software products now use the binary prefix [wikipedia.org] notation (e.g. Konqueror).
Parent
Re:Up with mebibytes! (Score:5, Funny)
1) Seagate et al. will continue to market their products in terms of GB and TB.
2) Users will be outraged that their 232GiB hard disk only has 231 or so GiBs of usable space due to formatting, thus leaving the problem unsolved.
3) People will lose good slang abbreviations like Meg and Gig to Kib, Mib, Gib (or Jib), Tib, and Pib, which not only sound stupid but will also be hard to distinguish in normal conversation.
4) PHBs will misuse the binary-only versions as if they were base ten, especially if it catches on that "mebi-" is more than "mega-".
Techie: Hey boss we've got new computers with 100 mebibytes of L1 cache.
PHB: How much is a mebibyte?
Techie: 1048576 bytes.
PHB: Oh, so it's about a million then. Cool.
Next Day
PHB: Hey guys, we shipped nearly 2 mebi-units of dongles this quarter.
Board: What's mebi-units?
PHB: Well, it's.... Proceed into incorrect explanation that convinced Board of Directors that Boss is "with it"
5) As a corollary to 4), people will start using those prefixes to refer to everything in a computer. The new chip is 3.2 GiHz, it draws 25 kiW of power, it weighs 21 Kig, etc.
6) People will always think you are a douchebag.
And that's not even getting into the confusion caused by having two different sets of prefixes for slightly different multipliers, maybe, during the transition.
Ask any Brit: How much is a trillion?
Parent
Referenced article promotes a bogosity. (Score:5, Informative)
The English had a major sea trading infrastructure, at a time when improvements in clocks finally made accurate determination of longitude by celestial navigation practical for trans-Atlantic voyages.
They established an observatory at a major port (Greenwich) to provide a time-hack for ships in port (both military and commercial) to set their clocks, and distributed navigational charts with that observatory's longitude as the basis for the coordinate system (thus simplifying navigational calculations).
This quickly became the de facto standard on a voluntary basis among commercial shipping, along with the cities that grew up around major seaports (with offsets to approximate local noon, typically multiples of an hour, sometimes of a half- or quarter-hour), just as the coordinate system became the standard for shoreline mapping in other locations (to simplify navigation near shores by ships using the Greenwich meridian for their ocean charts). Then when railroads drove time standardization it spread from the seaport cities to inland locations.
Of course the empire's military and government used it internally. But the rest of the world adopted it voluntarily.
Parent
Re:Curiousity (Score:5, Insightful)
The problem with that is that an open document format standard is a direct threat to Microsoft's near-monopoly in the office app department. If anyone can implement a document format that's cross-compatible, then they can easily implement a competitor to Office, and if they decide to undercut Office or (as with OO.org) give the damn thing away, then Microsoft's monopoly is one breath from collapse. Believe me, if Microsoft loses Office, they're in serious, serious trouble within five years. So OOXML, a "standard" that not even Microsoft can implement, gets pushed through the ISO using all sorts of peculiar and ultimately nefarious methods. Now Microsoft and its partners can go around telling Small Town, USA that Office saves in an ISO standard, when in reality the poor bastard in 2100 AD who needs to open this file is going to spend many months trying to figure out this monster. That is in direct violation of the whole notion of an open standard.
That you have no problems is irrelevant. That's not what the point of an open standard is.
Parent
Re:That is an improvement (Score:5, Informative)
Governments started demanding documents in open formats.. that threatened their monopoly, so they paid to get their XML schema called one.. now governments go back to buying exclusively Office again... MS Wins.
End users don't give a shit about open. Governments do but only on paper.. once it comes down to the buying decision all they need is a checkmark on a list. It doesn't actually have to mean anything (cf. Posix compatibility in NT4.. damned near useless but it was a requirement at the time).
Parent
Re:Yes, I think so. (Score:5, Informative)
Most
That suggests to me that there is no 'forced upgrade' or 'upgrade treadmill'.
What is it that you're seeing that indicates otherwise to you?
Parent
Re:Yes, I think so. (Score:5, Informative)
Parent
Re:That is an improvement (Score:5, Interesting)
The point of the article is that MS Office isn't conformant to the STRICT version. This shouldn't come as a surprise, as the change from the original OOXML to the strict version happened, but no new versions of MS Office have been released. The best thing anyone could reasonably expect of a company is that they would update it in the next Office 2007 service pack.
Office comes in a 2-4 year release cycle, and the change in ISO from the transitional version to the strict version happened after Office 2007 SP1 was already done.
How could MS have known in advance the changes that would happen to the standard? They can't see into the future.
Don't forget here that the STRICT version is NOT representative of what any version of Office produces. We already knew that.
It was an ISO evolution of the submitted version (the transitional one). The vendor would need some time and a release cycle to adapt their products to it.
What _will_ be interesting is how/when/if MS does conform to the strict format.
On the other hand, the MS Word conformance to the transitional format seems reasonable. TFA only noted one problem, where an attribute value was using on/off rather than true/false. This is minor and easily fixed and/or recorded as a known issue.
Parent
Re:well... (Score:5, Informative)
C++ wasn't fast-tracked.
Parent