W3C launches Binary XML Packaging
Spy der Mann writes "Remember the recent discussion on Binary XML? Well, there's news. The W3C just released the specs for XML-binary optimized packaging (XOP). In summary, they take binary data out of the XML, and put it in a separate section using MIME-Multipart. You can read the press release and the testimonials from MS, IBM and BEA."
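For reference, a XOP package on the wire looks roughly like this (adapted loosely from the examples in the spec; the SOAP envelope is omitted and the content IDs are illustrative):

```
MIME-Version: 1.0
Content-Type: Multipart/Related; boundary=MIME_boundary;
    type="application/xop+xml"; start="<root@example.org>"

--MIME_boundary
Content-Type: application/xop+xml; charset=UTF-8
Content-ID: <root@example.org>

<m:photo xmlns:m="http://example.org/stuff">
  <xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include"
               href="cid:image@example.org"/>
</m:photo>

--MIME_boundary
Content-Type: image/png
Content-Transfer-Encoding: binary
Content-ID: <image@example.org>

[raw PNG octets]
--MIME_boundary--
```

The `xop:Include` element stands in for the binary data, which travels as its own MIME part further down the package.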
Uhhh... (Score:5, Informative)
Re:Uhhh... (Score:1, Informative)
Of course, this is Slashdot.
Thank you! (Score:4, Informative)
Could have been simpler (Score:1, Informative)
I know it's easy, because that's exactly what I did: I hacked the GNOME libxml library and it worked nicely. It was easy to code (yank/paste from CDATA), and best of all it was *fast*, without consuming resources the way Base64 does (I tried that originally, too).
Critiques (Score:5, Informative)
First of all, it's completely impossible to stream this format. All the binary chunks can only be resolved once the actual non-opaque XML content is complete, and in a stream, that never happens. (Of course, XML isn't the most stream-friendly protocol to begin with: you can't validate a stream.)
Secondly, this isn't wonderful for large files either; you're constantly seeking for binary data that can be many megabytes away. We solve this in web pages by having the images be completely separate (binary) files.
Thirdly, it's telling that they used a PNG as the data type. Besides being yet another file format that needs its own custom binary parser (heh, I like PNG, I'm just complaining about it in the XML whinespace), it's big and simple and there's just one of them there. One of the things I really liked about the various Binary XML formats was the degree to which they expressly typed things like arrays of floating-point values or little-endian integers. Converting values between binary and string format is an enormously painful process, one that frankly I'm astonished hasn't received CPU acceleration at this point. Every other Binary XML format has seriously thought about how to efficiently but correctly manage large arrays of such values. XOP just says...heh...you wanna dump a lot of data efficiently? Check your typing at the door. Feel free to bring a buffer-overflow-ridden parser in with you if you like, though.
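The conversion cost being described is easy to demonstrate; a quick Python sketch (the values are chosen so that most have long decimal representations):

```python
import struct

# 1000 doubles: fixed-width binary vs. decimal text round-tripping.
values = [i / 7 for i in range(1000)]

binary = struct.pack("<1000d", *values)       # exactly 8 bytes per value
text = ",".join(repr(v) for v in values)      # often 17+ chars per value

print(len(binary))              # 8000
print(len(text) > len(binary))  # True: the text form is bigger...

# ...and every value must be re-parsed from decimal on the way back in:
decoded = [float(s) for s in text.split(",")]
assert decoded == values        # repr() round-trips exactly, at a price
```

The binary form is both smaller and essentially a memcpy to load, which is the point the typed Binary XML formats were addressing.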
Don't get me wrong, there's a fundamental simplicity to XOP, and I can certainly understand how it's appealing. But it seems to go so massively against what XML represents that I'm not entirely sure XOP-encoded content deserves to be compliant with the very regulations that forced XML adoption in the first place: opaque formats are too expensive to maintain for any amount of time, therefore either self-describe or don't get deployed. A self-describing document that says "all performance-critical content is opaque" seems rather counter to that spirit.
Re:nothing else to work on? (Score:2, Informative)
RTFA - Re:Binary... XML... Nah! (Score:3, Informative)
Re:nothing else to work on? (Score:2, Informative)
You, and whoever modded you up as "interesting", are idiots.
This standard is not about representing XML in binary format.
This standard is about representing binary content in an XML document in binary format.
See, previously, if one wanted to include binary data in an XML file one had to Base64 encode it. This takes space and processor time.
This standard moves the bloated Base64 content into a pure binary MIME object.
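The overhead is easy to quantify: Base64 maps every 3 octets to 4 ASCII characters, a 33% size penalty before any line breaks are added. A minimal Python check:

```python
import base64

payload = bytes(300)                 # 300 arbitrary binary octets
encoded = base64.b64encode(payload)

print(len(encoded))                  # 400: 4 output chars per 3 input bytes
```

And that expanded text still has to be decoded back to binary on the receiving end, which is the processor-time half of the complaint.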
Maybe you should have RTFA first, eh?
Attachments, Not Binary XML (Score:3, Informative)
Re:base64? (Score:4, Informative)
And you're right, most people don't want to include huge binary stuff in their XML. But sometimes you DO need to combine XML with huge amounts of binary data. So far, the alternatives have been non-standard wrappers (including people doing more or less what this standard does, using MIME multipart documents), base64 or some other space-wasting encoding inside the XML document, or wrapping everything in an archival format (as OpenOffice does, for instance).
All this does is define a standard way of letting you keep a document and associated raw binary data together, while allowing you to treat it as if it is inlined in the XML if you so choose.
The principles are exactly the same as for sending an HTML e-mail containing images (or other data) as attachments and referring to them with URLs of the form "cid:foo" (which refer to the MIME part with the matching "Content-ID: foo" header).
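That e-mail pattern can be reproduced with Python's standard `email` package, which handles the Content-ID bookkeeping (the image bytes here are fake placeholders):

```python
from email.message import EmailMessage
from email.utils import make_msgid

img_cid = make_msgid()           # e.g. "<...@hostname>", brackets included

msg = EmailMessage()
msg["Subject"] = "cid: demo"
msg.set_content("plain-text fallback")
# HTML alternative that refers to the image by Content-ID (minus brackets):
msg.add_alternative(
    f'<html><body><img src="cid:{img_cid[1:-1]}"></body></html>',
    subtype="html",
)
# Attach the binary part to the HTML alternative, tagged with that CID.
msg.get_payload()[1].add_related(
    b"\x89PNG fake image bytes", maintype="image", subtype="png", cid=img_cid,
)

print(msg.get_payload()[1].get_content_type())   # multipart/related
```

XOP's `xop:Include href="cid:..."` plays the same role as the `<img src="cid:...">` reference here.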
Re:Uhhh... (Score:1, Informative)
With XOP you can define that your packaging format is zip or tar or jar or whatever suits your application.
Re:nothing else to work on? (Score:4, Informative)
Then of course you have the problem that your data wants to be variable-length. Then you want the delimiter to be able to appear in the data itself, so you have to invent escape codes. Then on some lines you want to allow multiple occurrences of some of the parameters, so you put in some basic markup. Then you want to be sure that any data users enter is of the correct format, so you write a verifier. And then you are basically back at XML again.
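The escalation described above is real; here is a toy Python sketch of just the first two steps (a delimiter plus an escape character), at which point you are already hand-writing a small parser:

```python
# Naive record format: fields separated by '|'. As soon as a field can
# contain '|', you need an escape character -- and then an escape for
# the escape, and you are writing a tiny parser after all.

def encode(fields):
    return "|".join(
        f.replace("\\", "\\\\").replace("|", "\\|") for f in fields
    )

def decode(record):
    fields, cur, i = [], "", 0
    while i < len(record):
        c = record[i]
        if c == "\\":            # escape: take the next char literally
            cur += record[i + 1]
            i += 2
        elif c == "|":           # unescaped delimiter ends the field
            fields.append(cur)
            cur = ""
            i += 1
        else:
            cur += c
            i += 1
    fields.append(cur)
    return fields

roundtrip = decode(encode(["a|b", "c\\d", "plain"]))
print(roundtrip)                 # ['a|b', 'c\\d', 'plain']
```

Each further requirement (repetition, nesting, validation) adds another layer just like this one, which is the poster's point.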
XML isn't that great. However, taken at face value, it saves time and programming errors, the same way I wouldn't expect to have to write my own doubly-linked list or hash table. Neither is complicated, but my language should come with one pre-written that is safer and faster than anything I could knock together.
Re:More bloat! (Score:1, Informative)
And n is smaller for binary data; in a best-case situation (an XML document consisting largely of tags rather than text, with tags of 10-20 characters that can be reduced to single bytes in a binary encoding), switching to binary could mean parsing roughly a tenth of the bytes: still O(n), but an order of magnitude faster in practice.
Ergo, binary XML could theoretically give you a considerable performance enhancement.
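The constant-factor argument above can be shown with a toy tag table (the format and numbers are entirely made up):

```python
# Toy "binary XML": map each known tag to a single byte and see how much
# of a tag-heavy document disappears.
doc = "<measurement><value>1</value><value>2</value></measurement>" * 100

table = {b"<measurement>": b"\x01", b"</measurement>": b"\x02",
         b"<value>": b"\x03", b"</value>": b"\x04"}

encoded = doc.encode()
for tag, code in table.items():
    encoded = encoded.replace(tag, code)

print(len(doc), len(encoded))    # 5900 800 -- roughly 7x less to scan
```

Real binary XML proposals use proper tokenizers rather than string replacement, but the size arithmetic is the same: in tag-heavy documents, most of the bytes being parsed are tag names.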
I've lost count of the number of binary formats I've seen that, in a hex dump, turn out to be full of zero bytes and are thus highly inefficient. People who work at a "high level", designing such file formats without checking such simple things, are poor programmers.
You demonstrate your ignorance once more.