W3C launches Binary XML Packaging

Posted by CowboyNeal on Thu Jan 27, 2005 10:52 PM
from the huge-config-files-scared dept.
Spy der Mann writes "Remember the recent discussion on Binary XML? Well, there's news. The W3C just released the specs for XML-binary optimized packaging (XOP). In summary, they take binary data out of the XML, and put it in a separate section using MIME-Multipart. You can read the press release and the testimonials from MS, IBM and BEA."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by Dancin_Santa (265275) <DancinSanta@gmail.com> on Thursday January 27 2005, @10:55PM (#11500247) Journal
    I was drowning in debt. There was nowhere to turn. My wife left me, my friends all left me. Even my dog, he left me too. I had to do something.

    That's when I found Binary XML. They were able to help with the debt. They got the creditors off my back and got me back on my feet.

    Thanks Binary XML!

    (I thought this was going to be about a standardization of compressing XML files that got rid of the excess bloat in the markup.)
    • More bloat! (Score:5, Insightful)

      by Hobart (32767) on Thursday January 27 2005, @11:20PM (#11500387) Homepage Journal
      I thought this was going to be about a standardization of compressing XML files that got rid of the excess bloat in the markup.
      So did I. Then I looked at that example [w3.org] and my heart sank. What the hell! 12 lines of bloated crap text turned into 46+ lines of worse bloated crap!

      And they're going to do what, say "gzip it" ? The amount of bandwidth and CPU time this wastes is abysmal.

      Someone needs to stop these people.
      • Re:More bloat! (Score:5, Insightful)

        by Anonymous Coward on Friday January 28 2005, @12:07AM (#11500595)

        So did I. Then I looked at that example [w3.org] and my heart sank. What the hell! 12 lines of bloated crap text turned into 46+ lines of worse bloated crap!

        The examples given in the article omit the binary data for brevity. The problem that exists now is that binary data has to be encoded into a form compatible with the charset of the document, which usually means base64. This increases the size of the binary data considerably (by about a third) and also requires CPU cycles to encode and decode it.

        Being able to send the binary data in a separate MIME payload means it doesn't need to be encoded in this manner, which is a big help for any reasonably sized binary resources. It also means they become first-class MIME objects and can have associated headers, which provides additional benefits.
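
For a sense of scale: standard base64 turns every 3 input bytes into 4 output characters, so the encoded form is about a third larger than the raw bytes (more if line breaks are added). A minimal Python sketch to check this; the random `payload` is just a hypothetical stand-in for an image or other attachment:

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Return encoded size divided by raw size for standard base64."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)


# Hypothetical stand-in for an image or other binary attachment.
payload = os.urandom(30_000)
ratio = base64_overhead(payload)
print(f"raw={len(payload)} bytes, base64={len(base64.b64encode(payload))} bytes, ratio={ratio:.3f}")
```

This only measures size. The CPU cost of encoding and decoding depends on the implementation and would need to be timed separately.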


      • What the hell is wrong with just gzipping it?
        It's just another encoding that happens to be source-language agnostic and provide redundancy elimination.

        You have no problem with the overhead of parsing binary XML, but dictionary lookups and tree rotations involved in decoding a compressed file.. that's out of the question?

        Not to mention the added benefit that a standard compression layer shrinks not just the tags, but the content as well.

        Look, stop thinking of gzip (or bzip, or whatever), as a "compression
        • "You have no problem with the overhead of parsing binary XML, but dictionary lookups and tree rotations involved in decoding a compressed file.. that's out of the question?"

          huh... last time I checked binary was the language the computer natively understood and it didn't need to be parsed or processed in any way by software.

          Also, it seems to me that he did have a problem with the parsing of the XML part.
          • Re:More bloat! (Score:3, Insightful)

            by Laxitive (10360)
            Uhm, you still need to parse the XML structure.

            Technically, ASCII is binary, too. 'A' is 65, which is 01000001. Binary XML will not do away with parsing. The tags will still be there, the content will still be there. Only the restriction that the tags must be an alphanumeric string will be lifted.

            Making things "binary" doesn't magically remove the burden of parsing. You know the binary executables you run? The system loader loads it.. and parses it, and arranges it in memory the way it needs to be a
            • Re:More bloat! (Score:3, Insightful)

              by interiot (50685)
              You're talking about two different kinds of parsing. Breaking out opcodes is hugely different from counting '<' and '>' characters 30,000 times in a row, just to find one little bit of information buried in the middle of text, but you just don't know where [slashdot.org].

              Why are databases fast? Indexes. What do all XML databases do? Store XML internally in a way that machines read much faster, but makes it a pain for humans to update. Indexes. So if you have all these computer programs passing around data,

              • Re:More bloat! (Score:3, Insightful)

                by Unordained (262962)
                Databases (typically relational) aren't just fast because of indices; they can also assume more about the structure of the data. A table (relation) is a set of tuples, each one with the same attributes, each one with a single value (however complex it might happen to be, in spite of what OODBMS people think.) When you've got that, you only need to store the meta-structure of the relation once, in the relation header. Then you can assume all sorts of stuff about what's to follow, and you can optimize the hel
              • Re:More bloat! (Score:4, Insightful)

                by bit01 (644603) on Friday January 28 2005, @02:40AM (#11501113)

                If you can't comprehend that binary is much faster to parse than XML there's nothing I can do.

                Where is your numerical proof that binary is much faster to parse than text? It is amateurish to just assume this is true. Good parsers are damn fast and can operate in O(n) time.

                Of course binary may be faster. I doubt that it will be much faster when compared to a decent parser and when you realise that the binary format should be platform agnostic for word size, endianness and forward and backward compatibility.

                For instance, gzip'ed text files can sometimes be much faster to access than uncompressed binary files because it reduces the amount of file IO. e.g. 64 bits of binary to encode the number 1 rather than 8 bits of text.

                Compression increases CPU usage, but because the disk is so much slower, and because the CPU might otherwise sit idle waiting for the disk, it can lead to an overall win. The same may apply to a slow network link. Unless you measure, it is difficult to know. I've lost count of the number of binary formats I've seen that in hex dump had vast numbers of zero bytes and were thus highly inefficient. The people who work at a "high level" designing such file formats without checking such simple things are poor programmers. Even when using indexes, saving a single extra random disk/network access can sometimes justify a huge amount of CPU usage.

                ---

                Don't be a programmer-bureaucrat; someone who substitutes marketing buzzwords and software bloat for verifiable improvements.
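
The "64 bits to encode the number 1" point can be made concrete with a small Python sketch. The data set here is hypothetical (10,000 single-digit integers), and the sketch only compares sizes; it says nothing about parse speed, which would have to be measured on real data:

```python
import struct
import zlib

# Hypothetical data set: many small integers, as in the "number 1" example.
values = [i % 10 for i in range(10_000)]

# Fixed-width binary: every value takes 8 bytes, mostly zero padding.
binary = struct.pack(f"<{len(values)}q", *values)

# Plain text: one decimal number per line.
text = "\n".join(str(v) for v in values).encode("ascii")

# Share of the binary form that is nothing but zero bytes.
zero_fraction = binary.count(0) / len(binary)

for name, blob in (("binary", binary), ("text", text)):
    packed = zlib.compress(blob)
    print(f"{name}: raw={len(blob)} bytes, zlib={len(packed)} bytes")
print(f"zero bytes in binary form: {zero_fraction:.0%}")
```

With wider value ranges or floating point data the comparison can easily go the other way, which is the argument for measuring rather than assuming.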

    • My wife left me, my friends all left me. Even my dog, he left me too. I had to do something.

      Your life is a country song. For better results, try playing it backwards.

      I got my wife back, my car back, my house back, and a full bottle of whiskey at the end!
  • by seanadams.com (463190) * on Thursday January 27 2005, @10:59PM (#11500268) Homepage
    The tech industry seems really starved for ideas lately.

    Binary file formats are hard.
    Let's use XML because it's easier.
    No wait... let's represent that XML in a more efficient binary format.
    Ah yeah that's the ticket - the best of both worlds!

    Now let me just fire up my code-morphing processor which, through emulation, achieves x86 compatibility with "low" power consumption. Never mind it's slower overall and has worse MIPS/mW than an underclocked x86 - look Ma, we *invented* something!!!!

    There are some real technical problems out there... why are people chasing non-problems like XML?
    • by tomhudson (43916) <<ac.nortoediv> <ta> <nosduh>> on Thursday January 27 2005, @11:13PM (#11500352) Homepage Journal
      There are some real technical problems out there... why are people chasing non-problems like XML?
      Because they're hacks more into buzzword bingo and "selling the next big thing"?

      Whatever happened to the virtues of simplicity, like a file containing a header record detailing the field names, and rows containing the data in either fixed-length or delimited form? Damn fast to implement, debug, read from and write to. Parsing? What parsing? Read the first line, split it to get your headers, and read 1 line per record.

      Ideal for data exchange. Easy to manipulate via javascript on the client. Simple to display and manipulate via the DOM (Document Object Model). Not resource-hungry. Handles both text and binary data. Dirt easy on the server.

      I ran a test to compare, and I'm able to select, format, and serve 1000 records this way in less time than 100 records in simple HTML, never mind xml. By doing this, the client can page through, say, 25 records at a time without having to hit the server every few seconds to see the next/prev pages.
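
The "read the first line, split it to get your headers, and read 1 line per record" approach might look like this in Python. This is only a sketch of the idea, not the poster's code; the tab separator and the sample field names are assumptions:

```python
def parse_delimited(text: str, sep: str = "\t") -> list[dict[str, str]]:
    """Parse a header line plus one record per line into dictionaries."""
    lines = text.splitlines()
    headers = lines[0].split(sep)
    return [dict(zip(headers, line.split(sep))) for line in lines[1:] if line]


# Hypothetical sample data; the field names are made up for illustration.
sample = "id\tname\tcity\n1\tAda\tLondon\n2\tLinus\tHelsinki\n"
records = parse_delimited(sample)
print(records[1]["name"])
```

It works as long as no field ever contains the separator or a newline, and as long as the data stays flat.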

      • What you are talking about is CSV. CSV is great, but it's only any good for table structured data. You can't implement a tree or any arbitrary nested structure like you can in XML.
      • by Chris_Jefferson (581445) on Friday January 28 2005, @06:09AM (#11501841) Homepage
        Whatever happened to the virtues of simplicity, like a file containing a header record detailing the field names, and rows containing the data in either fixed-length or delimited form? Damn fast to implement, debug, read from and write to. Parsing? What parsing? Read the first line, split it to get your headers, and read 1 line per record.

        Then of course you have the problem that your data wants to be variable length. Then you want to have the delimiter actually in the data, so you have to invent escape codes. Then in some lines you want to allow multiple occurrences of some of the parameters, so you put in some basic markup. Then you want to be sure that any data users enter is of the correct format, so you write a verifier. Then you are basically back at XML again.

        XML isn't that great. However, taken at face value, it saves time and programming errors, the same way I wouldn't expect to have to write my own doubly-linked list or hash table. Neither is complicated, but my language should come with one pre-written which is safer and faster than one I could knock together.
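
The delimiter-in-the-data problem is easy to reproduce. A short Python sketch, with a made-up record, comparing a naive split against the standard library's `csv` module (which already implements the quoting and escaping rules):

```python
import csv
import io

# Hypothetical record whose second field contains the delimiter itself.
row = ["42", "Smith, John", "said \"hi\""]

# Let the csv module handle quoting and escaping on the way out...
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue().rstrip("\r\n")

# ...then compare a naive split with a real CSV parse on the way in.
naive = line.split(",")
parsed = next(csv.reader([line]))

print(line)
print(len(naive), "fields from naive split")
print(len(parsed), "fields from csv.reader")
```

The naive split sees an extra field because it cuts inside the quoted name, which is roughly the point where hand-rolled formats start growing their own escape rules.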

        • All the problems you mention were solved decades ago.
          1. data wants to be variable length
            Not a big deal. You don't necessarily need embedded escape codes (though they work well) - you can also use overflow buckets like databases have used for, say, 30 years
          2. Then you want to have the delimiter actually in the data, so you have to invent escape codes.
            regexes make this easy to implement.
          3. Then in some lines you want to allow multiple occurrences of some of the parameters so you put in some basic markup
            Not
        • Re:Noscript (Score:3, Interesting)

          by tomhudson (43916)
          I'd tell them to switch to firefox :-)

          It's time to stop thinking of "web sites" and start thinking along the lines of "web apps" - not the old-style form-based "web app", but more along the lines of gmail - heavily client-side-scripted, nice presentation and data manipulation.

          What I see is very few pages (or even just 1 page) as the UI, data exchanged between server and client w/o page refreshes (can be done just w. javascript by sticking the data in iframes with a width and height of 0px, and reading/w

  • by noidentity (188756) on Thursday January 27 2005, @11:05PM (#11500307)
    Here's my binary XML-like file format which gives the best of both text and binary file formats. It's human readable and efficient at the same time! Finally, an end to the text-versus-binary wars. Here's an example file:

    The following data is in binary.
    UH)(&T^( @#t79nui**&tb x9#@ $Y*_@$ji[P{O@JIOHXIOU$HIIU#$hiuoHOP$UJ [etc.]
  • This seems like it would be an ideal fit for services such as Flickr as it would allow for image (or other binary media files) to be sent with xml data - in a compressed binary format.
    • Exactly. It seems like a way to have a "text" file that is easily parsed (all the XML info -- in this case possibly a description, comments, image meta-data, etc.), yet binary info (a jpeg compressed image) fits along-side for when you want it. One file with all the goodies.

      How this is different than simply base64 encoding the image inside a tag is yet to be seen. Perhaps because it's a standard?
      • I would assume because base64 coding of binary data bloats its size (I think up to 40% additional size over the uncoded binary) and takes time to encode/decode. If you were to be able to put a marker in an element that says "binary blob 100 goes here" and include binary blob 100 in some other area that is pure binary then you would have the binary data without encoding overhead.
  • by Anonymous Coward on Thursday January 27 2005, @11:09PM (#11500322)
    As a software developer I find this particularly good.

    While I myself would prefer to write a binary protocol and send the data through a TCP socket I can no longer do that.

    When we land big contracts at work that deal in government and health the key thing they need now is interoperability with others. What does this mean? XML. Whether or not you like it, XML is here to stay. It's what everyone is pushing.

    Therefore we had to adapt and start using it. Not just for B2B, our rich desktop clients now communicate with the server using XML web services.

    The problem we've encountered is sending binary data. Right now we have to encode the data as base64 inside the XML, which uses lots of resources. I will take a closer look at this, but it looks particularly good.
    • Yes, but this is what ASN1 encoding is for. It's a structured, self-describing encoding scheme that works very well for structured data. What advantages does this binary XML have over ASN1? Both require external descriptions to attach meaning to the data.

      In your case, ASN1 is what you should be using, not XML in the first place.
  • Uhhh... (Score:5, Informative)

    by Phexro (9814) on Thursday January 27 2005, @11:10PM (#11500327)
    Unless I'm horribly misreading the specification, it appears to be a way to package up XML documents and binary data that they reference into a neat package with MIME - not a way to convert a (text) XML document into a binary one.
  • I'm a bit confused... reading the document, it seems that the difference between XML and XOP is just where the data is:

    XML:

    <mylabel>(text)</mylabel>
    <mydata>(stuff in binary)</mydata>

    XOP:

    <mylabel>(text)</mylabel>
    <mydata>"hey, there's stuff in binary here, id 1!"</mydata>
    ---- MIME ---
    Binary ID 1: (stuff in binary)

    Is this right? So the benefit is just standardizing the binary representation using MIME? But that doesn't make the tags less verbose... so how is

      • Um, I believe that the data was encoded in the same manner as the "optimized" format, it's just that now you have to put it at the end in a MIME encapsulated formatting. Zero doesn't factor into this in the first place. Now, concerns of accidentally closing a tag might play into things, but the likelihood of this actually happening is slim to none as the coincidence of "" in a base 64 stream is going to be an astronomical feat. However, I buy into that as a reason (can't have accidental data loss that wa
        • Re:a bit confused (Score:3, Interesting)

          by MassacrE (763)
          Incorrect.

          XML, being a text format, requires proper text encoding. In particular, XML does not allow most of the codepoints (speaking in unicode terms) between 0 and 31 (tab and newline excluded). If you use UTF-8, you cannot use byte values beyond 127 as those are used for forming higher-value unicode characters. In addition, the five main XML markup characters (< > ' " and &) can only be used in some places.

          So, to make a long story short, you base64 everything. For every three bytes you have, yo
  • by IHateSlashDot (823890) on Thursday January 27 2005, @11:22PM (#11500399)
    I can't believe all of the replies making fun of this because they think it's a binary representation of XML. Didn't anyone read the spec that was referenced in the summary?

    This is simply a way to reference binary data from within an XML document and to have that binary data included in the same payload (using MIME).

    Passing binary data in XML is a big problem. Everybody just invents their own method of doing it (although most are just variations on the theme presented here).

    There is a need for this specification but it is not ground breaking or even particularly /. newsworthy.

  • Critiques (Score:5, Informative)

    by Effugas (2378) * on Thursday January 27 2005, @11:29PM (#11500432) Homepage
    Ummm...it's "OK". This is probably the least ambitious Binary XML spec imaginable. That may actually be good, but I don't know. Let's see what's up here...

    First of all, it's completely impossible to stream this format. All the binary chunks have to be read at some point in the future when the actual XML non-opaque content is complete. In a stream, that never happens. (Of course, XML isn't the most stream friendly protocol...you can't validate a stream.)

    Secondly, this isn't wonderful for large files either; you're constantly seeking for binary data that can be many megabytes away. We solve this in web pages by having the images be completely separate (binary) files.

    Thirdly, it's telling that they used a PNG as a data type. Besides being yet another file format that needs its own custom binary parser (heh, I like PNG, I'm just complaining about it in the XML whinespace), it's big and simple and there's just one there. One of the things I really liked about the various Binary XML formats was the degree to which they expressly typed things like arrays of floating point values or little-endian integers. Converting values between binary and string format is an enormously painful process, one that frankly I'm astonished hasn't received CPU acceleration at this point. Every other Binary XML format has seriously thought about how to efficiently but correctly manage large arrays of such values. XOP just says...heh...you wanna dump a lot of data efficiently? Check your typing at the door. Feel free to bring a buffer-overflow ridden parser in with you if you like, though.

    Don't get me wrong, there's a fundamental simplicity to XOP that I can certainly understand how it's appealing. But it seems to go so massively against what XML represents that I'm not entirely sure XOP encoded content deserves to be compliant with the very regulations that forced XML adoption in the first place: Opaque formats are too expensive to maintain for any amount of time, therefore either self-describe or don't get deployed. A self-describing document that says "All performance-critical content is opaque" seems rather counter to this spirit.
      • Hmm. Threw together some code to play with this.

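        #include <stdio.h>

        int main(int argc, char **argv)
        {
            const char *mixed = "0.123124 12345";
            double foo;
            int bar;
            int i;

            /* Parse the same two-value string one million times. */
            for(i=0; i<1000000; i++){
                /* %lf stores into a double; plain %f expects a float pointer. */
                sscanf(mixed, "%lf %i", &foo, &bar);
            }
            return 0;
        }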

        Results:

        $ time ./bench.exe

        real 0m2.541s
        user 0m2.393s
        sys 0m0.010s

        So we're looking at maybe 787K symbols per second on my machine, at 100% CPU. How does this translate to XML parsers? You're right, this is something I should look into.
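
For comparison, here is a rough Python sketch of the same question: decoding an array of doubles from text versus from a packed binary buffer. The sample values and the repeat count are arbitrary assumptions, and the timings it prints will vary by machine, so treat it as a way to measure rather than as a result:

```python
import struct
import timeit

# Hypothetical array of floating point samples.
values = [i / 7.0 for i in range(1000)]

# Text form: decimal strings separated by spaces, as they might sit in XML.
as_text = " ".join(repr(v) for v in values)

# Binary form: little-endian IEEE 754 doubles, 8 bytes each.
as_binary = struct.pack(f"<{len(values)}d", *values)


def decode_text(data: str) -> list[float]:
    """Convert whitespace-separated decimal strings back to floats."""
    return [float(token) for token in data.split()]


def decode_binary(data: bytes) -> list[float]:
    """Unpack a buffer of little-endian doubles."""
    count = len(data) // 8
    return list(struct.unpack(f"<{count}d", data))


if __name__ == "__main__":
    for name, func, arg in (("text", decode_text, as_text), ("binary", decode_binary, as_binary)):
        seconds = timeit.timeit(lambda: func(arg), number=200)
        print(f"{name}: {seconds:.4f}s for 200 decodes of {len(values)} values")
```

Both decoders return the same list; only the cost of getting there differs.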

  • by WasterDave (20047) <davep&zedkep,com> on Thursday January 27 2005, @11:36PM (#11500464)
    "Remember the recent discussion on Binary XML? Well, this has nothing to do with it, but we are proud to present a standard for larding out XML even more before attaching it to an email."

    I, for one, welcome our new bandwidth eating plaintext overlords.

    Dave
  • For those who didn't RTFA:

    The main application of this XML-referencing-to-binary-attachments is SOAP, and that means web services.

    In other words, you can simplify your God-help-me-XML-handling-and-parsing-code into something maybe 10% simpler. This means leaving the binary stuff OUT OF THE XML PARSER, putting it into the upper levels of processing. Cleaner, faster.

    Also, it helps adaptive compression (gzip) by tightening up the textual data - remember web services are about information transfer, not stora
  • by Camel Pilot (78781) on Friday January 28 2005, @12:27AM (#11500685) Homepage Journal
    I am currently writing a XUL client/server application. I am using the XMLHttpRequest function. However, instead of processing XML data, which is very slow, especially when you need to parse a data set several times a second, I started sending data structures in JavaScript code instead. This, I believe, is what Google Suggest does also.

    In addition, the server code is written in Perl, so for storing status and configuration information I used serialized Perl data structures, and processing requirements fell dramatically. With serialized script you still have the clear-text editing and inspection capabilities without the speed and space issues. For example, instead of:
    <container>
    <title>title</title>
    <item><name>Name1</name></item>
    <item><name>Name2</name></item>
    <description>Bla bla</description>
    </container>

    You have:

    {
    title=>"title",
    item=>[ { name=>"Name1" }, { name=>"Name2" } ],
    description=>"Bla bla"
    }
    It seems like serialized script code, in either perl, python, java provides the benefits of xml without the headaches.
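
A language-neutral variant of the same idea is JSON. The Python sketch below loads the example record from a well-formed XML version and from a JSON version and checks that both give the same structure. The JSON text and the `from_xml` helper are illustrative assumptions, not the poster's code:

```python
import json
import xml.etree.ElementTree as ET

xml_text = """
<container>
  <title>title</title>
  <item><name>Name1</name></item>
  <item><name>Name2</name></item>
  <description>Bla bla</description>
</container>
""".strip()

json_text = """
{
  "title": "title",
  "item": [{"name": "Name1"}, {"name": "Name2"}],
  "description": "Bla bla"
}
"""


def from_xml(text: str) -> dict:
    """Walk the XML tree by hand to build the equivalent dictionary."""
    root = ET.fromstring(text)
    return {
        "title": root.findtext("title"),
        "item": [{"name": item.findtext("name")} for item in root.findall("item")],
        "description": root.findtext("description"),
    }


from_json = json.loads(json_text)
print(from_xml(xml_text) == from_json)
```

The XML side needs code that knows the document's shape, while the serialized form maps straight onto native lists and dictionaries. Unlike evaluating script code, a JSON parser does not execute anything it receives.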
  • by phunqe (592716) on Friday January 28 2005, @02:17AM (#11501059)
    Reminds me of a meeting I had a couple of years ago with some representatives for one of the largest market making houses in the US.
    Basically we were promoting an automated trading system and the first question I get is...

    "Does it use XML?"

    There you have it.
  • by Kopretinka (97408) on Friday January 28 2005, @03:28AM (#11501275) Homepage
    These specs (XOP and MTOM) were created because Web Services people wanted to be able to add binary attachments to XML messages (in SOAP). Initially the attachment technologies (like SOAP with Attachments [w3.org]) worked by just slapping the binary data alongside the XML message, without a clearly defined processing model for the receiver. Now with XOP, attachments are logically in the XML document, but physically transported outside without the bloat of base64 or other XML-safe encodings. It's important to notice that XOP is just an optimization of the situation where binary data is put inside an XML document.
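
On the receiving side, "logically in the XML document" means each `xop:Include` element can be swapped for the base64 text of the part it points to. A minimal Python sketch under stated assumptions: the MIME parts have already been split out into a dictionary keyed by Content-ID, the element names and the Content-ID are made up, and percent-encoding in `cid:` URLs is ignored:

```python
import base64
import xml.etree.ElementTree as ET

XOP_INCLUDE = "{http://www.w3.org/2004/08/xop/include}Include"


def inline_attachments(xml_text: str, parts: dict[str, bytes]) -> str:
    """Replace each xop:Include child with the base64 text of the part it names."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so removing children is safe.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == XOP_INCLUDE:
                content_id = child.get("href", "").removeprefix("cid:")
                parent.remove(child)
                parent.text = base64.b64encode(parts[content_id]).decode("ascii")
    return ET.tostring(root, encoding="unicode")


# Hypothetical message: element names and the Content-ID are made up.
message = (
    '<photo><caption>sunset</caption><data>'
    '<xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include" href="cid:img1@example.org"/>'
    '</data></photo>'
)
attachments = {"img1@example.org": b"\x89PNG\r\n\x1a\n fake image bytes"}
print(inline_attachments(message, attachments))
```

A real receiver would usually hand the raw bytes to the application directly instead of re-encoding them; the round trip here is only to show that the two forms carry the same information.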
  • by Drog (114101) on Friday January 28 2005, @10:04AM (#11503382) Homepage
    There's a lot of XML-bashing going on here from people talking about how XML is just a buzzword and how XML is not necessary. Sure it's a buzzword, and sure it's unnecessary in some situations. But that doesn't make it useless.

    I create data-driven web apps for a living (i.e. data-driven graphics, UI and text via SVG and HTML), and I firmly believe that XML is the way to go for such creations. It offers a hierarchical structure that is excellent for temporarily storing data pulled from a database, which can then be converted to HTML or SVG or some UI markup (XUL, XForms, or your own thing) via XSLT.

    I don't really care that XML is human-readable--I like the fact that because it is extremely well structured, it is therefore easy to create with authoring applications as well as being easy to manipulate in real time with script (i.e. manipulating its DOM).

    I have long wished for a true binary XML spec to make the transmission and parsing/decoding quicker, and this spec isn't it. But I think one day we'll have it, and that won't mean that we've "come full circle" and therefore XML is useless. It just means that we'll have the best of both worlds--speed plus standardized, hierarchical data structures.

    • by metalhed77 (250273) <[andrewvc] [at] [gmail.com]> on Thursday January 27 2005, @11:01PM (#11500275) Homepage
      except SQL isn't very useful when it comes to technologies like RSS say......

      if you're using an XML file in a place where you need a high performance SQL database then you're doing something wrong. If you're using XML as datastorage for some small webapp who cares so long as it's fast enough for that particular application.
      • if you're using an XML file in a place where you need a high performance SQL database then you're doing something wrong. If you're using XML as datastorage for some small webapp who cares so long as it's fast enough for that particular application.

        As you point out, it is the wrong tool for the job, much like using tables to layout HTML pages (as the CSS religionists like to point out).

        My 64 million dollar question is why they put an acronym inside another acronym: XOP stands for XMLOP? WTF??!!

        They REA

        • AIM = AOL IM
          GNU = GNU's Not Unix (and many other recursive acronyms)

          To name a few :)
          • While I'm sure there are some people out there who think the idea of an acronym that means itself is just the funniest thing ever, the gnu (the "g" is silent; it's pronounced "nu") is one of two species of African antelopes. They have big heads, long tails and horns. They smell fairly terrible, but they're really very beautiful ...as long as they're downwind.

            Like I say, I'm sure there are some people who think the old "it's an acronym" joke is a real knee-slapper. But it's kind of a shame that the people w
          • it is the wrong tool for the job, much like using tables to layout HTML pages (as the CSS religionists like to point out)

            CSS religionists like to deny the existence of the 90% browser, whose CSS implementation has too many bugs and deficiencies to make it a complete replacement for some forms of table based layout. Personally, I prefer using a hybrid of tables and CSS on sites that I develop.

            Be fair...(and I am a Firefox guy) NO [ right...NO] browser fully conforms to CSS standards. I am a CSS relig
            • NO [ right...NO] browser fully conforms to CSS standards

              Which kinda makes one question whether having such so-called "standards" is really worth all the trouble.

              Amazingly, HTML compatibility was easier before it was "standards" this and "standards" that. There were certain constructs that only worked in certain browsers, sure, but we didn't have the god-awful mess of supposed-tos and should-nots that we have today.

              It seems to me, from a distant perspective, that the problem with Web standards isn't that
              • by PaulBu (473180) on Friday January 28 2005, @01:38AM (#11500934) Homepage
                I've been thinking about the shortcomings of HTML (and everything else that followed it!) from the position of a computer scientist for YEARS... Those standards ARE shitty, big time.

                Contrast this to IEEE standards -- they get developed when a bunch of companies are ready to invest several mega$$ for a chip spin -- and they just want to choose the best course, arguing with each other about technical merit of this or that approach. And in the whole HT|X/ML world there can be (almost) no competition on technical merits, just a bunch of guys arguing if it should be or BAR .

                I wish I'd have the time on my hands and their budgets to actually try something revolutionary. Like the original WWW, which was NOT designed by a committee...

                Paul B.
              • by DrSkwid (118965) on Friday January 28 2005, @03:08AM (#11501198) Homepage Journal
                Amazingly, HTML compatibility was easier before it was "standards" this and "standards" that.

                Are you *sure* about that ?

                <blink >
                <marquee >
                <object >
                <bgsound >

                No-one forces you to validate your html (unless you work for me =). Where I come from, it's conformance first, compatibility second.

                So, You're Against Innovation? [pantos.org]

                A common misconception is that folks who advocate HTML validation are retro-thinking, "backwater unix geeks" who stubbornly oppose innovation. It's true that many advocates of HTML validation are indeed seasoned computer professionals, who have learned the hard way that portability and compatibility are key elements to ensuring the longevity of any software product (including Web pages).
    • Really? You can't see the benefit?

      You mention SVG, but then you fail to see the benefit of reducing the size of, say, a large SVG file in a standards compliant way so that it can be transferred and take up less bandwidth. A good binary standard will DEFINITELY be smaller than the verbosity that is XML. Sure you can compress it, but when you compress a whole bunch of unneeded crap, you still have a whole bunch of unneeded crap... just compressed. If this standard reduces the amount of space it takes to
    • It is not binary XML. It is a method to extract binary data that is embedded in XML (e.g. CDATA) and store it outside the XML, but in the same document. It is NOT a method to reduce the text encoding (overhead) of XML to a binary format.

    • I think everyone that's posted to this thread to this point has missed the point here. This XOP optimization has nothing to do with making XML more compact or anything. It has to do with delaying latency for large payload transfers and allowing the client application to decide if it wants the large binary payload.

      Seriously, you guys need to re-read the article again.

      The problem with XML binary payloads now is that you find out that you have a large chunk of payload too late in the game and can't avoid it
    • Re:base64? (Score:4, Informative)

      by vidarh (309115) <vidar@hokstad.com> on Friday January 28 2005, @05:55AM (#11501772) Homepage Journal
      Duh. Read the spec. Most people who include binary in XML DO base64 encode it. But base64 wastes a lot of space. If you want to include larger amounts of binary data, this standard lets you save space by using a MIME wrapper and referencing a MIME part containing the raw binary data from the document instead of inlining it directly.

      And you're right, most people don't want to include huge binary stuff in their XML. But sometimes you DO need to combine XML with huge amounts of binary data. So far, the alternatives have been non-standard wrappers (including people doing more or less what this standard does, by using MIME multipart documents) or base64 or some other space wasting encoding inside the XML document, or wrapping everything in an archival format (like OpenOffice does, for instance).

      All this does is define a standard way of letting you keep a document and associated raw binary data together, while allowing you to treat it as if it is inlined in the XML if you so choose.

      The principles are exactly the same as for sending an HTML e-mail containing images (or other data) as attachments and referring to them with URLs of the format "cid:foo" (they refer to the MIME element with the matching "Content-ID: foo" header).
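
To make the packaging concrete, here is a simplified Python sketch that builds the same data two ways: inlined as base64 in the XML, and as a multipart-style package with the raw bytes in their own part. It is hand-rolled for illustration and is not a conformant XOP or MIME implementation; the boundary string, element names, and Content-ID are all hypothetical:

```python
import base64

XOP_NS = "http://www.w3.org/2004/08/xop/include"
BOUNDARY = b"example-boundary"  # hypothetical; must not occur in any part


def inline_document(label: str, blob: bytes) -> bytes:
    """Plain XML: the binary content is base64 text inside the element."""
    encoded = base64.b64encode(blob).decode("ascii")
    return f"<doc><mylabel>{label}</mylabel><mydata>{encoded}</mydata></doc>".encode("utf-8")


def xop_package(label: str, blob: bytes, content_id: str = "blob1@example.org") -> bytes:
    """Simplified XOP-style package: XML root part plus one raw binary part."""
    xml = (
        f"<doc><mylabel>{label}</mylabel><mydata>"
        f'<xop:Include xmlns:xop="{XOP_NS}" href="cid:{content_id}"/>'
        f"</mydata></doc>"
    ).encode("utf-8")
    root = b"Content-Type: application/xop+xml; charset=UTF-8\r\n\r\n" + xml
    attachment = (
        b"Content-Type: application/octet-stream\r\n"
        b"Content-Transfer-Encoding: binary\r\n"
        b"Content-ID: <" + content_id.encode("ascii") + b">\r\n\r\n" + blob
    )
    delimiter = b"--" + BOUNDARY
    return b"\r\n".join([delimiter, root, delimiter, attachment, delimiter + b"--", b""])


# Hypothetical 10 KB binary payload.
blob = bytes(range(256)) * 40
inline = inline_document("example", blob)
package = xop_package("example", blob)
print(f"inline base64: {len(inline)} bytes, package: {len(package)} bytes")
```

The fixed cost of the MIME headers means the package only wins once the binary content is more than a few hundred bytes; for tiny values, inlining stays smaller.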