
W3C launches Binary XML Packaging

Spy der Mann writes "Remember the recent discussion on Binary XML? Well, there's news. The W3C has just released the spec for XML-binary Optimized Packaging (XOP). In short, it takes the binary data out of the XML and puts it in a separate part of a MIME multipart package. You can read the press release and the testimonials from MS, IBM and BEA."
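For the curious, here is a condensed sketch of what an XOP package looks like on the wire, based on the structure the spec defines (the boundary string, Content-IDs, and the photo element are illustrative placeholders, not the spec's exact example). The XML part stays plain text; the binary part travels as raw octets in its own MIME part, referenced by a cid: URI:

    MIME-Version: 1.0
    Content-Type: multipart/related; boundary=MIME_boundary;
        type="application/xop+xml"; start="<root.xml@example.org>"

    --MIME_boundary
    Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"
    Content-ID: <root.xml@example.org>

    <photos xmlns:xop="http://www.w3.org/2004/08/xop/include">
      <photo><xop:Include href="cid:photo1.png@example.org"/></photo>
    </photos>

    --MIME_boundary
    Content-Type: image/png
    Content-Transfer-Encoding: binary
    Content-ID: <photo1.png@example.org>

    (raw PNG octets go here, with no base64 encoding)
    --MIME_boundary--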
  • by metalhed77 ( 250273 ) <`andrewvc' `at' `gmail.com'> on Friday January 28, 2005 @12:01AM (#11500275) Homepage
    Except SQL isn't very useful when it comes to technologies like RSS, say.

    If you're using an XML file in a place where you need a high-performance SQL database, then you're doing something wrong. If you're using XML as data storage for some small webapp, who cares, so long as it's fast enough for that particular application.
  • by FrankHaynes ( 467244 ) on Friday January 28, 2005 @12:05AM (#11500306)
    If you're using an XML file in a place where you need a high-performance SQL database, then you're doing something wrong. If you're using XML as data storage for some small webapp, who cares, so long as it's fast enough for that particular application.


    As you point out, it is the wrong tool for the job, much like using tables to lay out HTML pages (as the CSS religionists like to point out).

    My 64 million dollar question is why they nested one acronym inside another: the X in XOP is itself the acronym XML. WTF?!

    They REALLY have too much time on their hands!

  • by bryanrieger ( 854070 ) on Friday January 28, 2005 @12:07AM (#11500318)
    This seems like it would be an ideal fit for services such as Flickr, as it would allow images (or other binary media files) to be sent along with the XML data in their native binary form.
  • by tomhudson ( 43916 ) <barbara.hudson@b ... m ['son' in gap]> on Friday January 28, 2005 @12:13AM (#11500352) Journal
    There are some real technical problems out there... why are people chasing non-problems like XML?
    Because they're hacks who are more into buzzword bingo and "selling the next big thing"?

    Whatever happened to the virtues of simplicity, like a file containing a header record detailing the field names, followed by rows containing the data in either fixed-length or delimited form? Damn fast to implement, debug, read from, and write to. Parsing? What parsing? Read the first line, split it to get your headers, and read one line per record.

    Ideal for data exchange. Easy to manipulate via JavaScript on the client. Simple to display and manipulate via the DOM (Document Object Model). Not resource-hungry. Handles both text and binary data. Dirt easy on the server.

    I ran a test to compare, and I'm able to select, format, and serve 1,000 records this way in less time than it takes to serve 100 records in simple HTML, never mind XML. By doing this, the client can page through, say, 25 records at a time without having to hit the server every few seconds to see the next/prev pages. (A parsing sketch follows below.)
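    A minimal sketch of the header-record-plus-rows format described above, in Python (the delimiter, field names, and data are made up for illustration):

    import csv, io

    raw = "id|name|balance\n1|alice|10.50\n2|bob|3.25\n"     # header line, then one line per record

    reader = csv.reader(io.StringIO(raw), delimiter="|")
    headers = next(reader)                                   # first line: the field names
    records = [dict(zip(headers, row)) for row in reader]    # one dict per data row
    print(records[0]["name"])                                # -> alice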

  • by IHateSlashDot ( 823890 ) on Friday January 28, 2005 @12:22AM (#11500399)
    I can't believe all of the replies making fun of this because they think it's a binary representation of XML. Didn't anyone read the spec that was linked in the summary?

    This is simply a way to reference binary data from within an XML document and to have that binary data included in the same payload (using MIME).

    Passing binary data in XML is a big problem. Everybody just invents their own method of doing it (although most are just variations on the theme presented here).

    There is a need for this specification, but it is not groundbreaking or even particularly /. newsworthy.

  • by a_karbon_devel_005 ( 733886 ) on Friday January 28, 2005 @12:33AM (#11500452)
    Really? You can't see the benefit?

    You mention SVG, but then you fail to see the benefit of reducing the size of, say, a large SVG file in a standards-compliant way so that it takes less bandwidth to transfer. A good binary standard will DEFINITELY be smaller than the verbosity that is XML. Sure, you can compress it, but when you compress a whole bunch of unneeded crap, you still have a whole bunch of unneeded crap... just compressed. If this standard reduces the amount of space it takes to write:
    <someLongTagWithANameLikeThis>1</someLongTagWithANameLikeThis>

    ... that will help a lot. People have been clamoring for a STANDARDIZED binary encoding for XML for a LONG time.
  • Re:More bloat! (Score:5, Insightful)

    by Anonymous Coward on Friday January 28, 2005 @01:07AM (#11500595)

    So did I. Then I looked at that example [w3.org] and my heart sank. What the hell! 12 lines of bloated crap text turned into 46+ lines of worse bloated crap!

    The examples given in the article don't include the binary data, for brevity. The problem that exists now is that binary data has to be encoded into a form compatible with the charset of the document, which usually means base64. That inflates the binary data by about a third (base64 emits 4 bytes for every 3 bytes of input), and it also costs CPU cycles to encode and decode.

    Being able to send the binary data in a separate MIME payload means it doesn't need to be encoded in this manner, which is a big help for any reasonably sized binary resource. It also means the resources become first-class MIME objects and can have associated headers, which provides additional benefits.
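    You can see the base64 overhead for yourself in a couple of lines of Python (the payload here is just zero bytes standing in for a real image):

    import base64

    data = bytes(3000)                  # 3,000 raw octets, standing in for a PNG
    encoded = base64.b64encode(data)    # what it becomes inside a text-only XML document
    print(len(data), len(encoded))      # -> 3000 4000: a third bigger, before line breaks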

  • by Anonymous Coward on Friday January 28, 2005 @01:15AM (#11500629)
    id, parentid, tag, text
    1, -, root, yes you can
    2, 1, child, it's simple
    3, 1, child, do it like you would in actual code
    4, 3, grandchild, you don't really think memory has magical trees in it do you?
    5, 4, answer, it doesn't
    6, 1, child, you can create trees in CSV

    ==

    <root>yes you can
    <child>it's simple</child>
    <child>do it like you would in actual code
    <grandchild>you don't really think memory has magical trees in it do you?
    <answer>it doesn't</answer>
    </grandchild>
    </child>
    <child>you can create trees in CSV</child>
    </root>

    Which one is *really* easier to parse? (Not necessarily read.)
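    For what it's worth, here's the flat format turned back into a tree in a few lines of Python (the tuples below are just the rows above, transcribed):

    rows = [
        (1, None, "root", "yes you can"),
        (2, 1, "child", "it's simple"),
        (3, 1, "child", "do it like you would in actual code"),
        (4, 3, "grandchild", "you don't really think memory has magical trees in it do you?"),
        (5, 4, "answer", "it doesn't"),
        (6, 1, "child", "you can create trees in CSV"),
    ]

    nodes, root = {}, None
    for id_, parent, tag, text in rows:
        node = {"tag": tag, "text": text, "children": []}
        nodes[id_] = node
        if parent is None:
            root = node
        else:
            nodes[parent]["children"].append(node)  # parents always precede children here
    print(root["children"][0]["text"])              # -> it's simple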
  • by caseih ( 160668 ) on Friday January 28, 2005 @01:34AM (#11500715)
    Yes, but this is what ASN.1 encoding is for. It's a structured, self-describing encoding scheme that works very well for structured data. What advantages does this binary XML have over ASN.1? Both require external descriptions to attach meaning to the data.

    In your case, ASN.1 is what you should be using; you shouldn't have been using XML in the first place.
  • Re:More bloat! (Score:3, Insightful)

    by Laxitive ( 10360 ) on Friday January 28, 2005 @01:50AM (#11500779) Journal
    Uhm, you still need to parse the XML structure.

    Technically, ASCII is binary, too. 'A' is 65, which is 01000001. Binary XML will not do away with parsing. The tags will still be there, the content will still be there. Only the restriction that tags must be alphanumeric strings will be lifted.

    Making things "binary" doesn't magically remove the burden of parsing. You know the binary executables you run? The system loader loads each one, parses it, arranges it in memory the way it needs to be arranged, and tells the CPU "OK, start executing the code located here". Anything with structure needs to be parsed if you want to manipulate and query that structure in any meaningful way, and XML is all about structure.

    -Laxitive
  • Full Circle (Score:2, Insightful)

    by roman_mir ( 125474 ) on Friday January 28, 2005 @02:08AM (#11500837) Homepage Journal
    The circle is complete. We started with binary formats, moved to XML for readability, and are now switching XML back to binary for speed.

    Obviously someone needs a knock on the head. When you design your application, don't you think about such things as the balance between performance and maintainability first, and then implement whatever suits your specific case better? Obviously not! Just a little while ago everyone and their grandmother switched to XML for whatever reason, and then they realized: "OMFG, XML processing is processor intensive! I probably shouldn't have handled every single internal type as an XML string that needs parsing and typecasting for every operation. I probably should have used more suitable memory structures for the data that stays within the same application on the same computer, instead of a gigantic XML just because I can! What to do, what to do? Oh, I know! Let's change this char-based XML into a binary XML, that will make it faster!" (It won't make it more human-readable, that's for sure.)

    So what's next? A char-based XML that wraps around a binary XML for readability? A binary wrapper for a char-based XML wrapper to a binary wrapper around a char-based XML wrapper, for recursive processing?

  • Re:More bloat! (Score:3, Insightful)

    by interiot ( 50685 ) on Friday January 28, 2005 @02:21AM (#11500881) Homepage
    You're talking about two different kinds of parsing. Breaking out opcodes is hugely different from counting '<' and '>' characters 30,000 times in a row just to find one little bit of information buried in the middle of the text, when you don't know where it is [slashdot.org].

    Why are databases fast? Indexes. What do all XML databases do? They store XML internally in a way that machines can read much faster, but that makes it a pain for humans to update. Indexes. So if you have all these computer programs passing around data, each with its own different deserialized structure, why don't we just standardize that side of things too? Indexes. Computers MUST have them to work efficiently, but no human updates them by hand. Ergo: one form that's easy for humans (plaintext XML) and one form that's quick for computers (indexes), with an easy way to convert back and forth. (A toy illustration follows.)
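    A toy illustration of the scan-versus-index trade-off in Python (the record format and key are made up):

    import re

    # 30,000 tiny records in one flat text blob
    text = "\n".join(f"<rec id='{i}'>value-{i}</rec>" for i in range(30000))

    def scan(key):
        # walks the whole text on every lookup: O(n) per query
        return re.search(f"<rec id='{key}'>(.*?)</rec>", text).group(1)

    # pay the parse cost once, then every lookup is a hash probe
    index = dict(re.findall(r"<rec id='(\d+)'>(.*?)</rec>", text))

    print(scan("12345"), index["12345"])    # same answer, very different cost per query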

  • by nightcrawler77 ( 644839 ) on Friday January 28, 2005 @02:57AM (#11501003)

    XML has become at least two things since its inception:

    1. an abstract structure consisting of (possibly-nested) elements and their corresponding attributes.
    2. a human-readable representation of that structure

    The interesting part of the story is that #2 came first. Since then, the W3C has recommended the Infoset [w3.org] abstract concept.

    For the developers out there, think of how often you parse the "angle brackets" yourself. Most everyone these days (yes, I know there are exceptions) uses an API which presents elements and attributes in a wire-format-agnostic way.

    As a developer, I would love to have the option to flip a switch in my code to permit Binary XML. If I can read and use the Infoset in exactly the same way, why would I object to the wire format being binary instead of text? My API is the same, but the transport is more compact and efficient.

    Human-readable wire formats are great for debugging during development, but provide no real advantage in production systems (provided there are utilities available to produce human-readable XML from the binary wire format).
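    A sketch of that "flip a switch" idea in Python. xml.etree is real; zlib here is only a stand-in for whatever codec a Binary XML recommendation would define. The point is that application code sees the same Infoset either way:

    import xml.etree.ElementTree as ET
    import zlib

    def load(blob: bytes, binary: bool) -> ET.Element:
        if binary:
            blob = zlib.decompress(blob)   # stand-in for a real binary-XML decoder
        return ET.fromstring(blob)         # identical API from here on

    doc = b"<order><total>19.99</total></order>"
    for wire, is_binary in ((doc, False), (zlib.compress(doc), True)):
        print(load(wire, is_binary).findtext("total"))   # same answer for both formats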

  • Re:More bloat! (Score:3, Insightful)

    by Unordained ( 262962 ) <unordained_slashdotNOSPAM@csmaster.org> on Friday January 28, 2005 @02:58AM (#11501004)
    Databases (typically relational) aren't just fast because of indices; they can also assume more about the structure of the data. A table (relation) is a set of tuples, each with the same attributes, and each attribute holding a single value (however complex that value might happen to be, in spite of what the OODBMS people think). When you've got that, you only need to store the meta-structure of the relation once, in the relation header. Then you can assume all sorts of stuff about what's to follow, and you can optimize the hell out of the data. (CSV is so much more efficient than XML for transmitting relational data, it's amazing. And it's not even good.)

    On the other hand, when you've got a document whose structure fluctuates wildly from item to item, and you have to look at each item to know what it might be (let alone whether or not it's what you're looking for, what you can do with it, etc.), things take a little longer. XML just doesn't have shortcuts for "and now some relational data" that could speed that up.

    Then again, when you're sending data, it seems redundant to send indices -- that's the sort of thing each receiving party should recreate as desired. But nobody said XML was about efficiency in reading/writing/searching/manipulating. It was about a standard, do-almost-anything, here's-a-prewritten-library-for-you, read-the-file-until-you-grok-it format. Efficiency's not on the list.
  • Re:More bloat! (Score:4, Insightful)

    by bit01 ( 644603 ) on Friday January 28, 2005 @03:40AM (#11501113)

    If you can't comprehend that binary is much faster to parse than XML, there's nothing I can do.

    Where is your numerical proof that binary is much faster to parse than text? It is amateurish to just assume this is true. Good parsers are damn fast and can operate in O(n) time.

    Of course binary may be faster. I doubt that it will be much faster once you compare it against a decent parser and realise that the binary format has to be platform-agnostic with respect to word size and endianness, and forward- and backward-compatible.

    For instance, gzip'ed text files can sometimes be much faster to access than uncompressed binary files, because compression reduces the amount of file IO. A naive binary format might spend 64 bits encoding the number 1 where text needs only 8 bits for the character '1'.

    While compression increases CPU usage, the disk is so much slower, and the CPU might otherwise sit idle waiting for it, that compression can lead to an overall win. The same may apply to a slow network link. Unless you measure, it is difficult to know. I've lost count of the number of binary formats I've seen whose hex dumps contained vast numbers of zero bytes and were thus highly inefficient. The people who work at a "high level", designing such file formats without checking such simple things, are poor programmers. Even with indexes, saving a single extra random disk/network access can sometimes justify a huge amount of CPU usage. (A quick size check follows.)
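    The size side of that claim is easy to check in Python (the record layout is made up; real data will vary):

    import gzip, struct

    # 10,000 small integers, as text lines vs. fixed 64-bit binary fields
    as_text = "\n".join(str(i % 10) for i in range(10000)).encode()
    as_binary = b"".join(struct.pack("<q", i % 10) for i in range(10000))

    print(len(as_text), len(as_binary))   # ~20 KB of text vs. 80 KB of binary
    print(len(gzip.compress(as_text)))    # and the text compresses to a few KB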

    ---

    Don't be a programmer-bureaucrat; someone who substitutes marketing buzzwords and software bloat for verifiable improvements.

  • by DrSkwid ( 118965 ) on Friday January 28, 2005 @04:08AM (#11501198) Journal
    Amazingly, HTML compatibility was easier before it was "standards" this and "standards" that.

    Are you *sure* about that?

    <blink>
    <marquee>
    <object>
    <bgsound>

    No one forces you to validate your HTML (unless you work for me =). Where I come from, it's conformance first, compatibility second.

    So, You're Against Innovation? [pantos.org]

    A common misconception is that folks who advocate HTML validation are retro-thinking, "backwater unix geeks" who stubbornly oppose innovation. It's true that many advocates of HTML validation are indeed seasoned computer professionals, who have learned the hard way that portability and compatibility are key elements to ensuring the longevity of any software product (including Web pages).
  • by tomhudson ( 43916 ) <barbara.hudson@b ... m ['son' in gap]> on Friday January 28, 2005 @10:08AM (#11502780) Journal
    All the problems you mention were solved decades ago.
    1. data wants to be variable length
      Not a big deal. You don't necessarily need embedded escape codes (though they work well) - you can also use overflow buckets, like databases have for, say, 30 years.
    2. Then you want to have the delimiter actually in the data, so you have to invent escape codes.
      Regexes make this easy to implement.
    3. Then in some lines you want to allow multiple occurrences of some of the parameters, so you put in some basic markup.
      Not necessarily. Master/detail records handle this problem nicely.
    4. Then you want to be sure that any data users enter is of the correct format, so you write a verifier. Then you are basically back at XML again.
      No - it's a lot easier to write a function that checks whether a postal code is formatted correctly when passed a string than it is to interface with a general-purpose verifier. Besides, you still have to verify the data independently on the server before saving it (I don't want to run an XML parser every time someone clicks submit). A sketch of such a one-off check follows.
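    The kind of tiny, single-purpose validator being described, sketched in Python (the pattern uses the Canadian "A1A 1A1" postal code shape purely as an example and ignores the finer rules):

    import re

    # illustrative only: letter-digit-letter, optional space, digit-letter-digit
    POSTAL = re.compile(r"^[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d$")

    def valid_postal_code(s: str) -> bool:
        return bool(POSTAL.match(s.strip()))

    print(valid_postal_code("H2X 1Y4"))   # True
    print(valid_postal_code("12345"))     # False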
  • by Drog ( 114101 ) on Friday January 28, 2005 @11:04AM (#11503382) Homepage
    There's a lot of XML-bashing going on here from people talking about how XML is just a buzzword and how XML is not necessary. Sure it's a buzzword, and sure it's unnecessary in some situations. But that doesn't make it useless.

    I create data-driven web apps for a living (i.e. data-driven graphics, UI and text via SVG and HTML), and I firmly believe that XML is the way to go for such creations. It offers a hierarchical structure that is excellent for temporarily storing data pulled from a database, which can then be converted to HTML or SVG or some UI markup (XUL, XForms, or your own thing) via XSLT.

    I don't really care that XML is human-readable--I like the fact that because it is extremely well structured, it is easy to create with authoring applications, as well as easy to manipulate in real time with script (i.e. via its DOM).

    I have long wished for a true binary XML spec to make transmission and parsing/decoding quicker, and this spec isn't it. But I think one day we'll have it, and that won't mean we've "come full circle" and XML is therefore useless. It will just mean we have the best of both worlds--speed plus standardized, hierarchical data structures.

  • Re:More bloat! (Score:3, Insightful)

    by pomakis ( 323200 ) <pomakis@pobox.com> on Friday January 28, 2005 @01:41PM (#11505146) Homepage
    Good parsers are damn fast and can operate in O(n) time.

    Even a horrendously slow XML parser operates in O(n) time.
