Tim Bray Says RELAX

Posted by ScuttleMonkey
from the holy-war-schema-2.7 dept.
twofish writes to tell us that Sun's Tim Bray (co-editor of XML and the XML namespace specifications) has posted a blog entry suggesting RELAX NG be used instead of the W3C XML Schema. From the blog: "W3C XML Schemas (XSD) suck. They are hard to read, hard to write, hard to understand, have interoperability problems, and are unable to describe lots of things you want to do all the time in XML. Schemas based on Relax NG, also known as ISO Standard 19757, are easy to write, easy to read, are backed by a rigorous formalism for interoperability, and can describe immensely more different XML constructs."

  • by Anonymous Coward on Monday December 04, 2006 @11:27PM (#17108948)
    When you want to come.
  • by antonyb (913324) on Monday December 04, 2006 @11:36PM (#17108994)
    My experience with XML Schema is exactly that: hard to write in the first place, hard to maintain, and regular interop problems between different implementations that make the theory of web services a practical nightmare (idrefs are the first example that springs to mind).


    On the other hand, RELAX NG "just works".

    (all IME of course...:)

    ant.

    • Re: (Score:3, Funny)

      by camperdave (969942)
      RELAXiNG works for me too.
      • by caluml (551744)
        RELAXiNG works for me too.
        Your comment is even funnier with your sig: "Wake up, Zeke! The day ain't gonna waste itself."
    • by SimHacker (180785) on Tuesday December 05, 2006 @12:53AM (#17109478) Homepage Journal

      Tim Bray is right, and he couldn't have put it better: W3C XML Schemas (XSD) suck. The reason Relax NG is so much cleaner and more powerful than committee-designed XML Schemas is that it's based on a sound mathematical foundation (tree regular expressions, or "hedge automata theory"), while XML Schemas suffer from ad-hoc design, committee-burn, lack of focus, and half-baked attempts to solve too many unrelated problems.

      Here's some interesting stuff from my blog [donhopkins.com] about the design and development of Relax NG [oasis-open.org].

      -Don

      James Clark [oasis-open.org] wrote about maximizing composability:

      First, a little digression. In general, I have made it a design principle in TREX to maximize "composability". It's a little bit hard to describe. The idea is that a language provides a number of different kinds of atomic thing, and a number of different ways to compose new things out of other things. Maximizing composability means minimizing restrictions on which ways to compose things can be applied to which kinds of thing. Maximizing composability tends to improve the ratio between functionality on the one hand and simplicity/ease of use/ease of learning on the other.

      Clark [oasis-open.org] describes the derivative algorithm's lazy approach to automaton construction:

      I don't agree that <interleave> makes automaton-based implementations impossible; it just means you have to construct automatons lazily. (In fact, you can view the "derivative"-based approach in JTREX as lazily constructing a kind of automaton where states are represented by a canonical representative of the patterns that match the remaining input.)

      The Relax NG derivative algorithm [thaiopensource.com] is implemented in a few hundred elegant declarative functional lines of Haskell [thaiopensource.com], and also in tens of thousands of lines and hundreds of classes of highly abstract complex Java code [thaiopensource.com].
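Editor's sketch: the derivative idea Clark describes can be illustrated on a toy pattern language. The Python below is neither Jing nor the Haskell implementation. It only matches flat sequences of element names (no attributes, text, or nesting), and every class and function name in it is hypothetical. Each call to `deriv` returns the pattern that must still match the remaining input, which plays the role of the lazily constructed automaton state.

```python
from dataclasses import dataclass
from functools import lru_cache

# Toy pattern constructors (hypothetical names, not Jing's classes).
@dataclass(frozen=True)
class Empty:            # matches the empty sequence
    pass

@dataclass(frozen=True)
class NotAllowed:       # matches nothing
    pass

@dataclass(frozen=True)
class Name:             # matches exactly one element name
    name: str

@dataclass(frozen=True)
class Group:            # first, then second
    first: object
    second: object

@dataclass(frozen=True)
class Choice:           # either alternative
    left: object
    right: object

@dataclass(frozen=True)
class Interleave:       # both sides, in any interleaved order
    left: object
    right: object

def choice(a, b):
    """Build a Choice, dropping branches that can never match."""
    if isinstance(a, NotAllowed):
        return b
    if isinstance(b, NotAllowed):
        return a
    return Choice(a, b)

def combine(kind, a, b):
    """Build a Group or Interleave, simplifying trivial operands."""
    if isinstance(a, NotAllowed) or isinstance(b, NotAllowed):
        return NotAllowed()
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return kind(a, b)

def nullable(p):
    """True if the pattern matches the empty sequence."""
    if isinstance(p, Empty):
        return True
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    if isinstance(p, Group):
        return nullable(p.first) and nullable(p.second)
    if isinstance(p, Interleave):
        return nullable(p.left) and nullable(p.right)
    return False  # Name and NotAllowed

@lru_cache(maxsize=None)  # memoize: each (pattern, token) state is built once, on demand
def deriv(p, tok):
    """Pattern that must match the rest of the input after consuming tok."""
    if isinstance(p, Name):
        return Empty() if p.name == tok else NotAllowed()
    if isinstance(p, Choice):
        return choice(deriv(p.left, tok), deriv(p.right, tok))
    if isinstance(p, Group):
        step = combine(Group, deriv(p.first, tok), p.second)
        if nullable(p.first):
            return choice(step, deriv(p.second, tok))
        return step
    if isinstance(p, Interleave):
        return choice(combine(Interleave, deriv(p.left, tok), p.right),
                      combine(Interleave, p.left, deriv(p.right, tok)))
    return NotAllowed()  # Empty and NotAllowed accept no token

def matches(p, tokens):
    for tok in tokens:
        p = deriv(p, tok)
    return nullable(p)

# title first, then author and date in either order
pattern = Group(Name("title"), Interleave(Name("author"), Name("date")))
print(matches(pattern, ["title", "date", "author"]))   # True
print(matches(pattern, ["author", "title", "date"]))   # False
```

No automaton for the interleave is built up front; only the states actually reached by the input are ever constructed, and the cache keeps them from being rebuilt.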

      Clark's Java implementation of Relax NG is called "jing [thaiopensource.com]", which is a Thai word meaning truthful, real, serious, no-nonsense, and ending with "ng".

      Comparing the Java and Haskell implementations of Relax NG illustrates what a wicked cool and powerful language Haskell [haskell.org] really is. The Java code must explicitly model and simulate many Haskell features like first order functions [wikipedia.org], memoization [wikipedia.org], pattern matching [wikipedia.org], partial evaluation [wikipedia.org], lazy evaluation [wikipedia.org], declarative programming [wikipedia.org], and functional programming [wikipedia.org]. That requires many abstract interfaces [wikipedia.org], concrete classes [wikipedia.org] and brittle [wikipedia.org] lines of code [wikipedia.org].

      While the Java code is quite brittle and verbose, the Haskell code is extremely flexible and concise. Haskell is an excellent design language, a vehicle for exploring complex problem spaces, designing and testing ingenious solutions, performing practical experiments, weighin

      • Re: (Score:2, Funny)

        by heinousjay (683506)
        Thanks for the Java flame. I was worried that there wouldn't be any offtopic ranting in this story, but you eased my worries just a few comments into it.
      • by drew (2081)
        That's an awful lot of cutting and pasting just to take a worthless jab at the Java language. While I haven't even looked at the code, and I don't really know all that much about either language, I can guess just from your description that the reason that the Java version is so complex is that the Haskell version was written first, and then somebody tried to write the Java version using exactly the same logic as the Haskell version, and therefore ended up reimplementing half of Haskell in the process. As
        • Re: (Score:3, Informative)

          by John Whitley (6067)

          That's an awful lot of cutting and pasting just to take a worthless jab at the Java language.

          For many problem domains, it often doesn't matter what language you throw up against Haskell -- the Haskell program will often be smaller by one or more orders of magnitude (for a sufficiently rich/interesting program, anyways). The grandparent poster didn't even craft the example in question; Java was just the victim-elect of this particular case. I'll observe that even if the Java program there could be made sho

      • Re: (Score:2, Insightful)

        by Erixxxxx (920617)
        From the Haskell implementation:

        "This document does not describe any algorithms for transforming a RELAX NG schema into simplified form, nor for determining whether a RELAX NG schema is correct."

        From the Jing implementation:

        "This version of Jing implements:

        * RELAX NG 1.0 Specification,
        * RELAX NG Compact Syntax, and
        * parts of RELAX NG DTD Compatibility, specifically checking of ID/IDREF/IDREFS."

        also from the Jing implement
      • by Thuktun (221615)

        Comparing the Java and Haskell implementations of Relax NG illustrates what a wicked cool and powerful language Haskell really is. The Java code must explicitly model and simulate many Haskell features [...]

        To be fair, this may be due to someone more familiar with Haskell trying to port their implementation to Java, rather than a native Java implementation being required to simulate those features. I don't know the history of those two implementations, but the fact that the Java implementation tries to simu

  • I have to agree. (Score:4, Insightful)

    by JanusFury (452699) <kevin.gadd@gmail . c om> on Monday December 04, 2006 @11:37PM (#17109006) Homepage Journal
    Has anyone here ever tried to read an XML schema for anything relatively complex? It's a nightmare. RELAX looks much cleaner and more direct, which I wholeheartedly approve of.
    • Re:I have to agree. (Score:5, Interesting)

      by sien (35268) on Monday December 04, 2006 @11:55PM (#17109112) Homepage
      Yes. I've done it using Relax NG [relaxng.org] and it was easy, simple and readable.

      It also works really, really well with the nXML [thaiopensource.com] mode for emacs.

      Finally, XML schemas that are not verbose, ugly and unreadable. And if you do need one of the older schema languages there are translators from RelaxNG available.

      • Re:I have to agree. (Score:5, Interesting)

        by radtea (464814) on Tuesday December 05, 2006 @12:47AM (#17109446)

        I was at SGML '96 where XML was first announced, and was one of those people who went home and wrote a (non-validating) XML parser over the weekend, based on the draft spec. I've used both DTDs and XML Schemas and can say without question that schemas are actually a bigger pain to work with than DTDs. DTDs were bad enough, but schemas have been a major step backwards, adding complexity without adding the features one actually needs.

        Some years ago I wrote a code generator that used DTDs as the data modelling language. I sold it to the company I was working for at the time and someone I had no control over rewrote it to use schemas because they were "simpler". The result had major bugs and dropped features, not entirely due to schema-related problems, although it is worth noting that the "simplifications" included handling schemas in completely incorrect ways, because if you handled them correctly they could not do the job. I created a new generator from scratch last year and tried to do things "properly" with schemas. It was essentially impossible, and I wound up creating a custom XML-based language to use as input.

        At the time there was no Relax NG standards process, so I stayed clear of it. But it has the blessing of James Clark too (author of the SP SGML parser and the expat XML parser). So it is probably worth another very hard look.
        • by LizardKing (5245)

          I was at SGML '96 where XML was first announced

          Was that at some hotel in Swindon, UK? If so then I was there as well, if not then it must have been very shortly after the announcement, as XML (along with XSL) dominated the meeting. When XSL was described by a heavily bearded academic guy, several of the audience members became apoplectic. Apparently they thought DSSSL was a better alternative, something that amused me as all the DSSSL tools I was aware of were either incomplete or as fiddly as fuck to wo

          • Was that at some hotel in Swindon, UK?

            No, it was in Boston. The original XML spec was a 20 page booklet. One tidbit: This was at the height of the browser wars and MS was all gung-ho for XML while Netscape wanted nothing to do with it. Many of the SGML gurus were quietly rooting for MS for just that reason.
  • by jhd (7165)
    "W3C XML Schemas (XSD) suck"

    Hey Tim, don't hold back, tell us what you really think.
  • I agree! (Score:3, Funny)

    by Maddog787 (1021593) on Tuesday December 05, 2006 @12:06AM (#17109174)
    I refuse to use XML in any shape, way or form no matter what anyone says or does with it!!!
  • I've been picking up Emacs lately, and the xml-mode standardly used (nxml-mode) uses RELAX over XML Schema. I suspect that probably says a lot for RELAX's parseability. I've had just a little bit of experience playing around with Schemas and they seem about as navigable as DTDs, which is to say not very. I haven't tried RELAX though.
  • by SimHacker (180785) on Tuesday December 05, 2006 @12:38AM (#17109388) Homepage Journal

    Relax NG is a great example of the triumph of Design-by-Inspired-Individuals vs. Design-by-Committee.

    In The State of XML [xml.com], Edd Dumbill explains the secret behind the success of Relax NG:

    Incidentally the RELAX NG success can equally well be framed as a case of design-by-inspired-individuals vs. design-by-committee as much as it can be seen as a OASIS vs. W3C thing.

    -Don

  • by iamacat (583406) on Tuesday December 05, 2006 @12:48AM (#17109452)
    With a notation similar to RELAX NG compact syntax. XML has been a killer of readable formats like windows-style ini files. It tries to be readable by both human and machine and succeeds at neither. It's like programming in assembler, because it can be read by a human better than machine code and compiled faster than C.
    • by killjoe (766577) on Tuesday December 05, 2006 @01:39AM (#17109688)
      I believe you are looking for lisp. It's XML cleaned up, simplified and hulkified.
      • by iamacat (583406)
        LISP -> XML alternative == Postscript -> PDF. You don't always want to execute your data, especially with today's abundance of malware.
        • Nonsense. Just stick to READ and don't call EVAL on your data. Or write your own toy EVAL that is restricted to certain known operations.

          You might want to set up your own READ macros as well, to ensure that nobody uses #. maliciously.
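Editor's sketch of the "READ without EVAL" point, written in Python rather than Lisp: a tiny reader that turns S-expression text into nested lists and never executes anything. The names `read_sexpr` and `Symbol` are hypothetical, and this is a toy (no escapes inside strings, no quote syntax). Rejecting every token that starts with `#` is a blunt stand-in for disabling reader macros such as `#.`.

```python
import re

# One token is a parenthesis, a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

class Symbol(str):
    """A bare atom such as foo, kept distinct from quoted strings."""

def read_sexpr(text):
    """Parse one S-expression into lists, Symbols and strings. Nothing is evaluated."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(parse())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        if tok.startswith("#"):
            raise ValueError("reader macros are not supported: " + tok)
        if tok.startswith('"'):
            return tok[1:-1]
        return Symbol(tok)

    result = parse()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return result

print(read_sexpr('(foo (bar baz) "some text")'))
```

The result is plain data; whether any of it is ever treated as an operation is a separate decision made by whatever consumes the lists.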
      • XML is not a programming language. Lisp is not a markup language. I believe the comparison you were looking for was to s-expressions, which are a lot lighter than XML but don't do nearly as much. That, and nobody outside Lisp/Schemers use them. Hell, the nascent JSON spec already has more traction.
        • XML is not a programming language. Lisp is not a markup language. I believe the comparison you were looking for was to s-expressions, which are a lot lighter than XML but don't do nearly as much.

          Bare S-expressions don't define enough semantics to do what XML does; Lisp goes too far for what XML is used for in being a full programming language (though, given all the XML-related technologies that are widely used to add more and more programming-like features to XML, it may not be "too far"); somewhere between

      • I believe you are looking for lisp. It's XML cleaned up, simplified and hulkified.

        Not Lisp, but S-expressions, which are the basis of Lisp syntax; Lisp is an "application" of S-expressions, the same as XML applications are applications of XML. S-expressions extended with something similar to XML's encoding declarations could substitute for XML and would be arguably cleaner (certainly cleaner to Lispers), though I'm not so sure that:

        (foo
        (bar baz (spam: "eggs"))

        is really more readable (rath

  • Speaking of XML, how much smaller would XML files be if they made one minor simple change...

    Add to mean "close the matching element".

    *sigh* I wish I'd been on the committee when they specified the standard.

    • Re: (Score:3, Interesting)

      Damn! I mean, add </>...

      (Argh, the "wait between comments" thing is infuriating...)

      • by nuzak (959558) on Tuesday December 05, 2006 @01:21AM (#17109598) Journal
        That feature is in SGML. In fact it can be even shorter than that, you can express an entire tag and its content with <tag/content goes here/ (even the ending > is optional). SGML even lets you change the angle brackets to anything else you want. You can make any SGML doc look like nothing you or anyone else has ever seen ... all part of the feature set.

        SGML is full of fun little hacks like that, and it was a pain in the ass to read. Omitting the tag name from the end tag makes it impossible to know you have an improperly closed tag til the end of the document, and then you have no idea which tag wasn't closed. And no, that wasn't a theoretical problem either, this became a real problem with giant SGML docs that used all the shortcuts.

        If you really hate XML's verbosity so much, realize that it was designed for easy reading, not easy writing. I whipped up my own xml mode in emacs and made '</' trigger an "electric-slash" behavior that closes the tag automatically. Not rocket science.
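Editor's sketch of the "close whatever is open" idea as a preprocessing step: expand each `</>` into the named end tag using a stack of open elements. This is a toy written for illustration, with a hypothetical function name. It assumes no `<` or `>` inside attribute values and does not understand comments, CDATA sections, or processing instructions.

```python
import re

# Matches, in order: the short end tag, a named end tag, a start or empty-element tag.
TAG = re.compile(r"</>|</([\w.:-]+)\s*>|<([\w.:-]+)[^<>]*>")

def expand_short_end_tags(text):
    """Replace every </> with the end tag of the innermost open element."""
    open_elements = []

    def replace(match):
        whole = match.group(0)
        if whole == "</>":
            if not open_elements:
                raise ValueError("</> with no open element")
            return "</" + open_elements.pop() + ">"
        if whole.startswith("</"):
            name = match.group(1)
            if not open_elements or open_elements[-1] != name:
                raise ValueError("unexpected end tag: " + whole)
            open_elements.pop()
            return whole
        if not whole.endswith("/>"):      # empty-element tags open nothing
            open_elements.append(match.group(2))
        return whole

    expanded = TAG.sub(replace, text)
    if open_elements:
        raise ValueError("unclosed element(s): " + ", ".join(open_elements))
    return expanded

print(expand_short_end_tags("<a><b>x</><c/></>"))   # <a><b>x</b><c/></a>
```

The error handling shows the readability cost described above: a named end tag that does not match is caught where it occurs, while a missing `</>` can only be reported once the input runs out.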

        • by tbray (95102)
          Heh, my own tagged-text mode uses just '/' to mean "close whatever needs closing". Works great. (control-/ if you want a real /).
    • by horster (516139)
      totally agree...
    • by Electrum (94638)
      Speaking of XML, how much smaller would XML files be if they made one minor simple change...

      Add to mean "close the matching element".


      You mean like Lisp [defmacro.org] S-expressions [wikipedia.org]?

      <copy>
      <todir>../new/dir</todir>
      <fileset>
      <dir>src_dir</dir>
      </fileset>
      </copy>

      (copy
      (todir "../new/dir")
      (fileset (dir "src_dir")))
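Editor's sketch: the mapping between the two notations above is mechanical for this simple case. The Python below uses the standard library's `xml.etree.ElementTree`. The function name `to_sexpr` is hypothetical, and the sketch ignores attributes and any text that follows a child element, so it is not a general XML-to-S-expression converter.

```python
import xml.etree.ElementTree as ET

def to_sexpr(element):
    """Render an element as (tag "text" child...), ignoring attributes and tail text."""
    parts = [element.tag]
    text = (element.text or "").strip()
    if text:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        parts.append('"' + escaped + '"')
    for child in element:
        parts.append(to_sexpr(child))
    return "(" + " ".join(parts) + ")"

xml_source = """<copy>
<todir>../new/dir</todir>
<fileset>
<dir>src_dir</dir>
</fileset>
</copy>"""

print(to_sexpr(ET.fromstring(xml_source)))
# (copy (todir "../new/dir") (fileset (dir "src_dir")))
```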

    • by Nasarius (593729)
      If file size is a concern, XML compresses easily. The OpenOffice file formats are zipped XML.
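Editor's sketch: one rough way to see how much repetitive markup shrinks is to write it into an in-memory zip archive with Python's standard `zipfile` module. The sample document, the member name `content.xml`, and the function name are made up for illustration; real documents will compress less dramatically than this artificial one.

```python
import io
import zipfile

def zipped_size(xml_bytes, member_name="content.xml"):
    """Size in bytes of a zip archive holding xml_bytes as its only member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, xml_bytes)
    return len(buffer.getvalue())

# Artificial, highly repetitive document: the same row a thousand times.
row = "<row><name>example</name><value>42</value></row>\n"
document = ("<table>\n" + row * 1000 + "</table>\n").encode("utf-8")

print(len(document), zipped_size(document))
```

This measures storage only; the archive still has to be inflated and parsed as ordinary XML when it is read.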
      • XML is slow to parse. Adding zip into the mix does nothing to help this.

        XML is basically a bloated way of expressing S-expressions. More compact (i.e. easy to parse, and small to store) versions already exist. What is really needed is a storage format that allows branches to be parsed in parallel. XML is inherently sequential; I have to parse an entire branch to know where the next one starts. It would be nice if I could scan ahead quickly to the next branch at the same depth and parse this at t

  • XML nightmare (Score:4, Insightful)

    by rgaginol (950787) on Tuesday December 05, 2006 @01:25AM (#17109614)
    If XML Schema was a work colleague they would be Wally from Dilbert - it's not that things are impossible to do with it, it's just that the relatively simple things become hard and the complex almost impossible. Due to the fact that almost anything is possible with XML schema with enough work (weeks, months, years...) instead of just scrapping it, people keep at it doggedly despite the number of times we get bitten. I'd love to see the community move more completely to RELAX NG if it makes my life easier.
  • by SimHacker (180785) on Tuesday December 05, 2006 @01:33AM (#17109660) Homepage Journal

    From the xml-dev [xml.org] mailing list:

    From: Rick Jelliffe
    To: xml-dev@lists.xml.org
    Date: Wed, 29 Nov 2006 12:46:06 +1100

    Robert Koberg wrote:

    I wonder if the people who think RNG won have "Re-elect Gore" bumper stickers...

    Maybe a better analogy would be that the people who say that XSD is lovely is Mr Bush's "Mission Accomplished!"

    Though of course there are differences between Iraq and XSD. One seems to be about people with their own fiefdom agendas stubbornly miring us in a quagmire, using a grabbag of thin reasons to justify it, denying any evidence that things are not rosy, perpetually promising that things are turning around, and enmeshing all sorts of decent people in a life of horror, difficulty and with no confidence in accomplishing the mission. The other is in the Middle East.

    Just joking...
    Rick

  • Mono has complete support for RelaxNG in the form of the Commons.Xml.Relaxng assembly.

    In addition to RelaxNG, it provides NVDL and RNC support.

    • Re: (Score:3, Funny)

      by SeaFox (739806)
      Mono has complete support for RelaxNG in the form of the Commons.Xml.Relaxng assembly.

      So should the lesson here be to "RELAX if you have MONO"?
  • Since you are simplifying your life by making the schema for web requests simpler, why not go all the way, ditch SOAP, and embrace REST [xfront.com] for XML-over-HTTP communications?
  • by Qbertino (265505) on Tuesday December 05, 2006 @03:57AM (#17110450)
    I call this the Line of View (as in PoV) or 'Horizon' Problem. The general problem is this: In XML we've got a standard that is universal for displaying n-dimensional structures in a basically 1-dimensional environment. (For the time being, we're ignoring that XML text usually goes from left to right and top to bottom, making that something 2D to look at.)
    The question now is: where do you draw the line of view? Along which line do I take my knife to cut open my n-dimensional structure to unravel it and flatten it out into a 1-dimensional string of characters? This is a problem that is impossible to solve satisfactorily for all possible PoVs or - as I say - Lines of View, or better yet, Horizons to the structure. Will I unravel my DB of books by authors? By issues? By vendors? By publishers or by weight and size? ... At some point you will have to look at which way you want to handle your stuff and which way you're going to unravel it. This will undoubtedly influence how much XML clutter you will have to construct. With XML it's the same as with databases: It/they will always be pathetic crutches for us to latch on to the real work. Indispensable, but crutches nonetheless.

    What I'm getting to is this: mapping n-dimensional stuff to 1-dimensional structures will always suck one way or the other. It's just that with XML we all start agreeing upon in which way it's supposed to suck. I don't think that changing the Schema standard (or worse: introducing additional standards) will actually attack this hard problem. I have a strong suspicion that Relax NG's relief is illusory, short term and re-introduces downsides that XML Schema already has tackled with its pesky and strict nature. For one it would be consistency with the View-Horizon once chosen all the way through the given data-structure. I don't know for sure - go test and find out - but I do know that universal serialization will always come with downsides and RelaxNG (or any other schema) won't change that.
    • by julesh (229690)
      I think your problem is that you're using XML to perform the job of a relational database.

      Not all tasks can be solved with the same tools.
  • I don't see why XML schemas have to exist. BNF notation serves the exact same purpose: it describes a grammar. A BNF-like derivative is more than enough to define XML schemas. The compact syntax of RELAX NG is just that, and a bright idea.

    It is really annoying when CS has to be discovered all over again. The problem of validating text against a certain format was solved many decades ago, and BNF and its variations have been known since the 60s...
     
    • by portnoy (16520)
      It's because they don't just describe a grammar. They also define a conceptual arrangement for the data, and can be used to express what the common types of the document are. If all they cared about was syntactic grammar, BNF forms would be absolutely fine -- and indeed you can find references to arguments about whether to use DTDs or BNFs [infoloom.com] for some internet XML structures where syntax was the main concern.

      But BNFs by their nature don't have a formal means to differentiate between syntactic rules that defi
  • (damn short subject lines!)

    I agree that RelaxNG is much easier to read, and it will much more completely describe a grammar than will the other standard - and MUCH more completely define it than will a DTD.

    Unfortunately, as far as I can tell there is no way to, within an XML document, state "Use THIS RelaxNG schema file to validate this document", as you can with a DTD. Thus, even if I have placed my RelaxNG schema on my web server, I cannot set things up such that (for example) libXML2 can automatically fe
  • Schema definition by its nature is tedious but necessary at this point. If you're going to take a standard that's already entrenched and suggest everyone stop and polish the edges from it, how about we kill the verbosity of the xml end-tag instead?

    Do we lose anything other than bandwidth use by doing this,

    <tagNameThatCanBeLong>Some Text</>

    instead of this:

    <tagNameThatCanBeLong>Some Text</tagNameThatCanBeLong>

    If the next end tag must belong to the last start tag what's the point of nami
    • Let me also suggest that we replace the <tag>data</tag> construct with tag=data[newline]

      I've been writing configuration files like that for years, and it works great. The only time I really want tags anyhow is when nesting stuff... and if you live in corporate IT land, you'll realize that doesn't actually happen because people who specify XML configuration files don't usually understand that a container is something that could hold things other than whisky and so forth.
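Editor's sketch of the flat `tag=data[newline]` format suggested above, as a small Python parser. The function name, the sample keys, and the treatment of `#` lines as comments are assumptions added for illustration; the comment above specifies nothing beyond one `tag=data` pair per line.

```python
def parse_config(text):
    """Parse tag=data lines into a dict. Blank lines and # comments are skipped."""
    settings = {}
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"line {number}: expected tag=data")
        settings[key.strip()] = value.strip()
    return settings

sample = """
# hypothetical settings
host=example.org
port=8080
greeting=hello = world
"""
print(parse_config(sample))
```

Only the first `=` on a line separates tag from data, so values may contain `=` themselves. Nesting has no representation here, which is the trade-off the comment acknowledges.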
