Jake Lazaroff writes "According to the W3 News Archive, the charter for the XHTML2 Working Group — set to expire on December 31st, 2009 — will not be renewed. What does this mean? XHTML2 will never be a W3C recommendation, so get on the HTML 5 bandwagon now. According to the XHTML FAQ, however, the W3C does 'plan for the XML serialization of HTML to remain compatible with XML.' Looks like with HTML 5, we'll get the best of both worlds."
I know a lot of web developers who dont know the difference between XHTML and HTML, and I hear XHTML as a buzzword all the time. The less confusion the better in my opinion. The HMTL5 spec is quite readable,but if you've not taken a stab at working with HTML5 (it runs all browsers) yet this article should be pretty useful: http://www.phpguru.org/static/html5 [phpguru.org]
The main key is, that, while HTML5 is based on the superior SGML (because of more freedom), XHTML had started to enforce strictness and cleanness. This meant the browser did not have to support a ton of typos, just because the editor was a freakin' lazy ass. Imagine a compiler that would eat any typo. Missing brackets, braces, semicolons, object-function separators, completely meaningless semantic messes. HTML4 browsers eat it all.
It is horrible, and actively supports the dumbing down of people. (Those who want to write websites.) Face it: If they have to, they will learn it. Nobody is too stupid for that. Some just repeat so often that they are stupid, that they actually become stupid. But this can be reversed in exactly the same way. (Ask any psychotherapist about self-fulfilling prophecies.)
Another great feature of XHTML, was its modularity and cross-language features. You could integrate XHTML, SVG, MathML, etc, into one document. Imagine a P tag inside a SVG circe, containing a math formula, and you begin to understand the sheer power of that concept.
Now if they implement HTML5 right, and we get the same cleanness that XHTML 1.1 had (Strict only. No transitional shit.), and they add cross-language abilities too (trough SGML), then I'm all for it! But if not, this could be a huge step backwards, into the web development mess of IE6 times!
Imagine a compiler that would eat any typo. Missing brackets, braces, semicolons, object-function separators, completely meaningless semantic messes. HTML4 browsers eat it all.
So, what you're saying is that the computer works for people instead of the other way around?
So, what you're saying is that the computer works for people instead of the other way around?
No, what it means is that the computer tries to guess what some dyslexic jackass who insists on writing code despite being functionally illiterate and proud of it meant. Since we have no sentient computers yet, and since the markup diarrhea these people produce would be challenging even for a human to decrypt, the task is hopeless, and the websites that result will break in fascinating ways with each new browser version, or whenever whoever visits them has a different screen resolution than the "designer", or the stars are not just right. And whenever that happens, the website gets replaced by a new, equally broken version, and the designer gets paid for delivering said abomination.
And of course whenever the browser fails to extract meaning from the chaos that would horrify even Cthulhu, it's the user who gets blamed: he didn't use the right version of the right browser, running at the right resolution, with the right versions of the right plugins installed. That, or he has Linux installed on another and unrelated machine.
It is horrible, and actively supports the dumbing down of people.
This is where I take issue with your argument. I completely agree that having the page break catastrophically when there was an error would be easier and better for people who design websites professionally (like me), especially in this day and age.
However, I don't believe that it supports the dumbing down of people, I believe it support less knowledgeable users. To use the compiler as an example, when my sister-in-law learned programming, she learned Java; to get to the point where she could do basic things like "hello world," she had to instantiate objects and call functions. My wife learned with php, and she had to type one line, a command and a string. The barrier for entry was much lower and she was rewarded much faster, thereby gaining a greater desire to learn more. Br.
At the time, browsers taking incorrect HTML was the same philosophy: you lower the barrier of entry. When someone writes a lot of web pages, they tend to become more knowledgeable, not less. There are exceptions that make everyone serious about the craft cringe and want to beat their heads against a brick wall, but for the most part skills are progressing. I don't know whether the web landscape would be better or worse right now if they'd required strict HTML for every web site, but I can tell you that a lot of people who were enthusiastic supporters and creators of web pages in the early days wouldn't have gone down that route had the barrier for entry been higher.
HTML 5 is based on the DOM. The HTML4-compatible syntax is defined from scratch, it isn't based on SGML because no web browser actually parses SGML correctly. Most of them don't do HTML4.01 fully for that matter (IE doesn't do simple things like <q>, Moz doesn't support all the weird table-column align stuff...).
It comes in an SGML-inspired format, that is not strictly SGML but matches real word HTML almost exactly. The big difference from HTML4 besides the new tags is that it does not use a DTD, nor does it support the shortag features of SGML, with the exception of the short attribute feature. Thus "<title/</<body/".
(Yes, that has three open brackets, zero close brackets, and 3 slashes) is not valid HTML5, despite being valid HTML4. (At least once you add the DTD).
Now if they implement HTML5 right, and we get the same cleanness that XHTML 1.1 had (Strict only. No transitional shit.), and they add cross-language abilities too (trough SGML), then I'm all for it!
There is an XML mode for HTML5, see HTML vs. XHTML [whatwg.org]. HTML5 even uses the same xmlns="http://www.w3.org/1999/xhtml/" namespace.
HTML5 tries to defines exactly how a browser should handle the billions of unclean documents out there. This will help browser interoperability in the real worldwide web of garbled HTML, and has huge benefits for script parsing HTML because the DOM contents after reading in HTML should be somewhat similar in different browsers.
The main key is, that, while HTML5 is based on the superior SGML (because of more freedom), XHTML had started to enforce strictness and cleanness. This meant the browser did not have to support a ton of typos, just because the editor was a freakin' lazy ass. Imagine a compiler that would eat any typo. Missing brackets, braces, semicolons, object-function separators, completely meaningless semantic messes. HTML4 browsers eat it all.
Totally wrong. One of the most important rules in software is: "be liberal in
They should have never created XHTML. They should have XMLized HTML in the first place. But, XHTML has corrected many wrong thing that HTML developpers used to do. Now, HTML5 should simply pick up the best of both world while still being XML compliant.
When fed invalid XHTML, the browser chokes, which would have gone a long way to eliminating much of the crap code, and crap "web developers" out there. I don't see why it's the browsers business to be THAT lenient, and second guess the developer all the time.
When fed invalid XHTML, the browser chokes, which would have gone a long way to eliminating much of the crap code, and crap "web developers" out there. I don't see why it's the browsers business to be THAT lenient, and second guess the developer all the time.
The problem is, a lot of web pages today are not a single coherent document, they're a bunch of little code fragments concatenated together (template, content, advertising, etc.). When coders get sloppy, this can result in invalid markup. When browsers choke, the content producer may have no idea how to fix the problem - it may not even be their problem.
What HTML5 tries to do is clearly define exactly how broken markup is supposed to be handled, so all browsers can try to "second guess the developer" in exactly the same way.
Kudos to Firefox for reigniting the browser war. In Browser War 2.0, all the major players are striving toward standards compliance, trying to bring their behavior in line with a single unified goal instead of adding their own proprietary features to HTML itself. Five years from now, when IE6 and IE7 are as distant a memory as IE4 and IE5 are now, web development is going to be a lot easier.
Yes, we can dream. How lazy do you have to be not to close your tags, and nest them properly? It's a low barrier, but given people's infinite laziness, they will write their code until it is just not-too-crappy to render in IE. Then call it a day.
A strict standard would also give MS less wiggle room to subvert the standard in their IE implementation of it.
A key feature of html5 is that is specifies the algorithms to use when normalizing poorly formed markup. It doesn't eliminate ambiguous cases, but it gets rid of many of them, meaning that the presentation and DOM should almost always be the same, regardless of the browser.
Getting a web page clean is a hard problem... when you accept user input in something approaching HTML format, like/. does
No it is not. Have php run the user input through tidy. Even if it doesn't display as the user wanted in their browser, at least it displays consistently between browsers which is more important imho. Just go. Install it now in php [php.net]. Seriously, if you are not checking html code coming in from users, something is not right. They could destroy your page with some of those unclosed tags.
Getting a web page clean is a hard problem... when you accept user input in something approaching HTML format, like/. does.
No it isn't. You should not ever, ever, be inserting user-provided HTML directly in to a document. If you do, then well done, you've just created a cross-site scripting vulnerability. And you've let pranksters submit <!-- and hide half of your page.
The correct way of handling user-provided HTML is to parse it with an HTML parser, construct a DOM tree, navigate this stripping out any tags that aren't on your whitelisted set, and then use the result. Most of the time, you want to run it through a very relaxed HTML parser because hand-typed HTML in a web form is likely to be full of errors. When you dump the DOM tree as HTML, it can be XHTML 1, HTML 3.2, or any other dialect you want.
Agreed. XHTML was rather pointless. It didn't add any particularlly interesting features, made pages more difficult to author, and its claim that it made life easier for browser authors was belied by poor support and slow rendering. Making things more "XMLish" with closed tags and quoted attributes was a good idea, but in reality writing XML-conformant CSS/Javascript XML was a pain in the butt and usually not done.
I suppose XHTML might have been useful as part of a document management/transformation system,
Agreed. XHTML was rather pointless. It didn't add any particularlly interesting features, made pages more difficult to author, and its claim that it made life easier for browser authors was belied by poor support and slow rendering. Making things more "XMLish" with closed tags and quoted attributes was a good idea, but in reality writing XML-conformant CSS/Javascript XML was a pain in the butt and usually not done.
I suppose XHTML might have been useful as part of a document management/transformation syst
> The ability to parse a web document using native XML methods is pointless?
In the general sense, yes. Web documents are nearly always served to web browsers, and every single web browser does a faster & better job of parsing HTML over XHTML.
As I mentioned, there certainly are cases when XML can be useful, but the usual situation of serving content to end users isn't one of them.
Lazy? Writing messy HTML takes more effort than writing clean XHTML. If you use a decent editor -- one that can take advantage of the structure and parseability of XML to provide validation, auto-completion, etc. on the fly -- then XHTML practically writes itself.
But, XHTML has corrected many wrong thing that HTML developpers used to do.
No it hasn't! Writing valid code (be it HTML 4.01, XHTML, or HTML 5) and checking it with a validator [w3.org] is what has corrected many wrong things that HTML developers used to do. Valid HTML 4.01 is still just as legitimate as XHTML ever was.
More importantly, when are they going to finish the CSS3 spec?
I love that HTML5 is getting pushed to the forefront and browsers are advancing more than ever, but as a web designer that CSS3 spec needs to get done and pushed on the browser developers because it will be another 2 - 5 years before mass adoption and I'm pretty tired of CSS2.1's limitations.
There is no "CSS3 spec". There is a whole bunch of separate specs all advancing along the REC track separately. They're at various stages of readiness.
For example, CSS Namespaces is in CR ("spec work done, implement it please"). It'll become a REC once there is a test suite and two interoperable implementations and the various paperwork involved in becoming a REC is done.
Selectors Level 3, CSS Color Level 3, CSS Multi-column layout are all in Last Call, with the next step being either CR or PR (PR is "this is done implemented and all; just needs sign-off from the W3C staff"). Same for Media Queries, CSS Basic User Interface, CSS Marquee Level 3, CSS Print Profile, etc.
Was there a particular part of "CSS3" you were interested in seeing specced and implemented?
That's the whole problem. All the experts are working for the browser vendors. The W3C never had any business overriding them. Css3 will never happen (standardized & widely implemented). But of course the relevant bits have long been implemented and now those await standardization. It would be nice if w3c bureaucracy could catch up here.
Basically what's wrong here is that after a agile start in the nineties, w3c turned into yet another standards organ. Essentially, for most of the past ten years they've
XHTML 1 was the XML-ization of the existing HTML 4 stuff.
XHTML 2 was a new HTML version that sought to remove lots of HTML cruft (including non-XML syntax) and add new capabilities. Basically, it was working toward a new HTML version. This effort has died, because browser makers are not behind it - they are all behind HTML 5.
HTML 5 has always had an XML profile called XHTML 5, and that won't go away.
I believe because they wanted to keep it short and simple, and hixie doesn't believe in versioning HTML - having a version-less doctype forces people to keep it backwards-compatible when html6 rolls around. Perhaps someone else who followed the process better can chime in here.
At least with XML you have a simple, well-defined way to convert the XML text to a tree. With HTML 5, there's only "well-defined error handling". Read the
sort-of specification [whatwg.org] for this.
Here's what's supposed to happen for just one of the hard cases. (There are dozens of other cases, some at least as ugly.) Parsing is in "body" mode (normal content) and an end tag whose tag name is one of: "a", "b", "big", "code", "em", "font", "i", "nobr", "s", "small", "strike", "strong", "tt", "u" has been encountered.
Follow these steps:
Let the formatting element be the last element in the list of active formatting elements that:
is between the end of the list and the last scope marker in the list, if any, or the start of the list otherwise, and
has the same tag name as the token.
If there is no such node, or, if that node is also in the stack of open elements but the element is not in scope, then this is a parse error; ignore the token, and abort these steps.
Otherwise, if there is such a node, but that node is not in the stack of open elements, then this is a parse error; remove the element from the list, and abort these steps.
Otherwise, there is a formatting element and that element is in the stack and is in scope. If the element is not the current node, this is a parse error. In any case, proceed with the algorithm as written in the following steps.
Let the furthest block be the topmost node in the stack of open elements that is lower in the stack than the formatting element, and is not an element in the phrasing or formatting categories. There might not be one.
If there is no furthest block, then the UA must skip the subsequent steps and instead just pop all the nodes from the bottom of the stack of open elements, from the current node up to and including the formatting element, and remove the formatting element from the list of active formatting elements.
Let the common ancestor be the element immediately above the formatting element in the stack of open elements.
Let a bookmark note the position of the formatting element in the list of active formatting elements relative to the elements on either side of it in the list.
Let node and last node be the furthest block. Follow these steps:
Let node be the element immediately above node in the stack of open elements.
If node is not in the list of active formatting elements, then remove node from the stack of open elements and then go back to step 1.
Otherwise, if node is the formatting element, then go to the next step in the overall algorithm.
Otherwise, if last node is the furthest block, then move the aforementioned bookmark to be immediately after the node in the list of active formatting elements.
Create an element for the token for which the element node was created, replace the entry for node in the list of active formatting elements with an entry for the new element, replace the entry for node in the stack of open elements with an entry for the new element, and let node be the new element.
Insert last node into node, first removing it from its previous parent node if any.
Let last node be node.
Return to step 1 of this inner set of steps.
If the common ancestor node is a table, tbody, tfoot, thead, or tr element, then, foster parent whatever last node ended up being in the previous step, first removing it from its previous parent node if any.
Otherwise, append whatever last node ended up being in the previous step to the common ancestor node, first removing it from its previous parent node if any.
Create an element for the token for which the formatting element was created.
Take all of the child nodes of the furthest block and append them to the element created in the last st
Now try to imagine Microsoft, Opera, Mozilla, and Google implementing that compatibly.
I believe they already do, for the most part. HTML5 parsing rules were mostly reverse-engineered from existing browsers' HTML parsing rules, which are more or less consistent across modern browsers, so it's only documenting what most existing browsers already do.
What the spec is defining is a limited subset of an SGML-like language (whose entire parsing rules, if incorporated into HTML, would span for pages) and how to transform it into a DOM. It isn't mandating any new parser rules, it only documents them for the benefit of new implementations of the spec, and to align what minor variations there are between browser parsing models together. Compared to SGML rules (of which HTML 4.01 is technically a subset), this is a great improvement.
I see a lot of debate here about XML versus SGML (or SGML-like) serialization and parsing rules, and plenty of people (rightly) pointing out that there is an XML version of HTML 5.
But what about those features which those of us who already code strictly to spec either way really care about? New elements that were scheduled to debut in XHTML 2.0 such as nl, h and section, the ability to put href and src attributes in any element, etc [xhtml.com]?
Those are the sorts of things which made the debate for me, not some silly XML vs SGML, strict vs lenient debate - either way I'll be writing my code for strict compliance with spec. I'm more concerned with what the features of the spec will be! Less so with how it deals with people out of compliance with it.
So any news on whether X/HTML 5 will be incorporating any of these, now that it's a W3C project and XHTML 2 is dead?
The W3C oversee current net content standard evolution. As it's name imply, it's a consortium. It regroup browser developpers, server developpers, thier application developpers, and many others. It doesn't try to override browser developpers. It oversee them on a technical standard view point. Browser developpers submit improvement for them to be included in the norm. This way it garantee that browsers don't split too much from each others.
XHTML has some great features, notably the ability to embed MathML and SVG in it (great for academic sites) and strict error handling. Unfortunately these features never caught on because Internet Explorer doesn't understand the XHTML MIME type at all. What a shame.
I'm not familiar with the MathML and SVG features. Hopefully there's another (if less elegant) way to do what you want in HTML5.
As for strict error handling, one of the things HTML5 is doing is defining exactly how errors are supposed to be handled. This doesn't mean "strict" error handling in the sense that any page that contains an error will completely fail to load at all, but it does mean we can expect a consistent behavior across multiple browsers even when the HTML doesn't validate.
XHTML 1 is basically HTML4 with the added requirement that the document must also be well-formed XML. This is useful, because it allows you to put any other arbitrary (but properly-namespaced) XML data in the same file. XHTML 2 was meant to dramatically reduce the number of valid tags, clean up HTML even more than HTML 4 did, and split the spec into a large collection of smaller standards. No one really liked it; it was developed in the traditional W3C 'let's create a new standard without thinking too hard about how it will be implemented' way.
HTML 5 is an evolution of HTML 4 backed by people who actually implement these standards and developed in a more incremental way. Unlike HTML 4, HTML 5 doesn't specify the representation. It has SGML and XML bindings. HTML 5 with the SGML binding looks like classic HTML, HTML 5 with the XML binding looks like XHTML. HTML 5 with the XML binding has all of the advantages of XHTML; you can mix it with any other XML data in the same file, and have a unified DOM tree.
"XHTML 1 is basically HTML4 with the added requirement that the document must also be well-formed XML"
It also deprecated a lot of the older tags that were made obsolete by CSS hence encouraging better separation of document structure and presentation. Unfortunately HTML5 undoes this particular good work because it caters to the lowest common denominator (i.e. bad developers who don't "get" separation of concerns and trivially parsable markup).
"HTML 5 is an evolution of HTML 4 backed by people who actually implement these standards and developed in a more incremental way."
The problem is, those people implementing those standards have proven time and time again how incompetent they are at implementing those standards. The state of standards compliance in browsers has for well over a decade been utterly shameful and that really goes for Firefox as much as it does IE. I'd argue it's those who use the standards that know best - people building the biggest sites on the net because they're the ones who need the markup to be able to support large scale application development. Browser vendors need to be able to implement that standard, don't get me wrong, but putting faith in them as the ones who guide the standards has time and time again proven disastrous - look at the HTML5 video tag debacle for perhaps the most recent example.
I'm not disagreeing with you though, XHTML2 wasn't brilliant, but I'm not convinced HTML5 is even any better than XHTML1 which was also an evolution of HTML4 and IMHO a better one. It was designed with those people building enterprise applications for the web in mind rather than joe average, who is more content using the likes of MySpace and Facebook to manage their content for them in the first place.
Of course, HTML5 can do everything XHTML does for the reasons you state, but sadly it seems to encourage bad practice whereas XHTML discouraged it. One final beef I have with HTML5 is that accessibility seems to have been ignored in it's creation, for example there were no real efforts to ensure easy inclusion of subtitles the previously proposed audio/video formats. Again, we really just don't seem to be any further on with web standards than we were at the start of the decade and again, the people to blame are the browser vendors as much as the W3C and it's allowed not particularly ideal or portable proprietary tools such as Flash to gain a lot of ground as a result.
It also deprecated a lot of the older tags that were made obsolete by CSS hence encouraging better separation of document structure and presentation. Unfortunately HTML5 undoes this particular good work because it caters to the lowest common denominator (i.e. bad developers who don't "get" separation of concerns and trivially parsable markup).
I think you read a different version of HTML 5 to me. It still deprecates or removes all of the tags that HTML 4 and XHTML 1 removed, for example removing the center and font tags which were only deprecated by HTML 4.
Where it introduces new tags, it is for expressiveness. A lot of the 'separation of content and presentation' folks seem to think that HTML just needs three tags; span, div, and object. HTML 5 doesn't add more presentation elements, but it does add more tags with well-defined semantics. Examples of this include section and nav tags. These don't specify anything about the presentation, they just indicate that a part of the document is a section, or a set of navigation commands. A mobile browser, for example, might have an option to hide and show the nav section to conserve screen space.
Take a look at the current draft of HTML 5 [w3.org]. You'd be hard-pressed to find anything presentation-related. Presentation still goes in the stylesheets, HTML 5 just adds tags for common things so you don't need quite so many class attributes.
Take a look at the current draft of HTML 5. You'd be hard-pressed to find anything presentation-related.
I think this attitude is more a case of wishful thinking and sometimes outright denial rather than than reality. Take a look at some of these, for instance:
<br> and <pre> - explicitly control line-breaking (<pre> has ASCII art as a use case!).
<ul> and <ol> - the order of HTML elements already forms part of their semantics. The real reason both element types are kept around is because one is numbered and one is not.
<small> - nuff said.
<i> - I'll quote this, because it's utterly laughable: "The i element represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, a ship name, or some other prose whose typical typographic presentation is italicized." - or, in other words, "let's list every case we can think of where using italics is the typographical convention so we can pretend it isn't an element type that means use italics." Are there any real shared semantics between a ship name and a thought? No, they just want to use italics.
Count me in as one of the "give me more expressiveness" crowd. Span, div, and object are good enough for most purposes, but have their own problems. Writing [X]HTML/CSS pages for all media--conventional browser, print, and screen readers--is a bear. Having sane defaults for tags like STRONG and EM--that is, a certain inflection for the screen reader, and a decent-looking print default--saves developers a lot of time.
Unlike HTML 4, HTML 5 doesn't specify the representation. It has SGML and XML bindings. HTML 5 with the SGML binding looks like classic HTML
No, HTML 5 has an XML serialisation and a tag-soup-compatible serialisation that, yes, looks like classic HTML, but in fact isn't based on SGML. It's based on the way popular browsers parse HTML rather than what they are supposed to do according to previous HTML specifications. One consequence of this is that obscure parts of previous versions of HTML that are not well-supported by popular browsers are not supported by HTML 5 - it's not completely backwards-compatible with previous versions of HTML.
Does this mean that someday/. will actually render properly in a browser? I used IE, FF, Opera, and Safari all regularly./. does not reneder 100% in any of them:(
How do you know that one of them isn't proper and the rest are merely deviations from proper?
Or, more accurately, how do you know what it is supposed to look like if you say they are all wrong?
by Anonymous Coward
on Friday July 03, @12:13PM (#28573133)
Simple: view source --> use your "The Matrix" vision to render everthing in your brain. Every serious developer knows how to run code in his brain, that's how I run all my unit tests!
Good (Score:5, Informative)
Re: (Score:3, Funny)
I know a lot of web developers who dont know the difference between XHTML and HTML, and I hear XHTML as a buzzword all the time.
Duh. Everyone knows that the "X" is for Xtreme! It's Xtreme HTML, right?
Re:Good (Score:5, Insightful)
The main key is, that, while HTML5 is based on the superior SGML (because of more freedom), XHTML had started to enforce strictness and cleanness. This meant the browser did not have to support a ton of typos, just because the editor was a freakin' lazy ass. Imagine a compiler that would eat any typo. Missing brackets, braces, semicolons, object-function separators, completely meaningless semantic messes. HTML4 browsers eat it all.
It is horrible, and actively supports the dumbing down of people. (Those who want to write websites.)
Face it: If they have to, they will learn it. Nobody is too stupid for that. Some just repeat so often that they are stupid, that they actually become stupid. But this can be reversed in exactly the same way. (Ask any psychotherapist about self-fulfilling prophecies.)
Another great feature of XHTML, was its modularity and cross-language features.
You could integrate XHTML, SVG, MathML, etc, into one document. Imagine a P tag inside a SVG circe, containing a math formula, and you begin to understand the sheer power of that concept.
Now if they implement HTML5 right, and we get the same cleanness that XHTML 1.1 had (Strict only. No transitional shit.), and they add cross-language abilities too (trough SGML), then I'm all for it!
But if not, this could be a huge step backwards, into the web development mess of IE6 times!
Parent
Re:Good (Score:5, Interesting)
So, what you're saying is that the computer works for people instead of the other way around?
Parent
Re:Good (Score:4, Insightful)
But by working that way, the computer encourages people to create unreadable messes, that other developers can't easily understand.
Simpler parsing rules are more a boon for the people than for the computers. Think about it.
Parent
Re:Good (Score:5, Insightful)
No, what it means is that the computer tries to guess what some dyslexic jackass who insists on writing code despite being functionally illiterate and proud of it meant. Since we have no sentient computers yet, and since the markup diarrhea these people produce would be challenging even for a human to decrypt, the task is hopeless, and the websites that result will break in fascinating ways with each new browser version, or whenever whoever visits them has a different screen resolution than the "designer", or the stars are not just right. And whenever that happens, the website gets replaced by a new, equally broken version, and the designer gets paid for delivering said abomination.
And of course whenever the browser fails to extract meaning from the chaos that would horrify even Cthulhu, it's the user who gets blamed: he didn't use the right version of the right browser, running at the right resolution, with the right versions of the right plugins installed. That, or he has Linux installed on another and unrelated machine.
Parent
Re:Good (Score:4, Insightful)
It is horrible, and actively supports the dumbing down of people.
This is where I take issue with your argument. I completely agree that having the page break catastrophically when there was an error would be easier and better for people who design websites professionally (like me), especially in this day and age.
However, I don't believe that it supports the dumbing down of people, I believe it support less knowledgeable users. To use the compiler as an example, when my sister-in-law learned programming, she learned Java; to get to the point where she could do basic things like "hello world," she had to instantiate objects and call functions. My wife learned with php, and she had to type one line, a command and a string. The barrier for entry was much lower and she was rewarded much faster, thereby gaining a greater desire to learn more.
Br. At the time, browsers taking incorrect HTML was the same philosophy: you lower the barrier of entry. When someone writes a lot of web pages, they tend to become more knowledgeable, not less. There are exceptions that make everyone serious about the craft cringe and want to beat their heads against a brick wall, but for the most part skills are progressing. I don't know whether the web landscape would be better or worse right now if they'd required strict HTML for every web site, but I can tell you that a lot of people who were enthusiastic supporters and creators of web pages in the early days wouldn't have gone down that route had the barrier for entry been higher.
Parent
Re: (Score:3, Insightful)
Imagine a compiler that would eat any typo. Missing brackets, braces, semicolons, object-function separators, completely meaningless semantic messes.
Must... resist... must... resist...PHP! Bloody PHP! Bloody E_NOTICE!
Oh dear, there goes my karma...
Re: (Score:3, Insightful)
Must... resist... must... resist...PHP! Bloody PHP! Bloody E_NOTICE!
Oh dear, there goes my karma...
In attempt to preserve your "karma", I give you asolution:
.': '.$message.' at '.$filename.' ('.$line.').'); }
;).
function errorHandler($code, $message, $filename, $line) { die($code
set_error_handler('errorHandler');
You know, less talk, more action
P.S.: You could also throw an exception, which is the most convenient option, as you can handle the errors in some cases.
Re: (Score:3, Informative)
HTML 5 is based on the DOM. The HTML4-compatible syntax is defined from scratch, it isn't based on SGML because no web browser actually parses SGML correctly. Most of them don't do HTML4.01 fully for that matter (IE doesn't do simple things like <q>, Moz doesn't support all the weird table-column align stuff...).
Re: (Score:3, Informative)
HTML5 comes in two forms.
It comes in an SGML-inspired format, that is not strictly SGML but matches real word HTML almost exactly. The big difference from HTML4 besides the new tags is that it does not use a DTD, nor does it support the shortag features of SGML, with the exception of the short attribute feature. Thus "<title/</<body/".
(Yes, that has three open brackets, zero close brackets, and 3 slashes) is not valid HTML5, despite being valid HTML4. (At least once you add the DTD).
There is also a
XML mode and separating author/browser concerns (Score:3, Informative)
Now if they implement HTML5 right, and we get the same cleanness that XHTML 1.1 had (Strict only. No transitional shit.), and they add cross-language abilities too (trough SGML), then I'm all for it!
Re: (Score:3, Interesting)
The main key is, that, while HTML5 is based on the superior SGML (because of more freedom), XHTML had started to enforce strictness and cleanness. This meant the browser did not have to support a ton of typos, just because the editor was a freakin' lazy ass. Imagine a compiler that would eat any typo. Missing brackets, braces, semicolons, object-function separators, completely meaningless semantic messes. HTML4 browsers eat it all.
Totally wrong. One of the most important rules in software is: "be liberal in
XHTML merged (Score:3, Interesting)
Re:XHTML merged (Score:5, Informative)
They should have XMLized HTML in the first place.
They did. It's called XHTML.
Unless you mean XML-ise HTML 3.2 or earlier, but I believe XML didn't exist back then.
Parent
Re: (Score:3, Interesting)
They should have XMLized HTML in the first place.
They did. It's called XHTML.
And now it's failed. What does that tell us?
Re:XHTML merged (Score:4, Informative)
That most web page authors are too incompetent to even follow XML's validity rules, let alone HTML's?
Parent
Re: (Score:3, Insightful)
XHTML would have been great standard.
When fed invalid XHTML, the browser chokes, which would have gone a long way to eliminating much of the crap code, and crap "web developers" out there.
I don't see why it's the browsers business to be THAT lenient, and second guess the developer all the time.
Re:XHTML merged (Score:5, Insightful)
XHTML would have been great standard.
When fed invalid XHTML, the browser chokes, which would have gone a long way to eliminating much of the crap code, and crap "web developers" out there.
I don't see why it's the browsers business to be THAT lenient, and second guess the developer all the time.
The problem is, a lot of web pages today are not a single coherent document, they're a bunch of little code fragments concatenated together (template, content, advertising, etc.). When coders get sloppy, this can result in invalid markup. When browsers choke, the content producer may have no idea how to fix the problem - it may not even be their problem.
What HTML5 tries to do is clearly define exactly how broken markup is supposed to be handled, so all browsers can try to "second guess the developer" in exactly the same way.
Kudos to Firefox for reigniting the browser war. In Browser War 2.0, all the major players are striving toward standards compliance, trying to bring their behavior in line with a single unified goal instead of adding their own proprietary features to HTML itself. Five years from now, when IE6 and IE7 are as distant a memory as IE4 and IE5 are now, web development is going to be a lot easier.
Parent
Re: (Score:3, Interesting)
Yes, and unfortunately, as far as I can tell, this is how support for the two leading hi-def formats is right now:
Chrome 2: Supports both Theora and H.264
Safari 4: Supports H.264
Firefox 3.5: Supports Theora
IE 8: Supports neither.
Congratulations, between the 4 largest browsers we've managed to have all four possibilities you can have with two booleans (2^2)!
Re: (Score:3, Informative)
Yes, we can dream.
How lazy do you have to be not to close your tags, and nest them properly? It's a low barrier, but given people's infinite laziness, they will write their code until it is just not-too-crappy to render in IE. Then call it a day.
A strict standard would also give MS less wiggle room to subvert the standard in their IE implementation of it.
Re: (Score:3, Informative)
A key feature of html5 is that is specifies the algorithms to use when normalizing poorly formed markup. It doesn't eliminate ambiguous cases, but it gets rid of many of them, meaning that the presentation and DOM should almost always be the same, regardless of the browser.
Re:XHTML merged (Score:4, Informative)
Getting a web page clean is a hard problem ... when you accept user input in something approaching HTML format, like /. does
No it is not. Have php run the user input through tidy. Even if it doesn't display as the user wanted in their browser, at least it displays consistently between browsers which is more important imho. Just go. Install it now in php [php.net]. Seriously, if you are not checking html code coming in from users, something is not right. They could destroy your page with some of those unclosed tags.
Parent
Re:XHTML merged (Score:5, Insightful)
Getting a web page clean is a hard problem ... when you accept user input in something approaching HTML format, like /. does.
No it isn't. You should not ever, ever, be inserting user-provided HTML directly in to a document. If you do, then well done, you've just created a cross-site scripting vulnerability. And you've let pranksters submit <!-- and hide half of your page.
The correct way of handling user-provided HTML is to parse it with an HTML parser, construct a DOM tree, navigate this stripping out any tags that aren't on your whitelisted set, and then use the result. Most of the time, you want to run it through a very relaxed HTML parser because hand-typed HTML in a web form is likely to be full of errors. When you dump the DOM tree as HTML, it can be XHTML 1, HTML 3.2, or any other dialect you want.
Parent
Re: (Score:3, Insightful)
Good compilers still choke on syntax error, you idiot.
And if you downloaded a program and it choked when you tried to use it, you'd download a different one, now wouldn't you?
Your plan only works if all of the browser vendors promise to obey it all at the same time. The first vendor to break the promise wins.
Re: (Score:3, Interesting)
Agreed. XHTML was rather pointless. It didn't add any particularlly interesting features, made pages more difficult to author, and its claim that it made life easier for browser authors was belied by poor support and slow rendering. Making things more "XMLish" with closed tags and quoted attributes was a good idea, but in reality writing XML-conformant CSS/Javascript XML was a pain in the butt and usually not done.
I suppose XHTML might have been useful as part of a document management/transformation system,
Re: (Score:3, Insightful)
Re: (Score:3, Insightful)
> The ability to parse a web document using native XML methods is pointless?
In the general sense, yes. Web documents are nearly always served to web browsers, and every single web browser does a faster & better job of parsing HTML over XHTML.
As I mentioned, there certainly are cases when XML can be useful, but the usual situation of serving content to end users isn't one of them.
Re: (Score:3, Insightful)
Re: (Score:3, Interesting)
Lazy? Writing messy HTML takes more effort than writing clean XHTML. If you use a decent editor -- one that can take advantage of the structure and parseability of XML to provide validation, auto-completion, etc. on the fly -- then XHTML practically writes itself.
Re:XHTML merged (Score:4, Insightful)
Bullshit. Every person on Earth should be allowed, and encouraged, to create web pages. I hate this elitist crap.
Parent
Re:XHTML merged (Score:5, Insightful)
But, XHTML has corrected many wrong thing that HTML developpers used to do.
No it hasn't! Writing valid code (be it HTML 4.01, XHTML, or HTML 5) and checking it with a validator [w3.org] is what has corrected many wrong things that HTML developers used to do. Valid HTML 4.01 is still just as legitimate as XHTML ever was.
Parent
CSS 3 spec (Score:5, Insightful)
More importantly, when are they going to finish the CSS3 spec?
I love that HTML5 is getting pushed to the forefront and browsers are advancing more than ever, but as a web designer that CSS3 spec needs to get done and pushed on the browser developers because it will be another 2 - 5 years before mass adoption and I'm pretty tired of CSS2.1's limitations.
Re:CSS 3 spec (Score:4, Informative)
There is no "CSS3 spec". There is a whole bunch of separate specs all advancing along the REC track separately. They're at various stages of readiness.
For example, CSS Namespaces is in CR ("spec work done, implement it please"). It'll become a REC once there is a test suite and two interoperable implementations and the various paperwork involved in becoming a REC is done.
Selectors Level 3, CSS Color Level 3, CSS Multi-column layout are all in Last Call, with the next step being either CR or PR (PR is "this is done implemented and all; just needs sign-off from the W3C staff"). Same for Media Queries, CSS Basic User Interface, CSS Marquee Level 3, CSS Print Profile, etc.
Was there a particular part of "CSS3" you were interested in seeing specced and implemented?
Parent
Re: (Score:3, Interesting)
That's the whole problem. All the experts are working for the browser vendors. The W3C never had any business overriding them. Css3 will never happen (standardized & widely implemented). But of course the relevant bits have long been implemented and now those await standardization. It would be nice if w3c bureaucracy could catch up here.
Basically what's wrong here is that after a agile start in the nineties, w3c turned into yet another standards organ. Essentially, for most of the past ten years they've
Sounds like a few people are confused... (Score:5, Informative)
XHTML 1 was the XML-ization of the existing HTML 4 stuff.
XHTML 2 was a new HTML version that sought to remove lots of HTML cruft (including non-XML syntax) and add new capabilities. Basically, it was working toward a new HTML version. This effort has died, because browser makers are not behind it - they are all behind HTML 5.
HTML 5 has always had an XML profile called XHTML 5, and that won't go away.
Re: (Score:3, Informative)
The doctype.
Not sure you'll like the answer : ) :
<!doctype html>
I believe because they wanted to keep it short and simple, and hixie doesn't believe in versioning HTML - having a version-less doctype forces people to keep it backwards-compatible when html6 rolls around. Perhaps someone else who followed the process better can chime in here.
HTML 5 parsing is just awful. (Score:4, Insightful)
At least with XML you have a simple, well-defined way to convert the XML text to a tree. With HTML 5, there's only "well-defined error handling". Read the sort-of specification [whatwg.org] for this.
Here's what's supposed to happen for just one of the hard cases. (There are dozens of other cases, some at least as ugly.) Parsing is in "body" mode (normal content) and an end tag whose tag name is one of: "a", "b", "big", "code", "em", "font", "i", "nobr", "s", "small", "strike", "strong", "tt", "u" has been encountered.
Follow these steps:
If there is no such node, or, if that node is also in the stack of open elements but the element is not in scope, then this is a parse error; ignore the token, and abort these steps.
Otherwise, if there is such a node, but that node is not in the stack of open elements, then this is a parse error; remove the element from the list, and abort these steps.
Otherwise, there is a formatting element and that element is in the stack and is in scope. If the element is not the current node, this is a parse error. In any case, proceed with the algorithm as written in the following steps.
Otherwise, append whatever last node ended up being in the previous step to the common ancestor node, first removing it from its previous parent node if any.
Re:HTML 5 parsing is just awful. (Score:4, Informative)
Now try to imagine Microsoft, Opera, Mozilla, and Google implementing that compatibly.
I believe they already do, for the most part. HTML5 parsing rules were mostly reverse-engineered from existing browsers' HTML parsing rules, which are more or less consistent across modern browsers, so it's only documenting what most existing browsers already do.
What the spec is defining is a limited subset of an SGML-like language (whose entire parsing rules, if incorporated into HTML, would span for pages) and how to transform it into a DOM. It isn't mandating any new parser rules, it only documents them for the benefit of new implementations of the spec, and to align what minor variations there are between browser parsing models together. Compared to SGML rules (of which HTML 4.01 is technically a subset), this is a great improvement.
Parent
Will they roll XHTML 2 features into X/HTML 5? (Score:3, Interesting)
I see a lot of debate here about XML versus SGML (or SGML-like) serialization and parsing rules, and plenty of people (rightly) pointing out that there is an XML version of HTML 5.
But what about those features which those of us who already code strictly to spec either way really care about? New elements that were scheduled to debut in XHTML 2.0 such as nl, h and section, the ability to put href and src attributes in any element, etc [xhtml.com]?
Those are the sorts of things which made the debate for me, not some silly XML vs SGML, strict vs lenient debate - either way I'll be writing my code for strict compliance with spec. I'm more concerned with what the features of the spec will be! Less so with how it deals with people out of compliance with it.
So any news on whether X/HTML 5 will be incorporating any of these, now that it's a W3C project and XHTML 2 is dead?
Re: (Score:3, Insightful)
Re: (Score:3, Informative)
XHTML has some great features, notably the ability to embed MathML and SVG in it (great for academic sites) and strict error handling. Unfortunately these features never caught on because Internet Explorer doesn't understand the XHTML MIME type at all. What a shame.
I'm not familiar with the MathML and SVG features. Hopefully there's another (if less elegant) way to do what you want in HTML5.
As for strict error handling, one of the things HTML5 is doing is defining exactly how errors are supposed to be handled. This doesn't mean "strict" error handling in the sense that any page that contains an error will completely fail to load at all, but it does mean we can expect a consistent behavior across multiple browsers even when the HTML doesn't validate.
The trouble with
Re:No more compound documents? (Score:4, Informative)
XHTML 1 is basically HTML4 with the added requirement that the document must also be well-formed XML. This is useful, because it allows you to put any other arbitrary (but properly-namespaced) XML data in the same file. XHTML 2 was meant to dramatically reduce the number of valid tags, clean up HTML even more than HTML 4 did, and split the spec into a large collection of smaller standards. No one really liked it; it was developed in the traditional W3C 'let's create a new standard without thinking too hard about how it will be implemented' way.
HTML 5 is an evolution of HTML 4 backed by people who actually implement these standards and developed in a more incremental way. Unlike HTML 4, HTML 5 doesn't specify the representation. It has SGML and XML bindings. HTML 5 with the SGML binding looks like classic HTML, HTML 5 with the XML binding looks like XHTML. HTML 5 with the XML binding has all of the advantages of XHTML; you can mix it with any other XML data in the same file, and have a unified DOM tree.
Parent
Re:No more compound documents? (Score:4, Informative)
"XHTML 1 is basically HTML4 with the added requirement that the document must also be well-formed XML"
It also deprecated a lot of the older tags that were made obsolete by CSS hence encouraging better separation of document structure and presentation. Unfortunately HTML5 undoes this particular good work because it caters to the lowest common denominator (i.e. bad developers who don't "get" separation of concerns and trivially parsable markup).
"HTML 5 is an evolution of HTML 4 backed by people who actually implement these standards and developed in a more incremental way."
The problem is, those people implementing those standards have proven time and time again how incompetent they are at implementing those standards. The state of standards compliance in browsers has for well over a decade been utterly shameful and that really goes for Firefox as much as it does IE. I'd argue it's those who use the standards that know best - people building the biggest sites on the net because they're the ones who need the markup to be able to support large scale application development. Browser vendors need to be able to implement that standard, don't get me wrong, but putting faith in them as the ones who guide the standards has time and time again proven disastrous - look at the HTML5 video tag debacle for perhaps the most recent example.
I'm not disagreeing with you though, XHTML2 wasn't brilliant, but I'm not convinced HTML5 is even any better than XHTML1 which was also an evolution of HTML4 and IMHO a better one. It was designed with those people building enterprise applications for the web in mind rather than joe average, who is more content using the likes of MySpace and Facebook to manage their content for them in the first place.
Of course, HTML5 can do everything XHTML does for the reasons you state, but sadly it seems to encourage bad practice whereas XHTML discouraged it. One final beef I have with HTML5 is that accessibility seems to have been ignored in it's creation, for example there were no real efforts to ensure easy inclusion of subtitles the previously proposed audio/video formats. Again, we really just don't seem to be any further on with web standards than we were at the start of the decade and again, the people to blame are the browser vendors as much as the W3C and it's allowed not particularly ideal or portable proprietary tools such as Flash to gain a lot of ground as a result.
Parent
Re:No more compound documents? (Score:5, Informative)
It also deprecated a lot of the older tags that were made obsolete by CSS hence encouraging better separation of document structure and presentation. Unfortunately HTML5 undoes this particular good work because it caters to the lowest common denominator (i.e. bad developers who don't "get" separation of concerns and trivially parsable markup).
I think you read a different version of HTML 5 to me. It still deprecates or removes all of the tags that HTML 4 and XHTML 1 removed, for example removing the center and font tags which were only deprecated by HTML 4.
Where it introduces new tags, it is for expressiveness. A lot of the 'separation of content and presentation' folks seem to think that HTML just needs three tags; span, div, and object. HTML 5 doesn't add more presentation elements, but it does add more tags with well-defined semantics. Examples of this include section and nav tags. These don't specify anything about the presentation, they just indicate that a part of the document is a section, or a set of navigation commands. A mobile browser, for example, might have an option to hide and show the nav section to conserve screen space.
Take a look at the current draft of HTML 5 [w3.org]. You'd be hard-pressed to find anything presentation-related. Presentation still goes in the stylesheets, HTML 5 just adds tags for common things so you don't need quite so many class attributes.
Parent
Re:No more compound documents? (Score:5, Insightful)
I think this attitude is more a case of wishful thinking and sometimes outright denial rather than than reality. Take a look at some of these, for instance:
Parent
Re: (Score:3, Insightful)
Count me in as one of the "give me more expressiveness" crowd. Span, div, and object are good enough for most purposes, but have their own problems. Writing [X]HTML/CSS pages for all media--conventional browser, print, and screen readers--is a bear. Having sane defaults for tags like STRONG and EM--that is, a certain inflection for the screen reader, and a decent-looking print default--saves developers a lot of time.
Re:No more compound documents? (Score:5, Informative)
No, HTML 5 has an XML serialisation and a tag-soup-compatible serialisation that, yes, looks like classic HTML, but in fact isn't based on SGML. It's based on the way popular browsers parse HTML rather than what they are supposed to do according to previous HTML specifications. One consequence of this is that obscure parts of previous versions of HTML that are not well-supported by popular browsers are not supported by HTML 5 - it's not completely backwards-compatible with previous versions of HTML.
Parent
Re:Rendering (Score:4, Insightful)
Does this mean that someday /. will actually render properly in a browser? I used IE, FF, Opera, and Safari all regularly. /. does not reneder 100% in any of them :(
How do you know that one of them isn't proper and the rest are merely deviations from proper? Or, more accurately, how do you know what it is supposed to look like if you say they are all wrong?
Parent
Re:Rendering (Score:4, Funny)
Simple: view source --> use your "The Matrix" vision to render everthing in your brain.
Every serious developer knows how to run code in his brain, that's how I run all my unit tests!
Parent