CDN Optimizing HTML On the Fly
Caerdwyn writes "Cotendo, which is a content distribution network, has taken to altering HTML as it passes through their CDN to optimize web pages for faster rendering. This is essentially a repackaging of the Apache module mod_pagespeed (from Google), with the critical difference being that the rewriting of HTML occurs inline rather than at the web server. We all know that well-written HTML can result in much better rendering of whatever your content is; the questions are 'Will this automatic rewriting cause other problems, i.e. browser quirks?' and 'Assuming that only the web pages of Cotendo's customers are altered, are there nonetheless potential legal troubles with someone rewriting HTML before delivery to a browser?'"
Legal precedent (Score:1, Interesting)
Re:Legal precedent (Score:5, Insightful)
What's the difference between me configuring my servers to optimize our sites on our front-end proxies, and having our CDN doing it on their front-end proxies?
I think you missed that this is a service Cotendo provides to its paying customers.
Now, if Cotendo was doing this without their customers' permission, then your objection might have some kind of relevance. I can't find anything to indicate that this is the case though, and it seems like a dumb and stupid business move if it is.
> The new Page Speed service offered by Cotendo will be part of its proprietary new performance application platform Cloudlet(TM), which is able to execute both open source and any proprietary code. Cotendo's new platform is in production with select customers and partners, which will be announced soon.
This makes it sound like it may actually be optimising the output from applications running on their own servers, rather than as a proxy altering content sent from the customer's servers.
Re: (Score:2)
> and it seems like a dumb and stupid business move if it is.
Dumb business moves can often be overlooked because they provide other benefits that outweigh the negatives.
This is also true for stupid business moves, as there can usually be found ways to counter the effects of such a decision.
However, if a business move is both dumb and stupid, you should probably go back and look at your decisions a little more carefully.
Re: (Score:2)
ISPs and carriers, are supposed to carry the content. Period. The moment they start altering that content, they have gone too far. I don't give the slightest damn what their justification is. It is not to be done.
I see nothing w
Re: (Score:2)
> What if the telephone company could re-order the words you speak, or perhaps substitute other words "as good as" your own, in order to make their voice-compression algorithms more efficient? Would you agree to that?
If my telephone company offered me a service where their automated systems would re-order my words to make me sound like I actually knew what I was talking about, and I paid them to provide that service, then I would expect them to do that. If they were to do it without my consent, then that would be another thing. But I haven't seen any indication that Cotendo is doing anything without their customers' blessing.
> I see nothing wrong with encouraging the sources of the content to use things like mod-pagespeed on their servers.
But the CDN is the source. Technically true if the Cloudlet(tm) service is in fact running the cust
Re: (Score:2)
"If my telephone company offered me a service where their automated systems would re-order my words to make me sound like I actually knew what I was talking about, and I paid them to provide that service, then I would expect them do that. If they were to do it without my consent, then that would be another thing. But I haven't seen any indication that Cotendo is doing anything without their customer's blessing."
I will grant your point that this is a voluntary service for customers who explicitly pay for it. But in ANY other circumstance, if used by any sort of "common carrier" for example, this would be a Very Bad Idea Indeed. In fact it would almost certainly be illegal. See the other respondent's comment about altering magazines "carried" by 7-11.
(NOTE. In just about every way: morally, ethically, and technically, ISPs and most other internet services are "common carriers", just like companies that carry tele
Re: (Score:2)
But, it's optimizing the HTML, not changing the content of your message. In the context of a phone call, it would be like if they took your analog phone signal (like, from the handset), compressed it and then transmit
Re: (Score:2)
"But, it's optimizing the HTML, not changing the content of your message. In the context of a phone call, it would be like if they took your analog phone signal (like, from the handset), compressed it and then transmitted it in a form that the other side could understand. The end result remains the same."
It's NOT the same thing at all, and not the same end result. That is the point I was trying to get across, and which you seem to have missed. The only saving grace of this operation is that it's entirely voluntary on the part of the customers. It's not just some arbitrary ISP doing it without their knowledge.
First off, "optimizing the HTML" IS altering the content of my message. For a webpage, the HTML is the message. "Optimizing" it is indeed altering it. If optimizers were perfect, maybe that wouldn't
Re: (Score:2)
This week on "Thin End of The Wedge" - "Why there is no difference between a Web server GZIPing content and a Web server replacing all images with Goatse".
Re: (Score:2)
Be aware that the old gzip package has been renamed gnu-gzip. The new gzip package is actually GoatseZIP.
Re: (Score:2)
Intent. The law cares about intent (usually).
Re: (Score:2)
What's the legal difference (IANAL) between optimizing HTML and inserting ads?
What's the legal difference between a NAT gateway modifying packets in order to deliver content to you, and a NAT gateway modifying packets in order to insert ads? By your line of reasoning, every "home router" manufacturer should already be doing this.
Re: (Score:2)
> There are plenty of home router manufacturers
Only one of which has an exclusive deal with your ISP to provide hardware.
Re: (Score:2)
And there are plenty of CDN providers. So I guess you're taking back your previous comment?
Re: (Score:2)
AOL used to do this too. If you loaded a page through their crappy client software, a transparent proxy would replace the original JPG / GIF with an equivalent ART image. ART was an AOL proprietary image format which was pretty good at compressing images at low data rates. I suppose the theory was they made their software more responsive and reduced bandwidth / phone charges for a largely unnot
Re: (Score:2)
> What's the legal difference (IANAL) between optimizing HTML and inserting ads?
Optimizing HTML does not appear to create a derivative work as defined in US copyright statute [copyright.gov] because the removal of data does not represent an original work of authorship by itself. Ads, on the other hand, are an original work of authorship; there's a reason that the text of an ad is "copywritten" before it's copyrighted.
Re: (Score:2)
> If I publish HTML, Javascript, and CSS, then I expect that to reach the viewer untouched.
Of course it will if you use HTTPS. If you're using HTTP, any man in the middle can break your site anyway.
> Formatting matters when rendering HTML
Not all lossy modifications to an HTML document, JavaScript program, or CSS style sheet affect the end result. The difference between one space, eight spaces, a tab, and a newline does not matter due to CSS's whitespace normalization rules. So one way to improve compressibility of HTML is to normalize whitespace outside elements where whitespace does matter (e.g. <pre> and parts of <script>
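A minimal sketch of the whitespace idea in the comment above, not anything taken from Cotendo or mod_pagespeed. It assumes the page uses the default CSS whitespace handling everywhere except inside a small, hypothetical list of protected elements (`pre`, `script`, `style`, `textarea`); a page that sets `white-space: pre` on some other element would be changed by this. The regex approach is a toy and would trip over things like a literal `</script>` inside a string.

```python
import re

# Elements whose contents are left byte-for-byte alone.
# This list is an assumption for illustration, not an exhaustive rule.
PROTECTED = re.compile(
    r"<(pre|script|style|textarea)\b.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def collapse_whitespace(html: str) -> str:
    """Collapse each run of whitespace to one space outside protected elements."""
    out = []
    pos = 0
    for match in PROTECTED.finditer(html):
        # Text before the protected element: safe to normalize.
        out.append(re.sub(r"\s+", " ", html[pos:match.start()]))
        # The protected element itself: copied unchanged.
        out.append(match.group(0))
        pos = match.end()
    out.append(re.sub(r"\s+", " ", html[pos:]))
    return "".join(out)


sample = "<p>Hello,\n\t   world</p>\n\n<pre>a\n   b</pre>\n"
result = collapse_whitespace(sample)
print(repr(result))
```

The runs are collapsed to a single space rather than removed, since deleting the space between two inline elements can change how the text renders.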
Re: (Score:1)
Another problem is that this could lead to formatting problems in web pages. Unfortunately, formatting HTML s
Re: (Score:2)
In this case, the really huge difference is that Cotendo is actually HIRED to deliver the content as efficiently as possible, including optimizing the HTML. They're not imposing this on the unwilling, it's a service that the content providers are willingly paying for. They'd be in legal trouble if they DIDN'T alter the HTML. Legally speaking, they are in the same position as an editor for hire.
Put another way, when you click that link, you are getting the content as the rights holder intends. Why would that
Re: (Score:2)
If I were a content provider whose HTML was being modified in-flight, I'd invoke a law that already exists for that sort of thing - it's called copyright. My customer requested information from me; I provided it, and as such it is automatically copyrighted. Any modification in transit without authorization is illegal already, IMHO.
What happens when I request data from a web service for the lowest price for an item from pricegrabber and some intermediate ISP decides to replace the real answer with the price
Re:Legal precedent (Score:4, Insightful)
> If I were a content provider whose HTML was being modified in-flight, I'd invoke a law that already exists for that sort of thing - it's called copyright. My customer requested information from me; I provided it, and as such it is automatically copyrighted. Any modification in transit without authorization is illegal already, IMHO.
The article is about a content distribution network. That means that the content provider is paying them to make sure that their content reaches the customers quickly.
If the content provider doesn't like the content being modified, they should just ask their CDN provider to stop doing it - and if they won't, then just use another one! No need for legal action here :-)
Re: (Score:2)
ISPs have been injecting ads for years. Have you ever tried one of those free isps? I don't think it's going to catch on with mainstream ISPs because people don't like to pay for something and still get bombarded by ads. It still amazes me that cable television can get away with it, but DVRs have almost solved that problem.
Re:Legal precedent (Score:4, Interesting)
My content how it was intended (Score:3, Interesting)
If I see a bad website that takes 20 minutes to load, then I will never buy anything from that site or its company. If they can't hire a decent web programmer, they don't deserve my money.
However, if you change the page to make it render faster, the ISP is lying FOR the shitty company and their shitty website by making it appear to be a well crafted site.
tl;dr: Leave the shit shitty. It'll put bad programmers out of business which we need.
Re: (Score:2)
If you pay a CDN to deliver your content, you probably see it as a good service if it can actually optimize the code.
Re: (Score:2)
If you get to the point where you need and can afford to pay for a CDN - you should already have some decent in house SEO who can do this for you.
Re: (Score:1)
No offense, but I took the liberty of mending your statement.
Re: (Score:2)
It seems that they only follow best practices for the web to optimize JPEG and PNG files, and remove whitespace and comments from JavaScript and CSS. They also enable gzip compression of all text files.
Optimizing caching — keeping your application's data and logic off the network altogether
Minimizing round-trip times — reducing the number of serial request-response cycles
Minimizing request overhead — reducing upload size
Minimizing payload size — reducing the size of responses, downloads, and cached pages
Optimizing browser rendering — improving the browser's layout of a page
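To get a rough feel for the "minimizing payload size" item, here is a small sketch that compares a stylesheet before and after comment/whitespace stripping and gzip. The CSS sample and the helper names are made up for illustration, and the regex minifier is a toy (it would mangle a `/*` inside a quoted string), so treat the numbers as a demonstration rather than a benchmark of any real service.

```python
import gzip
import re


def strip_css_comments_and_whitespace(css: str) -> str:
    """Toy minifier: drop /* ... */ comments, collapse whitespace runs."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)
    css = re.sub(r"\s+", " ", css)
    return css.strip()


def gzipped_size(text: str) -> int:
    """Size in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


# Hypothetical, highly repetitive stylesheet.
css = "/* main layout */\nbody {\n    margin: 0;\n    padding: 0;\n}\n" * 50
minified = strip_css_comments_and_whitespace(css)

print("raw bytes:          ", len(css))
print("minified bytes:     ", len(minified))
print("raw, gzipped:       ", gzipped_size(css))
print("minified, gzipped:  ", gzipped_size(minified))
```

On text this repetitive, gzip alone does most of the work, which is worth keeping in mind when judging how much the whitespace and comment stripping adds on top.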
Re: (Score:2)
One thing is evident from this story: neither the Slashdot editor (Timothy), nor the submitter (assuming the rubbish was not added by the editor), nor most Slashdot commentators have the slightest idea what a CDN is.
They will rewrite your HTML if you pay them to do it, and you configure your site and DNS to use their services. Got that?
Re: (Score:2)
How short-sighted of you. What does the quality of a web site have to do with the quality of the products sold through it?
Imagine the following scenario: someone has been making a given product for quite a while and as a result is quite proficient in their trade. Say this person makes axes, fine balanced axes which make e
Re: (Score:1)
I'm amazed no one has invented a "Compiled HTML" format. You'd take HTML, run it through a compiler on the web server and spit out some compact binary representation of the page. I know people with forums that serve GBs of text every day - compiled HTML would cut their bandwidth costs significantly at the cost of more CPU time on the server. Or the HTML compiler machine could be a proxy in front of another server running PHP generated HTML. It'd fetch pages and cache the compiled version and serve that. You
Too much work (Score:4, Insightful)
Instead of doing it over and over again on the fly, why not just do it once and shoot the "fixed" html back to the authors, and firmly insist that they update their pages? This seems like a much better way to accomplish the same thing.
Re: (Score:1, Insightful)
The authors could be using any one of a number of frameworks which make it at best very hard to meet some best practices.
Re:Too much work (Score:4, Insightful)
Also, most HTML these days is generated rather than static, even if just to have common parts as includes rather than multiple copies.
Re: (Score:2)
So why doesn't the service do this: Fix the HTML, diff the original and result, alert the developers of the changes?
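The "fix, diff, alert" loop suggested above could look roughly like this with Python's standard `difflib`. This is only a sketch: the `report_changes` helper and the file labels are invented here, and nothing in the story says the CDN offers such a report.

```python
import difflib


def report_changes(original: str, optimized: str, name: str = "page.html") -> str:
    """Return a unified diff between the authored and the served markup."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        optimized.splitlines(keepends=True),
        fromfile=f"{name} (as authored)",
        tofile=f"{name} (as served)",
    )
    return "".join(diff)


authored = "<p>\n  Hello\n</p>\n"
served = "<p>Hello</p>\n"
print(report_changes(authored, served))
```

An empty string means the optimizer left the page alone; anything else could be mailed to whoever owns the template.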
Re: (Score:2)
CDN customers are likely to be large customers, and large customers don't have Web developers per se, except maybe one or two to address hotspots.
The rest of the time they are using a CMS, and all of the major CMSs have some ... sub-optimal code.
It's inevitable in code written by volunteers around the world with no real central co-ordination and decision making, and it's much better than not having free-as-in-beer CMSs at all.
BUT - if your site is on SuperFantasticCMS and you find that ten of the core modul
Re: (Score:3, Informative)
> CDN customers are likely to be large customers, and large customers don't have Web developers per se, except maybe one or two to address hotspots.
> The rest of the time they are using a CMS, and all of the major CMSs have some ... sub-optimal code.
This. Newspapers are notorious for crappy sites, implemented 10+ years ago on top of expensive proprietary tools based on the use of table elements and "liberal" abuse of SGML properties. Even the much-admired BBC site is kept together by a hodgepodge of 15-year-old code, which is known to inflict brain damage after webdevs are repeatedly exposed to it.
These are big CDN customers, and they will jump on any opportunity to optimize without rewriting their legacy systems.
> It's inevitable in code written by volunteers around the world with no real central co-ordination and decision making, and it's much better than not having free-as-in-beer CMSs at all.
Bollocks. CMSes built by OSS communitie
Re: (Score:2)
>BUT - if your site is on SuperFantasticCMS and you find that ten of the core modules have these kinds of issues, it's a no-brainer to elect to use this service instead of fixing those ten modules and then battling to get them folded in upstream.
You know you raise a point, SuperFantasticCMS is really generating shittier html with each release now and they're all snotty about accepting patches - so the users forked it. You really ought to be using HyperFantasticCMS instead - it's had major code cleanups, th
Re: (Score:2)
> The rest of the time they are using a CMS,
Citation needed: what proportion of websites that have enough volume to use a CDN use an off the shelf CMS? I know that many have heavy in-house development, or use a framework (the BBC uses Catalyst, several newspapers use Django) which means they will control the HTML output.
> and all of the major CMSs have some ... sub-optimal code.
Even if you are right, only some of the HTML is generated by the CMS, most will come from your own templates.
> It's inevitable in code written by volunteers around the world with no real central co-ordination and decision making
Proportion of volunteer developers on FOSS CMSs? Are people writing code because they need it for their work, or to drum up b
Re: (Score:2)
There are already ways to have your markup checked if you want that.
This is a paid CDN service. If the content provider doesn't want to use it, they don't.
Legal troubles? (Score:5, Interesting)
> the questions are 'Will this automatic rewriting cause other problems, i.e. browser quirks?' and 'Assuming that only the web pages of Cotendo's customers are altered, are there nonetheless potential legal troubles with someone rewriting HTML before delivery to a browser?'
I couldn't give a rat's ass about legal troubles. Slashdot is still a tech forum, right?
There are LOADS of much more interesting questions to ponder, such as: what is exactly the speed improvement? And does it work for Javascript and CSS too? And wouldn't it be much better to work on images instead? Or is that too computationally intensive? What kind of algorithm do they use? In what language is it implemented? Et cetera. Legal troubles shmegal smougles.
Re:Legal troubles? (Score:5, Informative)
https://code.google.com/speed/page-speed/docs/rules_intro.html [google.com]
I think this link answers all your questions.
After a quick first glance, it seems like it isn't doing anything that a good web designer shouldn't already have done. Then again, given the percentage of well-designed pages out there, this could still provide a speedup...
Re: (Score:3, Funny)
My first thought was "why not write good code to start with?" This is like worrying about a new liposuction method, when instead you should get off your fat ass and drop that Snickers bar. It is solving the symptoms, not the problem.
Re: (Score:2)
> My first thought was "why not write good code to start with?"
That's hilarious, that is. I don't think I have ever in my life seen code start bloated and move towards tight efficiency. Usually you find developers following the age-old rules about "Don't optimise" and "Don't optimise - yet" and by the time it is appropriate to start optimising, the product's already shipped.
Re: (Score:2)
> That's hilarious, that is. I don't think I have ever in my life seen code start bloated and move towards tight efficiency.
That's not an answer to his question. He asked why not start with tight and efficient code?
Re: (Score:3, Insightful)
> That's not an answer to his question. He asked why not start with tight and efficient code?
Fair point.
I think the last time I saw anything tight and efficient was back when the nature of the computer it was running on forced that. IME, code efficiency is inversely proportional to the power of the system on which it is expected to run.
Re: (Score:2)
Because you are often on a tight schedule defined by someone so far away from programming he thinks it's done like in Swordfish.
Also a note on your fat ass comment - some might have it that easy, but quite a lot are fighting "programming" in their genes. Dropping 10% of your weight will cause all sort of hormones to fire, your body is programmed to think it's in distress and send hormones out trying to force you to regain weight. I know from experience how fucking hard it is, it is absolutely doable, but it
Re: (Score:2)
How does using an optimizer make it hard to maintain? You still have the original source. That makes no sense. That is like saying that the Linux kernel is unmaintainable because it is compiled before it is used. You don't maintain the binary, you maintain the source, THEN compile it with the changes.
What about using less javascript code to begin with? The problem with many programmers is the false idea that more lines is better code, and more features is a better experience.
Re:Legal troubles? (Score:4, Insightful)
> After a quick first glance, it seems like it isn't doing anything that a good web designer shouldn't already have done. Then again, given the percentage of well-designed pages out there, this could still provide a speedup...
And then, you might find yourself in my position. I administer a website with over 100,000 static files, created using a variety of tools over the course of the last 8 years. And one of those tools was FrontPage.
Given the size of our shop, coupled with the need to handle new content coming in, the best I can realistically hope for is that the formatting of our new content doesn't suck quite as tremendously as the older stuff. On top of everything else, we provide important legal content to one of the most Internet-deprived regions in the world. Bandwidth around here is often measured in single-digit kilobytes.
... You can bet your boots I'm going to give this module a test-drive. I'd be crazy not to.
Google Analytics (Score:2)
Re: (Score:2)
That's mostly because the GA code needs to go at the bottom of the page, not the top.
Re: (Score:2)
My problem with GA is that they add so many cookies.
There are separate cookies for each of the different domains (with and without www, etc.), and they are large.
This slows down the browser's requests to the server, because uploading large cookies takes longer given the upload/download ratio of a lot of customer/business connections.
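A back-of-the-envelope version of the cookie argument, with every number invented for illustration (the comment gives no figures). It assumes the cookies ride along on each request to the domain and that the uplink is the bottleneck, ignoring latency, header compression and parallel connections.

```python
def cookie_upload_seconds(cookie_bytes: int, requests: int,
                          uplink_bits_per_second: int) -> float:
    """Time spent just sending cookie headers upstream for one page load."""
    total_bits = cookie_bytes * requests * 8
    return total_bits / uplink_bits_per_second


# Hypothetical page: 2000 bytes of cookies, 40 requests to the same domain,
# on a 256 kbit/s uplink.
seconds = cookie_upload_seconds(2000, 40, 256_000)
print(f"{seconds:.2f} s of uplink time spent on cookies")
```

Under those assumed numbers the cookies alone account for 2.5 seconds of upload time, which is why serving static files from a cookie-free domain is a common recommendation.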
Re: (Score:2)
JavaScript viruses? That will never happen. (Yeah here is hoping someone is clever enough to make it just so).
Re: (Score:2)
> I couldn't give a rat's ass about legal troubles.
Especially as there are none.
There are no legal issues (Score:5, Insightful)
If you voluntarily upload your web site to a CDN that tells you it is going to optimise your code, what legal issues could there be? The arrangement is entirely mutually consensual. If you don't want your site optimised, then don't use that CDN.
Oh please (Score:5, Insightful)
This seems like an ad for Cotendo disguised as an inflammatory post.
Any webmaster worth their salt is using a variety of tools to improve loading speed - minification of html/css/js, combining scripts, CSS optimization, js packing, compressing PNGs with better tools and using CSS sprites.
I use W3 Total Cache for two of my blogs and the speed increase is substantial.
While we are at it, I wish developers would think it through before using JQuery for trivial stuff. Loading JQuery + bunch of plugins to do simple (and I mean simple) fades or form validations is pointless. Here's an example [richnetapps.com] of what I mean.
So if they're doing this transparently, it's all the better.
Re: (Score:1)
How much do all these techniques alter the original code? Just curious
Re:Oh please (Score:4, Insightful)
Re: (Score:2)
I agree, that free service isn't that fast and the cache time isn't very long. The cache time in the HTTP headers is set to just 1 hour. Please, why?
Re: (Score:2)
Well, if JQuery were integrated with the browsers, it'd be great.
I keep wrestling with it because of lazy CMS (WP, Joomla) plugin developers. If you're not careful, you can end up with three requests to load JQuery, one local, one 1.3.x from Google and another 1.4.x from Google as well, plus various extensions like JQueryCycle AND some Scriptaculous as well. I've seen it.
And all of this for trivial stuff, like validating if a form field is not null, cross-fading a div or opening a pop-up. Come on.
JQuery is
Re: (Score:2)
"if JQuery were integrated with the browsers"
If you mean that features from jQuery (and similar frameworks) should be included in the browser, then that has already happened.
It is called document.querySelectorAll
http://www.w3.org/TR/selectors-api/ [w3.org]
First we had web developers who wanted to better separate content, behaviour and style. And they started to implement that, they asked the browser developers to make such an API because it would be a lot faster if the browser did that. The browser developers didn't do so,
No legal trouble (Score:4, Informative)
> 'Assuming that only the web pages of Cotendo's customers are altered, are there nonetheless potential legal troubles with someone rewriting HTML before delivery to a browser?'
Why should there be? They're not selling bandwidth. They're selling an optimization service (at least, according to their press release, that's what they're selling). This seems to be a clear opt-in situation for their customers. Also, their customers are the ones who are going to be saving money because of this, probably not Cotendo.
But what of the crypto possibilities? (Score:2, Insightful)
Just look to North Korea [thedailywtf.com]!
Re: (Score:2)
At least it has their email address [mailto].
Yes it will cause browser quirks (Score:1, Flamebait)
... in browsers with an incomplete implementation of the rendering engine, of course. i.e. IE
Re: (Score:2)
I guess I should have. http://acid3.acidtests.org/ [acidtests.org] is a good measure of the various rendering implementations of browsers.
The problem: older sites are hacked together to accommodate bugs and workarounds in older browsers. Changing the source on-the-fly could potentially break these workarounds.
Even Microsoft says so, in their not-so-many-words way: [microsoft.com]
> Some Web sites are designed for older browsers. You may experience compatibility issues on these sites until they are updated for Internet Explorer 8 or for Internet Explorer 9 Beta.
Blind optimizations? (Score:2)
Optimized HTML (Score:1)
Encode as binary HTML (fastinfoset or exi) & transcode jpg images to jpeg-2000 (approx. 50% saving on image bandwidth vs. optimized jpeg).
Simples.
Apache at CDN level? (Score:2)
Re: (Score:2)
The summary said it is doing something similar to the Apache module.
If you need an answer... (Score:2)
> the questions are 'Will this automatic rewriting cause other problems, i.e. browser quirks?'
A snippet from the documentation of mod_pagespeed's "rewrite CSS" filter:
"CSS minification is considered moderate risk. Specifically, there is an outstanding bug that the CSS parser can silently lose data on malformed CSS. This only applies to malformed CSS or CSS that uses proprietary extensions. All known examples have been fixed, but there may be more examples not discovered yet. Without this the risk would be low. Some JavaScript code depends upon the exact URLs of resources. When we minify CSS we will change the leaf name o
Beware mod-pagespeed (Score:1)
I installed mod-pagespeed recently on a server, and it had some unintended results to put it mildly.
Initial page-loads were slower, as perhaps page-speed was analyzing each page and figuring out an optimization plan. However, that wasn't the worst problem: mod-pagespeed sometimes BROKE javascript code. It attempted to combine and minify javascript files and code, and modern web browsers started producing javascript errors and not working as expected.
Needless to say, mod-pagespeed was immediately removed
Re: (Score:2)
It doesn't? That would kind of defeat the purpose of using it, then.
Man In The Middle? (Score:1)
Re: (Score:2)
Only if the customer was so naive as to use HTTP in the first place.. Oh, I see what you did, there. :)
Interesting Feature (Score:2)
View Source (Score:2)
What happened to view source, the browser function that built the web?
I think it is not nice to deliver unreadable code to your users. Removed line breaks and indenting spaces, obfuscated javascript variables, automatic changing of meaningful file names to some hash-gibberish ... do not like.
Re: (Score:2)
> I think it is not nice to deliver unreadable code to your users.
Then you dislike all systems such as CMSs that deliver generated code, I assume.
Re: (Score:2)
Indeed! Most of them cannot even produce hypertext :)
However, with a bit of work it is possible to make them generate code that is more or less readable for the users.
Anyway, not-so-great readable code is still better than code that is only optimized for computers. Almost not different from going binary. And that is against my understanding of the web's spirit.
Go Daddy is also using mod_pagespeed (Score:2)
That's not where time is going (Score:2)
If pages load slow, it's very seldom because their HTML has too much white space.
Most page load delays today come from waits for loads from third-party sites. Usually ads, of course. Or because they're doing something in Javascript that's eating time in the browser.
Now, rewriting the page to remove ads - that would really speed things up. Or just replace all images from ad sites. The server still reads the image from the ad site, so the ad site thinks it delivered the image, but there's no need to s
my take on this... (Score:2)
What is very interesting about this, is that <rest of comment optimized away>