Networking IT

Are Long URLs Wasting Bandwidth?

Posted by ScuttleMonkey
from the waste-away-and-build-a-bigger-pipe dept.
Ryan McAdams writes "Popular websites, such as Facebook, are wasting as much as 75MBit/sec of bandwidth due to excessively long URLs. According to a recent article over at O3 Magazine, the authors took a typical Facebook home page, looked at the traffic statistics from compete.com, and figured out the bandwidth savings if Facebook switched from URL paths which, in some cases, run over 150 characters in length, to shorter ones. The article also looks at the impact on service providers of the bandwidth wasted by the subsequent GET requests for these excessively long URLs. Facebook is just one example; many other sites have similar problems, as do CMS products such as WordPress. It's an interesting approach to web optimization for high traffic sites."

  • by teeloo (766817) on Friday March 27, 2009 @06:05PM (#27364267)
    Compression to shorten the URLs?
    • by Skal Tura (595728)

      The handshake for the compression and the extra packet headers would probably cost more than the potential benefit. Not worth the effort.

    • by corsec67 (627446)

      You mean something like mod_gzip?

      That leaves only the URL in the request header; the rest should (already) be compressed by mod_gzip.

      • by tepples (727027)
        Take a page full of short URLs and a page full of long URLs. Run them both through mod_gzip. The page with short URLs will still probably come out smaller.
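A rough way to try the experiment described above. This is only a sketch: the two URL shapes (`/p/<id>` and `/profile.php?...`) are made up for illustration, and Python's `gzip` module stands in for whatever mod_gzip would do on a real server.

```python
import gzip
import hashlib


def build_page(make_url, n_links=250):
    """Return a toy HTML page with n_links anchors; make_url maps an index to a URL."""
    rows = [f'<a href="{make_url(i)}">link {i}</a>' for i in range(n_links)]
    return "\n".join(rows).encode("ascii")


def short_url(i):
    # Hypothetical short form: just a numeric id.
    return f"/p/{i}"


def long_url(i):
    # Hypothetical long form: the same id plus a per-link token and a tracking parameter.
    token = hashlib.md5(str(i).encode("ascii")).hexdigest()
    return f"/profile.php?id={i}&ref=newsfeed&token={token}"


short_page = build_page(short_url)
long_page = build_page(long_url)

short_gz = gzip.compress(short_page, compresslevel=9)
long_gz = gzip.compress(long_page, compresslevel=9)

print("raw bytes: ", len(short_page), "vs", len(long_page))
print("gzip bytes:", len(short_gz), "vs", len(long_gz))
```

How big the gap is after compression depends heavily on how repetitive the long URLs are; the per-link token here is deliberately hard to compress.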
        • by jd (1658) <imipak@@@yahoo...com> on Friday March 27, 2009 @06:48PM (#27364911) Homepage Journal

          Most of the time, yes, but then there's a question of trade-off. Small URLs are generally hashes and are hard to type accurately and hard to remember. On the other hand, if you took ALL of the sources of wastage in bandwidth, what percentage would you save by compressing pages vs. compressing pages + URLs or just compressing URLs?

          It might well be the case that these big web services are so inefficient with bandwidth that there are many things they could do to improve matters. In fact, I consider that quite likely. Those times I've done web admin stuff, I've rarely come across servers that have compression enabled.

          • Those times I've done web admin stuff, I've rarely come across servers that have compression enabled.

            Not sure why you would see that. Even for small sites that don't come close to hitting their minimum bandwidth allocation, using mod_gzip improves the visitor's experience because the HTML and CSS files download a lot faster and the processing overhead is minimal.

            As for this story, I think whoever wrote it had this epiphany while they were stoned. There are so many other ways that Facebook could save bandwidth if they wanted to that would be easier.

            75Mb/s is probably nothing to a site like Facebook. Let's

            • by dgatwood (11270) on Friday March 27, 2009 @08:14PM (#27365951) Journal

              Depending on your network type, you may not get any benefit from shorter URLs at all. Many networking protocols use fixed-size frames, which then get padded with zeroes up to the end of the frame. For example, in ATM networks, anything up to 48 bytes is a single frame, so depending on where that URL occurs relative to the start of a frame, it's possible that it would take a 48 byte URL to cause even one extra frame to be sent.
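The padding argument can be sketched with a few lines of arithmetic. This is a simplification: it only uses the 48-byte payload figure from the comment and ignores AAL5 trailers and TCP/IP headers, and the 300 bytes of "other" request data is an invented number.

```python
import math

ATM_CELL_PAYLOAD = 48  # payload bytes per fixed-size cell, per the comment above


def cells_needed(payload_bytes):
    """Number of fixed-size cells needed; the last one is padded out with zeroes."""
    return math.ceil(payload_bytes / ATM_CELL_PAYLOAD)


# Hypothetical request: 300 bytes of other headers plus the URL.
other_bytes = 300
for url_len in (150, 140, 100):
    total = other_bytes + url_len
    print(f"URL of {url_len} bytes -> {total} bytes -> {cells_needed(total)} cells")
```

With these made-up numbers, trimming 10 bytes off the URL sends exactly as many cells as before; only the 50-byte trim drops one cell.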

              Either way, this is like complaining about a $2 budget overrun on a $2 billion project. Compared with the benefits of compressing the text content, moving all your scripts into separate files so they can be cached (Facebook sends over 4k of inline JavaScript with every page load for certain pages), generating content dynamically in the browser based on high density XML without all the formatting (except for the front page, Facebook appears to be predominantly server-generated HTML), removing every trace of inline styles (Facebook has plenty), reducing the number of style sheet links to a handful (instead of twenty), etc., the length of URLs is a trivial drop in the bucket.

    • by truthsearch (249536) on Friday March 27, 2009 @06:24PM (#27364577) Homepage Journal

      They should just move all the GET parameters to POST. Problem solved. ;)

      • by slummy (887268)
        That wouldn't be very UX-centric.

        If pages continually POST to each other, hitting the browser's back button will display the annoying alert asking you to "Resend POST data".
        • by jd (1658)

          Then dump CGI-like syntax completely and use applets that send back data via sockets.

        • by smellotron (1039250) on Friday March 27, 2009 @09:40PM (#27366725)

          You're missing the joke... GET requests look like this:

          GET /url?a=b&c=d HTTP/1.0

          POST requests look like this:

          POST /url HTTP/1.0
          a=b&c=d

          Same amount of content... URL looks shorter, but the exact same data as the querystring gets sent inside the request body. Thus, switching from GET to POST does not alter the bandwidth usage at all, even if it makes the URL seen in the browser look shorter.

      • by dgatwood (11270) on Friday March 27, 2009 @09:09PM (#27366447) Journal

        And even with the wink, this still got initially moderated "Interesting" instead of "Funny".... *sigh*

        To clarify the joke for those who don't "GET" it: in HTTP, POST requests are either encoded the same way as GET requests (with some extra bytes) or using MIME encoding. If you use the GET-style encoding, the number of bytes sent should differ by... the extra byte in the word "POST" versus "GET" plus two extra CR/LF pairs and a CR/LF-terminated Content-length header, IIRC.... And if you use MIME encoding for the POST content, the size of the data balloons to orders of magnitude larger unless you are dealing with large binary data objects like a JPEG upload or something similar.

        So basically, a POST request just hides the URL-encoded data from the user but sends almost exactly the same data over the wire.
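To put numbers on the GET versus POST comparison, here is a minimal sketch that builds both requests as raw bytes and measures them. The `/url` path and the `a=b&c=d` parameters come from the example above; the Content-Type header is an assumption about what a typical form POST would add.

```python
from urllib.parse import urlencode

params = {"a": "b", "c": "d"}
query = urlencode(params)  # "a=b&c=d"

# GET: the parameters ride in the request line.
get_request = (
    f"GET /url?{query} HTTP/1.0\r\n"
    "\r\n"
).encode("ascii")

# POST: the same parameters move into the body, plus headers describing the body.
post_request = (
    "POST /url HTTP/1.0\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(query)}\r\n"
    "\r\n"
    f"{query}"
).encode("ascii")

print("GET bytes: ", len(get_request))
print("POST bytes:", len(post_request))
```

The URL-encoded data is byte-for-byte identical in both; the POST is simply larger by the header overhead.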

    • by gbh1935 (987266) on Friday March 27, 2009 @07:09PM (#27365191)
      This thread is wasting more bandwidth
  • by slummy (887268) <shawnuthNO@SPAMgmail.com> on Friday March 27, 2009 @06:06PM (#27364295) Homepage
    WordPress lets you configure URL rewriting. The default is set to something like: http://www.mysite.com/?p=1 [mysite.com].

    For SEO purposes it's always handy to switch to the more popular form: http://www.mysite.com/2009/03/my-title-of-my-post.html [mysite.com].

    Suggesting that we cut URLs that help Google rank our pages higher is preposterous.
  • Who knows? (Score:5, Funny)

    by esocid (946821) on Friday March 27, 2009 @06:08PM (#27364313) Journal
    Are forums (fora?) like these wasting bandwidth as well by allowing nerds, like myself, to banter about minutiae (not implying this topic)? Discuss amongst yourselves.

    • Re:Who knows? (Score:5, Insightful)

      by phantomfive (622387) on Friday March 27, 2009 @06:20PM (#27364535) Journal
      Seriously. No one better tell him about the padding in the IP packet header. A whole four bits is wasted in every packet that gets sent. More if it's fragmented. Or what about the fact that http headers are in PLAIN TEXT? Talk about a waste of bandwidth.

      In reality I think by watching one youtube movie you've used more bandwidth than you will on facebook URLs in a year.
    • One man's waste is another man's treasure. Some say, "The world is my oyster." I say, "The world is my dumpster."

      Wasted bandwidth, indeed.

    • by jd (1658)

      I discussed it with myselves, but there was no agreement. Well, other than the world should use IPv6 or TUBA and enable multicasting by default.

  • by Foofoobar (318279) on Friday March 27, 2009 @06:08PM (#27364315)
    The PHPulse framework [phpulse.com] is a great example of a better way to do it. It uses one variable sent for all pages, which it then sends to a database (rather than an XML page) where it stores the metadata of how all the pages interrelate. As such, it doesn't need to parse strings, it is easier to build SEO-optimized pages, and it can speed up page loads by 10 times over other MVC frameworks.
    • Re: (Score:3, Insightful)

      by mattwarden (699984)

      You mean that ?area=51 crap? How is http://mysite.com/?area=51 [mysite.com] usable?

      (Unless the page is about government conspiracies, I guess.)

  • by markov_chain (202465) on Friday March 27, 2009 @06:09PM (#27364331) Homepage

    The short Facebook URLs waste bandwidth too ;)

  • Wordpress? (Score:4, Informative)

    by BradyB (52090) on Friday March 27, 2009 @06:09PM (#27364335) Homepage

    By default Wordpress produces short urls.

  • Waste of effort (Score:5, Interesting)

    by El_Muerte_TDS (592157) <elmuerte@dr u n k snipers.com> on Friday March 27, 2009 @06:09PM (#27364339) Homepage

    Of all things that could be optimized, URLs shouldn't have a high priority (unless you want people to enter them manually).
    I'm pretty sure their HTML, CSS, and JavaScript could be optimized way more than just their URLs.
    But rather than simple sites, people often want them to be filled with crap (which nobody but themselves cares about).

    PS: that doesn't mean you shouldn't try to create "nice" URLs instead of incomprehensible URLs that contain things like article.pl?sid=09/03/27/2017250

    • Re:Waste of effort (Score:5, Insightful)

      by JCY2K (852841) on Friday March 27, 2009 @06:13PM (#27364413)

      Of all things that could be optimized, URLs shouldn't have a high priority (unless you want people to enter them manually). I'm pretty sure their HTML, CSS, and JavaScript could be optimized way more than just their URLs. But rather than simple sites, people often want them to be filled with crap (which nobody but themselves cares about).

      PS: that doesn't mean you shouldn't try to create "nice" URLs instead of incomprehensible URLs that contain things like article.pl?sid=09/03/27/2017250

      To your PS, most of that is easily comprehensible. It was an article that ran today; only the 2017250 is meaningless in itself. Perhaps article.pl?sid=09/03/27/Muerte/WasteOfEffort would be better, but we're trying to shorten things up.

    • Re:Waste of effort (Score:5, Interesting)

      by krou (1027572) on Friday March 27, 2009 @06:20PM (#27364521)

      Exactly. If they wanted to try to optimize the site, they could start by looking at the number of JavaScript files they include (8 on the homepage alone) and the number of HTTP requests each page requires. My Facebook page has *20* files getting included alone.

      From what I can judge, a lot of their Javascript and CSS files don't seem to be getting cached on the client's machine either. They could also take a look at using CSS sprites to reduce the number of HTTP requests required by their images.

      I mean, clicking on the home button is a whopping 726KB in size (with only 145 KB coming from cache), and 167 HTTP requests! Sure, a lot seem to be getting pulled from a content delivery network, but come on, that's a bit crazy.

      Short URIs are the least of their worries.

    • by gknoy (899301)

      Depending on the link density of one's pages that are actually served out to users, the bits used by the links themselves might be a large proportion of the page that is served. Yes, there's other stuff (images, javascript), but from the server's perspective those might be served someplace else -- they're just naming them. If the links can be shortened, especially for temporary things not meant to be indexed, it can save some bandwidth.

      I'm not saying it's a primary way to save bandwidth, just that it's an

  • Irrelevant (Score:5, Insightful)

    by Skal Tura (595728) on Friday March 27, 2009 @06:09PM (#27364345) Homepage

    It's an irrelevantly small portion of the traffic. At the scale of Facebook it could save some traffic, but not enough to make an impact on the bottom line that's worth the effort!

    150 chars long URL = 150 bytes VS 50 KILObytes + images for the rest of the pageview....

    I'm pulling that 50 kilobytes for the full page text out of my head, but a pageview often runs at over 100 KB.

    So it's totally irrelevant whether they can shave a whopping 150 bytes off the 100 KB.

    • ya. i hav better idea. ppl shuld just talk in txt format. saves b/w. and whales. l8r

      Seriously, though, I don't exactly get how a shorter URL is going to Save Our Bandwidth. Seems like making CNET articles that make you click "Next" 20 times into one page would be even more effective. ;)

      The math, for those interested:

      So to calculate the bandwidth utilization we took the visits per month (1,273,004,274) and divided it by 31. Giving us 41,064,654. We then multiplied that by 20, to give us the transfer in kilobytes per day of downstream waste, based on 20k of waste per visit. This gave us 821293080, which we then divided by 86400 which is the number of seconds in a day. This gives us 9505 kilobytes per second, but we want it in kilobits, so we multiply it by 8. Giving us 76040, finally we divide that by 1024 to give us the value in MBits/sec. Giving us 74Mbit/sec. One caveat with these calculations is that we do not factor in gzip compression. Using gzip compression, we could safely divide the bandwidth wasting figures by about 50%. Browser caching does not factor in the downstream values, as we are calculating the waste just on the HTML file. It could impact the upstream usage as not all objects may be requested with every HTML request.
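For anyone who wants to check the quoted arithmetic, here it is as a short script. It assumes the monthly visit figure is 1,273,004,274, which is the value consistent with the quoted daily figure of 41,064,654; the 20 KB of waste per visit is the article's own estimate, taken as given.

```python
visits_per_month = 1_273_004_274  # monthly visits, consistent with the quoted daily figure
waste_kb_per_visit = 20           # the article's per-visit waste estimate, in kilobytes
seconds_per_day = 86_400

visits_per_day = visits_per_month / 31
kb_per_day = visits_per_day * waste_kb_per_visit
kb_per_second = kb_per_day / seconds_per_day
mbit_per_second = kb_per_second * 8 / 1024

print(f"visits per day: {visits_per_day:,.0f}")
print(f"KB per second:  {kb_per_second:,.0f}")
print(f"Mbit per second: {mbit_per_second:.1f}")  # about 74, matching the article
```

The result is only as good as the 20 KB input, and it counts one page per visit.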

      • Re:Irrelevant (Score:4, Interesting)

        by Skal Tura (595728) on Friday March 27, 2009 @06:37PM (#27364749) Homepage

        So to calculate the bandwidth utilization we took the visits per month (1,273,004,274) and divided it by 31. Giving us 41,064,654. We then multiplied that by 20, to give us the transfer in kilobytes per day of downstream waste, based on 20k of waste per visit. This gave us 821293080, which we then divided by 86400 which is the number of seconds in a day. This gives us 9505 kilobytes per second, but we want it in kilobits, so we multiply it by 8. Giving us 76040, finally we divide that by 1024 to give us the value in MBits/sec. Giving us 74Mbit/sec. One caveat with these calculations is that we do not factor in gzip compression. Using gzip compression, we could safely divide the bandwidth wasting figures by about 50%. Browser caching does not factor in the downstream values, as we are calculating the waste just on the HTML file. It could impact the upstream usage as not all objects may be requested with every HTML request.

        roflmao! I should've RTFA!

        This is INSULTING! Who could eat this kind of total crap?

        Where the F are the Slashdot editors?

        Those guys just decided per-visit waste is 20 KB? No reasoning, no nothing? Plus, they didn't look at pageviews, just visits ... Uh, 1 visit = many pageviews.

        So let's do the right maths:
        41,064,654 visits
        A site like Facebook would probably have around 30 or more pageviews per visit. Let's settle for 30.

        1,231,939,620 pageviews per day.

        150 average length of URL. Could be compressed down to 50. 100 bytes to be saved per pageview.

        123,193,962,000 bytes of waste, 120,306,603 KB per day, or 1392 KB per sec.

        In other words:
        1392 * 8 = 11136Kbps = 10.875Mbps.

        100Mbps guaranteed costs 1300$ a month ... They are wasting a whopping 130$ a month on long urls ...

        So, the RTFA is total bullshit.
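The recalculation above, spelled out as a script. All inputs are the commenter's own guesses (30 pageviews per visit, 100 bytes saved per pageview, $1300 a month per 100 Mbit/s); nothing here is measured.

```python
visits_per_day = 41_064_654
pageviews_per_visit = 30        # the commenter's guess
bytes_saved_per_pageview = 100  # a 150-byte URL shortened to 50 bytes
seconds_per_day = 86_400

bytes_per_day = visits_per_day * pageviews_per_visit * bytes_saved_per_pageview
kb_per_second = bytes_per_day / 1024 / seconds_per_day
mbit_per_second = kb_per_second * 8 / 1024

usd_per_100mbit_month = 1300  # price quoted in the comment
monthly_cost = mbit_per_second / 100 * usd_per_100mbit_month

print(f"bytes per day:   {bytes_per_day:,}")
print(f"KB per second:   {kb_per_second:,.0f}")
print(f"Mbit per second: {mbit_per_second:.2f}")
print(f"USD per month:   {monthly_cost:.0f}")
```

Unrounded, the monthly figure lands a little above the $130 quoted, which doesn't change the conclusion. Note this still counts one URL per pageview.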

        • Re:Irrelevant (Score:4, Informative)

          by Anonymous Coward on Friday March 27, 2009 @09:02PM (#27366377)

          You missed the previous paragraph of the article where they explained where they got the 20k value, perhaps you should read the article first. :)

          They rounded down the number of references, but on an average Facebook home.php file there are 250+ HREF or SRC references in excess of 120 characters. They assumed that these could each be shaved by 80 bytes. That's 80 bytes x 250 references = 20,000 bytes or 20k.

          Your math is wrong; it's taking into account just one URL, when there are 250 references on home.php alone! They did not even factor in more than one page view per visit. If they did it your way, you would be looking at far more bandwidth utilization than 74MBit/sec.
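The 20k figure can be reproduced on any saved page with a small counter. This is a sketch of the method as described in the comment, not the article's actual tooling: the 120-character threshold and 80-byte saving are the numbers quoted above, and the test page is synthetic.

```python
from html.parser import HTMLParser


class LongReferenceCounter(HTMLParser):
    """Collect href/src attribute values longer than a threshold."""

    def __init__(self, min_length=120):
        super().__init__()
        self.min_length = min_length
        self.long_refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and len(value) > self.min_length:
                self.long_refs.append(value)


def estimated_waste(html, min_length=120, saving_per_ref=80):
    """Bytes saved if every long reference were shortened by saving_per_ref bytes."""
    counter = LongReferenceCounter(min_length)
    counter.feed(html)
    return len(counter.long_refs) * saving_per_ref


# Synthetic page: one short link plus 250 links longer than 120 characters.
long_url = "http://example.com/" + "x" * 120
page = '<a href="/short">s</a>' + "".join(
    f'<a href="{long_url}?n={i}">l</a>' for i in range(250)
)
print(estimated_waste(page))  # 250 * 80 = 20000
```

Feeding it a real home.php would show whether the 250-reference count holds up.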

  • Twitter clients (including the default web interface) auto-tinyURL every URL put into it. Clicking on the link involves not one but *2* HTTP GETs and one extra roundtrip.

    How long before tinyurl (and bit.ly, ti.ny, wht.evr...) are cached across the internet, just like DNS?

  • by nysus (162232) on Friday March 27, 2009 @06:11PM (#27364367)

    This is ridiculous. If I have a billion dollars, I'm not going to worry about saving 50 cents on a cup of coffee. The bandwidth used by these urls is probably completely insignificant.

    • That's a funny way to look at it. If I save 50 cents a day on my cup of coffee I will have another billion dollars in just 5479452 years (roughly). And that's excluding compound interest!

      • Re: (Score:2, Offtopic)

        by jd (1658)

        Just how interesting are the compounds in coffee, anyway?

    • by scdeimos (632778) on Friday March 27, 2009 @06:23PM (#27364565)

      I think the O3 article and the parent have missed the real point. It's not the length of the URLs that's wasting bandwidth, it's how they're being used.

      A lot of services append useless query parameters (like "ref=logo" etc. in the Facebook example) to the end of every hyperlink instead of using built-in HTTP functionality like the HTTP-Referer client request header to do the same job.

      This causes proxy servers to retrieve multiple copies of the same pages unnecessarily, such as http://www.facebook.com/home.php [facebook.com] and http://www.facebook.com/home.php?ref=logo [facebook.com], wasting internet bandwidth and disk space at the same time.
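The duplicate-copy problem comes down to the cache key. A hedged sketch of the idea: strip parameters that only exist for tracking before using the URL as a key. Which parameters are safe to drop is an assumption here (just `ref`); a real cache could not know that without being told.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change the content of the page.
TRACKING_PARAMS = {"ref"}


def cache_key(url):
    """Return the URL with tracking-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


plain = cache_key("http://www.facebook.com/home.php")
tagged = cache_key("http://www.facebook.com/home.php?ref=logo")
print(plain == tagged)  # True: both map to the same cached copy
```

Without that normalization, a proxy has to treat the two URLs as different resources.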

      • Re: (Score:3, Insightful)

        by XanC (644172)

        You can't ever rely on the HTTP-Referer header to be there. Much of the time, it isn't; either the user has disabled it in his browser, or some Internet security suite strips it, or something. I'm amazed at the number of sites that use it for _authentication_!

        • Plus (Score:3, Insightful)

          by coryking (104614) *

          The HTTP-Referer isn't designed for ?ref=somesource

          Your stat software wants to know if more people click to your page through the logo ?ref=mylogo or through a link in the story ?ref=story. The Referer can't give you that info.

          The HTTP-Referer also is no good for aggregation. It only give you a URL. If you didn't append something like ?campaign=longurl, it would be almost impossible to track things like ad-campaigns.

          HTTP-Referers *are* good for dealing with myspace image leeches. If you haven't I sugges

  • by kenh (9056)

    How many times are the original pages called? Is this really the resource hog?

    What about compressing images, trimming them to their ultimate resolution?

    How about banishing the refresh tags that cause pages to refresh while otherwise inactive? Drudgereport.com is but one example where the page refreshes unless you browse away from it...

    If you really want to cut down on bandwidth usage, eliminate political commenting and there will never be a need for Internet 2!

    • by jd (1658)

      Replacing all the images with random links to adult sites would save considerable bandwidth and I doubt the users would notice the difference.

  • Wow. Just wow. (Score:4, Informative)

    by NerveGas (168686) on Friday March 27, 2009 @06:11PM (#27364379)

    75 whole freaking megabits? WOWSERS!!!!

    They must be doing gigabits for images, then. Complaining about the URLs is complaining about the 2 watts your wall-wart uses when idle, all the while using a 2kW air conditioner.

    • Typical half-assed slack-alism. HEY! If I take a really small number and multiply it by a REALLY HUGE number, I get a REALLY BIG NUMBER! The end is nigh! Panic and chaos!!!
    • by Dynedain (141758)

      Even worse, it's like complaining about one person's wall-wart in an entire city of homes using air-conditioners.

  • by JWSmythe (446288) * <jwsmythe@jwsm[ ]e.com ['yth' in gap]> on Friday March 27, 2009 @06:12PM (#27364383) Homepage Journal

        This is a stupid exercise. Oh my gosh, there's an extra few characters wasted. They're talking about 150 characters, which would be 150 bytes, or (gasp) 0.150KB.

        10 times the bandwidth could be saved by removing a 1.5KB image from the destination page, or doing a little added compression to the rest of the images. The same can be said for sending out the page itself gzipped.

        We did this exercise at my old work. We had relatively small pages. 10 pictures per page, roughly 300x300, a logo, and a very few layout images. We saved a fortune in bandwidth by compressing the pictures just a very little bit more. Not a lot. Just enough to make a difference.

        Consider taking 100,000,000 hits in a day. Bringing a 15KB image to 14KB would be .... wait for it .... 100GB per day saved in transfers.

        The same can be said for conserving the size of the page itself. Badly written pages (and oh are there a lot of them out there) not only take up more bandwidth because they have a lot of crap code in them, but they also tend to take longer to render.

        I took one huge badly written page, stripped out the crap content (like, do you need a font tag on every word?), cleaned up the table structure (this was pre-CSS), and the page loaded much faster. That wasn't just the bandwidth savings, that was a lot of overhead on the browser where it didn't have to parse all the extra crap in it.

        I know they're talking about the inbound bandwidth (relative to the server), which is usually less than 10% of the traffic. Most of the bandwidth is wasted in the outbound bandwidth. That's all anyone really cares about. Server farms only look at outbound bandwidth, because that's always the higher number, and the driving factor of their 95th percentile. Home users all care about their download bandwidth, because that's what sucks up the most for them. Well, unless they're running P2P software. I know I was a rare (but not unique) exception, where I was frequently sending original graphics in huge formats, and ISO's to and from work.

    • Re: (Score:3, Informative)

      by Skal Tura (595728)

      it's actually not even 0.15Kb, it's 0.146kb >;)

      and 100mil hits, 1kb saved = 95.36Gb saved.

      You mixed up marketing, and in-use computer kilos, gigas etc. 1Kb !== 1000 bytes, 1Kb === 1024bytes :)
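The two answers differ only in which kilo and giga you pick. A quick sketch of both conventions, using the 100,000,000 hits and 1 KB per hit from the comments above:

```python
def saved_decimal_gb(hits, kb_per_hit):
    """Decimal ("marketing") units: 1 KB = 1000 bytes, 1 GB = 1000**3 bytes."""
    return hits * kb_per_hit * 1000 / 1000**3


def saved_binary_gib(hits, kb_per_hit):
    """Binary units: 1 KB = 1024 bytes, 1 GB = 1024**3 bytes."""
    return hits * kb_per_hit * 1024 / 1024**3


hits = 100_000_000
print(f"decimal: {saved_decimal_gb(hits, 1):.2f} GB")
print(f"binary:  {saved_binary_gib(hits, 1):.2f} GB")
print(f"150 bytes = {150 / 1000:.3f} KB decimal, {150 / 1024:.3f} KB binary")
```

Either way the saving is in the same ballpark; the disagreement is under 5%.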

      • by JWSmythe (446288) *

            Nah, I just never converted the KB (Bytes) of file size and string size (8 bit characters are 1 byte), so I never converted it down to the Kb/s (kilobits per second) for bandwidth measurement. :)

    • by drinkypoo (153816)

      While you have a good point, your argument can be summed up as "I've already been shot, so it's okay to stab me."

      • by JWSmythe (446288) *

            Naw, it's more like, I'd rather be poked with that blunt stick than shot with a cannon. :)

    • Badly written pages (and oh are there a lot of them out there) not only take up more bandwidth because they have a lot of crap code in them, but they also tend to take longer to render.

      ebay has "upgraded" their local site http://my.ebay.com.au/ and "my ebay" is now a 1M byte download. That's ONE MILLION BYTES to show about 7K of text and about 20 x 2 KB thumbnails.

      The best bit is that the htm file itself is over 1/2 Mbyte. Then there's two 150K+ js files and a 150k+ css file.

      Web "designers" should be for

    • This is a stupid exercise. Oh my gosh, there's an extra few characters wasted. They're talking about 150 characters, which would be 150 bytes, or (gasp) 0.150KB.

      Perhaps, but I'm reminded of the time when I started getting into the habit of stripping Unsubscribe footers (and unnecessarily quoted Unsubscribe footers) from all the mailing lists (many high volume) that I subscribed to. During testing, I found the average mbox was reduced in size by between 20 and 30%.

      If you accept the premise that waste is

  • But how much is that as a proportion of their total bandwidth usage? If they were worried about bandwidth, I'm sure they could just compress the images a little more and save much more.
  • by RobertB-DC (622190) * on Friday March 27, 2009 @06:12PM (#27364407) Homepage Journal

    Seriously. Long URL's as wasters of bandwidth? There's a flash animation ad running at the moment (unless you're an ad-blocking anti-capitalist), and I would expect it uses as much bandwidth when I move my mouse past it as a hundred long URL's.

    I'm not apologizing for bandwidth hogs... back in the dialup days (which are still in effect in many situations), I was a proud "member" of the Bandwidth Conservation Society [blackpearlcomputing.com], dutifully reducing my .jpgs instead of just changing the Height/Width tags. My "Wallpaper Heaven" website (RIP) pushed small tiling backgrounds over massive multi-megabyte images. But even then, I don't think a 150-character URL would have appeared on their threat radar.

    It's a drop in the bucket. There are plenty of things wrong with 150-character URLs, but bandwidth usage isn't one of them.

    • by Skal Tura (595728)

      lol, i used to run Wallpaper Haven :)

      People came to me complaining "it's not haven, it's heaven!" Ugh ... Didn't know what Haven means :D

  • If you take and type a full page (no carriage returns) into notepad and save it, you end up with 5kb per printed page at the default font/print settings. When was the last time that a web page designer cared about 5kb? If 150 bytes (yes, 150 char's) is a concern, trim back on the dancing babies and mp3 backgrounds before you get rid of the ugly url's.

    Besides, if not for those incredibly long and in need of shortening URL's, how else would we be able to feed rick astley's music video youtube link into tiny
    • Re: (Score:3, Interesting)

      by Overzeetop (214511)

      Actually, when I had my web page designed (going on 4 years ago), I specifically asked that all of the pages load in less than 10 seconds on a 56k dialup connection. That was a pretty tall order back then, but it's a good standard to try and hit. It's somewhat more critical now that there are more mobile devices accessing the web, and the vast majority of the country won't even get a sniff at 3G speeds for more than a decade. There is very little that can be added to a page with all the fancy programming we

  • What's the percentage savings? Is it enough to care or is it just another fun fact?

    Simplifying / minifying / consolidating JavaScript and reducing the number of sockets required to load a page would probably be more bang for the buck. Is it worth worrying about?

  • First, absolute numbers mean nothing. It is like comparing the 200 million spent on some wasteful federal program with the 20 I waste on coffee over an equal period. We don't know the percentage of the total, how much it would really save, or even whether the problem can be fixed. As it is, this is just random showboating, perhaps interesting from a theoretical sense if the math is correct, but given the absolutes I doubt the author can really do the math correctly.

    Second, define 'waste'. Most rational people would argue that facebook i

  • 75 MBit/s? What's that in Libraries of Congress per decifortnight?

  • it goes in cycles... you get better hardware, then you saturate it with software. Then you get better software and you saturate it with hardware.

    Currently, we can apply said metaphor with internet connections. We started with jpegs. We had low baud modems. We then moved on to moving pictures we needed to download. They upped it to cable. Now we are to the point where the demand for fiber to your house is going to be needed in most situations.

    Think how we've moved from dumb terminals to workstati
  • by Anonymous Coward on Friday March 27, 2009 @06:22PM (#27364553)

    For an even more egregious example of web design / CMS fail, take a look at the HTML on this page [theglobeandmail.com].

    $ wc wtf.html
    12480 9590 166629 wtf.html

    I'm not puzzled by the fact that it took 166 kilobytes of HTML to write 50 kilobytes of text. That's actually not too bad. What takes it from bloated into WTF-land is the fact that that page is 12,480 lines long. Moreover...

    $ vi wtf.html

    ...the first 1831 lines (!) of the page are blank. That's right, the <!DOCTYPE... declaration is on line 1832, following 12 kilobytes of 0x20, 0x09, and 0x0a characters - spaces, tabs, and linefeeds. Then there's some content, and then another 500 lines of tabs and spaces between each chunk of text. WTF? (Whitespace, Then Failure?)

    Attention Globe and Mail web designers: When your idiot print newspaper editor tells you to make liberal use of whitespace, this is not what he had in mind!
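The same inspection can be scripted instead of eyeballed in vi. A small sketch: the sample below is a synthetic stand-in for the page described above (1831 whitespace-only lines, then the doctype), not the actual Globe and Mail HTML.

```python
def leading_blank_lines(text):
    """Count lines before the first line that has non-whitespace content."""
    count = 0
    for line in text.splitlines():
        if line.strip():
            break
        count += 1
    return count


def whitespace_bytes(data):
    """Count space (0x20), tab (0x09) and linefeed (0x0a) bytes."""
    return sum(data.count(byte) for byte in (b" ", b"\t", b"\n"))


# Synthetic stand-in: 1831 whitespace-only lines, then some markup.
sample = ("\t \n" * 1831 + "<!DOCTYPE html>\n<p>content</p>\n").encode("ascii")

print(leading_blank_lines(sample.decode("ascii")))  # 1831
print(whitespace_bytes(sample), "of", len(sample), "bytes are whitespace")
```

Run against a saved copy of a real page, the two numbers give a quick read on how much of the transfer is padding.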

    • ...except they aren't using mod_gzip/deflate. At first I thought you browsed the web RMS style [lwn.net] and maybe wc* didn't support compression** and you were just getting what you deserved***, but then I checked in firefox and lo and behold:


      Response Headers - http://www.theglobeandmail.com/blogs/wgtgameblog0301/ [theglobeandmail.com]

      Date: Fri, 27 Mar 2009 23:39:54 GMT
      Server: Apache
      P3P: policyref="http://www.theglobeandmail.com/w3c/p3p.xml", CP="CAO DSP COR CURa ADMa DEVa TAIa PSAa PSDa CONi OUR NOR IND PHY ONL UNI COM NAV INT DEM STA

    • by LateArthurDent (1403947) on Friday March 27, 2009 @07:57PM (#27365769)

      ...the first 1831 lines (!) of the page are blank...Attention Globe and Mail web designers: When your idiot print newspaper editor tells you to make liberal use of whitespace, this is not what he had in mind!

      Believe it or not, someone had it in mind. This is most likely a really, really stupid attempt at security by obscurity.

      PHB:My kid was showing me something on our website, and then he just clicked some buttons and the entire source code was available for him to look at. You need to do something about that.
      WebGuy:You mean the html code? Well, that actually does need to get transferred. You see, the browser does the display transformation on the client's computer...
      PHB:The source code is our intellectual property!
      WebGuy:Fine. We'll handle it. ::whispering to WebGuy #2:: Just add a bunch of empty lines. When the boss looks at it, he won't think to scroll down much before he gives up.
      PHB:Ah, I see that when I try to look at the source it now shows up blank! Good work!

    • Re: (Score:3, Interesting)

      by DavidD_CA (750156)

      Wow. Judging by the patterns that I see in the "empty" lines, it looks like their CMS tool has a bug in it that is causing some sections to overwrite (or in this case, append instead).

      I'd bet that every time they change their template, they are adding another set of "empty" lines here and there, rather than replacing them.

  • by kheldan (1460303) on Friday March 27, 2009 @06:32PM (#27364675) Journal
    Dear Customer,
    In order to maximize the web experience for all customers, effective immediately all websites with URLs in excess of 16 characters will be bandwidth throttled.

    Sincerely,
    Comcast

  • Has anyone here even looked at what the real motivation behind this study is? It's to create the idea that web hosts are, surprisingly, wasting the valuable bandwidth provided by your friendly ISPs. Do a few stories like this over a few years, and suddenly having Comcast charge Google for the right to appear on Comcast somehow seems fair. The bottom line is, as a consumer, it's my bandwidth and I can do whatever I want with it. If I want to go to a web site that has 20,000-character URLs, then that's where I'm headed.
  • I hope this is obvious to most people here, but reading some comments, I'm not sure, so...

    The issue is that a typical Facebook page has 150 links on it. If you can shorten *each* of those URLs in the HTML by 100 characters, that's almost 15KB you knocked off the size of that one page. Not huge, but add that up over a visit, and for each visit, and it really does add up.

    I've been paying very close attention to URL length on all of my sites for years, for just this reason.

  • Better idea (Score:5, Funny)

    by Anonymous Coward on Friday March 27, 2009 @06:43PM (#27364843)

    Just use a smaller font for the URL!

  • No (Score:5, Insightful)

    by kpang (860416) on Friday March 27, 2009 @06:52PM (#27364969) Homepage
    Are Long URLs Wasting Bandwidth?

    No. But this article is.
  • by hrbrmstr (324215) * on Friday March 27, 2009 @06:53PM (#27364981) Homepage Journal

    Isn't Facebook itself the huge waste of bandwidth as opposed to just the verbose URLs it generates?

  • Outside of meaningless SEO strings, a lot of the data in the URL would have to be transmitted otherwise, in cookie or POST data.
  • by Bill Dimm (463823) on Friday March 27, 2009 @07:57PM (#27365777) Homepage

    The problem isn't bandwidth, it is that long URLs are a pain from a usability standpoint. They cause problems in any context where they are spelled out in plain text (instead of being hidden as a link). For example, they often get broken in two when sent in plain text email. When posting a URL into a simple forum that only accepts text (no markup), a long URL can blow out the width of the page.

    Where does this problem come from? It comes from SEO. Website operators realized that Google and other search engines were taking URLs into account, so CMSs and websites switched from using simple URLs (like a numeric document ID) to stuffing whole article titles into the URL to try to boost search rankings. One of the results of this is that when someone finds a typo in an article title and fixes it, the CMS either creates a duplicate page with a slightly different URL, or the URL with the typo ends up giving a 404 error and breaks any links that point to it.
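
    To make the typo problem concrete, here is a minimal sketch of a title-based URL scheme. The `slugify` and `article_url` helpers and the example.com address are hypothetical, not taken from any particular CMS; the point is only that a URL derived from the title changes whenever the title does.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and join its runs of letters/digits with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def article_url(title: str) -> str:
    # Hypothetical URL scheme; the host and path prefix are placeholders.
    return f"http://example.com/articles/{slugify(title)}"


original = article_url("Are Long URLs Wasting Bandwith?")    # typo in the title
corrected = article_url("Are Long URLs Wasting Bandwidth?")  # typo fixed

# The two URLs differ, so existing links to the original either point at a
# stale duplicate or break, unless the site keeps the old URL working.
print(original)
print(corrected)
print(original != corrected)
```

    A URL keyed on a numeric document ID would not change when the title is edited, which is the contrast drawn above.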

    What I don't understand is why search engines bother to look at anything beyond the domain name when determining how to rank search results. How often do you see anything useful in the URL that isn't also in the <title> tag or in an <h1> tag? If search engines would stop using URLs as a factor in ranking pages, people would use URLs that were efficient and useful instead of filling them with junk. The whole thing reminds me of <meta> keyword tags -- to the extent that users don't often look at URLs while search engines do, website operators have an opportunity to manipulate the search engines by stuffing them with junk.

  • Acrnym (Score:3, Funny)

    by noppy (1406485) on Saturday March 28, 2009 @03:54AM (#27368525)

    To fthr sav our bdwdth, ^A txt shld be cmprsd into acrnyms!

