Networking IT

Are Long URLs Wasting Bandwidth? (379 comments)

Ryan McAdams writes "Popular websites, such as Facebook, are wasting as much as 75MBit/sec of bandwidth due to excessively long URLs. According to a recent article over at O3 Magazine, they took a typical Facebook home page, looked at the traffic statistics from compete.com, and figured out the bandwidth savings if Facebook switched from using URL paths which, in some cases, run over 150 characters in length, to shorter ones. It looks at the impact on service providers, with the wasted bandwidth used by the subsequent GET requests for these excessively long URLs. Facebook is just one example; many other sites have similar problems, as well as CMS products such as Word Press. It's an interesting approach to web optimization for high traffic sites."
This discussion has been archived. No new comments can be posted.

  • Waste of effort (Score:5, Interesting)

    by El_Muerte_TDS ( 592157 ) on Friday March 27, 2009 @06:09PM (#27364339) Homepage

    Of all the things that could be optimized, URLs shouldn't have a high priority (unless you want people to enter them manually).
    I'm pretty sure their HTML, CSS, and JavaScript could be optimized far more than their URLs.
    But rather than simple sites, people often want them filled with crap (which nobody but themselves cares about).

    PS: that doesn't mean you shouldn't try to create "nice" URLs instead of incomprehensible ones that contain things like article.pl?sid=09/03/27/2017250

  • by Anaplexian ( 101542 ) on Friday March 27, 2009 @06:10PM (#27364353) Journal

    Twitter clients (including the default web interface) auto-tinyURL every URL put into it. Clicking on the link involves not one but *2* HTTP GETs and one extra roundtrip.

    How long before tinyurl (and bit.ly, ti.ny, wht.evr...) are cached across the internet, just like DNS?
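A shared, DNS-style cache for short-URL expansions, as the parent suggests, could be sketched like this (a hypothetical sketch in Python; the URLs and TTL are made up, and a real cache would honor the redirect's Cache-Control headers):

```python
import time

class ShortUrlCache:
    """Cache short-URL -> expanded-URL mappings, DNS-style.

    A hit avoids the extra HTTP GET round trip to the shortener;
    entries expire after a TTL so stale redirects get re-resolved.
    """

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._entries = {}  # short_url -> (long_url, expiry timestamp)

    def get(self, short_url):
        entry = self._entries.get(short_url)
        if entry is None:
            return None
        long_url, expires = entry
        if time.time() >= expires:
            del self._entries[short_url]  # expired: force re-resolution
            return None
        return long_url

    def put(self, short_url, long_url):
        self._entries[short_url] = (long_url, time.time() + self.ttl)

# Usage: resolve once over the network, then serve repeats locally.
cache = ShortUrlCache(ttl_seconds=3600)
cache.put("http://tinyurl.com/abc123", "http://example.com/a/very/long/path")
print(cache.get("http://tinyurl.com/abc123"))
```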

  • Re:Waste of effort (Score:5, Interesting)

    by krou ( 1027572 ) on Friday March 27, 2009 @06:20PM (#27364521)

    Exactly. If they wanted to try to optimize the site, they could start by looking at the number of JavaScript files they include (8 on the homepage alone) and the number of HTTP requests each page requires. My Facebook page has *20* files getting included alone.

    From what I can judge, a lot of their Javascript and CSS files don't seem to be getting cached on the client's machine either. They could also take a look at using CSS sprites to reduce the number of HTTP requests required by their images.

    I mean, clicking on the home button is a whopping 726KB in size (with only 145 KB coming from cache), and 167 HTTP requests! Sure, a lot seem to be getting pulled from a content delivery network, but come on, that's a bit crazy.

    Short URIs are the least of their worries.

  • by Anonymous Coward on Friday March 27, 2009 @06:22PM (#27364553)

    For an even more egregious example of web design / CMS fail, take a look at the HTML on this page [theglobeandmail.com].

    $ wc wtf.html
    12480 9590 166629 wtf.html

    I'm not puzzled by the fact that it took 166 kilobytes of HTML to write 50 kilobytes of text. That's actually not too bad. What takes it from bloated into WTF-land is the fact that that page is 12,480 lines long. Moreover...

    $ vi wtf.html

    ...the first 1831 lines (!) of the page are blank. That's right, the <!DOCTYPE...> declaration is on line 1832, following 12 kilobytes of 0x20, 0x09, and 0x0a characters - spaces, tabs, and linefeeds. Then there's some content, and then another 500 lines of tabs and spaces between each chunk of text. WTF? (Whitespace, Then Failure?)

    Attention Globe and Mail web designers: When your idiot print newspaper editor tells you to make liberal use of whitespace, this is not what he had in mind!
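The bloat the parent describes is easy to quantify; a rough sketch in Python, with a made-up miniature page standing in for the real HTML:

```python
import gzip

# Made-up stand-in for the page: 1831 leading blank lines, then content.
page = ("\n" * 1831) + "<!DOCTYPE html>\n<html><body>Hello</body></html>\n"

raw_size = len(page.encode())
whitespace = sum(1 for ch in page if ch in " \t\n")
gz_size = len(gzip.compress(page.encode()))

print(f"raw: {raw_size} bytes, whitespace: {whitespace}, gzipped: {gz_size} bytes")
```

Since gzip compresses long runs of identical bytes extremely well, the padding costs almost nothing on the wire when compression is enabled - but it still inflates the uncompressed transfer, and the parser has to chew through it either way.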

  • by scdeimos ( 632778 ) on Friday March 27, 2009 @06:23PM (#27364565)

    I think the O3 article and the parent have missed the real point. It's not the length of the URLs that's wasting bandwidth, it's how they're being used.

    A lot of services append useless query parameters (like "ref=logo" in the Facebook example) to the end of every hyperlink instead of using built-in HTTP functionality like the Referer request header to do the same job.

    This causes proxy servers to retrieve multiple copies of the same pages unnecessarily, such as http://www.facebook.com/home.php [facebook.com] and http://www.facebook.com/home.php?ref=logo [facebook.com], wasting internet bandwidth and disk space at the same time.
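A proxy could avoid that duplication by normalizing away tracking-only parameters before using the URL as a cache key. A sketch (the parameter blacklist is illustrative; a real proxy can't safely assume a parameter is inert without site knowledge):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative: parameters assumed not to change the response.
TRACKING_PARAMS = {"ref"}

def cache_key(url):
    """Strip tracking-only query params so equivalent URLs share one cache entry."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(cache_key("http://www.facebook.com/home.php?ref=logo"))
# -> http://www.facebook.com/home.php
```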

  • Has anyone here even looked at what the real motivation behind this study is? It's to create the idea that web hosts are, surprisingly, wasting the valuable bandwidth provided by your friendly ISPs. Do a few stories like this over a few years, and suddenly having Comcast charge Google for the right to appear on Comcast somehow seems fair. The bottom line is, as a consumer, it's my bandwidth and I can do whatever I want with it. If I want to go to a web site that has 20,000-character URLs, then that's where I'm headed.
  • Re:Irrelevant (Score:4, Interesting)

    by Skal Tura ( 595728 ) on Friday March 27, 2009 @06:37PM (#27364749) Homepage

    So to calculate the bandwidth utilization we took the visits per month (1,273,004,274) and divided it by 31. Giving us 41,064,654. We then multiplied that by 20, to give us the transfer in kilobytes per day of downstream waste, based on 20k of waste per visit. This gave us 821293080, which we then divided by 86400 which is the number of seconds in a day. This gives us 9505 kilobytes per second, but we want it in kilobits, so we multiply it by 8. Giving us 76040, finally we divide that by 1024 to give us the value in MBits/sec. Giving us 74Mbit/sec. One caveat with these calculations is that we do not factor in gzip compression. Using gzip compression, we could safely cut the bandwidth waste figures by about 50%. Browser caching does not factor into the downstream values, as we are calculating the waste just on the HTML file. It could impact the upstream usage as not all objects may be requested with every HTML request.

    roflmao! I should've RTFA!

    This is INSULTING! Who could eat this kind of total crap?

    Where the F are the Slashdot editors?

    Those guys just decided the per-visit waste is 20 KB? No reasoning, no nothing? Plus, they didn't look at pageviews, just visits ... Uh, 1 visit = many pageviews.

    So let's do the right maths:
    41,064,654 visits per day.
    A site like Facebook probably has around 30 or more pageviews per visit; let's settle on 30.

    1,231,939,620 pageviews per day.

    150 characters average URL length, which could be compressed down to 50: 100 bytes saved per pageview.

    123,193,962,000 bytes of waste, 120,306,603 KB per day, or 1392 KB per sec.

    In other words:
    1392 * 8 = 11136 Kbps = 10.875 Mbps.

    100Mbps guaranteed costs $1,300 a month ... They are wasting a whopping ~$130 a month on long URLs ...

    So, TFA is total bullshit.
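Both versions of the arithmetic check out as arithmetic; the disagreement is entirely in the assumptions. A sketch redoing both (the 20 KB-per-visit and 30-pageviews-per-visit figures are the respective authors' assumptions, not measurements):

```python
visits_per_day = 41_064_654

# TFA's version: 20 KB of waste per visit, converted to Mbit/s.
kb_per_day = visits_per_day * 20
tfa_mbit_per_sec = kb_per_day / 86_400 * 8 / 1024
print(f"TFA: {tfa_mbit_per_sec:.1f} Mbit/s")        # ~74 Mbit/s

# The rebuttal: 30 pageviews per visit, 100 bytes saved per pageview.
pageviews_per_day = visits_per_day * 30
bytes_per_day = pageviews_per_day * 100
rebuttal_mbps = bytes_per_day / 1024 / 86_400 * 8 / 1024
print(f"Rebuttal: {rebuttal_mbps:.1f} Mbit/s")      # ~10.9 Mbit/s
```

The sevenfold gap comes from counting 100 bytes per pageview instead of 20,000 bytes per visit - the arithmetic on either side is fine.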

  • by MoFoQ ( 584566 ) on Friday March 27, 2009 @06:42PM (#27364823)

    because they got more requests than the number of unique things TinyURL or whatever can handle.

    A better approach is to do AJAX with POST instead of GET, and to make sure gzip is on.

    I think if they really put their minds to it, they could also implement client-side JSON compression using some of the JavaScript compression libraries out there (or a simple Flash wrapper to do the dirty work).

    Just throw a bunch of kiddies (or 21yr olds) in a room and offer them free pizza/beer whatever.....it'll get done.
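Gzipping the JSON responses alone buys a lot, since AJAX payloads tend to be highly repetitive. A minimal sketch (the sample payload is made up):

```python
import gzip
import json

# Made-up, repetitive AJAX-style payload: 200 near-identical records.
payload = json.dumps(
    [{"id": i, "name": "user", "status": "online"} for i in range(200)]
)

raw = payload.encode()
compressed = gzip.compress(raw)
print(f"{len(raw)} bytes raw -> {len(compressed)} bytes gzipped")
```

Note that POST vs. GET doesn't itself save bytes; the win on the response side is standard Content-Encoding: gzip, which needs no client-side JavaScript at all.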

  • by Overzeetop ( 214511 ) on Friday March 27, 2009 @06:45PM (#27364867) Journal

    Actually, when I had my web page designed (going on 4 years ago), I specifically asked that all of the pages load in less than 10 seconds on a 56k dialup connection. That was a pretty tall order back then, but it's a good standard to try to hit. It's even more critical now that more mobile devices are accessing the web, and the vast majority of the country won't get a sniff of 3G speeds for more than a decade. For all the fancy programming we can put into a page, it adds very little. Mostly, I (and my clients who need to find me) want information, and one of the best ways to deliver it is simply readable text with static pictures. For the web, you can compress the heck out of an image and still have it look crisp on a monitor.
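The parent's 10-second/56k rule translates into a concrete page-weight ceiling. A back-of-the-envelope sketch (the 80% effective-throughput figure is an assumption, since real dial-up never hits its nominal rate):

```python
# 56 kbit/s nominal link; assume ~80% effective throughput on dial-up.
link_kbit_per_sec = 56
effective_kbit_per_sec = link_kbit_per_sec * 0.8
budget_seconds = 10

# kilobits available in the budget, divided by 8 to get kilobytes.
max_kbytes = effective_kbit_per_sec * budget_seconds / 8
print(f"Page weight budget: ~{max_kbytes:.0f} KB")  # ~56 KB total
```

That ~56 KB has to cover HTML, CSS, scripts, and images combined - which is why the 726 KB Facebook home page mentioned upthread would take minutes, not seconds, on dial-up.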

  • Re:Compared to what? (Score:3, Interesting)

    by HeronBlademaster ( 1079477 ) <heron@xnapid.com> on Friday March 27, 2009 @06:49PM (#27364933) Homepage

    Interestingly (or maybe not), Google doesn't gzip their analytics javascript file...

  • by LateArthurDent ( 1403947 ) on Friday March 27, 2009 @07:57PM (#27365769)

    ...the first 1831 lines (!) of the page are blank...Attention Globe and Mail web designers: When your idiot print newspaper editor tells you to make liberal use of whitespace, this is not what he had in mind!

    Believe it or not, someone had it in mind. This is most likely a really, really stupid attempt at security by obscurity.

    PHB:My kid was showing me something on our website, and then he just clicked some buttons and the entire source code was available for him to look at. You need to do something about that.
    WebGuy:You mean the html code? Well, that actually does need to get transferred. You see, the browser does the display transformation on the client's computer...
    PHB:The source code is our intellectual property!
    WebGuy:Fine. We'll handle it. ::whispering to WebGuy #2:: Just add a bunch of empty lines. When the boss looks at it, he won't think to scroll down much before he gives up.
    PHB:Ah, I see that when I try to look at the source it now shows up blank! Good work!

  • by DavidD_CA ( 750156 ) on Friday March 27, 2009 @08:16PM (#27365983) Homepage

    Wow. Judging by the patterns that I see in the "empty" lines, it looks like their CMS tool has a bug in it that is causing some sections to overwrite (or in this case, append instead).

    I'd bet that every time they change their template, they are adding another set of "empty" lines here and there, rather than replacing them.

