Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
×
The Internet Technology

Bufferbloat — the Submarine That's Sinking the Net 525

gottabeme writes "Jim Gettys, one of the original X Window System developers and editor of the HTTP/1.1 spec, has posted a series of articles on his blog detailing his research on the relatively unknown problem of bufferbloat. Bufferbloat is affecting the entire Internet, slowly worsening as RAM prices drop and buffers enlarge, and is causing latency and jitter to spike, especially for home broadband users. Unchecked, this problem may continue to deteriorate the usability of interactive applications like VOIP and gaming, and being so widespread, will take years of engineering and education efforts to resolve. Being like 'frogs in heating water,' few people are even aware of the problem. Can bufferbloat be fixed before the Internet and 3G networks become nearly unusable for interactive apps?"
This discussion has been archived. No new comments can be posted.

Bufferbloat — the Submarine That's Sinking the Net

Comments Filter:
  • by Anonymous Coward on Friday January 07, 2011 @08:07AM (#34789798)

    http://en.wikipedia.org/wiki/X_Window_System

  • Really? (Score:2, Insightful)

    by Anonymous Coward

    Latency is bad? Bigger buffers = more latency?

    • Re: (Score:3, Informative)

      by Anonymous Coward
      I've no idea if this post explains it correctly or not [slashdot.org] (one of the replies implies that it doesn't), but if it does, it should be nearer the top of the page, hence my posting it here. :-)
    • by perpenso ( 1613749 ) on Friday January 07, 2011 @10:09AM (#34790950)

      Latency is bad? Bigger buffers = more latency?

      Buffers increasing latency is not exactly a new phenomena. Its been observed and taken into design considerations for quite some time. For example back-in-the-day serial chips essentially had a buffer of one byte. The CPU fed data one byte at a time as the buffer became available and latency was pretty low since data was immediately transmitted. As more capable serial chips became available larger buffers were introduced. A newer chip may have a larger buffer but it may also not transmit data as soon as it has a single byte. It was common to have two programmable thresholds to begin a data transmission, (1) when a certain amount of data has accumulated in the buffer or (2) when a certain amount of time has elapsed. So if a "packet" to transmit was small enough it may sit in the buffer until (2), hence more latency with larger buffers. Software that cared generally began to issue flush commands to cause anything in the buffer to be sent immediately.

      Network cards and/or the operating system may try to similarly accumulate data before transmitting a packet.

      • Ring buffers in serial ports are not quite the same thing. With a serial port, once the ring bugger had filled (i.e. inut pointer == output pointer) the sourcing program would either be deschedule, pause a time or loop until there was space in the buffer to put the next byte of data. Nothing was lost.

        With network buffers, what JG seems to be saying is that this does not happen. As packets arrive at whatever the choke point is in the circuit, there is no method for telling the sender to stop sending - the

  • Definition, please (Score:5, Insightful)

    by Megane ( 129182 ) on Friday January 07, 2011 @08:10AM (#34789810)

    I'm so glad the term has been defined so that I know what the hell we're talking about here. Oh wait, no it hasn't.

    Okay, then I'll RTFA. Oh wait, two screens worth of text later and it still hasn't.

    I'd like to change the topic now to the submarine that's sinking the English language: jargonbloat.

    • by mangu ( 126918 ) on Friday January 07, 2011 @08:14AM (#34789854)

      Just start RTFAing: "In my last post I outlined the general bufferbloat problem."

      Follow the link:

      "Each of these initial experiments were been designed to clearly demonstrate a now very common problem: excessive buffering in a network path. I call this bufferbloat

      • by elrous0 ( 869638 ) *

        I hate it when someone feels the need to come up with a piece of meaningless jargon when "excessive packet buffering" would have been much more descriptive and required less explanation.

        • by Yvanhoe ( 564877 )
          actually, buffer bloat is a good enough definition of bufferbloat. Demands for definition are a bit pompous...
          • by GooberToo ( 74388 ) on Friday January 07, 2011 @09:34AM (#34790550)

            Demands for definition are a bit pompous...

            A bit?

            Even more pompous is making a post about it when everyone can clearly see, "bufferbloat" is shorter than constantly saying something tedious like, "excessive packet buffering in the entirety of a network path."

            Perhaps this will help the uninitiated. The article describes a wide problem of excessive packet buffering in the entirety of a network path, which has been dubbed, "bufferbloat."

          • by ls671 ( 1122017 ) *

            A better definition could be:

            "A user saturating its broadband connection by transferring 20GB files and not taking care of using the --bwlimit (limit bandwidth) option with rsync"

            I have been using it for ages to prevent this very problem.

            Other types of traffic shaping can be done, with Linux tc as an example, but it is always best to do it at the application level when possible.

    • by Megane ( 129182 ) on Friday January 07, 2011 @08:15AM (#34789856)

      For what it's worth, TFS seems to be linking into the middle of the story, so maybe that's part of my problem. Still, it's really annoying to be told about this new problem with new jargon word, that's going to make the sky fall any day now, without knowing just what the hell it is.

      The previous article seems to explain things a little better: http://gettys.wordpress.com/2010/12/03/introducing-the-criminal-mastermind-bufferbloat/ [wordpress.com]

      • Re: (Score:3, Interesting)

        Comment removed based on user account deletion
        • How much bandwidth can I have, though? Take the link between my desktop and a Slashdot server; is the correct answer "1GBit/s, no more" (speed of my network card)? Is is "20MBit/s, no more" (speed of my current Internet connection)? Is it "0.5MBit/s, no more" (my fair share of this office's Internet connection)? In practice, you need the answer to change rapidly, depending on network conditions - maybe I can have the full 20MBit/s if no-one else is using the Internet, maybe I should slow down briefly while someone else handles their e-mail.

          TCP doesn't slam the network; it starts off slowly (TCP slow start currently sends just two packets initially), and gradually ramps up as it finds that packets aren't dropped. When packet drop happens, it realises that it's pushing too hard, and drops back. If there's been no packet drop for a while, it goes back to trying to ramp up. RFC 5681 [ietf.org] talks about the gory details. It's possible (bar idiots with firewalls that block it) to use ECN (explicit congestion notification) [ietf.org] instead of packet drop to indicate congestion, but the presence of people who think that ECN-enabled packets should be dropped (regardless of whether congestion has happened) means that you can't implement ECN on the wider Internet.

          This works well in practice, given sane buffers; it dynamically shares the link bandwidth, without overflowing it. Bufferbloat destroys this, because TCP no longer gets the feedback it expects until the latency is immense. As a result, instead of sending typically 20MBit/s (assuming I'm the only user of the connection), and occasionally trying 20.01MBit/s, my TCP stack tries 20.01MBit/s, finds it works (thanks to the queue), speeds up to 20.10MBit/s, and still no failure, until it's trying to send at (say) 25MBit/s over a 20MBit/s bottleneck. Then packet loss kicks in, and brings it back down to 20MBit/s, but now the link latency is 5 seconds, not 5 milliseconds.

        • by myrdos2 ( 989497 ) on Friday January 07, 2011 @01:07PM (#34793822)

          Solutions which require the internet's infrastructure to be replaced (all the routers and switches and so forth) have been proposed for many years, and never go anywhere. The only one I'm aware of is IPv6, and look how slowly that beast has taken off. That said, TCP sawtooth isn't as bad as you make it out to be - in most cases. Whenever a packet is dropped, the TCP connection drops its speed to around half, then gradually ramps up to where it was previously. You don't get 100% of your bandwidth utilization, but you do get to automatically adjust to changing network conditions. And as the number of TCP connections over one pipe increases, you get closer and closer to max utilization rates.

          TCP fails when:
          -competing against UDP, which has no congestion control and will clog a line even if every UDP packet is dropped
          -there is interference in the line causing packet corruption, which TCP interprets as congestion
          -competing against Microsoft products, which have TCP stacks that are tweaked to grab more than their fair share of the bandwidth

          My understanding is that TCP congestion control generally isn't applied to backbones - I believe that ISPs throttle your traffic before sending it over an optic link so as not to overbook its capacity. You're probably just competing with your household, and possibly people on your block - can someone verify this?

    • Re: (Score:2, Insightful)

      by shiftless ( 410350 )

      Yeah, I see this a lot with nerds. It's pretty fucking annoying when someone launches in a long winded dissertation on some obscure subject, without even bothering to put an introductory paragraph at the top giving even the briefest overview of what the fuck they're even talking about. I shouldn't have to read fifteen paragraphs just to get a basic birds-eye view of what the problem is, a framework which I can then proceed to fill in by reading into the details.

      • by drooling-dog ( 189103 ) on Friday January 07, 2011 @08:29AM (#34789948)

        They know something you don't, they want you to know it, and they want to keep it that way for as long as possible...

      • by mcgrew ( 92797 ) * on Friday January 07, 2011 @08:45AM (#34790102) Homepage Journal

        There are two reasons I can think of why people write like that. One is they're poor communicators, the second is they want to appear intelligent.

        It seems there are two kinds of stories posted here lately -- science and tech stories written for the non-nerd by non-nerds like one last week that explained what a CPU was (!), and stories like this that coin new jargon and don't explain it, or use an acronym that most folks here will misunderstand, like using BT when referring to Britich Telecom when most of us think of BitTorrent when we see BT.

        Maybe I'm just getting old.

        • In Canada references to the Bank of Canada in news stories have lately been abbreviated to BOC. When I read "BOC to raise interest rates" I always wonder why the Blue Oyster Cult is doing that.
          • In Canada references to the Bank of Canada in news stories have lately been abbreviated to BOC.

            That's because unlike "Federal Reserve" and "Federal Express", "Bank of Canada" doesn't have a snappy, pronounceable contraction (Fed and FedEx respectively).

            When I read "BOC to raise interest rates" I always wonder why the Blue Oyster Cult is doing that.

            No, that'd be "BÖC to raise interest rates". BÖC was probably the first rock band to incorporate a gratuitous diaeresis [wikipedia.org] in its name. The root problem here is that BOC's [wikipedia.org] dis am bigger than yours [wikipedia.org].

      • There's a link to the definition in the first four words of the article. Do you want every peice of writing to repeat the definitions of every term it uses?

      • by GooberToo ( 74388 ) on Friday January 07, 2011 @09:42AM (#34790626)

        Yeah, I see this a lot with nerds. It's pretty fucking annoying when someone launches in a long winded dissertation on some obscure subject, without even bothering to put an introductory paragraph at the top giving even the briefest overview of what the fuck they're even talking about.

        Its a series of blog articles. He presumes you've been following his series of articles whereby he introduces the topic and experimentally validates his assertions. If you didn't get the introduction, blame your own laziness or the failure of the poster to also provide a link to the first blog post in the series.

        Basically you're complaining because you jumped to the middle of a book and then bitched that the chapter you started reading doesn't have an introduction. Most people will wonder what the hell is wrong with you. To then attack the author for other's failings is bizarre to say the least. And all this ignores that blogs are frequently written to be familiar and causal reading; which also entirely invalidates your general tone.

      • by Idbar ( 1034346 )
        Perhaps, the wikipedia article about Random Early Detection [wikipedia.org], may help to understand the issue. RED (proposed by Floyd and Jacobson, the latter cited in the article posted) was proposed in 1983 to overcome this issue, along with Explicit Congestion Notification (a mechanism to mark packets instead of dropping them). ECN was implemented only until Windows Vista (and wasn't enabled by default), which made complicated to actually take advantage of such schemes.

        Many mechanisms have been proposed (Even I'm pro
    • by Nursie ( 632944 )

      He's written a whole series on this over the course of months, if he doesn't explain it a long way into the series then blame the slashdot summary, not the guy doing the research/testing and telling the world about it.

    • Not having read or ever heard of this term before my take on the word is that because people have so much memory and streamed apps can have such huge buffers and are so popular you constantly have a few people pegging their network connection to initially fill their buffer even though they don't have to, needlessly congesting the network and possibly impairing those who need low latency interactive connections.

      But that's just a guess.
    • by jg ( 16880 ) on Friday January 07, 2011 @09:10AM (#34790318) Homepage

      You asked, I just provided:

      http://gettys.wordpress.com/what-is-bufferbloat-anyway/

      Good question.

      Bufferbloat is the cause of much of the poor performance and human pain using today’s internet. It can be the cause of a form of congestion collapse of networks, though with slightly different symptoms than that of the 1986 NSFnet collapse. There have been arguments over the best terminology for the phenomena. Since that discussion reached no consensus on terminology, I invented a term that might best convey the sense of the problem. For the English language purists out there, formally, you are correct that “buffer bloat” or “buffer-bloat” would be more appropriate.

      I’ll take a stab at a formal definition:

      Bufferbloat is existence of excessively large (bloated) buffers into systems, particularly network communication systems.

      Systems suffering from bufferbloat will have bad latency under load under some or all circumstances, depending on if and where the bottleneck in the communication’s path exists. Bufferbloat encourages congestion of networks; bufferbloat destroys congestion avoidance in transport protocols such as HTTP, TCP, Bittorrent, etc. Without active queue management, these bloated buffers will fill, and stay full.

      More subtlety, poor latency, besides being painful to users, can cause complete failure of applications and/or networks, and extremely aggravated people suffering with them.

      Bufferbloat is seldom detected during the design and implementations of systems as engineers are methodical people, seldom if ever test latency under load systematically, and today’s memory is so cheap buffers are often added without thought of the consequences, where it can be hidden in many different parts of network systems.

      You see manifestations of bufferbloat today in your operating systems, your home network, your broadband connections, possibly your ISP’s and corporate networks, at busy conference wireless networks, and on 3G networks.

      Bufferbloat is a mistake we’ve all made together.

      We’re all Bozos on This Bus.

    • by Nefarious Wheel ( 628136 ) on Friday January 07, 2011 @09:19AM (#34790398) Journal
      The Evil Buffer Stuffer Strikes Again!
  • Name wrong (Score:3, Informative)

    by ebcdic ( 39948 ) on Friday January 07, 2011 @08:11AM (#34789816)

    He's Jim Gettys, not Getty.

  • by cerberusss ( 660701 ) on Friday January 07, 2011 @08:13AM (#34789836) Journal

    Jim Getty, one of the original X Window System developers and editor of the HTTP/1.1 spec

    I'd murder four people just to have TTY in my name. Five if I could capitalize them, and postfix with a number. I'd name my son Dev.

    You'd get a business card with something like Dev GeTTY1, Armadillo Avenue 64, Seattle, Washington

  • If we all switched to ATM (Asynchronous transfer mode [wikipedia.org]), would this issue be fixed, regardless of the size of RAM available at the endpoints? Yes, yes I realize that this would be utterly impractical; my question is theoretical.

    • If we all switched to ATM, I'd find you in your sleep and murder you.

      TBH though.. MPLS sorta tries to split the difference in the 'good' ways. Especially if you drink the Kool Aid (tm) and have the budget to spend on rolling it out.

  • Comment removed (Score:5, Insightful)

    by account_deleted ( 4530225 ) on Friday January 07, 2011 @08:18AM (#34789874)
    Comment removed based on user account deletion
    • by Aladrin ( 926209 )

      I agree. I was having the same issues on a much smaller connection until I set up QOS. Now, I never have issues with any of my stuff.

    • by Nursie ( 632944 )

      There's much more to it than that - the connection gets maxed out too easily, or it maxes out way below where it should, the reason being that too much is buffered. Too much buffering = lots of latency = TCP/IP latency and bandwidth calculations go out the window and you can't get the transfer speed you ought to.

      Or so I understand it.

      • by thijsh ( 910751 )
        Not quite the whole story. In peak hours when your 20 Mbit ADSL drops to 2 Mbit speeds it's simply because they oversold the line that much... The problem isn't buffering but trying to use 1000% of the bandwidth, that is never going to work smoothly. Buffers don't really change it because you have a choice for either large buffers + higher latencies or small buffers + more dropped packets, your throughput will suffer about the same...

        In my experience most software will handle some higher latencies just fi
    • by vadim_t ( 324782 ) on Friday January 07, 2011 @08:42AM (#34790076) Homepage

      Several issues:

      1. People who aren't networking engineers don't know about QoS, or don't know/want to know how to configure it.
      2. QoS used that way is a hack to work around an issue that doesn't have to be there in the first place
      3. How do you determine the maximum throughput? It's not necessarily the official line's speed. The nice thing about TCP is that it's supposed to figure out on its own how much bandwidth there is. You're proposing a regression to having to tell the system by hand.
      4. QoS is most effective on stuff you're sending, but in the current consumer-oriented internet most people download a lot more than they upload.

      • by Dunbal ( 464142 ) * on Friday January 07, 2011 @09:04AM (#34790258)

        but in the current consumer-oriented internet most people download a lot more than they upload.

              Because the current consumer infrastructure forces it onto you. I would happily seed my torrents all year long, except I only have 1/12th the uploading bandwidth as I have for downloading. Since I need some of it for other things, uploading becomes impractical.

              It's easy to blame the consumer, but there's a certain model imposed on him from the start.

      • It will be a hack (Score:5, Insightful)

        by dachshund ( 300733 ) on Friday January 07, 2011 @10:02AM (#34790836)

        2. QoS used that way is a hack to work around an issue that doesn't have to be there in the first place
        3. How do you determine the maximum throughput? It's not necessarily the official line's speed. The nice thing about TCP is that it's supposed to figure out on its own how much bandwidth there is. You're proposing a regression to having to tell the system by hand.
        4. QoS is most effective on stuff you're sending, but in the current consumer-oriented internet most people download a lot more than they upload.

        While the Internet in-theory is beautiful, our modern implementation really is a series of layered hacks. And the solution to Bufferbloat is going to be another hack. You're crazy if you think that the solution to the Bufferbloat 'problem' is going to be some fundamental redesign of the TCP protocol (how would you force 10 people to use it?), or the total re-architecture of millions of consumer devices to remove buffering. You're also crazy if you think the ISPs and backbone providers are going to stand by while this thing kills the Internet.

        So the question is: which hack will it be? The GP poster already identified one that works well enough --- using QoS to control flows. Your final objection about content providers stressing connections is the real one. But there's some probably a good hack to deal with it --- or more likely a series of hacks, some at the content providers themselves (e.g., Netflix), some in the backbone, and some at your ISP. It won't be elegant, but it will keep this problem from ever becoming anything more than a few cranky blog posts.

    • The problem is that maxing your connection from one site is causing everything else you do on your connection to be delayed / dropped as well, because it ends up queued behind anything that got buffered mid-transit from the first site. With a smaller buffer the large transfer would start to drop packets and back off sooner, allowing packets from other sources to "hop the queue".

      • by TheThiefMaster ( 992038 ) on Friday January 07, 2011 @08:54AM (#34790168)

        As an extreme example, say you request a 1GB file from a download site. That site has a monster internet connection, and manages to transmit the entire file in 1 second. The file makes it to the ISP at that speed, who then buffers the packets for slow transmission over your ADSL link, which will take 1 hour. During that time you try to browse the web, and your PC tries to do a dns lookup. The request goes out ok, but the response gets added to the buffer on the ISP side of your internet connection, so you won't get it until your original transfer completes. How's 1 hour for latency?

        The situation is only not that bad because:
        A: Most download sites serve so many people at once and/or rate limit so they won't saturate most peoples' connections
        B: Most buffers in network hardware are still quite small

        • Mod parent up. Half way down the comments, and this is the first post that actually explains why 'bufferbloat' is something I should care about.
        • This is an excellent explanation of what issues are happening here. I can clearly see that this is an issue, and the problem is something that over time will impact everybody.

          The problem is really focused on trying to deal with differences in bandwidth between computers... always a problem but in this case trying to match up slow connections with fast connections is particularly difficult. Since memory is cheap, a 1 GB buffer certainly can be found in some devices now and perhaps much more. I don't see this example as being really too far off the mark in the near future.... which is the point being raised and why buffer bloat is such a big deal.

          More to the point, some of the complaints that triggered the "quality of service" debate are rooted in this problem. As mentioned in the original article triggering this whole slashdot thread, setting up "quality of service" priorities only creates multiple buffer queues.... it doesn't solve the problem of the monster queue to begin with. That is why the author of the blog post suggests that the debate over network neutrality is not based upon the real problem that is facing network engineering and why it is a political solution in search of a problem.

          It takes awhile to "grok" this problem, but once you do it becomes obvious why this is such a huge deal.

    • by suv4x4 ( 956391 ) on Friday January 07, 2011 @08:49AM (#34790128)

      Really, what's the problem here?

      You really don't see the problem? How can you be so naive. Maybe you're new to this. All signs show to the fact there is a problem.

      Of course the problem is not obvious. The article itself says it'll completely surprise us. They know we won't believe it at first. But that's why we must believe it, or else it's Armageddon.

      Would you risk an Armageddon, because of your inability to understand and see?

      And that's, in short, why we must attack Iraq.

      Wait, what were we talking about :P?

      • by Dunbal ( 464142 ) *

        It's kinda like the Fed printing another 600 billion and refusing to raise interest rates, while at the same time saying everything is fine and the economy is improving.

    • Yes, he could. What about all the non-technical users?
    • by bytesex ( 112972 )

      It's not about *you* buffering - it's about the machine in the middle buffering. When that machine buffers instead of drops, your TCP connection will never become aware that it has to play nice and lower its transmission window.

    • "that's what happens when your pipe is full and the packets have to wait in the queue to be transmitted."

      And his point is that said queue is so excessively long, it's screwing up TCP's congestion avoidance. Those queues mean delay. Serious delay.
  • I knew it wasn't my age thats making me worse at online games, its all the jittering and latency caused by bufferbloat!
  • by CFD339 ( 795926 ) <andrewp@theno r t h .com> on Friday January 07, 2011 @08:31AM (#34789966) Homepage Journal

    RAM is cheap.
    High speed uplink is not cheap.
    Peering agreements are manipulative, expensive, and sometimes extortionate.

    So...

    The poorly designed, poorly peered, under allocated back haul links can't handle the traffic that routers want to push through them -- but since RAM is cheap, operators just add RAM to the buffers so that when those back-haul lines slow down for a second the packets can get pushed through.

    And we're blaming the buffer for the problem?

    • by phantomcircuit ( 938963 ) on Friday January 07, 2011 @08:55AM (#34790174) Homepage
      TCP assumes that packets will do dropped when there is congestion, if they aren't the congestion control algorithms fail (hard).
    • Yes, because the buffer is causing false information to propagate through the network. The design of TCP is predicated on peers discovering that there's congestion between them as fast as possible, and negotiating a sensible transfer rate. It's not that all buffering is bad, but that it has diminishing returns, and eventually actually makes the situation worse.
    • by jg ( 16880 )

      Your Linux (and Mac and Windows) laptops also suffer from bufferbloat.

      As does your home router and 802.11 network.

      As do some ISP's.

      Don't think the problem is just in your broadband. It's all over.

    • by gl4ss ( 559668 )

      the real interesting bit would be what would the internet be without those buffers. packetloss at 80%?

      • by compro01 ( 777531 ) on Friday January 07, 2011 @10:44AM (#34791486)

        Yes, lack of buffers would be bad, as even a trivial delay would result in a packet getting dropped. Oversized buffers are also bad, as they simply delay the packet getting dropped, preventing congestion control from reacting in a timely manner. The buffers need to be sized appropriately relative to the link speed and typical latency.

      • by rickb928 ( 945187 ) on Friday January 07, 2011 @12:01PM (#34792732) Homepage Journal

        No.

        Packet losses would be handled by adjusting to the conditions.

        Look at the trace Gettys posted in the referenced article. Lots of dup packets. Get rid of those, and there's some bandwidth that can be *used*. And allowing TCP to adjust to prevailing conditions should result in less packet loss. It might seem to be less bandwidth also, but we may be in a vicious circle of increasing bandwidth to solve a problem that is NOT bandwidth. Packet loss by itself is a symptom, not a problem.

    • by complete loony ( 663508 ) <Jeremy.Lakeman@NoSPAM.gmail.com> on Friday January 07, 2011 @10:26AM (#34791208)
      I was sharing a connection with a friend once who was throttling my upload bandwidth in an attempt at fairness. Trying to run something like bittorrent would fill all the buffers in my PC, his router and his modem, adding 1.5 seconds of latency to the link (I used to ping the host on the other end of the modem to confirm it).
  • by bmajik ( 96670 ) <matt@mattevans.org> on Friday January 07, 2011 @08:46AM (#34790110) Homepage Journal

    What Jim is saying is that TCP flows try to train themselves to the dynamically available bandwidth, such that there is a minimum of dropped packets, retransmits, etc.

    But in order for TCP to do this, packets must be dropped _fast_.

    When TCP was designed, the assumptions about the price of ram (and thus, the amount of onboard memory in all the devices in the virtual circuit) were different -- namely, buffers were going to be smaller, fill up faster, and send "i'm full" messages backwards much sooner.

    What the experimentation has determined is that many network devices will buffer 1 megabyte or MORE of traffic before finally dropping something and telling the tcp originator to slow down. And yet with a 1 meg buffer and a rate of 1 megabyte per second.. it will take 1 second simply to drain the buffer.

    The pervasive presence of large buffers all along the tcp vc, and the non-speified or tail-drop drop behavior of these large queues means that tcp's ability to rate limit is effectively nullified, and in situations where the link is highly utilized, many degenerate behaviors occur, such that the overall link has extremely high latency, and that bulk traffic will cause interesting traffic to be randomly dropped.

    Personally, I used pf/altQ on openBSD to try and manage this somewhat.. but its a dicey business.

    • Yeah, that is how I read it. The presence of large buffers causes the 'controlling protocols' to go haywire, thus network transfer efficiency hurtles out of the window.

  • by wiredog ( 43288 ) on Friday January 07, 2011 @09:05AM (#34790270) Journal

    If you put a frog in a pot of water and slowly raise the temperature it will try to jump out before the water reaches a temperature that is fatal to the frog.

  • ...chime in please. It seems like the solution to this is potentially all user-side, and controllable? Adjust the buffers in your devices if you can, or perhaps find a way to reduce the TCP buffer in your modern operating system?

    • It's not, really. There are many buffers involved. Some of them will be in the network infrastructure - routers, firewalls - that users have no control over. A lot more are in devices they control, but that don't allow configuration of such low-level parameters. Cable modems, home routers, access points.
    • nope. The problem can be in the server on the ISP side. The problem appears where a big pipe (1 Gbit for example) is poured into a small pipe (1 Mbit for example) see TheThiefMaster

      As an extreme example, say you request a 1GB file from a download site. That site has a monster internet connection, and manages to transmit the entire file in 1 second. The file makes it to the ISP at that speed, who then buffers the packets for slow transmission over your ADSL link, which will take 1 hour. During that time you try to browse the web, and your PC tries to do a dns lookup. The request goes out ok, but the response gets added to the buffer on the ISP side of your internet connection, so you won't get it until your original transfer completes. How's 1 hour for latency?

      The situation is only not that bad because:
      A: Most download sites serve so many people at once and/or rate limit so they won't saturate most peoples' connections
      B: Most buffers in network hardware are still quite small

  • by jav1231 ( 539129 )
    I won't discount the buffer problem because I just don't know. But the single biggest contributor to "latency" when I visit a webpage is connectivity to ad sourcers. I can click to go to a site and stare at a blank screen while my status bar flickers with: "Transferring data from ads.that.you.could.care.less.about.com."
  • by ei4anb ( 625481 ) on Friday January 07, 2011 @09:32AM (#34790522)
    The issue is that many IP stacks do not handle ECN (Explicit Congestion Notification) and only know when the link is saturated by packet loss. Huge buffers hide the problem. A solution is to use ECN, that's what it's for http://en.wikipedia.org/wiki/Explicit_Congestion_Notification [wikipedia.org]
  • by ebrandsberg ( 75344 ) on Friday January 07, 2011 @10:07AM (#34790918)

    let me summarize the problem that is being observed: On a given interface, if you have more buffer memory than is needed as packet buffer on the transmit side, it can induce latency. As an example, consider a 1Mb/s link. If you want to have a peak of .1s latency added by buffering at high load, then you want 1Mb*.1=12,500 bytes of buffer. If you have 1MB of buffer, then you have 8 seconds of buffer, therefore triggering the "buffer bloat" issue. Part of the problem is that buffer size would be set based on the top speed a piece of hardware could drive, i.e. if you want a 1Gb/s interface to be able to buffer .1s, then you use it at 100Mb/s, then it has 1s worth of buffer. In most home deployments where you have a router that may have a 1Gbps upstream, maybe 4 100Mb/s physical connections, and a 54Mbps wireless router, you probably have a shared buffer for all the interfaces. The result of this is that when using the 54Mb/s wireless, you can easily have the buffer over saturated, while the buffer size may be just right for the 100Mb/s interfaces.

    What is the solution to this? Realistically, the alternative is to drop packets that have resided in the buffer longer than a configured amount of time, which causes it's own performance issues. Net result: TCP would slowdown for a period of time, but would speed up again resulting in a sawtooth behavior. This would result in periodic issues with other protocols as well, i.e. VOIP would have dropped packets every time TCP ramps up again, etc.

    Solution: Don't download porn when you are trying to do VOIP calls.

  • by iwbcman ( 603788 ) on Friday January 07, 2011 @10:15AM (#34791024) Homepage

    I discovered this series of blog posts about 2 months ago, when he accidentally published one of his blog posts prematurely. I started reading it and followed the links and saw that this was a like a sleuth tale-if I had started reading this with his very first blog on the topic I would have had no idea where he was going with this. Now as to why this contribution by Jim Gettys does the world a great service:
    • Gettys is not pointing fingers at someone. The problem he is describing is truly vast, and involves lots of different people in different industries(router manufactures, ISP's, kernel driver authors, carrier grade network manufactures, etc.) with, presumably, a myriad of different intentions. The problem has been building over a long time-this didn't start yesterday, and won't be solved in a short time span, without a concerted effort on the part of everyone involved in all of these divergent industries, who often have quite divergent interests.
    • This approach that Gettys takes allows him to describe a problem which confronts everyone. By taking the high road and not pointing fingers he is able address an issue in such a way that a lot of the people who did contribute to this problem can recognize what they have done and own it, without being labelled, accused or feeling attacked. This should be a lesson to anyone who really wants to redress an issue that effects everyone.
    • Gettys develops this theme over many, many blog posts. It makes for some of the best internet reading I have experienced in years. Things only gradually become clearer-not merely what the problem is, but also all of the issues involved in it. I can read away in the internet for months at a time and not learn as much as I did by reading this series of posts.
    • Gettys knows what he is talking about. He developed this theme by talking with lots of experts -engineers at the ISP, people who played a pivotal role in the creation of the www and network specialists. He himself is not a network specialist, but he went out and met with people to discuss his findings and took clues and information from these exchanges to inform him and his quest to find out what was going on.
    • The series is short on answers. It may prove frustrating to many that he offers so little in the way of solutions to this problem. But this this due to the fact that the problem cannot be resolved by you, the end user. To solve this problem means rearchitecting countless millions of devices and altering hundreds of thousands of lines of code in multiple OS's.
    • Failure to redress this problem means that every effort to decrease latency by upping available bandwidth or upgrading network infrastructure will fail to deliver. If packets are not dropped fast, due to excessive buffering, the negotiation process fails, which invariably means congestion, which means latency-only something that addresses this issue has any chance of actually effecting change. Saying that this problem is just an issue already solved by QOS show that you don't understand the problem.
    • One of the first thoughts I had reading this was: if the techs on wallstreet read this article they will inevitably exploit this issue to win precious milliseconds on the stock exchange-ring a bell?
    • Any ISP could exploit this issue to offer a relative market advantage. Sadly when resolving an issue is in everybody's interest, market players will exploit the issue for their own relative gain. Getting everyone to actually tackle this is going to a gargantuan task.

    Hats of to Jim Gettys. Thanks for your service.

  • by rocker_wannabe ( 673157 ) on Friday January 07, 2011 @11:50AM (#34792530)

    TCP contains some of the most incredible heuristic algorithms I've ever seen. Each algorithm, like Slow Start, RTT Estimation, SACK, etc. are relatively simple but together they work incredibly well at keeping data flowing across heterogeneous networks. They work so well that I've seen TCP overcome broken ethernet drivers and make them appear to work. Unfortunately, as someone who use to look at TCP traces for a living, I can tell you it can be really hard to work backwards from packet traces to figure out what is going on in the TCP/IP stack because there can be so much going on at the same time. This means that Wireshark in the hands of a weekend-hacker can easily lead to erroneous conclusions. If you follow this link [lartc.org] and go to section 14.5 Random Early Detection (RED) you can see that the issue is already known and there are already solutions to mitigate the problem.

    Relax and take a deep breath. Now you can move on to something more important......... like where you're going to spend your eternity

Understanding is always the understanding of a smaller problem in relation to a bigger problem. -- P.D. Ouspensky

Working...