
Bufferbloat — the Submarine That's Sinking the Net

gottabeme writes "Jim Gettys, one of the original X Window System developers and editor of the HTTP/1.1 spec, has posted a series of articles on his blog detailing his research into the relatively unknown problem of bufferbloat. Bufferbloat affects the entire Internet, slowly worsening as RAM prices drop and buffers grow, and it causes latency and jitter to spike, especially for home broadband users. Unchecked, it will continue to degrade the usability of interactive applications like VoIP and gaming, and, being so widespread, it will take years of engineering and education effort to resolve. Like 'frogs in heating water,' few people are even aware of the problem. Can bufferbloat be fixed before the Internet and 3G networks become nearly unusable for interactive apps?"
  • by Anonymous Coward on Friday January 07, 2011 @09:07AM (#34789798)

    http://en.wikipedia.org/wiki/X_Window_System

  • Name wrong (Score:3, Informative)

    by ebcdic ( 39948 ) on Friday January 07, 2011 @09:11AM (#34789816)

    He's Jim Gettys, not Getty.

  • by Megane ( 129182 ) on Friday January 07, 2011 @09:15AM (#34789856)

    For what it's worth, TFS seems to be linking into the middle of the story, so maybe that's part of my problem. Still, it's really annoying to be told about this new problem, with a new jargon word, that's going to make the sky fall any day now, without knowing just what the hell it is.

    The previous article seems to explain things a little better: http://gettys.wordpress.com/2010/12/03/introducing-the-criminal-mastermind-bufferbloat/ [wordpress.com]

  • by vadim_t ( 324782 ) on Friday January 07, 2011 @09:42AM (#34790076) Homepage

    Several issues:

    1. People who aren't networking engineers don't know about QoS, or don't know (or don't want to know) how to configure it.
    2. QoS used that way is a hack to work around an issue that doesn't have to be there in the first place.
    3. How do you determine the maximum throughput? It's not necessarily the line's official speed. The nice thing about TCP is that it's supposed to figure out on its own how much bandwidth there is. You're proposing a regression to having to tell the system by hand.
    4. QoS is most effective on stuff you're sending, but in the current consumer-oriented internet most people download a lot more than they upload.

  • by mcgrew ( 92797 ) * on Friday January 07, 2011 @09:45AM (#34790102) Homepage Journal

    There are two reasons I can think of why people write like that. One is they're poor communicators, the second is they want to appear intelligent.

    It seems there are two kinds of stories posted here lately -- science and tech stories written for the non-nerd by non-nerds, like one last week that explained what a CPU was (!), and stories like this that coin new jargon and don't explain it, or use an acronym that most folks here will misunderstand, like using BT to refer to British Telecom when most of us think of BitTorrent when we see BT.

    Maybe I'm just getting old.

  • by bmajik ( 96670 ) <matt@mattevans.org> on Friday January 07, 2011 @09:46AM (#34790110) Homepage Journal

    What Jim is saying is that TCP flows try to train themselves to the dynamically available bandwidth, such that there is a minimum of dropped packets, retransmits, etc.

    But in order for TCP to do this, packets must be dropped _fast_.

    When TCP was designed, the assumptions about the price of RAM (and thus the amount of onboard memory in all the devices in the virtual circuit) were different -- namely, buffers were going to be smaller, fill up faster, and send "I'm full" signals back to the sender much sooner.

    What experimentation has determined is that many network devices will buffer 1 megabyte or MORE of traffic before finally dropping something and telling the TCP originator to slow down. And with a 1 MB buffer draining at a rate of 1 megabyte per second... it will take a full second just to empty the buffer.
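    To make that arithmetic concrete, a minimal sketch (the 1 MB buffer and 1 MB/s rate are from the paragraph above; the 1 Mbit/s case is an added illustration):

        # Queueing delay added by a full FIFO: buffer size / drain rate.
        def drain_time_seconds(buffer_bytes, drain_rate_bytes_per_s):
            return buffer_bytes / drain_rate_bytes_per_s

        MB = 1_000_000
        print(drain_time_seconds(1 * MB, 1 * MB))    # 1 MB buffer @ 1 MB/s   -> 1.0 s
        print(drain_time_seconds(1 * MB, 125_000))   # 1 MB buffer @ 1 Mbit/s -> 8.0 s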

    The pervasive presence of large buffers all along the TCP virtual circuit, and the non-specified or tail-drop behavior of these large queues, means that TCP's ability to rate-limit is effectively nullified. In situations where the link is highly utilized, many degenerate behaviors occur: the overall link has extremely high latency, and bulk traffic causes interesting traffic to be randomly dropped.

    Personally, I used pf/ALTQ on OpenBSD to try to manage this somewhat... but it's a dicey business.

  • Re:QoS (Score:4, Informative)

    by Megane ( 129182 ) on Friday January 07, 2011 @10:00AM (#34790220)

    After reading TFSeries, the problem is excessive buffering (as in 1-10 or more seconds' worth of data) screwing up TCP/IP's automatic bandwidth detection. QoS helps a little bit by getting the important packets (especially ACKs) through, but high-bandwidth TCP connections still go nuts when they hit a slower link with excessive buffering.

    And one of the major offenders is Linux commonly defaulting to a txqueuelen of 1000.
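    For a sense of scale, a rough sketch of the worst case that default implies (assuming full 1500-byte Ethernet frames; the uplink speeds are illustrative):

        # Worst-case delay from a 1000-packet txqueuelen, assuming
        # every queued packet is a full 1500-byte frame.
        TXQUEUELEN = 1000
        MTU_BYTES = 1500

        for uplink_mbps in (1, 10, 100):
            uplink_bytes_per_s = uplink_mbps * 1_000_000 / 8
            delay_s = TXQUEUELEN * MTU_BYTES / uplink_bytes_per_s
            print(f"{uplink_mbps:>3} Mbit/s uplink: up to {delay_s:.2f} s queued")
        # 1 Mbit/s -> 12.00 s, 10 Mbit/s -> 1.20 s, 100 Mbit/s -> 0.12 s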

  • by Dunbal ( 464142 ) * on Friday January 07, 2011 @10:04AM (#34790258)

    but in the current consumer-oriented internet most people download a lot more than they upload.

          Because the current consumer infrastructure forces it on you. I would happily seed my torrents all year long, except I have only 1/12th as much upload bandwidth as download. Since I need some of it for other things, uploading becomes impractical.

          It's easy to blame the consumer, but there's a certain model imposed on him from the start.

  • by wiredog ( 43288 ) on Friday January 07, 2011 @10:05AM (#34790270) Journal

    If you put a frog in a pot of water and slowly raise the temperature it will try to jump out before the water reaches a temperature that is fatal to the frog.

  • Re:Really? (Score:3, Informative)

    by Anonymous Coward on Friday January 07, 2011 @10:40AM (#34790608)
    I've no idea if this post explains it correctly or not [slashdot.org] (one of the replies implies that it doesn't), but if it does, it should be nearer the top of the page, hence my posting it here. :-)
  • by bcmm ( 768152 ) on Friday January 07, 2011 @10:50AM (#34790702)

    That makes no sense. It doesn't matter how fat their pipe is because your computer needs to receive and ack those TCP packets. They can't just dump the file and close the connection.

    OK, not on the (intentionally ridiculous) scale used in the example, but people are doing something very similar to what you describe, even though they "can't do that". http://slashdot.org/article.pl?sid=10/11/26/1729218 [slashdot.org]

  • by perpenso ( 1613749 ) on Friday January 07, 2011 @11:09AM (#34790950)

    Latency is bad? Bigger buffers = more latency?

    Buffers increasing latency is not exactly a new phenomenon. It has been observed and taken into account in designs for quite some time. For example, back in the day serial chips essentially had a buffer of one byte. The CPU fed data one byte at a time as the buffer became available, and latency was pretty low since data was transmitted immediately. As more capable serial chips became available, larger buffers were introduced. A newer chip might have a larger buffer, but it also might not transmit data as soon as it had a single byte. It was common to have two programmable triggers for starting a transmission: (1) a certain amount of data has accumulated in the buffer, or (2) a certain amount of time has elapsed. So if a "packet" to transmit was small enough, it might sit in the buffer until (2) fired -- hence more latency with larger buffers. Software that cared generally issued flush commands to force anything in the buffer to be sent immediately.

    Network cards and/or the operating system may try to similarly accumulate data before transmitting a packet.
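    A toy sketch of that two-trigger flush logic (the class, threshold, and timeout values are invented for illustration):

        import time

        class TxFifoSketch:
            """Toy model of the two transmit triggers described above."""
            def __init__(self, fill_threshold=16, timeout_s=0.010):
                self.buf = bytearray()
                self.fill_threshold = fill_threshold
                self.timeout_s = timeout_s
                self.oldest = None  # arrival time of the oldest unsent byte

            def write(self, data):
                if not self.buf:
                    self.oldest = time.monotonic()
                self.buf += data
                if len(self.buf) >= self.fill_threshold:  # trigger (1): enough data
                    self.flush()

            def poll(self):
                # Called periodically; trigger (2): data has waited too long.
                if self.buf and time.monotonic() - self.oldest >= self.timeout_s:
                    self.flush()

            def flush(self):
                # Explicit flush, as latency-sensitive software issued.
                if self.buf:
                    print(f"tx {len(self.buf)} bytes")
                    self.buf.clear()
                    self.oldest = None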

  • by mikael ( 484 ) on Friday January 07, 2011 @11:18AM (#34791080)

    Within a router, it's the actual IP data packets that are being buffered. A standard router has a number of network interfaces (token ring, Ethernet, wireless, ISDN, whatever...). Each network interface is a piece of hardware that is memory-mapped to allow the CPU to send and receive packets. Each hardware device also has a small onboard memory buffer to store the most recently received or transmitted packets (every protocol layer down to the MAC source and destination addresses, IP addresses, and sequence numbers, as well as the data). Depending on the system and packet size, that could be anything between 1 and 16 packets.

    The usual implementation was to have each hardware device generate an interrupt whenever some data had been received, and to transfer the data from internal memory to a common pool in system RAM. The latter was divided into pre-allocated blocks: a few large blocks (>1000 bytes) and many smaller blocks (512 bytes). Someone presumably did a statistical analysis of the theoretical distribution of packet sizes being sent through the network. Most of the time this worked out, but problems sometimes happened. If all the smaller blocks were in use, the larger blocks were used instead. For efficiency, these weren't transferred through the system until the entire block had been filled with data, so with a stream of 128-byte packets, it would take eight of them to fill a larger block. On some systems, packet sizes grew to 4K or even 8K. A constant high-speed stream of small packets was most likely to trigger this.

    Also, many of the hardware devices would simply overwrite the contents of one unprocessed data packet with the contents of the latest arrival if it wasn't collected fast enough. So that could really mess up sequence numbers.
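    A minimal sketch of that two-pool allocation scheme (the class name, pool sizes, and block sizes are invented for illustration):

        class BufferPool:
            """Toy two-pool packet buffer allocator as described above."""
            def __init__(self, n_small=64, n_large=8):
                self.small = [bytearray(512) for _ in range(n_small)]
                self.large = [bytearray(2048) for _ in range(n_large)]

            def alloc(self, packet_len):
                # Prefer a small block; fall back to a large one when the
                # small pool is exhausted (the problem case described above).
                if packet_len <= 512 and self.small:
                    return self.small.pop()
                if self.large:
                    return self.large.pop()
                return None  # no buffers left: the packet is dropped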

  • by complete loony ( 663508 ) <Jeremy@Lakeman.gmail@com> on Friday January 07, 2011 @11:26AM (#34791208)
    I was once sharing a connection with a friend who throttled my upload bandwidth in an attempt at fairness. Running something like BitTorrent would fill all the buffers in my PC, his router, and his modem, adding 1.5 seconds of latency to the link (I used to ping the host on the other end of the modem to confirm it).
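    That measurement is easy to reproduce; a hedged sketch (the far-end address is a placeholder, and it assumes a Unix-style ping whose output includes "time=... ms"):

        import re, subprocess

        FAR_END = "192.0.2.1"  # placeholder: the host just beyond the modem

        # Run this once while idle and once during a bulk upload; under
        # load, RTTs climb toward seconds as buffers along the path fill.
        out = subprocess.run(["ping", "-c", "10", FAR_END],
                             capture_output=True, text=True).stdout
        rtts = [float(m) for m in re.findall(r"time=([\d.]+)", out)]
        print(f"min {min(rtts):.1f} ms, max {max(rtts):.1f} ms")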
  • by davidbrit2 ( 775091 ) on Friday January 07, 2011 @11:39AM (#34791402) Homepage

    I'll attempt to translate.

    TCP has to be able to estimate how fast* it can send data, because there's no way it can know definitively the link speed, capacity, and reliability between your system and a remote system. It does this by progressively getting faster until it starts detecting transmission problems between the two systems, at which point it backs off and slows down. Ideally, you hit a nice equilibrium at some point.

    On a proper network, if some router along the path is at capacity, either internally, or along one of its outgoing paths, it should drop the packets it can't handle in a timely fashion. This seems counterintuitive at first, but remember that TCP handles the guaranteed transmission already - it will retransmit packets that didn't arrive. If the router is holding these packets in a buffer, and sending them along once the links clear up, i.e. "when it gets around to it", the packets will reach their destination with hugely inflated latency. This in turn confuses TCP, as it can't get a reliable estimate of link capacity, and the whole speed negotiation falls apart. The latency becomes wild and unpredictable as packets are sometimes buffered, sometimes not, but they always reach their destination, so TCP thinks it's sending at an acceptable rate. So now you've got all the endpoints conversing through this router that's claiming, "No problem, I can handle it!" when it really can't, and the problem just compounds itself as the router gets slammed harder and harder.

    By getting timely notification of dropped packets, TCP can say, "Oh, I'm transmitting too fast for this link, time to shrink the sliding window and slow down." This both smooths out latency, and minimizes further dropped packets, not just for the two hosts involved, but for everyone else transmitting through the affected routes as well. This is how it's supposed to work, but excessive buffering of packets within routers prevents it from happening.

    Moral: Dropped packets are perfectly normal and in fact required for TCP to manage its own speed and latency. Stop trying to buffer and guarantee packet delivery - TCP is handling that already.

    (Disclaimer: I'm a DBA, not a network engineer. Feel free to clarify or correct anything I've mucked up.)

    * "Fast" in this case means "How many packets should I send at once before stopping to wait for acknowledgment of those packets getting where they're going". "Faseter" equates to "more of them".

  • by Keramos ( 1263560 ) on Friday January 07, 2011 @12:04PM (#34791772)

    There is no 'bufferbloat because RAM is getting cheaper'. What he is seeing is what happens when you want to saturate your link. ... ...you get either a buffered or a dropped packet.

    Yes, and if a link is saturated, there should be packet drops, which TCP senses; it then automatically throttles back to reduce the required bandwidth and avoid saturation. But what is happening is that these huge buffers are holding packets that would otherwise be dropped, so TCP doesn't get the feedback it needs to detect saturation. It continues transmitting at full speed, believing it has uncongested pipes, which in turn keeps the buffers full, and so on.

    Because of the buffers, most of these packets are eventually getting through, but maybe in seconds instead of tens or low hundreds of milliseconds. Thus you're getting huge latency.

    Jitter is caused by the buffers eventually filling or TCP timing out (registering packet loss), dropping the rate for a little bit, the buffers draining, then TCP upping the rate again as the buffers refill, hiding the saturation, until they're full again. Rinse and repeat.

    It's related to the "bloat" of buffering (due to the increasing affordability of RAM and the "more of a good thing must be better than a little of a good thing - QED" mindset) because buffer size matters. If the buffer is kept below a certain point, related to the pipe bandwidth and the number of traffic streams, it acts as a temporary cushion against spikes in traffic (the intention of buffering) and can't cause the scenario above, since it lacks the capacity to overload the link from its contents alone. Above that threshold, the latency issues and back-and-forth thrashing noted above occur. The bigger the buffers, the worse the effect.

    And it's not just a "well, keep your traffic below x Mbit if you're on ADSL2" issue, because it happens anywhere a high-capacity pipe meets a lower-capacity or otherwise congested pipe (of any capacity). That might be your ISP's backbone getting hit by several thousand people downloading the latest WoW patch simultaneously, sending your 300 kbps Skype call to hell through latency and jitter. If the ISP's equipment had smaller buffers, the servers would throttle back as packet loss occurred. You'd probably still lose packets, but they'd be detected and retransmitted quickly, and you likely wouldn't notice the latency or the jitter.

    What he is seeing is what happens when you want to saturate your link.

    So, no, what you get with appropriate buffers is your TCP connection moderating itself to the appropriate link capacity and availability, and latency remaining approximately the same (relative to what you're seeing in bufferbloat, but worse than an uncongested link, obviously).

    With bufferbloat, your bandwidth appears to remain about the same, but your latency balloons massively and you get jitter effects as above.
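    A toy queue model of that difference (all numbers invented): a sender pushes 10% over the link rate into a FIFO. With a small buffer, drops appear almost immediately, giving TCP its feedback; with a bloated one, packets still get through, but only after the queueing delay balloons:

        LINK_RATE = 1000    # packets/s the link can drain
        SEND_RATE = 1100    # offered load: 10% over capacity

        for buf_pkts in (50, 10_000):
            queue, first_drop = 0, None
            for second in range(1, 101):
                queue = min(buf_pkts, queue + SEND_RATE - LINK_RATE)
                if queue >= buf_pkts and first_drop is None:
                    first_drop = second
            print(f"buffer {buf_pkts:>6}: first drop at {first_drop} s, "
                  f"standing delay {queue / LINK_RATE:.2f} s")
        # buffer     50: first drop at 1 s,   standing delay 0.05 s
        # buffer  10000: first drop at 100 s, standing delay 10.00 s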

  • by farnz ( 625056 ) <slashdot&farnz,org,uk> on Friday January 07, 2011 @12:18PM (#34791966) Homepage Journal

    How much bandwidth can I have, though? Take the link between my desktop and a Slashdot server; is the correct answer "1GBit/s, no more" (the speed of my network card)? Is it "20MBit/s, no more" (the speed of my current Internet connection)? Is it "0.5MBit/s, no more" (my fair share of this office's Internet connection)? In practice, you need the answer to change rapidly depending on network conditions - maybe I can have the full 20MBit/s if no-one else is using the Internet, maybe I should slow down briefly while someone else handles their e-mail.

    TCP doesn't slam the network; it starts off slowly (TCP slow start currently sends just two packets initially), and gradually ramps up as it finds that packets aren't dropped. When packet drop happens, it realises that it's pushing too hard, and drops back. If there's been no packet drop for a while, it goes back to trying to ramp up. RFC 5681 [ietf.org] talks about the gory details. It's possible (bar idiots with firewalls that block it) to use ECN (explicit congestion notification) [ietf.org] instead of packet drop to indicate congestion, but the presence of people who think that ECN-enabled packets should be dropped (regardless of whether congestion has happened) means that you can't implement ECN on the wider Internet.

    This works well in practice, given sane buffers; it dynamically shares the link bandwidth, without overflowing it. Bufferbloat destroys this, because TCP no longer gets the feedback it expects until the latency is immense. As a result, instead of sending typically 20MBit/s (assuming I'm the only user of the connection), and occasionally trying 20.01MBit/s, my TCP stack tries 20.01MBit/s, finds it works (thanks to the queue), speeds up to 20.10MBit/s, and still no failure, until it's trying to send at (say) 25MBit/s over a 20MBit/s bottleneck. Then packet loss kicks in, and brings it back down to 20MBit/s, but now the link latency is 5 seconds, not 5 milliseconds.
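    Back-of-the-envelope for those last numbers (the 20 Mbit/s bottleneck and the 5 ms vs. 5 s latencies come from the paragraph above; the standing-queue sizes are derived from them): once the sender overshoots, the added latency is simply the standing queue divided by the bottleneck rate.

        BOTTLENECK_BPS = 20e6 / 8   # 20 Mbit/s, in bytes per second

        for queue_mb in (0.0125, 1.0, 12.5):
            latency_ms = queue_mb * 1e6 / BOTTLENECK_BPS * 1000
            print(f"{queue_mb:>7} MB queued -> {latency_ms:.0f} ms of latency")
        # 0.0125 MB -> 5 ms; 1.0 MB -> 400 ms; 12.5 MB -> 5000 ms (the 5 s case)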
