
Bufferbloat: Dark Buffers In the Internet

CowboyRobot writes that Jim Gettys of Bell Labs, expanding on his earlier work in a new article in ACM Queue, "makes the case that the Internet is in danger of collapse due to 'bufferbloat,' 'the existence of excessively large and frequently full buffers inside the network.' Part of the blame is due to overbuffering; in an effort to protect ourselves, we make things worse. But the problem runs deeper than that. Gettys' solution is AQM (active queue management), which is not deployed as widely as it should be. 'We are flying on an Internet airplane in which we are constantly swapping the wings, the engines, and the fuselage, with most of the cockpit instruments removed but only a few new instruments reinstalled. It crashed before; will it crash again?'"

  • What we need is a ferry analogy.

    Packet transmission is like a ferry crossing a river, except that the ferry sets off when it is full rather than at set times.

    People wait at the shore and generally don't have to wait too long as the ferry is pretty fast and only needs a few people to fill up. For most people, walking onto the ferry involves very little waiting before the ferry actually departs and crosses the river.

    Bufferbloat is when big buffers act like ferries with huge capacity. People board a huge 2,000-passenger boat and are let on by the hundreds with seemingly no delay. But the ferry will not depart until it is reasonably full, so the people who got on first may have to wait for hours before the ferry actually departs and crosses the river.

    It is clear that bigger ferries are no substitute for more ferries....or smaller rivers. Or possibly a bridge. In any case, you can get away without introducing cars or airplanes, so my job is done here.
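
    The arithmetic behind the analogy is just queue depth divided by drain rate. A back-of-the-envelope sketch (all numbers are illustrative, not taken from the article):

```python
# Rough illustration of queueing delay: the time a packet waits is roughly
# (bytes already queued ahead of it) / (link drain rate).
# All numbers below are made up for illustration.

link_rate_bps = 1_000_000        # a 1 Mbit/s uplink, e.g. a slow DSL line
buffer_bytes = 256 * 1024        # a 256 KiB device buffer, fully occupied

queueing_delay_s = (buffer_bytes * 8) / link_rate_bps
print(f"Worst-case added latency: {queueing_delay_s:.2f} s")   # ~2.10 s
```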

  • Re:Cringely again... (Score:5, Interesting)

    by m.dillon ( 147925 ) on Saturday December 03, 2011 @12:41AM (#38247416) Homepage

    Well, you definitely CAN tell when one or more buffers along the path begin to fill up, because latency increases. Packet loss is not necessary and, in fact, packet loss just makes the problem worse, since many TCP connections implement SACK now and can keep the bandwidth saturated even in the face of packet loss.

    The ideal behavior is probably not to start dropping packets immediately... eventually, sure, but definitely not immediately. Ideally what you want to do is to attempt to shift the problem closer to the edges of the network where it is easier to fairly apportion bandwidth between customers.

    Send-side bandwidth limiting is very easy to implement since TCP already has a facility to collect latency information in the returned acks. I wrote a little beastie to do that in FreeBSD many years ago, and I turn it on in DragonFly releases by default.
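
    A toy sketch of the idea (not the actual FreeBSD/DragonFly code): watch the smoothed RTT reported by returning acks and back the send rate off when it climbs well above the path's baseline. All constants and names here are illustrative.

```python
# Toy sketch of delay-based send pacing: shrink the sending rate when the
# smoothed RTT rises well above the baseline, grow it gently when latency
# stays low. Constants are illustrative, not tuned values.

class DelayBasedPacer:
    def __init__(self, base_rtt_ms, max_rate_bps):
        self.base_rtt = base_rtt_ms
        self.max_rate = max_rate_bps
        self.rate = max_rate_bps * 0.5    # start conservatively at half rate
        self.srtt = base_rtt_ms

    def on_ack(self, measured_rtt_ms):
        # Smooth the RTT samples the way TCP does (an EWMA).
        self.srtt = 0.875 * self.srtt + 0.125 * measured_rtt_ms
        if self.srtt > 1.5 * self.base_rtt:
            # Queues are building somewhere along the path: back off.
            self.rate *= 0.9
        else:
            # Path looks idle enough: probe upward slowly.
            self.rate = min(self.max_rate, self.rate * 1.01)
        return self.rate
```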

    The purpose of the feature is not to completely remove packet buffering from the network, because doing so would put the sending server at a severe disadvantage versus other servers that do not implement similar algorithms (which is most of them).

    The purpose is to unload the buffers enough such that the algorithms in the edge routers aren't overloaded by the data and can do a better job apportioning bandwidth between streams.

    Our little network runs this coupled with fair queueing in both directions... that is, we not only control the outgoing bandwidth, we also pipe all the incoming bandwidth through a well-connected colo and control that too, before it runs over the terminal broadband links. This allows us to run FAIRQ in both directions, in addition to reserving bandwidth for TCP acks and breaking down other services. FAIRQ always works much better when links are only modestly overloaded rather than completely overloaded. Frankly, we don't have much of a choice; we HAVE to do this because our last-leg broadband links are 100% saturated in both directions 24x7. Anything short of that and even a single video stream screws up the latency for other connections beyond hope.
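
    A minimal sketch of two of the tricks mentioned above, under assumed numbers: shape egress to just below the physical uplink rate so the queue forms in a box we control, and let small ACK-sized packets jump ahead of bulk traffic. The rate, the ACK-size cutoff, and the function names are placeholders, not the actual FAIRQ configuration described.

```python
import collections
import time

LINK_RATE_BPS = 900_000          # shape slightly below a nominal 1 Mbit/s uplink
ACK_SIZE_LIMIT = 80              # treat tiny packets as ACKs (illustrative cutoff)

ack_q = collections.deque()
bulk_q = collections.deque()

def enqueue(packet: bytes):
    # Small packets go to the priority queue, everything else to the bulk queue.
    (ack_q if len(packet) <= ACK_SIZE_LIMIT else bulk_q).append(packet)

def run_shaper(send, duration_s=1.0):
    """Drain the queues at LINK_RATE_BPS, always serving the ACK queue first."""
    tokens, last = 0.0, time.monotonic()
    deadline = last + duration_s
    while time.monotonic() < deadline:
        now = time.monotonic()
        tokens = min(tokens + (now - last) * LINK_RATE_BPS / 8, 64_000)
        last = now
        q = ack_q if ack_q else bulk_q
        if q and len(q[0]) <= tokens:
            pkt = q.popleft()
            tokens -= len(pkt)
            send(pkt)                # hand the packet to the real interface
        else:
            time.sleep(0.001)        # wait for more tokens or more packets
```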

    This sort of solution works great near the edges.

    For the center of the network, frankly, I think about the best that can be done is modest buffering and RED, and then trying to reduce the load on the buffers in the center with algorithms run on the edges (which can sense end-to-end latency). The modest buffering is needed for the edge algorithms to be able to operate without bits of the network having to resort to dropping packets. In other words, you want the steady-state load on the network to not require dropping packets. Dropping packets should be reserved for the case where the load changes too quickly for the nominal algorithms to react. That's my opinion, anyhow.
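
    For reference, a rough sketch of the classic RED discipline mentioned above: keep an exponentially weighted average of queue depth and drop arriving packets with a probability that ramps up between two thresholds. The thresholds and weights below are illustrative, not tuned values.

```python
import random

# Sketch of the classic RED idea: track an EWMA of queue length and drop
# incoming packets with a probability that ramps up between two thresholds.

MIN_TH, MAX_TH, MAX_P, WEIGHT = 5, 15, 0.1, 0.002

class RedQueue:
    def __init__(self, capacity=30):
        self.q = []
        self.capacity = capacity
        self.avg = 0.0

    def enqueue(self, pkt) -> bool:
        # Exponentially weighted moving average of the instantaneous depth.
        self.avg = (1 - WEIGHT) * self.avg + WEIGHT * len(self.q)
        if self.avg >= MAX_TH or len(self.q) >= self.capacity:
            return False                       # hard drop: queue persistently full
        if self.avg > MIN_TH:
            p = MAX_P * (self.avg - MIN_TH) / (MAX_TH - MIN_TH)
            if random.random() < p:
                return False                   # early random drop to signal senders
        self.q.append(pkt)
        return True
```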

    -Matt

  • by Just Brew It! ( 636086 ) on Saturday December 03, 2011 @01:06AM (#38247498)
    As soon as I start trying to shove (or suck) more bits through the pipe than it can handle, round trip latency to "nearby" points of the Internet increases from ~25 ms to ~1 second. When I need to transfer a lot of data, I use rsync or wget if at all possible, and throttle the transfer to just below the rate the connection can handle; this results in ping times staying sane while only slowing down the transfer slightly. We shouldn't need to resort to doing stuff like this to make the network function properly!
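
    The same idea in a generic form (a hedged sketch, not the poster's actual commands): pace a bulk copy so it stays just under a chosen rate, leaving headroom so the access-link buffer never fills.

```python
import time

def throttled_copy(src, dst, rate_bytes_per_s, chunk=16 * 1024):
    """Copy the src file object to dst, pacing writes to stay under the rate.

    Same idea as capping an rsync or wget transfer just below the line rate
    so the access-link buffer never fills.
    """
    start, sent = time.monotonic(), 0
    while True:
        data = src.read(chunk)
        if not data:
            break
        dst.write(data)
        sent += len(data)
        # If we are ahead of schedule, sleep until the byte budget catches up.
        ahead = sent / rate_bytes_per_s - (time.monotonic() - start)
        if ahead > 0:
            time.sleep(ahead)
```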
  • by WaffleMonster ( 969671 ) on Saturday December 03, 2011 @02:49AM (#38248008)

    If you look at the buffers allocated to fast multi-gigabit interfaces at the core of the network, they are simply not large enough, relative to the forwarding rates involved, to induce the kinds of delays needed to cause Internet-wide problems.
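
    A quick sanity check of that claim, with made-up but plausible numbers: even a fairly large buffer on a fast core-facing port drains in milliseconds.

```python
# Back-of-the-envelope check: worst-case queueing delay is buffer size
# divided by line rate. Both figures below are illustrative.

port_rate_bps = 10_000_000_000      # a 10 Gbit/s core-facing interface
buffer_bytes = 12 * 1024 * 1024     # a hypothetical 12 MiB packet buffer

drain_time_ms = buffer_bytes * 8 / port_rate_bps * 1000
print(f"Worst-case queueing delay: {drain_time_ms:.1f} ms")   # ~10 ms
```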

    You can argue they may not be ideal for real-time voice, game, or video communication when these links are oversubscribed, but no doomsday is possible.

    Today, bufferbloat effects are mostly observed at the edge, even though they need not always be.

    Failure of a congestion control algorithm to control link saturation does not translate into congestive collapse of the larger network; it just results in *your* network connection turning to shit. When Netalyzr runs, it intentionally saturates your link for the duration of the test. In the real world, only a few portions of the edge are ever saturated to the extent that congestion control failure becomes an issue and pushes more packets through core routers. The number of edge machines in this category would need to be significant to cause a rerun of previous issues.

    That condition cannot be met, due to self-limiting feedback. If everyone maxed out their pipes at once, the core would saturate first (available edge bandwidth is grossly over-provisioned relative to core bandwidth), which would itself limit edge saturation and ensure the congestion control algorithms function properly.

    I'm not arguing there is not a problem or that more can't be done. I'm just arguing that the doomsday congestive-collapse scenario is bullshit.

  • Re:Cringely again... (Score:5, Interesting)

    by WaffleMonster ( 969671 ) on Saturday December 03, 2011 @03:51AM (#38248218)

    So wouldn't the right way to go be to update TCP for the times? I mean, we didn't slow computers down so we could keep PATA or PCI; we came up with new tech like SATA and PCIe to take advantage of the faster throughput. Shouldn't we do the same here as well?

    We have SCTP, which was intended to replace TCP, except nobody seems to care.

    At the end of the day, the concept of TCP is not rocket science - there is a limit, and diminishing returns, to what more can be done toward making TCP a perfect reflection of the concept of TCP.

    Congestion management and ack/windowing have certainly evolved into high arts, but fundamentally all TCP does is implement a loss-free, ordered data stream on top of an unordered, lossy, packet-switched network.

    This means your core limitation is embedded in the definition of TCP itself: the problem of head-of-line blocking. By using TCP you are, by definition, limiting yourself to the constraints of TCP.
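
    A toy illustration of head-of-line blocking (hypothetical data, not a real TCP implementation): an in-order reassembly buffer cannot deliver anything past a gap, even if later segments have already arrived.

```python
# In-order delivery, as TCP presents it to the application: nothing after a
# missing segment can be handed up, no matter how much has already arrived.

def deliver_in_order(segments, first_seq=0):
    """segments: dict mapping sequence number -> payload (arrival order lost)."""
    delivered, next_seq = [], first_seq
    while next_seq in segments:
        delivered.append(segments[next_seq])
        next_seq += 1
    return delivered, next_seq   # everything >= next_seq is stuck behind the gap

arrived = {0: "a", 1: "b", 3: "d", 4: "e"}      # segment 2 was lost
print(deliver_in_order(arrived))                 # (['a', 'b'], 2) -- 'd', 'e' wait
```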

    Realtime voice/video and multi-player games use their own protocols because they are not willing to accept the constraints of TCP. It is not the implementation of TCP that is holding them back. It is the *concept* of TCP.

    In my opinion we need more IP protocols to better handle varied use cases more than we need a new TCP.

  • by Animats ( 122034 ) on Saturday December 03, 2011 @03:56AM (#38248234) Homepage

    That's a pretty simplified way of putting it, but basically correct. Major equipment vendors have been slow to adopt more advanced queuing strategies (Stochastic Fair Queuing integrated with some of the more advanced flavors of early discard).

    Right. The problem is not big buffers, per se. It's big dumb FIFO queues. There's nothing wrong with one big flow, like a file transfer, having a long latency, provided that other flows with less data in flight aren't stuck behind it. That's what "fair queuing" is all about. Each flow has its own queue, and the queues are serviced in a round-robin fashion. (With stochastic fair queuing, some hashing is done to eliminate some of the bookkeeping on flows, but the effect is roughly the same.)
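
    A rough sketch of the stochastic variant described above (flow hashing plus per-queue round-robin); the queue count and hash are arbitrary choices for illustration, not any particular router's implementation.

```python
import collections
import hashlib

# Stochastic fair queueing sketch: hash each flow's 5-tuple into a fixed
# number of queues and service the queues round-robin, so a bulk flow can
# only monopolize its own queue, not the whole device.

NUM_QUEUES = 128
queues = [collections.deque() for _ in range(NUM_QUEUES)]

def flow_bucket(src, sport, dst, dport, proto=6):
    key = f"{src}:{sport}-{dst}:{dport}-{proto}".encode()
    return int(hashlib.sha1(key).hexdigest(), 16) % NUM_QUEUES

def enqueue(packet, src, sport, dst, dport):
    queues[flow_bucket(src, sport, dst, dport)].append(packet)

def dequeue_round_robin():
    """Yield one packet from each non-empty queue in turn."""
    while any(queues):
        for q in queues:
            if q:
                yield q.popleft()
```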

    I figured this out in the early 1980s (see RFC 970 [ietf.org]), and by the late 1990s it was an established technology. We shouldn't be having this problem at this late date.

    I wonder how much of the trouble comes from devices that are doing TCP-level processing in the middle of the network. Stateful firewalls and ISP ad-insertion engines [isp-planet.com] can introduce substantial latency.

    If you want to test for bad behavior, try running two flows: one that never has more than one packet outstanding, and one that does a big file-transfer-like operation such as a download. If the latency of the low-traffic flow goes up to match that of the bulk flow, there's a big dumb buffer in the middle. If the packet loss rate of the low-traffic flow goes up, there's a small dumb buffer in the middle.
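
    A hedged sketch of that two-flow test. It assumes you control a remote host running a trivial UDP echo service and a TCP sink to upload to; the hostnames and ports below are placeholders.

```python
import socket
import threading
import time

ECHO_HOST, ECHO_PORT = "probe.example.net", 9000     # placeholder echo server
BULK_HOST, BULK_PORT = "sink.example.net", 9001      # placeholder bulk sink

def bulk_upload(seconds=20):
    # The "big dumb flow": keep the uplink saturated for a while.
    with socket.create_connection((BULK_HOST, BULK_PORT)) as s:
        end = time.monotonic() + seconds
        while time.monotonic() < end:
            s.sendall(b"\0" * 65536)

def probe_rtt(samples=20):
    # The low-traffic flow: one small probe outstanding at a time.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(2.0)
    for i in range(samples):
        t0 = time.monotonic()
        s.sendto(b"ping", (ECHO_HOST, ECHO_PORT))
        try:
            s.recvfrom(64)
            print(f"probe {i}: {(time.monotonic() - t0) * 1000:.1f} ms")
        except socket.timeout:
            print(f"probe {i}: lost")   # loss here points at a small dumb buffer

threading.Thread(target=bulk_upload, daemon=True).start()
probe_rtt()
```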

  • by TheLink ( 130905 ) on Saturday December 03, 2011 @05:04AM (#38248430) Journal
    You don't necessarily have to size them to the flight time of the circuit.

    What you can do is have huge buffers, but just drop packets that are older than, say, 50 milliseconds, counted from the time they entered the device (if the link/hop is supposed to be fast and low latency).

    If the link is slow and/or high latency, you may wish to use a higher value - say, 100 milliseconds - but not too high. I'm no networking expert, but I don't really see the purpose of adding hundreds of milliseconds to a hop just to save a few packets that are likely to be dropped anyway, or that should be dropped as an indirect signal that whoever is sending them should slow down.
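
    A small sketch of that age-based drop policy (the 50 ms figure is the knob from the comment, not a recommendation): keep a large buffer, but discard anything that has sat in it too long when it reaches the head.

```python
import collections
import time

MAX_AGE_S = 0.050                    # the 50 ms target from the comment above

queue = collections.deque()          # holds (enqueue_time, packet) pairs

def enqueue(packet):
    queue.append((time.monotonic(), packet))

def dequeue():
    while queue:
        enq_time, packet = queue.popleft()
        if time.monotonic() - enq_time <= MAX_AGE_S:
            return packet            # still fresh enough to be worth sending
        # else: stale packet -- drop it; the sender will notice and slow down
    return None
```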
  • by Shinobi ( 19308 ) on Saturday December 03, 2011 @05:45AM (#38248560)

    Except that Cogent is the instigator of trouble in every single instance. They are like the little troublemaker who runs around poking, kicking, pinching, and insulting the larger kid on the playground, yet cries loudly and acts very hurt when he gets slapped in return.

  • Re:Cringely again... (Score:5, Interesting)

    by evilviper ( 135110 ) on Saturday December 03, 2011 @06:13AM (#38248630) Journal

    Even Cringely points out at the start of his article that TCP was originally written for a VASTLY different and weaker network than we have now, so instead of trying to make the networks go back to a mid-1980s design, wouldn't it be smarter just to update TCP to take advantage of new tech advances?

    There's nothing about a "weaker" network that necessitates a protocol redesign. TCP has had problems with congestion handling from day one that have necessitated a million and one hacks and workarounds, because it stupidly conflates packet loss with congestion... Some links will have packet loss without any congestion, and others (like those with huge buffers) will have congestion without (immediate) packet loss. It was a bad design decision.

    What's worse is that IP was designed correctly to begin with. The original design had ICMP control messages (e.g., source quench) to signal congestion, much like many other networking protocols. The real problem was that the specifics were vague and there was no exact standard for how much to slow down, how it affects higher-level protocols, etc., so it became a prisoner's dilemma, and highly unfair, and was deprecated.

    Of course, this problem could occur with TCP's congestion control just as easily if any particular implementations reduced the rate of exponential backoff, so there's nothing fundamentally wrong with the original congestion control design, just the lack of consistent implementation.
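
    For concreteness, the fairness argument rests on everyone applying the same multiplicative decrease when they see a congestion signal, whether that signal is a lost packet or an explicit mark. A minimal sketch (illustrative constants, not any stack's actual code):

```python
# Additive-increase / multiplicative-decrease, the discipline the fairness
# argument above depends on.

class AimdWindow:
    def __init__(self, mss=1460):
        self.mss = mss
        self.cwnd = 10 * mss          # congestion window, in bytes

    def on_ack(self):
        # Additive increase: roughly one MSS per round trip.
        self.cwnd += self.mss * self.mss / self.cwnd

    def on_congestion_signal(self):
        # A "polite" halving. An implementation that cut by, say, 0.9 instead
        # would grab an unfair share -- the prisoner's dilemma described above.
        self.cwnd = max(self.mss, self.cwnd / 2)
```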

    Controlling congestion by dropping packets is like controlling freeway traffic by randomly pushing cars off the road with a bulldozer.

  • by Edgester ( 105351 ) on Saturday December 03, 2011 @12:03PM (#38250154) Homepage Journal

    What can I do with my own laptop and wifi router to make my own situation better?

  • by Chirs ( 87576 ) on Saturday December 03, 2011 @04:58PM (#38252444)

    As an end-user there are only a few things you can do:

    1) Reduce the outgoing tcp queue size.
    2) Reduce the tx ring buffer size in the network device driver
    3) Set your router's upstream quality-of-service settings to throttle your upstream data transmission rate to just less than your upstream bandwidth.

    Alternatively, if you only have one heavy user of upstream bandwidth, you could do something like what is described at "http://wanners.net:8000/blog/2011/05/zapping-upload-bufferbloat-with-one-command/" - basically throttling the upstream bandwidth directly on the machine in question rather than on the router (a rough sketch of these knobs follows below).
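
    A rough, Linux-specific sketch of those three knobs (run as root; "eth0", the ring size, and the 900kbit figure are placeholders for your interface, your NIC's supported ring sizes, and a rate just below your measured upstream bandwidth):

```python
import subprocess

# Each command corresponds to one of the mitigations listed above.
commands = [
    # 1) shrink the interface transmit queue
    "ip link set dev eth0 txqueuelen 100",
    # 2) shrink the driver's TX ring (supported sizes vary by NIC)
    "ethtool -G eth0 tx 64",
    # 3) shape egress just below the uplink rate, as the linked post does
    "tc qdisc replace dev eth0 root tbf rate 900kbit burst 1600 latency 50ms",
]

for cmd in commands:
    subprocess.run(cmd.split(), check=True)
```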

"A car is just a big purse on wheels." -- Johanna Reynolds

Working...