A Peek At Google's Software-Defined Network
CowboyRobot writes "At the recent 2013 Open Networking Summit, Google Distinguished Engineer Amin Vahdat presented 'SDN@Google: Why and How', in which he described Google's 'B4' SDN network, one of the few actual implementations of software-defined networking. Google has deployed sets of Network Controller Servers (NCSs) alongside the switches, which run an OpenFlow agent with a 'thin level of control with all of the real smarts running on a set of controllers on an external server but still co-located.' By using SDN, Google hopes to increase efficiency and reduce cost. Unlike computation and storage, which benefit from an economy of scale, Google's network is getting much more expensive each year."
centralized = fault-tolerant? (Score:4, Interesting)
"it provides logically centralized control that will be more deterministic, more efficient and more fault-tolerant."
I'll agree with deterministic and efficient, and perhaps even less likely to fault, but more fault-tolerant seems like a stretch. SDN might get you better fault tolerance, but that is not because the control is centralized. I suspect the control has more information about non-local requirements and loads, and that can get you better responses to faults. That happens because the controllers can communicate more complex information more easily, since that is pure software, not because it's centralized. You can have these fault-tolerance gains via non-centralized SDN too.
Re:centralized = fault-tolerant? (Score:4, Interesting)
Compare it to the alternative, such as the good old spanning tree protocol. You have a number of independent agents who together have to decide how to react to a fault. This is complex and requires clever algorithms that can deal with timing issues and whatnot.
With a centralised controller the problem is much easier. One program running on one CPU decides how to reconfigure the network. This can be faster and possibly find a better solution.
Of course you need redundant controllers and redundant paths to the controllers. Apparently Google decided you need a controller per location.
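To make the contrast concrete, here is a minimal sketch (illustrative Python, not Google's actual controller code; the topology and names are made up) of what the centralized view buys you: the controller holds the whole topology in one data structure and simply recomputes paths when a link dies, instead of negotiating convergence between independent agents.

```python
# Toy illustration of centralized reconvergence: the controller holds the whole
# topology and recomputes forwarding paths in one place when a link fails.
# This is NOT Google's B4 code, just a sketch of the general idea.
import heapq

def shortest_paths(topology, source):
    """Plain Dijkstra over a dict {node: {neighbor: cost}}; returns predecessor map."""
    dist = {source: 0}
    prev = {}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue
        for neighbor, cost in topology[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor], prev[neighbor] = nd, node
                heapq.heappush(queue, (nd, neighbor))
    return prev

topology = {
    "a": {"b": 1, "c": 5},
    "b": {"a": 1, "c": 1},
    "c": {"a": 5, "b": 1},
}

print(shortest_paths(topology, "a"))   # normal paths

# Link a-b fails: delete it from the single global view and recompute.
del topology["a"]["b"], topology["b"]["a"]
print(shortest_paths(topology, "a"))   # reconverged paths, no distributed negotiation
```

The hard part then shifts to replicating the controller itself (per site, as Google apparently does) rather than designing a distributed convergence protocol.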
Re:centralized = fault-tolerant? (Score:4, Insightful)
With a centralised controller the problem is much easier. One program running on one CPU decides how to reconfigure the network. This can be faster and possibly find a better solution.
I can see how centralizing the control can be easier. But if the history of Internet networking has taught us anything, we should expect somebody to come up with a more clever distributed algorithm (perhaps building on OpenFlow) that will make SDNs a footnote in history while the problem gets distributed out to the network nodes again, making it more resilient.
That's not to say that trading off resiliency for performance today isn't worthwhile in some applications.
Re: (Score:2)
But if the history of Internet networking has taught us anything, we should expect somebody to come up with a more clever distributed algorithm
The internet has moved from centralized to decentralized to centralized again. It is not the case that it has moved one-directionally towards a distributed system. Currently big parts of the internet are centrally managed (e.g. SuperDNS/GoogleDNS, IBGP, MPLS routing, most network provisioning).
The current view is that centralizing BGP would be a "good thing" (TM).
Re: (Score:2)
There may be a move to concentrate traffic in fewer large networks, but that's not the same as the Internet getting more central management.
Re: (Score:2)
but DNS is just as distributed as always
Google DNS is centralized.
BGP *is* somewhat centralized, as it always was
The change is that now many organizations drop centrally computed routing tables on the routers as opposed to the OSPF+manual tweaks that used to dominate before.
Re: (Score:2)
Google DNS is centralized.
Well, yes. Every network has "centralized DNS"; it's how DNS operates. That this is a sudden and startling discovery to you indicates nobody should listen to you.
The change is that now many organizations drop centrally computed routing tables on the routers
That's always been relatively common. Especially if you have only one or two peers, dynamically learning the entire Internet routing table was a massive waste of resources. Many holders of a single class-C run BGP to advertise their route, not to learn routes. They default out, and advertise, so that their block is reachable if a link goes down.
Re: (Score:2)
It is clear you do not know what Google DNS is. It is not the DNS that serves the "google network" but a global provider of DNS services for everyone, and people are encouraged to use it instead of their local DNS. This makes your comment
That this is a sudden and startling discovery to you indicates nobody should listen to you
rather ironic.
That's always been relatively common. Especially if you have only one or two peers, dynamically learning the entire Internet routing table was a massive waste of resources.
I'm talking about AS-level organizations, including internal routers as well as border routers.
Re: (Score:2)
It is clear you do not know what Google DNS is. It is not the DNS that serves the "google network" but a global provider of DNS services for everyone, and people are encouraged to use it instead of their local DNS.
Ah yes, the traditional "you must not have all the information, or you'd agree with me" argument. It's proof your logic is flawed, not proof of my ignorance. You do realize that "back in the day" there were people encouraging others to use things like 198.6.1.3, the DNS server for the largest (by volume, not reach) and fastest-growing (by dollars per day spent on infrastructure) ISP on the planet, rather than local ones, because the local ones were much more prone to failure than the link to 198.6.1.3, right?
Re: (Score:2)
You are funny, trying to play the "I'm older and wiser" card. You are likely to lose that one too.
And all you prove with your 198.6.1.3 example is what I said in my original posting: there have been waves of centralization (such as that one) and waves of decentralization and back again (e.g. Google DNS).
Re: (Score:2)
For decades you had to buy specialized routers or expansion cards to do certain things. Now you reconfigure those things on the fly.
Re: (Score:2)
All the other things I've seen mentioned were SDN within a NIC for CPU offload. But if you are putting a computer in a NIC, you can do other things with it anyway.
How can you have a software defined network? (Score:4, Interesting)
A network is physical infrastructure - software isn't going to be rerouting cables or installing new WiFi nodes anytime soon.
If all they mean is that routing tables are dynamically updated, then how is this anything new?
This isn't a troll, I genuinely don't see where the breakthrough is.
Re: (Score:2)
You're missing the point. The summary describes it as a 'Software Defined Network Network', a true innovation.
Huh, what are IOS and DD-WRT then? (Score:2)
Re:How can you have a software defined network? (Score:5, Informative)
It's not what they are doing here exactly, but there is no reason you can't have a logical topology on top of a physical one. Actually it's very useful, especially when combined with a virtual machine infrastructure. Perhaps you want two machines in separate data centers to participate in software NLB; they need network adjacency, for example, yet I doubt you want a continuous layer-two link stretched across the country. Sure, if it's just two DCs maybe a leased line between them will work, but what if you have sites all over the place and potentially want to migrate the hosts to any of them at any time? That would allow for maintenance at a facility, or perhaps you power on facilities during off-peak local electrical use and migrate your compute there.
People are doing these things today, but once you get beyond a single VM host cluster it gets pretty manual, with admins doing lots of work to make sure all the networks are available where they need to be: hard-coded GRE tunnels, persistent Ethernet-over-IP bridges, etc. They all tend to be static; minimal overhead when not in use, sure, but overhead and a larger attack surface nonetheless. A really good soup-to-nuts SDN might make the idea of LAN and WAN as separate entities an anachronism. Being able to have layer-two topology appear automatically wherever needed would be very cool.
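As a rough sketch of the bookkeeping such an SDN would automate (plain Python with hypothetical names, nothing vendor-specific), imagine tracking which sites currently host members of a stretched layer-2 segment and deriving the required tunnel mesh from that; migrating a VM just updates the mapping:

```python
# Sketch of the overlay bookkeeping an SDN controller could automate: which
# sites need a tunnel between them for a given stretched L2 segment.
# Purely illustrative; class and site names are made up for this example.
from itertools import combinations

class StretchedSegment:
    def __init__(self, name):
        self.name = name
        self.vm_sites = {}            # VM name -> site currently hosting it

    def place(self, vm, site):
        self.vm_sites[vm] = site      # initial placement or migration

    def required_tunnels(self):
        """Full mesh of site pairs that currently host members of this segment."""
        sites = sorted(set(self.vm_sites.values()))
        return set(combinations(sites, 2))

seg = StretchedSegment("nlb-cluster")
seg.place("web1", "dc-east")
seg.place("web2", "dc-west")
print(seg.required_tunnels())          # {('dc-east', 'dc-west')}

seg.place("web2", "dc-central")        # migrate web2; the tunnel set updates with it
print(seg.required_tunnels())          # {('dc-central', 'dc-east')}
```

In today's manual world that diff is an admin editing GRE tunnel or bridge configs by hand; the SDN pitch is that a controller computes it and pushes it out automatically.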
Re:How can you have a software defined network? (Score:5, Informative)
There is no routing as such. For each new "flow" the switch needs to ask a computer (controller) what to do. The controller will then program the switch with instructions for the new flow.
You claim that the flow table is just a glorified routing table. Maybe it is, but it is much more fine-grained: you can match on many fields in the packet, spanning layers 2 through 4, such as MAC addresses, IP addresses, port numbers, TCP packet types (SYN packets), etc. Also you can mangle the packets, for example modifying the MAC or IP address before forwarding the packet.
With this you can build some amazing things. The switch can be really dumb and yet it can do full BGP routing: RouteFlow: https://sites.google.com/site/routeflow/ [google.com]
The other canonical use case is virtualisation. No, it will not be rerouting physical cables. But it can pretend to do so. Combine it with VMs and you can have a virtual network that can change at any time. If you migrate a VM to another location, the network will automatically adapt. And still the switches are dumb. All the magic is in the controllers.
Before OpenFlow you would need to create a VLAN (or use MPLS). When moving the VM to a new location, you would need to reconfigure a number of switches to carry this VLAN, and there is no standard protocol to do so.
Open vSwitch supports OpenFlow, so you can pretend your virtual network with virtual switches includes the VM host itself: http://openvswitch.org/ [openvswitch.org]
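A stripped-down model of that flow-miss interaction (toy Python; this mimics the idea, not the real OpenFlow wire protocol or any controller framework's API) could look like this:

```python
# Toy model of reactive flow setup: the switch matches on header fields and,
# on a table miss, asks the controller, which installs a rule for that flow.
# Illustrative only; real OpenFlow is a binary protocol with richer match/actions.

class Switch:
    def __init__(self, controller):
        self.flow_table = {}          # match tuple -> action callable
        self.controller = controller

    def packet_in(self, pkt):
        match = (pkt["src_ip"], pkt["dst_ip"], pkt["dst_port"])
        action = self.flow_table.get(match)
        if action is None:                        # table miss
            action = self.controller.handle_miss(self, match)
        return action(pkt)

class Controller:
    def handle_miss(self, switch, match):
        # The "smarts": decide what to do and program the switch so later
        # packets of this flow never come back to the controller.
        out_port = 2 if match[2] == 80 else 1
        def action(pkt):
            return dict(pkt, out_port=out_port)   # could also rewrite MAC/IP here
        switch.flow_table[match] = action
        return action

sw = Switch(Controller())
pkt = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "dst_port": 80}
print(sw.packet_in(pkt)["out_port"])   # first packet -> controller -> port 2
print(sw.packet_in(pkt)["out_port"])   # same flow now hits the flow table directly
```

The point is that the switch stays dumb: it only stores match-to-action entries, and only the first packet of a flow ever reaches the controller.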
Re: (Score:3)
Translation: Google Big.
Yep. And there comes a point, when you're scaling up, where quantitative differences become qualitative differences that demand completely different solutions to the old problems.
Translation: Firmware Is Magic.
No, firmware is static, and the code it contains must fit in limited-capacity storage devices and run on low-end CPUs, unless you want to pay big money for your switches. Much better to make the switch firmware simple and the switches cheap, and put your logic in a few much more powerful machines with visibility into the bigger picture.
Re: (Score:2)
"it's all stateless" - no not exactly. First OpenFlow has counters and flow rules can apply to those counters. You can use this to rate limit flows or you can use it to sample packets (copy every 500th packet etc). Or to load balance.
But most important, the whole point of OpenFlow is that you do not upload the whole set of rules to the switch. Indeed the actual rules might be too complex for the switch to hold or to compute.
Take the BGP implemented by RouteFlow as an example. The global BGP table has about
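As a toy version of the counter idea above (illustrative Python, not real OpenFlow match/action semantics), a per-rule packet counter can drive sampling of every Nth packet to a monitoring port:

```python
# Toy model of the per-flow counters mentioned above, used here to copy every
# Nth packet of a flow to a monitoring port. Illustrative only.

class SampledFlow:
    def __init__(self, out_port, sample_port, every_n=500):
        self.out_port = out_port
        self.sample_port = sample_port
        self.every_n = every_n
        self.packet_count = 0          # the kind of counter OpenFlow keeps per rule

    def apply(self, pkt):
        self.packet_count += 1
        ports = [self.out_port]
        if self.packet_count % self.every_n == 0:
            ports.append(self.sample_port)   # copy this packet to the monitor
        return ports

flow = SampledFlow(out_port=1, sample_port=9, every_n=500)
sampled = sum(1 for i in range(10_000) if 9 in flow.apply({"seq": i}))
print(sampled)   # 20 of 10,000 packets copied to the monitoring port
```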
Re: (Score:2)
Small typo: "No, because you need to upload all the routes, in fact you will upload no routes" should be "No, because you need NOT to upload all the routes, in fact you will upload no routes".
Re: (Score:2)
Can you point to any cheap switch that can hold 500,000 BGP routes in the data plane? I didn't think so.
You are also missing the point: do you really want to pay extra for software features? Software that has been done way better in open-source controllers?
A Juniper router with 6x 10 Gbit/s ports is $50,000. An OpenFlow-enabled switch with four times as many 10-gig ports is only one-tenth of that. I do not know where you work, but in my shop that is some savings that we will take.
Re: (Score:2)
I will give you that the OpenFlow system is stupid in some ways. For example, I can push an MPLS label onto a packet, but I cannot push a LISP header. Why not? Because they made separate instructions such as "push VLAN label" and "push MPLS label" instead of a generic "push N bytes".
OpenFlow is two things. It is a language for the data plane, not much different from what you are asking for. It is not Turing complete, probably by design, so you cannot make the data plane do just anything, but on the other ha
Re: (Score:2)
You say that the $50k Juniper router provides lower latency than the cheap OpenFlow switch - but that is just BS.
That's proof you don't know what you are talking about. The expensive routers/switches have dedicated ASICs for the ports: RAM and processing at the port to get the packet in, processed, and back out as soon as possible. Pull something into a "Linux" router (often a PC with more networking cards) and you have to pull it through the cards, across the bus/motherboard to the CPU, process it, and send it back to the card for exit.
Re: (Score:2)
How the hell did you manage to conclude anyone here was talking about PCs with networking cards? The "cheap" switches I am talking about are products such as the Juniper EX4550, which has 32x 10G ports and 960 Gbps of bandwidth for $19k. Compare that with the Juniper M320, which is twice as expensive with only half as many 10G ports and 320 Gbps of bandwidth.
Sure, the M320 can do more in the data plane, but people are using it for stuff that the EX4550 would do just fine, if the software allowed it.
Or you could go for an HP 5820X
Re: (Score:2)
No, that is the point of OpenFlow. The switches become routers.
Re: (Score:2)
Please elaborate on what you mean by stateless. I already told you how it is not stateless.
Re: (Score:2)
An OpenFlow switch will:
Update counters and timers. Make decisions based on those counters and timers. Support multiple queues with different delay limits, etc. (QoS). Rewrite source and destination IP addresses and UDP/TCP port numbers, allowing the switch to do NAT without querying any external entity on a per-packet basis. Add and remove VLAN, MPLS, etc. tags, modify the tags, modify the MAC, and much more. Automatically drop flow rules on certain events, such as the last packet in a TCP flow, or by counters, t
Re: (Score:2)
OpenFlow will only pass as much of the packet as you need. In most cases that is just the headers. Say the controller is on a 10G interface, about 100 bytes need to be transferred out, and the reply will be about 100 bytes too. The time to process the packet will be the same or less compared to the switch's built-in controller (external controllers will generally be more powerful servers than the controller CPU in a switch or router). The time to transfer 200 bytes on a 10G link is roughly 200 ns.
Of course there might b
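The arithmetic behind that estimate is easy to check; this only counts serialization time on the wire (framing overhead, propagation, and controller processing are extra):

```python
# Back-of-the-envelope check of the serialization delay quoted above:
# ~100 bytes of headers to the controller and ~100 bytes back on a 10G link.
LINK_BPS = 10e9
bytes_each_way = 100

one_way_ns = bytes_each_way * 8 / LINK_BPS * 1e9
round_trip_ns = 2 * one_way_ns
print(f"one way:    {one_way_ns:.0f} ns")     # 80 ns
print(f"round trip: {round_trip_ns:.0f} ns")  # 160 ns, i.e. the rough 200 ns figure
```

Serialization alone is about 160 ns; framing overhead and the controller's own processing add more, but the poster's point is that this cost only hits the first packet of a flow.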
Re: (Score:2)
It does not matter if it sends one bit per packet -- latency is per packet, not per byte. Packets must sit in a queue while the switch is waiting for a response -- so the time available for a response is bounded by the time it takes the queue to overflow, or the packet will have to be dropped. It will never work.
So you are saying my estimate of 200 ns delay is wrong? Give me your own calculations.
Yes the incoming packet is in a queue while the switch waits for response from the controller. That response can be there within 200 ns. In the meantime the switch is not blocked from processing further packets.
A 200 ns delay on the first packet in a flow of packets is so little that it is barely measurable. You will be dealing with delays much larger than that simply because you want to send out a packet on a port that is al
Re: (Score:2)
That is bullshit. Here is a guy who benchmarked the Intel X520 10G NIC and wrote a small piece titled "Packet I/O Performance on a low-end desktop": http://shader.kaist.edu/packetshader/io_engine/benchmark/i3.html [kaist.edu]
His echo service manages between 10 and 18 Gbit/s of traffic even at a packet size of 60 bytes. And there are plenty of optimizations he could do to improve on that. The NIC supports CPU core affinity, so he could have spread the load across multiple cores. The memory bandwidth issue could have bee
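For scale, the packet rates implied by those figures are easy to work out (this ignores Ethernet preamble and inter-frame gap, so the numbers are rough):

```python
# Back-of-the-envelope packet-rate math for the benchmark figures quoted above.
# Assumption: "60 bytes" is taken as the whole frame and framing overhead is ignored.

def packets_per_second(throughput_bps: float, frame_bytes: int) -> float:
    """Packets per second needed to sustain a given bit rate at a fixed frame size."""
    return throughput_bps / (frame_bytes * 8)

for gbps in (10, 18):
    pps = packets_per_second(gbps * 1e9, 60)
    print(f"{gbps} Gbit/s at 60-byte frames ~ {pps / 1e6:.1f} Mpps")

# Roughly 20.8 to 37.5 Mpps, which is why per-packet CPU cost, not raw bandwidth,
# is usually the limiting factor for software forwarding.
```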
Re: (Score:2)
That's without data ever being accessed from userspace, no protocol stack, average packet size being half of the maximum, and there is a good possibility that the measurements are wrong, because then it would be easier to implement the whole switch by just stuffing multiple CPU cores into the device, and the whole problem would not exist.
The article was written by the guy who wrote the driver; I think we can assume he knows his stuff.
No, it appears that if you want to switch more than 10-18 Gbit/s the computer would have a memory bandwidth problem. Using multiple cores and NUMA might improve on that, but I do not think you would manage to build a 24-port switch that switches at line speed this way :-).
But if you could somehow get an external switch to do 99% of the work, this might work...
I am not sure how much more we can get out of
Re: (Score:3)
Sometimes it seems that SDN is just a new dress on an old pig; sometimes it starts to make sense.
When I'm feeling enlightened or charitable about the concept I envision it as an encapsulation system for layer 2 on layer 3, allowing layer 2 networks to be created independent of the physical constraints of actual layer 1/2 topologies.
I imagine the goal is to define a layer 2 switching domain (ports, VLANs, etc) and connect systems to it regardless of how the systems are physically connected or even located.
Re: (Score:2)
And then there's my inherent skepticism about the value payoff relative to the level of complexity added, as well as asking: isn't that why we have layer 3 protocols? To define networks above and beyond their layer 2 memberships?
What once was old is now new again.
Re: (Score:1)
What is the payoff? No Cisco support contracts on gazillions of switches and interconnects (support is really purchased for firmware updates; hardware support/replacement is, or should be, quite rare). This will pay off very fast, despite the initial complexity curve.
A lab should be quite cheap for proof-of-concept testing of production changes.
Re: (Score:2)
No, it isn't. Sure, there's one Ethernet cable connected from a server to the rack switch, but even there the packets coming in could have hundreds of different VLAN tags on them.
Everywhere else, you have multiple redundant links from everything to everything else, and deciding which one to use for each packet is the complex part.