Networking IT

How To Build a 100,000-Port Ethernet Switch

Posted by kdawson
from the stretching-the-fabric dept.
BobB-nw writes "University of California at San Diego researchers are presenting a paper (PDF) on Tuesday describing software that they say could make data center networks massively scalable. The researchers say their PortLand software will enable Layer 2 data center network fabrics to scale to 100,000 ports and beyond; they have a prototype running in the Department of Computer Science and Engineering at the school's Jacobs School of Engineering. 'With PortLand, we came up with a set of algorithms and protocols that combine the best of layer 2 and layer 3 network fabrics,' said Amin Vahdat, a computer science professor at UC San Diego. 'Today, the largest data centers contain over 100,000 servers. Ideally, we would like to have the flexibility to run any application on any server while minimizing the amount of required network configuration and state... We are working toward a network that administrators can think of as one massive 100,000-port switch seamlessly serving over one million virtual endpoints.'"

  • by BuR4N (512430) on Wednesday August 19, 2009 @02:39AM (#29115601) Homepage Journal
    I hope they have invented something better than ordinary Ethernet cables to wire that thing with.
  • Oh no... (Score:5, Funny)

    by acehole (174372) on Wednesday August 19, 2009 @02:52AM (#29115661) Homepage

    I have nightmarish pictures popping into my head of a waterfall of ethernet cables spewing from this with user's ports un-numbered with no network diagrams. People bashing on the server room door in a zombie like state muttering "MRRRHH FACEBOOK!" "TWWIIIITEEEuggggghh" with me inside screeching "NO! NO! I DONT KNOW WHAT PORT YOUR DESK IS! NO! I CAN'T MAKE THINGS GO FASTER!" before curling up in a ball listening to the hum of servers and the lamentations of the users outside the door desperately scratching to get in.

    • by Shag (3737)

      I have nightmarish pictures popping into my head of a waterfall of ethernet cables spewing from this with user's ports un-numbered with no network diagrams.

      I think this scenario is precisely why BOFHs have PFYs.

    • Re:Oh no... (Score:4, Insightful)

      by Z00L00K (682162) on Wednesday August 19, 2009 @04:30AM (#29116117) Homepage

      What a party it would be for people who like to cause broadcast storms!

      Just purge the ARP cache frequently and you will have a lot of broadcasts that can clog the network.

    • NO! NO! I DONT KNOW WHAT PORT YOUR DESK IS! NO!

      That's funny. Because right now I'm doing consulting work for a major bank. They know what port I'm on all the time. In fact, they have software that monitors my traffic and immediately cuts it off if something they don't like happens.

      I just bring in my Macbook with an EVDO dongle if I want to surf.

      • by AndrewNeo (979708)

        It's probably just a software firewall that blocks your MAC instead of your ethernet port.

    • by SkyDude (919251)

      I have nightmarish pictures popping into my head of a waterfall of ethernet cables spewing from this with user's ports un-numbered with no network diagrams.

      Whoa....someone needs a vacation.

    • by tlhIngan (30335)

      I have nightmarish pictures popping into my head of a waterfall of ethernet cables spewing from this with user's ports un-numbered with no network diagrams. People bashing on the server room door in a zombie like state muttering "MRRRHH FACEBOOK!" "TWWIIIITEEEuggggghh" with me inside screeching "NO! NO! I DONT KNOW WHAT PORT YOUR DESK IS! NO! I CAN'T MAKE THINGS GO FASTER!" before curling up in a ball listening to the hum of servers and the lamentations of the users outside the door desperately scratching t

  • by Wrexs0ul (515885) <mmeier@@@racknine...com> on Wednesday August 19, 2009 @02:53AM (#29115669) Homepage

    I would seriously hate to be the guy that tripped over that power cable.

    On the plus side it would be interesting to time how long it took for the DC's phone lines to melt.

    -Matt

    (redundant, redundant power. I know, I know)

    • by Thanshin (1188877) on Wednesday August 19, 2009 @06:53AM (#29116709)

      I would seriously hate to be the guy that tripped over that power cable.

      A sentry gun will be installed in the power cable corridor, to execute you the precise moment you've done your tripping. So you wouldn't have time to hate being yourself.

      (redundant, redundant power. I know, I know)

      To answer your worried look: yes, there's a redundant sentry gun for the other cable too.

      • How about installing those guns in a way that vaporizes you right *before* you trip that cable? Seems to make more sense to me...

      • by Agripa (139780)

        Were you going to install sentry guns above the drop ceiling also?

        "5 meters, man. 4, what the hell?"

    • Not redundant power. Power over Ethernet! Why should you be able to distinguish between the power cord and the data carrying cables?

      Or possibly remote microwave power. So intense that interrupting the beam will destroy anything in the way. No need for machine guns, significant savings on ammo, reduced cremation costs. Win win all around.

  • by Anonymous Coward on Wednesday August 19, 2009 @02:59AM (#29115693)

    I've long been of the opinion that putting more than a few hundred hosts on a single layer 2 network is almost always a bad idea.

    What do you do about broadcast storms? How do you prevent some clown from anywhere in that 100,000 machine cloud from poaching another machine's IP address (either maliciously or by an accidental typo)?

    Subnets and routers were invented for a reason. Just because you can bridge the whole world together into one massive virtual Ethernet segment doesn't mean you should.

    • Re: (Score:3, Funny)

      by hhedeshian (1343143)
      Easy: don't use a switch, use a hub! Everything will be a broadcast storm!
      • Re: (Score:3, Funny)

        by lorenlal (164133)

        And you could label the hubs with cheeky names like Wilma, Andrew, Ivan, and Camille.

    • by amorsen (7485) <benny+slashdot@amorsen.dk> on Wednesday August 19, 2009 @04:20AM (#29116069)

      What do you do about broadcast storms?

      In the paper they detail how they handle ARP. All other broadcasts you can get away with dropping these days; use multicast instead. (Yes, that will break NETBIOS broadcast name lookups. So sad.)

      How do you prevent some clown from anywhere in that 100,000 machine cloud from poaching another machine's IP address (either maliciously or by an accidental typo)?

      That is a solved problem if you use decent switches. You can apply pretty much any policy you like.

      • Re: (Score:3, Informative)

        by guitaristx (791223)

        A no-broadcast policy breaks Wake-on-LAN.

        • by amorsen (7485)

          WOL is not all that useful in a heavily virtualized data center. Besides, if you have 100,000 hosts in one network, it's probably a bad idea to run an unauthenticated protocol like WOL.

          You can achieve the same by ssh'ing to the management port and turning the server on anyway.

    • You should RTFA. Most of it is about exactly those issues, of managing the address space.
    • Just because you can bridge the whole world together into one massive virtual Ethernet segment doesn't mean you should.

      Yeah, but with all those nodes you could form a Beowulf cluster. Just think for a moment about all those sockets!

    • Having done this a while, I've found that large, flat networks actually work quite well. People often bring up all kinds of fears based on folklore from the unswitched hub days, and IMO they just don't apply any more on modern layer 3 switches.

      What do you do about broadcast storms?

      ACL broadcast default-deny. Broadcast generally isn't needed any more. ARP is proxied by the switch. NetBios broadcast resolution has no place on a large network. Virtually all other niches for broadcast are superseded by multicast these days. If you ever find s

  • by Animats (122034) on Wednesday August 19, 2009 @03:00AM (#29115701) Homepage

    The paper is about adding a layer of addressing so that IP and Ethernet addresses can be moved from one machine to another as instances of virtual machines are migrated around. It's not about the problems of physically building a very large switch. The switch components are mostly stock items.

    • by foksoft (848194)

      That PMAC idea is really cool. But beyond that, nothing special. Try to build something larger and you will find that your core-layer switches do not have enough ports, because the number of aggregation-level switches keeps increasing. And that's without mentioning the throughput problems once distant nodes start communicating with each other.
      To me it looks like they are trying to make routers redundant. But building a 100,000-node network with this topology will require really powerful core-layer nodes.
      For large datacent

  • by the_macman (874383) on Wednesday August 19, 2009 @03:02AM (#29115709)

    Have fun replacing it when it fails. In my head I imagine something like this [dia.org].

  • Hasn't that already been done? [wikipedia.org]
  • by hhedeshian (1343143) on Wednesday August 19, 2009 @03:37AM (#29115879)

    Let's see... That's 100,000 ports with 2 LEDs each (link, action/fdx/speed/poe) for a total of 200,000 LEDs. Let's say they use some of the cheapest SMD LEDs on the market. We'll use Digi-Key part number 160-1183-1-ND, which is a cheap 0603-footprint green LED. At quantity 200,000 that comes out to $12,000 in cut-tape packaging or $9,450 if you buy 210,000 of them in 3,000-qty reels.

    Let's say that all of the link LEDs are on 100% of the time and the activity LED is on 50% of the time. That gives us 150,000 LEDs on at any given point in time. Our example LEDs use 20 mA at 2.1 V. So 150,000 LEDs at 20 mA use 3 kA. In total, 6.3 kW is burned by the green LEDs.
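
    For anyone who wants to rerun the LED arithmetic with different guesses, here is a minimal Python sketch. The duty cycles, current, and forward voltage are the figures quoted in this comment; the function and constant names are made up for illustration, and the model ignores driver losses and current-limiting resistors.

```python
# Back-of-the-envelope LED budget for a hypothetical 100,000-port switch.
# All inputs are the rough figures from the comment above, not measurements.
PORTS = 100_000
LEDS_PER_PORT = 2          # one link LED, one activity LED
LINK_DUTY = 1.0            # link LED assumed always on
ACTIVITY_DUTY = 0.5        # activity LED assumed on half the time
LED_CURRENT_A = 0.020      # 20 mA per lit LED
LED_FORWARD_V = 2.1        # forward voltage per LED


def led_budget(ports=PORTS):
    """Return (total LEDs, average LEDs lit, total current in A, power in W)."""
    total_leds = ports * LEDS_PER_PORT
    lit = ports * LINK_DUTY + ports * ACTIVITY_DUTY
    current_a = lit * LED_CURRENT_A
    power_w = current_a * LED_FORWARD_V
    return total_leds, lit, current_a, power_w


if __name__ == "__main__":
    total, lit, amps, watts = led_budget()
    print(f"{total} LEDs, {lit:.0f} lit on average")
    print(f"{amps / 1000:.1f} kA, {watts / 1000:.1f} kW")
```

    With the numbers as quoted, 3 kA at 2.1 V works out to about 6.3 kW.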

    All that blinking... Damn. I want one NOW!!! More than a girl friend!

    • by h4rm0ny (722443)

      That was pretty much my first thought when I saw the headline, too. I could never, ever manage to use something like this, but I totally want it!

      I don't know what I'd do with it. Probably just put a pillow on it and sleep on it just to be close to that much technology. :)
    • A much better way would be to map the LEDs to a big flat-screen TV, using a fractal traversal mapping. This would show clusters of activity on servers as 'blobs' of color on the monitor.

      And you don't factor in LED duty cycle or voltage drop.

  • You mean (Score:5, Insightful)

    by countertrolling (1585477) on Wednesday August 19, 2009 @03:49AM (#29115927) Journal

    I can't just go out and buy 33,334 d-links and turn off DHCP on all but one of them?

  • That's one big LAN party
    • by Dan541 (1032000)

      Not to mention a shitload of crossover cables to link the damn switches together.

  • by jeko (179919) on Wednesday August 19, 2009 @04:12AM (#29116037)

    Without getting too far into it, their brilliant plan is to insinuate a layer 2 and a half using "pseudo MAC addresses," using a directory service rather than broadcasts. They're hoping they can use this mess to paper over horrific network design.

    Yeah, I'll grant you you might be able to cobble this mess together in an academic setting, and sure, you'll even be able to rig some demos that show miraculous increases in speed.

    I can guarantee they'll find funding with their promise that you'll be able to hire even LESS skilled network admins, meaning Zaboomafoo the Typing Lemur now has a shot at his CCIE.

    But, damn, you ignorant twits. Most corporate networks are already mashed together by the most cut-rate cable monkeys they can find. The last thing we need is some half-assed "protocol" that will guarantee even more network designs that are guaranteed to trip and break their necks over the first packet.

    • by jcr (53032)

      You seem quite confident in your dismissal of their work.

      I can guarantee they'll find funding with their promise that you'll be able to hire even LESS skilled network admins

      You say that like it's a bad thing. Network administration shouldn't be as complex as it is; it's a waste of time and effort. Networks should be self-configuring to the greatest possible extent.

      -jcr

      • by hairyfeet (841228) <bassbeast1968@NOspaM.gmail.com> on Wednesday August 19, 2009 @05:20AM (#29116313) Journal

        I think you kinda missed his point that the Networks wouldn't be so hard to admin if the corps didn't try to save a buck by lowballing and ending up with topologies that looked like they were designed by drunken gibbons. Here, let me illustrate with a true story-

        So I'm working a nice little temp job, putting in a bunch of new boxes on this little insurance company when I break for lunch I run into one of my old friends at this little outdoor BBQ joint. When I tell him how easy my job is going he says "you gotta come back with me to this law firm I'm having to rebuild. You will NOT fucking believe it!" so intrigued I follow him back. On his desk are some machines, which he asked me "notice anything funny about them?" so I move the side panels so I can see and it instantly hits me that these are ALL homemade gamers rigs. He says "Yep, not a single fucking driver alike. Fun huh? And good luck with parts! But that ain't the worst part. Check this out" so he opens up the "network room" and there is literally a MOUND of Dlink and other cheap ass home routers piled up a good 4-6 feet high. I said "WTF is this?" To which he replied "This is what a dumbass who had been their "network admin" thought a network should look like. Not only is nothing labeled in this just giant fucking mess, but there are no less than SIX different ISP home plans running this shit. Fun huh?"

        So while I'm sure he made out like a bandit I wouldn't have taken that job on a bet. I would have had nightmares for months trying to deal with that clusterfuck. All because some bean counter hired the first schmuck that walked through the door that could halfway talk a good game and was willing to work for the peanuts they were offering. So yeah, a network set up by someone with a brain that knows about network topologies isn't really that hard to maintain or add nodes to. But instead you get some paper tiger that can bullshit HR and makes a truly gigantic clusterfuck out of the thing and then it takes 3 forevers to get it straightened out. I don't even want to picture what kind of giant messes can be cooked up with this tech if you can just throw anything together and get it to function thanks to this "virtual MAC" idea. Because when the thing finally breaks down, you might, like my buddy, be really scared to open up that "network" door.

        • by Anonymous Coward on Wednesday August 19, 2009 @06:20AM (#29116563)
          You should try taking an MIS position at an engineering company. Every engineer secretly (or not so secretly) thinks that they can do a better job than the lowly MIS people. They bring in their own WAPs because they want a perfect WiFi signal in their cubicles. They stream music from the Internet, then complain when their file downloads are slow. They insist on having local Administrator rights to "their" computer, and then complain when it becomes infested with malware. One thought that bridging his WiFi and Ethernet adapters would give him faster Internet access. Another decided that he needed his own server, so he set one up and proceeded to offer DHCP on the network.

          And the programmers are the worst - every one of them thinks that being able to write software makes them qualified to administrate a nation-wide network, especially because they have a network at home, you see, and also do computer work for their friends and family.
        • by drinkypoo (153816)

          I'd take the job if I could either get in writing that I get to replace anything that offends me, or if I were going hungry. Sounds like the cheapest and easiest option would have been to just replace the lot. It makes the most sense as a contract job though, the last part of the contract is helping to hire the admin who will work there.

        • by Archangel Michael (180766) on Wednesday August 19, 2009 @11:44AM (#29119545) Journal

          That sounds like a law office I spec'd a job for. The law office manager knew me from her previous place where I was the "IT" guy. So this law office is having ALL sorts of network, computer and server problems, and asks me for a bid to fix it.

          I scope the joint and prepare a bid. Using numbers from memory, it came to $25,000 for everything installed, set up, and running: new HW, server, computers, and wiring (small office). EVERYTHING was BRAND NEW.

          Their existing guy (I won't even call him IT) underbid me by $10K. They asked me to requote, and I told them no thanks. Obviously I didn't get the job.

          Well, a few months later they called me back to try to fix what was done by this other guy. I look, and his wiring was flat phone cable (cat 2???) stapled to the wall in pretty "rows". Recycled home grown computers and I didn't even bother to look at the "server". I was too afraid.

          I said to the Manager, "Network is flaky and nothing works right, huh?". Anyway, they ask me to requote them, and I hand them a copy of my original quote for $25,000 and say "here".

          About this time, I notice all the file cabinets are covered in blue tarps, and see the roof is leaking from the rain. The office manager tells me that they do this every winter when it rains. I ask why they don't get it fixed.

          "Because when it is raining, they can't fix it, and when it isn't raining, it isn't a problem".

          The funny thing is, they spent the $15K of the original quote the guy quoted, and another $20K in service fees to the same guy trying to fix the new system he just put in ... in A FEW MONTHS!!!

          I came to the conclusion that many lawyers aren't that bright. They pinch pennies while pissing away C notes.

          I have no idea if that law firm's network ever ran right. The office manager quit shortly afterwards.

          SO, it doesn't surprise me that what you saw was in a law office.

          • by hairyfeet (841228)

            Yeah he had to shitcan the whole mess. I ended up with a few nice gamer rigs for practically nothing and I have one of their hubs for free still sitting in a closet somewhere. That's right, Mr "network admin" apparently didn't know the difference between switches, hubs, and routers, and filled the whole damned building with hubs, probably so he could blow more cash on the epeen GPUs he stuck in the homemade gamer rigs. And this was for an office where the heaviest GPU lifting they would be doing was a P

            • I could tell you story after story about lawyers. Their logic sucks. I think it is a requirement of the legal system that logic need not apply to anything.

            • Why that is so hard for law firms to understand I'll never know.

              Because in the field of Law and Business, when someone says it's so, that makes it so. The judge says he's guilty, so he's guilty. A senior partner says you're wrong, so you're wrong. The highest authority in the room are twelve people who have no idea what's going on, and the highest authority in the land are nine people who can't tell you the price of milk.

              People who eat, sleep and breathe in that atmosphere become extremely disconnected from reality. They tend to take it personally when someone tells the

      • Wizards, scripts, GUIs and "automagic" are awesome tools. I love my OSPF. I love my Spanning Tree. I love my VTP. I love my Auto speed and duplex settings. I love every tool that helps me take care of tedium and drudgery.

        But before you hand these tools to a network designer, they absolutely need to understand HOW and WHY those tools do what they do, lest your network ends up looking like it was built by Mickey the Wizard's Apprentice. Powerful tools require MORE skill on the part of the network admin, not l

        • by jcr (53032)

          Your argument basically amounts to this. My young son doesn't have the strength yet to cut firewood safely with an ax and saw, so obviously I need to hand him a top-of-the-line Stihl chainsaw.

          More like, if you want to vacuum the rug, you shouldn't need to know how to series-wind an AC motor.

          Network administration is far more difficult than it should be. I don't see any benefit in casually dismissing the work of anyone who's trying to address the problems.

          -jcr

          • in ignorance, but you better not try to design one that way. If your job is vacuum cleaner DESIGN, it would really help if you knew how to wind that motor, and more importantly, WHY that motor was wound that way. But certainly, if you're the janitor, then feel free to push that handle back and forth, serene in the knowledge that someone else has done the heavy lifting for you.

            I'm talking about the damage this idea would do to network design when Billy-the-uberl33t-LAN-Party-Badass tries to recable his Daddy

    • by sjames (1099)

      Sounds like the skepticism about Ethernet from the Token Ring fans in the day. How could you possibly get any communication done with packets colliding all the time? As for the random backoff, how can adding randomness make a network MORE reliable?!?

      It doesn't sound like their objective has anything to do with allowing trained lemurs to do networking (I thought they were already handing out CCIEs to lemurs). It also doesn't sound like speed is the intent other than allowing larger scale layer 2 switching wi

  • by viking80 (697716) on Wednesday August 19, 2009 @04:34AM (#29116139) Journal

    This seems to be a solution to a nonexistent problem. A big router, for example a Cisco CRS, can be a single node supporting any data center. And it is a router, so there is no need for any exotic solution (L3 inspection on a switch?). It has a max bandwidth of 80 Tb/s, or 80,000 Gb Ethernet nodes. The beauty is of course that you can configure your entire data center with a single router, which greatly simplifies the network configuration and makes changes simple.

  • by Anonymous Coward

    I wonder if D-Link has any?

    (swoooosh)

  • SMB (Score:2, Funny)

    by pengipengi (1352837)

    And then... let's say 10% of all computers starts up a SMB-share... welcome to broadcast heaven (or hell) :)

  • NATting layer two. (Score:5, Interesting)

    by argent (18001) <peter@AAAslashdo ... minus threevowe> on Wednesday August 19, 2009 @06:29AM (#29116605) Homepage Journal

    They're basically NATting the layer two protocols. Combined with a super spanning tree for the natted addresses they're practically boosting layer two into layer three.

    Before I read the paper I was thinking that it would be easier to just run all your services NATted at layer three, even using something like PPPoE (which is how cable networks solve the same basic problem, with something like half a million end-points on the same subnet). I guess it's more efficient to work with the simpler layer two protocols instead.
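
    To make the "NAT at layer two" reading concrete, here is a toy Python sketch of an edge switch that swaps a host's real MAC for a pseudo MAC on the way into the fabric and back again on the way out, and that answers ARP from a shared directory instead of broadcasting. This illustrates the idea as described in this thread only; the class, the method names, and every address below are invented for the example and are not taken from the PortLand implementation.

```python
# Toy model of MAC rewriting at the fabric edge (an illustration, not PortLand code).
class EdgeSwitch:
    def __init__(self, directory):
        # Shared IP -> pseudo-MAC directory, standing in for a central lookup service.
        self.directory = directory
        self.real_to_pseudo = {}
        self.pseudo_to_real = {}

    def learn(self, ip, real_mac, pseudo_mac):
        """Record a locally attached host and publish its pseudo MAC."""
        self.real_to_pseudo[real_mac] = pseudo_mac
        self.pseudo_to_real[pseudo_mac] = real_mac
        self.directory[ip] = pseudo_mac

    def answer_arp(self, target_ip):
        """Resolve an ARP query from the directory; None means unknown."""
        return self.directory.get(target_ip)

    def ingress(self, frame):
        """Host -> fabric: hide the real source MAC behind its pseudo MAC."""
        src = self.real_to_pseudo.get(frame["src"], frame["src"])
        return {**frame, "src": src}

    def egress(self, frame):
        """Fabric -> host: restore the real destination MAC."""
        dst = self.pseudo_to_real.get(frame["dst"], frame["dst"])
        return {**frame, "dst": dst}


if __name__ == "__main__":
    directory = {}
    a, b = EdgeSwitch(directory), EdgeSwitch(directory)
    a.learn("10.0.0.1", "aa:aa:aa:aa:aa:01", "00:00:00:01:00:01")
    b.learn("10.0.0.2", "bb:bb:bb:bb:bb:02", "00:01:00:01:00:01")
    frame = {"src": "aa:aa:aa:aa:aa:01", "dst": a.answer_arp("10.0.0.2")}
    print(b.egress(a.ingress(frame)))
```

    The hosts only ever see each other's pseudo MACs, which is what makes the NAT comparison apt: the core of the fabric never needs to know a real address.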

  • by Viol8 (599362) on Wednesday August 19, 2009 @07:09AM (#29116783)

    ... they have only needed 1 port! :)

  • Excellent idea! (Score:2, Insightful)

    by dogganos (901230)

    ...and when this switch blows the fuses, you have 100,000 servers offline instead of 24... Brilliant!

  • Good God! THE VLANS! A "show vlans" command would take all day to execute and print out to be thicker than War and Peace.
  • The past does not equal the future. Hardware improves, software improves.

    Just because you were taught from birth that you should have thirty-five 100 port switches in your building and that is what you have always done does not mean you should continue to do it. Network engineers seem to LOVE buying lots of hardware (when given the money). Maybe it's just the cool factor, maybe they want job security? It WOULD be far easier to manage a single switched fabric flat network if you have the hardware and the fai

  • Everyone has chimed in on the nightmare of cable management for something like this. But the idea that this would be a single point of failure for my data center scares me even more.
  • I regularly read Dr. Vahdat's blog [wordpress.com]. I first got interested in it after reading his paper on Epidemic Routing [ucsd.edu] which can be found in his list of publications here [ucsd.edu].

    If you read his blog post you will see that he accomplishes his goal by creating a hierarchical tree of MAC addresses instead of a simple table. He also states that a large part of the proliferation of MAC addresses in these systems is due to virtual machines. Therefore everyone's nightmares of cabling hell are relatively moot.
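
    A hierarchical MAC address is easier to picture with a small example. The Python sketch below packs a location into the 48 bits of a MAC address so that a switch can forward on a prefix rather than on a flat table entry. The field split used here (16-bit pod, 8-bit position, 8-bit port, 16-bit VM id) is my recollection of the layout in the PortLand paper and should be treated as an assumption; the names are made up for illustration.

```python
from typing import NamedTuple


class PMAC(NamedTuple):
    pod: int        # assumed 16 bits: which pod the host lives in
    position: int   # assumed 8 bits: edge switch position within the pod
    port: int       # assumed 8 bits: port on that edge switch
    vmid: int       # assumed 16 bits: virtual machine behind that port


def encode_pmac(p: PMAC) -> str:
    """Pack the location fields into a colon-separated 48-bit address."""
    limits = ((p.pod, 16), (p.position, 8), (p.port, 8), (p.vmid, 16))
    if any(not 0 <= value < (1 << bits) for value, bits in limits):
        raise ValueError("field out of range")
    value = (p.pod << 32) | (p.position << 24) | (p.port << 16) | p.vmid
    return ":".join(f"{byte:02x}" for byte in value.to_bytes(6, "big"))


def decode_pmac(mac: str) -> PMAC:
    """Unpack a colon-separated 48-bit address into its location fields."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError("expected six octets")
    value = int.from_bytes(raw, "big")
    return PMAC(value >> 32, (value >> 24) & 0xFF, (value >> 16) & 0xFF, value & 0xFFFF)


def same_pod(mac_a: str, mac_b: str) -> bool:
    """A switch can make this decision from the top 16 bits alone."""
    return decode_pmac(mac_a).pod == decode_pmac(mac_b).pod


if __name__ == "__main__":
    print(encode_pmac(PMAC(pod=2, position=1, port=3, vmid=1)))
```

    The point is that forwarding state grows with the number of pods and switches rather than with the number of virtual machines.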

    Though I haven't cont

    • Re: (Score:3, Interesting)

      by shabble (90296)

      Take great care not to use any MAC addresses that are already in use. One would probably need to purchase/register entire blocks of MAC addresses just as a manufacturer of network adapters must do. Or...

      Or simply use the private/local range of MAC addresses (02:xx:xx:xx:xx:xx) (the MAC address equivalent of, say, 10/8)?

      • by kasperd (592156)

        Or simply use the private/local range of MAC addresses (02:xx:xx:xx:xx:xx) (the MAC address equivalent of, say, 10/8)?

        According to wireshark some of those are reserved to actual hardware vendors.

        grep ^02: /usr/share/wireshark/manuf | wc -l
        19

        • by shabble (90296)

          According to wireshark some of those are reserved to actual hardware vendors.

          grep ^02: /usr/share/wireshark/manuf | wc -l
          19

          Assuming that those aren't specifically cited as locally administered addresses, I'm sure there are some duplicates in there as well, something else vendors shouldn't be doing. OUIs shouldn't really be starting with 02.

          http://en.wikipedia.org/wiki/MAC_address#Address_details [wikipedia.org]

          A locally administered address is assigned to a device by a network administrator, overriding the burned-in address. Locally administered addresses do not contain OUIs.

          Universally administered and locally administered addresses are dis
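
          For reference, the distinction comes down to one bit: the second-least-significant bit of the first octet marks an address as locally administered, and the least-significant bit marks it as multicast. A small Python check, with helper names invented for this example:

```python
def first_octet(mac: str) -> int:
    """Return the first octet of a MAC written with ':' or '-' separators."""
    return int(mac.replace("-", ":").split(":")[0], 16)


def is_locally_administered(mac: str) -> bool:
    """True when the U/L bit (0x02 of the first octet) is set."""
    return bool(first_octet(mac) & 0x02)


def is_multicast(mac: str) -> bool:
    """True when the I/G bit (0x01 of the first octet) is set."""
    return bool(first_octet(mac) & 0x01)


if __name__ == "__main__":
    for mac in ("02:00:00:00:00:01", "00:1b:21:00:00:01", "06:00:00:00:00:01"):
        print(mac, is_locally_administered(mac), is_multicast(mac))
```

          So 02:xx is only one of the local unicast prefixes; any first octet with that bit set and the multicast bit clear (x2, x6, xA, xE in the second hex digit) qualifies.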

  • Now how long would it take to wire that beast... how many man hours and would it be limited by the army of IT staff all trying to work with a 100,000+ port switch?

    How many IT staff would go mad in the sea of network wires?

    At the point of 100,000+ ports I would rather invest heavily in research to make a wireless switch that can handle 100,000+ connections at Gigabit speeds (and of course a corresponding wireless device interface for each rack).

  • Lemme see - 100,000 eggs, one basket.

    Good idea.

  • I can't wait until it reaches the limit on the MACs it can learn and just starts forwarding. :-)

  • Ethernet is not always best. Ring topologies have inherent advantages in environments like this that should not be overlooked. Ethernet caught on in large part because of vendors catering to a dumbed-down market.