
Can Anyone Suggest a Good Switch?

Posted by Cliff
from the you-can-never-have-too-many-ports dept.
wgadmin asks: "I am a sysadmin for a 500-node Linux cluster. We are doubling the number of our compute nodes, so we need a new switch. We currently have a Foundry FastIron 1500 -- however, a) it doesn't have enough ports (only 208) and b) it is extremely unreliable. We want something that's rock solid. We process mostly serial jobs, and we will probably require ~320 ports. What is everyone using for their HPC clusters? There are so many high-performance switches on the market, we hardly know where to start."
This discussion has been archived. No new comments can be posted.

  • by Zapman (2662) on Monday September 20, 2004 @04:42PM (#10301221)
    What level of interconnect do you want? (gig copper? gig fiber? 10/100?)

    Or are you looking for something more specialized (HIPPI compliant or something similarly obscure?)

    That said, if you're looking in the Ethernet space, we've been really happy with our recent Extreme Networks chassis. Their Black Diamond 10K line is the newest release, and it looks awesome. It's really dense, they've got crazy levels of backplane bandwidth, and ours have been really reliable (granted, we have the previous generation of the gear). The chassis have blades (just like everyone else) that can speak 10/100, 10/100/1000 copper, gig fiber, 10 gig fiber, etc.
    • We've got the same things, but watch out for their NMM's. Those have bitten us a few times in the past...

      The chassis is solid, though. They've even improved reliability on their smaller switches (48-port 100Mb + 2 gig slots), but when we first ordered, we had about a 15% failure rate within 3 months...
    • by Evo (37507)
      Do not under any circumstance buy Extreme switches. They are really a bunch of lower-bandwidth units coupled together, which results in crazy packet reordering problems if you are putting streams of any size across them.

      One of our sites tried them.... and practically kicked them straight out of the window. If you care about throughput then you want to stay well clear.
    • by wgadmin (814923) on Monday September 20, 2004 @05:48PM (#10302019)
      Sorry, I forgot to mention that we are interested in gig copper. We are exclusively interested in gig copper. And, as far as anyone has told me, we don't care about HIPPI compliance.
  • 3Com, HP and Dell? (Score:3, Informative)

    by mnmn (145599) on Monday September 20, 2004 @04:44PM (#10301241) Homepage
    We've been using 3Com switches and they're rock solid. I was rooting for Cisco a while ago because I'm studying for some certs, but the price difference is huge.

    3Com offers stackable switches, up to 8 of 48 ports each, which should be enough. The stacking bus is something like 10 Gbit or 32 Gbit for the all-gigabit switches.

    The switch market by share is really (1) Cisco, (2) 3Com, and (3) HP, and I recommend you go with one of these. Cisco is more than 100% more expensive than anyone else for the same stuff, and I've never been impressed with HP, so look at 3Com. Take a look at all those confusing Nortel switches too, number four in the market. You'll most likely find your switch between 3Com and Cisco, unless you want to give up reliability.
    • At work, we have a couple of 3Com switches and a bunch of HPs. The 3Com switches are absolutely horrible, but the HP switches are absolutely awesome. For much less than a Cisco switch, the HP switches have pretty much identical features, but are easier to configure. (They provide a Cisco-compatible command line interface, a web interface, and a menu-driven interface.) I'd really recommend HP switches. From what I hear, 3Com switches are better now, but I thought I'd throw my $0.02 in about them.
      • I'll second the recommendation on HPs. Had good luck with them, and they're not nearly as pricey as the Ciscoes.

  • Extreme Networks (Score:4, Informative)

    by Plake (568139) <rlclark@gmail.com> on Monday September 20, 2004 @04:46PM (#10301272) Homepage
    Extreme Networks [extremenetworks.com] has a great line of switches.

    The Black Diamond 10808 [extremenetworks.com] would work great for the type of environment you have set up, from the sounds of it. Also, Extreme is usually 20-40% cheaper than Cisco and Foundry for the equivalent appliance.

    We currently use an Alpine 3808 [extremenetworks.com] with 192 100 Mbps ports; it's never had a problem with uptime, and configuration is simple and straightforward.
    • Re:Extreme Networks (Score:5, Informative)

      by unixbob (523657) on Monday September 20, 2004 @05:11PM (#10301587)
      We also use Extreme switches and I can vouch for their reliability and performance. Instead of going for the "one big switch" approach, though, we've got a pair of Black Diamond 6808s with 1U 48-port Summit 400 edge switches uplinked back to the core switch (excuse the marketing terminology). This makes cabling much tidier when you have a large number of servers, as you can locate the edge switches around the server room and then just have the cables from the Summits in the rack with the Black Diamond. It makes deploying new kit much easier, and tracing cables much easier as well. You don't end up with the switch rack being a massive mess of untraceable patch cables. The only servers patched directly into the Black Diamonds are those using the NAS (because they need as much bandwidth as possible).
    • Our department [ic.ac.uk] uses Extreme [extremenetworks.com] hardware throughout in a fairly large network deployment.

      We have two Linux clusters -- Viking, a 512 processor P4 cluster, and Mars, a 400 processor Opteron cluster (being commissioned). We also host a number of large Sun machines, including SunSite Northern Europe.

      The department also hosts a 250+ node teaching lab and several floors of staff and research desktops, each with 100Mbit+ to the desk and a 1Gbit uplink from each switch. At the middle of our network are two Black D
  • force 10 (Score:4, Informative)

    by complex (18458) <complex@@@split...org> on Monday September 20, 2004 @04:51PM (#10301340) Homepage
    http://www.force10networks.com/ [force10networks.com] claim to have the highest port density.
    • Re:force 10 (Score:3, Insightful)

      by PSUdaemon (204822)
      Yes, these guys are awesome. We just got one of their switches for our cluster. All ports are line speed, no over subscription. They are also soon to announce some higher density line cards for their existing chassis in the upcoming months. Definitely give them a look.
      • AFAIK, Force10 switches may not be good for clusters because of higher latency compared to others. I have not tested Force10; this info is based on papers.

        I recently did some performance testing on switches and routers with a large number of GE ports. What I learned is that a switch/router claiming "wirespeed" switching/routing at GE may start dropping packets at 40% load. And listing rfcNNNN on the spec sheet does not imply that the implementation supports even the minimum of the specification, or could sup

  • Look no farther than TopSpin [topspin.com] for high bandwidth, low latency interconnect.

    Of course if cost is an object - I guess you are stuck on simple GbE, rather than a faster interconnect. You should look at 270 [topspin.com] for high density interconnect... Throw in some 360's for outside connectivity and you are set.

  • by photon317 (208409) on Monday September 20, 2004 @04:53PM (#10301376)

    Cisco is the de facto brand of networking gear for standard stuff like Ethernet. How much better are the high-performance switches people are suggesting in the comments here? This is not a rhetorical question: I really want to know, and I'm too lazy and uninterested to look into it myself, but not lazy enough to stop typing this Slashdot post. Is it enough to be worth going non-Cisco for HPC clusters that use Ethernet-based interconnects? I know Cisco isn't infallible, but for all kinds of reasons they're a good bet in networking gear come purchasing time, at least outside this HPC cluster business.
    • He could be looking for a vendor that does not include back doors [slashdot.org]. Admittedly that story was not about switches but it still is telling about the company.
    • Well, I have 1 really good reason to not go Cisco: $$$$$$$

      When we moved our building, we got to rearchitect our whole network. Great experience, if you can get the company to foot the bill. There was a LOT of political pressure to go Cisco, but it turns out that their lowest-end enterprise-level switching gear was still twice the cost of Extreme Networks' highest-end gear. In the end, the CIO/CFO couldn't get past the price tag difference, despite the Cisco brand recognition.

      That, and the lowest end ex
      • I agree that Cisco has some crazy-ass pricing schemes.

        Let me tell you though if you need pricing just PM me. I have 41% (that's 1% better than most).

        If you need it for the super cluster just let me know.

        PS. I'm not a seller. I'm a buyer.

        TTYL.
  • FNN (Score:2, Informative)

    by bofkentucky (555107)
    Flat neighborhood networks [aggregate.org], basically you get to use "cheap" cards and switches in a web configuration to provide a fast interconnect between nodes.

    Other than that, a Cisco 6513 with 11 48-port 10/100/1000 switch cards would fit the bill to provide a single-chassis switch for all 500 nodes. Hope you've got a decent budget, because it will cost you.
    • Re:FNN (Score:2, Informative)

      by rnxrx (813533)
      The 6513 won't support 11 48-port 10/100/1000 linecards if you expect anything close to wire speed. The only blades that would even come up in that application would be the 6148-GE-TX or 6548-GE-TX. In both cases the interconnect between the blade and the backplane is heavily oversubscribed. If you want to run at full speed, only a 6509 with Supervisor 720s and the 6748 linecards will approach line rate at high density. The 6513 has serious limitations on how many high-speed blades can be employed. I would
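The oversubscription point above can be sketched numerically. This is a minimal sketch; the fabric-connection figure used below is an illustrative assumption, not an official spec for any particular blade:

```python
# Hedged sketch: oversubscription ratio for a gigabit linecard.
# The 8 Gbps fabric figure is an assumed example value; check the
# vendor datasheet for the real number on any specific blade.

def oversubscription(ports: int, port_gbps: float, fabric_gbps: float) -> float:
    """Ratio of total front-panel bandwidth to blade-to-fabric bandwidth."""
    return (ports * port_gbps) / fabric_gbps

# A 48-port gigabit blade with an assumed 8 Gbps fabric connection:
ratio = oversubscription(48, 1, 8)
print(f"{ratio:.0f}:1 oversubscribed")  # 6:1
```

Anything above 1:1 means the blade physically cannot carry all its ports at line rate simultaneously, which is why the reply above distinguishes the fabric-attached linecards from the rest.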
  • Forget ethernet (Score:3, Informative)

    by keesh (202812) * on Monday September 20, 2004 @04:58PM (#10301440) Homepage
    Give serious thought to FC-IP and director-class Fibre Channel kit. Performance-wise it'll thrash Ethernet, and there are various clever tricks you can do with directors clustered together via Open Trunking, meaning that a bunch of 160-port boxes (a McData 6140 is your best bet here) will do as well as a larger single box.
    • Re:Forget ethernet (Score:3, Interesting)

      by FreeLinux (555387)
      Ever seen a 500-1000 port FC switch or trunked network? Any idea what such a beast, if it existed, would cost? What's the cost on 500-1000 FC interface cards for the PCs? You do know that most PCs now come with gigabit ethernet onboard, right?
  • Stackable 48 Ports (Score:4, Informative)

    by DA-MAN (17442) on Monday September 20, 2004 @05:06PM (#10301534) Homepage
    I'm a sysadmin for 3 large clusters in the same league; we use stackable 48-port Nortel switches. Each switch is 1U, and the interconnects don't use a separate port. The switches have wildly expensive support options; however, because they just work, we've never had to pay for support on them.

    We used to have Foundry ourselves, but their switches were crap; they would suddenly become dumb hubs and lose their IP, etc.

    We tried HP, but found their interface cumbersome and unfamiliar, with weird networking-related issues that would pop up.

    Cisco's been rock solid, but very expensive.
  • The fastiron is fine for the job. Upgrade to a BigIron 15k or something if you want more ports.

    If your FastIron is unreliable, it's broken and you need it fixed; they're not normally unreliable.

    The LINX runs on BigIron 8K and Extreme Networks Black Diamonds, both of which are pretty damn good.
    • I fully one hundred percent agree - a Foundry Big Iron 15000 is exactly what he needs. I work for one of the largest dot coms on the planet, and every piece of our network runs on Foundry gear. Not a single chassis or blade has failed on us in five years.

      Whatever he's running is broken.
  • by beegle (9689) * on Monday September 20, 2004 @05:25PM (#10301745) Homepage
    Send email to a few supercomputing centers. These places have tons of clusters, with lots of vendors throwing hardware at them. They're also often associated with schools, so they're not competitors and they actually -want- people to learn from what they've done.

    To get you started:
    http://www.ncne.org
    http://www.psc.edu
    http://www.sdsc.edu
    http://www.ncsa.edu

    Yeah, it's Pittsburgh-centric. Guess where I'm posting from. There's probably somewhere closer to you.

    The things you want to figure out before calling:

    -What's your budget? (Nice stuff tends to be more expensive)

    -How much does latency matter? (Usually, lots. Sometimes, not so much. Put numbers here.)

    -What's your architecture (at several levels of technical detail)? Can you use 64-bit PCI? Do you have to work with a proprietary bus? Can you use full-height, full-length cards? What OS -exactly- are you using? (Hint: "Linux" ain't close enough.) What version and vendor of PVM/MPI/whatever are you using, and can you switch?
  • Cisco 65XX (Score:5, Informative)

    by arnie_apesacrappin (200185) on Monday September 20, 2004 @05:34PM (#10301838)
    If you're looking for gig over copper, the 6509 will probably give you the density you want in a single device. It has 9 slots, one of which is filled by the supervisor module. If you want to upgrade to the 720 Gbps switch fabric, I think that takes another slot, but I could very well be wrong. But with 7 available slots at 48 ports per 10/100/1000 blade, you would have 336 connections.

    The 6513 is basically the same thing but with four extra slots.

    The 6509 chassis lists at $9.5K and the 6513 $15.25K. That's completely bare bones. The supervisor modules run anywhere from $6K to $28K at list. The 48 port 10/100/1000 modules list at $7.5K while a 24 port SFP fiber blade lists for $15K. You'll need two power supplies at $2K-5K each.

    On the cheap end, to get the port density you're looking for out of Cisco, you'll pay about $70K list. But if you find the right reseller, you can see a discount of 30-40%.

    All numbers in this post should be considered best guess, based on quotes I've gotten. They may be out of date. They are not official prices from Cisco. Take with the appropriate grain of salt.
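As a rough sanity check of the "$70K list" figure above, here is a sketch that adds up the poster's ballpark list prices. All figures are the comment's own guesses, not official Cisco pricing:

```python
# Bill-of-materials sketch using the ballpark list prices quoted in the
# parent comment (the poster's estimates, not official Cisco pricing).
chassis_6509 = 9_500
supervisor = 6_000        # low end of the quoted $6K-$28K range
gig_copper_blade = 7_500  # 48-port 10/100/1000 module
power_supply = 2_000      # low end of $2K-$5K; two required

# ceil(320 / 48) blades to reach ~320 ports
blades_needed = -(-320 // 48)

total = (chassis_6509 + supervisor
         + blades_needed * gig_copper_blade
         + 2 * power_supply)
print(f"{blades_needed} blades, ~${total:,} list")  # 7 blades, ~$72,000 list
```

The low-end configuration lands right around the $70K figure quoted, before any reseller discount.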

    • Yep, I was thinking the same thing. The 6513 is a nice box. The SUP 720 [cisco.com] is a pretty ridiculous board and expensive of course. Also, at present time I believe you can only run IOS on the SUP, which may be a consideration. I hate IOS for switching but using it is worth it if you have a SUP 720.
      • Re:Cisco 65XX (Score:2, Interesting)

        by rnxrx (813533)
        You can run hybrid mode/CatOS on the sup720 (catos 8.x, I think). I also would not use the 6513 - only slots 9-13 actually run at full speed. You're limited to half the backplane bandwidth on slots 1-8.
  • obvious, maybe? (Score:1, Informative)

    by IWX222 (591258)
    I'd just say go for the most that you can afford.
    Our 3Coms have served us well, and between them, they work with anything from 10M ethernet to 2gig fibre optic.

    I worked with a major brewer for a while, and their Cisco kit was very reliable, but it never had to handle much of a load. It did survive being kicked about, dropped, and my boss driving it several hundred miles unsecured in the back of a van. I doubt our 3Com kit would have survived that!

    Basically, if you can afford Cisco, go for it. If not, us
  • They are definitely the best choice.
  • by scum-o (3946) <{bigwebb} {at} {gmail.com}> on Monday September 20, 2004 @05:49PM (#10302036) Homepage Journal
    We're using the unmanaged HP procurve modular 1Gbps switches in our clusters, but they run VERY HOT when utilized (our switches get hammered 24/7 - like most clusters probably do) and we had some overheating issues with them. Our clusters aren't as large as yours, but I'd suggest going with a major manufacturer (IBM, HP, Cisco) if you're putting all of your eggs in one basket (switch-wise).

    One thing: get a switch that's modular (most good ones are), so that if something goes out, you'll only lose 8 or 32 nodes instead of the whole switch.
  • Nortel Passport (Score:3, Informative)

    by FreeLinux (555387) on Monday September 20, 2004 @05:51PM (#10302064)
    Nortel's Passport 8600 [nortelnetworks.com] 384 ports per chassis, true wire-speed, redundant everything, layer 2-7 switching. Also, if you need more ports simply add another 8600 and use Multi-link-trunking (MLT) between the switches. Wash Rinse Repeat. Networks that use these are smokin!

    Of course, if you are looking for the typical Ask Slashdot for free solutions answer you can forget it. These puppies cost a bundle.
  • by Enrico Pulatzo (536675) on Monday September 20, 2004 @07:45PM (#10303143)
    try a hickory tree. Stings like hell and the mere thought is a deterrent for most rascals and rapscallions.
    • by Anonymous Coward
      I disagree, my best switch was switching from Windows to MacOS.
  • Um... (Score:2, Funny)

    by robochan (706488)
    I can't really give you a solution at the moment...
    but are you hiring? ;o)
  • build a fat tree (Score:3, Interesting)

    by aminorex (141494) on Tuesday September 21, 2004 @05:03AM (#10306242) Homepage Journal
    Isn't almost all the latency in your network from software? Why not build a hypertree from cheap 24-port switches? At $200 a pop, you could make a tree with 12 roots for $8000. Spend more to get more cross-section bandwidth, less for less. It scales with your budget.
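A minimal sketch of the parent's idea, assuming a simple two-level tree of 24-port, $200 switches. The uplink count per leaf switch is an arbitrary illustrative choice, and a real design would also have to balance uplink bandwidth against the cross-section actually needed:

```python
# Sketch: sizing a two-level tree of cheap fixed-config switches.
# Assumptions (illustrative, from the parent comment): 24-port
# switches at $200 each; uplinks-per-leaf chosen by the designer.

SWITCH_PORTS = 24
SWITCH_COST = 200  # dollars, as quoted above

def two_level_tree(nodes: int, uplinks_per_leaf: int):
    """Count leaf switches (facing nodes), root switches, and cost."""
    down_ports = SWITCH_PORTS - uplinks_per_leaf
    leaves = -(-nodes // down_ports)               # ceil division
    root_ports_needed = leaves * uplinks_per_leaf
    roots = -(-root_ports_needed // SWITCH_PORTS)  # ceil division
    return leaves, roots, (leaves + roots) * SWITCH_COST

leaves, roots, cost = two_level_tree(500, 4)
print(f"{leaves} leaf + {roots} root switches, ${cost}")
```

With 4 uplinks per leaf, 500 nodes come out to 25 leaf and 5 root switches for $6000, in the same ballpark as the parent's $8000 figure; more uplinks per leaf buys more cross-section bandwidth at higher cost.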
  • Go with Cisco Catalyst switches. Because of the number of systems you have on that net, I'm guessing that cost is less of an object than it would be for a small business, so go with Cisco. The server farm for the company I'm consulting for right now needs to be up 24/7/365, so that's what we use. The 3550 series has a bunch of different options (10/100 or 1000, and 10/100 with gigabit uplinks).
  • Selecting a switch fabric calls for more information than that. That your jobs are primarily serial implies that latency isn't a high priority, and bisection bandwidth might even be of minimal importance but other factors come into play.

    For example, are you using NFS over the network? How large are your data sets? Do you tend to just queue jobs and have them start/finish whenever, or do you tend to launch a lot of jobs at once? Do you have to transfer a data set with each job, or is it more a matter of ru
