Silicon Graphics Technology

SGI to Scale Linux Across 1024 CPUs

im333mfg writes "ComputerWorld has an article up about an upcoming SGI machine, being built for the National Center for Supercomputing Applications, 'that will run a single Linux operating system image across 1,024 Intel Corp. Itanium 2 processors and 3TB of shared memory.'"
  • Solaris (Score:3, Insightful)

    by MrWim ( 760798 ) on Sunday July 18, 2004 @11:41AM (#9731121)
    It seems that if they pull this off, one of the strongholds of Solaris (namely massively parallel computing) will have been conquered by Linux. I wonder how Sun is feeling at the moment?
  • by puppetluva ( 46903 ) on Sunday July 18, 2004 @12:05PM (#9731267)
    Sun hardware has additional, wonderful resiliency features, like allowing CPUs to "fail over" to other CPUs in case of failure. The same holds true for memory, network interfaces, etc. Solaris is aware of these hardware features and can "map out" bad memory and CPUs on the fly (or allow swap-in replacements). The engineers can then replace the broken CPUs/memory/interfaces WITHOUT BRINGING THE MACHINE DOWN. This lends itself to an environment that can enjoy nearly 100% uptime. Finally, since Sun has been doing the "lots of CPUs" thing for many years, their process management and scalability tend to be much better.

    I don't work for Sun; I'm just an SA who deals with both Solaris and Linux boxes. You don't pick Sun for just "lots of CPUs"; you pick it for a very scalable OS and amazing hardware that allow for a very, very solid datacenter. If downtime costs a lot (i.e. you lose a lot of money for being down), you should have Sun and/or IBM zSeries hardware. Unfortunately those features cost a lot, and most of the time you can use Linux clustering instead for a fraction of the cost and a high percentage of the availability.
  • by passthecrackpipe ( 598773 ) * <passthecrackpipe&hotmail,com> on Sunday July 18, 2004 @12:27PM (#9731398)
    Cache reduction - ehh, cash reduction. One of the prime reasons Sun is losing a serious share of its installed base to Linux is not that Linux is better; it is that Sun is bloody expensive - outrageously so. And while most customers had to endure the annual fleecing with gritted teeth, due to a lack of alternatives, Sun is now being pummeled out of datacenter after datacenter.

    I have replaced Sun hardware/software combos in the core datacenters of many of our customers, and I can tell you that yes, Sun brings some amazing features to the table, most of which are there to serve old technology. Linux on simple CPUs delivers amazing price/performance: depending on the job, we see an average 3x to 4x performance increase for 25% of the cost. That means that if I were to spend the same, lifecycle-wise, on a Linux cluster as I would on a big Sun box like the 10k or 15k, I'd end up with 12x to 16x the performance of the Sun solution.

    The same functionality in terms of cpu and ram (and other hardware) failure is available on the Linux cluster, albeit in less graceful form - the magic spell to invoke goes like this:
    shutdown -h now
    If I have 300 machines crunching my data, I can afford to lose a couple, and can afford to have a few hot standbys.

    Of course, the massively parallel architecture does not work for all applications, and in those cases you would look to use either OpenMOSIX [openmosix.org] or, of course, the (relatively expensive) SGI box mentioned in this article.
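The 12x to 16x figure in the comment above is simply the claimed speedup divided by the relative cost. A minimal sketch of that arithmetic, assuming (as the comment implicitly does) that cluster performance grows linearly with money spent; the function name `same_budget_speedup` is hypothetical:

```python
def same_budget_speedup(perf_ratio: float, cost_ratio: float) -> float:
    """Performance of a cluster bought with the same budget as one big box.

    perf_ratio: cluster performance relative to the big box (3.0 means 3x)
    cost_ratio: cluster cost relative to the big box (0.25 means 25%)
    Assumption: performance scales linearly with the amount spent.
    """
    return perf_ratio / cost_ratio


# The comment's numbers: 3x to 4x the performance at 25% of the cost.
for perf in (3.0, 4.0):
    print(f"{perf:.0f}x at 25% of the cost -> "
          f"{same_budget_speedup(perf, 0.25):.0f}x for the same spend")
```

In practice interconnect and coordination overhead usually keep real clusters below this linear estimate, which is why the comment hedges that the approach "does not work for all applications".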
  • Re:Ummm.... (Score:1, Insightful)

    by Anonymous Coward on Sunday July 18, 2004 @12:28PM (#9731407)
    Ummm, false. We're talking mean time to failure here. Get 10,000 processors with an MTTF of 10,000 days (27 years): what are the chances one of your processors will fail tomorrow? Or this week/month/year? They don't all last 27 years.
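The MTTF argument above can be made concrete. A minimal sketch, assuming independent failures with a constant failure rate (exponential lifetimes), which the comment does not state; `p_any_failure` is a hypothetical helper:

```python
import math


def p_any_failure(n_units: int, mttf_days: float, window_days: float) -> float:
    """Probability that at least one of n_units fails within window_days.

    Assumes independent units with exponential lifetimes (constant failure
    rate), so a single unit survives the window with probability
    exp(-window_days / mttf_days).
    """
    p_survive_one = math.exp(-window_days / mttf_days)
    return 1.0 - p_survive_one ** n_units


# 10,000 processors, each with an MTTF of 10,000 days (about 27 years).
for label, days in [("day", 1), ("week", 7), ("month", 30), ("year", 365)]:
    p = p_any_failure(10_000, 10_000, days)
    print(f"P(at least one failure in a {label}) = {p:.3f}")
```

Under that assumption the machine room expects one processor failure per day (10,000 units * 1 day / 10,000 days), and the chance of at least one failure tomorrow is 1 - e^(-1), about 63%, rising past 99.9% over a week.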
  • by xyote ( 598794 ) on Sunday July 18, 2004 @12:50PM (#9731517)
    Well, we know that the kernel can be made to scale, but what about the applications? The same issues the kernel had to face, the applications have to face as well. For parallel computing you naturally try to avoid too much sharing by "parallelizing" the programs. For applications like databases, you are talking about a lot of sharing of a lot of data. Not all of the techniques the Linux kernel used are available to applications yet.
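One common way to "avoid too much sharing" is to partition the data so that each worker owns its own slice and only the final partial results are combined. A minimal sketch of that pattern, using a hypothetical function `partitioned_sum`; note that in CPython the global interpreter lock keeps threads from speeding up pure-Python arithmetic, so this illustrates the structure rather than the speedup, and real parallel code would use processes or compiled code:

```python
from concurrent.futures import ThreadPoolExecutor


def partitioned_sum(data: list[int], workers: int = 4) -> int:
    """Sum data by giving each worker a private slice (no shared accumulator)."""
    # One contiguous slice per worker; max(1, ...) avoids a zero step on empty input.
    chunk = max(1, (len(data) + workers - 1) // workers)
    slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker reads only its own slice, so no locking is needed.
        partials = list(pool.map(sum, slices))
    # The only shared step: combining a handful of partial results at the end.
    return sum(partials)


print(partitioned_sum(list(range(1_000_000))))  # 499999500000
```

A database often cannot be split this cleanly, because many transactions need to read and update the same rows, which is exactly the harder case raised in the comment.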
  • by shaitand ( 626655 ) on Sunday July 18, 2004 @02:10PM (#9731944) Journal
    If they sell you a copy, they've then distributed it, and the GPL requires them to license those changes to you under the GPL.

    Any SGI customer can then contribute the changes back to the kernel long before a year is up.
  • by killjoe ( 766577 ) on Sunday July 18, 2004 @02:59PM (#9732268)
    Oooh, you told him! Way to stick up for MS! They need help from you. They can't counteract FUD by themselves with the billions they spend on advertising, astroturfing, financing lawsuits by SCO, and paying for ADTI studies. Thank God MS has people like you to run to their aid whenever somebody says something bad about Windows.

    Still, though, the fact that Linux can scale to 1024 processors while Windows can only scale to 64 is enough reason to bash Windows, isn't it? I mean, wasn't Bill Gates recently bashing Linux because it was a "toy" and wouldn't scale?

  • Re:Solaris (Score:3, Insightful)

    by timeOday ( 582209 ) on Sunday July 18, 2004 @05:32PM (#9733346)
    A better retort would be "There's a world market for maybe 5 computers" by the IBM dude.
    I've heard that one. I think the guy was right! It was 1943 after all. Somehow we interpret this as, "There will only ever be a market for 5 computers, even if they change so completely that nothing is left of current technology and only the name stays the same."
  • by gazbo ( 517111 ) on Sunday July 18, 2004 @05:56PM (#9733506)
    The real difficulty is getting past the 1024 mark - once you get over 2^10 nodes (2^16 minus 6 status bits), all sorts of assumptions in the multi-CPU scheduling algorithm break, and overflows can occur all over the place.

    Let's hope we hear stories about a 1025 node machine soon!
