Silicon Graphics Technology

SGI to Scale Linux Across 1024 CPUs

im333mfg writes "ComputerWorld has an article up about an upcoming SGI machine being built for the National Center for Supercomputing Applications 'that will run a single Linux operating system image across 1,024 Intel Corp. Itanium 2 processors and 3TB of shared memory.'"
  • Whoa! (Score:5, Funny)

    by rylin ( 688457 ) on Sunday July 18, 2004 @10:39AM (#9731090)
    Sweet, now we'll be able to run Doom3 at highest detail in *SOFTWARE*-rendering mode!
  • Ok (Score:5, Funny)

    by CableModemSniper ( 556285 ) <.moc.liamg. .ta. .odlapacnagol.> on Sunday July 18, 2004 @10:39AM (#9731094) Homepage Journal
    But does it run--crap. I mean what about a Beowulf--doh!
    Damn you SGI!
  • Longhorn (Score:3, Funny)

    by Anonymous Coward on Sunday July 18, 2004 @10:40AM (#9731105)
    Yeah, but can it run Longhorn?
  • by b1t r0t ( 216468 ) on Sunday July 18, 2004 @10:40AM (#9731109)
    Intel's sales figures for Itanic^Hum CPUs more than doubled as a result.
  • Solaris (Score:3, Insightful)

    by MrWim ( 760798 ) on Sunday July 18, 2004 @10:41AM (#9731121)
    It seems that if they pull this off, one of the strongholds of Solaris (namely massively parallel computing) will have been conquered by Linux. I wonder how Sun is feeling at the moment?
    • Re:Solaris (Score:5, Informative)

      by justins ( 80659 ) on Sunday July 18, 2004 @10:59AM (#9731236) Homepage Journal
      Solaris is not a leader in supercomputing, never has been.

      http://top500.org/list/2004/06/

      There's no "stronghold" for Sun to lose.
    • by vlad_petric ( 94134 ) on Sunday July 18, 2004 @11:03AM (#9731253) Homepage
      Sun processors execute server workloads (database, app server) very well, but that's pretty much it. The emphasis with such workloads is on the memory system. Boatloads of caches do the job. It's also more effective to have tons of processors that are very simple, than just a couple of them that are complex and powerful.

      Scientific computing means data crunching (floating point). Complex, powerful processors are needed. The "stupider, but more" tradeoff doesn't work anymore. Sun processors have fallen behind in this respect.

    • Re:Solaris (Score:5, Interesting)

      by mrm677 ( 456727 ) on Sunday July 18, 2004 @11:03AM (#9731255)
      It seems that if they pull this off, one of the strongholds of Solaris (namely massively parallel computing) will have been conquered by Linux. I wonder how Sun is feeling at the moment?

      Solaris scales to hundreds of processors out of the box. Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

      Lame analogy: many people have demonstrated that they can hack their Honda Civic to outperform a Corvette; however, I can walk into a dealership and purchase the latter, which performs quite well without mods.
      • Re:Solaris (Score:5, Interesting)

        by kasperd ( 592156 ) on Sunday July 18, 2004 @11:20AM (#9731363) Homepage Journal
        Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

        I wouldn't be surprised to see these changes in the 2.8 kernel. And what will people do until then, I hear some people ask. Right now very few people actually need to scale to 1024 CPUs, and that will probably still be true by the time Linux 2.8.0 is released. AFAIK Linux 2.6 scales well to 128 CPUs, but I don't have the hardware to test it, and neither do any of my friends. So I'd say there is no need to rush this into mainstream; the few people who need it can patch their kernels. My guess is that in the time from now until 2.8.0 is released, we will see less than 1000 such machines worldwide.
        • Re:Solaris (Score:4, Funny)

          by Nasarius ( 593729 ) on Sunday July 18, 2004 @11:31AM (#9731425)
          My guess is that in the time from now until 2.8.0 is released, we will see less than 1000 such machines worldwide.

          640 CPUs are enough for anyone? :)

          • Re:Solaris (Score:3, Interesting)

            by isorox ( 205688 )

            My guess is that in the time from now until 2.8.0 is released, we will see less than 1000 such machines worldwide.

            640 CPUs are enough for anyone? :)

            A better retort would be "There's a world market for maybe 5 computers" by the IBM dude.

            Claims are very difficult to make, and impossible to prove. However, putting a time limit on a claim is easy. 2.8.0 will be released in '05 or '06; maybe we'll all have 1024-CPU boxes in 20 years, but in 20 months?

            • Re:Solaris (Score:3, Insightful)

              by timeOday ( 582209 )
              A better retort would be "There's a world market for maybe 5 computers" by the IBM dude.
              I've heard that one. I think the guy was right! It was 1943 after all. Somehow we interpret this as, "There will only ever be a market for 5 computers, even if they change so completely that nothing is left of current technology and only the name stays the same."
      • Re:Solaris (Score:3, Interesting)

        by Waffle Iron ( 339739 )
        Solaris scales to hundreds of processors out of the box. Until the vanilla Linux kernel accepts these changes and scales, Solaris still has a big edge in this area.

        If someone buys one of these clusters from SGI, then it does scale "out of the box" as far as they're concerned.

    • by puppetluva ( 46903 ) on Sunday July 18, 2004 @11:05AM (#9731267)
      Sun hardware has additional, wonderful resiliency features, like allowing CPUs to fail over to other CPUs in case of failure. The same holds true for memory, network interfaces, etc. Solaris is aware of these hardware features and can "map out" the bad memory and CPUs on the fly (or allow swap-in replacements). The engineers can then replace the broken CPUs/memory/interfaces WITHOUT BRINGING THE MACHINE DOWN. This lends itself to an environment that can enjoy nearly 100% uptime. Finally, since Sun has been doing the "lots of CPUs" thing for many years, their process management and scalability tend to be much better.

      I don't work for Sun; I'm just an SA who deals with both Solaris and Linux boxes. You don't pick Sun for just "lots of CPUs"; you pick it for a very scalable OS and amazing hardware that allows for a very, very solid datacenter. If downtime costs a lot (i.e. you lose a lot of money for being down), you should have Sun and/or IBM zSeries hardware. Unfortunately those features cost a lot, and most times you can use Linux clustering instead for a fraction of the cost and a high percentage of the availability.
      • by r00t ( 33219 ) on Sunday July 18, 2004 @11:20AM (#9731358) Journal
        Linux runs on both of these, with official IBM support on the zSeries. On the IBM hardware, go ahead and swap out CPUs and memory. It's supported, today, with Linux.

        The Sun hardware is more difficult to deal with, since there isn't a virtual machine abstraction. You can't do everything below the OS. Still, Linux 2.6 has hot-plug CPU support that will do the job without help from a virtual machine. Hot-plug memory patches were posted a day or two ago. Again, this is NOT required for hot-plug on the zSeries. IBM whips Sun.

        I'd trust the zSeries hardware far more than Sun's junk. A zSeries CPU has two pipelines running the exact same operations. Results get compared at the end, before committing them to memory. If the results differ, the CPU is taken down without corrupting memory as it dies. This lets the OS continue that app on another CPU without having the app crash.
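        For the curious, the 2.6 hot-plug CPU support mentioned above is driven through sysfs. Here is a minimal userspace sketch, assuming a kernel built with CONFIG_HOTPLUG_CPU, root privileges, and the usual /sys/devices/system/cpu layout (whether a particular CPU can actually be offlined is platform-dependent):

        /* Minimal sketch of driving Linux's hot-plug CPU interface from
         * userspace. Assumes CONFIG_HOTPLUG_CPU and the standard sysfs path;
         * CPU numbering and root permissions are taken for granted. */
        #include <stdio.h>
        #include <stdlib.h>

        static int set_cpu_online(int cpu, int online)
        {
            char path[128];
            FILE *f;

            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu%d/online", cpu);
            f = fopen(path, "w");
            if (!f) {
                perror(path);
                return -1;
            }
            fprintf(f, "%d\n", online ? 1 : 0);  /* "0" offlines, "1" onlines */
            fclose(f);
            return 0;
        }

        int main(int argc, char **argv)
        {
            if (argc != 3) {
                fprintf(stderr, "usage: %s <cpu> <0|1>\n", argv[0]);
                return EXIT_FAILURE;
            }
            return set_cpu_online(atoi(argv[1]), atoi(argv[2])) ? EXIT_FAILURE
                                                                : EXIT_SUCCESS;
        }

        Running it with arguments "3 0" would offline CPU 3, and "3 1" brings it back; the kernel migrates whatever was running there onto the remaining CPUs.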

        • What applications are you suggesting be run on this hugely expensive mainframe?
          • Du-uh (Score:2, Interesting)

            As everybody who has read the IBM Redbooks about mainframe Linux knows, Sendmail is the service of choice! Of course, you could run Postfix on a decrepit old Pentium 1 and get the same level of performance, but that won't help IBM with their mainframe income, will it?
            • On a serious note, I can't think of any app other than Oracle that's of any use beyond the OSS stuff.

              I think it's funny how Sun is either far too expensive and we're being told to run everything on a few old 486s from the back of the office cupboard, or that Linux on a mainframe is the way to go.
        • If the results differ, the CPU is taken down without corrupting memory as it dies.

          A few questions:
          • What if an error happens in the comparison unit?
          • What happens to the program that was running on the CPU as it is taken down? (The CPU registers are part of the program state, so you cannot just continue on another CPU.)
      • by passthecrackpipe ( 598773 ) * <passthecrackpipe.hotmail@com> on Sunday July 18, 2004 @11:27AM (#9731398)
        Cache reduction - ehh cash reduction. One of the prime reasons Sun is losing serious levels of installed base to Linux is not because Linux is better; it is because Sun is bloody expensive - outrageously so. And while most customers had to endure the annual fleecing with gritted teeth - due to lack of alternatives - Sun is now being pummeled out of datacenter after datacenter.

        I have replaced Sun hardware/software combos in the core datacenter for many of our customers, and I can tell you that yes, Sun brings some amazing features to the table - most of which are there to serve old technology. Linux on simple CPUs delivers amazing price/performance (depending on the job, we see an average 3x to 4x performance increase for 25% of the cost). That means that if I were to spend the same, lifecycle-wise, on a Linux cluster as I would on a big Sun box like the 10k or 15k, I'd end up with 12x to 16x the performance of the Sun solution.

        The same functionality in terms of CPU and RAM (and other hardware) failure is available on the Linux cluster, albeit in a less graceful form - the magic spell to invoke goes like this:
        shutdown -h now
        If I have 300 machines crunching my data, I can afford to lose a couple, and can afford to have a few hot standbys.

        Of course, the massively parallel architecture does not work for all applications, and in those cases you would look to use either OpenMOSIX [openmosix.org] or, of course, the (relatively expensive) SGI box mentioned in this article.
        • If you're lucky enough to have a massively-parallel, read-only application, then go for Linux clusters.

          Read the Sun Blueprints (http://www.sun.com/blueprints/browsesubject.html#cluster) for how a real cluster works - actually caring about data integrity. That is the crux with clustered systems: what happens if one node "goes mad" even though it's no longer a "valid" part of the cluster?
          Look into Sun's handling of failure fencing; it's drastic (PANIC a node if it can't be sure it's a cluster member), but

      • The same holds true for memory, network interfaces, etc. Solaris is aware of these hardware features and can "map out" the bad memory and CPUs on the fly (or allow swap-in replacements). The engineers can then replace the broken CPUs/memory/interfaces WITHOUT BRINGING THE MACHINE DOWN.

        Does that happen in real life?
        Hot swapping components sounds great, but what if the screwdriver slips out of the engineer's hand and causes a short?
        Who has seen a memory chip or a CPU hot-swapped in a pro

        • by Jeff DeMaagd ( 2015 ) on Sunday July 18, 2004 @12:19PM (#9731653) Homepage Journal
          Hot swapping components sounds great, but what if the screwdriver slips out of the engineer's hand and causes a short?

          The systems I've seen that have hot-swap PCI cards have plastic partitions between the slots to prevent the cards from touching each other when hot swapping them.

          I'm not sure why there would be a hypothetical screwdriver in such a tech's hands anyway. Many systems have non-screw means of retaining memory, PCI cards, CPUs and such.
      • While Sun's stuff is good, Altix XVM and CXFS blow away Sun's entire foundation suite - the Leadville stack and Solstice DiskSuite. Not to mention that Sun Cluster is completely overrated; they have to rely on Veritas Cluster to pull them through. If people really follow up on SGI, they're clearly on the rebound in the high-end market. But even if they beat Sun, they still won't beat IBM in this sector.

      • by justins ( 80659 ) on Sunday July 18, 2004 @11:53AM (#9731531) Homepage Journal
        You don't pick Sun for just "lots of CPUs"; you pick it for a very scalable OS and amazing hardware that allows for a very, very solid datacenter.

        The UNIX made by SGI (the company making the machine referenced in the article) is more scalable than Solaris. Remember, IRIX was the first OS to scale a single Unix OS image across 512 CPUs. And now they've eclipsed that, with Linux.

        Sun hardware has additional, wonderful resiliency features, like allowing CPUs to fail over to other CPUs in case of failure.

        None of that is unique to Sun.

        Finally, since Sun has been doing the "lots of CPUs" thing for many years, their process management and scalability tend to be much better.

        Better than what? And says who? They've never decisively convinced the market that they're better at this than HP, SGI, IBM or Compaq.

        If downtime costs a lot (i.e. you lose a lot of money for being down), you should have Sun and/or IBM zSeries hardware. Unfortunately those features cost a lot, and most times you can use Linux clustering instead for a fraction of the cost and a high percentage of the availability.

        In addition to ignoring the other good Unix architectures out there in a dumb way with this comparison, you're also totally missing the point of the article. Linux supercomputing isn't just about cheap clusters anymore. Expensive UNIX machines on one side and cheap Linux clusters on the other is a false dichotomy.
        • Scalability of sorts (Score:4, Informative)

          by Decaff ( 42676 ) on Sunday July 18, 2004 @03:58PM (#9733159)
          The UNIX made by SGI (the company making the machine referenced in the article) is more scalable than Solaris. Remember, IRIX was the first OS to scale a single Unix OS image across 512 CPUs. And now they've eclipsed that, with Linux.

          Scalability is a complex issue. SGI has put a whole lot of processors together and put a single Linux image on them (so that a single program can use all the memory), but this says nothing about how that setup will actually perform for general-purpose use. Just because the hardware allows threads on hundreds of processors to make calls into a single Linux kernel does not mean that there will not be major performance issues if this actually happens.

          There are performance issues with memory even on single processor systems with nominally a single large address space, and a developer may need to put a lot of work into ensuring that data is arranged to make best use of the various levels of cache.

          Many of the multi-processor architectures require even greater care to ensure that the processors are actually used effectively.

          The fact that a single Linux image has been attached to hundreds of processors is no indication of scalability. A certain program may scale well, or not.
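          To make the data-layout point concrete, here is a minimal sketch in plain C (nothing SGI- or NUMA-specific; the matrix size and the timing harness are arbitrary choices): the same sum computed in a cache-friendly and a cache-hostile traversal order. On most memory hierarchies the second version is several times slower, and on a large NUMA machine the gap only widens.

          /* Same work, two traversal orders: only the memory access pattern
           * differs. N and the timing harness are arbitrary for illustration. */
          #include <stdio.h>
          #include <time.h>

          #define N 4096

          static double a[N][N];                 /* stored row-major in C */

          static double sum_row_major(void)      /* walks memory sequentially */
          {
              double s = 0.0;
              for (size_t i = 0; i < N; i++)
                  for (size_t j = 0; j < N; j++)
                      s += a[i][j];
              return s;
          }

          static double sum_column_major(void)   /* strides N*8 bytes per access */
          {
              double s = 0.0;
              for (size_t j = 0; j < N; j++)
                  for (size_t i = 0; i < N; i++)
                      s += a[i][j];
              return s;
          }

          static double seconds(void)
          {
              struct timespec ts;
              clock_gettime(CLOCK_MONOTONIC, &ts);
              return ts.tv_sec + ts.tv_nsec * 1e-9;
          }

          int main(void)
          {
              for (size_t i = 0; i < N; i++)
                  for (size_t j = 0; j < N; j++)
                      a[i][j] = 1.0;

              double t0 = seconds();
              double s1 = sum_row_major();
              double t1 = seconds();
              double s2 = sum_column_major();
              double t2 = seconds();

              /* Same answer, very different cache (and interconnect) traffic. */
              printf("row-major:    %.0f in %.3fs\n", s1, t1 - t0);
              printf("column-major: %.0f in %.3fs\n", s2, t2 - t1);
              return 0;
          }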
      • I realize you were trying to defend Sun, but in this case the vendor (SGI) has far more experience with large systems than Sun does. At every point over the last 16 years (since SGI announced the original PowerSeries on 10/4/88), SGI has always supported more processors running a single OS than Sun. Those processors were MIPS based, but the Altix architecture is derived from the bricks/bus of the Origin servers.

        The flip-side of this is that SGI has been in decline for several years longer than Sun and ma
    • SGI has been playing in this NUMA market ever since they bought Cray about a decade ago. The T3 had a similar number of Alpha processors. The current Origin scales to 1024 MIPS processors. I believe both systems ran IRIX; the T3 may have used UNICOS. The point is, the only thing new here is Linux on 1024 processors. And even then, SGI already has a 256-Itanium Linux system.
  • Why gaming? (Score:2, Funny)

    by wyldwyrm ( 693798 )
    Obviously this would be overkill for doom3(altho I'd still like to have it in my apartment as a space heater/server)! Ok, so it would be more than a space heater; I'd have to run my a/c 24/7/365.25, with all my windows open in the winter. But rendering would be sooooo sweet.
  • Press Release (Score:4, Informative)

    by foobsr ( 693224 ) on Sunday July 18, 2004 @10:44AM (#9731142) Homepage Journal
    The link to the press release [sgi.com] as of July 14.

    CC.
  • by mangu ( 126918 ) on Sunday July 18, 2004 @10:47AM (#9731166)
    ...how easy it is to install printer and sound drivers?
  • by k4_pacific ( 736911 ) <k4_pacific@yahoo . c om> on Sunday July 18, 2004 @10:49AM (#9731174) Homepage Journal
    Microsoft made a statement today reminding everyone that Windows Server 2003 can handle as many as 32 processors, at the same time even.

    When shown the report about Linux running on 1024 processors, Gates purportedly responded, "32 processors ought to be enough for anybody."
  • With the exception of the NUMA stuff, is there software available to re-create this? I'm not even sure what to search for; would this still be considered a "cluster"?
    • by dwgranth ( 578126 ) on Sunday July 18, 2004 @11:39AM (#9731463) Journal
      Well, SGI uses/hacks NUMA, spinlocks, etc. to make this happen in a more efficient manner. We recently had an SGI rep come and explain their 512-CPU architecture at our LUG meeting, and he basically said that SGI has their own implementation of all of the clustering/CPU-stacking techs, which they will eventually feed back into the community. All good stuff. Understandably, they will wait a year or so, so they can get their money's worth before they release their changes.
      • If they sell you a copy, they've then distributed it, and the GPL requires them to license those changes to you under the GPL.

        Any SGI customer can then contribute the changes back to the kernel long before a year is up.
  • by InodoroPereyra ( 514794 ) on Sunday July 18, 2004 @11:09AM (#9731288)
    From the article:
    Earlier cluster supercomputers at the NCSA used multiple images of the Linux operating system -- one for each node -- along with dedicated memory allocations for each CPU. What makes this system more powerful for researchers is that all of the memory will be available for the applications and calculations, helping to speed and refine the work being done, Pennington said.

    "The users get one memory image they have to deal with," he said. "This makes programming much easier, and we expect it to give better performance as well."

    So, does anyone have any insights as to why/how this matters for programmers? Does this mean that the applications running on the "old" clusters, presumably using some flavor of MPI to communicate between nodes, will have to be ported somehow to become multithreaded applications? Or maybe they will still run using MPI on the big shared memory pool, and each process will be sent to the appropriate node by the OS on demand? Thanks!
    • I presume it means that the application can directly access the shared data through the NUMA memory rather than using MPI to access the data from whatever node it thinks the data is on. Data coherency gets moved down into the hardware/kernel layer rather than up at the application layer. The communication of the data would be done by a low-latency interconnect either way.

      This article is news to me. My impression was that HPC programmers preferred MPI over shared-memory multi-threading because they found

      • by kscguru ( 551278 ) on Sunday July 18, 2004 @01:33PM (#9732081)
        Caveat: I think MPI itself is fairly recent (standardized only within the past few years); before that, everyone used custom message-passing libraries.

        It's a tradeoff. MPI is "preferred" because a properly written MPI program will run equally fast on both clusters and shared-memory machines, because all communication is explicit. It's also much harder to program, because all communication must be made explicit.

        Shared-memory (e.g. pthreads) is easier to program in the first place (since you don't have to think about as many sharing issues) and more portable. However, it is very error-prone - get a little bit off on the cache alignment or contend too much for a lock, and you've lost much of the performance gain. And it can't run on a cluster without a horrible performance loss.

        If it's the difference between spending two months writing the shared-memory sim and four months writing the message-passing sim that runs two times faster on cheaper hardware, well, which would you choose? Is the savings in CPU time worth the investment in programmer time?

        Alas, the latencies on a 1024-way machine are pretty bad anyway. If they use the same interconnect as the SGI Origin, it's 300-3000 cycles for each interconnect transaction (depending on distance and number of hops in the transaction). Technically that's low-latency... but drop below 32 processors or so, and the interconnect is a bus with 100 cycle latencies, so those extra processors cause a lot of lost cycles.
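        To put the "all communication is explicit" point in concrete terms, here is a minimal MPI sketch of a distributed dot product. It assumes an MPI implementation and launcher (mpicc / mpirun) are available, and the vector length is arbitrary; the single MPI_Allreduce call is the only place data crosses ranks, and the same source runs on a cluster or on a shared-memory box.

        /* Distributed dot product: each rank owns a slice, and the only
         * cross-rank step is the explicit MPI_Allreduce at the end. */
        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(int argc, char **argv)
        {
            int rank, nprocs;
            const long n_local = 1000000;        /* elements owned per rank */

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            double *x = malloc(n_local * sizeof *x);
            double *y = malloc(n_local * sizeof *y);
            for (long i = 0; i < n_local; i++) {
                x[i] = 1.0;
                y[i] = 2.0;
            }

            double local = 0.0;                  /* purely local compute */
            for (long i = 0; i < n_local; i++)
                local += x[i] * y[i];

            /* Explicit communication: message passing on a cluster, plain
             * memory traffic on a single-image shared-memory machine. */
            double global = 0.0;
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);

            if (rank == 0)
                printf("dot = %g across %d ranks\n", global, nprocs);

            free(x);
            free(y);
            MPI_Finalize();
            return 0;
        }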

    • Such a system looks like a _huge_ SMP box. So you can run stuff like PostgreSQL, which as of 7.4.x doesn't cluster easily because it requires shared memory between processes.

      Sharing memory between processes running on physically separate machines is not that easy; it often requires fancy hardware and software.

      While the SGI solution also involves fancy hardware and software, I believe a single process gets to have terabytes of memory, which is rather different from the common cluster ar
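      For reference, "shared memory between processes" on a single system image boils down to something like this minimal POSIX sketch (an anonymous shared mapping inherited across fork; the size and the value written are arbitrary). Doing the same across physically separate nodes is where the fancy hardware and software come in.

      /* One page shared (not copy-on-write) between a parent and its child. */
      #include <stdio.h>
      #include <sys/mman.h>
      #include <sys/types.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(void)
      {
          long *shared = mmap(NULL, sizeof(long), PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
          if (shared == MAP_FAILED) {
              perror("mmap");
              return 1;
          }
          *shared = 0;

          pid_t pid = fork();
          if (pid == 0) {             /* child: writes into the shared page */
              *shared = 42;
              _exit(0);
          }

          waitpid(pid, NULL, 0);      /* parent: sees the child's write */
          printf("parent read %ld from shared memory\n", *shared);

          munmap(shared, sizeof(long));
          return 0;
      }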
    • by Sangui5 ( 12317 ) on Sunday July 18, 2004 @12:42PM (#9731760)

      Does this mean that the applications running on the "old" clusters, presumably using some flavor of MPI to communicate between nodes, will have to be ported somehow to become multithreaded applications?

      NCSA still has plenty of "old"-style clusters around. Two of the more aging clusters, Platinum [uiuc.edu] and Titan [uiuc.edu], are being retired to make room for newer systems like Cobalt. Indeed, the official notice [uiuc.edu] was made just recently--they're going down tomorrow. However, as the retirement notice points out, we still have Tungsten [uiuc.edu], Copper [uiuc.edu], and Mercury (TeraGrid) [uiuc.edu]. Indeed, Tungsten is number 5 on the Top 500 [top500.org], so it should provide more than enough cycles for any message-passing jobs people require.

      So, does anyone have any insights as to why/how this matters for programmers?

      What it means is that programming big jobs is easier. You no longer need to learn MPI, or figure out how to structure your job so that individual nodes are relatively loosely coupled (see the sketch at the end of this comment). Also, jobs that have more tightly coupled parallelism are now possible. The older clusters used high-speed interconnects like Myrinet or InfiniBand (NCSA doesn't own any InfiniBand AFAIK, but we're looking at it for the next cluster supercomputer). Although they provide really good latency and bandwidth, they aren't as high-performing as shared memory. Also, Myrinet's ability to scale to huge numbers of nodes isn't all that great--Tungsten may have 1280 compute nodes, but a job that uses all 1280 nodes isn't practical. Indeed, until recently the Myrinet didn't work at all, even after partitioning the cluster into smaller subclusters.

      This new shared-memory machine will be more powerful, more convenient, and easier to maintain than the cluster-style supercomputers. Hopefully it will allow better scheduling algorithms than on the clusters, too--an appalling number of cycles get thrown away because cluster scheduling is non-preemptive.

      I'd also like to point out some errors in the Computerworld article. NCSA is *currently* storing 940 TB in near-line storage (Legato DiskXtender running on an obscenely big tape library), and growing at 2TB a week. The DiskXtender is licensed for up to 2 petabytes--we're coming close to half of that now. The article therefore vastly understates our storage capacity. On the other hand, I'd like to know where we're hiding all those teraflops of compute--35 TFLOPS after getting 6 TFLOPS from Cobalt sounds more than just a little high. That number smells of the most optimistic peak performance values of all currently connected compute nodes, i.e. how many single-precision operations the nodes could do if they didn't have to communicate, everything was in L1 cache, we managed to schedule something on all of them, and they were all actually functioning. Realistically, I'd guess that we can clear maybe a quarter of that figure, given machines being down, jobs being non-ideal, etc.

      As a disclaimer, I do work at NCSA, but in Security Research, not High-Performance Computing.
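      Here is the sketch promised above: the shared-memory style that a single-system-image machine makes possible, with no MPI in sight. It is only an illustration, not NCSA code -- it assumes an OpenMP-capable compiler (e.g. cc -fopenmp), the array size is arbitrary, and data placement across the NUMA nodes is simply left to the OS.

      /* Whole array in one address space; the runtime spreads the loop over
       * however many CPUs are available. Size and values are arbitrary. */
      #include <omp.h>
      #include <stdio.h>
      #include <stdlib.h>

      int main(void)
      {
          const long n = 10000000;
          double *x = malloc(n * sizeof *x);
          for (long i = 0; i < n; i++)
              x[i] = 1.0;

          double sum = 0.0;
          /* No explicit communication: the cross-thread reduction is handled
           * by the OpenMP runtime. */
          #pragma omp parallel for reduction(+:sum)
          for (long i = 0; i < n; i++)
              sum += x[i];

          printf("sum = %g using up to %d threads\n", sum, omp_get_max_threads());
          free(x);
          return 0;
      }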

  • .... to see just how far you can stretch a bit before the MP loses control....
  • by gsasha ( 550394 ) on Sunday July 18, 2004 @11:16AM (#9731331) Homepage
    I wish I had that much disk space...
  • Coincidence? (Score:2, Redundant)

    by Quirk ( 36086 )
    First Doom 3 now this... coincidence? I don't think so.
  • by Sidicas ( 691633 ) on Sunday July 18, 2004 @11:22AM (#9731377) Journal
    "will run a single Linux operating system image across 1,024 Intel Corp. Itanium 2 processors..."
    "The National Center for Supercomputing Applications will use it for research"


    1. Make a system that generates more heat than a supernova.
    2. Research a solution to global warming.
    3. Profit!

  • by ShadowRage ( 678728 ) on Sunday July 18, 2004 @11:23AM (#9731379) Homepage Journal
    SCO gained $715,776
  • Wow (Score:2, Funny)

    by Steamhead ( 714353 )
    Hot damn, this is one server that could survive a slashdotting.
  • There goes SGI again, violating SCO's copyright! I'm gonna tell Darl McBride on them!
  • Impressive... (Score:2, Informative)

    ...Right on the heels of this [computerworld.com] too.
  • by xyote ( 598794 ) on Sunday July 18, 2004 @11:50AM (#9731517)
    Well, we know that the kernel can be made to scale, but what about the applications? The same issues the kernel had to face, the applications have to face as well. For parallel computing, you naturally try to avoid too much sharing by "parallelizing" the programs. For applications like databases, you are talking about a lot of sharing of a lot of data. Not all of the techniques the Linux kernel used are available to applications yet.
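    As one concrete example of a kernel technique (per-CPU data) that applications end up reinventing: give each thread its own cache-line-padded slot instead of hammering a single shared counter, and only combine the results at the end. A minimal pthreads sketch -- the 128-byte line size and the thread count are assumptions for illustration:

    /* Per-thread counters padded to (an assumed) cache-line size, so threads
     * never contend on the same line; totals are combined only at the end. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS  8
    #define CACHELINE 128        /* assumed; real value is hardware-specific */
    #define ITERS     10000000L

    struct padded_counter {
        long value;
        char pad[CACHELINE - sizeof(long)];
    };

    static struct padded_counter counters[NTHREADS];

    static void *worker(void *arg)
    {
        long id = (long)arg;
        for (long i = 0; i < ITERS; i++)
            counters[id].value++;    /* no sharing, no cache-line ping-pong */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];

        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (long i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);

        long total = 0;
        for (int i = 0; i < NTHREADS; i++)
            total += counters[i].value;    /* the only cross-thread step */

        printf("total = %ld\n", total);
        return 0;
    }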
    • by xtp ( 248706 ) on Sunday July 18, 2004 @12:48PM (#9731796)
      SGI has had 512- and 1024-CPU MIPS-based systems in operation for more than 5 years. Much work was done on the Irix systems to initialize large parallel computations and provide libraries and compiler support for these configurations. One technique is to provide message-passing libraries that use shared memory. A better technique is to morph (slightly) parallel mesh apps so that each computational mesh node exposes the array elements to be shared with neighbors. No message passing is needed - you push data after a big iteration and then use the (really fast) sync primitives to launch into the next iteration. With shared-nothing clusters (i.e. Beowulf) a computation (and its memory) must be partitioned among the compute nodes. The improvement over a "classical" cluster can be startling, especially with computations that are more communications-bound than compute-bound. This means there is no value in replacing a render farm with a big system. But there are big compute problems, e.g. finite element, for which the shared-nothing cluster is often inadequate.

      With a single-memory-image system, the computation can easily repartition dynamically as the computation proceeds. It's very costly (never say impossible!) to do this on a cluster because you have to physically move memory segments from one machine to another. On the NUMA system you just change a pointer. The hardware is good enough that you don't really have to worry about memory latency.

      And let's not forget I/O. Folks seem to forget that you can dump any interesting section of the computation to/from the file system with a single I/O command. On these systems the I/O bandwidth is limited only by the number of parallel disk channels - a system like the one mentioned in the article can probably sustain a large number of GBytes/sec to the file system.

      Let's not forget page size. The only way you can traverse a few TB of memory without TLB-faulting to death is to have multi-MByte-sized pages (because TLB size is limited). SGI allowed a process to map regions of main memory with different page sizes (up to 64 MB, I think) at least 10 years ago, in order to support large image database and compute apps.

      When I worked at SGI (5 years ago), the memory bandwidth at one CPU node was about 800 MBytes/s. My understanding is that the Altix compute nodes now deliver 12 GBytes/s at each memory controller. Although I haven't had a chance to test-drive one of these new systems, it sounds like they have gradually been porting well-seasoned Irix algorithms to Linux. It is unlikely that a commodity computer really needs all of this stuff, but I'm looking at a 4-CPU Opteron that could really use many of the memory management improvements.
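      The "push data after a big iteration, then hit the fast sync primitive" pattern described above looks roughly like this in portable code -- a sketch only, using a POSIX barrier and a 1-D smoothing pass with made-up sizes, standing in for the much faster primitives SGI provides. The point is that neighbours read each other's boundary cells directly through shared memory instead of exchanging messages:

      /* Each worker owns a slice of one shared array; a barrier separates
       * iterations. Sizes, thread count and iteration count are arbitrary. */
      #include <pthread.h>
      #include <stdio.h>
      #include <string.h>

      #define NTHREADS 4
      #define N        (1 << 20)
      #define ITERS    100

      static double cur[N], next[N];
      static pthread_barrier_t barrier;

      static void *worker(void *arg)
      {
          long id = (long)arg;
          long chunk = N / NTHREADS;
          long lo = id * chunk, hi = (id == NTHREADS - 1) ? N : lo + chunk;

          for (int it = 0; it < ITERS; it++) {
              /* Reads of cur[lo-1] / cur[hi] reach straight into the
               * neighbour's slice; no explicit exchange step. */
              for (long i = lo; i < hi; i++) {
                  double left  = (i == 0)     ? cur[i] : cur[i - 1];
                  double right = (i == N - 1) ? cur[i] : cur[i + 1];
                  next[i] = 0.5 * cur[i] + 0.25 * (left + right);
              }
              pthread_barrier_wait(&barrier);    /* everyone finished writing */
              memcpy(&cur[lo], &next[lo], (hi - lo) * sizeof(double));
              pthread_barrier_wait(&barrier);    /* everyone finished copying */
          }
          return NULL;
      }

      int main(void)
      {
          pthread_t t[NTHREADS];

          cur[N / 2] = 1.0;                      /* a single spike to diffuse */
          pthread_barrier_init(&barrier, NULL, NTHREADS);
          for (long i = 0; i < NTHREADS; i++)
              pthread_create(&t[i], NULL, worker, (void *)i);
          for (long i = 0; i < NTHREADS; i++)
              pthread_join(t[i], NULL);
          pthread_barrier_destroy(&barrier);

          printf("cur[N/2] after %d iterations: %g\n", ITERS, cur[N / 2]);
          return 0;
      }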

  • by Bruha ( 412869 ) on Sunday July 18, 2004 @11:52AM (#9731527) Homepage Journal
    Fire up apache and then post a link to it here on slashdot. We love a challenge.
  • by Anonymous Coward on Sunday July 18, 2004 @12:08PM (#9731607)
    That's almost enough to run Emacs!
  • 1024 physical CPUs running *one* logical host Linux image running god knows how many UML instances, each fully independent of the other and seeing 3 TB of memory. The mind boggles! :-)

  • by CyBlue ( 701644 ) on Sunday July 18, 2004 @03:19PM (#9732911)
    I've been working all weekend to cluster 4 Honda Civics. When I'm done, I expect it to go 280MPH, get 12MPG and 0-60 in under 3 seconds.
  • by AtariDatacenter ( 31657 ) on Sunday July 18, 2004 @05:36PM (#9733758)
    Being an administrator of some 24-way boxes, I have to ask a more detailed question about the error handling. Is the L2 cache in the CPUs just ECC, parity, or fully mirrored? You'll find that on a large installation of CPUs, not having your L2 fully mirrored will cause quite a bit of downtime over the course of a year with that many CPUs. I don't have the Itanium 2 specs. Anyone?

    UPDATE: I looked. Itanium 2's L2 cache is ECC. It'll correct a 1-bit failure, and detect and die on a 2-bit failure. Believe it or not, on a large number of CPUs running over a long period of time, that happens more often than you'd think. It also says it has an L3. No idea on the L3 cache protection method used; because they don't say, I'd also guess ECC. Wheee! Lots of high-speed RAM around the CPU with ECC protection. Well, nobody called this an enterprise solution, so I guess it's okay.

    Also, you're going to have regular issues with soft ECC errors on that many TB of RAM, and then the eventual outright failures that'll bring down the whole image of the OS. (An OS could potentially handle it 'gracefully' by seeing if there is a userspace process on that page and killing/segfaulting it, but that's more of an advanced OS feature.)

    Boy, I'd really hate to be the guy in charge of hardware maintenance on THAT platform.
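    For what it's worth, the "advanced OS feature" mentioned above is exactly what Linux later grew as memory-failure handling ("hwpoison"): an uncorrectable error in a userspace page becomes a SIGBUS for the owning process instead of a machine-wide crash. A sketch of what the application side can look like, assuming a kernel and libc recent enough to define the BUS_MCEERR_* codes:

    /* Install a SIGBUS handler so an uncorrectable memory error in one of our
     * pages ends this process cleanly instead of the whole machine. */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void bus_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)ctx; (void)info;
    #ifdef BUS_MCEERR_AR
        /* si_addr is the poisoned address; si_code says whether the error was
         * consumed (action required) or merely detected (action optional). */
        if (info->si_code == BUS_MCEERR_AR || info->si_code == BUS_MCEERR_AO)
            fprintf(stderr, "uncorrectable memory error at %p, giving up\n",
                    info->si_addr);
    #endif
        /* A checkpointing application might dump its state here; we just exit. */
        _exit(EXIT_FAILURE);
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = bus_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGBUS, &sa, NULL);

        /* ... the long-running computation would go here; nothing in this
         * sketch actually triggers a hardware error. */
        puts("SIGBUS handler installed");
        return 0;
    }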
