Technology

Top 500 Supercomputers 215

Anonymous Coward writes "sendmail.net has a piece on the new Top500 list of supercomputers. 'So who came out on top? Well, three US Department of Energy machines have taken spots one, two, and three to lead the list: ASCI Red (manufactured by Intel) at Sandia National Labs in Albuquerque, ASCI Blue-Pacific (IBM) at Lawrence Livermore Labs in Berkeley, and ASCI Blue Mountain (SGI) at Los Alamos. These are the only three systems to exceed 1 TF/s on the Linpack benchmark, and represent 7.4 percent of the total Flop/s on the list.' The story notes that the average growth rates for the list exceed the number set by Moore's Law. "
  • Yes, but the number varies based on the components and communication infrastructure. A set of components that only consumes 10 watts scales further than one that consumes 100 watts. Basically, infrastructure becomes your limiting factor. Power and communication specifically.

    Overall, though, this just puts an upper bound on realistic growth.
  • by Baldrson ( 78598 ) on Thursday November 11, 1999 @10:06AM (#1541695) Homepage Journal
    Mere "flops" is a very specialized metric. It biases toward low-level programming or calculations that are easy for the programmer to partition. The more the programmer must know about the solution to the problem in order to write the software to solve it, the fewer problems get solved.

    A better metric for computation power is given by this formula concerning the memory hierarchy:

    bandwidth * size * speed

    where

    bandwidth is the average speed at which data streams to or from memory

    size is the amount of memory

    speed is the responsiveness to random access

    What the massively parallel processor advocates frequently forget is that locality of reference is an expensive assumption. A similar mistake is made by memory hierarchy advocates. For example, in many systems where CD-ROM jukeboxes were included to expand the size of the memory, the architects overestimated "locality of reference" and therefore underestimated the profound impact that moving the robot arm around would have on latency. Such designs are convenient for the hardware designer who wants "good numbers" and a nightmare for the advanced software application that needs unpredictable access to lots of information at a high rate in order to get the solution out of the machine before the solution is obsolete. The operands have to come together through that maze of wiring. If you have partitioned the memory, it profoundly affects both latency and bandwidth. The critical thing is to allow _shared memory_, and that means advanced memory control units.

    Seymour Cray kept ahead of the supercomputer pack for more than two decades by focusing his best talent on fast, high bandwidth memory control units and building the biggest semiconductor memories to match.
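    A minimal sketch of how that figure of merit might be computed, with invented numbers for two hypothetical memory systems (nothing here is measured from a real machine):

        # Toy comparison of the bandwidth * size * speed figure of merit described above.
        # All numbers are hypothetical illustrations, not measurements of real machines.

        def memory_merit(bandwidth_gb_s, size_gb, random_accesses_per_s):
            """bandwidth * size * speed, where 'speed' is responsiveness to random access."""
            return bandwidth_gb_s * size_gb * random_accesses_per_s

        # A big shared memory vs. a jukebox-style hierarchy with far more capacity but awful latency.
        shared = memory_merit(bandwidth_gb_s=10.0, size_gb=64.0, random_accesses_per_s=1e7)
        jukebox = memory_merit(bandwidth_gb_s=10.0, size_gb=640.0, random_accesses_per_s=1e3)

        print(shared, jukebox)  # the tenfold size advantage is swamped by the 10000x random-access penalty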

  • Well, it is, AND it isn't. It's using similar technologies, it's just not using the actual Beowulf 'package'. Nitpicking at that point, I suppose.

    Depends on how you define a Beowulf cluster, really.
  • It's not really parallel processing though; I think it would be more 'symmetric', i.e. each chip doing one particular task, the one it's best suited for. This is the way Q3A works: each chip is given a specific job. But a 9000-chip Intel box wouldn't run Quake any faster than a 2-chip Intel box.

    Anyway, all I meant in the comment about 3D gaming was that PCs are better than Macs for 3D gaming; no one, except maybe Apple's marketing department, would deny this. And a comparable PC would be cheaper than a comparable Mac.

    Yes, there are other uses for floating point, but the primary use in consumer situations is games. If a scientist really needed high-power floating point, they could get an Alpha or something.
    --
    "Subtle mind control? Why do all these HTML buttons say 'Submit' ?"
  • That doesn't exactly hold. Replace my use of "speed" with "transistors" or "density", and it is still the same. Every 18-24 months density/transistors/speed may double in chips /on average/, but at the same time there are very expensive ones with higher densities/whatever that are faster, and very cheap ones with lower densities/whatever, so to remove the money factor is not fair. At any point in time, regardless of Moore's Law, anyone can build a faster machine simply by throwing more money, more resources at it. So it is not fair to compare machines like this. They should be normalized to density or performance per dollar (or any given monetary unit).
  • But that's not true, though. In theory, yes, you dump a trillion dollars on the development of quantum computers, and you've blown away Moore's Law. My point, however, is that dumping a trillion dollars into a parallelized system that uses regular processors will not break Moore's Law, and never will, because Moore's Law is based on the measurement of the density within the individual processors themselves, and not the system as a whole. I.e., the density of a system that has 1 processor is the same as the density of a system with 500 processors.

    Moore's Law is based on the fastest processor existing today, not the most economical. Now of course there are processors that will do specific functionality faster, but their semiconductor density has not changed. Speed is not the issue at hand here; it's the symptom of increasing density.
  • The way I calculated it was by figuring out how many megaflops my computer had, then dividing the overall keyrate by my accumulated keyrate to see how many of my-equivalent-computers were doing d.net. Multiplying that by my number of mflops gives an extremely rough estimate of d.net's 'score', but a better one than guessing how many computers each e-mail address has. As I said, I think I erred on the low side (probably on the very low side, since d.net is at least a several-hundred-thousand-processor machine and ASCI Red is only a 9,632-processor one).
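    For instance, a minimal sketch of that back-of-the-envelope estimate; the keyrate and Mflops figures below are placeholders, not the poster's actual numbers:

        # Rough d.net 'score' estimate as described above; all inputs are hypothetical placeholders.
        my_mflops = 200.0        # measured Mflops of my own box
        my_keyrate = 1.0e6       # my accumulated keys/sec
        total_keyrate = 3.0e10   # d.net overall keys/sec

        equivalent_computers = total_keyrate / my_keyrate    # how many of "my computer" d.net amounts to
        estimated_mflops = equivalent_computers * my_mflops  # very rough aggregate Mflops
        print(equivalent_computers, estimated_mflops)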
  • So... Where's yer Linpack #'s AC????


  • Well duh. That was an attempt at humor. :-)

    Have a great day...

  • Right...Moore's law won't break when you add more hardware. The density will still be the same. But the cost won't. These systems should be compared on equal footing. If my $5000 system runs at 70% of the speed of your $10000 system, I win, because I have a better performance to cost ratio. Theoretically, add another of my $5000 systems, and my composite $10000 system beats your $10000 system. That's how things should be compared. It makes no sense to say my $X K7 beats your $.5X Celeron. It's bang for the buck that matters (more bang per buck indicating better design).
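    A minimal sketch of that performance-per-dollar comparison, using the round numbers above and assuming (optimistically) perfectly linear scaling when a second box is added:

        # Bang-for-the-buck comparison with the figures quoted above.
        cheap_perf, cheap_cost = 0.70, 5000   # my system: 70% of your speed, $5000
        fast_perf, fast_cost = 1.00, 10000    # your system: reference speed, $10000

        print(cheap_perf / cheap_cost)  # 1.4e-4 speed units per dollar
        print(fast_perf / fast_cost)    # 1.0e-4 speed units per dollar

        # Two cheap boxes at the same total cost, assuming perfectly linear scaling:
        print(2 * cheap_perf)           # 1.4 vs. 1.0: the composite $10000 system wins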
  • Well, bang for the buck isn't really the issue. You could probably get 5000 486 processors for really, really cheap and build a system that has more power than X system. That doesn't mean shit when it comes to it. Money never was an issue with the "fastest" computer and Moore's Law.
  • by spiral ( 42436 ) on Thursday November 11, 1999 @10:06AM (#1541716)
    Massively parallel machines tend not to "run" an OS as much as they are "run by" an OS. The control nodes or stations (often separate machines) run whatever (i.e. Unix) and the actual compute nodes run their computation and little else. After all, running 10000 copies of Linux isn't really the best use of resources.
  • by Mr_Plow ( 30965 ) on Thursday November 11, 1999 @10:10AM (#1541717)
    They also had some good points as to why one would prefer not to use a supercomputer and opt for a cluster instead. The following is from Sandia labs:

    The Sandia/Intel ASCI-red TFLOPS machine has proven to be one of the more technically successful efforts in massively parallel, high-performance computing. However, large MPP systems have drawbacks. Among these are:

    • Custom hardware components are quickly superseded by commodity components.
    • Volume vendors are not the best organizations to create niche products.
    • Large system scalability requires specialized knowledge and research.
    • Large-scale systems must grow in size and capability over their lifetime.


    Applications that require high levels of compute performance will continue to grow in size, variety, and complexity. While cluster-based projects have firmly established a foundation upon which small- and medium-scale clusters can be based, the current state of cluster technology does not support scaling to the level of compute performance, usability, and reliability of large MPP systems. In contrast, large-scale MPP systems have addressed the problems related to scalability, but are limited by their use of custom components. In order to scale clusters to thousands of nodes, the following must be addressed:

    • Use of non-scalable technology must be bounded or eliminated. Technologies like TCP/IP, NFS, and rsh have inherent scalability limitations.
    • Scalable management and maintenance is critical. The complexity of maintaining the cluster should not increase as it grows.
    • Usability of the machine is critical. Users should not be required to know detailed information about the cluster, such as the name of each node or which nodes are operational, to effectively use the machine.
    ------------------------------------------------------------
  • by Anonymous Coward
    All of the Cray machines presumably run Unicos/mk, a unix variant.

    The IBM SP series machines I've run into all ran unix.

    The Suns of course run unix.

    I would guess the HP's run unix, although it might not be HP-UX.

    The SGIs probably run IRIX unless they are "Cray/SGI" T3Es in which case they run Unicos

    I don't know about the NEC or Fujitsu machines.

  • Although to be fair, the definition of "CPU" might differ from one manufacturer to another. For one it might be just a single chip itself, for another, "CPU" might be an individual cabinet full of a couple dozen chips. Can anyone shed light on this?
  • So how do they compare these scalar architectures to the vector ones, like the NEC VPP series and so on?
  • by Anonymous Coward
    Crays: Unicos
    Their own supercomputer OS for vector machines.

    SGI:
    Irix for SMP

    Sun:
    Solaris for SMP

    Hitachi:
    MPP version of HI-UX; this is a variant of HP-UX optimised for a non-shared-memory system.

    Intel:
    Some flavour of Unix, I believe. However, with these machines each node executes its own copy of the OS and does SMP on that node. The Intel machines should not really be classified as one computer, more like a few thousand clustered together.

    Linux does not really scale beyond 4 processors on SMP systems. The most powerful Linux systems are the Beowulf clusters like the ones that NASA has. I don't know why these don't appear on the list, as they are surely more powerful than some of the lower-end Suns. However, I doubt that a Beowulf cluster counts as one computer.
  • NEC simply isn't using *micro*processors. Some CPUs can be built of tens of single chips. For problems which can't be efficiently solved using distributed memory (MPP), you have to use SMP. Unfortunately SMP isn't *generally* good beyond 32-64 CPUs (not enough memory bandwidth), so if you want more power, you have to build faster CPUs. That's what NEC (and probably others) is doing.
  • Typical geek typo. Personally, I often transpose digits to make powers of 2. It's the way my brain is wired.
  • Just a nit...

    Livermore NL is in Livermore. Berkeley NL is the one in Berkeley.
  • These things have always impressed me...

    Massive computing power using sometimes generic technology, others using THE LATEST in busses and network technologies.

    Quake at 100000 FPS... running OpenGL in software... I wouldn't be surprised, but then, these things run nuclear bomb simulations.

    Quick question, if you linked these up, how long would it take them to crack RC5? DES? Probably why the USGov doesn't want them exported...
  • the list would be much more meaningful and interesting if supercomps at Ft. Meade, and other classified TLA facilities, were included.
    --
  • Quake! Give me Quake! Can you imagine using the top system for playing a 32-way Quake deathmatch
    at 2048x1532 resolution simultaneously on one machine?

  • I teach Unix courses (on Linux) and in the first class I try to give an idea of where Unix is used.

    I always say "the fastest computers in the world run Unix", but I'd rather be able to say "480 of the top 500 computers run Unix" - it sounds more impressive. The problem is that, although I can identify most of the operating systems on the list quite easily, I'm not sure about some of the more esoteric ones. Does anybody know exactly what all these systems are running?

  • Moore's Law doesn't apply to machines like these. It applies to their components, but when you just keep adding components the aggregate will obviously grow faster. If you take the price, though, for that aggregate, I think you'll see Moore's Law probably still holds.
  • This has already been discussed [slashdot.org] here on /.
  • I wonder how you land a job coming up with those algorithms. That's freakin' insane, weeks to run on super computers. Wow.

    They should build a super computer and have it run all the time calculating pi, just to see if eventually it terminates or starts repeating... :)

  • by Anonymous Coward

    Quick question, if you linked these up, how long would it take them to crack RC5? DES? Probably why the USGov doesn't want them exported...

    Silly rabbit, the government has SPECIAL PURPOSE computers to crack rc5, DES, and every widespread block encryption algorithm. They can crack them faster than you would believe.

    These general-purpose supercomputers are put to much more nefarious uses.

  • There is not one Beowulf cluster running on Amigas with an open-source kernel written in Perl.

    What gives?

  • SGI/Cray - have they moved to MIPS, or are they still using Alpha's?

    Yeah, Cray still uses alphas in their T3Ds and T3Es.

    The T90s and SV1s use Cray's special vector processors.

  • If you compare the current/old status by maker [top500.org], with the latest status [netlib.org], you'll see that SGI/Cray have dropped from 182 machines to 133, IBM have increased from 118 to 141, and Sun have gone from 95 to 113 - all those Starfires come in handy, as there's 40 Starfires in the list with 64 400Mhz UltraSparc-II's - current max capacity for 1 Starfire. However, the first Sun entry is at #33, though Sun's UltraSparc-III and next-gen Serengetti server will help, when they eventually come out...

  • There's a user manual available here for ASCI Blue [llnl.gov]. LLNL is already working on a 10 teraOPS machine called ASCI White. 8000 processors... ASCI Red is currently 1.8 TeraOPS.
  • From what I understand, Moore's Law is about the amount of transistors you can put per square inch, not the computing power. If you put more processors in one machine, it doesn't have anything to do with Moore's Law. Moore's Law is about going from .5 um, to .35 um, to .18, ...
  • Augh. Should have used preview. Should be:

    170 NEC NLR 8 - fastest computer with a number of processors less than 10

    101 SGI "Government" 1024 - PRESUMED slowest computer with a number of processors greater than 1000

    teach me to consider a less-than symbol "Plain Old Text."
  • It's dictated by the number of RS6000 nodes we had online on our biggest system on the day the benchmark was run. At the time, we were running with 3 different RS6000 SP systems with 128 or more processing nodes. Nodes are constantly moved between the different systems as needs change.
  • Well, at least not yet. Ok, at first glance an SP is a collection of rack-mount RS6000s, connected via Ethernet. So where's the added value? High-speed crossbar switches also connect all of those RS6000s, with point-to-point bandwidth of 30 to 130 MB/sec, depending on the model of the switch adapter.

    At our center, we're installing a batch of new SMP nodes [mhpcc.edu], so I'll be interested to see just where we place in the standings when we rerun the benchmark.

  • Next year we should have the fastest with the next ASCI machine (I work for Livermore Computing).

  • Those special purpose machines probably would NOT run the linpack test very fast. They are very much tailored to special computing (such as sound wave analysis :).

  • but only two CPUs :(
    --
    "Subtle mind control? Why do all these HTML buttons say 'Submit' ?"
  • Brilliant! -- and to think, all those "experts" they have at DOE didn't notice this...

    Think about this for just a sec... If you take a fairly communications intensive benchmark (such as linpack), which clusters do you expect to give you the best "bang for the buck"? We know that performance will be a function of 1) processor speed, 2) number of processors, and 3) communication speed/bandwidth. Obviously, those clusters with the least comms overhead will have the advantage. Now, do you expect a machine with 10^2 procs to have the same comms speed/bandwidth available per processor as a machine with 10^4 procs?

    So on one hand, you could say ASCI Red is an inefficient POS, and with respect to a benchmark like linpack, you'd be somewhat correct. On the other hand, given a less communications-bound benchmark (like a prime-number sieve, or something distributed.net-esque), ASCI Red would look a lot better.

    Now this brings about one more topic: Why do they use linpack as a benchmark, and not something with fewer comms? For "Real-world" applications on supercomputers (as they apply to science/engineering), most of the computational effort is spent doing operations on sparse matrices (i.e., matrices that are mostly zeros). One way of handling these operations is to do them in the same manner as a dense matrix (multiplying/adding all the zeros), which is horridly inefficient. The preferable alternative is to spend a great deal of time "looking" for work to do on nonzero entries. This "looking" is faster than performing all the unnecessary computation, but obviously implies more communication. Thus, putting the machines with more processors at a disadvantage.

    Happily, though, there are plenty of people with computational needs that aren't terribly comms intensive, who would rather have the 9000 processors than awesome comms speed, because for us, that's what makes our codes run faster. (That and we don't sit in a queue all day waiting for a block of cpu's to free up)

    Quake Analogy:
    Your pals across the street share a T1 and play on the same server with the same number of people all the time. With their nice ping, they each average 1024 frags/hour (gotta be even powers o' 2). Now you have a LAN party, and 64 other people (of equal "talent") get on the same server, abusing your T1, lagging it out, and you each average 16 frags/hour. Thus, looking at the first situation, they look like a butt-stomping fragging machine. Whereas in the second situation, you look like a bunch of lagged-out pansies. However, is it really fair to say that you're a worse quaker just because your ping sucks compared to that lpb across the street?

    The same principle (a variant of Amdahl's Law, as it's known in academic circles) applies to supercomputers.
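    A minimal sketch of the sparse-versus-dense trade-off described above, counting the arithmetic in a matrix-vector multiply; the size and nonzero fraction are invented for illustration:

        # Operation-count sketch for dense vs. sparse matrix-vector multiply.
        # The dimension and nonzero fraction are invented for illustration only.
        n = 100_000          # matrix dimension
        nnz_fraction = 1e-4  # fraction of entries that are nonzero

        dense_flops = 2 * n * n                  # multiply-add every entry, zeros included
        sparse_flops = 2 * n * n * nnz_fraction  # only touch the nonzeros...
        # ...but each nonzero now needs index bookkeeping, and on a distributed machine,
        # gathering the matching vector entries from other nodes (i.e. communication).

        print(f"dense: {dense_flops:.2e} flops, sparse: {sparse_flops:.2e} flops")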
  • This is the first Beowulf machine from the top of the list me think:

    Manufacturer: Self-made. Nice :)
  • ...Beowulf cluster you could make out of those!

    (Score: -1, Unoriginal)

    Seriously, all we really want to know is which of the machines on the list are Linux clusters of some sort. This is still Slashdot, after all...
    --
  • by Anonymous Coward
    There are 3 Self-Made which run Linux (Beowulf). The CPlant is 44th!!!
  • So is it possible that vector architectures naturally excel at the Linpack benchmark, similar to the way SSE or 3DNow processors naturally excel at 3D benchmarks (assuming of course that said benchmarks were compiled to use the extra instructions)? Does this mean that these vector machines are very good for one certain thing, namely massive floating point number crunching, but are not well suited to much else (i.e. large numbers of users and/or jobs running, invoking heavy, rapid context switches instead of sustained FLOPs)? Don't highly pipelined architectures suffer from changing between heterogeneous data streams (forcing pipeline flushes on each context switch)?
  • Yes, the law should more accurately be stated "Speed /per dollar/ doubles every 18-24 months"

    Throwing money at the problem isn't fair...perhaps they should have normalized these systems based on their price...
  • It'd be a waste to use these for Echelon. DSPs are much cheaper and better for that kind of stuff...

    Hajo
  • Nope: #44 is the first beowulf
  • The list only counts those systems where Linpack has been run. I imagine that there are several Beowulf (and other self-built) systems that are actually achieving more computational throughput, but aren't going to be noticed. Also, I'm sure the government doesn't release everything it has working on satellite images and what not. Overall, I'd imagine that a large subset of the people who are actually using huge machines for real work (rather than academic research) wouldn't take the time away from their work to even run Linpack on their system.

    As one example of such a computer, Professor John Koza has a 1000-node (Pentium II 350MHz) Beowulf machine for his Genetic Programming Inc. ( GPI's web site [genetic-programming.com]) research group. He's running genetic programming applied to difficult problems on the machine (such as automatic analog circuit design), and is getting a nearly linear speedup because of the embarrassingly parallel nature of GP.

    Cheers,

    David Andre
    my web site [berkeley.edu]

    disclaimer: I worked with Professor Koza for several years and helped him build some of his previous machines.
  • i'm just happy that the computer topping the list is in a magical, far away place, where the sun is always shining and the air smells like warm root beer, and the towels are oh so fluffy! Where the shriners and the lepers play their ukuleles all day long, and anyone on the street will gladly shave your back for a nickel!

    go weird al!
  • I would love to be able to say I've seen one of the 500 fastest super computers in the world.

    A funny story about that box: when I went down there and saw it, the first thing I thought was that all those lights represented CPUs; then I figured it would have been impossible, since it would have certainly made it one of the fastest computers on earth....
    --
    "Subtle mind control? Why do all these HTML buttons say 'Submit' ?"
  • Note that ASCI Red is a Pentium Pro based machine; not exactly Intel's current offering.
  • You can't compare the FLOPS level of a single-chip system to the FLOPS level of a supercomputer; a single-chip system will get much, much higher flops/chip than a supercomputer. So you would need a lot more than 2000 G4s to equal the performance of these ASCI machines.

    Also, just like the PowerPCs, the Intel chips used are very old Pentium Pros, probably running at about 200MHz. I'd be willing to bet that an Athlon running at 800MHz, the fastest you could buy, would easily beat a G4 at 450MHz, the fastest you can buy...
    You Mac freaks never realize that it's not performance per box, it's price/performance, and the PC kicks the crap out of a Mac. (esp. for 3D gaming, which is really the only need consumers have for all that FP)

    --
    "Subtle mind control? Why do all these HTML buttons say 'Submit' ?"
  • Could any of that processing power have anything to do with Echelon???

    I don't think any NSA computers are on that list. I have yet to hear any real specifics regarding what the NSA has at their disposal, except the word CRAY a few times. Suffice it to say that they have the potential to have way cooler machines than any on this list, due to their undisclosed budget.

  • The node OS on Cplant is nearly the same as for ASCI Red, a "Puma Message Passing Operating System" (Red has Cougar; they're related). It makes sense to have a purpose-built, small (300K), message-optimised OS in the nodes for computation.

    Avalon is the "real" Beowulf Linux supercomputer, with Linux on the nodes.

    Maybe Cplant was using the word Beowulf for cachet. Then again I might be misunderstanding what a Beowulf is :-/ .
  • Some do, but most absolutely do NOT use Intel processors. Most use stuff you can't get in your average PC. It's close, but not the same. They use a version of the CPU with hugely enhanced pipelining and floating point performance.
  • That's cuz the Japanese one is a bigass vector machine. The processors are much more expensive, but they also haul ass over a regular RISC chip. They're also significantly easier to program. However, for some reason, people have decided that vector machines are out of fashion. Whatever, makes my job more secure, 'cause parallel programming is significantly more difficult than vector programming. :-) Now if only it paid as well as, say, Java programming I'd be stoked!
  • So where's the Linpack #'s???

    We're workin' on it. There should be something official announced at SC99 next week.

    --Troy
  • The stats on #23 are out of date. Our friends at NERSC joined their two T3Es, so the new mcurie has a peak of 575 Gflops, not the 444.2 listed.
  • Not only that, but there are numerous Hitachi, Fujitsu and NEC computers where the entries on either side of them have at least one order of magnitude more processors. What are those sneaky Japanese up to?

    And the pressing question is whether next year we can expect to see a Transmeta-based supercomputer in the top 500.

  • All of the IBM machines (RS/6000 SP) run a version of UNIX called AIX (Advanced Interactive eXecutive). It is a very scalable OS; it can be used from the smallest workstation to the largest supercomputer.
  • I think you are right, but a while ago people started generalizing Moore's Law to talk about computer speed as well as memory density.
  • Hehe... which reminds me... back when I was in school and could play with all the nice toys (I studied CFD, which needs as much power as you can throw at the problem), I compiled XTetris on a Cray Y-MP, renamed the executable something arcane, like 3flowx or something, and played against the darn thing for ~1 hr for *one* game. Which you can do, since XTetris could only use one CPU at a time, the Y-MP had 256 (?) pretty slow ones, and the prog can only get a time-slice of that one too...

    Trust me, heavy iron does not a good Quake machine make...

    Oh, did I mention I was on the console of a Reality Engine 2 at the time? Some guy sat next to me, stared at the screen and pronounced: "How come the Reality Engine thing is so slow?" ;-)...


    engineers never lie; we just approximate the truth.
  • Didn't we hear something a few months back about a 1000-node cluster of Alphas doing genetic algorithms? I wonder how that would have measured up.

    Also, it would be nice if the table added a column for Rmax/AcquisitionCost, and let you sort by that column. I'll bet that would launch some Beowulfen toward the top.

    --
    It's October 6th. Where's W2K? Over the horizon again, eh?
    • The power, flexibility, and support of Linux is unmatched, from clusters of multi-processor scientific behemoths, to the young school girl's modest hand-me-down 386.
    Well, actually the power of Linux is pretty well matched on both ends by OS/2, which has always had the fastest context switching. It even runs on a 286 (Linux won't).

    It does not run on Amigas.

  • What you are talking about is Amdahl's Law, which states that there is an upper bound for the speedup achievable by any parallel device, due to the inherently sequential nature of the problems the machine is running...

    So yes, there is some limit to the amount one problem/program can be parallelized (sp?). However, I suspect that each of those machines is running more than one simulation at a time, so overall they are each somewhat efficient in their use of the processors n'stuff

    m'kay?
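    A minimal sketch of the bound Amdahl's Law describes; the 5% serial fraction below is an arbitrary example value:

        # Amdahl's Law: speedup is capped by the inherently serial fraction of the work.
        def amdahl_speedup(serial_fraction, n_processors):
            return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

        # With even 5% serial work, 9000 processors buy you at most ~20x, not 9000x.
        for p in (8, 128, 9000):
            print(p, round(amdahl_speedup(0.05, p), 1))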
  • It's not so much that they are 'still programming them' as it's the fact that they keep adding more processor farms to them. The ASCI machines are actually clusters of clusters, and they can simply throw more processing power at them, really..
  • I know that their rendering farm is comprised of Sun boxen, but that's about all I know.

    Don Negro
  • It's a typo, of course. Thanks for pointing it out.
  • by Anonymous Coward
    I bet the NSA holds the real record. What else is being done with the > $20 million worth of electricity they use every year?
  • Yes, this is Slashdot, but Slashdot isn't a Linux site, sir. This is news for nerds, not news for Linux nerds.

    I know many people that frequent this site who detest Linux and run Windows NT, FreeBSD, hell, DOS :)
  • In 1993? I think it was when I changed my 386SX33 to a 486DX4/100... and it cost me only $4000 :)
    --
    http://www.beroute.tzo.com
  • Yeah, a CPU is a chip. A node is a cabinet which may share CPUs and memory and some internal interconnect. While a CPU is a chip, this chip may be a vector chip or a "normal" chip. Vector chips can do sooooo much more per clock cycle for vectorizable algorithms; that's why the skew in the numbers of CPUs. They're also seriously more expensive than "normal" chips.
  • Oh please. Vector still kicks ass at certain algorithms over parallel machines, things like irregular access to memory and, well, long vector problems. The transition to parallel machines has been almost entirely political. Parallel is better at some things, but not all. The site I work for still has a Cray/SGI T90/16, and it is the most heavily used machine among a T3E, IBM SP2 and SP(3), as well as tons of little crappy clusters.
  • Beowulf clusters, eh? [nudge nudge, wink wink]
  • Heh... I work at a call center for them. They just barely made it in there at #451. Surprise to me... =)

    ...
    hdj jewboy
  • >(esp. for 3D gaming, which is really the only need consumers have for all that FP)

    Well, there are other consumer uses ... some image processing (although not most), speech recognition, etc. And, if you are doing any research/programming for scientific applications (as are many of the super-computing people), then floating point performance may be the only thing that one is interested in.

    Also, it is not always the case that a single-chip system will get much higher flops/chip than a multi-chip system -- it depends on how much communication is required in the particular application. If it is something like parallel search, for example, you need hardly any communication between chips and therefore you get good performance. However, for most consumer applications, you don't get as nice a speedup.

    Cheers,

    David Andre
  • ASCI Blue... so that's where Erwin resides. Just hope Dust Puppy has a suit to enter the clean room!

    *ducks flying tomatoes*

    Deosyne
  • Yes, but do even those processors break Moore's Law?
  • by Anonymous Coward
    #44 is a Linux cluster, as is #256
  • While CPLANT is a clustered Alpha Linux machine, they take pains not to call it a Beowulf. From the FAQ:

    FAQ 2: Is it another Beowulf machine?

    Not really. The Cplant project has some broader goals than traditional Beowulf systems. We are not trying to build a machine for a small number of users to run a small number of applications on a small number of machines. We are trying to build a production machine for hundreds of users to run all types of parallel applications on potentially thousands of nodes. We are essentially trying to build a commodity-based machine patterned after the design of the Intel TeraFLOPS machine.
  • Yes, they're on there. They're called 'Self Made'. Here are a few:

    #44 CPlant Cluster
    #265 Avalon Cluster
    #454 Parnass2 Cluster
  • It's a typo, of course. Thanks for pointing it out.

  • by Anonymous Coward
    Just to add a little more to this list: ASCI Red, the Intel-based machine, runs a variant of Linux, _very_ heavily modified. I don't know off the top of my head how the Suns do vs. the Beowulfs, but I know that Sun is not a heavy player in the DOD/DOE High Performance Computing community, though they're working on it. As far as clusters go, they do count as one machine, and they must be able to run the benchmarks Top500 uses. It takes a lot of work to get even a single application to work on any of these one-off machines, and sometimes they simply can't get things to work at all. ASCI Blue Mountain is a cluster of 48 full-sized (128-processor) SGI Origin 2000 supercomputers, for a total of 6144 procs (currently). These are the MIPS R10000 processors.
  • 2048x1532 resolution? You dream too small. Look here [uiuc.edu].
  • by mlfallon ( 110606 ) on Thursday November 11, 1999 @12:10PM (#1541823)
    The Hitachi machine can achieve these figures for two reasons:

    1) Their Interconnect
    2) Their Processors

    The interconnect is a hyper-bar crossbar network, with a bandwidth of 1 GByte/s. Also, they are able to get sustained message-passing performance of about 90% of peak, like they did on their previous machine, the SR2201. Other vendors would provide 60-65% of peak.

    The number listed in the Top500 for processors is a bit misleading; this is in fact the number of nodes. The Hitachi nodes are made up of a number of processors, each with pseudo-vector optimisation (allowing them to miss the cache when loading large memory blocks). This optimisation means the chip can have a high sustained performance on large-scale numeric problems. The nodes can be configured as either SMP or vector. This allows the machine to address a much wider range of domain problems.

    Hitachi have a very brief page describing their machines SR8000 Product Page [hitachi.co.jp]

    I would love to see what a fully configured machine could do (6 TFlops!).

    BTW, Linpack is not a great gauge of a supercomputer's performance. When there are a lot of nodes it becomes message-bound and does not reflect the true performance of the machine. When looking at machines like this it is important to look at benchmarks related to domain problems. E.g., it does not really matter what interconnect you have if you are doing ray-tracing, but it matters a great deal when doing astrophysics.
  • Does Moore's Law apply to massively parallel, hand-built systems?

    I wouldn't think so... At the time, Moore was running Intel, a one-CPU-per-machine outfit, and I think his "law" was an observation on the rate of progress in the PC industry, and what advancement was possible within the technology of single Von Neumann-bottleneck-style systems.

    -schmaltz
  • by zealot ( 14660 ) <xzealot54x@NOSpaM.yahoo.com> on Thursday November 11, 1999 @09:47AM (#1541838)
    I hate articles like this. In the first place there's the "The new Top500 numbers are in, and your laptop has never looked so tragically slow." These supercomputers are all massively parallelized machines using regular microprocessors. The actual speed of the machine, like ASCI Red, is determined by the processors used, which in this case are just normal Intel processors. So you can go out and buy a machine that computes instructions just as fast as ASCI Red. The difference is that it can do more things at once because of all the processors involved. Does it make your laptop look slow? Hell no, because if you had ASCI Red, you wouldn't have any apps that take advantage of its parallelism to run on it anyway.

    Secondly, Moore's Law is the following (from http://www.intel.com/intel/museum/25anniv/hof/moore.htm):

    In 1965, Gordon Moore was preparing a speech and made a memorable observation. When he started to graph data about the growth in memory chip performance, he realized there was a striking trend. Each new chip contained roughly twice as much capacity as its predecessor, and each chip was released within 18-24 months of the previous chip. If this trend continued, he reasoned, computing power would rise exponentially over relatively brief periods of time.

    Moore's observation, now known as Moore's Law, described a trend that has continued and is still remarkably accurate. It is the basis for many planners' performance forecasts. In 26 years the number of transistors on a chip has increased more than 3,200 times, from 2,300 on the 4004 in 1971 to 7.5 million on the Pentium® II processor.


    Since these supercomputers are built from standard processors, and Moore's Law applies to those processors, his law is still intact. His law is about CPUs, not systems.
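    As a quick check of the figures in that quote, here's a minimal sketch working out the doubling period implied by the 2,300 and 7.5 million transistor counts:

        import math

        # Transistor counts quoted above: the 4004 (1971) vs. the Pentium II, 26 years later.
        growth = 7.5e6 / 2300          # "more than 3,200 times"
        doublings = math.log2(growth)  # ~11.7 doublings
        months_per_doubling = 12 * 26 / doublings
        print(growth, doublings, months_per_doubling)
        # ~3261x, ~11.7 doublings, ~27 months each: a shade slower than the quoted 18-24 months,
        # but the same exponential story.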

  • Ishtar (1 flop)
    Kevin Costner (1 megaflop/year)

    --
  • ASCI Red is 2 years old and the Blue machines are 1 year old. Then again, these machines are expensive to build and they are still programming them. Hence why they are a bit faster than last year.

    The real action is lower down: Avalon [lanl.gov] was top 100 and is now down to 265. Cplant [sandia.gov] takes the award for top cluster now.

  • The only reason this term was thrown around is cause in a test, the G4 exceeded the 'allowed exportable power' for IC's. Personally, that's not enough to make me switch to the Apple market. I'm happy with my slow, but configurable system.

    NIVRAM
  • 94 IBM MHPCC 243 - fastest computer with a number of processors in no way related to common powers of 2

    hmm, 243 is 3^5. I wonder what strange architecture dictated that number.
  • Are you -sure-? Have you actually -seen- them run nuclear test simulations? And why are the Republicans now so against the Test Ban Treaty?

    No. The sordid truth is that the Republicans are marginally behind the Democrats and the Pentagon, in the Mega Deathmatch, currently being played on a network of supercomputers and an enhanced Quake server.

    The Republicans are desperate not to lose precious cycles to simulations, which would give the other two teams a decisive advantage. The lag might even cost them the tournament.

    These would be OK at cracking DES, but really, given that DES can be cracked in less than a day on a kit computer (and within a week by assembling kitchen equipment), DES is essentially dead, as far as the US Government is concerned. Only people like NASA and Boeing still use DES, to any degree. There are FAR worse vulnerabilities in their approaches, though, than mere crackability. Key management is - to be blunt - pathetic.

  • by TheKodiak ( 79167 ) on Thursday November 11, 1999 @09:52AM (#1541874) Homepage
    10 IBM UCSD 960 - fastest computer with a number of processors evenly divisible by 10

    12 IBM Charles Schwab 2000 - fastest computer with a number of processors evenly divisible by 100

    15 Fujitsu Kyoto 63 - fastest computer with a number of processors not evenly divisible by 2

    46 Fujitsu NAL 167 - fastest computer with a number of processors neither evenly divisible by 2 nor equal to (2^x)-1

    94 IBM MHPCC 243 - fastest computer with a number of processors in no way related to common powers of 2

    170 NEC NLR 8 - fastest computer with a number of processors 1000

    (Yeah, yeah, I know you've got 2k TRS-80s in a Beowulf cluster in your back yard.)
  • by Count Fragula ( 67767 ) on Thursday November 11, 1999 @09:52AM (#1541875)
    I mean, it's nice to see the US machines taking the cream of the honors in raw power... but what the heck - ASCI Red gets its 1st place berth with 9 THOUSAND some odd CPUs (.0246 Rmax pts/cpu), whereas the Hitachi machine gets a very respectable 5th with only 128: 6.8 Rmax pts per CPU! Isn't there some credit due for the more efficient machine? It doesn't seem that impressive to simply dump silicon at a problem until you are #1...
  • Isn't it also true, however, that there are practical limitations on how many components you can add and still expect a reasonable increase in performance? I.e., after 512 processors it is no longer cost-effective to add processors because of other limitations. I don't know the exact numbers. But then wouldn't Moore's Law apply within the limitations of these machines? Don't quote me on this, I'm just making it up.
    ------------------------------------------------------------
  • Average growth rates per year are 1.8 percent for accumulated performance, 1.77 per year for the number one perch, and 2.0 per year for the number 500. This means the observed performance growth exceeds Moore's Law, which sets the bar at 1.6 percent per annum.
    Except that Moore's Law sets the bar at a FACTOR of 1.6, which is 60 percent, NOT 1.6 percent (i.e. factor of 1.016). I thought writers for technical forums had to know the difference between "increase by a factor of" and "percent increase"... because you certainly can't pass high school physics or math without knowing that distinction...
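    A minimal sketch of the conversion being argued here, turning a doubling period into an annual growth factor (18 months being the commonly quoted figure):

        # Annual growth factor implied by "doubling every N months".
        def annual_factor(doubling_months):
            return 2 ** (12.0 / doubling_months)

        print(annual_factor(18))  # ~1.59: a factor of ~1.6 per year, i.e. ~60% growth, not 1.6%
        print(annual_factor(24))  # ~1.41 per year at the slower end of the 18-24 month range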
  • Yes, I do see a lot of entries from Intel, which means to me probably Pentium Pro's.

    Then there's IBM, which seems to be using PowerPC 604e's.

    Next, SGI uses MIPS

    SGI/Cray - have they moved to MIPS, or are they still using Alpha's? (I'm not 100% sure that's what they used before, but i'm 95% sure it is).

    My main puzzler here is NEC. WHAT ARE THEY USING??? If you go down to #73 on the list, there's a machine that was deployed in 1999 with just 16 processors? Okay, its performance is 1/19th that of Intel's #1 offering, but it uses just 1/602 the number of CPUs??? That's not a standard processor that I've ever heard of.

    NEC has a bunch of listings below that, too. Some use just 5 processors (though those are all in the high 400s). What chips is it using? Can anyone explain what this machine is?
  • I think it is very interesting to note that of the top 40 supercomputers, there are seven with less than 400 processors and every single one of them is located outside the United States (six in Japan, one in France).

"God is a comedian playing to an audience too afraid to laugh." - Voltaire

Working...