Technology

500 Billion Very Specialized FLOPs

sheckard writes: "ABC News is reporting about the world's fastest 'supercomputer,' but the catch is that it doesn't do much by itself. The GRAPE 6 supercomputer computes gravitational force, but needs to be hooked up to a normal PC. The PC does the accounting work, while the GRAPE 6 does the crunching." The giant pendulum of full-steam-ahead specialization vs. all-purpose flexibility knocks down another one of those tiny red pins ...
This discussion has been archived. No new comments can be posted.

  • With the sheer computing power of something like this, a similar device designed exclusively for cryptanalysis with enough units around to run a distributed network could probably put a big dent in many currently "secure" crypto products.

    Then you could say bye-bye to rc5-64. Perhaps before long you could eat rc5-64s like popcorn and go on to the other challenges at RSA [rsasecurity.com].

  • Isn't this much like the thing ENIAC was built for: calculating ballistic missile paths?

    I guess this one is a little faster, though...

  • by WolfWithoutAClause ( 162946 ) on Sunday June 04, 2000 @12:19AM (#1027279) Homepage
    Actually gravity simulation is pretty cool algorithmically as well as hardware-wise. Originally the gravity simulators had to work out the attraction between every pair of particles. This meant that if you simulate 1000 particles they had to do 1,000,000 calculations. Slow.

    So along come some doods who said, why don't we recursively stick the particles into boxes and then calculate the attraction between the boxes instead - it should be a lot faster. So they tried it and it seemed to work great - it takes more like 10,000 calculations to do 1000 particles.

    Anyway, along came some other guys and they were a bit suspicious. They showed that some galaxies fell apart under some conditions with the recursive-boxes method, when they shouldn't. Back to the drawing board.

    There are some fixes for this now - they run more slowly, but still a lot faster than the boring way. Still, it's better than the end of the universe. Even if it is only a toy universe.

    For descriptions of loadsa algorithms, including 'symplectics', which are able to predict the future of the solar system to 1 part in 10^13 ten million years in the future, check out this link: [napier.ac.uk]
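
    For the curious, here's a rough Python sketch (mine, not anything from the GRAPE team) of the boring pairwise method described above -- every particle against every other particle, which is where the million calculations for 1000 particles come from:

    ```python
    import numpy as np

    def direct_sum_accel(pos, mass, G=1.0, eps=1e-3):
        """Brute-force O(N^2) gravitational acceleration on every particle.
        pos: (N, 3) positions, mass: (N,) masses, eps: softening length."""
        n = len(mass)
        acc = np.zeros_like(pos)
        for i in range(n):
            d = pos - pos[i]                    # vectors from particle i to all others
            r2 = (d * d).sum(axis=1) + eps * eps
            r2[i] = np.inf                      # no self-force
            acc[i] = G * (mass[:, None] * d / r2[:, None] ** 1.5).sum(axis=0)
        return acc

    # 1000 particles -> roughly 1,000,000 pair interactions per call
    pos = np.random.randn(1000, 3)
    mass = np.ones(1000)
    print(direct_sum_accel(pos, mass)[0])
    ```

    The tree-code trick is to replace most of those inner-loop partners with a single far-away "box" of particles, which is how you get down to something more like 10,000 operations.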

  • A beowulf cluster of these babies. ;^)
    (sorry, i had to.)

    Hmph... while some people worry that it is single purpose, they miss the fun... these people made a really fast computer. That's cool by itself. It was created at the University of Tokyo, so it is obviously research, and not done as a cost-effective solution. I'm sure people can take lessons learned from this machine and eventually apply them to a broader market.

    And having it controlled by a PC is no stranger than having your accelerated video card controlled by your computer, with the card just doing the 3D video calculations. =^)
    -legolas

    i've looked at love from both sides now. from win and lose, and still somehow...

  • Perhaps anything that requires liquid cooling
    and comes bundled with two onsite engineers
    should be called a "supercomputer"

    (the Jobs 'reality distortion field' G4 ads notwithstanding)

    or perhaps anything that can crunch thru a
    SETI [berkeley.edu] data block in 10 minutes!

    MAB
  • A beowulf cluster of these...a GRAPE-VINE?
  • Actually most of the mass in a typical galaxy is neither stars nor dust, but rather some as yet unknown form of matter called 'dark matter'.

    As far as we know the 'gravity only' type of calculation that the Grape boards perform is sufficient to describe the motion of this matter.

    However, there is indeed great interest in performing hydrodynamical simulations of galaxies, mostly because then we can attempt to calculate where and how the stars are forming in the galaxy. Dealing with the gas explicitly also allows us to follow things like shock fronts in the gas and to attempt to calculate the thermal properties of the gas. Of course this is all rather complicated stuff, so we have to make gross approximations. And remember, even with that massive Grape board, if you describe a galaxy with a million particles, each one is still going to represent at least 10 to 100 thousand solar masses. We are still a long way off from being able to describe the Milky Way on a 'star by star' basis.

    Two really good URLs for people who are interested in reading some of the technical details of this stuff are the web pages of my advisor Matthias Steinmetz [arizona.edu] and one of the fathers of modern galaxy simulations, Josh Barnes [hawaii.edu].

    Note that Matthias's simulations (check out the pictures and movies) are all done with a high-end workstation and a handful of Grape 3 boards.

    Cheers

  • A machine that massive is likely to have its own gravitational field and throw off all the calculations!

    tee hee
  • Seymour Cray's early supercomputers used DEC computers as front ends. The I/O for a Cray was a single connector. The I/O and housekeeping for the Cray, a vector computer, were done by the connected DEC. Seymour Cray was a pioneer in the field of making computers that do one thing, do it well, and do it very quickly.
  • We know 10^12 is tera. But did you know 10^15 is peta and 10^18 is exa?
  • What they meant by specialised is not that it is special because it uses a slower machine to feed it. What they mean is that the hardware is special in that it can only perform certain instructions. Normal computers can do general equations, but this one has special hardware that makes it do certain operations faster. Think of it this way: if you were to study 10 different languages, you probably wouldn't become very fluent in any given language. If you were to learn only 1 language, on the other hand, you would get really proficient in that language. This machine knows only one "language", which makes it faster, and that is why it is special.
  • Make a "dust particle" the size of the Moon, stick it in deep space, and you have a lot of mass for your visible cross-section.

    Nobody really knows how much of that stuff is out there. We know something is there that we don't see, from the gravity it puts out, but that doesn't mean it has to be something truly exotic. :-)

    Cheers,
    Ben
  • Printers? :)
  • With all this talk of special gravity-computing pipelines, does anyone know if the hardware design is a systolic array? If not, what is it?
  • Gee I'm dumb. As several people pointed out, it's 100 Teraflops. Well, so much for my theory, "I don't get any dumber after a few beers."

    As a consolation, here's a link to IBM's Blue Gene [ibm.com] supercomputer. It's still about 5 years off, but it will likely be the first Petaflop computer. It's being built specifically to solve a single problem--modelling protein folding. The best bit is that even at a petaflop, it will take about a YEAR [rutgers.edu] to simulate a single protein.

  • The NSA has their own chip fabrication facility at Fort Meade.

    The Summer 2000 issue of American Heritage of Invention & Technology has a fascinating article on the specialized code breaking machines that were built and used during World War II.

  • The truth [zgravsk8.com] about gravity [gravitykills.com] is very interesting [uiuc.edu]. However, my [parascope.com] knowledge cannot be passed on [duke.edu] to you because my life holds greater value [hawaii.edu] than the dissemination [ontheinter.net] of this info [fightinggravity.com] (from my point of view). I apologize [nasa.gov] for my selfishness, but must point [psu.edu] out that this what society [slashdot.org] has taught me [aip.org].

    Search here [surfwax.com].
  • Is this a new thing? NewTek had something called the Screamer a few years back that did the same thing for rendering in Lightscape.

    There is also another product, whose name I can't remember, that acts like a rendering farm for 3D Studio; it has some custom rendering chips and an Alpha controlling it all. It actually runs Linux...

    Hey, if we want to go on: the older multiprocessor Macs had the second processor acting as a slave to the first one.

    I'm sure there are lots more examples; the story just made me think back on some cool rendering farm solutions that I have come across.
  • Ummm, how exactly does one supercomputer that costs over a million dollars (US) and performs at the same level as a collection of computers that costs a few tens of thousands of dollars metamorphose into "better price"? Sure, if you have the money to burn, go custom. But most of the computing projects out there do not require that kind of "big iron" and couldn't even afford it if they did. Besides, most of the time (unless you are in the DoD or NSA or such-like) you only end up with a small slice of that "big iron", which may or may not be roughly equivalent to being able to run your proggies on a computer that is all yours 24/7.

    Also, it sounds like you're arguing about ASICs vs. CPUs, which is not what this is about at all. ASICs obviously are enormously useful (witness their vast dominance in the market), but it has nothing to do with whether or not you buy some custom supercomputer from SGI or build one yourself out of PCs and ethernet cabling for a fraction of the cost.

  • Need a special purpose computer for studying gravity? Here's a system that's hard to beat. All you need is -
    • One apple
    • One planet
    The calculations are all but immediate, and the results are impeccable.

    The planet is actually pretty expensive, but you can borrow it free of charge.
    --
  • It doesn't really say in the article, but it sounded like they didn't use relativity and only used Newtonian forces. Any comments on how accurate the results will be, and whether definitive statements are possible (for example, "this galaxy will never collide with that one, even with relativistic effects")?
  • EFF's Deep Crack crypto supercomputer supplied 1/3 of the computing power in the latest distributed.net DES challenge [distributed.net]. Now, if it could be rebuilt for RC5-64...
  • I'm sort of from the graphics department, and I see the same problems. Right now, the biggest problem for all the graphics hardware people is the bandwidth to the graphics cards, and basically there are two answers to that: we are going to see faster memory types (duh!) and embedded RAM, which means the memory is inside the graphics chip.

    The PlayStation 2 has this, and that is why it has a massive bandwidth of 48 gigs per second. Bitboys [bitboys.fi] has the same technology for the PC, so let's hope they can actually release something.

    I would like to know if anyone is working on a processor with embedded RAM.

    Another thing is the AGP bus, which is just getting way too slow, and I guess that's up to Intel to do something about.
  • Ummm, how exactly does one supercomputer that costs over a million dollars (US) and performs at the same level as a collection of computers that costs a few tens of thousands of dollars metamorphose into "better price"?

    Simple: various tasks need different amounts of bandwidth between the nodes to perform the calculation. For distributed.net and SETI@home, every data block is completely independent - the nodes don't need to communicate at all, so you just pipe the work units over the Internet.

    Most problems don't break up this well, though - individual parts of the problem can interact with their neighbours, meaning individual nodes need to communicate with each other fairly quickly - a Beowulf cluster, for example. Lots of normal PCs on a fairly fast LAN.

    Then, you have a handful of BIG number-crunching problems - like this one - where every part of the problem interacts with every other one. Think of it like a Rubik's cube: you can't just work one block at a time, you need to look at the whole object at once. This takes serious bandwidth: the top-end SGI Origin 2800s run at something like 160 Gbyte/sec between nodes (in total).

    Here in Cambridge, the Department of Applied Mathematics and Theoretical Physics has an SGI Origin 2000 series box with 64 CPUs - homepage here [cam.ac.uk]. (There's a photo of Stephen Hawking next to it somewhere on that site - this is his department.)

    Basically, there are jobs clusters of PCs just can't handle. If the choice is between a $100k Beowulf cluster that can't do the job, and a $10m supercomputer which can, the latter is much better value.

    Sure, if you have the money to burn, go custom. But most of the computing projects out there do not require that kind of "big iron" and couldn't even afford it if they did. Besides, most of the time (unless you are in the DoD or NSA or such-like) you only end up with a small slice of that "big iron", which may or may not be roughly equivalent to being able to run your proggies on a computer that is all yours 24/7.

    You're right - most projects don't need this kind of hardware. Some projects - including this one - do need it - either they cough up the big $$$, or the job doesn't get done.

    Also, it sounds like you're arguing about ASICs vs. CPUs, which is not what this is about at all. ASICs obviously are enormously useful (witness their vast dominance in the market), but it has nothing to do with whether or not you buy some custom supercomputer from SGI or build one yourself out of PCs and ethernet cabling for a fraction of the cost.

    You can't build yourself a supercomputer out of PCs and Ethernet. You can build a cluster which will do almost all the jobs a supercomputer can - but not all of them. Some jobs need a supercomputer. A few very specialised jobs need even more muscle - like this one. It uses custom silicon, because that's the only way to get enough CPU horsepower.
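
    To put some code behind the three regimes above, here's a toy Python sketch (the function and variable names are mine, purely for illustration). In the distributed.net/SETI@home case every work unit is independent, while in the coupled case every chunk needs fresh boundary data from its neighbours on every single step, so the interconnect is what you're really paying for:

    ```python
    import numpy as np

    # Embarrassingly parallel (distributed.net / SETI@home style):
    # every work unit is independent, so order and location don't matter.
    def check_block(block_id):
        # stand-in for "test a chunk of keys" or "analyse one work unit"
        return block_id % 7 == 3

    results = [check_block(b) for b in range(32)]   # could run on 32 machines anywhere

    # Tightly coupled (N-body / fluid style): each chunk of the domain needs
    # boundary values from its neighbours every step, so node-to-node
    # bandwidth and latency dominate.
    domain = np.random.rand(1024)
    chunks = np.split(domain, 4)                    # pretend each chunk lives on its own node
    for step in range(10):
        new_chunks = []
        for i, c in enumerate(chunks):
            left = chunks[i - 1][-1] if i > 0 else c[0]                  # halo cell from left node
            right = chunks[i + 1][0] if i < len(chunks) - 1 else c[-1]   # halo cell from right node
            padded = np.concatenate(([left], c, [right]))
            new_chunks.append(0.5 * (padded[:-2] + padded[2:]))          # neighbour average
        chunks = new_chunks
    ```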

  • On their highly specialized code it'll probably do ok, but on other calculations I'd be surprised if it got 10% of that speed

    Do you not get it? This object only does one thing, can only do one thing, and is unable to do anything else. "Other calculations" are not possible because the algorithms are coded in silicone.

  • Yes, but in binary, tera is 2^40, peta is 2^50, and exa is 2^60.
  • Most of the matter in galaxies is found not in the stars, but in the gas between the stars (stars are rather like dust grains in water), which leads one to wonder whether standard hydro simulations may model these system more faithfully anyway for most purposes.

    The recursive algorithm you described isn't the only particle-in-cell (PIC) game in town, incidentally. Perhaps the PIC techniques used in plasma simulations could be useful here? Plasma PIC simulations routinely model one or more conducting fluids with hundreds of millions of mutually interacting particles, often with mutual interactions comparable to (in the case of electrostatic codes) or more complicated than (in electromagnetic codes) those of blobs of gravitationally attracting fluid. (Instead of Newton's force law, in plasma media one solves Maxwell's equations to obtain the electric and magnetic fields, and then the particles are advanced in time using the Lorentz force.) One thing that has resulted from this research is an understanding that in many parameter regimes of interest the "nearest-neighbor" interactions are less important than the collective effects, so smearing out individual particles into spatially extended blobs of superparticles can be a very reasonable approximation.
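
    For readers who haven't seen PIC codes, here's a minimal sketch (my own toy, with prescribed uniform fields rather than fields solved from Maxwell's equations on a grid) of the particle-push half of a PIC step: a cloud of superparticles advanced under the Lorentz force:

    ```python
    import numpy as np

    # Particle-push half of a PIC step with prescribed fields. A real PIC
    # code would deposit charge/current on a grid and solve Maxwell's
    # equations for E and B each step; production codes also use the Boris
    # push rather than this simple explicit update.
    q_over_m = -1.0                       # charge-to-mass ratio of the superparticles
    dt = 0.01
    E = np.array([0.0, 0.1, 0.0])         # uniform electric field (toy)
    B = np.array([0.0, 0.0, 1.0])         # uniform magnetic field (toy)

    pos = np.random.randn(100_000, 3)     # 1e5 superparticles
    vel = np.random.randn(100_000, 3)

    for step in range(100):
        # Lorentz force: dv/dt = (q/m) * (E + v x B)
        vel += dt * q_over_m * (E + np.cross(vel, B))
        pos += dt * vel                   # advance positions
    ```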
  • No one seems to understand the gravity of the situation.

    --
  • Actually, the original Cray was a general purpose supercomputer. The front end machine basically told it what jobs to run. The I/O was basically a separate computer, though it was very specialized. Crays have continued to use this philosophy. The I/O used by the J90/SV1/T90/T3E is called GigaRing and uses (among other things) SPARC chips running VxWorks to handle all of the I/O. Their next machine (the SV2) may actually have I/O running on the mainframe, though I think that's very up-in-the-air at this point.

    Anyway, my point in all that was that the Crays are designed for general purpose computation, even if they aren't designed to be as general as, say, database servers.

  • Grapes of Wrath, eh? Hrm, I wonder where GRAPE 1, 2, 3, 4, and 5 went to? Someone probably ate them as they became obsolete.

    -------
    CAIMLAS

  • From following that link, it's amazing how cheap this thing is (GRAPE 5).

    $40K including an Alpha host and software. Only $10K for the actual superCruncher. Plus it's small, so it shouldn't suck up that much power. This is much more powerful than a cluster of 5-7 Linux PCs.
  • It's the GRAPE boards which are specialized. All they can do is calculate gravitational potentials between particles, nothing else.

    The only problem with previous versions of GRAPE (that I know of) is that their precision is a little lower than you'd really like or need for some applications, but otherwise they are very nice for doing large N-body sims.

    Doug
  • ... the Thinking Machines CM-5 ... used Sun servers. I'm sure there are others that used less-powerful systems to run mathematical behemoths.

    Yup. The Sun Enterprise 10000 [sun.com] (AKA "Starfire") uses a dedicated Ultra 5 [sun.com] as the console/management station. It connects via dedicated ethernet to the Starfire.
  • Do you think you are joking? See below.
    http://www.nec.co.jp/english/today/newsrel/0005/3001.html [nec.co.jp]
    BTW, the NEC SX-5 - unlike massively parallel architectures - can effectively run near its theoretical peak performance for most applications. I'd say that the top 40 TFlops performance is a rather conservative estimate (NEC will have newer and faster technology by the time this beast starts being built).
  • Last time I heard a discussion about supercomputers, someone said that a supercomputer had to have a sustainable throughput of at least 1 Gigaflop.

    I always liked the definition, "Any computer that is worth more than you are."

    ;-)
  • roughly eight years ago, i was taking plasma physics, and the professor, who i had a long relationship with (after grading homework for his pascal classes), revealed that galactic simulations (work done at boston university/harvard) had finally achieved something remarkable...

    entropy. though most simulations suffer from reversibility (i.e. the system dynamics can be reversed: the simulated system evolves from state_x to state_y, but state_x can be determined exactly from state_y), researchers finally designed simulations that were not reversible (and the entropy correlated so well with theory you could derive boltzmann's constant).

    anyway, that's how i remember it, a passing comment from a class i really dug, but somewhere after debye shielding, i got lost--tensors can be rather difficult if you've spent most of your time writing code and designing circuits (hohlfeld, bless you, wherever you are ^_^;)
  • The CM-1 and CM-2 Connection Machines had the same basic idea. The CM-5 was a bit different -- it still had a front end, but the individual processors could be booted to run UNIX (SunOS), and in general were a bit more independent. The CM-1 and CM-2 were pure SIMD. This was actually quite a popular approach in the 1980's; there were lots of startups trying to do much the same thing, ultimately with even less success than Thinking Machines.

    A lot of us who had been at TMC in the 1980's liked the CM-2 much more than the CM-5. Architecturally it was very clean. The CM-5 was a much more complicated machine.
  • The term is "symplectic integrator." You can check out the book "Dynamical Systems and Numerical Analysis" by A.M. Stuart and A.R. Humphries for an introduction and some references. The term refers to an ordinary differential equation solver that preserves the symplectic structure of the evolution semigroup of a Hamiltonian system. (Compare with Hamiltonian conserving methods). Such methods can be more accurate than general ODE solvers applied to a Hamiltonian system.

    So, as far as I can tell, the poster made a typo but he isn't bullshitting. But you are probably a troll, so I'm not sure why I'm bothering.
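
    For anyone who does want to see what "symplectic" buys you in practice, here's a quick Python toy (mine, not from the book) comparing explicit Euler with the leapfrog/Stormer-Verlet method, which is symplectic, on a simple harmonic oscillator. The Euler energy grows without bound; the leapfrog energy error stays bounded:

    ```python
    def energy(x, v):
        return 0.5 * v**2 + 0.5 * x**2   # H = p^2/2 + q^2/2, unit mass and spring constant

    dt, steps = 0.1, 10_000

    # Explicit Euler (not symplectic): energy drifts upward exponentially
    x, v = 1.0, 0.0
    for _ in range(steps):
        x, v = x + dt * v, v - dt * x
    print("Euler energy:   ", energy(x, v))

    # Leapfrog / Stormer-Verlet (symplectic): energy error stays bounded
    x, v = 1.0, 0.0
    for _ in range(steps):
        v -= 0.5 * dt * x        # half kick
        x += dt * v              # drift
        v -= 0.5 * dt * x        # half kick
    print("Leapfrog energy:", energy(x, v))
    ```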

  • They're coded in silicon, unless the makers of the machine have allied themselves with Dow Corning or something... :)
  • sorry for the formating. Disregard.
  • 500 billion is a lot of FLOPS. I wonder how it would handle overclocking?
  • The only question is: "Did they join the /. team on d.net?" ;o)
  • And another item... A FLOP is a FLOP is a FLOP. If it can do a floating-point operation for one thing, it can do it for another.
  • by JoeyLemur ( 10451 ) on Saturday June 03, 2000 @10:35PM (#1027320) Homepage
    Uh... running a supercomputer from a less-powerful computer is nothing new, and certainly doesn't make it 'specialised'. Historically, the Cray T3D used a Cray Y-MP as a front-end, and the Thinking Machines CM-5 (and CM-200, I think) used Sun servers. I'm sure there are others that used less-powerful systems to run mathematical behemoths.
  • by IvyMike ( 178408 ) on Saturday June 03, 2000 @10:38PM (#1027321)

    If you re-read the article, you'll see that 500 billion is just ONE OF THE BOARDS in the GRAPE. There are going to be 200 boards in this puppy, making for a machine that's getting 100 petaflops.

    Damn fast!

  • That ain't very fast. I think general purpose computation is seriously kicking ass in the supercomputer arena. Not because it's faster, but because it is so much cheaper and yet still performance competitive. Even a few million dollars for a supercomputer pretty much limits their use to the most important projects. It also tends to force them into a "time lease" model where everybody shares the supercomputer time and you need to go through a lengthy process of application selection to get your measly share of computer time. Networked supercomputers using off-the-shelf hardware and free software (on the other hand) are well within many research budgets and can be bought and used solely by one group 24/7 for whatever they see fit, giving them much more time to debug their code as well as run their calculations and gain more detailed, accurate, and complete results.

    The custom-designed, ultra-high-performance, on-the-order-of-a-teraflops machines will still have their place at the top of the pile crunching stuff like quantum chromodynamics, simulated nuke blasts, and what-not, but the land of the middle-of-the-line custom-built number crunchers (from SGI, Sun, IBM, etc.) is quickly eroding.

  • Over galactic scales you don't really need to use general relativity to model the motion of individual stars, as general relativistic effects only show up on short scales, where the force is strong, or on really large scales, where the curvature of the spacetime of the universe begins to have an effect.

    The short scale problem can be resolved by various approximation methods, such as adding a softening distance term to the force calculation, and the long distance problem has pretty much been resolved by measurements (of the cosmic microwave background) that place the curvature of the universe at zero.

    Even if this weren't true, these would still be good calculations to make, using pure Newtonian gravity, as they would allow differentiation between the behaviour of galaxies under Newtonian gravity and in the real universe.

    I just wish I could have played with this baby for my computing project this year, which was the simulation of the collision of two galaxies (using very heavy approximations).
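
    For the curious, "adding a softening distance term" just means sticking an extra epsilon^2 into the denominator of the force law so that close encounters stop blowing up. A small Python illustration (Plummer-style softening, my own example, not the code from my project):

    ```python
    import numpy as np

    def pairwise_force(m1, m2, r_vec, G=1.0, eps=0.05):
        """Newtonian force on particle 1 from particle 2, Plummer-softened.
        eps is the softening length: for r >> eps this is ordinary 1/r^2
        gravity; for r -> 0 the force smoothly goes to zero instead of
        diverging, so two point 'stars' don't slingshot each other during
        an unphysically close pass in the simulation."""
        r2 = np.dot(r_vec, r_vec) + eps**2
        return G * m1 * m2 * r_vec / r2**1.5

    print(pairwise_force(1.0, 1.0, np.array([0.001, 0.0, 0.0])))  # stays finite
    ```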
  • It's 100 teraflops
    ___
  • by 575 ( 195442 ) on Sunday June 04, 2000 @02:12AM (#1027325) Journal
    Installing Grape 6
    Processor of gravity
    Quake sure feels real now
  • When I spoke to David Ayd a few weeks ago, they were nearing the completion of a new-generation, High Availability Public Access system based on a new supercomputer design meant for academic and research institutions. Unfortunately I can't release the details due to the NDA I'm under, but I can tell you that the project looks very exciting.

    They wanted my advice on the implementation of the time-sharing protocol they're working on. It will literally make everything else look sick; the technology is awesome. Unfortunately the protocol's design was beyond even me, and although I gave them some advice on it, it will take a few more years to complete the implementation.

    Cheers,

    sw

  • This is a very old concept, as has been said, but if you want specific tasks done, you build a specialized processor. Now all we need to do is build a GRAPE 7 for SETI or distributed.net.
  • Any plans for equipping new space-ships with this computer?

    That will help a lot...umm...while landing at Neptune some day.

  • I happen to be fortunate enough to work on a machine modeled off the GRAPE 4 architecture, so let me clarify. Grape performs ONE calculation. Period. It can't multiply, or divide, or do flow control (if/then). All it can do, is calculate gravitational force between two objects. FAST. We have a 64 node beowulf cluster, and a single GRAPE machine the size of a mid tower pc. For the work it was designed for, our grape machine is nearly a hundred times faster than the beowulf cluster. And the machine cost us less than 10,000 dollars (Compared to quite a bit more for the beowulf cluster)


    Tell a man that there are 400 Billion stars and he'll believe you
  • The link to the computer page directly: http://www.damtp.cam.ac.uk/cosmos/Public/tour_index.html [cam.ac.uk]
  • At Drexel University, I had an opportunity to work with a machine based on the GRAPE 4 architecture, and let me tell you, this thing is amazing. Granted, it can only do one thing: take in initial conditions and spit out forces (no if/thens or even add/multiplies here), and FAST! We have two supercomputers in one server room: a 64-node beowulf cluster, and a GRAPE machine. For the type of calculations GRAPE is designed for, it is about a hundred times faster than the beowulf cluster, all in the size of a mid-tower PC case. Abso-frickin-lutely amazing! Not to mention the fact that our GRAPE system cost us about $10,000 US, compared to MUCH MORE for the beowulf cluster (I don't have the number on hand). THAT'S what I call price/performance.


    Tell a man that there are 400 Billion stars and he'll believe you
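
    To make the division of labour concrete, here's a sketch of what the host-side loop looks like. The `grape` object below is a made-up software stand-in so the sketch actually runs (the real boards have their own driver library), but the split is as described: the PC does the time integration and all the bookkeeping, and the board does nothing except pairwise gravity, fast.

    ```python
    import numpy as np

    class FakeGrape:
        """Software stand-in for the force pipeline, purely for illustration."""
        def compute_accelerations(self, pos, mass, eps=1e-2):
            acc = np.zeros_like(pos)
            for i in range(len(mass)):
                d = pos - pos[i]
                r2 = (d * d).sum(axis=1) + eps * eps
                r2[i] = np.inf                      # skip self-interaction
                acc[i] = (mass[:, None] * d / r2[:, None] ** 1.5).sum(axis=0)
            return acc

    grape = FakeGrape()
    pos = np.random.randn(1024, 3)
    vel = np.zeros_like(pos)
    mass = np.full(1024, 1.0 / 1024)
    dt = 1e-3

    for step in range(100):
        acc = grape.compute_accelerations(pos, mass)  # the only part the board does
        vel += dt * acc                               # host: time integration
        pos += dt * vel                               # host: integration, I/O, snapshots, ...
    ```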
  • tera == 2^40; peta == 2^50; exa == 2^60; address space of a 64-bit machine == 16 exabytes
  • by slothbait ( 2922 ) on Sunday June 04, 2000 @06:34AM (#1027333)
    Processors with embedded RAM have been under research for some time. Check out the IRAM project at Berkeley and the PIM project at the University of Michigan and elsewhere. Despite all of the research, though, processor-in-memory hasn't made it into general use yet.

    There are many problems with implementing a system like this in practice. The fabrication process used for DRAMs is completely different from that used for logic. In general, for DRAM you want a *high* capacitance process so that the wells holding your bits don't discharge very quickly -- that way you can refresh less often. In logic you want *low* capacitance so that your gates can switch quickly (high capacitance -> high RC time constant -> slow rise/fall time on gates -> slow clock speed).

    Fabricating both with the same set of masks doesn't work particularly well, so you really have to compromise -- you'll basically be making a processor with a RAM process, or vice-versa. Alternately, you could use SRAM, which is nice and fast and is built with a logic process, but is 1/6th the storage density of DRAM. This is why SRAM is used for caches and DRAM is used for main memory.

    Having the memory on the same die as the processor definitely gives a bandwidth and latency advantage. For instance, when you are on the same die, you can essentially lay as many data lines as you like, so that you can make your memory interface as wide as you like.

    But another large advantage is the power-savings. Processors consume a great deal of their power in the buffers driving external signals. Basically, driving signals to external devices going through etch is power-expensive, and introduces capacitances that kill some of your speed. Keeping things on die, no such buffers are needed, and a great deal of power is saved.

    The first commercial application of the processor-in-memory concept that I am aware of is Neomagic's video cards. They went with PIM not for bandwidth, but for power-conservation, and chip reduction. These characteristics are extremely appealing to portable computing, and thus Neomagic now pretty much owns the laptop market.

    In a limited application, such as a 2D graphics card, this is feasible because the card only needs perhaps 4 MB of memory. Placing an entire workstation's main memory (say, 128 MB) on a single die *with* a processor would lead to a ridiculously massive die. Big dies are expensive, lead to low yield and increase design problems with clock skew. Thus, having 128 MB of DRAM slapped onto the same die as your 21264 isn't going to happen in the near future.

    Placing a small (4-8 MB) amount of memory on-die, and leaving the rest external is possible, but leads to non-uniform access memory, which complicates software optimization and general performance tuning greatly. It is generally considered undesirable.

    Another approach is to build systems around interconnected collections of little processors, each with modest computing power and a small amount (say 8 MB) of memory. Thus, you are essentially building a mini-cluster, where each node is a single chip. This, too, leads to a NUMA situation, but it is more interesting, and many people are pushing it.

    PIMs are going to be used more and more, and the massive hunger for bandwidth in 3D gaming cards very well may drive them to market acceptance. The power consumption advantages will continue to appeal to portable and embedded markets as well. However, general purpose processors based on this design are unlikely in the near future. This style of design doesn't mesh well with current workstation-type architectures.

    A bit of a tangent, but I hope it was informative...
    --Lenny

  • "...I have a room full of ulttra-sparcs crunching away day and night and I can't get anywhere. I can't even prove for sure that gravity exists..."

    The solution is trivial.

    1. Carry Ultra-Sparc to building rooftop.
    2. Drop Ultra-Sparc off building rooftop.
    3. If results are disputed, request that critic stand at base of building. Repeat steps 1 & 2.
  • I see you've mastered the First Law of Technical Papers: if you're writing about something fairly math-heavy, your equations sure as hell better make pretty pictures.
    Isn't this much like the thing ENIAC was built for: calculating ballistic missile paths?

    Somebody correct me if I'm wrong, but I'm pretty sure that the ENIAC was used for calculating artillery tables, not ballistic missile paths...

  • But who wants to print some 500-page document? It's a waste of paper and ink...

  • Perhaps I'm naive, but when they say that this computer is exclusively for calculating gravitational interactions, why could you not make some data substitutions and use it for different calculations?

    Step 1) Acquire data on the purchasing behavior and demographic info of a couple of million consumers from some unscrupulous web retail site.

    Step 2) Get a few scaling variables on the front- and back-end, replace stellar mass with income, replace stellar velocity with purchasing habits, replace stellar cluster density with population density (or proximity to retail outlets), etc., etc.

    Step 3) Run the system to model consumer purchasing decisions for a product you're planning to introduce into the marketplace.

    Surveys measure economic activity on a large scale and make broad predictions. Could this be used to more accurately model and predict economic behavior on a more precise scale? The data would be constantly updated, and the models would be constantly rerun to get the most accurate picture possible of how you and I will spend our $$$. Just make sure that the observed isn't aware of the observation, or your models lose their viability.
  • Being an IBM employee, I feel the need to stand up for the good Mr. Ayd :).

    Aww, talk about sour grapes! They've hurt IBM's feelings, because IBM sells really smokin' computers too.

    Seriously, I think David misclassified GRAPE 6 quite a bit. I don't think it's quite David's fault, because the article writers don't know the difference between 'supercomputer' and 'attached processor'. ABC News didn't really apply the term 'supercomputer' correctly either.

    The term 'supercomputer' is more of a marketing term than anything else. Technical people only use it when they want to describe a general capability. AFAIK there is no concrete definition of 'supercomputer', and if there were it would likely change daily. GRAPE 6, from the information I can see, is really an attached processor.

    Attached processors range from an ARM chip on your network card [3com.com] to a GRAPE 6. Internally, GRAPE 6 is a full custom, superscalar, massively pipelined, systolic array (say that 5 times fast). That basically means that data comes in one side of the board, and after n clock cycles the answer comes out the other side. There is no code other than a program running on the host computer which generates and consumes data, and every piece of the algorithm is done in hardware.

    "What happens when the algorithm changes?" you might ask. Well, then you're screwed. You have to do a whole new board. Many boards use programmable chips as their processing elements, and can reprogram them when bugs or features get added, but these guys appear to be using ASICs. Great for speed, bad for flexibility.

    Even though David Ayd was mistaken about the architecture, this idea has been around for quite a while also. The SPLASH 2 [ccic.gov] project was one of the first successes with this idea. There is also a commercial company [annapmicro.com] selling boards using that idea but with completely up to date components (compared to SPLASH).

    Still, in July of 1995, the GRAPE 4 became the world's fastest computer, breaking the 1 teraflop barrier with a peak speed of 1.08 TFLOPS.

    Well, we really can't argue with that, can we, Mr. Ayd?


    This architecture lends itself to extremely high throughput. It's no surprise that these perform so well. NSA uses architectures just like this to do its crypto crunching. Brute forcing doesn't look so bad after trying one of these :).
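
    If the "data in one side, answer out the other after n clock cycles" picture sounds abstract, here's a toy Python model of a hardwired pipeline (a cartoon of the idea only, not the actual GRAPE 6 design): each stage performs one fixed operation per "clock", and a new particle pair can enter every cycle while earlier pairs are still in flight.

    ```python
    # Toy model of a hardwired arithmetic pipeline: one fixed operation per
    # stage, one new operand accepted per clock, results emerging a few
    # clocks later. Purely illustrative.
    EPS2 = 1e-4

    STAGES = [
        lambda d: (d, d[0]*d[0] + d[1]*d[1] + d[2]*d[2] + EPS2),  # stage 1: dx -> softened r^2
        lambda t: (t[0], t[1] ** -1.5),                           # stage 2: r^2 -> 1/r^3
        lambda t: tuple(c * t[1] for c in t[0]),                  # stage 3: dx * (1/r^3) -> accel
    ]

    def run_pipeline(inputs):
        regs = [None] * len(STAGES)          # pipeline registers between stages
        outputs = []
        for clock in range(len(inputs) + len(STAGES)):   # keep clocking until drained
            if regs[-1] is not None:
                outputs.append(regs[-1])     # a finished result pops out this cycle
            for s in range(len(STAGES) - 1, 0, -1):      # shift stage contents forward
                regs[s] = STAGES[s](regs[s - 1]) if regs[s - 1] is not None else None
            nxt = inputs[clock] if clock < len(inputs) else None
            regs[0] = STAGES[0](nxt) if nxt is not None else None
        return outputs

    # separation vectors between particle pairs stream in, accelerations stream out
    pairs = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 1.0)]
    print(run_pipeline(pairs))
    ```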
  • Around 100 GFLOPs, $5 million, these days.

    Considering a Mac G4 chip peaks at 4 GFLOPS ...
  • Several years ago, I wrote up a program to play with the n-body problem and I also wrote up an introductory type paper on the subject. The link you posted to Amara's N-body page is a good one, and I used it as one of my sources in my paper. Included in my paper, however, are a bunch of pictures to make some of the concepts clearer.

    I think most of the information is still fairly accurate. The paper is aimed at semi-technical people, but not experts in the field of the n-body problem. That is, it shows some formulas that use basic, first-year calculus, but you don't need to know how to use a "Hamiltonian Operator".

    A copy of the XStar program and paper can be found here [midwestcs.com].

    The original document was created in FrameMaker and I have been unable to fully HTML-ize it. To get a copy with pictures, you need to look at the postscript document.

  • Supercomputer sometimes means "limited computer".
    In exchange for increased performance in some
    respect, you lose something in general purpose
    computing, such as software tools, programming
    generality, adequate peripherals etc.

  • mmmmm 'all that cheese'

    Actually there are pretty reasonable arguments against that possibility. The strongest of them is cosmological. If you believe cosmology, the relative abundance of the light elements (hydrogen, helium, and lithium) would be thrown all out of whack if the universe had more baryons (stars, dust, moons, gas and such) than about a third of the mass needed to make the universe flat. Basically this has to do with the fact that during the first few moments (of the universe) nuclear reactions are going on all of the time, turning hydrogen into helium and back again. The forward and backward reactions are density dependent and go on for as long as the universe is hot enough to sustain the reaction. So the relative abundances give you a measure of the density of the baryons in the universe at the moment the universe cooled enough to stop the reaction.

    The universe appears to be flat, from redshift surveys and the ripples in the cosmic microwave background. So since we know that baryons can't account for more than a third of that, we are forced to postulate something weird to account for at least the other two thirds.

    The most likely value of the baryon fraction is actually around 12 percent, and the rest is split between dark matter and something even weirder called the 'vacuum energy of space' (or 'lambda').

    Hope that made some sense.

    chris

    it's your universe, get used to it

  • I wonder how well this would underclock [overclockers.com]?

  • I defer to M. Godfrey on his/her description of later Cray computers. The only one I actually saw was being used as furniture in a museum computer exhibit. However, to avert a flame war: I said Mr. Cray designed a single purpose computer, and M. Godfrey said it was for general purpose computation. We do not disagree. Mr. Cray designed his computers to do a vector calculation very quickly. This is not simple, and requires college level math to understand. And yes, this complex calculation can be programmed to do an enormous variety of math problems. It is a tool that is both very powerful and very general. Don't forget, the earliest computers were limited to addition and subtraction of integers. Multiplication, division, and real numbers were done with software. Putting multiplication, division, and real numbers into hardware was hard work. Seymour Cray put vector calculations into hardware, and that is awesome.
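
    For readers who haven't met "vector calculations": the idea is to operate on whole arrays in one go instead of looping element by element. A rough NumPy analogy (only an analogy -- a Cray streamed this through dedicated hardware pipelines, not through software):

    ```python
    import numpy as np

    n = 100_000
    a = np.random.rand(n)
    b = np.random.rand(n)

    # Scalar style: one multiply-add at a time, the way a plain CPU loop works
    y_scalar = np.empty(n)
    for i in range(n):
        y_scalar[i] = 2.5 * a[i] + b[i]

    # Vector style: the whole AXPY operation expressed as one array operation,
    # roughly what a vector unit does to an entire register of operands at once
    y_vector = 2.5 * a + b

    assert np.allclose(y_scalar, y_vector)
    ```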
  • Ummm... did you bother to read the article? Not all FLOPS are created equal. These things appear to be hard-coded pipelines, and as such even though FLOPs are being used, they are entirely contained inside the chip. The input of one is taken directly from the output of another, so you can't use them for anything else.
  • First off, that's 100 teraflops, not petaflops.

    Secondly, that's "theoretical peak performance", otherwise known as the "guaranteed not to exceed" performance. On their highly specialized code it'll probably do ok, but on other calculations I'd be surprised if it got 10% of that speed, especially if a lot of cross-node communication is occurring. Don't forget, this is not a general purpose computer; it's like a really, really big math co-processor that is optimized to run a very, very specific type of program fairly well.

  • Okay, back in World War 2, they had this problem of having to compute trajectories of artillery. They ended up creating the first electronic computers. Now, years later, we have electronic computers doing almost anything imaginable, and the cutting edge:

    Computing trajectories.

    (Disclaimer: Yes, I know it's only one of the cutting edges, and yes, I know gravitational interactions aren't strictly the same as trajectories, but the irony remains, okay?)
  • 100 petaflops??? I think you mean 100 teraflops.
  • by BiggestPOS ( 139071 ) on Saturday June 03, 2000 @10:56PM (#1027350) Homepage
    My study of gravity has long been hindered by not enough computer power. I have a room full of UltraSPARCs crunching away day and night and I can't get anywhere. I can't even prove for sure that gravity exists. I get laughed out of all the conferences, people don't return my calls; I must simply have this machine to prove my theories. I'll take 4... wait, make it 3, I'll just overclock them. Hmm, now, about porting Linux to it.....

  • by Detritus ( 11846 ) on Saturday June 03, 2000 @11:06PM (#1027352) Homepage
    A paper (PDF format) on its predecessor, GRAPE-5, can be found here [sc99.org]. It has more technical detail but it doesn't describe the architecture of the specialized processors. It won the 1999 Gordon Bell price/performance prize.
  • Last time I heard a discussion about supercomputers, someone said that a supercomputer had to have a sustainable throughput of at least 1 Gigaflop. Is that accurate? If not, what *is* the definition of a supercomputer these days?

    Which reminds me, if anyone is interested in the "flopsability," to coin a silly-sounding word, of common x86 processors, visit http://www.jc-news.com/parse.cgi?pc/temp/TW/linpack --interesting, if practically useless, scores...
  • Why can't people ever read the gawl darn articles? This thing is for gravity _only_, more or less.
  • that means when i buy one of these suckers i can run seti@home and watch a dvd at the same time.

  • Forgot the cooling
    Temperature far too high
    Expensive doorstop

  • Well, if we're going to exchange pictures of supercomputers, then there is a nice one [unite.nl] of the O2000 at SARA here in the Netherlands.
  • Stop trolling! The Steve Woston [mnc.co.za] is terribly annoyed at being impersonated by trolls on /. Read what the Real Steve Woston has to say about it here. [mnc.co.za]

  • by szyzyg ( 7313 ) on Sunday June 04, 2000 @03:21AM (#1027359)
    As an astronomer who does these kinds of calculations, I should point out that this system is specialised to solve just one type of problem: N-body problems where N is very big. E.g. our galaxy has about 100 billion stars in it - fully specifying their positions and velocities would require 4.8 terabytes of memory. We're still a long way away from that... but getting closer. Oh, and that's neglecting things like molecular clouds and suchlike, which have appreciable mass but aren't stars.

    I have a cluster of Alphas crunching away at solar system models - GRAPE 6 couldn't actually do this very well, since it's designed for a certain N-body algorithm which doesn't suit small N... Instead I use a symplectic integrator which takes advantage of a number of known factors in the problem.

    So - we still need bigger and faster machines, but we also need more general machines...

    Anyway... I want one of these to model EKO formation in the solar system.
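
    (The 4.8 terabytes is just bookkeeping -- six double-precision numbers per star. A quick back-of-the-envelope check in Python:)

    ```python
    n_stars = 100e9                 # ~10^11 stars in the galaxy
    doubles_per_star = 6            # x, y, z position + vx, vy, vz velocity
    bytes_per_double = 8

    total_bytes = n_stars * doubles_per_star * bytes_per_double
    print(total_bytes / 1e12, "terabytes")   # -> 4.8 terabytes
    ```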
  • PDF format? You fucking NEED this supercomputer to get one of those things to scroll smoothly. I mean, GEEZ! What are people thinking when they "publish" something in PDF? Wow, this looks EXACTLY like I want it to. Too bad no one cares. Ever try reading a lengthy document in PDF? It sucks ASS.

  • General purpose computers get their butt kicked in price and performance by custom silicon, assuming the task is well-defined and not too complicated. These get used a lot in signal processing and decoders for error correction codes.
  • I'm sure there are others that used less-powerful systems to run mathematical behemoths.
    The Sun Ultra Enterprise 10000's microcode and glue logic are loaded from an Ultra 5 with a JTAG card (collectively known as the System Support Processor). Not that the UE10k is particularly a mathematical behemoth, but lots of chip foundries like to use them for layout.

    -jhp

  • I would split them into two types, classic supercomputers like Cray vector systems, and massively parallel collections of microprocessor modules with high-speed interconnects.

    The problem with anything based on a microprocessor is the pathetic main memory bandwidth. If your program blows out the cache, the performance goes to hell.

    A vector supercomputer is designed to have massive memory bandwidth, enough to keep the vector processing units operating at high efficiency. No cache or VM to slow things down. An engineer once told me that a Cray was a multimillion dollar memory system with a CPU bolted on the side.

    See the STREAM benchmark [virginia.edu] web page for some measurements of sustained memory bandwidth. This separates the real computers from the toys.
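
    You can get a crude feel for the STREAM "triad" with a few lines of NumPy -- crude because interpreter and temporary-array overheads mean this is nowhere near a proper STREAM run, but the shape of the measurement is the same:

    ```python
    import time
    import numpy as np

    n = 20_000_000                       # big enough to blow out any cache
    a = np.zeros(n)
    b = np.random.rand(n)
    c = np.random.rand(n)
    q = 3.0

    start = time.perf_counter()
    np.add(q * c, b, out=a)              # STREAM "triad": a = b + q*c
    elapsed = time.perf_counter() - start

    # STREAM counts 24 bytes moved per element (read b, read c, write a);
    # NumPy's temporary for q*c means the real traffic is somewhat higher,
    # so treat the number as a rough lower-ish estimate.
    moved = 3 * n * 8
    print(f"~{moved / elapsed / 1e9:.1f} GB/s sustained (very rough)")
    ```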

  • David Ayd, a supercomputing manager at IBM, says "the GRAPE 6 computer appears to be based on a very old model. In the 1970s and '80s these vector models were developed in Japan for problems like simulating weather and plane mechanics, he said. The difference today is that the computers can do the jobs at 100 times the speed or faster."

    Aww, talk about sour grapes! They've hurt IBM's feelings, because IBM sells really smokin' computers [ibm.com] too. But:

    Still, in July of 1995, the GRAPE 4 became the world's fastest computer, breaking the 1 teraflop barrier with a peak speed of 1.08 TFLOPS.

    Well, we really can't argue with that, can we, Mr. Ayd?

    --
    "Give him head?" ... "Be a beacon?"

    "One World, one Web, one Program" - Microsoft Ad
  • by Baldrson ( 78598 ) on Saturday June 03, 2000 @11:52PM (#1027365) Homepage Journal
    Back in 1989, I cut a deal with Datacube whereby, in exchange for testing their new image flow software, I was allowed to hang a bunch of their Finite Impulse Response filter boards together and achieve several billion operations per second doing neural image processing. FIR filters do sum-of-weighted-product calculations on sequences of data (in this case, rectangular regions of interest of video data) and do them all in hardware -- at a constant rate. So peak rate is the same as average rate. This allowed one to train the system to recognize, at blazingly high speed, features that could not be extracted via analytic algorithms. Unfortunately, even though the system would only cost around $200,000 at that time, the only market interest was from government shops who had some serious Not Invented Here cultures.

    I haven't followed the progress in the field since then, but I suspect present day hardware could handle a good fraction of the satellite image feeds affordably -- and dwarf the realized performance figures of this gravitation board.

    Of course, if you want to get really picky about it, there are lots of specialized circuits out there doing work all the time all over the place that could be viewed as "computation" at enormous rates -- it all depends on where you draw the line.
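
    (For anyone who hasn't met them: a FIR filter really is just a sliding sum of weighted products, which is why it maps so cleanly onto fixed hardware. A couple of lines of Python shows the whole operation -- 1-D here, whereas the Datacube boards did the 2-D image version:)

    ```python
    import numpy as np

    def fir(signal, taps):
        """Finite impulse response filter: each output sample is a fixed
        weighted sum of the most recent len(taps) input samples -- exactly
        the sum-of-products a hardware multiply-accumulate pipeline produces
        at a constant rate, one output per clock."""
        out = np.zeros(len(signal) - len(taps) + 1)
        for i in range(len(out)):
            out[i] = np.dot(taps, signal[i:i + len(taps)])
        return out

    taps = np.array([0.25, 0.5, 0.25])            # simple smoothing kernel
    signal = np.random.rand(16)
    print(fir(signal, taps))
    print(np.convolve(signal, taps[::-1], mode="valid"))  # same result via convolution
    ```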
