Sandia Wants To Build Exaflop Computer

Posted by Soulskill on Fri Feb 22, 2008 08:20 AM
from the an-exaflop-ought-to-be-enough-for-anyone dept.
Dan100 brings us an announcement that Sandia and Oak Ridge National Laboratories are setting their sights on an exaflop supercomputer. Researchers from the two laboratories jointly launched the Institute for Advanced Architectures to facilitate development. One of the problems they hope to solve is how to provide each core of each processor with enough data so that cycles aren't going to waste. "The idea behind the institute — under consideration for a year and a half prior to its opening — is 'to close critical gaps between theoretical peak performance and actual performance on current supercomputers,' says Sandia project lead Sudip Dosanjh. 'We believe this can be done by developing novel and innovative computer architectures.' The institute is funded in FY08 by congressional mandate at $7.4 million."
  • Aren't we getting a little bit ahead of ourselves, Sandia? What program would you run on this? This brings up the essential issue: what kind of program would YOU write to take advantage of this? I can only think of one: AI.
    • Aren't we getting a little bit ahead of ourselves, Sandia? What program would you run on this? This brings up the essential issue: what kind of program would YOU write to take advantage of this? I can only think of one: AI.
      Military simulations. That's what Sandia spends most of its supercomputing clock cycles doing. The Department of Energy funds supercomputing centers like Sandia National Laboratories in order to run simulations on military vehicles, nuclear weapons simulations, etc.
      • True. Sandia does a lot of nuclear systems simulation.

        However, Oak Ridge is an unclassified facility doing mostly academic research on climate change, fusion energy, biological systems modeling, geological systems, ... the list goes on, but almost any US researcher can get an account on their systems, and purchase cpu time. The trouble is that neither of these labs has quite enough resources to dramatically change computer architecture directions. They can both afford to have 1 or 2 very high end machines a
      • They probably need it to run Windows for Warships properly. You wouldn't want to run out of cycles in the middle of a battlefield.
        • by Albert Sandberg (315235) on Friday February 22 2008, @09:01AM (#22513928) Homepage
          ... they will be the best prepared for duke nukem forever that's for sure.
        • So basically, Sandia wants to run better games...
          No. Weapons modeling is actually fairly dull. But, since the kind of weapons that the DoE cares about can no longer be tested, detailed and computationally hungry simulations are the only way to predict performance.
          • This does not, of course, address the question of whether nuclear weapons modeling is the most worthwhile use of all those cycles, paid for by the American taxpayer.

            I don't think it is.
            • Maybe not - That's another debate entirely that I don't care to weigh in on. But, it is an application that demands a lot of processing and that will almost certainly be one of the many applications that SNL and ORNL will be using this system for. An AC below who claims to work at SNL points out that SNL has also done asteroid impacts modeling and says that SNL makes a lot of its resources available for outside research.
    • by uuxququex (1175981) on Friday February 22 2008, @08:30AM (#22513706)
      So far the advances in the field of AI have been non-spectacular. Yes, there have been somewhat successful reasoning systems (rule based or probability based) and neural networks have made classification easier.

      However, at the moment there are no serious applications that will only become feasible by having more computer power.

      More speed in calculation has plenty of benefits, but AI as a research field will not be making major announcements soon because of this new machine.

      • Re: (Score:2, Interesting)

        Excellent point. I might add that I have been working on just such a code set for a few years but that's another discussion. The real reason that AI has been stuck is because it has only attempted to replicate the functions of one hemisphere: the left (linear sequential, where language is processed). The visual-simultaneous right hemisphere is the one that no computer today replicates. THAT, my esteemed friends, is where the work needs to be done. I have spent the better part of four years on just that pro
        • Re: (Score:3, Informative)

          Many people have worked on the area which they think AI is lacking in, but until we understand the brain (or even come up with a good definition of "intelligence") I don't think we'll get very far. But who knows, keep on looking!

          There are a lot of uses for extra computing power though, it's not like we've reached a point where we have too much. Protein folding and climate models are the first that come to mind, but I'm sure there are many others. Companies aren't building these things for fun.
        • IMHO, AI won't progress until we take a serious look at how intelligence arose on this planet, namely us. How did intelligence start with us? Where did it evolve from? Take, for instance, the instinct for survival: I would say that this spurred the sapiens family tree to grow more intelligent, letting us become more aware of our surroundings and ourselves (self-awareness). the pre-frontal cortex to simulate what we see and hear and feel and do some trial and error simulations in our head to determine the outcome (
        • I think as far as neural networks are concerned, pathways are only reinforced if the results they give are 'good' ones with regards to the result you were hoping for. In more general AI terms you'd have to code in something similar that made the computer disregard behaviours that equated with whatever you have specified as 'bad'. ie your robot chops someone's head off with a chainsaw, it would mark that as 'bad', and try to be more careful around humans in the future while wielding a chainsaw.
    • by PowerEdge (648673) on Friday February 22 2008, @08:31AM (#22513712)
      You don't usually run one program on these types of systems. The compute cycles are bid out to researchers, who each get x number of compute hours. The system is partitioned, and a set of nodes is given to each researcher to run their codes on. A system like this could have hundreds of jobs running simultaneously. Also, with the tens of thousands of cores needed to reach this status, a node failure or other hardware failure is inevitable. Right now, if a node fails in the middle of a job, everything since the last checkpoint is lost. The chance of a failure impeding work goes up greatly the more nodes and cores you run the job on.
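A rough sketch of the point above about failures and node counts, not anything Sandia or Oak Ridge has published. It assumes node failures are independent and exponentially distributed, and the 50,000-hour per-node MTBF and 24-hour job length are made-up inputs chosen only for illustration; `job_survival_probability` is a hypothetical helper name.

```python
import math


def job_survival_probability(nodes: int, hours: float, node_mtbf_hours: float) -> float:
    """Chance that no node fails during the job.

    Assumes independent, exponentially distributed node failures; the MTBF
    passed in is a hypothetical input, not a measured figure.
    """
    return math.exp(-nodes * hours / node_mtbf_hours)


if __name__ == "__main__":
    # Hypothetical inputs: a 24-hour job and a 50,000-hour per-node MTBF.
    for nodes in (100, 4_000, 200_000):
        p = job_survival_probability(nodes, hours=24, node_mtbf_hours=50_000)
        print(f"{nodes:>7} nodes: P(no failure in 24 h) = {p:.3g}")
```

Under these assumptions the survival probability falls off exponentially with node count, which is why losing everything since the last checkpoint hurts more as jobs get wider.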
      • Sort of.

        Ten years ago the largest systems had ~1000 processors, and jobs would usually run on 100-300 nodes. Now they have ~20,000 processors, and jobs tend to typically use 4000 nodes. Presumably an exaflop machine will have ~1,000,000 processor cores, and typical jobs will use 200,000 nodes.

        I think this institute is being funded to deal with issues exactly like the problem you present. Checkpoint/restart was a decent solution for a YMP, but it has outlived its usefulness. I imagine there will someday be
    • What AI program, pray? I was unaware we had an AI program ready to think for us, only asking for a few hexaflops here or there.

      I think Sandia would probably like to run lattice QCD simulations. Those can chew through any amount of hexaflops you can throw at them. Otherwise we have the ever-demanding weather bureau for these elusive 15-day forecasts. It's not difficult to conjure up a problem that would take weeks to run on current hardware. Indeed neural simulations are a possibility, but not the only one.
      • Re: (Score:2, Funny)

        by Anonymous Coward
        I am a flopped hex, you insensitive clod!

        (or did you mean exaflops, as in 10^18 FLoating point Operations Per Second?)
        • Just in case someone needs a reminder. I always get confused:

          10^24 yotta Y 1 000 000 000 000 000 000 000 000
          10^21 zetta Z 1 000 000 000 000 000 000 000
          10^18 exa E 1 000 000 000 000 000 000
          10^15 peta P 1 000 000 000 000 000
          10^12 tera T 1 000 000 000 000
          10^9 giga G 1 000 000 000
          10^6 mega M 1 000 000
          10^3 kilo k 1 000
          10^2 hecto h 100
          10^1 deka da 10
          10^0 1
          http://en.wikipedia.org/wiki/SI_prefix
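For anyone who would rather look the prefixes up than memorize them, here is a minimal sketch of the table above as a lookup. The names `SI_PREFIXES` and `flops` are hypothetical helpers, not part of any library.

```python
# Powers of ten for the SI prefixes listed in the table above.
SI_PREFIXES = {
    "yotta": 24, "zetta": 21, "exa": 18, "peta": 15, "tera": 12,
    "giga": 9, "mega": 6, "kilo": 3, "hecto": 2, "deka": 1,
}


def flops(value: int, prefix: str) -> int:
    """Convert e.g. (1, 'exa') into floating point operations per second."""
    return value * 10 ** SI_PREFIXES[prefix]


if __name__ == "__main__":
    # How many petaflops make up one exaflops?
    print(flops(1, "exa") // flops(1, "peta"))
```

So an exaflops machine is a thousand petaflops machines, or a million teraflops machines.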

        • I am a flopped hex, you insensitive clod!
          Wouldn't a hexxed clod of dirt actually have nerve endings, and therefore be sensitive?
      • Re: (Score:3, Informative)

        Try Climate & Weather Codes, Fusion, Combustion, CFD, Bio (genomics), and
        a host of large science/engineering, partial differential equation-based applications
        requiring the solution of large systems of matrix equations,... Check out:

        http://www.ornl.gov/info/press_releases/get_press_release.cfm?ReleaseNumber=mr20061025-00 [ornl.gov]
    • by Huntr (951770) on Friday February 22 2008, @08:37AM (#22513756)
      What program would you run on this?

      Vista, with Aero enabled.
      • Some well written code... Just might go so fast that the space time continuum bends to accommodate a new class of ultimate power in the universe. ;) Unfortunately, you aren't going to find a whole lot of that in Vista. =P
    • Um, the Towers of Hanoi?
    • by Anonymous Coward on Friday February 22 2008, @08:51AM (#22513854)
      I happen to work at Sandia and can assure you that much more than weapons work is done on the computers. In fact, recently a lot of work was done in modeling the huge asteroid that smashed into Russia in the early 20th century. The researchers were able to develop a new understanding of the dynamics of such an event and discovered that much smaller asteroids than previously thought could do such damage.

      Also, a large portion of the computers are available to outside research (besides research done at the Labs).

    • I bet they partnered with 3D Realms, who needed help finishing Duke Nukem Forever.
    • If they had AI that could run on fast computers, then they'd have AI that could run on slow computers, just slowly. They don't, sorry.
    • They could always download BOINC and have their pick
    • What program would you run on this?

      The thing about supercomputers these days is that they aren't a single node. You don't boot them up and run a "program". Typically, what happens is that you've got a whole bunch of folks who want to run their codes on a distinct set of processor cores, and the more total processing power a supercomputer has, the more individual codes can be run simultaneously. At least, this is how it works on massively-parallel supercomputers like BG/L, Cray XT, etc. The vector-based
    • Duke Nukem Forever
    • It's not hard to come up with programs that need a lot of processing power to run. Most of the stuff currently being run on supercomputers is (relatively) small programs, but with huge data sets that are easily parallelized. Pretty much any kind of simulation falls into this category: climate, genetics, biology/pharmaceutical, plasma physics, particle physics, nuclear physics.

      You don't need a "smart" program to utilize a fast computer - in fact, they are most useful in situations where the smartest people
    • Lattice QCD.

      These guys casually throw around numbers like "32 million cpu-hours on this machine, 40 million cpu-hours on that machine", and are always needing more power.
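To put those cpu-hour figures in perspective, here is a back-of-the-envelope conversion to wall-clock time. It assumes perfect parallel scaling and exclusive use of the whole machine, neither of which holds in practice; the 20,000-processor machine size is borrowed from another comment in this thread, and `wall_clock_days` is a hypothetical helper.

```python
def wall_clock_days(cpu_hours: float, processors: int) -> float:
    """Days of wall-clock time, assuming perfect scaling across all processors."""
    return cpu_hours / processors / 24


if __name__ == "__main__":
    # Allocation sizes quoted in the comment; machine size is an assumption.
    for allocation in (32e6, 40e6):
        days = wall_clock_days(allocation, processors=20_000)
        print(f"{allocation / 1e6:.0f} million cpu-hours: about {days:.0f} days")
```

Even with an entire machine of that size to themselves, allocations like these would occupy it for a couple of months or more.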
      • Amen to that ma brotha
      • by Gospodin (547743) on Friday February 22 2008, @09:08AM (#22513990)

        And if a Democrat it would be: "Who would Marx tax?"

        • I don't think the Democrats have a monopoly on irresponsible spending lately. They're just the only ones responsible enough to actually realize that money in = money out.

          The Republicans, OTOH, are happy to borrow trillions of dollars from Japan and China. This is a good idea why, exactly?

          We may not be paying high taxes for Bush's military misadventures right now, but we will later, plus interest.
      • Re: (Score:2, Informative)

        We have elected a Republican president... twice. In fact it is this Republican president that has put the emphasis on American primacy in the supercomputing arena. Something called the Earth Simulator out of Japan put us in our place. This president opened up funding and in effect mandated the classes of systems you see today being built at NASA, DoE facilities, and academia. But, go on with your blind hatred and closed mindedness.
        • Clinton took us to war?

          I don't remember that.

          He tried to whack Bin Laden with a bunch of missiles once, but that's pennies compared to the Iraq War.
  • Flow Down? (Score:3, Funny)

    by webword (82711) on Friday February 22 2008, @08:35AM (#22513730) Homepage
    How long does it typically take for memory and storage advances to make it to end consumers? For example, when we first heard about "gigabytes" back in the day, how long did it take to get there once it was being done in the laboratory?

    OK, here's the truth. I'm just wondering since I need more memory to carry around the entire internet in my pocket. Right now, I can only fit Ron Paul fanatic postings on my USB stick. They are taking up a lot of room. (Nothing against Ron Paul, mind you.)
    • Shh, we already have the record/movie companies pissed off. Just imagine the lawsuits when people start torrenting whole internets!
  • by kazade84 (1078337) on Friday February 22 2008, @08:36AM (#22513738)
    it said Santa was building a super computer :)
  • by vstat (456161) on Friday February 22 2008, @08:37AM (#22513754)
    No cycles wasted there!
  • I want one (Score:4, Interesting)

    by sm62704 (957197) on Friday February 22 2008, @08:44AM (#22513808) Journal
    I don't know why, but I want one.

    Twenty years ago we had a Compaq portable that ran on a 16 MHz 286 at work, and it was HOT. Blazingly fast, could do anything. That is, for its time. The supercomputers then weren't as powerful as your laptop today.

    So if I can manage to stay alive for another 20 years, I'll probably have a laptop more powerful than the supercomputer in TFA. I guess I'll just have to wait a while.

    -mcgrew [kuro5hin.org] (link is to "Growing Up With Computers", a 2 year old K5 article)
  • by Lars Clausen (1208) on Friday February 22 2008, @09:00AM (#22513912)
    It'd be cool if TFA's headline was actually correct: Then we'd have a machine whose performance actually accelerated by 10^18 floating point operations per second *per second*. That gets to be a lot of FLOPS real fast.

    OTOH, it might just be the singularity happening. We wouldn't notice until it was too late.
  • by Guybrush_T (980074) on Friday February 22 2008, @09:21AM (#22514134)

    Reading the article, the goal is nowhere near building a real exaflop computer, but more about thinking about issues (like processor data feeding).

    In a year and a half, we shouldn't have more than 100 GFlops per socket, which means that you will still need 10 million processors (not cores!) to achieve the exaflop computer. No chance to build a cluster that big (at least these years).

    The all-time progression of the top500 [top500.org] shows that exaflop computers should arrive around year 2020, definitely not tomorrow. (x10 every ~4 years, 2008:1 PF, 2012:10 PF, 2016:100 PF, 2020:1 EF)
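The arithmetic in this comment can be checked with a few lines. This is only a sketch of the commenter's own numbers (100 GFlops per socket, 1 PF in 2008, a tenfold increase every 4 years); `sockets_needed` and `year_reached` are hypothetical helper names, and the trend is an extrapolation, not a forecast.

```python
def sockets_needed(target_flops: int, flops_per_socket: int) -> int:
    """Number of sockets required to reach the target peak performance."""
    return target_flops // flops_per_socket


def year_reached(target_flops: int, start_year: int = 2008,
                 start_flops: int = 10**15, years_per_10x: int = 4) -> int:
    """First year the top system reaches the target, stepping x10 every 4 years."""
    year, performance = start_year, start_flops
    while performance < target_flops:
        performance *= 10
        year += years_per_10x
    return year


if __name__ == "__main__":
    exaflop = 10**18
    print(sockets_needed(exaflop, 100 * 10**9))  # sockets at 100 GFlops each
    print(year_reached(exaflop))                 # year on the x10-per-4-years trend
```

Both figures in the comment fall out directly: ten million sockets at 100 GFlops each, and 2020 on the stated trend.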

    • Intel [wikipedia.org] has recently managed to squeeze 80 cores onto a single chip, resulting in up to 2 TFlops [wikipedia.org] per socket. That's about 700 times as much as a single 700 MHz BlueGene/L processor or 350 times as fast as a BlueGene processing node. 250 (500) of those chips could take on LLNL's BlueGene at a mere 66 (31) kW of power consumption. (Values in brackets are for the slower, more power-efficient version).
      Back in the real world I don't know how real the TeraScale chips are. Yield probably is as low as it gets, I wou
  • A lot of so-called supercomputers only have some parts that may run at the theoretical speed, because they stint in other parts such as memory or bus speed. A viable general-purpose computer usually has one flop = one byte of core = one second to completely write core. That's pretty much the case with desktops in the single-gigaflop range.
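One way to read that rule of thumb, sketched under my own interpretation: for every flop per second of peak, a balanced machine has one byte of memory, and enough bandwidth to write all of that memory in one second. The function name and dictionary keys are hypothetical.

```python
def balanced_machine(peak_flops: int) -> dict:
    """Memory and bandwidth implied by 'one flop = one byte = one second'."""
    memory_bytes = peak_flops             # one byte of memory per flop/s
    bandwidth_bytes_per_s = memory_bytes  # fill all of memory in one second
    return {
        "memory_bytes": memory_bytes,
        "bandwidth_bytes_per_s": bandwidth_bytes_per_s,
    }


if __name__ == "__main__":
    for label, peak in (("1 gigaflop desktop", 10**9), ("1 exaflop machine", 10**18)):
        spec = balanced_machine(peak)
        print(f"{label}: {spec['memory_bytes']:.0e} bytes of memory, "
              f"{spec['bandwidth_bytes_per_s']:.0e} bytes/s of bandwidth")
```

Read this way, a balanced exaflop machine would want on the order of an exabyte of memory and an exabyte per second of memory bandwidth, which shows how far the data-feeding problem in the story goes beyond raw flops.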
  • Sandia and Oak Ridge are not coming up with an exaflops computer. They are contemplating how to write software in a way that will effectively use exaflop computers when they become available. This little group has a budget of $7.4 million, which is enough to pay a dozen high-level research scientists and a half dozen software developers for a year. They're not going to reinvent high performance technical computing.

    They're going to rework some fundamental math libraries to deal with the obvious trend in HPT
    • Re: (Score:2, Informative)

      If your work tends to be I/O bound... it doesn't belong on an exaflop cluster. You know BlueGene/L? Most of its nodes don't even talk directly to the storage system--they're connected to special I/O nodes which then talk to the storage system.

      Scientific computing doesn't really deal with THAT much data. The scientists here at Sandia (yeah I work at Sandia CA) think they are just HUGE data creators. "We generate a PETABYTE per YEAR!" they say... not realizing that a petabyte is a drop in the bucket for the