Supercomputing Technology

Has Supercomputing Hit a Brick Wall? 185

anzha writes "Horst Simon, Deputy Director of Lawrence Berkeley National Laboratory, has stood up at conferences of late and said the unthinkable: supercomputing is hitting a wall, and nobody will build an exaFLOPS HPC system by 2020. Such a system is defined as one that passes LINPACK with a sustained performance of one exaFLOPS or better. He's even placed money on it. You can read the original presentation here."
This discussion has been archived. No new comments can be posted.

  • by cold fjord ( 826450 ) on Tuesday May 14, 2013 @11:18AM (#43721167)

    You can't really make factor 10 improvements indefinitely. Eventually the numbers overwhelm you and you hit roadblocks. The only real solution will ultimately be new computing technology, such as quantum computers.

    • The problem is extracting parallelism. What is there to stop one from building, say, Itanium-based MPP systems and tossing more CPUs into the mix, using either a unified memory architecture or a distributed memory architecture? Point is that it won't speed up computing beyond a point simply because there ain't that much parallelism in most processes.
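
The limit described in the comment above is usually expressed with Amdahl's law. The sketch below is an illustration, not something taken from the thread: the function name and the 95% parallel fraction are assumptions chosen only to show the shape of the curve.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size problem.

    parallel_fraction is the share of the runtime that can be spread
    across processors; the remainder stays serial no matter how many
    CPUs are added.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Hypothetical workload: 95% of the runtime parallelises perfectly.
    for n in (10, 1_000, 100_000):
        print(f"{n:>7} CPUs -> speedup {amdahl_speedup(0.95, n):.2f}x")
```

With these assumed numbers the speedup is roughly 6.9x on 10 CPUs, 19.6x on 1,000, and it never reaches 20x however many CPUs are added, because the 5% serial part dominates.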
  • No? (Score:5, Informative)

    by oGMo ( 379 ) on Tuesday May 14, 2013 @11:19AM (#43721175)

    "Japan to develop new exaflop computer by 2020" [japandailypress.com] ... why not? And if it's even a few microseconds into 2021 I suppose that supercomputing has failed, will pack up, and go home.

    • Re:No? (Score:5, Informative)

      by gentryx ( 759438 ) * on Tuesday May 14, 2013 @12:18PM (#43721887) Homepage Journal

      Power consumption and MTBF: power consumption (high operating costs) can perhaps be solved by a larger budget, but the mean time between failures (MTBF) means that the machine will fail before it can compute anything meaningful. Right now the machines we build and, even more importantly, the software we build rely on all parts of the machine to function. If even a single node fails, then the data it holds becomes inaccessible and the rest of the compute job collapses like a house of cards.

      This can be remedied by taking frequent snapshots and then restarting from the last snapshot, but the time for checkpoint/restart has been continuously growing for the last systems. No one really expects exascale systems to do full system checkpoint/restart in a reasonable time frame. They'd spend more time taking snapshots than actually computing.

      Source: I'm doing my PhD in supercomputing.
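
The checkpoint/restart trade-off in the comment above can be made concrete with Young's classic first-order approximation for the checkpoint interval. This is a swapped-in textbook model, not the poster's own analysis, and the 10-minute checkpoint cost and the MTBF values are hypothetical. The model is only reasonable while the checkpoint cost is much smaller than the MTBF.

```python
import math


def young_interval(checkpoint_cost_s: float, system_mtbf_s: float) -> float:
    """Young's approximation for the compute time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_cost_s * system_mtbf_s)


def wasted_fraction(checkpoint_cost_s: float, system_mtbf_s: float,
                    interval_s: float) -> float:
    """First-order estimate of lost time per unit of useful work.

    First term: time spent writing checkpoints.
    Second term: expected recomputation after a failure, which on
    average lands halfway through an interval.
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * system_mtbf_s)


if __name__ == "__main__":
    cost = 600.0  # hypothetical: a full-system checkpoint takes 10 minutes
    for label, mtbf in (("1 day", 86_400.0), ("1 hour", 3_600.0)):
        tau = young_interval(cost, mtbf)
        waste = wasted_fraction(cost, mtbf, tau)
        print(f"MTBF {label}: checkpoint every {tau / 60:.0f} min, "
              f"about {waste:.0%} overhead")
```

Under these assumptions the overhead grows from roughly 12% at a one-day MTBF to well over half at a one-hour MTBF, which is the direction of the argument made above.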

      • One of the factors that would allow the MTBF to go up exponentially is the operating frequency, and if a system tosses more CPUs into it, the system designer can underclock it so that the MTBF is significantly increased. In the meantime, work can be done in extracting more parallelism out of the software and running it on more CPUs, so that overall performance doesn't take a hit. Underclocking also helps a bit with the power consumption, although not enough to compensate for the extra CPUs being tossed in
        • by Nutria ( 679911 )

          Control units don't heartbeat individual nodes? They aren't designed to monitor and restart the unreported work of a failed node?

          Frankly, I'm shocked.

        • by gentryx ( 759438 ) *
          Citation needed? I don't see why nodes would suffer ("exponentially") fewer hardware failures if clocked lower.
          • I was wrong about exponentially, but I do recall reading that in college in our computer system design course. Googling for it, I found this equation [interfacebus.com], which shows how, as the clock frequency of the flip-flops decreases, the MTBF would increase. Inversely proportional, though, not exponential.
            • You are misinterpreting and misapplying the data on metastability to computer systems. Once data is inside a synchronous system that isn't being clocked so fast that data isn't fully settled at flipflop inputs, slowing the system isn't going to enhance reliability. (Similarly, if you're running a synchronous system much too fast, not even a single instruction will execute properly.)

              Metastability is a concern for asynchronous inputs. There are techniques for dealing with it, although it becomes tricky as dat

      • What about the idea that was popular a few years ago about making stuff fail gracefully, both at the hardware level and the software level, so that the system could swallow the error and go on calculating without completely ruining the result? Could failures be reduced to essentially just another source of error?

        • Yes, in a way. We'll probably never be able to improve the hardware far enough that we can simply rely on it to fail gracefully (i.e. announce its impending death a few seconds in advance). The reason is that ATM our systems contain approx. 20k nodes. Exascale systems will likely push this to 200k. Even if you assume a node will live 10 years on average, then you can estimate that every ~26 minutes one node of the system will fail.

          My money is on the software: we'll need some kind of redundancy (e.g. a simul
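
The failure-interval estimate above is just the per-node lifetime divided by the node count. The sketch below reproduces that arithmetic, assuming independent node failures at a constant rate; the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60


def system_failure_interval_minutes(node_lifetime_years: float,
                                    node_count: int) -> float:
    """Average time between node failures anywhere in the machine.

    Assumes nodes fail independently and at a constant rate, so the
    system-wide failure rate is node_count times the per-node rate.
    """
    return node_lifetime_years * MINUTES_PER_YEAR / node_count


if __name__ == "__main__":
    for nodes in (20_000, 100_000, 200_000):
        interval = system_failure_interval_minutes(10, nodes)
        print(f"{nodes:>7} nodes -> one failure every {interval:.1f} minutes")
```

With a 10-year node lifetime this gives about 263 minutes at 20k nodes, about 53 minutes at 100k nodes, and about 26 minutes at 200k nodes.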

        • The thing about scientific computing is that scientists like to write MPI and Fortran. They just love that shit. And they are traditionally really resistant to any new programming model. So when you tell them they need to start using XYZ instead of MPI so their programs can actually complete at exascale *before* hardware failure, they get unhappy and instead implement things like checkpoint/restore that takes 70% of the runtime. Source: I work in HPC.
          • The thing about scientific computing is that scientists like to write MPI and Fortran. They just love that shit. And they are traditionally really resistant to any new programming model. So when you tell them they need to start using XYZ instead of MPI so their programs can actually complete at exascale *before* hardware failure, they get unhappy and instead implement things like checkpoint/restore that takes 70% of the runtime. Source: I work in HPC.

            Changing from FORTRAN would require us to actually try to comprehend decades worth of scientific data stored in FORTRAN data files. Ain't no one got time fo' that.

          • by tibit ( 1762298 )

            For good checkpoint/restore, you probably need a custom node design that would accommodate it efficiently, but I can't see why it'd have a 70% overhead. Doing a copy of your memory contents to another memory that has same bandwidth and capacity, and then lazily moving that off-node without the main CPU being involved is no biggie. You probably could implement the memory bridge and the recover CPU on a simple FPGA. The main, fast CPU is crunching numbers, then stops, the FPGA takes a memory copy, the main CP

      • You might need to broaden your research beyond what is available in the academic literature. Google handles redundancy. When they do a map/reduce, the clusters are self forming. If a cluster leader/master goes down, the cluster reelects a new master. They trust the integrity of nothing. Not even DRAM. They checksum everything. The actual architecture of Google's data centers is a closely guarded trade secret, but from what [little] I've been able to glean, they're light years ahead of "big iron" vend

        • by vidnet ( 580068 )

          When they do a mapreduce, each node might take minutes or hours to do work.

          I'm sure they have processes that require fine grained, millisecond parallelization, but mapreduce is not one of them.

        • Whenever someone on /. likens Google's network to a supercomputer God kills a Pokemon. But honestly: the reason why Google can cope with these massive outages is that they're doing totally different computations from supercomputers. Google's compute jobs are loosely coupled. They do data mining. That is fundamentally different from supercomputing where all compute jobs are tightly coupled. To give you a car analogy:

          • In the Google case millions of mechanics fix millions of cars in parallel. This is more
      • by tibit ( 1762298 )

        Presumably no matter what the memory size is on any node, it could be doubled, and presumably the bandwidth on that memory is such that duplicating the contents of one half of the memory to the other half would take a reasonable amount of time (say 0.1-1s). You can then dump the second copy over a dedicated bus without slowing down the computations. Even if the bus wasn't dedicated, the bandwidth will be curtailed by the hard drive array you use for long-term snapshot storage - so it may, say, eat 10% of yo

  • ...but I can pretty much guess where this is going. If you look at the massive parallelization improvements we've witnessed among supercomputers over the past couple decades, you can predict that at some point, most of the low hanging fruit would eventually be picked at which point the underlying latency between interconnects would start to become a limiting factor. Couple that with the fact that there's been a complete lack of significant performance improvement in desktop/server CPU space in say the past

  • He doesn't say it's not possible, rather we can't get there by just extending current technology. So by extension, 2020 is too soon to expect exaflops. He also presents arguments why exaflops is important and work to get there should continue.

    • by geekoid ( 135745 )

      Lie you ignored the article? And nothing in that link shows they are using it to build a supercomputer.

      • by quax ( 19371 )

        Have problems parsing your question "Lie you ignored the article?"

        Was that supposed to be "Like"?

        My point is that conventional supercomputing is indeed facing a crisis, but that non-CMOS-based technologies may save the day.

        • Where does CMOS even enter the question? All modern fast processors use dynamic NMOS devices in the signal path; PMOS is only used to recharge nodes to start a new cycle. For a particular set of process dimensions, CMOS is about 1/4 the speed of dynamic NMOS.
          • by quax ( 19371 )

            Fair enough, I used CMOS as sloppy shorthand for all current silicon based field effect transistor integrated circuit technology. (See how much longer that is?)

    • Re: (Score:3, Insightful)

      by Anonymous Coward
      Even if you ignore all the controversy over D-Wave's system and its nature, and take it all at face value, it is still only applicable to a narrow class of problems. CMOS or not, it amounts to something similar in principle to an ASIC. It is no surprise that a custom built chip can solve a specific class of problems orders of magnitude faster than a general purpose processor. This used to be slightly more popular for a while in the 80s, where a few custom computers were built that were specifically des
      • by quax ( 19371 )

        Think you miss the bigger picture here, in that they are pioneering non-silicon-based LSI circuits that operate adiabatically (no heat production). This technology could very well be extended to include conventional logic in parallel with their quantum circuitry.

  • Clarke's Three Laws (Score:5, Interesting)

    by Tokolosh ( 1256448 ) on Tuesday May 14, 2013 @11:45AM (#43721499)

    Clarke's Three Laws are three "laws" of prediction formulated by the British writer Arthur C. Clarke. They are:

    1. When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2. The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
    3. Any sufficiently advanced technology is indistinguishable from magic.

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      So, since Freeman Dyson said "Faster-than-light travel is rubbish" [slashdot.org] that means he's probably wrong, and we'll be warping around the galaxy soon enough?

      • Re: (Score:2, Funny)

        by Anonymous Coward

        he should stick to building vacuum cleaners.

      • You don't seem to understand the "concept" behind "warp."

        You are not exceeding the speed of light, you are just not traveling the linear distance between the two points.

        • by tgd ( 2822 ) on Tuesday May 14, 2013 @01:07PM (#43722523)

          You don't seem to understand the "concept" behind "warp."

          You are not exceeding the speed of light, you are just not traveling the linear distance between the two points.

          That's like saying that he doesn't understand the concept behind a Stargate. Made up is made up is made up.

          You can't have an honest discourse on the speed of light when you're trying to involve fiction. You might as well go full Star Trek and say that thetalon radiation transmorphs subspace and changes the value of c, but only in the presence of an extradimensional rift, and if-and-only-if you have a humpback whale.

        • by gl4ss ( 559668 )

          You don't seem to understand the "concept" behind "warp."

          You are not exceeding the speed of light, you are just not traveling the linear distance between the two points.

          ah the good old futurama theory! you know it's a joke, right?

    • by geekoid ( 135745 )

      3 - I have always hated that one, because it's wrong. It's been wrong since the scientific method was put into place.

      I can see a floating disk and know it's science and engineering that created it. In fact, we could use that to figure out how it works.

      Not understanding something doesn't mean it's magic.

      • 3 - I have always hated that one, because it's wrong. It's been wrong since the scientific method was put into place.

        I can see a floating disk and know it's science and engineering that created it. In fact, we could use that to figure out how it works.

        Not understanding something doesn't mean it's magic.

        With sufficiently advanced technology I could emulate your brain and this planet within a much larger supercomputer. What if I confound your scientific method and make the observable universe as crazy as the rules of magic? A being within said simulation could be given access to the voice-activated debugger mode, and when he spoke the 'magic' words, you could suddenly find yourself crawling about as a newt.

        So, no, you're wrong. Clarke's third law is correct, and always shall be.

      • I disagree.

        Because you believe there is no magic, you say that.

        On the other hand, if we wanted to be scientific, we would assume there is magic, and we would try to distinguish this new technology from magic, finding either that it is technology or that it is magic.

        As long as you have not a single hint what kind of technology it is, your scientific method won't help you formulate any theory or execute any tests on that theory. Therefore it is not distinguishable from magic. Plain and simple.

        However when you have

  • A 30 MB google docs document. Oh joy. It even appears to break my ipad. Yes, it's worth reading, but would it kill you to write an interesting summary? Even a pithy one, such as "by 2020, the energy costs associated with moving bits around will exceed the costs of actually processing them."

    • by geekoid ( 135745 )

      It's not their fault you use an inferior device.

      • Maybe in 30 years an iPad will be useful for something more than Candy Crush Saga ... but I doubt it.
      • Inferior device? Nonsense. I happen to prefer Apple's "walled garden" to Google's "walled garden", but the world beyond those walls is often more interesting than what's contained within.

        It's just his slides, anyway. You can watch Simon give his lecture here [illinois.edu].

        • The audio quality leaves something to be desired,

        • by celle ( 906675 )

          " I happen to prefer Apple's "walled garden" to Google's "walled garden" "

          And both suck versus FreeBSD's walled garden.

          • Free BSD has a walled garden? Where?

            The problem here is that pdf is an open, standardized format. Google docs is not. Yes, I have a google docs account. A slashdot story should not require me to

            1. Log into my third party account, or create one.
            2. Discover that there is "No Preview" available.
            3. Download a 30 megabyte file

            All to comment intelligently on a slashdot story consisting of two sentences.

    • by Animats ( 122034 )

      A 30 MB google docs document.

      61 pages of rather blah presentation slides.

      And we used to think PowerPoint was a bloated format.

  • I wish there was more discussion on the interconnect and routing challenge of these systems. I used to work on an InfiniBand SubnetManager. Exascale will require more complex topologies and more complex routing. Does anyone think today's systems are up to the task?

    • by Jamu ( 852752 )
      That's the first thing I thought about at the mention of a "brick wall". What else is going to stop you building a super-computer that has N-times the processing power of an existing one?
  • by Kaenneth ( 82978 ) on Tuesday May 14, 2013 @12:21PM (#43721923) Journal

    And still a little fuzzy headed, but the first thing I thought of was arranging the racks for shortest maximum path: instead of one big football field sized room, stacking the datacenter into a cube shape... Then I thought, "That's probably why Borg ships are Cubes."

    • And the individual borg minds are essentially ASIC...
    • by gl4ss ( 559668 )

      sphere.

      anyhow, there were reasons for some crays to be shaped like they were.

      • A sphere of cubical modules. Individually, spheres are probably not the optimal shape for the CPUs and blades of the supercomputer. Across a large supercomputer, though, a spherical shape gives the shortest worst-case distance from center to edge, and so the lowest maximum latency.
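
The cube-versus-sphere point above can be checked with a little geometry. The sketch below compares the farthest point from the center for a cube and a sphere of equal volume. It assumes straight-line distances and ignores how cables are actually routed, and the 1000 cubic metre volume is an arbitrary example.

```python
import math


def cube_center_to_corner(volume: float) -> float:
    """Distance from the center of a cube to its farthest point (a corner)."""
    side = volume ** (1.0 / 3.0)
    return side * math.sqrt(3.0) / 2.0


def sphere_radius(volume: float) -> float:
    """Distance from the center of a sphere to its surface."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


if __name__ == "__main__":
    volume_m3 = 1000.0  # arbitrary machine-room volume for illustration
    cube = cube_center_to_corner(volume_m3)
    sphere = sphere_radius(volume_m3)
    print(f"cube:   farthest point {cube:.2f} m from center")
    print(f"sphere: farthest point {sphere:.2f} m from center")
    print(f"sphere/cube ratio: {sphere / cube:.2f}")
```

For the same volume the sphere's worst-case distance is roughly 72% of the cube's, independent of the volume chosen.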

  • The Nanosecond (Score:5, Interesting)

    by wcrowe ( 94389 ) on Tuesday May 14, 2013 @12:57PM (#43722417)

    Back in the early 80's I got the opportunity to hear Grace Hopper [wikipedia.org] speak. One of the stories she used to like to tell at her talks was about the time that she was having trouble visualizing a nanosecond. Eventually she sent a memo to her engineers which said, "Please send up one nanosecond." She waited, curious as to how they would respond. After a couple of days a response came back in the form of a metal rod 11-3/4 inches in length with the note attached, "One Nanosecond", and no other explanation. After puzzling over the metal rod she called down to the engineering department and asked, "I give up, what is it"? "That's the distance light travels in a nanosecond", was the response. Later, she sent another memo to the engineers with the request, "Please send up one picosecond." The engineers immediately responded with a memo instructing her to, "put the nanosecond in a pepper grinder and you can make picoseconds all over your desk."

    Grace Hopper's humorous anecdote underlines the serious problems faced by researchers when they push the boundaries. In her case, it was a real concern over how far a bit can travel at the speed of light. I have no idea if that has any bearing on the exascale problem, but it might illustrate the kinds of problems they might be running into.
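
The length of the "nanosecond" rod in the anecdote is easy to verify. The sketch below uses the vacuum speed of light; signals in real cables or fiber travel slower than this, so the figures are a lower bound on delay. The 100 m room length is a hypothetical example.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
METRES_PER_INCH = 0.0254


def light_distance_m(seconds: float) -> float:
    """Distance light travels in a vacuum in the given time."""
    return SPEED_OF_LIGHT_M_PER_S * seconds


def light_delay_ns(distance_m: float) -> float:
    """Minimum one-way signal delay over a distance, in nanoseconds."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S * 1e9


if __name__ == "__main__":
    rod_inches = light_distance_m(1e-9) / METRES_PER_INCH
    print(f"one nanosecond of light: {rod_inches:.1f} inches")
    # Hypothetical machine room 100 m across.
    print(f"crossing 100 m takes at least {light_delay_ns(100.0):.0f} ns")
```

One nanosecond works out to about 11.8 inches, which matches the 11-3/4 inch rod, and a signal needs at least a few hundred nanoseconds to cross a 100 m room.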

  • so what? (Score:5, Insightful)

    by markhahn ( 122033 ) on Tuesday May 14, 2013 @01:05PM (#43722489)

    I'm an HPC professional, and do not see much value in these "hero" machines. Yes, you can go on all you want about the march of progress and tier-1 and grand challenges, but you're just reiterating an unquestioned manifest destiny-based view of history. Why do we need an Exaflop machine? is it because some particular set of applications need it? where is the threshold for those applications where the compute facility will be fast enough to achieve some breakthrough?

    it's hard to find areas that are primarily limited by compute facilities. for instance, genetics/proteomics/metabolomics/whatever are *not* compute-limited, especially at the high end. they're laboratory-limited, the same way weather simulations are good and getting better, but not past the quality of their input data.

    we need more compute in general, but not necessarily in one machine. a single exaflop machine will cost much more than a thousand petaflop machines. letting a thousand flowers bloom is much prettier than one excruciatingly beautiful flower...

    and no, hero machines do not provide an efficient way to improve the tech of lesser or later machines. they have to be justified by their own need.

    • you are silly. systems biology modeling of cells will require exascale computing, so will simulations in chemistry of millions or more atoms for hundredths of a second or more. Lattice simulations for physics are demanding them too.

      • by dkf ( 304284 )

        Systems biology modeling of cells will require exascale computing

        No, it won't because we won't be modeling objects as large as cells at the atomic level. Instead, we will use lots of coupled coarser models, saving the finer ones for parts where "interesting" things are happening (e.g., at membrane interfaces). People are already doing this sort of thing, but at a very coarse scale and with only very limited numbers of fine simulations.

        Of course, I happen to think that the really interesting things happen when you scale up to modeling a whole tissue, or a whole organ, or

    • Re:so what? (Score:4, Insightful)

      by Nite_Hawk ( 1304 ) on Tuesday May 14, 2013 @02:03PM (#43723171) Homepage

      I'm an HPC professional too.

      I don't totally disagree with your premise, but what the heck are you doing talking about genetics and proteomics in reference to giant supercomputers? If you know anything about proteomics codes, you know that the commonly used search engines like sequest and mascot were never designed to run on systems like that. Hell, they barely run on small clusters and yet people are getting enough science done that they just don't care. That doesn't mean that it's hard to find problems that need supercomputers though.

      If you want to talk about the really big systems, you are talking about things like nuclear weapons simulations, astrophysics, molecular dynamics, and quantum mechanics. There are only a handful of guys that will actually make really good use of those systems and scores of folks that would otherwise be perfectly fine running on significantly smaller ones. Having smaller jobs backfill on the big machines when the really hardcore guys are off doing something else isn't such a bad situation though. It lets you get the big science done and still keep the machines being used efficiently in the interim.

      Beyond that, just because some researchers aren't scaling their codes to those levels yet doesn't mean we should give up on big systems. There will always be people pushing the envelope and others playing catch up. Our job is to help the slow guys scale their codes when possible so they can do even better and more intensive science. Yes, not all problems require the big systems, but there are many that do, many that can be made to scale even when they don't appear to at first, and others that can serve as backfill to keep the systems busy. They have their place just as smaller clusters, cloud resources, and big data resources do.

      • If you want to talk about the really big systems ... There are only a handful of guys that will actually make really good use of those systems and scores of folks that would otherwise be perfectly fine running on significantly smaller ones.

        That's what they all say. Don't worry, that's plenty for me! (until next year). Five computers are enough for the world. 640k ought to be enough for anybody. Of course I suppose a logical progression would be to get an Exaflop machine running before figuring out how to make one for the high school science lab.

        • Considering how much more powerful my phone is than supercomputers of 30 years ago, I can only imagine that in 2043 the iPhone 17QX will require multi-petaflop performance to create holographic picture and sound and touch. (And you'll have to hope it is available through Mobile Safari, 'cause Apple still won't allow porn apps.)

          • I can only imagine that in 2043 the iPhone 17QX will require multi-petaflop performance to create holographic picture and sound and touch.

            Considering how faster hardware always seems to lead to less efficient software, it'll probably need 1 petaflop just to flash an LED.

    • Some problems literally can't be parallelised.

      • The problem itself can't. However, you can solve many problems of the same kind at the same time in parallel. (That actually is what most supercomputers do these days.)
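
A minimal sketch of the point made above: each run below is an inherently sequential iteration, but many independent runs can proceed side by side. The recurrence, the function names, and the parameter values are invented for illustration. The thread pool only stands in for separate nodes; in CPython a process pool or MPI ranks would be needed for a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def iterate_logistic(x0: float, steps: int = 1000, r: float = 3.7) -> float:
    """Sequential recurrence: step k cannot start before step k-1 is done."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def run_many(initial_values, max_workers: int = 4):
    """Run one independent sequential job per initial value."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the results in the same order as the inputs.
        return list(pool.map(iterate_logistic, initial_values))


if __name__ == "__main__":
    starts = [0.1, 0.2, 0.3, 0.4]
    for start, result in zip(starts, run_many(starts)):
        print(f"x0={start} -> {result:.6f}")
```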

    • Why do we need an Exaflop machine?

      If you build it, they will come.
      "640K ought to be enough for anyone." etc.

  • by Murdoch5 ( 1563847 ) on Tuesday May 14, 2013 @01:57PM (#43723113) Homepage
    I'm pretty sure at one point, someone stood up in a meeting and said "No one will ever make a 1MB memory chip" or "No one will ever achieve a 64 bit processor", so how about sit down and just wait.
    • I'm pretty sure at one point, someone stood up in a meeting and said "No one will ever make a 1MB memory chip" or "No one will ever achieve a 64 bit processor", so how about sit down and just wait.

      The author of the presentation didn't say we'd never get to Exaflops, just that it might take longer than anticipated. Second, the fact that some technologies have scaled incredibly well doesn't mean that all technologies do or that there are no limits. Chips are perhaps history's greatest example of a technology that scales well. However, we were also supposed to have flying cars and visit Jupiter by 2001. Sometimes the limits are practical rather than strictly technical. SST's were built designed in the 6

    • by Xyrus ( 755017 )

      You seem to be forgetting about the laws of physics. In fact, we are already hitting them. You can't shrink transistors much more or you get slapped with Schrodinger's cat. The interconnects are already using fiber optics. You can only put machines so close to one another. So on and so forth.

      When people have made claims before, it was due to either their idea of market forces or the limits of the current technology. Now, the actual physical limits are beginning to present roadblocks. Even if quantum computi

    • by tsotha ( 720379 )
      I remember sitting in a physics lecture where the professor assured us no computer would ever have more than about ten megabytes of RAM, since stray gamma radiation would cause bits to flip at an unacceptably high rate for larger memory "pools".
  • I don't see anything about this in the PDF, so I'll ask the Hive Mind here:

    How does this affect distributed computing efforts such as Folding@Home and the BOINC project?

    These have very little node-to-server and zero node-to-node communication. With F@H already on the petaFLOP scale I wouldn't think it all that unlikely that it would reach exaFLOP level in less than a decade if interest keeps up.
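
    As a rough check on that last estimate: going from petaFLOP to exaFLOP scale is a factor of 1000, so the required growth rate follows directly. The sketch below is plain arithmetic under the assumption of steady exponential growth; it says nothing about whether volunteer interest would actually sustain it.

```python
import math


def required_doubling_time_years(growth_factor: float, years: float) -> float:
    """Doubling time needed to grow by growth_factor within the given years."""
    return years / math.log2(growth_factor)


if __name__ == "__main__":
    # 1 petaFLOPS -> 1 exaFLOPS is a factor of 1000.
    for horizon in (10, 7):
        months = required_doubling_time_years(1000, horizon) * 12
        print(f"{horizon} years: capacity must double every {months:.1f} months")
```

    A factor of 1000 in ten years means doubling about once a year; reaching it by 2020 from 2013 would need a doubling roughly every 8.4 months.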
