Intel Supercomputing Hardware Linux

Cray Unveils XC30 Supercomputer

Nerval's Lobster writes "Cray has unveiled an XC30 supercomputer capable of high-performance computing workloads of more than 100 petaflops. Originally code-named 'Cascade,' the system relies on Intel Xeon processors and Aries interconnect chipset technology, paired with Cray's integrated software environment. Cray touts the XC30's ability to utilize a wide variety of processor types; future versions of the platform will apparently feature Intel Xeon Phi and Nvidia Tesla GPUs based on the Kepler GPU computing architecture. Cray leveraged its work with DARPA's High Productivity Computing Systems program in order to design and build the XC30. Cray's XC30 isn't the only supercomputer aiming for that 100-petaflop crown. China's Guangzhou Supercomputing Center recently announced the development of a Tianhe-2 supercomputer theoretically capable of 100 petaflops, but that system isn't due to launch until 2015. Cray also faces significant competition in the realm of supercomputer makers: it only built 5.4 percent of the systems on the Top500 list, compared to IBM with 42.6 percent and Hewlett-Packard with 27.6 percent."

  • by Jeremiah Cornelius ( 137 ) on Thursday November 08, 2012 @05:01PM (#41924097) Homepage Journal

    It's no Cray, unless it also doubles as stylish atrium furniture.

  • by Anonymous Coward

    That's almost enough to run Vista

    • by Anonymous Coward

      So then not enough left to play Crysis 2?

      • by Anonymous Coward

        But a beowulf cluster of those ALMOST could run it!

    • Yeah, if they are based on Xeon, shouldn't Vista run on them?
  • They've released the output of a raytracer, and little more by the looks of it.

    Things that don't exist are not "capable" of anything. (Well, unless you're of a religious persuasion...)
    • by suso ( 153703 ) *

      I'll be revealing my supercomputer that has finally broken the exaflop barrier in about an hour. (opens Blender)

  • While the article says they 'unveiled' it, it doesn't give any information about the hardware at all. I'm guessing it hasn't actually been built yet. Too bad. The Top 500 Supercomputers list is due to be updated this month.

    • Re: (Score:3, Informative)

      by whistl ( 234824 )

      The Cray website (http://www.cray.com/Products/XC/XC.aspx) has more details: 3072 cores (66 Tflops) per cabinet, initially, and the picture makes it look like they have 16 cabinets, making 49152 cores total. Amazing.

      • by Desler ( 1608317 )

        They'll need more than 16 if this is a 100 petaflop computer. So either you are looking at the wrong machine or there's a typo somewhere.

        • The statement is that the XC30 can _make it_ to 100 PF. Nobody will build a 100 PF machine (i.e., 1600 cabinets, 8x more than Jaguar) with this product line; there will be upgrades before then. 32k sq ft of machine room space and cooling is too expensive.
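
As a rough check of the cabinet figures in this sub-thread, here is a minimal Python sketch. It uses only the per-cabinet numbers quoted above (3072 cores, 66 Tflops) and assumes peak performance scales linearly with cabinet count, which ignores interconnect and cooling limits. The function names are made up for illustration.

```python
import math

# Figures quoted in the thread (per-cabinet numbers from the Cray product page).
CORES_PER_CABINET = 3072
TFLOPS_PER_CABINET = 66.0
TARGET_PFLOPS = 100.0


def cabinets_needed(target_pflops, tflops_per_cabinet):
    """Smallest whole number of cabinets whose combined peak meets the target."""
    return math.ceil(target_pflops * 1000.0 / tflops_per_cabinet)


def system_peak_pflops(cabinets, tflops_per_cabinet):
    """Combined peak in PFLOPS, assuming perfectly linear scaling."""
    return cabinets * tflops_per_cabinet / 1000.0


sixteen_cabinet_cores = 16 * CORES_PER_CABINET
sixteen_cabinet_peak = system_peak_pflops(16, TFLOPS_PER_CABINET)
needed = cabinets_needed(TARGET_PFLOPS, TFLOPS_PER_CABINET)

print(sixteen_cabinet_cores)  # 49152 cores in the pictured 16 cabinets
print(sixteen_cabinet_peak)   # about 1.06 PFLOPS
print(needed)                 # 1516 cabinets for 100 PFLOPS at 66 Tflops each
```

Under these assumptions the pictured 16 cabinets land at roughly 1 PFLOPS, and a 100 PFLOPS system would need on the order of 1,500 cabinets, which is in the same ballpark as the 1600 mentioned above.
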
    • by godrik ( 1287354 )

      Actually, Supercomputing is next week, so the ranking will probably be available on Monday!

  • Damn (Score:2, Funny)

    by Anonymous Coward

    That shit cray.

  • by michaelmalak ( 91262 ) <michael@michaelmalak.com> on Thursday November 08, 2012 @05:37PM (#41924635) Homepage

    In November, 2001, the fastest supercomputer was 12 TFlops [top500.org]. You can achieve that today for less than $5,000 on your desktop by ganging together four GPGPU cards (such as the 3 TFlops Radeon 7970 for less than $500 each). Go back to 1999 and it's only 3 TFlops and to match today you wouldn't even need a special motherboard.

    So just wait 11 years for the prices to come down.

    • by Anonymous Coward

      Supercomputers measure double precision FLOPS while the GPGPU vendors cheat and report single precision. And that doesn't take into account the ugly "kernel" programming needed for GPGPU and memory synchronization.

      • by michaelmalak ( 91262 ) <michael@michaelmalak.com> on Thursday November 08, 2012 @06:13PM (#41925025) Homepage

        Supercomputers measure double precision FLOPS while the GPGPU vendors cheat and report single precision.

        Ah, OK, Radeon is then 1 TFlop [rpi.edu] for double precision (which is new to the Radeon). So four Radeon 7970s beat the top 1999 supercomputer.

        • by bws111 ( 1216812 )

          Except that 1999 supercomputer was capable of doing real work. You have 4 fast GPUs sitting in a box, doing nothing. What is feeding them work, coordinating their inputs/outputs, etc? That is where all the hard work is.

          • What is feeding them work, coordinating their inputs/outputs, etc? That is where all the hard work is.

            OpenCL uses C99. It's tricky, maybe even "hard", but far from impossible.

            • by bws111 ( 1216812 )

              What I meant was, once you add in all the overhead of scheduling work, passing messages etc, you will find that you are running at a much slower speed than the raw speeds of the GPUs would have you believe. A GPU waiting for work, or memory access, or IO, or whatever is running at 0 FLOPS, regardless of how fast the processor is capable of running. If you can't keep those 4 GPUs running full speed doing actual work at all times, you have nothing near a 3 TFLOPS machine.

              • once you add in all the overhead of scheduling work, passing messages etc, you will find that you are running at a much slower speed than the raw speeds of the GPUs would have you believe

                Would you happen to know how that compares to real supercomputers?

                I don't have any first-hand experience with supercomputers. I've only heard and read that they also struggle against Amdahl's law.
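
For readers unfamiliar with Amdahl's law mentioned above: it bounds the speedup of a job when only part of it can run in parallel. A minimal Python sketch follows; the 95% parallel fraction is a made-up illustrative value, not a measurement from any system in this thread.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workload: 95% of the runtime parallelizes perfectly.
for n in (4, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))

# Even with unlimited processors the speedup cannot exceed 1 / 0.05 = 20.
```

With four processors the sketch gives a speedup of about 3.48 rather than 4, and the gap widens quickly as the processor count grows.
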

                • Re: (Score:1, Informative)

                  by Anonymous Coward

                  Well, with supercomputers, the benchmark in the TOP500 is LINPACK, which reports double-precision FLOPS. The theoretical peak is GHz * cores * floating-point ops per cycle, which gives GFLOPS. A regular supercomputer with CPUs should never be below 80% of the maximum theoretical performance; if it is, something is wrong. A well-tuned CPU cluster can get over 95% of the theoretical performance, a well-tuned GPU cluster around 60%.
                  Staying with a small scale ( 12 TFLops ), a real clu
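
The peak formula in the comment above can be sketched in a few lines of Python. The node configuration here (2.6 GHz, 16 cores, 8 double-precision ops per cycle) is a hypothetical example, not the spec of any machine discussed; the 80%, 95% and 60% thresholds are the ones the commenter gives.

```python
def peak_gflops(clock_ghz, cores, flops_per_cycle):
    """Theoretical peak: GHz * cores * floating-point ops per cycle."""
    return clock_ghz * cores * flops_per_cycle


def sustained_gflops(peak, efficiency):
    """LINPACK result you would expect at a given fraction of peak."""
    return peak * efficiency


# Hypothetical node: 2.6 GHz, 16 cores, 8 double-precision ops per cycle.
node_peak = peak_gflops(2.6, 16, 8)

print(node_peak)                          # roughly 332.8 GFLOPS
print(sustained_gflops(node_peak, 0.80))  # "something is wrong" below this
print(sustained_gflops(node_peak, 0.95))  # well-tuned CPU cluster
print(sustained_gflops(node_peak, 0.60))  # well-tuned GPU cluster, per the comment
```
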

              • by Meeni ( 1815694 )

                Correct in general, but extensive research in the last 5 years has led to many production codes today. GPU accelerators can indeed live up to (most of) their promises, and would typically reach 55 to 70% of peak in typical deployments (Tian-he is a good example, ~55% efficient). Top-notch designs can extract as much as 85% of peak in LINPACK; that is obtained by Sequoia, unveiled last year. We'll see how Titan will fare; it's the new supercomputer GPU giant that will be announced this year to replace the Jagua

          • The linpack yield of current generation GPU clusters is about 50% [mcorewire.com]. So while your point is valid, "doing nothing" is a rather large exaggeration. For that matter, 50% is the yield on a cluster, so the yield on a single-bus machine is almost certainly higher.


            From the following, it sounds like 1 teraflop (not theoretical, but on Linpack) is available on a desktop [gfxspeak.com], now or very soon:

            Intel has been working hard on its many-integrated core (MIC), which it describes as a 50+ core capable of one teraflops rea

            • by bws111 ( 1216812 )

              I stand by what I said, although maybe I worded it poorly. I did not mean that the config he proposed was incapable of doing work. I meant that the only way to achieve the speeds he is talking about is by doing no work (in other words, not benchmarking, just going by what the box says).

    • by etash ( 1907284 )
      Wrong. Top500 measures double precision performance, not single.
    • The ugly trick is interconnect performance, unless you aren't planning to scale up very much at all or have the (atypical) good fortune to be attacking nothing but hugely parallel problems.

      It's been a while since the supercomputer crowd found rolling their own esoteric CPUs to be worth it (with POWER the possible exception); but if all the silicon you want to devote to the problem won't fit on a single motherboard, you quickly enter the realm of the rather specialized.

      At very least, you are probably looking

      • At very least, you are probably looking at doing some networking as or more costly than a 10GbE setup

        There is no networking involved in a four-Radeon setup, just a special rackmount motherboard that has a dozen PCIe slots (because each Radeon is triple-width physically).

    • The Top500 reports actual performance as measured with LINPACK; hardware vendors report the theoretical performance of their chips, which in the case of GPUs is often quite a bit more than you'd be able to squeeze out with LINPACK.

      For comparison: Tsubame 2.0 consists of 1400 nodes with approx. 4200 NVIDIA Tesla C2075, which should yield -- according to your estimate -- 2.1 PFLOPS (4200 * 0.5 TFLOPS [nvidia.com]), yet it is listed at 1.2 PFLOPS [top500.org]. So just add two years to your estimate and you should be fine...
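
The Tsubame comparison above is simple enough to check in Python. The sketch uses only the numbers quoted in the comment (about 4200 GPUs at 0.5 TFLOPS double precision each, 1.2 PFLOPS listed) and counts only the GPUs toward peak, which is an assumption of this sketch since the host CPUs also contribute.

```python
GPU_COUNT = 4200        # approximate GPU count quoted in the comment
TFLOPS_PER_GPU = 0.5    # double-precision peak per GPU, as quoted
LISTED_PFLOPS = 1.2     # LINPACK result quoted from the Top500 list


def linpack_yield(measured_pflops, peak_pflops):
    """Fraction of theoretical peak actually delivered on LINPACK."""
    return measured_pflops / peak_pflops


gpu_peak_pflops = GPU_COUNT * TFLOPS_PER_GPU / 1000.0
tsubame_yield = linpack_yield(LISTED_PFLOPS, gpu_peak_pflops)

print(gpu_peak_pflops)          # 2.1 PFLOPS theoretical (GPUs only)
print(round(tsubame_yield, 2))  # 0.57, i.e. roughly 57% of peak
```

That ratio sits inside the 50 to 60% range other commenters cite for GPU clusters.
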

  • XC30 (Score:5, Funny)

    by Cid Highwind ( 9258 ) on Thursday November 08, 2012 @06:13PM (#41925033) Homepage

    "Originally named 'Cascade'" ... and now named for a midsize Volvo.

    It might not be the fastest supercomputer in the world, but at least it'll be safe.

    • by Anonymous Coward

      The Cray product may also be faster than the Volvo product!

  • But does it run linux?

    Imagine a beowulf cluster of those.

    And not first post!

  • That it runs Windows 8 nearly-acceptably

  • For a company with a market cap of less than half a billion to have made 1 in 20 of the Top500 is an extraordinary achievement. IBM -> $215 billion, HPQ ->$27 billion

    • Now let's ask how much power this computer will need. Let's say it can do a billion flops per watt. 100 petaflops is 100,000 trillion flops. A trillion flops is 1000 billion flops, so a trillion flops is 1000 watts at a billion per watt. So 100,000 trillion flops would be 100 million watts. So let's hope they can do at least 50 billion flops per watt; that would mean 20 watts per trillion flops, or 2 million watts. At 10 billion flops per watt it would be 5 times that, or 10 million watts. Now let's assume t
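
The power arithmetic in the comment above can be reproduced with a short Python sketch. The three efficiency figures (1, 10 and 50 billion flops per watt) are the commenter's hypothetical values, not published numbers for the XC30, and the function name is made up for illustration.

```python
TARGET_FLOPS = 100e15  # 100 petaflops


def power_megawatts(flops, flops_per_watt):
    """Power draw in MW for a given sustained rate and energy efficiency."""
    return flops / flops_per_watt / 1e6


for gflops_per_watt in (1, 10, 50):
    mw = power_megawatts(TARGET_FLOPS, gflops_per_watt * 1e9)
    print(f"{gflops_per_watt} GFLOPS/W -> {mw:.0f} MW")
```

This gives 100 MW, 10 MW and 2 MW respectively, matching the figures worked out by hand in the comment.
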
      • by Meeni ( 1815694 )

        $500 million is approx. the entire budget over the lifetime of the computer (including the electric bill, which is increasingly becoming the dominant cost to amortize). Typical build cost is around $100M.

        However, there is a false dichotomy in your comparison. The supercomputer is not designed to perform the job of 1 billion workstations. It is designed to perform a single task that could not be done on other machinery. Just like you cannot build a supertanker in a million bathtubs but need a shipyard, you

    • by Anonymous Coward

      It's not really an achievement but a business model :-) they have 17% of the top 100, that is just their "sweet spot"
      Cray is the Ferrari of Computing ....

  • Why does Cray still stick to Xeons? This would have been a perfect application for Itanium III, and they would have hit their petaflop goals easier
