Supercomputing Hardware

Inside Tsubame, Japan's GPU-Based Supercomputer

Startled Hippo writes "Japan's Tsubame supercomputer was ranked 29th-fastest in the world in the latest Top 500 ranking, with a speed of 77.48 TFlops (trillion floating-point operations per second) on the industry-standard Linpack benchmark. Why is it so special? It uses NVIDIA GPUs. Tsubame includes hundreds of graphics processors of the same type used in consumer PCs, working alongside CPUs in a mixed environment that some say is a model for future supercomputers serving disciplines like material chemistry." Unlike Nvidia's GPU-based Tesla personal supercomputer, Tsubame definitely won't be mistaken for a personal computer.
  • by cashman73 ( 855518 ) on Thursday December 11, 2008 @07:32PM (#26084485) Journal
    Imagine what a Beowulf cluster of these could do! Oh, wait! ;-)
  • On reading the article, the box has 30 thousand cores, of which the vast majority are AMD Opterons in Sun boxes. No mention of how/in what you'd program this to actually put the GPUs to good use.

    • by timeOday ( 582209 ) on Thursday December 11, 2008 @08:14PM (#26084889)

      No mention of how/in what you'd program this to actually put the GPUs to good use.

      That's why the supercomputer rankings are based on reasonably complex benchmarks instead of synthetic "cores * flops/core" types of numbers. Scoring well on the benchmark is supposed to be solid evidence that the computer can in fact do something useful. My question though is whether the GPUs contributed to the benchmark score, or were just along for the ride.

      • by ceoyoyo ( 59147 )

        As I recall, GPUs and other vector type processors do quite poorly on Linpack, so probably not.

        • by lysergic.acid ( 845423 ) on Thursday December 11, 2008 @11:47PM (#26086523) Homepage

          How would data parallelism negatively affect a test that is designed to measure a system's performance in supercomputing applications--a field dominated by problems that involve processing extremely large data sets?

          If vector processors did in fact perform poorly on LINPACK benchmarks, that would mean LINPACK performance is not a good indicator of real-world performance. But that clearly isn't the case: vector processors consistently perform quite well in LINPACK suite measurements [hoise.com].

          Vector processing began in the field of supercomputing, which during the 1980s and 1990s was essentially the exclusive realm of vector processors. It wasn't until companies started designing and building supercomputers from commodity processors (P4s, Opterons, etc.) to save money that general-purpose scalar CPUs began to replace specialized vector processors in high-performance computing. But now companies like Cray and IBM [cnet.com] are starting to realize that this change was a mistake.

          Even in commodity computing, the momentum is shifting away from general-purpose scalar CPUs towards specialized vector coprocessors like GPUs, DSPs, array processors, stream processors, etc. When you're dealing with things like scientific modeling, economic modeling, and engineering calculations, you need to crunch large data sets using the same operation; this is best done in parallel using SIMD. Using specialized vector processors (and instruction sets), you can run these applications far more efficiently than you could with a scalar processor running at much higher clock speeds. The only downside is that you lose the advantage of commodity hardware that is cheap because it is produced in high volume. But if companies like Adobe start developing their applications to employ vector/stream coprocessors, that will boost the adoption of these vector processors in the commodity computing market, which will increase production volume and lower manufacturing costs.
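
          To make the SIMD point concrete, here is a minimal sketch in CUDA (hypothetical names, not code from the Tsubame project): the serial loop "for (i = 0; i < n; i++) y[i] = a*x[i] + y[i];" becomes a tiny kernel in which thousands of threads apply the identical operation, each to a different element of the data set.

              // Data-parallel version of: for (int i = 0; i < n; i++) y[i] = a*x[i] + y[i];
              // Each thread handles exactly one element; the GPU runs thousands at once.
              __global__ void scale_add(int n, float a, const float *x, float *y)
              {
                  int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
                  if (i < n)                                      // guard the final, partially full block
                      y[i] = a * x[i] + y[i];
              }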

    • by raftpeople ( 844215 ) on Thursday December 11, 2008 @08:49PM (#26085221)

      On reading the article, the box has 30 thousand cores, of which the vast majority are AMD Opterons in Sun boxes. No mention of how/in what you'd program this to actually put the GPUs to good use.

      You may want to read the article again; if not, here's a recap:
      655 Sun boxes, each with 16 AMD cores = 10,480 CPU cores
      680 Tesla cards, each with 240 processors = 163,200 GPU processors

      As for how to use the GPUs: I use my GTX 280 (almost the same thing as a Tesla) to crunch through lots of numeric calculations in parallel. I'm sure these guys are doing the same thing, as that is the strength of the GPU. NVIDIA has made it easier to access the processing power of the GPU with CUDA: you create a program in C that gets loaded onto the GPU, and when you launch it you tell it how many copies to run at one time; each copy typically operates on a different portion of the data. Because you can launch more threads than there are processors, the GPU can be reading data in from global video memory for some threads while other threads are performing calculations.
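
      A minimal sketch of that pattern, assuming CUDA and hypothetical names (square, d_in, d_out, etc.), with error checking omitted. Far more threads are launched than the GPU has processors, each thread handles one element, and the launch configuration is where you say how many copies run at once:

        #include <cuda_runtime.h>
        #include <stdio.h>
        #include <stdlib.h>

        // Every thread squares one element of the input array.
        __global__ void square(const float *in, float *out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)
                out[i] = in[i] * in[i];
        }

        int main(void)
        {
            const int n = 1 << 20;                    // ~1M elements: far more threads
            const size_t bytes = n * sizeof(float);   // than the GPU has processors

            float *h_in = (float *)malloc(bytes), *h_out = (float *)malloc(bytes);
            for (int i = 0; i < n; ++i) h_in[i] = (float)i;

            float *d_in, *d_out;
            cudaMalloc((void **)&d_in, bytes);
            cudaMalloc((void **)&d_out, bytes);
            cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

            // Launch configuration: how many copies of the kernel to run.
            const int threadsPerBlock = 256;
            const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
            square<<<blocks, threadsPerBlock>>>(d_in, d_out, n);

            cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
            printf("out[3] = %f\n", h_out[3]);        // prints 9.000000

            cudaFree(d_in); cudaFree(d_out);
            free(h_in); free(h_out);
            return 0;
        }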

      • This makes plenty of sense. I've personally dealt with several IBM BlueGene supercomputers (more than 200,000 cores) that didn't perform anywhere near this well.

        The GPUs definitely made a huge difference in this case.
  • Clever name (Score:5, Funny)

    by subStance ( 618153 ) on Thursday December 11, 2008 @07:48PM (#26084657) Homepage

    Ironic name: "tsubame" means sparrow in Japanese, and also has the slang usage of toy-boy (as in a cougar's toy-boy).

    Not sure what to read into that ...

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Tsubame is actually 'swallow', not 'sparrow', which is suzume.

      • Re: (Score:3, Funny)

        by TeknoHog ( 164938 )

        Tsubame is actually 'swallow',

        Is that an African, a European, or an Asian swallow?

    • by CODiNE ( 27417 )

      I'm imagining Pirates of the Caribbean in Japanese... featuring the lovely Captain Jack Boy-Toy. Fitting.

    • by Saffaya ( 702234 )

      Tsubame is also a female first name. And a nice one at that.
      No need to dig further than that imo.

    • as in a cougar's toy-boy

      You say that as though we're supposed to know what it means...

        • Cougar is an American idiomatic term for a sexually active older woman who actively looks for younger males. "Toy-boy" is usually written as boy-toy, and refers to those young males who are selected specifically for sexual fun.
  • What is a GPU? (Score:3, Interesting)

    by hurfy ( 735314 ) on Thursday December 11, 2008 @08:03PM (#26084783)

    When it has no graphics out? Is it still a GRAPHICS Processing Unit when it doesn't calculate any graphics and doesn't display any graphics? Huh? ;)

    They have a whole lot of these boosting a whole lot of quad-cores.

    • Comment removed based on user account deletion
    • by mikael ( 484 )

      They want the GPUs for their number-crunching ability. Since each GPU would be working on a small portion of the simulation being processed, you are going to need a separate system to fetch whatever item of data you want to visualize. This system is going to have to talk to every GPU in order to get this data and render it.

    • I thought of GPUBAP (GPU-based auxiliary processor), but that seems unwieldy. Maybe matrix-oriented processor [wikipedia.org] (MOP) or vector+matrix-oriented processor (VMOP)? I dunno.
  • I think it's only a matter of time before many of these clusters start using all the processing power available to them; hell, even desktops and whatever app you build should detect and use your GPU! If compilers got even smarter, they could automatically route calculations the GPU can do faster to the GPU, and otherwise just use one of the other available cores. This *should* be the future, imo.
    • Nvidia/ATI and a bunch of others just built an open spec (library?) that will allow this to happen.

    • Re:Ofcourse (Score:5, Informative)

      by dgatwood ( 11270 ) on Thursday December 11, 2008 @08:27PM (#26085017) Homepage Journal

      Indeed, that's the whole idea behind the recently ratified OpenCL [wikipedia.org] specification: design a C-like language that provides a standard abstraction layer for performing complex computations on a CPU, GPU, or conceivably any number of other devices lying around (e.g., idle I/O processors, the DSP core in your WinModem, your printer's raster engine...).

      • by maxume ( 22995 )

        I thought the whole point of a WinModem was that there wasn't a DSP in it (and that junky printers don't have raster engines; it's in the driver).

        • by bucky0 ( 229117 )

          Wow, that nit you managed to pick is tiny.

          • by maxume ( 22995 )

            I'm not nit picking. The other post is pretty confident in what it says, so I'm actually curious if I am misinformed.

        • You're right, WinModems don't have DSPs. I don't know about printers without rasterizing engines being junky; some may be. I haven't heard much about this issue lately. Frankly, I don't know if some of my printers have them or not. I know I have one that supports PCL 6, but it was a high-end business printer when it was new. DSPs can be a bit expensive, so leaving them out can make some printer tech more affordable. I think the main objection now might be that they didn't support a printing standard, so there was o

          • Comment removed based on user account deletion
            • by Tycho ( 11893 )

              On the other hand, there is the DSP core in Creative X-Fi cards (not that anyone should own one). Modern TV tuner cards have MPEG-2 encoding units; these must be worth something. Higher-end, professional video hardware like HD video capture cards and real-time video effects rendering cards often have Xilinx FPGAs, most of which probably have a built-in POWER CPU core. In this case, the CPU and the programmability of the FPGA are useful. Actually useful SATA RAID cards that support RAID 5 and RAID 6, like

          • by dgatwood ( 11270 )

            Perhaps the WinModem thing was a poor example, but according to this post [osdir.com], some of them do have DSP hardware but lack a hardware UART. Whether that poster was correct or not, I'm not sure, but it's consistent with my vague memory on the subject. In any case, that's straying pretty far from the subject at hand. :-)

  • The missing numbers (Score:3, Informative)

    by Anonymous Coward on Thursday December 11, 2008 @09:16PM (#26085447)

    Just to get some perspective: the GPUs provide about 10 of the 77 TFLOPS benchmarked in LINPACK, per this HPC article [sun.com].

  • ATI's latest cards give more punch for the cost apiece, and they are designed specifically for being clustered/linked/xfired and whatnot.
    • by Jeff DeMaagd ( 2015 ) on Thursday December 11, 2008 @09:47PM (#26085737) Homepage Journal

      ATI's latest cards give more punch for the cost apiece, and they are designed specifically for being clustered/linked/xfired and whatnot.

      I thought the nV Teslas were designed for HPC.

      Performance goes up and cost comes down so quickly that something like that can easily happen between the time a system is ordered and the time it's installed.

    • Re: (Score:3, Insightful)

      by Molochi ( 555357 )

      They could do it cheaper with anything at the current price. However, this wasn't just slopped together last month with the latest hardware off newegg.

      No doubt, there's a SC being built up right now around all the latest AMD parts. By the time it gets benchmarked, we'll be able to complain that something else is a better deal.

  • A Tesla comes in two form factors: a PCI Express card, or a rack-mount 1U system that contains 4 of the Tesla cards and connects to a server or cluster node via two PCIe cards. Not sure how you could confuse that with a PC. Also, I was just at a conference with the gentleman in charge of Tsubame, and if I recall correctly they had some of the 1U Tesla systems in the cluster, although they may have used high-end graphics cards too - they may have only had a limited number of the rack-mount Tesla systems f
  • by marciot ( 598356 ) on Thursday December 11, 2008 @09:47PM (#26085739)
    What makes a supercomputer *a* supercomputer, as opposed to a network of not-necessarily-super computers which all happen to be in the same building and connected to the same high-speed network? By the way this is described, it certainly seems to be a network of many computers working together, rather than one single almighty computer.
    • It's just super, thanks for asking!
    • by mcrbids ( 148650 )

      Well, IANASE (I Am Not A Supercomputer Expert), but I *am* a programmer....

      I'm assuming that you have a supercomputer when all those otherwise individual computers are working together in a coordinated fashion on a common problem.

      A great example of a supercomputer is SETI @ Home [berkeley.edu] which easily meets the definition of a "supercomputer" in many (most?) circles, although they usually refer to it as "distributed computing".

      • by nerk88 ( 204690 )

        The usual distinction between a supercomputer (that may be a cluster) and distributed computing is that in a supercomputer, all the individual computers are under central control. In a distributed computing environment you control your computer and provide resources to someone else's cluster.

        The difficulty arises because so many people use similar phrases for slightly different things. You can argue that the second you have more than one processor you are in a 'distributed' computing environment as you are

    • Re: (Score:3, Informative)

      by dlapine ( 131282 )

      Wikipedia claims that a supercomputer "is a computer at the forefront of current processing capability" http://en.wikipedia.org/wiki/Supercomputer/ [wikipedia.org]. The top500 list implies that a supercomputer is a system that can run Linpack really fast, while noting that the system must also be able to run other applications. http://www.top500.org/project/introduction [top500.org]

      Given that NCSA has run many supercomputers over the years, and that I've personally run three while working there, I'd say that a good rule of thumb is that

  • Can I run Crysis now?
    • According to TFA, the whole point of this experiment was to see if it was possible to run Crysis at the highest settings at maximum resolution with FSAA and anisotropic filtering.
  • With this much computing power, one should be able to use higher math to determine the optimal times to invest in the stock market and take advantage of trends. Unfortunately, since things are headed downward, this technology can most efficiently be used to help you lose money at the optimum rate. Actually I have an idea, but I have to work with CUDA more before I know if it is real. Unfortunately, trying to put these cards in Mac Pros is problematic. You would think Apple would have
    • Successful traders don't use higher maths. You can't beat the market with maths, because the market is a complex adaptive system that cannot be predicted. You can, however, find out some likely scenarios using your insight, which is what successful traders use.
      • Actually I was trying to make a sort of joke, but I thank you for telling me the technical term for the kind of system the stock market is. I didn't know that and found it interesting. I had this idea that a polynomial with as many terms as there are stocks could be created, with a core for each stock... and in some way it might be useful. An approach that could only be taken practically with hardware of an unusually parallel nature. But limitations in my understanding of math and statistics keep me
