Supercomputer Built With 8 GPUs

Posted by kdawson on Sat May 31, 2008 12:22 PM
from the let-the-games-begin dept.
FnH writes "Researchers at the University of Antwerp in Belgium have created a new supercomputer with standard gaming hardware. The system uses four NVIDIA GeForce 9800 GX2 graphics cards, costs less than €4,000 to build, and delivers roughly the same performance as a supercomputer cluster consisting of hundreds of PCs. This new system is used by the ASTRA research group, part of the Vision Lab of the University of Antwerp, to develop new computational methods for tomography. The guys explain the eight NVIDIA GPUs deliver the same performance for their work as more than 300 Intel Core 2 Duo 2.4GHz processors. On a normal desktop PC their tomography tasks would take several weeks but on this NVIDIA-based supercomputer it only takes a couple of hours. The NVIDIA graphics cards do the job very efficiently and consume a lot less power than a supercomputer cluster."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I guess... (Score:4, Funny)

    by LordVader717 (888547) on Saturday May 31 2008, @12:26PM (#23610779)
    They didn't have enough dough for 9.
  • Re-birth of Amiga? (Score:5, Interesting)

    by Yvan256 (722131) on Saturday May 31 2008, @12:30PM (#23610811) Homepage Journal
    Am I the only one seeing those alternative uses of GPUs as some kind of re-birth of the Amiga design?
    • by Quarters (18322) on Saturday May 31 2008, @01:03PM (#23611113)
      The Amiga design was, essentially, dedicated chips for dedicated tasks. The CPU was a Motorola 68XXX chip. Agnus handled RAM access requests from the CPU and the other custom chips. Denise handled video operations. Paula handled audio. This cpu + coprocessor setup is roughly analogous to a modern X86 PC with a CPU, northbridge chip, GPU, and dedicated audio chip. At the time the Amiga's design was revolutionary because PCs and Macs were using a single CPU to handle all operations. Both Macs and PCs have come a long way since then. 'Modern' PCs have had the "Amiga design" since about the time the AGP bus became prevalent.

      nVidia's CUDA framework for performing general purpose operations on a GPU is something totally different. I don't think the Amiga custom chips could be repurposed in such a fashion.

      • by porpnorber (851345) on Saturday May 31 2008, @04:28PM (#23612551)
        I think the parent was seeing the same situation a little differently. You ever code up Conway's Life for the blitter? Whoosh! Now CUDA does floating point where the Amiga could only do binary operations, and the GPU has a lot more control onboard, but the analogy is not unsound. After all, CPUs themselves didn't even do floating point in the old days (though of course they did do narrow integer arithmetic).
  • by arrenlex (994824) on Saturday May 31 2008, @12:34PM (#23610843)
    This article makes it seem like it is possible to use the GPUs as general purpose CPUs. Is that the case? If so, why don't NVIDIA or especially AMD/ATI start putting their GPUs on motherboards? At a ratio of 8:300, a single high-end GPU seems to be able to do the work of dozens of high-end CPUs. They'd utterly wipe out the competition. Why haven't they put something like this out yet?
  • by sticks_us (150624) on Saturday May 31 2008, @12:36PM (#23610863) Homepage
    Ok, probably a paid NVIDIA ad placement, but check TFA anyway (and even if you don't read, you gotta love the case). It looks like heat generation is one of the biggest problems--sweet.

    I like this too:

    The medical researchers ran some benchmarks and found that in some cases their 4000EUR desktop superPC outperforms CalcUA, a 256-node supercomputer with dual AMD Opteron 250 2.4GHz chips that cost the University of Antwerp 3.5 million euro in March 2005...

    ...and at 4000EUR, that comes to what (rolls dice, consults sundial) about $20000 American?
      • Re: (Score:3, Funny)

        by Anonymous Coward
        WHOOOOOSH - over your head it went.
  • Tomography (Score:5, Informative)

    noun a technique for displaying a representation of a cross section through a human body or other solid object using X-rays or ultrasound.


    In other news Graphics cards are good at . . . graphics.

    • Re:Tomography (Score:5, Insightful)

      by jergh (230325) on Saturday May 31 2008, @01:01PM (#23611093)
      What they are doing is reconstruction: basically analyzing the raw data from a tomographic scanner and generating a representation which can then be visualized. So it's more about numerical methods than graphics.

      And BTW even rendering the reconstructed results is not that simple, as current graphics cards are optimized for geometry, not volumetric data.
  • by bobdotorg (598873) on Saturday May 31 2008, @12:39PM (#23610891)
    ... 3D Realms announced this as the minimum platform requirements to run Duke Nukem Forever.
  • Finally... (Score:5, Funny)

    by ferrellcat (691126) on Saturday May 31 2008, @12:40PM (#23610901)
    Something that can play Crysis!
  • by poeidon1 (767457) on Saturday May 31 2008, @12:41PM (#23610917) Homepage
    This is an example of an acceleration architecture. Anyone who has used FPGAs knows that. Of course, making sensational news is all too common on /.
  • by gweihir (88907) on Saturday May 31 2008, @12:44PM (#23610957)
    It is also not difficult to find other tasks where, e.g., FPGAs perform vastly better than general-purpose CPUs. That does not make an FPGA a "Supercomputer". Stop the BS, please.
    • Re: (Score:3, Interesting)

      Aren't most supercomputers designed to perform some very specific tasks? You don't buy a supercomputer to run the Super edition of Excel.
  • Brick of GPUs (Score:5, Interesting)

    I love this picture: http://fastra.ua.ac.be/en/images.html [ua.ac.be]

    Between the massive brick of GPUs and the massive CPU heatsink/fan, you can't see the mobo at all.
    • They spent 4000 EUR for the computer, but use two boxes in order to situate the monitor higher. I guess they spent everything they had on the computer.
  • by bockelboy (824282) on Saturday May 31 2008, @12:54PM (#23611043)
    Wave of the Future? Yes*. Revolution in computing? Not quite.

    The GPGPU scheme is, after all, a re-invention of the vector processing of old. Vector processors died out, however, because there were too few users to support. Now that there's a commercially viable reason to make these processors (PS3 and video games), they are interesting again.

    The researchers took a specialized piece of hardware, rewrote their code for it, and found it was faster than their original code on generic hardware. The problems here are that you have to rewrite your code (High Energy Physics codebases are about a GB, compiled... other sciences are similar) and you have to have a problem which will run well on this scheme. Have a discrete problem? Too bad. Have a gigantic, tightly coupled problem which requires lots of inter-GPU communication? Too bad.

    Have a tomography problem which requires only 1GB of RAM? Here you go...

    The standard supercomputer isn't going away for a long, long time. Now, as before, a one-size-fits-all approach is silly. You'll start to see sites complement their clusters and large-SMP machines with GPU power as scientists start to understand and take advantage of them. Just remember, there are 10-20 years of legacy code which will need to be ported... it's going to be a slow process.
    • by 77Punker (673758) <spencr04 AT highpoint DOT edu> on Saturday May 31 2008, @01:03PM (#23611109)
      Fortunately, Nvidia provides a CUDA version of the basic linear algebra subprograms, so even if your software is hard to port, you can speed it up considerably if it does some big matrix operations, which can easily take a long time on a CPU.
    • Re: (Score:3, Interesting)

      The GPGPU scheme is, after all, a re-invention of the vector processing of old. Vector processors died out, however, because there were too few users to support. Now that there's a commercially viable reason to make these processors (PS3 and video games), they are interesting again.

      Since when have "vector processors died out"? The "Earth Simulator" for example used the NEC SX-6 CPU, currently the SX-9 is sold. Vector processors never died out and were in use for what they are best at. The GPU and the Cell

  • The price ! (Score:3, Funny)

    by this great guy (922511) on Saturday May 31 2008, @01:48PM (#23611427)
    The system uses four NVIDIA GeForce 9800 GX2 graphics cards, costs less than 4,000 EUR to build

    What's more crazy: calling something this inexpensive a supercomputer, or 4 video cards costing a freaking 4,000 EUR?
  • by Chris Snook (872473) on Saturday May 31 2008, @02:56PM (#23611977)
    I'm extremely curious to know where the performance bottleneck is in this system. Is it memory bandwidth? PCIe bandwidth? Raw GPU power? Depending on which it is, it may or may not be very easy to improve upon the price/performance ratio of this setup. Given that the work parallelizes very easily, if you could build two machines that are each 2/3 as powerful and each cost 1/2 as much, that's a huge win for other people trying to build similar systems.
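    A quick check of that hypothetical, as a minimal sketch: it assumes the work splits perfectly across machines (as the comment supposes) and uses the comment's illustrative 2/3 and 1/2 figures, which are not measurements. The function name is made up for illustration.

```python
from fractions import Fraction


def price_performance(relative_power, relative_cost):
    """Performance per unit of cost, relative to the original system."""
    return relative_power / relative_cost


# Hypothetical smaller machine: 2/3 the power at 1/2 the cost.
smaller = price_performance(Fraction(2, 3), Fraction(1, 2))

# Two such machines together, assuming the work splits perfectly.
pair_power = 2 * Fraction(2, 3)
pair_cost = 2 * Fraction(1, 2)

print("price/performance of one smaller machine:", smaller)
print("two smaller machines: power", pair_power, "at cost", pair_cost)
```

    Under those assumptions the pair costs the same as the original box and delivers 4/3 of its throughput, which is where the "huge win" comes from.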
    • by Anonymous Coward on Saturday May 31 2008, @12:29PM (#23610793)

      By what benchmark is eight of the NVIDIA GPUs in the 9800 GX2 more powerful than 300 2.4 GHz C2Ds?
      Looking at TFS: the benchmark is their own tomography code, which takes a couple of hours instead of weeks.
      • by cheier (790875) on Saturday May 31 2008, @02:04PM (#23611545)
        Too bad this isn't really news. I guess it is news if you consider that someone else has had their application accelerated by NVIDIA GPUs. I guess the only other reason that this could be news is by virtue of having 8 GPU cores.

        Unfortunately, this setup won't work ideally for a lot of other CUDA based applications. For the past 6 months, I had a system with 6 GPUs (actual physical GPUs). This is the system that I showed at CES [ocia.net]. We are easily able to do 8 physical GPUs, and now I've been solely focused on utilizing Tesla.

        Given that NVIDIA released the GX2 series, I was not surprised that someone would announce an 8GPU system. I'm surprised it took this long for someone to do it, and almost equally surprised that slashdot took this long to publish any news that is decent in the realm of GPU super computing. I've been cranking out close to 228 billion atom evals. per second in VMD [uiuc.edu] for months now, versus about 4 billion on dual quad core 3.0GHz Xeons.
    • by cromar (1103585) on Saturday May 31 2008, @12:33PM (#23610839)
      I am guessing it has something to do with floating point calculations vs. integer calculations, but if I read the article, this wouldn't be Slashdot, would it? Think about it. We have GPUs to perform vector maths, flops, etc. because the CPU is not all that great at that sort of thing typically. A general purpose CPU is not necessarily going to be the fastest if your problem domain is more suited to an "inferior" chip; general purpose CPUs are not designed to be the fastest chip in every situation.
      • by 77Punker (673758) <spencr04 AT highpoint DOT edu> on Saturday May 31 2008, @12:46PM (#23610985)
        The GPUs are better at floating point than integer; if I remember correctly it takes 4 cycles on current GPUs to do a float operation, but it takes 16 to do an int. No, I don't understand why.

        Also, the "multiply" and "add" instructions exist in a "madd" opcode which essentially doubles the theoretical floating point performance, even if you don't use "madd" very often.
        • by Calinous (985536) on Saturday May 31 2008, @01:54PM (#23611471)
          Because floating point operations go on a dedicated path, while the integer operations do not have a dedicated integer-only path.
          Also, it's possible that loading floating point operands and storing results in actual code can be pipelined, while integer operations are not pipelined.
            (and yes, I don't know what I'm talking about)
        • by AlecC (512609) <aleccawley@gmail.com> on Saturday May 31 2008, @04:30PM (#23612559) Homepage
          It takes 4 cycles to do a floating point operation, and 4 cycles to do an integer add/subtract. It takes 16 cycles to do an integer multiply because it only has a 24-bit hardware multiplier (needed to achieve the 4-cycle flops), so it has to do long multiplies as four madds. This was for the first generation CUDA GPUs; the second generation, which should be out by now, was going to have double length floating point and would be able to do 32 bit multiplies in the same four cycles.

          While they can do integer, these machines are not very happy with it, and I found it much easier to do everything in floating point, even if you are talking about 8-bit colour data. It goes no slower, and everything is much better adapted to floating point. Then there are special instructions to get back to integer at the output.

          While each operation takes 4 cycles, they are fully pipelined, so that it launches a new instruction per cycle, times 32 pipes per unit, times 8 units per GPU.

          And madd is very useful for the sort of tasks for which supercomputers are traditionally used.
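          To illustrate how a wide multiply can be built from four narrower multiply-adds, here is a rough sketch. The 16-bit split and the function name are assumptions chosen for clarity; the thread does not say how the hardware actually decomposes the operation.

```python
MASK16 = 0xFFFF


def mul32_via_partial_products(a, b):
    """Multiply two unsigned 32-bit integers using four 16x16-bit products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    # Four narrow multiplies; 16-bit operands fit a 24-bit-wide multiplier.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # Shift each partial product into place and accumulate.
    return lo_lo + ((lo_hi + hi_lo) << 16) + (hi_hi << 32)


for a, b in [(0, 0), (1, 1), (123456789, 987654321), (0xFFFFFFFF, 0xFFFFFFFF)]:
    assert mul32_via_partial_products(a, b) == a * b
print("all partial-product checks passed")
```

          Four dependent narrow operations in place of one would be consistent with the 16-cycle versus 4-cycle figures quoted above, though that link is an inference rather than something the thread confirms.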
      • Re: (Score:3, Insightful)

        When you get into inverting matrices, or doing matrix-vector multiplication, the algorithm parallelizes very easily, but I always wonder where the full matrices live, i.e. they could easily be tens of GBs of matrix, so the CPU would seem to have to be heavily involved as well.
      • by pablomme (1270790) on Saturday May 31 2008, @01:33PM (#23611319)
        As far as I know, GPUs are amazingly fast at matrix operations and other things allowing vectorized evaluation. I guess these tomography applications must make massive use of these. After all, tomography is in essence image processing.
        • by Calinous (985536) on Saturday May 31 2008, @02:04PM (#23611543)
          Even more: if you don't optimize the code specifically for the GPU-based supercomputer, your performance goes down the drain. I wouldn't be surprised if they obtained a speedup of an order of magnitude or more from the aggressive code optimisation.
          The idea is: the original code would run faster on an 8 Core2Duo machine than on the 8 GPUs. Even more optimising of the code will do little for the Core2Duos, due to limited memory bandwidth, FSB bandwidth, and so on.
          Meanwhile, a pipelined system (load, compute, store) optimised for the GPU benefits greatly from the huge bandwidth (50GB/s on current systems), the huge number of computation units (128 or more), and so on.
        • by tdelaney (458893) on Saturday May 31 2008, @04:57PM (#23612723)
          Sure - but at 4000 euros, you can afford to do a one-off purchase and write custom software for a limited application. The point of this is that if your application suits it, this is a very cheap way to get supercomputer performance without paying for your own supercomputer (cluster) or time on an existing one.
          • The point of this is that if your application suits it, this is a very cheap way to get supercomputer performance without paying for your own supercomputer (cluster) or time on an existing one.

            No doubt about it. In spite of my admittedly negative criticism, I applaud these guys because I think this shows the amazing potential of multicore parallel computing to bringing supercomputing power to the desktop and even to the laptop and the cellphone. However, this potential will not arrive unless we can find a w
    • by symbolset (646467) on Saturday May 31 2008, @12:35PM (#23610859) Journal

      By what benchmark is eight of the NVIDIA GPUs in the 9800 GX2 more powerful than 300 2.4 GHz C2Ds?

      By the benchmark that they solve the particular problem of this specific application in 1/300th of the time?

      • by Jaime2 (824950) on Saturday May 31 2008, @01:16PM (#23611199)
        I think the GP (and I) were objecting to the use of the fairly general word "power" and the use of this one problem as a "power benchmark". While it is obviously true that 8 GPUs are as fast as 300 C2Ds for this problem, this system isn't as fast as a supercomputer for most problems. All this does is point out that the recent trend of building supercomputers out of inexpensive general purpose CPUs may not be a good idea for all applications.
        • by symbolset (646467) on Saturday May 31 2008, @01:40PM (#23611379) Journal

          All this does is point out that the recent trend of building supercomputers out of inexpensive general purpose CPUs may not be a good idea for all applications.

          And... a screwdriver is not always a prybar. A tool's a tool - they have preferred usage but if your requirement is specific and you're creative enough, you can do some fine work outside of the tool's intended purpose. Like this guy. Kudos to him.

          Perhaps some more creative people finding this information will now discover if their specific requirements can be met by this interesting configuration. That will save them large quantities of cash or possibly enable some facility that was not previously available because supercomputers cost a grip-o-cash.

          Of course for general purpose supercomputing you would want to use modified PS3s [wired.com].

        • by mcrbids (148650) on Saturday May 31 2008, @06:44PM (#23613417) Journal
          Which is faster? A Lamborghini or a 5-ton flatbed truck?

          Depends on what you're after! If you are trying to get yourself from point A to point B, the Lamborghini is the obvious choice. But if you need to move 4.5 tons of stuff from point A to point B, the Lamborghini would suck ass when compared to the flatbed truck.

          It's just a question of what you are trying to accomplish. There is no absolute framework for "power" to solve problems, even if you define it fairly narrowly. For example, let's talk about 'pattern matching': A free database (like PostgreSQL) on cheap hardware can search through millions of records to deliver a query result in a tenth of a second. In that respect, Postgres is WAY faster than, say, the human brain. But the human brain will KICK ASS over just about any other technology out there in deciding whether or not a particular image contains a cat.

          Use the right tool for the job, and you'll be amazed at the results. That 8 GPUs handily outperform 512 CPU cores at a specific task is not surprising - the GPUs are designed from the beginning to solve the kind of problem that's needed!

          Personally, I'm surprised as to why there hasn't been more development behind the FPGA: are they just expensive?
      • Re: (Score:3, Insightful)

        Please, please, please do the math.

        8 GPUs are being compared to 300 CPUs. So the single GPU for this purpose isn't 300 times as powerful as the CPU.

        It is doing the operation in 1/37th the time approximately. This isn't news or unbelievable. GPUs are dedicated to performing certain types of tasks far better than a CPU.
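        The arithmetic behind that figure, spelled out as a small sketch using only the two counts from the summary (the variable names are just for illustration):

```python
cpus_matched = 300  # Core 2 Duo processors, per the story summary
gpus_used = 8       # GPUs across the four 9800 GX2 cards

# How many CPUs each GPU stands in for on this particular workload.
cpus_per_gpu = cpus_matched / gpus_used
print(cpus_per_gpu)  # 37.5
```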
    • by 77Punker (673758) <spencr04 AT highpoint DOT edu> on Saturday May 31 2008, @12:37PM (#23610879)

      By what benchmark is eight of the NVIDIA GPUs in the 9800 GX2 more powerful than 300 2.4 GHz C2Ds?
      By any SIMD problem. For reference, fire up a game that's capable of using a software renderer and do some sort of benchmark, then use the 3D hardware on the same benchmark. That's the difference between SIMD on hardware that is designed to do SIMD and SIMD on hardware that's designed to do everything (or in the case of the Duo, multitasking).
      • Re: (Score:3, Interesting)

        Just to expand on this stuff: Different tools are (obviously) designed for different workloads. I have a project I was contemplating porting to the Cell. Unfortunately only 40% of my performance bottleneck could take advantage of SIMD, but that 40% could have taken advantage of an enormous number of SIMD instructions just like the workload from TFA.

        The other critical 40% of my project would have gained absolutely nothing from SIMD and on the Cell would have lost time due to branches. In this case 300
      • by Xyrus (755017) on Saturday May 31 2008, @11:25PM (#23614693) Journal
        You're being overly simplistic.

        In order to utilize this "super computer", your problem has to be refactored in such a way that it can utilize the hardware efficiently. This can either be fairly easy or incredibly difficult depending on the problem, tool-set available, etc.

        Their benchmark is good for them, but it is most likely meaningless to the general super-computing community. Porting something like LINPACK over and running that as a benchmark however would give a whole lot more insight into what kind of performance boost a typical scientific app might gain from said hardware.

        Nice to see someone utilizing this functionality though.

        ~X~
    • by hansraj (458504) * on Saturday May 31 2008, @12:39PM (#23610889)
      As far as my understanding goes, comparing a GPU's performance to a CPU's performance is very, very task dependent, and the comparison with 300 CPUs should not be taken to mean that an 8-GPU system is more powerful than a 300 Core 2 Duo system in general.

      If the application requires solving a small task many times over and all of these tasks can be done in parallel, then using a GPU works great because a GPU has many cores, each of which can handle a simple routine. Also the GPU is designed to spend very little time on the way code is handled (load, switch etc) and spend more time actually running the code (hence the requirement of only very simple functions).

      Such problems frequently arise in tomography, physics, astronomy etc and I hear GPUs are a great success in these areas. But don't hold your breath for running your favorite distro blazingly fast using GPUs.
      • by TheThiefMaster (992038) on Saturday May 31 2008, @02:23PM (#23611701)
        The 9800 GX2's GPUs have 128 1.5GHz "shader processors". 8 of these is like having 1024 vector-processing-specialised processor cores at your command.

        I could easily believe that it performed comparably to 300 2.4GHz Core 2 Duos (aka 600 "over 1.5x faster but not vector-specialised" cores).

        Theoretical performance is 576 GFLOPS per 9800 GX2 GPU (4.608 TFLOPS total) vs 19.2 GFLOPS per Core 2 CPU (5.760 TFLOPS total). However in tests the Core 2 gets as low as 6 GFLOPS instead of its 19 theoretical, and the 9800 GPU gets a lot closer to its full power.
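        Those totals can be reproduced with a few lines, shown here as a sketch. The per-chip peaks and the 6 GFLOPS measured figure are taken from the comment above; the breakdown of each peak into cores, clock, and operations per cycle is an assumption that happens to match the quoted numbers and is not stated in the thread.

```python
# Figures quoted in the comment above.
GPU_PEAK_GFLOPS = 576.0
CPU_PEAK_GFLOPS = 19.2
CPU_MEASURED_GFLOPS = 6.0  # "as low as", per the comment
GPU_COUNT = 8
CPU_COUNT = 300

# Assumed breakdown of the per-chip peaks (not stated in the thread).
assumed_gpu_peak = 128 * 1.5 * 3  # shader cores * GHz * flops per cycle
assumed_cpu_peak = 2 * 2.4 * 4    # cores * GHz * flops per cycle

gpu_total_tflops = GPU_PEAK_GFLOPS * GPU_COUNT / 1000
cpu_total_tflops = CPU_PEAK_GFLOPS * CPU_COUNT / 1000
cpu_measured_tflops = CPU_MEASURED_GFLOPS * CPU_COUNT / 1000

print(f"GPU peak total:     {gpu_total_tflops:.3f} TFLOPS")
print(f"CPU peak total:     {cpu_total_tflops:.3f} TFLOPS")
print(f"CPU measured total: {cpu_measured_tflops:.3f} TFLOPS")
```

        On paper the 300 CPUs come out ahead; it is only with the lower measured CPU figure that the eight GPUs look comparable or better, which is the point the comment is making.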