NVIDIA's $10K Tesla GPU-Based Personal Supercomputer 236

Posted by timothy on Sunday November 23, 2008 @04:25AM from the plugs-into-standard-power-strip dept.

gupg writes "NVIDIA announced a new category of supercomputers — the Tesla Personal Supercomputer — a 4 TeraFLOPS desktop for under $10,000. This desktop machine has 4 of the Tesla C1060 computing processors. These GPUs have no graphics out and are used only for computing. Each Tesla GPU has 240 cores and delivers about 1 TeraFLOPS single precision and about 80 GigaFLOPS double-precision floating point performance. The CPU + GPU is programmed using C with added keywords using a parallel programming model called CUDA. The CUDA C compiler/development toolchain is free to download. There are tons of applications ported to CUDA including Mathematica, LabView, ANSYS Mechanical, and tons of scientific codes from molecular dynamics, quantum chemistry, and electromagnetics; they're listed on CUDA Zone."

This discussion has been archived. No new comments can be posted.

Graphics (Score:5, Funny)

by Anonymous Coward writes: on Sunday November 23, 2008 @04:34AM (#25863373)

Wow, that's some serious computing power! I wonder if anyone has thought of using these for graphics or rendering? I imagine they could make some killer games, especially with advanced technology like Direct 3D.

- Re: (Score:2, Funny)
  
  by GigaplexNZ ( 1233886 ) writes:
  
  I wonder if anyone has thought of using these for graphics or rendering?
  These are effectively just NVIDIA GT280 chips with the ports removed. Their heritage is gaming.
  I imagine they could make some killer games
  If you can find some way to get the video out to a monitor... but then you effectively just have Quad SLI GT280.
  especially with advanced technology like Direct 3D
  Uh... what? Direct 3D has been commonly used for years, you make it sound like some new and exotic technology. It is also effectively Windows only, whereas this hardware is more likely to use something like Linux.
  - Comment removed (Score:5, Funny)
    
    by account_deleted ( 4530225 ) * writes: on Sunday November 23, 2008 @05:29AM (#25863591)
    
    Comment removed based on user account deletion
    
  - Re:Graphics (Score:5, Funny)
    
    by Gnavpot ( 708731 ) writes: on Sunday November 23, 2008 @06:05AM (#25863697)
    
    "I wonder if anyone has thought of using these for graphics or rendering?"
    These are effectively just NVIDIA GT280 chips with the ports removed. Their heritage is gaming.
    
    We need a "+1 Whoosh" moderation option.
    No, I do not mean "-1 Whoosh". I want to see those embarrassingly stupid postings. But perhaps this moderation option should subtract karma.
    
    - Re:Graphics (Score:5, Funny)
      
      by GigaplexNZ ( 1233886 ) writes: on Sunday November 23, 2008 @07:15AM (#25863909)
      
      I suppose I'm one of those guys now. Hook, line and sinker.
      
    - Re: (Score:2, Interesting)
      
      by xonar ( 1069832 ) writes:
      
      So being naive to the ways of the world is bad karma now? I thought Buddhism stressed being free from the material things of the world.
    - Re: (Score:2)
      
      by aj50 ( 789101 ) writes:
      
      I'd suggest -1 since that's the most likely preference.
      It doesn't really matter which it is as you can add a modifier for each of the moderation types in your preferences (should you dislike reading funny posts or enjoy a good bit of flamebait.)
  - Re: (Score:2)
    
    by Provocateur ( 133110 ) writes:
    
    If you can find some way to get the video out to a monitor
    Yup, time to break out those ol' CGA monitors out from the garage...knew they'd come in handy again one day, and with Linux' oh-so-retro CLI mode, I'm set!
  - Re: (Score:3, Informative)
    
    by evilbessie ( 873633 ) writes:
    
    In much the same way, the current Quadro FX cards are based on the same chip as the gaming GeForce cards. Still, the most expensive gaming card is ~£400, but you'll pay ~£1500 for the top-of-the-line FX5700.
    It's because workstation graphics cards are configured for accuracy above all else, whereas gaming cards are configured for speed. A few wrong pixels do not affect gaming at all, but getting the numbers wrong in simulations is going to cause problems.
    Mostly the people who us
- Heck with games, I want a holodeck ! (Score:2)
  
  by OneInEveryCrowd ( 62120 ) writes:
  
  10 Gs ? I'd pay that.
Heartening... (Score:3, Interesting)

by blind biker ( 1066130 ) writes: on Sunday November 23, 2008 @04:37AM (#25863381) Journal

...to see a company established in a certain market, to branch out so aggressively and boldly into something... well, completely new, really.
Does anyone know if Comsol Multiphysics can be ported to CUDA?

- - Re:Heartening... (Score:5, Interesting)
    
    by mangu ( 126918 ) writes: on Sunday November 23, 2008 @06:10AM (#25863719)
    
    Can you imagine a Beowulf cluster of these?
    Yes, I can. My first thought when I saw the article was to calculate how many of them one would need to simulate a human brain in real time. The answer is: with 2500 of these machines one could simulate a hundred billion neurons with a thousand synapses each, firing a hundred times per second, which is the approximate capacity of a human brain.
    People have paid $20 million to visit the space station, now who will be the first millionaire hobbyist to pay $25 million to have his own simulated human brain?
    
    - Re: (Score:3, Interesting)
      
      by swamp_ig ( 466489 ) writes:
      
      Would the interconnects be fast enough? There's a lot of non-locality in the synaptic connections, so you're going to need some pretty heavy comms between the cores.
      Also a selection of neurons are far more heavily connected than 1000s of synapses, and they're fairly essential ones. Might these be a critical path?
      Sure would be cool to build such a beast, do some random connections, and see what happens...
      - Re: (Score:3, Interesting)
        
        by HiThere ( 15173 ) writes:
        
        I think your post was intended humorously, but I'm going to pretend otherwise. (Note, I'm not a specialist in computational mentalistics, or whatever the field would be called, but:)
        I'm fairly certain the interconnects are fast enough. The brain is no speed demon on individual connections. It's basically chemical, with only a little electrical stuff on top that's still based on ions floating in liquid.
        The problem is the software. And the sensoria. And the effectors.
        Each of those problems is being addre
    - Re:Heartening... (Score:5, Interesting)
      
      by smallfries ( 601545 ) writes: on Sunday November 23, 2008 @07:38AM (#25863971) Homepage
      
      Your figures are off by several orders of magnitude. 2500 of these is roughly 10,000 TFLOPS. As a TFLOPS is 10^12 operations per second, and we have 10^11 neurons, that leaves 10^5 floating point operations per neuron per second. If each has 1000 synapses to process then we are down to 100 operations per connection, per second.
      At this point it seems obvious that you've assumed a really simplistic model of a neuron that can compute a synaptic value in a single floating point operation. These simple neuron models don't behave like a real brain, and scaling up simulations of them doesn't produce anything interesting. Real neurons are capable of computing much more complex functions than these models. The throughput on the interconnect is going to be a major factor, and simulating each neuron will require from 10s to 1000000s of operations depending on the level of biological realism that is required. The Blue Brain project has a lot of interesting material on different models of the neuron and the tradeoff between performance and realism.
      Their end goal is to dedicate a large IBM Blue Gene to simulating an entire column within the brain (roughly 1,000,000 neurons) using a biologically-realistic model.
      
      - Re:Heartening... (Score:5, Informative)
        
        by LeDopore ( 898286 ) writes: on Sunday November 23, 2008 @10:13AM (#25864527) Homepage Journal
        
        You're right unless there's a computational way to take advantage of the fact that most neurons in cortex pretty much never fire (1), and that a small minority of synapses are responsible for nearly all of the excitation in a slab of cortical tissue (2). If not active == not important == not necessary to simulate with a 100% duty cycle (these are big "ifs"), then we could be literally about 3-5 orders of magnitude closer to being able to simulate whole brains than anyone realizes.
        (1) How silent is the brain: is there a "dark matter" problem in neuroscience? Shy Shoham, Daniel H. O'Connor, Ronen Segev. J Comp Physiol A (2006)
        (2) Highly Nonrandom Features of Synaptic Connectivity in Local Cortical Circuits. Sen Song, Per Jesper Sjöström, Markus Reigl, Sacha Nelson, Dmitri B. Chklovskii. PLoS Biology, March 2005
        
      - Re: (Score:2)
        
        by dkf ( 304284 ) writes:
        
        At this point it seems obvious that you've assumed a really simplistic model of a neuron that can compute a synaptic value in a single floating point operation. These simple neuron models don't behave like a real brain, and scaling up simulations of them doesn't produce anything interesting. Real neurons are capable of computing much more complex functions than these models. The throughput on the interconnect is going to be a major factor, and simulating each neuron will require from 10s to 1000000s of operations depending on the level of biological realism that is required.
        The real question from an AI perspective is whether all that detail is necessary. Do we need to simulate individual signaling molecules? Do we need to simulate individual synapses? Can we simulate things at a higher level than neurons and still get a functionally similar model?
        Obviously, for some things we definitely can't abstract (such as the effect of certain kinds of small molecules on consciousness). But nobody really has any idea whether general AI needs that level of detail. My hunch is that it doesn
      - Re: (Score:2)
        
        by ThisNukes4u ( 752508 ) * writes:
        
        Also don't forget that the 10 Tflops figure quoted is PEAK Tflops. I would be very surprised if the hardware could sustain even half of that in any realistic simulation.
      - It gets worse... (Score:2)
        
        by raftpeople ( 844215 ) writes:
        
        Also left out of the calculations are the glial cells. There are 10x more glial cells than neurons. They were previously thought to not be part of brain calculation but have since been shown to modulate the activity of the neurons. We've got a long way to go.
    - Re: (Score:2)
      
      by deander2 ( 26173 ) * writes:
      
      the problem w/ simulating the human brain is more of the software than the hardware. if you have any unique insight into how to program the thing, i think it would make a dissertation topic that would bring you almost instant fame and fortune. =P
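
The arithmetic in the brain-simulation exchange above (mangu's estimate and smallfries's reply) can be checked with a short Python sketch. It uses only the round figures quoted in the thread; the constant and function names are made up for illustration, and 4 TFLOPS per machine is a peak number, so the result is an upper bound rather than a prediction.

```python
# Back-of-the-envelope budget for the brain-simulation estimate in the thread.
# Every input is a round number quoted by a commenter, not a measurement.

MACHINES = 2500                 # 4-GPU Tesla desktops (mangu's figure)
PEAK_FLOPS_PER_MACHINE = 4e12   # ~4 TFLOPS single precision, peak (story summary)
NEURONS = 1e11                  # "a hundred billion neurons"
SYNAPSES_PER_NEURON = 1e3       # "a thousand synapses each"
FIRING_RATE_HZ = 100            # "firing a hundred times per second"


def flops_budget():
    """Return (total, per neuron, per synapse, per synaptic event) in flops/s or flops."""
    total = MACHINES * PEAK_FLOPS_PER_MACHINE
    per_neuron = total / NEURONS
    per_synapse = per_neuron / SYNAPSES_PER_NEURON
    per_synaptic_event = per_synapse / FIRING_RATE_HZ
    return total, per_neuron, per_synapse, per_synaptic_event


def machines_needed(ops_per_synaptic_event):
    """Machines required if each synaptic event costs this many floating point ops."""
    required = NEURONS * SYNAPSES_PER_NEURON * FIRING_RATE_HZ * ops_per_synaptic_event
    return required / PEAK_FLOPS_PER_MACHINE


if __name__ == "__main__":
    total, per_neuron, per_synapse, per_event = flops_budget()
    print(f"total peak: {total:.0e} flops/s")
    print(f"per neuron: {per_neuron:.0e} flops/s")
    print(f"per synapse: {per_synapse:.0f} flops/s")
    print(f"per synaptic event: {per_event:.0f} flop(s)")
    print(f"machines at 1000 ops/event: {machines_needed(1000):.0f}")
```

Under these inputs the 2500-machine figure only works if one synaptic event costs a single floating point operation, which is the simplification smallfries objects to. Raising the per-event cost scales the machine count linearly.
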
4 TFLOPS? (Score:5, Insightful)

by Anonymous Coward writes: on Sunday November 23, 2008 @04:38AM (#25863385)

A single Radeon 4870x2 is 2.4 TFLOPS. Some supercomputer, that.
Seriously, why is this even news? nVidia makes a product, which is OK, but nothing revolutionary. The devaluation of the "supercomputer" term is appalling.
Also, how much of that 4 TFLOPS you can get on actual applications? How's FFT? Or LINPACK?

- Re:4 TFLOPS? (Score:5, Informative)
  
  by GigaplexNZ ( 1233886 ) writes: on Sunday November 23, 2008 @05:30AM (#25863599)
  
  A single Radeon 4870x2 is 2.4 TFLOPS.
  A single Radeon 4870x2 uses two chips. This Tesla thing uses 4 chips that are comparable to the Radeon ones. It should be obvious that they would be in a similar ballpark.
  Seriously, why is this even news?
  It isn't. Tesla was released a while ago, this is just a slashvertisement.
  
  - Re: (Score:2, Interesting)
    
    by X-acious ( 1079841 ) writes:
    
    A single Radeon 4870x2 uses two chips
    2.4 / 2 = 1.2
    Each Tesla GPU has 240 cores and delivers about 1 TeraFLOPS single precision...
    Each Radeon HD 4870 produces 1.2 TFLOPS, about 0.2 more than one Tesla GPU.
    "NVIDIA announced...the Tesla Personal Supercomputer -- a 4 TeraFLOPS desktop...
    Two 4870 X2s equal 4.8 TFLOPS, 0.8 more than four Tesla GPUs.
    I think the parent's point was that even when an HD Radeon 4870 X2 is made up of two cards they're still connected and recognized as one. Thus, with "fewer" cards and fewer slots you could achieve more performance. Or you could use the other two vacant slots for yet another two 4870s: Four of them in crossfire would equal 9.6 TFLOPS, 5.6 more than four Tesla GP
  - It's news because... (Score:3, Interesting)
    
    by raftpeople ( 844215 ) writes:
    
    NVIDIA has done a good job of making the processing power accessible to programmers that are not GPU coding experts. In addition, they have made hardware changes to better support the type of scientific computation being done on these devices.
    
    So, while in theory you could put together some Radeons, work with their API and achieve the same thing, NVIDIA has significantly reduced the level of effort to make it happen.
- Comment removed (Score:4, Interesting)
  
  by account_deleted ( 4530225 ) writes: on Sunday November 23, 2008 @10:41AM (#25864667)
  
  Comment removed based on user account deletion
  
- Re: (Score:2)
  
  by MostAwesomeDude ( 980382 ) writes:
  
  Depends on the kind of precision you want. Also the big limiting factor in these kinds of apps is actually feeding the GPUs. Y'know that little glxgears test app that everybody uses to test their FPS? The glxgears framerate is actually just the number of times per second that the driver can properly set up the card, prepare a display list, flush it to the card, and then swap the buffers. The card usually can go much faster than that.
  (And, of course, the point is, glxgears is probably the fastest thing that
What, no coil? (Score:5, Funny)

by dgun ( 1056422 ) writes: on Sunday November 23, 2008 @04:48AM (#25863409) Homepage

What a rip.

- Nor turbine. (Score:2, Interesting)
  
  by BOFHelsinki ( 709551 ) writes:
  
  Shameless exploitation of the good name of one of the greatest inventors of all time. :-)
- Re:What, no coil? (Score:4, Funny)
  
  by geekmux ( 1040042 ) writes: on Sunday November 23, 2008 @09:03AM (#25864233)
  
  What a rip.
  Yeah, no shit. First bastard that tries to put a "Tesla Capable" sticker on the front, I'm gonna sue.
  
What a disappointment (Score:2, Interesting)

by dleigh ( 994882 ) writes:

At first glance I thought these used actual Tesla coils [wikipedia.org] in the processor, or the devices were at least powered or cooled by some apparatus that used Tesla coils.

Turns out "Tesla" is just the name of the product.

Drat. I demand a refund.
- Re: (Score:2)
  
  by rhyder128k ( 1051042 ) writes:
  
  They should at least come up with a "mad scientist lab pack" that includes some Tesla coils. Perhaps they presume that mad scientists will have their own gear.
  I just spent an entire morning trying out massive single throw switches.
  "Now, we'll SEE who's mad! [thunk]"
  "Now, we'll see who's MAD! [thunk]"
  In all fairness, these things can be pretty personal.
  - You're probably right about the "mad scientist" ... (Score:3, Insightful)
    
    by PolygamousRanchKid ( 1290638 ) writes:
    
    . . . that's probably exactly the person who would buy one of these.
    Folks who are professionally working on mainstream problems that require supercomputers, well, they probably have access to one already. (Maybe one of the supercomputing folks might want to chime in here; do you have enough access/time? Would a baby-supercomputer be useful to you?)
    But there is certainly someone out there who was denied access, because his idea was rejected by peer review. He is considered a loopy nut bag, because he
    - Re: (Score:2)
      
      by rhyder128k ( 1051042 ) writes:
      
      Perhaps there will be a resurgence in mad, unethical experimentation. In 20 years, this computer might acquire a status similar to that of the Altair 8800 home computer kit.
      I still say that 640 human embryos should be enough for anybody.
- Re: (Score:2)
  
  by David Gerard ( 12369 ) writes:
  
  I thought of the car first. I figured that's how much battery you'd need to run it in a laptop.
Binary-only toolchain (Score:5, Informative)

by Anonymous Coward writes: on Sunday November 23, 2008 @04:50AM (#25863421)

The toolchain is binary only and has an EULA that prohibits reverse engineering.

- Re:Binary-only toolchain (Score:5, Informative)
  
  by FireFury03 ( 653718 ) writes: <{gro.kusuxen} {ta} {todhsals}> on Sunday November 23, 2008 @05:23AM (#25863561) Homepage
  
  has an EULA that prohibits reverse engineering.
  Not really a big deal to those of us in the EU since we have a legally guaranteed right to reverse engineer stuff for interoperability purposes.
  
  - Re: (Score:2)
    
    by ScrewMaster ( 602015 ) * writes:
    
    has an EULA that prohibits reverse engineering.
    Not really a big deal to those of us in the EU since we have a legally guaranteed right to reverse engineer stuff for interoperability purposes.
    Don't get cocky. It's only presently guaranteed. Laws change, and there's a whole lot of pressure to make that change.
  - Re: (Score:2)
    
    by devman ( 1163205 ) writes:
    
    We do in the US as well, it's listed in the exceptions part of the DMCA and has been part of the U.S. Code for awhile.
  - - Re: (Score:2)
      
      by FireFury03 ( 653718 ) writes:
      
      So not only are you bragging with your confederation's legislation
      Not really - I'm making people aware of a fairly sensible piece of legislation. There are two good reasons for this:
      1. Some people in the EU may not be aware of this legislation, and it may be in their interests to know about it.
      2. Some people not in the EU may not be aware of this legislation and may want to try and get similar legislation adopted in their locality.
      something you personally most likely had no influence on, at all
      And my influence on the legislation is relevant how exactly?
      you're also claiming other confederation's legislations have no influence on you
      No, please cite anything I said which even implies this, let alone expressly state
    - Re: (Score:2)
      
      by Jeremi ( 14640 ) writes:
      
      So not only are you bragging with your confederation's legislation
      Was he bragging, or merely stating a fact? Your assumption of the former suggests a certain defensiveness about our country's wise and glorious IP law... ;^)
- Re: (Score:2)
  
  by JamesP ( 688957 ) writes:
  
  The toolchain is binary only and has an EULA that prohibits reverse engineering.
  Show me a non-free EULA that doesn't.
- - NVCC intermediate assembly (Score:2)
    
    by DrYak ( 748999 ) writes:
    
    This is relevant because the compiler creates device specific binaries that you can't get the assembler code for.
    
    Yes you can. Just give the proper switch to ask NVCC to keep all intermediate files.
    You'll get both the high level shaders that got compiled and the resulting assembler, which is subsequently compiled into op-codes.
    (I just don't have CUDA handy at home to check what the options were.)
    My main objection is that CUDA is nVidia hardware-specific only, and ties you to a single provider.
    The various incarnation of Brook (currently supported by ATI's card) are much more interesting as they are vendor neutral and s
- - Re: (Score:2)
    
    by Wesley Felter ( 138342 ) writes:
    
    There is some truth to what you say; I know the national labs in particular are working on a completely open source HPC stack. But many others in the HPC world have been using proprietary compilers, debuggers, filesystems, etc. for decades.
And the worst timing ever award goes to... (Score:2, Insightful)

by CryptoJones ( 565561 ) writes:

While the inner nerd in me screams to take out a loan against my house to buy one, I can't imagine this being very popular outside academia. Most users don't use the power of their crappy computers, let alone this. And then there is the whole "ECONOMY" thing.
- Re: (Score:3, Insightful)
  
  by Yetihehe ( 971185 ) writes:
  
  It IS marketed for academia. Normal users don't really need to fold proteins or simulate nuclear weapons at home.
  - Re: (Score:2, Informative)
    
    by palegray.net ( 1195047 ) writes:
    
    I'm perfectly normal, and I fold proteins all the time [webshots.com].
  - Re: (Score:2, Interesting)
    
    by Anonymous Coward writes:
    
    according to http://folding.stanford.edu/English/Stats about 250.000 "normal" users are folding proteins at home.
    Personally, I would use it as a render farm, but Blender compatibility could take a while if Nvidia keeps the drivers and specification locked up.
    What they don't seem to mention is the amount of memory/core (at 960 cores). I'd guess about 32 MB/core, and 240 cores sharing the same memory bus...
    - Re: (Score:2)
      
      by evilbessie ( 873633 ) writes:
      
      err, you seem to have missed something fairly major in your understanding. Specifically about what constitutes a 'core'. These cards are based on the same chip in the GT280, so they have 240 stream processors, which are very good at specific types of calculation (If I was wiser I could tell you what types but I'm sure you can use google yourself). I believe that each of the chips has a 512 bit wide bus to 4GiB of memory. I'm not sure what the memory allocation per stream processor is but I think the other p
      - CUDA memory structure (Score:3, Informative)
        
        by DrYak ( 748999 ) writes:
        
        but I don't know enough about it to be able to give useful information on the subject.
        I do write some CUDA code, so I'll try to help.
        I believe that each of the chips has a 512 bit wide bus to 4GiB of memory.
        Indeed, each physical package has full access to its own whole chunk of memory, regardless of how many "cores" the package contains (between 2 for the lowest end laptop GPUs and 16 for the highest end 8/9800 cards. Don't know about GT280. But the summary is wrong: 240 is probably the number of ALUs or the width of the SIMD) and regardless of how many "stream processors" there are (each core has 8 ALUs, which are exposed as 32-wide SIMD processing units, which i
- Yes but (Score:3, Funny)
  
  by Colin Smith ( 2679 ) writes:
  
  And then there is the whole "ECONOMY" thing.
  The whole reason the ECONOMY is in the tank is because there are not enough people like you taking loans out against their house to buy random stuff like this.
  Basically... IT'S ALL YOUR FAULT!
Let me be the first to say... (Score:5, Funny)

by rdnetto ( 955205 ) writes: on Sunday November 23, 2008 @05:08AM (#25863503)

4 Terraflops should be more than enough for anybody...

- 4 Terraflops? (Score:3, Funny)
  
  by yfkar ( 866011 ) writes:
  
  As opposed to astroflops?
- Re: (Score:2)
  
  by Trogre ( 513942 ) writes:
  
  You can keep your Terraflops. I demand Martianflops!
Scientist speak (Score:2, Interesting)

by jnnnnn ( 1079877 ) writes:

So many scientists use the word "codes" when they mean "program(s)".
Why is this?
- Re: (Score:3, Interesting)
  
  by Anonymous Coward writes:
  
  It's cultural.
  You're not even allowed to say that you're "coding", but only that you produce "codes".
  Maybe it's because analytic science is based on equations which become algorithms in computing, and you can't say that you're "equationing" nor "algorithming".
  In practice it's actually dishonest, because the algorithms don't have the conceptual power of the equations that they represent (they would if programmed in LISP, but "codes" are mostly written in Fortran and C), so the computations are often question
weak DP performance (Score:5, Informative)

by Henriok ( 6762 ) writes: on Sunday November 23, 2008 @05:53AM (#25863655)

In supercomputing circles (i.e. Top500.org), double precision floating point operations seem to be what is desired. 4 TFLOPS single precision, while impressive, is overshadowed by the comparatively weak 80 GFLOPS double precision, beaten by a single PowerXCell 8i (successor to the Cell in PS3) or the latest crop of Xeons. I'm sure Tesla will find its users but we won't see them on the Top500 list anytime soon.
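
For reference, the single versus double precision gap can be worked out from the per-GPU figures in the story summary. This is a small Python sketch; the names are illustrative, and the 4-GPU totals assume perfect scaling across cards, which is an assumption and not something the summary states for double precision.

```python
# Per-GPU peak figures as quoted in the story summary (approximate).
GPUS = 4
SP_GFLOPS_PER_GPU = 1000.0   # "about 1 TeraFLOPS single precision"
DP_GFLOPS_PER_GPU = 80.0     # "about 80 GigaFLOPS double-precision"


def aggregate_gflops(gpus=GPUS):
    """Peak (single, double) GFLOPS for a box with `gpus` cards, assuming ideal scaling."""
    return gpus * SP_GFLOPS_PER_GPU, gpus * DP_GFLOPS_PER_GPU


def sp_to_dp_ratio():
    """How many times faster the quoted single precision peak is than double precision."""
    return SP_GFLOPS_PER_GPU / DP_GFLOPS_PER_GPU


if __name__ == "__main__":
    sp, dp = aggregate_gflops()
    print(f"4-GPU peak: {sp:.0f} GFLOPS single, {dp:.0f} GFLOPS double")
    print(f"single/double ratio: {sp_to_dp_ratio():.1f}x")
```

Note that the 80 GFLOPS figure is per GPU in the summary, so the like-for-like comparison with the 4 TFLOPS headline would be the 4-GPU double precision total.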

- Re: (Score:2)
  
  by timeOday ( 582209 ) writes:
  
  I'm just amazed that the performance loss from single to double precision is more than a factor of 10! It's only 2x the bits, what's the holdup?
boring apps... let's have some realtime raytracing (Score:4, Insightful)

by Lazy Jones ( 8403 ) writes: on Sunday November 23, 2008 @05:59AM (#25863679) Homepage Journal

there were a lot of early efforts trying to implement realtime raytracing engines for games (e.g. at Intel recently [intel.com]), let's port that stuff and have some fun.

- Development Platform (Score:3, Insightful)
  
  by dreamchaser ( 49529 ) writes:
  
  On that note, it would be a good development platform for realtime raytraced game engines. That way the code would be mature when affordable GPU's come out that can match that level of performance.
Can I have a smaller version? (Score:2)

by Fuzuli ( 135489 ) writes:

Is it possible to build a smaller version of this configuration? I do not have 10K, but I can come up with something smaller for my PhD research. In that case, is this a package that can be replicated via off the shelf nvidia hardware, or do I need to wait for NVidia to release a smaller version?
- Re: (Score:2)
  
  by JamesP ( 688957 ) writes:
  
  Well, buy any card that supports CUDA (pretty much all offers by nVidia today - except you probably want to stay off the cheapest stuff)
  You can also try running a PS3 + Linux or try the similar offers from AMD/ATI
  - Re: (Score:2)
    
    by Fuzuli ( 135489 ) writes:
    
    Sorry, I should have been clearer. I'm aware of those solutions, but would it be the same in terms of processing power, software support (cuda, related libraries etc..)
    I mean is this a convenient repackaging of what is already out there, or does it have something extra?
    - Re:Can I have a smaller version? (Score:4, Informative)
      
      by SpinyNorman ( 33776 ) writes: on Sunday November 23, 2008 @09:44AM (#25864379)
      
      From NVidia's CUDA site, most of their regular display cards support CUDA, just with fewer cores (hence less performance) than the Tesla card. The cores that CUDA uses are what used to be called the vertex shaders on your (NVidia) card. The CUDA API is designed so that your code doesn't know/specify how many cores are going to be used - you just code to the CUDA architecture and at runtime it distributes the workload to the available cores... so you can develop for a low end card (or they even have an emulator) then later pay for the hardware/performance you need.
      
    - Re: (Score:3, Informative)
      
      by kramulous ( 977841 ) * writes:
      
      The 10K refers to a rack mount solution containing 4xGPUs. You can still buy a single GPU and try and put it in a standard machine (provided it doesn't melt - I'd read the specs) for about a quarter of the price.
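
SpinyNorman's point above, that CUDA code is written without knowing how many cores will run it, can be imitated in plain Python. The sketch below is a sequential simulation, not CUDA: `launch`, `saxpy_kernel` and the round-robin scheduling are hypothetical stand-ins, loosely modelled on CUDA's `blockIdx`, `threadIdx` and `blockDim`. The real hardware scheduler works differently.

```python
def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """One simulated thread: computes a single element of out = a*x + y."""
    i = block_idx * block_dim + thread_idx
    if i < len(x):                      # guard threads past the end of the data
        out[i] = a * x[i] + y[i]


def launch(kernel, n, block_dim, num_cores, *args):
    """Run enough blocks to cover n elements, dealing blocks out to num_cores.

    The kernel never sees num_cores; only this launcher does. Blocks are
    assigned round-robin here purely for illustration.
    """
    num_blocks = (n + block_dim - 1) // block_dim   # ceiling division
    for core in range(num_cores):
        for block_idx in range(core, num_blocks, num_cores):
            for thread_idx in range(block_dim):
                kernel(block_idx, thread_idx, block_dim, *args)


def saxpy(a, x, y, num_cores, block_dim=4):
    """Compute a*x + y on a simulated device with num_cores cores."""
    out = [0.0] * len(x)
    launch(saxpy_kernel, len(x), block_dim, num_cores, a, x, y, out)
    return out


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    y = [1.0] * 10
    # Same kernel, same answer, whether the "device" has 1 core or 8.
    print(saxpy(2.0, x, y, num_cores=1) == saxpy(2.0, x, y, num_cores=8))
```

The design choice being illustrated is that the work is expressed per element and grouped into blocks, so a larger device just processes more blocks at once.
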
Erlang (Score:3, Interesting)

by Safiire Arrowny ( 596720 ) writes: on Sunday November 23, 2008 @06:25AM (#25863763) Homepage

So how do you get an Erlang system to run on this?

- Re: (Score:3, Insightful)
  
  by eggnoglatte ( 1047660 ) writes:
  
  By writing an Erlang-to-CUDA compiler?
  More seriously though, it is probably not worth even trying, since the GPUs used in the Tesla support a very limited model of parallelism. Shoehorning the flexibility of Erlang into that would at the very least result in a dramatic performance loss, if it is possible at all.
And in other news... (Score:5, Funny)

by bsDaemon ( 87307 ) writes: on Sunday November 23, 2008 @06:58AM (#25863841)

... AMD has announced today its new Edison Personal Supercomputer technology.
The game is on.

cold hard facts about cuda (Score:3, Interesting)

by Gearoid_Murphy ( 976819 ) writes: on Sunday November 23, 2008 @07:16AM (#25863913)

it's not about how many cores you have but how efficiently they can be used. If your CUDA application is any way memory intensive you're going to experience a serious drop in performance. A read from the local cache is 100 times faster than a read from the main ram memory. This cache is only 16kb. I spend most of my time figuring out how to minimise data transfers. That said, CUDA is probably the only platform that offers a realistic means for a single machine to tackle problems requiring gargantuan computing resources.
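
To put rough numbers on that, here is a minimal Python model of average memory access cost. It takes the comment's two figures at face value (a fast read is 100 times cheaper than a main-memory read, and the fast memory is 16 KB, assumed here to mean 16 * 1024 bytes). The function names and the simple hit-rate model are illustrative assumptions, not a description of the actual hardware.

```python
FAST_MEMORY_BYTES = 16 * 1024   # "This cache is only 16kb" (assumed to mean 16 KiB)
SLOW_FACTOR = 100.0             # main-memory read cost relative to a fast read


def average_access_cost(hit_rate, slow_factor=SLOW_FACTOR):
    """Average cost per read, in units of one fast read.

    hit_rate is the fraction of reads served from the fast memory.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * 1.0 + (1.0 - hit_rate) * slow_factor


def floats_that_fit(bytes_per_value=4):
    """How many values of the given size fit in the fast memory."""
    return FAST_MEMORY_BYTES // bytes_per_value


if __name__ == "__main__":
    for rate in (1.0, 0.99, 0.9, 0.5, 0.0):
        print(f"hit rate {rate:4.2f}: {average_access_cost(rate):6.2f}x a fast read")
    print(f"single precision values that fit: {floats_that_fit()}")
```

Even a 90% hit rate leaves the average read roughly ten times slower than the ideal case in this model, which is why minimising transfers dominates the effort.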

- Re:cold hard facts about cuda- unbalanced (Score:5, Insightful)
  
  by anon mouse-cow-aard ( 443646 ) writes: on Sunday November 23, 2008 @07:52AM (#25864025) Journal
  
  People are always coming out of the woodwork to claim supercomputer performance with such and such a solution; go back and look at GRAPE (which is really cool) http://arstechnica.com/news.ars/post/20061212-8408.html [arstechnica.com] or a lot of other supercomputer clusters. When you want something flexible, you look for "balance": a good relationship between memory capacity, latency and bandwidth, as well as compute power. In terms of memory capacity, the number people talk about is 1 byte/flop; that is, 1 Tbyte of memory is about right to keep 1 TFLOP flexibly useful. This thing has 4 G of memory for 4 TF; in other words, 1 byte per 1000 flops. It's going to be hard to use in a general purpose way.
  
- BrookGPU (Score:2)
  
  by Skinkie ( 815924 ) writes:
  
  In the past I was not very impressed by things such as http://www-graphics.stanford.edu/projects/brookgpu/ [stanford.edu] because of the latency that is involved in actually transferring data back and forth between CPU and GPU memory. Thus I observed the same thing. But now it seems the actual transfer latency is reduced because of PCI-e, so one might wonder if decent compiler technology is able to optimise 'normal' code for GPU instructions.
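
The "balance" rule of thumb from anon mouse-cow-aard's comment above (about 1 byte of memory per flop/s) is easy to check in Python. The sketch uses decimal units and the comment's own figures. The 16 GB case is an added assumption based on another commenter's belief that each of the four GPUs has its own 4 GB, which the story summary does not confirm.

```python
def bytes_per_flop(memory_bytes, flops_per_second):
    """Memory capacity divided by peak compute rate (the thread's 'balance' metric)."""
    return memory_bytes / flops_per_second


def memory_for_balance(flops_per_second, target_bytes_per_flop=1.0):
    """Memory needed to hit the rule-of-thumb ratio at a given peak rate."""
    return flops_per_second * target_bytes_per_flop


PEAK_FLOPS = 4e12            # 4 TFLOPS single precision, from the summary
QUOTED_MEMORY = 4e9          # "4 G of memory", as stated in the comment
ASSUMED_TOTAL_MEMORY = 16e9  # assumption: 4 GPUs x 4 GB each (unconfirmed)

if __name__ == "__main__":
    print(f"comment's figures: {bytes_per_flop(QUOTED_MEMORY, PEAK_FLOPS):.3f} bytes/flop")
    print(f"4 x 4 GB assumed:  {bytes_per_flop(ASSUMED_TOTAL_MEMORY, PEAK_FLOPS):.3f} bytes/flop")
    print(f"memory for 1 byte/flop: {memory_for_balance(PEAK_FLOPS) / 1e12:.0f} TB")
```

Either way the ratio is two to three orders of magnitude below the 1 byte/flop guideline, so the commenter's conclusion does not hinge on which memory figure is right.
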
Patmos International (Score:3, Interesting)

by Danzigism ( 881294 ) writes: on Sunday November 23, 2008 @08:57AM (#25864197)

ahh yes the idea of personal supercomputing. Back in '99 I worked for Patmos International. We were at the Linux Expo for that year as well if some of you might remember. Our dream was to have a parallel supercomputer in everyone's home. We used mostly Lisp and Daisy for the programming aspect. The idea was wonderful, but eventually came to a screeching halt when nothing was being sold. It was ahead of its time for sure. you can find out a little more about it here. [archive.org] I find the whole idea of symbolic multiprocessing very fascinating though.

- - Re: (Score:2)
    
    by Danzigism ( 881294 ) writes:
    
    basically just data centralization, optional linux terminal services, and a directory server for controlling user policies on your kids' computers.. backing up is something people are not capable of doing, and USB hard drives didn't even exist yet, and computers were expensive.. we really focused more on a commercial market for weather stations that needed large amounts of computing power to perform predictions and calculations.. sold a few small units to schools and small internet providers as well.. there
Yes but, (Score:2)

by Landshark17 ( 807664 ) writes:

Will it run Duke Nukem For... eh, you all know where this is going...
The network is the computer (Score:2)

by wikinerd ( 809585 ) writes:

Personal supercomputer? Surely it's cool, but how about turning the whole Internet into a supercomputer?
Make the Internet fast enough and equip every node with a network operating system to share its resources with all other nodes. Sounds like a security nightmare, but let's focus on the performance part for now. Every one of us has a CPU, a storage device (e.g. SSD), and some RAM. But not all of us use all of our CPU, SSD, or RAM at the same time. While I play a game, effectively making my CPU work at 100
- Re: (Score:2, Interesting)
  
  by Surreal Puppet ( 1408635 ) writes:
  
  Port john the ripper/aircrack-ng? Buy a few terabyte drives and start generating hash tables?
- FTFL (Score:3, Informative)
  
  by mangu ( 126918 ) writes:
  
  now what the heck to do with it...
  All you need to do is follow the fscking link [nvidia.com]. Plenty of examples there.
  - - Re: (Score:3, Interesting)
      
      by neomunk ( 913773 ) writes:
      
      Neural nets.
      This setup sounds ideal for a training bed for fann programs. I can't recall if there's a port of fann for CUDA, but I think there might be.
  - - Re: (Score:3, Funny)
      
      by SmokeyTheBalrog ( 996551 ) writes:
      
      Once CUDA has deep consumer penetration the 3D CGI furry anime loli porn will come! In droves if not herds.
      
      Oh crap, I forgot to click Post AC.
- Re: (Score:2)
  
  by karstux ( 681641 ) writes:
  
  Real-time radiosity rendering?
- Re: (Score:2)
  
  by neokushan ( 932374 ) writes:
  
  Port Doom to it.
  - Re: (Score:2)
    
    by drik00 ( 526104 ) writes:
    
    I'm so proud of the /. community... even almost ten years later, I still spotted the "beowulf" and "doom" references :) *tear*
    J
    - Re: (Score:2)
      
      by neokushan ( 932374 ) writes:
      
      Is it really a reference if it's stated outright?
- Re: (Score:2, Funny)
  
  by itsybitsy ( 149808 ) * writes:
  
  Not yet.... darn NVidia, no Vista Drivers yet...
  Come on NVidia GET WITH IT!!!
- Re: (Score:2, Insightful)
  
  by GigaplexNZ ( 1233886 ) writes:
  
  OO is very good for graphical interfaces, but it isn't particularly well suited for algorithms and other maths oriented stuff. Why should we care if OO fanboys are scared off? Decent developers know to use the right tool for the job, not try to shoehorn whatever their personal favourite is into every situation.
  - Re: (Score:2)
    
    by Joce640k ( 829181 ) writes:
    
    "...isn't particularly well suited for algorithms and other maths oriented stuff"
    Yeah, all that operator overloading is a real pain in the ass for numerical work.
    - Re: (Score:3, Informative)
      
      by HuguesT ( 84078 ) writes:
      
      Actually yes it is. For instance, nobody has yet figured out an efficient matrix class in C++ that uses operator overloading. It is basically impossible to write B=A*X*A^t efficiently, which occurs all the time in linear analysis, because in C++ the transpose would require a copy operation, whereas one ought to get the job done simply with a different iterator. C++ is not equipped for this yet.
      - Re: (Score:2)
        
        by eh2o ( 471262 ) writes:
        
        The trick to that sort of optimization is to defer the evaluation until the =, at which point the optimal execution plan is selected. But, you can't overload "=" in C++. The workaround is basically to provide another function to explicitly force the evaluation when you need it.
  - Re:Only in C? Oh dear. (Score:5, Informative)
    
    by xororand ( 860319 ) writes: on Sunday November 23, 2008 @06:26AM (#25863765)
    
    OO is very good for graphical interfaces, but it isn't particularly well suited for algorithms and other maths oriented stuff.
    The term OO is too general to make a statement about its usefulness for mathematics oriented problems. The powerful templating features of modern C++ are indeed very useful for numerical simulations:
    It's called C++ Expression Templates, an excellent tool for numerical simulations. ETs can get you very close to the performance of hand optimized C code while they're much more comfortable to use than plain C. Parallelization is also relatively easy to achieve with expression templates.
    A research team at my university actually uses expression templates to build some sort of meta compiler which translates C++ ETs into CUDA code. They use it to numerically simulate laser diodes.
    Search for papers by David Vandevoorde & Todd Veldhuizen if you want to know more about this. They both developed the technique independently.
    Vandevoorde also explains ETs to some degree in his excellent book "C++ Templates - The Complete Guide".
    
  - Re: (Score:2)
    
    by Fred_A ( 10934 ) writes:
    
    OO is very good for graphical interfaces, but it isn't particularly well suited for algorithms and other maths oriented stuff.
    Absolutely, that's what Fortran is for !
- It also runs Python (Score:4, Informative)
  
  by mangu ( 126918 ) writes: on Sunday November 23, 2008 @06:15AM (#25863729)
  
  Look, there's Python here [nvidia.com]. You can do the low-level high-performance core routines in C, and use Python to do all the OO programming. This is how God intended us to program.
  
  - Re: (Score:2, Funny)
    
    by BOFHelsinki ( 709551 ) writes:
    
    Ah, Parseltongue. So you are of the Slytherin school of programmers?
  - Re:It also runs Python (Score:4, Funny)
    
    by OriginalArlen ( 726444 ) writes: on Sunday November 23, 2008 @08:28AM (#25864113)
    
    This is how God intended us to program.
    Then why did he write Perl?
    
    - Re: (Score:2)
      
      by Anpheus ( 908711 ) writes:
      
      The Old Testament God was vindictive and angry, and Perl is the unspoken 11th Plague.
  - Re: (Score:2)
    
    by rhsanborn ( 773855 ) writes:
    
    *sigh*...if you insist
    
    XKCD [xkcd.com]
- Re: (Score:2)
  
  by MROD ( 101561 ) writes:
  
  Actually, OOP is a bit rubbish for number crunching, far too much overhead.
  What is disappointing is that there isn't a high performance FORTRAN compiler. That's where most scientific number crunching is done. (After all, that's what the language was designed for.)
  - Re: (Score:3, Informative)
    
    by cnettel ( 836611 ) writes:
    
    OOP with virtual and all, yes. OOP with template magic to allow the compiler to do specializations can beat the heck out of even quite tediously hand-written C or FORTRAN, with much superior readability.
- Weird options (Score:4, Insightful)
  
  by mangu ( 126918 ) writes: on Sunday November 23, 2008 @06:03AM (#25863691)
  
  I went to the site and tried to configure one. The disk partition options are: "General Purpose, Internet Server, Developer's Workstation, File Server". I wonder, who needs three Tesla cards in a file server or an internet server?
  
- Re: (Score:2, Informative)
  
  by BOFHelsinki ( 709551 ) writes:
  
  BTW, TFS makes a mistake calling this Tesla rig a supercomputer. Nvidia correctly just calls it a cluster replacement. A cluster is not a supercomputer; the interconnect makes all the difference, no matter how much FP crunching power there is. See NEC SX-9 or Cray's SeaStar for a real supercomputer interconnect. Can't be arsed to check (this is Slashdot after all) but that Penguin Computing system likely has only InfiniBand or 10GbE for the switch network, making it "only" a cluster. :-)
- Re: (Score:2)
  
  by Provocateur ( 133110 ) writes:
  
  Owooooo.... (at the full moon)
- Re:FLOPS not FLOP! (Score:5, Funny)
  
  by TeknoHog ( 164938 ) writes: on Sunday November 23, 2008 @08:55AM (#25864191) Homepage Journal
  
  What's the plural of FLOPS then? My preciouss FLOPSes?
  
