A Look Inside Oak Ridge Lab's Supercomputing Facility 59
1sockchuck writes "Three of the world's most powerful supercomputers live in adjacent aisles within a single data center at Oak Ridge National Laboratory in Tennessee. Inside this facility, technicians are busy installing new GPUs into the Jaguar supercomputer, the final step in its transformation into a more powerful system that will be known as Titan. The Oak Ridge team expects the GPU-accelerated machine to reach 20 petaflops, which should make it the fastest supercomputer in the Top 500. Data Center Knowledge has a story and photos looking at this unique facility, which also houses the Kraken machine from the University of Tennessee and NOAA's Gaea supercomputer."
Topology matters more than GFLOPS (Score:5, Insightful)
I really, really wish articles would stop saying that computer X has Y GFLOPS. It's almost meaningless, because when you're dealing with that much CPU power, the real challenge is to make the communications topology match the computational topology. That is, you need the physical structure of the computer to be very similar to the structure of the problem you are working on. If you're doing parallel processing (and of course you are, for systems like this), then you need to be able to break your problem into chunks, and map each chunk to a processor. Some problems are more easily divided into chunks than other problems. (Go read up on the "parallel dwarves" for a description of how things can be divided up, if you're curious.)
I'll drill into an example. If you're doing a problem that can be spatially decomposed (fluid dynamics, molecular dynamics, etc.), then you can map regions of space to different processors. Then you run your simulation by having all the processors run for X time period (on your simulated timescale). At the end of the time period, each processor sends its results to its neighbors, and possibly to "far" neighbors if the forces exceed some threshold. In the worst case, every processor has to send a message to every other processor. Then, you run the simulation for the next time chunk. Depending on your data set, you may spend *FAR* more time sending the intermediate results between all the different processors than you do actually running the simulation. That's what I mean by matching the physical topology to the computational topology. In a system where the communications cost dominates the computation cost, then adding more processors usually doesn't help you *at all*, or can even slow down the entire system even more. So it's really meaningless to say "my cluster can do 500 GFLOPS", unless you are talking about the time that is actually spent doing productive simulation, not just time wasted waiting for communication.
Here's a (somewhat dumb) analogy. Let's say a Formula 1 race car can do a nominal 250 MPH. (The real number doesn't matter.) If you had 1000 F1 cars lined up, side by side, then how fast can you go? You're not going 250,000 MPH, that's for sure.
I'm not saying that this is not a real advance in supercomputing. What I am saying, is that you cannot measure the performance of any supercomputer with a single GFLOPS number. It's not an apples-to-apples comparison, unless you really are working on the exact same problem (like molecular dynamics). And in that case, you need some unit of measurement that is specific to that kind of problem. Maybe for molecular dynamics you could quantify the number of atoms being simulated, the average bond count, the length of time in every "tick" (the simulation time unit). THEN you could talk about how many of that unit your system can do, per second, rather than a meaningless number like GFLOPS.