A Look Inside Oak Ridge Lab's Supercomputing Facility
1sockchuck writes "Three of the world's most powerful supercomputers live in adjacent aisles within a single data center at Oak Ridge National Laboratory in Tennessee. Inside this facility, technicians are busy installing new GPUs into the Jaguar supercomputer, the final step in its transformation into a more powerful system that will be known as Titan. The Oak Ridge team expects the GPU-accelerated machine to reach 20 petaflops, which should make it the fastest supercomputer in the Top 500. Data Center Knowledge has a story and photos looking at this unique facility, which also houses the Kraken machine from the University of Tennessee and NOAA's Gaea supercomputer."
Re:What? (Score:4, Informative)
The Top 500 is a specific list: http://top500.org/ [top500.org]
It's more correct to say it's the fastest on the list than the fastest in the world. There are any number of metrics you can use to compare supercomputers; the Top 500 just uses the most popular one. Another machine could easily be the fastest on a different list, like http://www.graph500.org/ [graph500.org].
Re:What? (Score:4, Informative)
The other specific consideration is that the list ONLY includes machines whose operators volunteer to run the Linpack benchmark and publicize the results. It is presumed that governments with classified computing facilities withhold this information, so there are likely many "supercomputers" (perhaps even a "fastest") that will never be part of the Top 500. The US NSA, for example, is widely believed to operate facilities at or near the top of the list, but for obvious reasons they are nowhere in sight.
Re:Topology matters more than GFLOPS (Score:3, Informative)
I'll drill into an example. If you're doing a problem that can be spatially decomposed (fluid dynamics, molecular dynamics, etc.), you can map regions of space to different processors. You then run your simulation by having all the processors advance for X time period (on your simulated timescale). At the end of the period, each processor sends its results to its neighbors, and possibly to "far" neighbors if the forces exceed some threshold. In the worst case, every processor has to send a message to every other processor. Then you run the simulation for the next time chunk.

Depending on your data set, you may spend *FAR* more time sending intermediate results between processors than actually running the simulation. That's what I mean by matching the physical topology to the computational topology. In a system where communication cost dominates computation cost, adding more processors usually doesn't help you *at all*, and can even slow the whole system down. So it's really meaningless to say "my cluster can do 500 GFLOPS" unless you're talking about time actually spent doing productive simulation, not time wasted waiting on communication.
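The exchange-then-compute cycle described above can be sketched in a few lines. This is a hypothetical toy (a 1D diffusion problem split across simulated "ranks", with ghost-cell exchange standing in for real MPI messages); the function names and parameters are invented for illustration, not from any real code:

```python
# Toy sketch of spatial decomposition with ghost-cell ("halo") exchange.
# Each "rank" owns a slice of the 1D domain; every step, ranks first
# exchange edge cells with neighbors (communication), then update their
# interior independently (computation).

def step(local, left_ghost, right_ghost, alpha=0.1):
    """One explicit diffusion update over a rank's local cells."""
    padded = [left_ghost] + local + [right_ghost]
    return [padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)]

def simulate(cells, n_ranks, n_steps):
    """Split `cells` evenly across n_ranks; alternate exchange and compute."""
    size = len(cells) // n_ranks
    ranks = [cells[i * size:(i + 1) * size] for i in range(n_ranks)]
    for _ in range(n_steps):
        # Communication phase: each rank receives its neighbors' edge cells
        # (mirrored at the global boundaries, giving zero-flux ends).
        halos = [(ranks[r - 1][-1] if r > 0 else ranks[r][0],
                  ranks[r + 1][0] if r < n_ranks - 1 else ranks[r][-1])
                 for r in range(n_ranks)]
        # Computation phase: all ranks now update with no further messages.
        ranks = [step(ranks[r], *halos[r]) for r in range(n_ranks)]
    return [c for rank in ranks for c in rank]

result = simulate([0.0] * 8 + [100.0] + [0.0] * 7, n_ranks=4, n_steps=50)
```

In a real cluster the halo step is actual network traffic, and its cost grows with the number of neighbors each rank must talk to, which is exactly why interconnect topology can dominate the raw FLOPS number.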
Considering that computational fluid dynamics, molecular dynamics, etc., break down into linear algebra operations, I'd say that the FLOPS count on a LINPACK benchmark is probably the best single metric available. In massively parallel CFD, we don't match the physical topology to the computational topology, because we don't (usually) build the physical topology. But I can and do match the computational topology to the physical one.
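For what a Linpack-style number actually measures: it's essentially a dense linear-algebra operation count divided by wall-clock time. A minimal back-of-the-envelope version (pure Python, so nowhere near hardware peak; the names here are illustrative, not from the actual benchmark) looks like:

```python
# Hypothetical FLOPS estimate in the spirit of LINPACK: time a dense
# n x n matrix multiply and divide the operation count (roughly 2*n^3
# multiply-adds) by the elapsed wall-clock time.
import time

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 64
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]

t0 = time.perf_counter()
c = matmul(a, b)
elapsed = time.perf_counter() - t0

flops = 2 * n ** 3 / elapsed  # pure Python is orders of magnitude below peak
```

On a real machine the benchmark solves a dense linear system at enormous n, which is why it rewards floating-point throughput and stresses the interconnect far less than the neighbor-exchange pattern above.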