IEEE Says Multicore is Bad News For Supercomputers
Richard Kelleher writes "It seems the current design of multi-core processors is
not good for the design of supercomputers. According to IEEE: 'Engineers at Sandia National Laboratories, in New Mexico, have simulated future high-performance computers containing the 8-core, 16-core, and 32-core microprocessors that chip makers say are the future of the industry. The results are distressing. Because of limited memory bandwidth and memory-management schemes that are poorly suited to supercomputers, the performance of these machines would level off or even decline with more cores.'"
Re:Well doh (Score:3, Informative)
The summary mentions that the path to the memory controller gets clogged.
There is only so much bandwidth to go around.
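To make that concrete, here's a toy model (an illustrative assumption, not Sandia's actual simulation) of what a shared memory bus does to a memory-bound workload: once the cores' combined demand exceeds the bus, adding cores stops helping. The bandwidth figures are hypothetical placeholders.

```python
# Toy model: a fixed memory bus shared by n cores. Numbers are made up.
TOTAL_BW = 25.0   # hypothetical aggregate memory bandwidth, GB/s
NEED = 5.0        # hypothetical per-core demand of a memory-bound kernel, GB/s

def memory_bound_speedup(cores):
    """Speedup is capped by whichever is scarcer: cores or bus bandwidth."""
    return min(cores, TOTAL_BW / NEED)

for n in (2, 4, 8, 16, 32):
    # Levels off at 5 once the bus saturates, no matter how many cores you add.
    print(n, memory_bound_speedup(n))
```

Under this (simplistic) model, going from 8 to 32 cores buys exactly nothing for a memory-bound job, which is the flat curve the Sandia folks are describing.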
Re:Time for vector processing again (Score:4, Informative)
There are really only two options for modern systems when it comes to memory: you can have lots of cores and a tiny cache, like GPUs, or lots of cache and fewer cores, like CPUs (ignoring type-of-core issues, on-chip interconnects, etc.). So there is little advantage to paying 10x per chip to go custom versus using more cheaper chips, when they can build supercomputers out of CPUs, GPUs, or something in between like the Cell processor.
Re:Well doh (Score:2, Informative)
There's no real way to split the banks for each core, so the net effect is that you have 4-32 cores sharing the same lanes for memory.
The problem allegedly being.. (Score:5, Informative)
For a given node count, we've seen increases in performance. The claimed problem is that, for the workloads that concern these researchers, the fundamental memory architecture isn't projected to improve at anything like the rate at which core counts scale. So you buy a 16-core system to upgrade your quad-core system and hypothetically gain little despite the expense. Power efficiency drops, and getting more performance requires more nodes. Additionally, who is to say clock speeds won't drop if mass-market programming models shift so that distributed workloads are common and single-core performance isn't all that important.
All that said, talk beyond 6-core/8-core is mostly grandstanding at this time. Memory architecture isn't considered intrinsically exciting for the mass market, so I would wager there will be advancements that no one talks about. For example, Nehalem leapfrogs AMD memory bandwidth by a large margin (roughly a factor of 2). That means if Shanghai parts are considered to have satisfactory memory bandwidth to support four cores today, Nehalem, by that metric, supports eight equally satisfactorily. The whole picture is a bit more complicated (latency, for instance, and numbers I don't know offhand), but that one metric is a highly important one in the supercomputer field.
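The back-of-envelope version of that factor-of-2 claim (with placeholder bandwidth figures, not measured numbers): if per-core bandwidth is the metric that matters, doubling aggregate bandwidth supports twice the cores at the same per-core figure.

```python
# Placeholder figures, not benchmarks: doubling aggregate bandwidth
# supports double the cores at the same per-core bandwidth.
def per_core_bw(total_bw, cores):
    """Bandwidth available to each core if the bus is shared evenly, GB/s."""
    return total_bw / cores

amd_bw = 12.8            # hypothetical aggregate GB/s for a 4-core part
intel_bw = 2 * amd_bw    # "roughly a factor of 2" per the comment above

# 4 cores on the smaller bus see the same per-core bandwidth as 8 on the larger.
assert per_core_bw(amd_bw, 4) == per_core_bw(intel_bw, 8)
```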
For all the worry over memory bandwidth, though, it hasn't stopped supercomputer purchasers from buying into Core2 all this time. Despite improvements in its chipset, Intel's Core2 still doesn't reach AMD's memory performance. Even so, people spending money to get into the Top500 have generally put their money on Core2. Sure, the Cray and IBM supercomputers in the top 2 used AMD, but from the time of its release, Core2 has decimated AMD's supercomputer market share despite an inferior memory architecture.
Re:It's so obvious... (Score:4, Informative)
You mean something like a CPU cache? I assume you know that every core on a multi-core [wikipedia.org] system already has its own cache (L1) and shares a larger cache (L2) among all cores.
The problem is that on/near-core memory is damn expensive, and your average supercomputing task requires significant amounts of memory. When the bottleneck for high-performance computing becomes memory bandwidth instead of interconnect/network bandwidth, you have something a lot harder to optimize, so I can understand where the complaint in IEEE comes from.
Perhaps this will lead to CPUs with large L1 caches specifically for supercomputing tasks, who knows...
Re:Well doh (Score:3, Informative)
Actually, that is part of the problem. Most architectures have core-specific L1 caches, and unless a thread has its affinity mapped to a particular core, it can jump from a core where its data is in the L1 cache to one where it is not, forcing a cache refill from memory.
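For what it's worth, pinning affinity is a one-liner on Linux. A minimal sketch (Linux-only; `pin_to_core` and the choice of core 0 are mine, purely for illustration):

```python
import os

def pin_to_core(core):
    """Pin the current process to one core so its working set stays in
    that core's L1/L2. Linux-only; returns the new mask, or None elsewhere."""
    if hasattr(os, "sched_setaffinity"):    # not available on macOS/Windows
        os.sched_setaffinity(0, {core})     # 0 = the calling process
        return os.sched_getaffinity(0)
    return None

mask = pin_to_core(0)   # core id 0 is an arbitrary choice for this sketch
```

Real HPC codes do the same thing per-thread via the MPI launcher or OpenMP environment variables rather than by hand.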
Also, regardless of whether a system is multi-processing within a chip (multi-core) or on a board (multi-CPU), the number of communication channels required to avoid communication bottlenecks goes up as O(n²) in the number of cores.
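That O(n²) growth is just the pair count: a full point-to-point interconnect needs one channel per pair of cores, i.e. n(n-1)/2 links.

```python
# Channel count for a full point-to-point (crossbar-style) interconnect:
# one link per pair of cores, so it grows quadratically with n.
def channels(n):
    return n * (n - 1) // 2

print(channels(4))    # 6 links for a quad-core
print(channels(32))   # 496 links for a 32-core part
```

Which is exactly why real chips fall back on shared buses, rings, or meshes instead of full crossbars, and why those shared paths clog.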
So yes, we are probably seeing the beginning of the end of performance gains using general-purpose CPU interconnects and have to go back to vector processing. Unless we are somehow able to jump the heat dissipation barrier and start raising GHz again.
Re:Time for vector processing again (Score:3, Informative)
http://www.cray.com/products/XMT.aspx
Rest assured, there are still people who know how to build them. They're just not quite as popular as they used to be, now that a middle manager who has no idea what the hell they're talking about can go to an upper manager with a spec sheet that's got 8 thousand processors on it and say "look! This one's got a whole ton more processors than that dumb Cray thing!"
Re:Time for vector processing again (Score:5, Informative)
Sorry, but that's not entirely correct: most supercomputers work on highly parallel problems [earthsimulator.org.uk] using numerical analysis [wikipedia.org] techniques. By definition the problem is broken up into millions of smaller problems [bluebrain.epfl.ch] that make ideal "small apps"; a common consequence is that the bandwidth of the communication between the 'small apps' becomes the limiting factor.
"Back in the early-mid 90's we had different processors for Desktop and Super Computers."
The Earth Simulator was referred to in some parts as 'computenick'; its speed jump over its nearest rival and its longevity at the top marked the renaissance [lbl.gov] of "vector processing" after it had been largely ignored during the '90s.
In the end a supercomputer is a purpose built machine, if cores fit the purpose then they will be used.
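The "communication becomes the limit" point above has a standard illustration: split a 3-D grid into subdomains, and each piece computes over its volume but must exchange its surface (halo) with neighbours every step. Shrink the pieces and the surface-to-volume ratio, hence the communication share, grows. A hedged sketch (cubic subdomains, 6-face halo; grid sizes are arbitrary):

```python
# Halo-exchange illustration: compute scales with a subdomain's volume,
# communication with its surface, so smaller pieces communicate relatively more.
def comm_to_compute(N, p_per_axis):
    """Ratio of halo cells exchanged to cells computed, per subdomain,
    for an N^3 grid cut into p_per_axis pieces along each axis."""
    side = N // p_per_axis          # subdomain edge length
    volume = side ** 3              # cells computed per piece per step
    surface = 6 * side ** 2         # cells exchanged per piece per step
    return surface / volume         # = 6 / side: grows as pieces shrink

# Halving the subdomain edge doubles the communication share.
assert comm_to_compute(512, 4) == 2 * comm_to_compute(512, 2)
```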
Re:Time for vector processing again (Score:3, Informative)
Yes, that is what I want: a supercomputer designed by an English major...
Please get over yourself. This is Slashdot, not something important like a resume or a will.
Re:It's so obvious... (Score:3, Informative)
Re:Time for vector processing again (Score:3, Informative)
Cray did not stream vectors from memory. One of the advances of the Cray-1 was the use of vector registers as opposed to, for example, the Burroughs machines which streamed vectors directly to/from memory.
We know how to build memory systems that can handle large vectors. Both the Cray X1 and Earth Simulator demonstrate that. The problem is that those memory systems are currently too expensive. We are going to see more and more vector processing in commodity processors.
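The register idea is easy to sketch. Instead of streaming a whole vector through memory per operation, you load fixed-size strips into vector registers and operate on each strip while it's resident, cutting memory traffic. An illustrative strip-mined SAXPY in plain Python (the 64-element register length is a made-up figure, not any Cray's):

```python
# Strip-mined SAXPY (y' = a*x + y): process the vectors in register-sized
# chunks, the way a vector machine with vector registers would.
VLEN = 64  # hypothetical vector-register length, elements

def saxpy_strip_mined(a, x, y):
    out = []
    for i in range(0, len(x), VLEN):        # strip-mine into register-size chunks
        xs, ys = x[i:i+VLEN], y[i:i+VLEN]   # one "vector load" per operand
        out.extend(a * xi + yi for xi, yi in zip(xs, ys))  # register-resident math
    return out

print(saxpy_strip_mined(2, [1] * 130, [3] * 130)[:3])  # [5, 5, 5]
```

On a real vector machine the payoff comes when several operations reuse a strip before it's written back; the Burroughs-style memory-to-memory approach paid the full memory round trip on every op.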
Re:Time for vector processing again (Score:3, Informative)
Bad English isn't something you can keep locked out of sight in the back closet of society; it's like a termite infestation. Allow it a foothold and it'll spread everywhere. It's a higher-entropy state.
There's a world of difference between someone who's just writing casually (and goofing up) and someone who is completely unable to grasp the tenets of grammar. The former are perfectly capable of writing well on a resume, as you say; the latter are functionally illiterate, and they should be told when their English isn't good enough to participate in a discussion. There's nothing wrong with that; it gives them the opportunity to improve.