IEEE Says Multicore is Bad News For Supercomputers
Richard Kelleher writes "It seems the current design of multi-core processors is
not good for the design of supercomputers. According to IEEE: 'Engineers at Sandia National Laboratories, in New Mexico, have simulated future high-performance computers containing the 8-core, 16-core, and 32-core microprocessors that chip makers say are the future of the industry. The results are distressing. Because of limited memory bandwidth and memory-management schemes that are poorly suited to supercomputers, the performance of these machines would level off or even decline with more cores.'"
Re:Well doh (Score:3, Informative)
The summary mentions that the path to the memory controller gets clogged.
There is only so much bandwidth to go around.
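To make that concrete, here's a toy model (an illustrative assumption, not Sandia's actual simulation) of what a shared memory bus does to a memory-bound workload: once the cores' combined demand exceeds the bus, adding cores stops helping. The bandwidth figures are hypothetical placeholders.

```python
# Toy model: a fixed memory bus shared by n cores. Numbers are made up.
TOTAL_BW = 25.0   # hypothetical aggregate memory bandwidth, GB/s
NEED = 5.0        # hypothetical per-core demand of a memory-bound kernel, GB/s

def memory_bound_speedup(cores):
    """Speedup is capped by whichever is scarcer: cores or bus bandwidth."""
    return min(cores, TOTAL_BW / NEED)

for n in (2, 4, 8, 16, 32):
    # Levels off at 5 once the bus saturates, no matter how many cores you add.
    print(n, memory_bound_speedup(n))
```

Under this (simplistic) model, going from 8 to 32 cores buys exactly nothing for a memory-bound job, which is the flat curve the Sandia folks are describing.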
Re:Time for vector processing again (Score:4, Informative)
There are really only two options for modern systems when it comes to memory: you can have lots of cores and a tiny cache, like GPUs, or lots of cache and fewer cores, like CPUs (ignoring type-of-core issues, on-chip interconnects, etc.). So there is little advantage to paying 10x per chip to go custom versus using more cheaper chips, when they can build supercomputers out of CPUs, GPUs, or something in between like the Cell processor.
Re:Well doh (Score:2, Informative)
There's no real way to split the banks for each core, so the net effect is that you have 4-32 cores sharing the same lanes for memory.
The problem allegedly being.. (Score:5, Informative)
For a given node count, we've seen increases in performance. The claimed problem is that, for the workloads that concern these researchers, the fundamental memory architecture isn't projected to improve at anything like the rate at which core counts scale. So you buy a 16-core system to upgrade your quad-core system and hypothetically gain little despite the expense. Power efficiency drops, and getting more performance requires more nodes. Additionally, who is to say clock speeds won't drop if mass-market programming models shift so that distributed workloads are common and single-core performance isn't all that important.
All that said, talk beyond 6-core/8-core is mostly grandstanding at this time. Memory architecture isn't considered intrinsically exciting for the mass market, so I would wager there will be advancements that no one talks about. For example, Nehalem leapfrogs AMD memory bandwidth by a large margin (roughly a factor of 2). That means if Shanghai parts are considered to have satisfactory memory bandwidth to support four cores today, Nehalem, by that metric, supports eight equally satisfactorily. The whole picture is a bit more complicated (latency, for instance, and numbers I don't know offhand), but that one metric is a highly important one in the supercomputer field.
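The back-of-envelope version of that factor-of-2 claim (with placeholder bandwidth figures, not measured numbers): if per-core bandwidth is the metric that matters, doubling aggregate bandwidth supports twice the cores at the same per-core figure.

```python
# Placeholder figures, not benchmarks: doubling aggregate bandwidth
# supports double the cores at the same per-core bandwidth.
def per_core_bw(total_bw, cores):
    """Bandwidth available to each core if the bus is shared evenly, GB/s."""
    return total_bw / cores

amd_bw = 12.8            # hypothetical aggregate GB/s for a 4-core part
intel_bw = 2 * amd_bw    # "roughly a factor of 2" per the comment above

# 4 cores on the smaller bus see the same per-core bandwidth as 8 on the larger.
assert per_core_bw(amd_bw, 4) == per_core_bw(intel_bw, 8)
```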
For all the worry over memory bandwidth, though, it hasn't stopped supercomputer purchasers from buying into Core2 all this time. Despite improvements in its chipset, Intel's Core2 still doesn't reach AMD's memory performance. Even so, people spending money to get into the Top500 have generally put their money on Core2. Sure, the Cray and IBM supercomputers in the top 2 used AMD, but from the time of its release, Core2 has decimated AMD's supercomputer market share despite an inferior memory architecture.
Re:It's so obvious... (Score:4, Informative)
You mean something like a CPU cache? I assume you know that every core on a multi-core [wikipedia.org] system already has its own cache (L1) and shares a larger cache (L2) among all cores.
The problem is that on/near-core memory is damn expensive, and your average supercomputing task requires significant amounts of memory. When the bottleneck for high-performance computing becomes memory bandwidth instead of interconnect/network bandwidth, you have something a lot harder to optimize, so I can understand where the complaint in IEEE comes from.
Perhaps this will lead to CPUs with large L1 caches specifically for supercomputing tasks, who knows...
Re:Well doh (Score:3, Informative)
Actually, that is part of the problem. Most architectures have core-specific L1 caches, and unless a thread has its affinity mapped to a particular core, it can jump from a core where its data is in the L1 cache to one where it is not, forcing a cache refill from memory.
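For what it's worth, pinning affinity is a one-liner on Linux. A minimal sketch (Linux-only; `pin_to_core` and the choice of core 0 are mine, purely for illustration):

```python
import os

def pin_to_core(core):
    """Pin the current process to one core so its working set stays in
    that core's L1/L2. Linux-only; returns the new mask, or None elsewhere."""
    if hasattr(os, "sched_setaffinity"):    # not available on macOS/Windows
        os.sched_setaffinity(0, {core})     # 0 = the calling process
        return os.sched_getaffinity(0)
    return None

mask = pin_to_core(0)   # core id 0 is an arbitrary choice for this sketch
```

Real HPC codes do the same thing per-thread via the MPI launcher or OpenMP environment variables rather than by hand.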
Also, regardless of whether a system is multi-processing within a chip (multi-core) or on a board (multi-CPU), the number of communication channels required to avoid communication bottlenecks goes up as O(n²) in the number of cores.
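That O(n²) growth is just the pair count: a full point-to-point interconnect needs one channel per pair of cores, i.e. n(n-1)/2 links.

```python
# Channel count for a full point-to-point (crossbar-style) interconnect:
# one link per pair of cores, so it grows quadratically with n.
def channels(n):
    return n * (n - 1) // 2

print(channels(4))    # 6 links for a quad-core
print(channels(32))   # 496 links for a 32-core part
```

Which is exactly why real chips fall back on shared buses, rings, or meshes instead of full crossbars, and why those shared paths clog.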
So yes, we are probably seeing the beginning of the end of performance gains using general-purpose CPU interconnects and have to go back to vector processing. Unless we are somehow able to jump the heat dissipation barrier and start raising GHz again.
Re:Time for vector processing again (Score:3, Informative)
http://www.cray.com/products/XMT.aspx
Rest assured, there are still people who know how to build them. They're just not quite as popular as they used to be, now that a middle manager who has no idea what the hell they're talking about can go to an upper manager with a spec sheet that's got 8 thousand processors on it and say "look! This one's got a whole ton more processors than that dumb Cray thing!"
Re:Time for vector processing again (Score:5, Informative)
Sorry, but that's not entirely correct: most supercomputers work on highly parallel problems [earthsimulator.org.uk] using numerical analysis [wikipedia.org] techniques. By definition the problem is broken up into millions of smaller problems [bluebrain.epfl.ch] that make ideal "small apps"; a common consequence is that the bandwidth of the communication between the 'small apps' becomes the limiting factor.
"Back in the early-mid 90's we had different processors for Desktop and Super Computers."
The Earth Simulator was referred to in some parts as 'computenick'; its speed jump over its nearest rival and its longevity at the top marked the renaissance [lbl.gov] of "vector processing" after it had been largely ignored during the '90s.
In the end a supercomputer is a purpose built machine, if cores fit the purpose then they will be used.
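The "communication becomes the limit" point above has a standard illustration: split a 3-D grid into subdomains, and each piece computes over its volume but must exchange its surface (halo) with neighbours every step. Shrink the pieces and the surface-to-volume ratio, hence the communication share, grows. A hedged sketch (cubic subdomains, 6-face halo; grid sizes are arbitrary):

```python
# Halo-exchange illustration: compute scales with a subdomain's volume,
# communication with its surface, so smaller pieces communicate relatively more.
def comm_to_compute(N, p_per_axis):
    """Ratio of halo cells exchanged to cells computed, per subdomain,
    for an N^3 grid cut into p_per_axis pieces along each axis."""
    side = N // p_per_axis          # subdomain edge length
    volume = side ** 3              # cells computed per piece per step
    surface = 6 * side ** 2         # cells exchanged per piece per step
    return surface / volume         # = 6 / side: grows as pieces shrink

# Halving the subdomain edge doubles the communication share.
assert comm_to_compute(512, 4) == 2 * comm_to_compute(512, 2)
```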
Re:Time for vector processing again (Score:3, Informative)
Yes, that is what I want: a supercomputer designed by an English major...
Please get over yourself. This is Slashdot, not something important like a resume or a will.
Re:It's so obvious... (Score:3, Informative)
Re:Time for vector processing again (Score:3, Informative)
Cray did not stream vectors from memory. One of the advances of the Cray-1 was the use of vector registers as opposed to, for example, the Burroughs machines which streamed vectors directly to/from memory.
We know how to build memory systems that can handle large vectors. Both the Cray X1 and Earth Simulator demonstrate that. The problem is that those memory systems are currently too expensive. We are going to see more and more vector processing in commodity processors.
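The register idea is easy to sketch. Instead of streaming a whole vector through memory per operation, you load fixed-size strips into vector registers and operate on each strip while it's resident, cutting memory traffic. An illustrative strip-mined SAXPY in plain Python (the 64-element register length is a made-up figure, not any Cray's):

```python
# Strip-mined SAXPY (y' = a*x + y): process the vectors in register-sized
# chunks, the way a vector machine with vector registers would.
VLEN = 64  # hypothetical vector-register length, elements

def saxpy_strip_mined(a, x, y):
    out = []
    for i in range(0, len(x), VLEN):        # strip-mine into register-size chunks
        xs, ys = x[i:i+VLEN], y[i:i+VLEN]   # one "vector load" per operand
        out.extend(a * xi + yi for xi, yi in zip(xs, ys))  # register-resident math
    return out

print(saxpy_strip_mined(2, [1] * 130, [3] * 130)[:3])  # [5, 5, 5]
```

On a real vector machine the payoff comes when several operations reuse a strip before it's written back; the Burroughs-style memory-to-memory approach paid the full memory round trip on every op.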
Re:Time for vector processing again (Score:3, Informative)
Bad English isn't something you can keep locked out of sight in the back closet of society; it's like a termite infestation. Allow it a foothold and it'll spread everywhere. It's a higher-entropy state.
There's a world of difference between someone who's just writing casually (and goofing up) and someone who is completely unable to grasp the tenets of grammar. The former are perfectly capable of writing well on a resume, as you say; the latter are functionally illiterate, and they should be told when their English isn't good enough to participate in a discussion. There's nothing wrong with that; it gives them the opportunity to improve.