IEEE Says Multicore is Bad News For Supercomputers 251

Richard Kelleher writes "It seems the current design of multi-core processors is not good for the design of supercomputers. According to IEEE: 'Engineers at Sandia National Laboratories, in New Mexico, have simulated future high-performance computers containing the 8-core, 16-core, and 32-core microprocessors that chip makers say are the future of the industry. The results are distressing. Because of limited memory bandwidth and memory-management schemes that are poorly suited to supercomputers, the performance of these machines would level off or even decline with more cores.'"
This discussion has been archived. No new comments can be posted.


  • by suso ( 153703 ) * on Friday December 05, 2008 @08:06AM (#26001307) Journal

    Sounds like it's time for supercomputers to go their own way again. I'd love to see some new technologies.

  • Well doh (Score:5, Insightful)

    by Kjella ( 173770 ) on Friday December 05, 2008 @08:17AM (#26001367) Homepage

    If you run a simulation like that while keeping the memory interface constant, then of course you'll see diminishing returns. That's why we're no longer running plain old FSBs: AMD has HyperTransport, Intel has QPI, the AMD Horus system expands it up to 32 sockets / 128 cores, and I'm sure something similar can and will be built as a supercomputer backplane. The headline is more than a little sensationalist...

  • Re:Well doh (Score:5, Insightful)

    by cheater512 ( 783349 ) <nick@nickstallman.net> on Friday December 05, 2008 @08:34AM (#26001449) Homepage

    There are limits, however, to what you can do.
    It's not like multi-processor systems where each CPU gets its own RAM.

  • Re:Well doh (Score:1, Insightful)

    by Anonymous Coward on Friday December 05, 2008 @08:53AM (#26001557)

    There are limits, however, to what you can do.
    It's not like multi-processor systems where each CPU gets its own RAM.

    I like where your brain is. What if they reworked the board layouts so that each proc had its own bank, or two, of RAM? 4 CPUs with 2 GB each. I sense amazing possibilities here. Dual north/south bridges. I mean, if you design a smart enough CPU set, you could do amazing things for the future.

  • by theaveng ( 1243528 ) on Friday December 05, 2008 @08:53AM (#26001559)

    >>> "After about 8 cores, there's no improvement," says James Peery, director of computation, computers, information, and mathematics at Sandia. "At 16 cores, it looks like 2 cores."

    That's interesting, but how does it affect us, the users of personal computers? Can we extrapolate that buying a CPU with more than 8 cores is a waste of money because it will actually run slower?

  • Memory (Score:5, Insightful)

    by Detritus ( 11846 ) on Friday December 05, 2008 @09:02AM (#26001639) Homepage
    I once heard someone define a supercomputer as a $10 million memory system with a CPU thrown in for free. One of the interesting CPU benchmarks is to see how much data it can move when the cache is blown out.
  • Multiple CPUs? (Score:5, Insightful)

    by Dan East ( 318230 ) on Friday December 05, 2008 @09:03AM (#26001649) Journal

    This doesn't quite make sense to me. You wouldn't replace a 64 CPU supercomputer with a single 64 core CPU, but would instead use 64 multicore CPUs. As production switches to multicore, the cost of producing multiple cores will be about the same as the single core CPUs of old. So eventually you'll get 4 cores from the price of 2, then get 8 cores from the price of 4, then 16 for the price of 8, etc. So the extra cores in the CPUs of a supercomputer are like a bonus, and if software can be written to utilize those extra cores in some way that benefits performance, then that's a good thing.

  • by virtual_mps ( 62997 ) on Friday December 05, 2008 @09:07AM (#26001679)

    It's very simple. Intel & AMD spend about $6bn/year on R&D. The total supercomputing market is on the order of $35bn (out of a global IT market on the order of $1000bn) and a big chunk of that is spent on storage, people, software, etc., rather than processors. That market simply isn't large enough to support an R&D effort which will consistently outperform commodity hardware at a price people are willing to pay. Even if a company spent a huge amount of money developing a breakthrough architecture which dramatically outperformed existing hardware, the odds are that the commodity processors would catch up before that innovator recouped its development costs. Certainly they'd catch up before everyone rewrote their software to take advantage of the new architecture. The days when Seymour Cray could design a product which was cutting edge & saleable for a decade are long gone.

  • by AlpineR ( 32307 ) <wagnerr@umich.edu> on Friday December 05, 2008 @09:18AM (#26001759) Homepage

    My supercomputing tasks are computation-limited. Multicores are great because each core shares memory and they save me the overhead of porting my simulations to distributed memory multiprocessor setups. I think a better summary of the study is:

    Faster computation doesn't help communication-limited tasks. Faster communication doesn't help computation-limited tasks.
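    The distinction above can be seen in a rough roofline-style model: a task's lower-bound runtime is set by whichever of compute or memory traffic is the bottleneck. This is a sketch with illustrative numbers, and the function name is mine, not from any library.

```python
# Minimal roofline-style model: runtime is bounded by whichever of compute
# or memory traffic is the bottleneck. All numbers are illustrative.

def runtime(flops, bytes_moved, peak_flops, peak_bw):
    """Lower-bound runtime in seconds for a task needing `flops`
    floating-point operations and `bytes_moved` bytes of memory traffic,
    on a machine with the given peak compute rate and bandwidth."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

# Doubling compute speed helps only a compute-bound task:
print(runtime(1e12, 1e9, peak_flops=1e11, peak_bw=1e10))  # 10.0 s
print(runtime(1e12, 1e9, peak_flops=2e11, peak_bw=1e10))  # 5.0 s

# ...and does nothing for a communication-bound one:
print(runtime(1e9, 1e12, peak_flops=1e11, peak_bw=1e10))  # 100.0 s
print(runtime(1e9, 1e12, peak_flops=2e11, peak_bw=1e10))  # 100.0 s
```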

  • by timeOday ( 582209 ) on Friday December 05, 2008 @09:44AM (#26001955)
    IMHO this study is not an indictment of the use of today's multi-core processors for supercomputers or anything else. They're simply pointing out that in the future (as core counts continue to grow exponentially) some memory bandwidth advances will be needed. The implication that today's multi-core processors are best suited for games is silly; where they're really well utilized is in servers, and they work very well there. The move toward commodity processors in supercomputing wasn't some kind of accident; it occurred because that's what currently gets the best results. I'd expect a renaissance in true supercomputing just as soon as it's justified, but I wouldn't hold my breath.
  • by LingNoi ( 1066278 ) on Friday December 05, 2008 @10:14AM (#26002255)

    This is slashdot, our professions are computer related not literature based. You're on the wrong website.

  • by DrYak ( 748999 ) on Friday December 05, 2008 @10:24AM (#26002355) Homepage

    The issue is with a single processor that has multiple cores.
    There's no real way to split the banks for each core, so the net effect is that you have 4-32 cores sharing the same lanes for memory.

    No, sorry. That's how Phenom processors *already* work.

    Each physical CPU package has two 64-bit memory controllers, each controlling a separate bank of 64-bit DDR2 memory chips (each of the two banks on a dual-channel motherboard).

    The Phenom has two modes of operation:
    - Ganged: both memory controllers work in parallel, as if they were one wide 128-bit memory connection. That's how dual channel has worked since it was invented.
    That's good for systems running a few very bandwidth-hungry applications (for example: benchmarks).

    - Unganged: each memory controller works on its own. You thus have two completely separate 64-bit memory channels accessible at the same time. By laying the applications out in memory correctly, thanks to a NUMA-aware OS (anything better than Windows Vista), two separate applications can each access their own memory at the exact same moment, although at only half the bandwidth *per process* (but still the same total bandwidth across all processes running at the same time on a multi-core chip).
    This is perfect for systems running lots of tasks in parallel, and is the default mode on most BIOSes I've seen.

    This gives a tremendous boost to heavily multi-tasked workloads (a busy database server, for example), and it's what TFA's authors are looking for.

    At some point in the future, Intel will probably follow the same trend with its QPI processors.

    Also, the future trend is to multiply the memory channels on the CPU: Intel has already planned triple-channel DDR3 for its high-end server Xeons (the first crop of QPI chips), and AMD has announced 4 memory channels for its future 6- and 12-core chips targeting the G34 socket.

    So the net effect of unganged dual channel is that today you already have 4 cores with a choice of 2 sets of memory lanes, and within a year you'll have 6 to 12 cores sharing 4 sets of memory lanes.
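    A quick back-of-envelope calculation of the per-core channel share for the configurations quoted above (core and channel counts as stated in this comment, not independently verified):

```python
# Per-core share of memory channels for the configurations quoted above.
# Figures are as stated in the comment, not vendor-verified.
configs = {
    "Phenom, unganged (4 cores, 2 channels)":   (4, 2),
    "future G34 12-core (4 channels)":          (12, 4),
}

for name, (cores, channels) in configs.items():
    print(f"{name}: {channels / cores:.2f} channels/core")
```

The ratio drops from 0.50 to about 0.33 channels per core, which is the trend the comment is pointing at: channel counts are rising, just not as fast as core counts.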

    By the time you reach 32 cores on a CPU, almost every slot will probably have its own dedicated memory channel (probably with the help of some technology that communicates serially with fewer lines, like FB-DIMM), or even weirder memory interfaces (who knows? maybe DDR-6 will be able to give several simultaneous accesses to the same memory module).

    So, well, once again, it shows that running simulations without taking into account that other technologies will improve alongside the number of cores* yields unrealistic results.

    Shame on TFA's authors, because the trend toward increased bandwidth has already started. A little more background research would have avoided this kind of mistake.
    But then they would have missed the opportunity to publish an alarmist article with an eye-catching title.

    --

    *: Although, yes, the number of cores you can slap inside the same package seems to be the "new megahertz" in the manufacturers' race, with some, like Intel, trying to increase this number faster without putting as much effort into the rest.

  • by knails ( 915340 ) <knailstheman@gmail.com> on Friday December 05, 2008 @10:34AM (#26002453)
    No, proper spelling and grammar are important for everyone, not just English majors. With computers so important, if computer professionals cannot use the language correctly, then who will? We cannot let ignorant people degrade the quality of language, removing beauty and subtle distinctions between similar words, just because they're too lazy to conform to standards. If a linguist misused or ignored computing standards, would you not correct them, even though it's not their chosen field of study?
  • by postbigbang ( 761081 ) on Friday December 05, 2008 @10:37AM (#26002471)

    Looks are deceptive.

    The problem with multicores relates to the fact that the cores are processors, but the connections to other cores and to memory aren't fully crossbar. Sun did a multi-CPU architecture that's truly crossbar (meaning there are no dirty-cache problems or semaphore latencies) among the processors, but the machine was more of a technical achievement than a decent workhorse for day-to-day use.

    Still, cores are cores. More cores aren't necessarily better until you fix what they describe. And it doesn't matter what they look like at all. Like any other system, it's what's under the hood that counts. Esoteric-looking shells are there for marketing purposes and cost justification.

  • by Funk_dat69 ( 215898 ) on Friday December 05, 2008 @10:58AM (#26002693)

    Of course things are different with supercomputers. If you have 1000 'processing units', where each PU consists of, say, 32 cores and a few GB of RAM on a single die, that would create a memory wall between 'local' and 'remote' memory. The on-die section of main memory would be accessible at near CPU speed; main memory that is part of other PUs would be 'remote', and slow. Hey wait, that sounds like a compute cluster of some kind... (so scientists already know how to deal with it).

    It also sounds like you're describing the Cell processor setup. Each SPU has local memory on-die but cannot do operations on main memory (remote). Each SPU also has a DMA engine that grabs data from main memory and brings it into its local store. The good thing is you can overlap the DMA transfer and the computation, so the SPUs are constantly burning through work.

    This does help against the memory wall, and it's a big reason why Roadrunner is so damn fast.
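    The DMA/compute overlap described here is the classic double-buffering pattern. A minimal sketch in Python, with a producer thread standing in for the SPU's DMA engine and a bounded queue standing in for the two local-store buffers (names and chunk sizes are illustrative):

```python
import threading
import queue

# Double-buffering sketch: overlap "DMA" fetches with computation.
# The fetch thread stands in for the DMA engine; chunks stand in for
# local-store-sized tiles of a larger data set.

def fetch_chunks(data, chunk_size, q):
    """Producer: stream fixed-size tiles of `data` into a bounded queue."""
    for i in range(0, len(data), chunk_size):
        q.put(data[i:i + chunk_size])  # blocks when both buffers are full
    q.put(None)                        # end-of-stream marker

def process(data, chunk_size=4):
    """Consumer: compute on one tile while the next is being fetched."""
    q = queue.Queue(maxsize=2)         # two buffers: one in flight, one in use
    t = threading.Thread(target=fetch_chunks, args=(data, chunk_size, q))
    t.start()
    total = 0
    while (chunk := q.get()) is not None:
        total += sum(x * x for x in chunk)  # "compute" on the current tile
    t.join()
    return total

print(process(list(range(10))))  # sum of squares 0..9 = 285
```

On a real SPU the fetch would be an asynchronous DMA into local store rather than a thread, but the structure — bounded buffers plus concurrent fetch and compute — is the same.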

  • by ipoverscsi ( 523760 ) on Friday December 05, 2008 @10:58AM (#26002699)

    Faster computation doesn't help communication-limited tasks. Faster communication doesn't help computation-limited tasks.

    Computation is communication. It's communication between the CPU and memory.

    The problem with multicore is that, as you add more cores, the increased bus contention causes the cores to stall so they cannot compute. This is why many real supercomputers have memory local to each CPU. Cache memory can help, but just adding more cache per core yields diminishing returns. SMP will only get you so far in the supercomputer world. You have to go NUMA for performance, which means custom code and algorithms.
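    The leveling-off effect can be sketched with a toy contention model: each core demands a fixed slice of bandwidth, and speedup flattens once aggregate demand exceeds what the shared bus can deliver. Numbers are purely illustrative, not measurements.

```python
# Toy model of shared-bus contention for a memory-bound workload.
# Illustrative numbers only.

def speedup(n_cores, per_core_demand_gbps, bus_bandwidth_gbps):
    """Ideal speedup is n_cores; the shared bus caps it at
    bus_bandwidth / per_core_demand once the bus saturates."""
    return min(n_cores, bus_bandwidth_gbps / per_core_demand_gbps)

for n in (1, 2, 4, 8, 16, 32):
    print(n, speedup(n, per_core_demand_gbps=2.0, bus_bandwidth_gbps=10.0))
# Speedup climbs with core count, then pins at 5x no matter how many
# cores are added -- the "level off" behavior the Sandia simulation reports.
```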

  • by Dishevel ( 1105119 ) on Friday December 05, 2008 @11:07AM (#26002795)
    I do not study literature. I do not like those that do. Come on though. Knowing the difference between adding an "s" in a plural or possessive situation is truly basic. If you want to sound like a complete idiot then don't mangle true English. Just speak Ebonics.
  • by knails ( 915340 ) <knailstheman@gmail.com> on Friday December 05, 2008 @11:22AM (#26002957)
    Who said anything about a supercomputer?

    Language is a tool, and everyone who uses the tool needs to use it properly. HTML is a tool, and there are proper-use standards for it. Some, however, choose not to follow those standards, and it only makes a mess for everyone else who does. If you're going to use a tool, you need to learn to use it correctly; language is no exception.
  • Well, duh.... (Score:3, Insightful)

    by SpinyNorman ( 33776 ) on Friday December 05, 2008 @11:28AM (#26003011)

    It's hardly any secret that CPU speed, even for single-core processors, has been outrunning memory bandwidth gains for years; that's why we have cache, and ever-increasing amounts of it. It's also hardly a revelation that if you're sharing your memory bandwidth between multiple cores, then the bandwidth available per core is less than if you weren't sharing. Obviously you need to keep the amount of cache per core and the number of cores per machine (or, more precisely, per unit of memory subsystem bandwidth) within reasonable bounds to keep it usable for general-purpose applications, else you'll end up in GPU territory (e.g. CUDA), where you're totally memory-constrained and applicability is much less universal.

    For cluster-based ("supercomputer") applications, partitioning between nodes is always going to be an issue in optimizing performance for a given architecture, and available memory bandwidth per node and per core is obviously part of that equation. Moreover, even if CPU designers do add more cores per processor than is useful for some applications, no one is forcing you to use them. The cost per CPU is going to remain approximately fixed, so extra cores per CPU essentially come for free. A library like pthreads, and different implementations of it (coroutine- vs. LWP-based), gives you flexibility over the mapping of threads to cores, and your overall across-node application partitioning gives you control over how much memory bandwidth per node you need.
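    One concrete way to exercise that kind of control over process-to-core mapping is the OS scheduler affinity interface. A small Linux-only sketch (the `os.sched_setaffinity` call is not available on all platforms, hence the guard):

```python
import os

# Sketch: pinning the current process to a specific core on Linux, one
# concrete form of the thread/process-to-core mapping control discussed
# above. Linux-only; guarded because the call is platform-specific.
if hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)         # cores we may currently run on
    os.sched_setaffinity(0, {min(allowed)})   # pin to the lowest-numbered core
    print(os.sched_getaffinity(0))            # now a single-core mask
    os.sched_setaffinity(0, allowed)          # restore the original mask
```

A pthreads program would do the equivalent per-thread with `pthread_setaffinity_np`; the principle, deciding explicitly which cores share which memory paths, is the same.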

  • by Zebra_X ( 13249 ) on Friday December 05, 2008 @11:35AM (#26003105)

    Yeah, if you buy Intel chips. Despite being slower clock-for-clock than the new Intel chips, AMD's architecture was and is the way to go, which is of course why Intel has copied it (i7). If you properly architect the chips to contain all of the "proper" plumbing, then this becomes less of a problem. Unfortunately, for the past few years Intel has simply cobbled together "cores" that are nothing more than processors linked via a barely adequate bus, so when contention goes up they don't perform as well. Most users don't ever consistently utilize their CPU at 80%, so this hasn't really been a problem for the market at large. This is why AMD's solutions have scaled further and for less. As a result, companies like Cray have been using Opteron chips for their newest supercomputers.

  • by Shamenaught ( 1341295 ) on Friday December 05, 2008 @12:25PM (#26003749)

    The phrase "By logical extension" is just another way of saying "This is a straw man argument"

    I believe that the point he was making was not that it's pointless to go beyond X86 hardware, but that it's more cost-effective to use consumer hardware. Consumer hardware is not necessarily X86 hardware. See IBM's Roadrunner, presently the fastest supercomputer in the world, which uses an advanced version of the PS3's processor (the PowerXCell 8i).

    In time, we'll probably see demand in consumer hardware for breaking past the boundaries and bottlenecks of multi-core processing, and so supercomputers will follow.

  • by flaming-opus ( 8186 ) on Friday December 05, 2008 @12:47PM (#26004031)

    NEC still makes the SX-9 vector system, and Cray still sells X2 blades that can be installed in their XT5 super. So vector processors are available; they just aren't very popular, mostly due to cost per flop.

    A vector processor implements an instruction set that is slightly better than a scalar processor at doing math, considerably worse at branch-heavy code, but orders of magnitude better in terms of memory bandwidth. The X2, for example, has four 25-GFLOP cores per node, which share 64 channels of DDR2 memory. Compare that to the newest Xeons, where six 12-GFLOP cores share 3 channels of DDR3 memory. While the vector instruction set is well suited to using this memory bandwidth, a massively multi-core scalar processor could also make use of a 64-channel memory controller.

    The problem is money. These multicore processors are coming from the server industry. Web hosting, database serving, and middleware-crunching jobs tend to be very cache-friendly. Occasionally they benefit from more bandwidth to real memory, but usually they just want a larger L3 cache. Cache is much less useful for supercomputing tasks, which have really large data sets. The server-processor makers aren't going to add a 64-channel memory controller to server processors; it wouldn't do any good for their primary market, and it would cost a lot.

    Of course, you could just buy real vector processors, right? Not exactly. Many supercomputing tasks work acceptably on quad-core processors with 2 memory channels. It's not ideal, but they get along. This has put a lot of negative market pressure on the vector machines, and they are dying away again. It's not clear whether Cray will make a successor to the X2, and NEC has priced itself into a tiny niche market in weather forecasting that is unapproachable by other supercomputer users for price reasons.
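    The bandwidth gap in the X2-vs-Xeon comparison above can be made concrete as bytes per flop. The core counts, GFLOP ratings, and channel counts are as quoted in the comment; the per-channel bandwidths are my own rough round numbers (DDR2 ~6.4 GB/s, DDR3 ~10.6 GB/s per channel), not vendor specs.

```python
# Rough bytes-per-flop comparison for the two nodes quoted above.
# Per-channel bandwidths are assumed round numbers, not vendor specs.

def bytes_per_flop(gflops_per_core, cores, channels, gb_per_channel):
    total_gflops = gflops_per_core * cores   # node peak, GFLOP/s
    total_bw = channels * gb_per_channel     # node memory bandwidth, GB/s
    return total_bw / total_gflops           # bytes available per flop

x2   = bytes_per_flop(25, 4, 64, 6.4)   # Cray X2 node, figures as quoted
xeon = bytes_per_flop(12, 6, 3, 10.6)   # Xeon node, figures as quoted

print(f"X2:   {x2:.2f} B/flop")   # ~4.10
print(f"Xeon: {xeon:.2f} B/flop") # ~0.44
```

Roughly an order of magnitude more memory bandwidth per flop on the vector machine, which is the comment's "orders of magnitude better" point in round numbers.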

  • by Anonymous Coward on Friday December 05, 2008 @01:06PM (#26004319)

    We can invest our time writing our posts to an exacting standard or we can get our points across and move on with our lives. We're not writing professionally. We're having a conversation. Do you know what "lot's" means? Of course. It means "lots" and someone made a typo. Cut people a little slack.
