
Scaling To a Million Cores and Beyond

mattaw writes "In my blog post I describe a system designed to test a route to the potential future of computing. What do we do when we have computers with 1 million cores? What about a billion? How about 100 billion? None of our current programming models or computer architecture models applies to machines of this complexity, with their corresponding component failure rates and other scaling issues. The current model of coherent memory, identical time, and everything-can-route-to-everywhere simply can't scale to machines of this size. So scientists at the University of Manchester (including Steve Furber, one of the ARM founders) and the University of Southampton turned to the brain for a new model. Our brains just don't work like any computers we currently make. They have far more than 1 million processing elements (more like 100 billion), none of which has any precise idea of time (a vague ordering of events, maybe) or a shared memory, and not everything routes to everything else. But anyone who argues the brain isn't a pretty spiffy processing system ends up looking pretty silly. In effect, modern computing bears as much relation to biological computing as the ordered world of sudoku does to the statistical chaos of quantum mechanics."
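A minimal, purely hypothetical sketch (in Python, not the project's actual software) of the style of model the summary describes: many elements with only local state, no shared memory, no global clock, and communication purely by small asynchronous messages.

import random
from collections import deque

class Node:
    """One processing element: purely local state and a local routing table."""
    def __init__(self, node_id, targets, threshold=3):
        self.id = node_id
        self.targets = targets        # the only routing knowledge this node has
        self.potential = 0            # local state; nothing else can see it
        self.threshold = threshold

    def receive(self, weight):
        """Handle one incoming message; maybe emit messages of our own."""
        self.potential += weight
        if self.potential >= self.threshold:
            self.potential = 0
            return [(t, 1) for t in self.targets]   # (target, weight) messages out
        return []

def run(num_nodes=1000, initial_spikes=50, max_events=100000):
    nodes = {i: Node(i, random.sample(range(num_nodes), 4)) for i in range(num_nodes)}
    # The queue stands in for the interconnect; arrival order is not a global clock.
    mailbox = deque((random.randrange(num_nodes), 1) for _ in range(initial_spikes))
    delivered = 0
    while mailbox and delivered < max_events:
        target, weight = mailbox.popleft()
        mailbox.extend(nodes[target].receive(weight))
        delivered += 1
    return delivered

if __name__ == "__main__":
    print("messages delivered:", run())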
  • Re:multi core design (Score:5, Interesting)

    by jd ( 1658 ) <[moc.oohay] [ta] [kapimi]> on Wednesday June 30, 2010 @02:30AM (#32741090) Homepage Journal

    You cannot parallelize a serial task, any more than you can have 60 people dig one posthole in one second. On the other hand, there are MANY tasks that are inherently parallel but which end up serialized because the programmers aren't up to the task, the OS isn't, or the CPU isn't.

    (I don't know whether kernel threads under Linux will be divided between CPUs in an SMP system; they certainly can't migrate across motherboards in any MOSIX-type project. That limits how parallel the bottlenecks in a program can ever become. And it's one of the best OSes out there.)

  • by jd ( 1658 ) <[moc.oohay] [ta] [kapimi]> on Wednesday June 30, 2010 @02:38AM (#32741126) Homepage Journal

    I don't know about this specific project, but Manchester is strongly Open Source. The Manchester Computer Centre developed one of the first Linux distributions (and - at the time - one of the best). The Advanced Processor Technologies group has open-sourced software for developing asynchronous microelectronics and FPGA design software.

    Manchester University is highly regarded for pioneering work (they were working on parallel systems in 1971, and developed the first stored-program computer in 1948) and they have never been ashamed to share what they know and do. (Disclaimer: I studied at and worked at UMIST, which was bought by Manchester, and my late father was a senior lecturer/reader of Chemistry at Manchester. I also maintain Freshmeat pages for the BALSA projects at APT.)

  • by Anonymous Coward on Wednesday June 30, 2010 @02:45AM (#32741152)

    GreenArrays makes tiny Forth-based computers with up to 144 cores on a chip, and that's in a low-tech 180-nanometer process. Each core has a rather fast ALU but just a few hundred(?) bytes of memory. That seems closer to neurons than the thing this guy is making, where each core is a 32-bit ARM processor.

    link [greenarraychips.com].

  • by Hedon ( 192607 ) on Wednesday June 30, 2010 @02:51AM (#32741184)

    Is it a coincidence that earlier this month there was a press release from IMEC regarding the issues of massively scaling up computational power ("exascaling")?
    Press blurb can be found here [www2.imec.be].
    Killer application would be "space weather prediction".

  • by ctrl-alt-canc ( 977108 ) on Wednesday June 30, 2010 @03:03AM (#32741248)

    ...it seemed to me that Amdahl's law [ameslab.gov] was still alive and kicking.
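    For reference, Amdahl's law says the speedup on n cores with serial fraction s is 1 / (s + (1 - s)/n). A quick back-of-the-envelope sketch (numbers purely illustrative) of why it still bites at a million cores:

def amdahl_speedup(serial_fraction, cores):
    # Amdahl's law: speedup = 1 / (s + (1 - s) / n)
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for n in (8, 1000, 1_000_000):
    # Even with only 1% of the work serial, a million cores tops out near 100x.
    print(n, "cores ->", round(amdahl_speedup(0.01, n), 1))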

  • Dangerous idea (Score:1, Interesting)

    by Anonymous Coward on Wednesday June 30, 2010 @03:04AM (#32741258)

    To make any computer mimic the design and function of the human brain would invite evolution and sentience. We want tools, not sentient machines.

  • Re:multi core design (Score:3, Interesting)

    by Your.Master ( 1088569 ) on Wednesday June 30, 2010 @03:13AM (#32741294)

    Actually, you can to some extent parallelize a serial task, given sufficiently many cores.

    For instance, you could guess at the intermediate result halfway through a long sequence of operations and execute from there, discarding the work if the guess turns out to be wrong (a toy sketch of this appears below). With lots of cores and a good chokepoint, you might gain a 2x speedup a significant percentage of the time (for a lower average speedup). 2x, that is, from billions of cores.

    Kind of like branch prediction, or a dynamically generated giant lookup table.

    It just isn't a very efficient speedup, at all, compared to the gains of even modestly parallelizable tasks.
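    A toy sketch of that speculation, assuming nothing beyond what's described above (the step function and numbers are made up): one worker does the unavoidable first half of a serial chain while others each gamble on a guessed midpoint and precompute the second half; if any guess matches, the answer is ready in roughly half the wall-clock time.

def step(x):
    return (x * 31 + 7) % 1000          # toy serial operation: each step needs the last

def run_chain(start, steps):
    for _ in range(steps):
        start = step(start)
    return start

def speculative(start, steps, guesses):
    half = steps // 2
    # "Workers" that already ran the second half from guessed midpoints
    # (conceptually in parallel with the real first half).
    precomputed = {g: run_chain(g, steps - half) for g in guesses}
    midpoint = run_chain(start, half)    # the unavoidable serial first half
    if midpoint in precomputed:
        return precomputed[midpoint], True            # a guess paid off: ~2x
    return run_chain(midpoint, steps - half), False   # all guesses wrong: no gain

result, hit = speculative(start=1, steps=10, guesses=range(1000))
print(result, "speculation hit:", hit)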

  • by Anonymous Coward on Wednesday June 30, 2010 @03:22AM (#32741342)

    The brain isn't like our computers at all. The brain is compartmentalized: there are dozens of separate pieces, each with its specialty, and each is wired to other pieces in specific ways. There is no "Total Information Awareness"(tm) bullshit going on (which is what 1 million cores would give you). The problem with TIA is that there is too much crap to wade through - too big a haystack to find the needle you need. What they found when analyzing the Berger-Liaw speech recognition system against other systems is that it kept temporal (time-based) subtleties, in contrast to other speech recognition systems that were simply digital (with the clock/oscillator sampling in Nyquist fashion, destroying or failing to capture temporal information). The Berger-Liaw system can best the best human listeners (which is why the US Navy got it instead of it becoming an available commercial product). It could act as a 'sonic input device' with superhuman performance using only a tiny neural network (20 to 30 nodes), instead of the digital approaches giving crappy results with 2048 or 4096 nodes. The brain is wired with a lot of 'specialty components' which use a sparing number of parts to get the job done. Some of the excess appears to be redundant (although I am not a neuroscientist and could be wrong).

  • Re:multi core design (Score:3, Interesting)

    by Anonymous Coward on Wednesday June 30, 2010 @03:25AM (#32741350)

    If one is willing to use the transistors of a billion-core machine to speed up a single problem anyway, some of those transistors could be better spent accelerating the specific problem in the form of custom circuits. There could be a layered structure in the system similar to the one the brain uses: the lowest, fastest, task-customized levels could be built from hard-wired logic; the layer above from slowly reconfigurable circuits with very fast switching speeds; the layers above that from easily reconfigurable circuits with slower switching speeds; and the highest level from normal general-purpose logic. A higher level could train and use the services of a lower level, much as the brain might in a case of a phobia, trauma, or some psychosomatic condition. New sub-fields of computer science and computer engineering - the computer psychologist and the computer psychiatrist - might be created out of necessity.

  • by CBravo ( 35450 ) on Wednesday June 30, 2010 @03:32AM (#32741372)

    My opinion is that you should not require software to be parallelized from the start. You parallelize it at runtime or at compile time.

    This makes sense because parallelization does not add anything in functionality (the outcome should not change). My point is: program the functionality, and configure/compile the parallelization afterwards (possibly by power users). There could be a unique selling point for open source here: parallel performance, because you can recompile.
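    A minimal sketch of that split, with a made-up PARALLEL switch standing in for whatever a power user configures at build or run time; the program itself is just a plain map over a pure function, so the outcome is identical either way.

from multiprocessing import Pool

PARALLEL = True        # imagine this coming from a config file or build flag

def work(x):
    return x * x - 3 * x + 1       # pure: the result doesn't depend on scheduling

def run(values):
    if PARALLEL:
        with Pool() as pool:
            return pool.map(work, values)      # parallel schedule
    return [work(x) for x in values]           # serial schedule, same outcome

if __name__ == "__main__":
    print(run(range(10)))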

  • The Internet (Score:5, Interesting)

    by pmontra ( 738736 ) on Wednesday June 30, 2010 @03:37AM (#32741400) Homepage
    The Internet is at least in the 1 billion cores range. The way to use many of them for a parallel computation has been demonstrated by Seti@home, Folding@home and even by botnets. They might not be the most efficient implementations when you have full control of the cores but they show the way to go when the availability of the cores and the communication between them is unreliable, when they have different times and different clocks and when they might be preempted to do different tasks.
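    A toy sketch of that pattern (names and numbers made up, not any real @home protocol): a coordinator hands out self-contained work units and simply re-issues any unit whose unreliable, preemptible worker never reports back.

import random

def make_units(n):
    return {i: list(range(i * 100, i * 100 + 100)) for i in range(n)}

def unreliable_worker(unit):
    if random.random() < 0.3:
        return None          # the volunteer machine went away or was preempted
    return sum(unit)         # the actual computation for this work unit

def coordinate(units):
    results, pending = {}, set(units)
    while pending:           # keep re-issuing until every unit is accounted for
        for uid in list(pending):
            r = unreliable_worker(units[uid])
            if r is not None:
                results[uid] = r
                pending.discard(uid)
    return results

print(len(coordinate(make_units(20))), "work units completed")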
  • Re:multi core design (Score:5, Interesting)

    by William Robinson ( 875390 ) on Wednesday June 30, 2010 @04:45AM (#32741760)

    Not sure where MISD is used

    Back in 1987, I was part of a team designing a parallel processing machine with 4 neighboring CPUs sharing common memory (apart from their own local memory - a kind of systolic array), intended to simulate aerodynamics or weather forecasting using diffusion equations. We believed it worked on the MISD model, where different algorithms running on different CPUs analyzed the same data, using bus arbitration logic.
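    A small illustrative sketch of that MISD-flavoured arrangement, using threads only as a stand-in for the four CPUs: several different "algorithms" (here just different reductions) all read the same shared data.

import threading

shared_data = [0.1 * i for i in range(10000)]    # the common data block
results = {}

def analyse(name, algorithm):
    results[name] = algorithm(shared_data)       # each "CPU" runs a different algorithm

analyses = {
    "mean":  lambda d: sum(d) / len(d),
    "peak":  max,
    "span":  lambda d: max(d) - min(d),
    "total": sum,
}
threads = [threading.Thread(target=analyse, args=item) for item in analyses.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)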

  • by pieterh ( 196118 ) on Wednesday June 30, 2010 @05:19AM (#32741880) Homepage

    You don't even need Erlang; you can use a lightweight message-passing library like ZeroMQ [zeromq.org] that lets you build fast concurrent applications in 20 or so languages. It looks like sockets but implements actors that connect in various patterns (pubsub, request-reply, butterfly), and works with Ruby, Python, C, C++, Java, Ada, Common Lisp, Go, Haskell, Perl, and even Erlang. You can even mix components written in different languages.

    You get concurrent apps with no shared state, no shared clock, and components that can come and go at any time, and communicate only by sending each other messages.

    In hardware terms it lets you run one thread per core, at full efficiency, with no wait states. In software terms it lets you build at any scale, even to the scale of the human brain, which is basically a message-passing concurrent architecture.
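    A minimal request-reply pair using pyzmq (the Python binding for the ZeroMQ library linked above; it needs to be installed). The two actors share no state and talk only through messages; they run as threads over the in-process transport just to keep the sketch self-contained. Swapping zmq.REP/zmq.REQ for zmq.PUB/zmq.SUB gives the pubsub pattern mentioned above.

import threading
import zmq

ctx = zmq.Context()
bound = threading.Event()

def responder():
    sock = ctx.socket(zmq.REP)
    sock.bind("inproc://work")
    bound.set()                       # let the requester connect safely
    msg = sock.recv()                 # wait for one request
    sock.send(b"echo: " + msg)        # reply to whoever asked
    sock.close()

t = threading.Thread(target=responder)
t.start()
bound.wait()

req = ctx.socket(zmq.REQ)
req.connect("inproc://work")
req.send(b"hello")
print(req.recv().decode())

req.close()
t.join()
ctx.term()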

  • by Xest ( 935314 ) on Wednesday June 30, 2010 @05:21AM (#32741904)

    I was going to post about this, but you've already covered part of it, so I'll reply instead.

    I take issue with this statement from the summary:

    "But anyone who argues the brain isn't a pretty spiffy processing system ends up looking pretty silly."

    No, they don't look silly; they look smarter and more knowledgeable about the topic than the idiot who made this comment. The human brain has multiple flaws, and whilst it's excellent for some things, it's terrible for others. The human brain relies on emergence, which is great for solving some problems but not so great for others. To give an example, the human brain is great at picking out an object in a scene, but hopeless at calculating the actual distance to that object with any reasonable degree of accuracy - the margin of error for most people is, on average, going to be quite large.

    The human brain is also prone to mistakes: minor changes to its workings can make it come up with odd results. Notice how sometimes something you know inside out you just happen to get wrong? Look at how sometimes people are mid-conversation, talking about something they know in depth, and suddenly forget what they were going to say - this is because processing in the brain has gone completely off track.

    The fundamental problem with copying the brain is that it's far from perfect; it doesn't give us the precision that classical computing generally does. It doesn't guarantee there won't be fluctuations in a task every single time it performs that task.

    A brain-like computer would certainly be useful, and it could help solve some interesting problems, but it's most definitely not perfect, and it's most definitely not the ultimate solution to our computing problems - there are plenty of scenarios where it would be completely useless. This is, after all, why we bother to have computers in the first place.

    I do not see the brain as any more spiffy than computing technology. Sure, it's amazingly powerful, but there's a lot that current computers can do that the brain can't - serious large-scale number crunching, for example. Could we ever trust a brain-like model to crunch numbers when we know full well the brain could make a mistake, giving erroneous results on a few of those numbers where a computer would not?

  • by WheelDweller ( 108946 ) <`moc.liamg' `ta' `rellewDleehW'> on Wednesday June 30, 2010 @05:31AM (#32741966)

    I've felt the same way; just _because_ you can create 1,000,000 CPUs connected to a single bus doesn't mean it's the right thing to do. We're still limited by physics, and if they're x86 cores, we're still dealing with all that "address bus is multiplexed" crap that was a problem in 1985. (!)

    I have the feeling that, if we're so enamored with multiple cores, they need to be smaller, simpler, and able to communicate amongst each other. That is how, in the real world, multiple CPUs actually come to map onto something like the human brain.

    Just how much memory and how many refresh cycles ARE there in a synapse? (See the point?)

    If we can arrange different types of CPUs, give them the ability to communicate with every other node, and put that 1,000,000 into a net with neural-net-style layouts, THEN we're talking.

    Consider the Texas Instruments 1-bit CPU. Only 16 instructions, and something like 7 of them are no-ops! All programs take the same amount of time to execute. You can make BANKS of these things, to create 2048-bit computers if you like. Of course, it helps to do this at the core level, not the 8-pin DIP level. :)

    Unfortunately for the traditional integrators, THIS is how we're going to get to the next level of operation. The PC has little in common with brains; it's a metaphor.

  • Re:Dangerous idea (Score:2, Interesting)

    by perryizgr8 ( 1370173 ) on Wednesday June 30, 2010 @05:43AM (#32742018)

    On the contrary, I can't wait for sentient machines. At last we will be free of shoddy human programming - no vendor lock-ins and other such stuff. Just tell your computer to write the specific program you desire for a specific purpose, and it will write it for you.

  • Re:multi core design (Score:3, Interesting)

    by smallfries ( 601545 ) on Wednesday June 30, 2010 @05:50AM (#32742052) Homepage

    While turning intrinsically serial problems into a parallel form would certainly open up a new field, it is doubtful that it would be "a whole new field of mathematics" - or did you just like the sound of your hyperbole?

    On a slightly different note: every time there is an article about parallel architecture on Slashdot, someone raises the problem of inherently serial tasks. Can you name any? Or, more to the point, can you name an inherently serial task on a realistically sized data set that can't be tackled by a single 2GHz core?

    It would seem that we have scaled single-core performance to the point that we don't care about serial tasks any more. All of the interesting performance hogs that we do care about can be parallelised.

  • by Anonymous Coward on Wednesday June 30, 2010 @06:45AM (#32742302)

    It certainly is the easiest for the lazy programmer, but certainly not the best way.

    "Lazy programmer" is a fantastically negative way to put it. Remember that the software industry has a massive amount of legacy sequential code and a massive number of people who are skilled in it; rewriting all of this in another language is difficult. That's not necessarily laziness - more a case of not having the right skills.

    It will take years for training to catch up, especially with languages introducing their own specific parallel implementations. Java threads, for example, are awful; it will take universities and colleges years to figure out that teaching Java threading is a bad idea when general parallel concepts are not introduced first.

    The smart person should be looking into language and compiler tool design, since a lot of people will be needing better tools and compilers. The solution is not attempting to force everyone to work the new way overnight; that MIGHT come with time. Unfortunately, programmers are stubborn and will stick religiously to their old ways.

    I predict computing platforms becoming massively more diverse. I think we will be shifting towards numerous computing platforms - something the Linux community might be able to do well. The underlying architectures have to change, x86 has to die soon, and with the right technology it becomes a case of letting tools crunch away at implementing the new parallel algorithms.

  • by Xest ( 935314 ) on Wednesday June 30, 2010 @07:59AM (#32742636)

    "I disagree. How can we learn to throw a basketball into a tiny hoop from far away without having very accurate estimates?"

    That was precisely my point: they're still estimates. The human brain can judge from experience how to throw the ball to get it through a hoop, but can it calculate the distance well enough, and consistently enough, to work out the angle from the thrower's feet up to the net in order to perform some action such as a precise manufacturing task?

    These are two very different things, and they are useful for two very different purposes. Being able to throw a ball right some, or even most, of the time may be fine for a game of basketball, but is this ability good enough to calculate the values for some complex and precise engineering application? Absolutely not.

    "The information is never "lost" it's just unavailable for a time. If it was lost you wouldn't have the "oh yeah" moments when you remember it or look it up again. You recognize it because you already knew it."

    Absolutely, it's not lost; the issue is that the brain depends on emergence, and emergent systems can be quite vulnerable to minor variations in the initial conditions. In this case the initial conditions for the moment in question will often be set by the senses, but they could also result from what the brain was doing a moment earlier, or be caused by some chemical imbalance (e.g. taking drugs). The problem is that although the information isn't lost as such, it's hard for the brain to track it down again when it doesn't have the conditions required to get back to that information - which is why it can take a while to remember what you were going to say: it's not easy for the brain to get back to the required state, or to a near enough state that it can retrieve the information it needs. In contrast, it's quite easy in computers, because we have explicit memory addresses and so forth to work with, and those can be persisted and referred to - or, to put it rather simplistically (and far from perfectly), we don't need to rely on re-running the execution process to find the data if we've done things right.

    "While I agree the brain isn't as effective at large scale number crunching I do believe it's something the brain can be trained to do. There are plenty of people out there who can do insanely complex arithmetic in their heads. I suspect the reason we all don't have such skills is because we don't need them."

    Again, I absolutely agree, but the issue is consistency: you may be able to train it, but when it's so vulnerable to minor changes in the way it works, can it do the job consistently?

    "So those resources in the brain were put to use on other tasks like accurately processing visual and audio data. I can hear or spot a predator very quickly and accurately in all types of environments and lighting conditions. If we use a computer to perform these tasks we realize just how much computation is required."

    That's again really the key- the brain is brilliant for some things, so much so that a current style of computer just isn't really fit for the job. It's not so much that a lot of computation is required, it's just that the type of computation we do now is quite different from the type of computation a brain does and this is what TFA is getting at- to solve these sorts of problems we need a different paradigm. What I disagree with though is that we need a different paradigm in general- I don't believe we do because the paradigm being suggested isn't ideal for the things current computers are good at.

    As with almost everything, I suspect it's a case of six of one, and half a dozen of the other- there isn't one perfect solution, we ultimately need both solutions for different types of problem, but shouldn't completely write one off at the expense of the other.

    To cut a long story short, the fundamental difference is that current computers are largely predictable and formally provable. Brain-style computing is chaotic and complex: we know it'll come up with a solution, but we can't guarantee it will be the same, or the right, one every single time.

  • by GPSguy ( 62002 ) on Wednesday June 30, 2010 @08:53AM (#32742994) Homepage

    In weather forecasting, we find ourselves starting, stopping, and waiting. Some of the tiles on which we compute will be trivially simple to complete, while others will not reach something approaching a numerically complete solution for some larger number of iterations. Therefore, we have to wait for the slowest computation/solution before we advect results to the surrounding tiles and begin the process all over again.

    The nature of parallel problems isn't so simple that you can generalize about synchronization not being important. I've also got examples of two parallel codes that have to stop, synchronize data, and restart, which is very inefficient. The ability to scale the combination of shallow-water equations and non-hydrostatic weather codes into a large, embarrassingly parallel system would be a Good Thing for earth-system modeling in general, but it is practically difficult, both in its core nature and because message passing isn't trivial at some point in a large MPI system. It simply becomes unwieldy and breaks down.

    When you're parallelizing a Monte Carlo problem, you can achieve significant speedup with little concern for asynchrony. When you submit a problem where each computation requires waiting for all of its surrounding neighbors, asynchrony, using conventional approaches, becomes problematic.
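    A hedged sketch of that contrast (toy numbers, multiprocessing standing in for a cluster): the Monte Carlo workers never wait on each other, while the stencil forces the whole grid to finish each step before the next can begin.

import random
from multiprocessing import Pool

def mc_chunk(samples):
    # Embarrassingly parallel: no worker ever waits on another.
    random.seed()        # reseed per process so the chunks aren't correlated
    hits = sum(1 for _ in range(samples)
               if random.random() ** 2 + random.random() ** 2 <= 1.0)
    return hits

def stencil_step(grid):
    # Every new cell needs its neighbours from the previous step, so the whole
    # grid is, in effect, a barrier between timesteps.
    n = len(grid)
    return [(grid[(i - 1) % n] + grid[i] + grid[(i + 1) % n]) / 3.0 for i in range(n)]

if __name__ == "__main__":
    with Pool(4) as pool:
        hits = sum(pool.map(mc_chunk, [250000] * 4))
    print("pi ~", 4.0 * hits / 1000000)

    grid = [random.random() for _ in range(1000)]
    for _ in range(100):               # a serial chain of synchronisations
        grid = stencil_step(grid)
    print("smoothed mean ~", sum(grid) / len(grid))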

    I suspect that, while this project might not demonstrate megacore-scale high performance, they're on the right path.

    If you look at today's multicore-processor HPC, you see some (relatively) small number of cores on a single board, communicating at higher speeds than the inter-board links; the inter-board links tend to be sorta fast, and tied to an interconnect fabric of some sort. This can be faster or slower depending on the base technology, and on little things like bisection bandwidth and oversubscription. If one's not running some form of OpenMP (essentially threaded) environment, which often equates to shared memory but doesn't mandate it, one can take significant advantage of the higher-speed on-board interconnects. What's off the board doesn't affect your outcome.

    On the other hand, if you have to subscribe to DM (distributed-memory) parallelism, today's offerings are similar with minor tweaks in performance: MPICH, MPICH2, OpenMPI, etc., all striving to give the best multi-NODE intercommunication one can broadcast out and then gather back in. One of the key drawbacks, however, is that most times I/O is scatter-gather, or essentially serial: one node, or some small subset of intercommunicating nodes, gathers all the results, decides how to parse them, and/or commits them to the results. In general, during a big I/O operation, the other nodes wait.

    In the original author's description, there's a bunch of relatively slow interconnects to allow a new interconnect topology. A bunch of gigabit links doesn't equal a QDR InfiniBand connection, but by not tying the interconnect to a fixed, store-and-forward fabric, you just MIGHT gain a bit of efficiency. Somehow I think six GbE interconnects is light, but I'll let better minds than mine optimize that over time.

    This may well be a "plain old message-passing computer," but it's novel in its reduced dependence on scatter-gather and store-and-forward technology.
