IBM's Chief Architect Says Software is at Dead End 334
j2xs writes "In an InformationWeek article entitled 'Where's the Software to Catch Up to Multicore Computing?' the Chief Architect at IBM gives some fairly compelling reasons why your favorite software will soon be rendered deadly slow because of new hardware architectures. Software, she says, just doesn't understand how to do work in parallel to take advantage of 16, 64, 128 cores on new processors. Intel just stated in an SD Times article that 100% of its server processors will be multicore by end of 2007. We will never, ever return to single processor computers. Architect Catherine Crawford goes on to discuss some of the ways developers can harness the 'tiny supercomputers' we'll all have soon, and some of the applications we can apply this brute force to."
rendered deadly slow? (Score:4, Informative)
Concurrency in software (Score:5, Informative)
http://www.gotw.ca/publications/concurrency-ddj.h
Enjoy,
Multi-cores vs. internal parallelism (Score:4, Informative)
Re:Clearing things up a bit (Score:2, Informative)
IBM is not the only company releasing multi-core processors. Single core processors will soon go out the same way the 386SX did when 32-bit computing became the norm.
Concurrency is hard. (Score:5, Informative)
It's what made the Amiga look less reliable than its competitors... if you only ran one native program at a time it was a lot more stable than MacOS or MS-DOS, because the OS provided a much richer set of services so applications didn't have to replicate them... but most people took advantage of the multitasking and when something crashed in the background the lack of memory protection meant the whole thing went down, and non-native software that wasn't written with multitasking in mind could produce the most entertaining crashes.
These days we all have good protected mode multitasking operating systems, but we don't have good easy ways to distribute an application across multiple cores. Until we do, most applications are going to be written to run single-threaded and depend on the OS to use the other cores to speed up the rest of the system, both at the application level and doing things like running graphics libraries on another core.
Until we have so many cores that the OS can't make effective use of them on its own, I don't think most developers will even attempt to use them directly. And then we're going to go through a painful period like the one before Microsoft discovered multitasking.
Re:Yeah, if you only run one program at a time.. (Score:4, Informative)
That is true of multi-core general-purpose processors like the x86, but I don't think it works too well when talking about the Cell processor. The OS can't just assign a PowerPC-compiled app to an SPU and expect it to run. Apps have to be specifically coded to take advantage of the SPUs on the Cell.
Re:Compilers (Score:3, Informative)
Erlang [erlang.org] and Limbo [vitanuova.com] have concurrency primitives built in. Both used CSP as a launching point, and both give the programmer easy-to-use, lightweight processes and message passing. Processes share nothing.
However, neither has built-in support for multiple cores or multiple CPUs at the moment; it's just not a priority for the teams behind them. You can cheat your way to such a setup with Erlang, however, since you can spawn processes on remote machines or remote Erlang instances. If you had two Erlang instances on the same machine, each would run on its own core, so all you'd need to do is spawn a process on each and message-pass between the two.
Re:Clearing things up a bit (Score:3, Informative)
The Cell will live on, but it will create new markets where its inexpensive supercomputing power will open new doors for data analysis and real-time processing.
I just wanted to quote these two sentences you said relevant to my comment. The problem with multicore computing power lies not with the software developers but with the available tools. The problem with having, say, 60 cores able to run in parallel is that our computation methods (Turing-machine-based computation) are built on the basic serial algorithm. The main problem resides in the programming languages. As far as I know (although I am *sure* I must be wrong), all the "major" programming languages are based on this paradigm and only provide "some functionality" for parallel processing. What we need are compilers and (more importantly, and harder to come up with, I think) languages with which we can create programs that are inherently parallel.
Did the submitter even read the article? (Score:2, Informative)
If you read the article, you will see that Mrs. Crawford does not even come close to saying that "Software is at Dead End". She says software needs to catch up with the hardware.
Computers have more and more processors (and different kinds of processors, like GPUs), and currently most software isn't designed for that kind of environment. IBM has developed some clever ways to program these types of systems in a "general purpose" way.
That's the worst summary of a headline that I've ever read.
Concurrency is... (Score:5, Informative)
programming for multi-core architectures (Score:5, Informative)
However what is more relevant to today's non-supercomputing needs is SMP scalability.
One of the challenges with SMP scalability is cache coherency: keeping the caches on the processors synchronized is a costly operation (necessary to ensure that each processor has the same view of a given piece of memory at the same time), normally (always?) done with a cache invalidation.
So the more invalidations you do, the more often the processor has to fetch memory from main memory, and the less it's using its cache. Processing slows down dramatically.
I've tried to design the qore programming language http://qore.sourceforge.net/ [sourceforge.net] to be scalable on SMP systems. The new version (released today) has some interesting optimizations that have resulted in a large performance boost on SMP machines. The optimizations involve reducing the number of cache invalidations to a minimum, which means more than just reducing locking, although that is part of it too: even an atomic update (for example, on Intel, an assembly lock-and-increment) involves a cache invalidation and is therefore an expensive operation on SMP platforms. There is more work to be done, but in simple benchmarks of the affected code paths, the same qore code ran two to three times faster with the optimizations.
Anyway it would be interesting to know if other high-level programming languages have also taken the same approach (or will do so); as we go forward, it's clear that SMP scalability will be an important topic for the future...
Re:Compilers need to be better. (Score:3, Informative)
Re:Clearing things up a bit (Score:3, Informative)
Take 3D programming as an example. Before I can render the screen, I have to run thousands of vertices through a matrix transformation so that they align with where the camera sees them. This is a bulk operation that can be run in parallel by multiple SIMD cores (each transforming 2-4 vertices per instruction) simply by providing each core with a copy of the transformation matrix. Simple, straightforward, and FAST. But utterly useless for general purpose code like multithreaded web servers.
So the problem is not a lack of tools. The problem is that the Cell is a specialized architecture that's very different from traditional multiprocessing. It's designed for long number-crunching applications that have traditionally been the forte of supercomputers. Ergo, it is a supercomputer on a chip.
Re:Clearing things up a bit (Score:5, Informative)
This is an often-repeated misconception. Cell abandons the practice of having different fp, integer, and vector registers... all registers are 128-bit and any instruction can be issued on any of them, and those instructions are generated by a C++ compiler. So saying that programmers code in these SIMD instructions is like saying that "x86's design counts on programmers to shuffle values between the fp stack, integer and vector registers, and code in separate fp, integer, and vector instructions to get the absolute fastest performance".
The reality is that Cell was targeted more at solving the memory problem than just doing SIMD stream processing. Engineers looked around and decided a 32kb L1 cache was silly... not having a cache-snooping DMA engine (or prefetch engine) would be silly. Putting nine cores on a bus with 7 GB/s bandwidth would be silly. Not being able to overlap memory latency with execution is silly. To solve all these problems, you give up having a single coherent address space.
But there is even more power in Java, .NET investments now... It is completely within the realm of possibility to write a runtime that executes your Java thread on SPU, or JITs the .NET to SPU code. It's a nice benefit that these are already handle-based rather than pointer based languages, so the memory-mapping is a task of the runtime and transparent to the code. And IBM is working hard on native C++ code generation that is agnostic of the address space problem.
Re:Concurrency is hard. (Score:3, Informative)
As the number of cores increases different algorithmic approaches will need to be pursued to get the maximum performance. Many algorithms which are great for serial processors will perform poorly on a parallel architecture.
I think many people don't yet realize how big a paradigm shift multi-core really represents. Think of all the billions of lines of legacy code out there that were written for sequential computing. Scalability matters too: code written today is tomorrow's legacy code, and code that isn't scalable will eventually need to be revisited.
Multi-core will probably also require a new look at memory systems in PCs. To keep a lot of cores busy you have to feed them, and that means possible changes to the memory subsystems. It's not so bad now with so few cores per processor, but as they increase to 16, 32, etc., things start to get harder.
In any case, multicore is here to stay and it will be exciting to see what changes come about in the next few years.
Multiple Cores will lead to Neural Net Apps (Score:2, Informative)
1) The number of cores is going to increase
2) The current concept of an artificial neuron, having some sort of value with weights attached to it, is too simple for how our human brains really work. It will likely need to be replaced with a more complex model of values and algorithms, and each such neuron will likely need its own mini-processor, or in this case "a core"
I expect that, given the increasing number of cores, probably growing at a rate similar to the hard disc, processor, or memory increases of the past (a 10MB hard disc growing to 500GB today), we will have thousands or even hundreds of thousands of cores.
As we learn more about how the brain works I believe that 2) will be accepted as true at some point.
So I expect that more and more new software will attempt to be more intuitive, as more and more people come to agree that the software we have now is, in general, crap, in that it doesn't help the layman do their jobs as much as it could.
This intuitiveness will likely be in the form of artificial Neural Nets, paving the way for computing systems to begin to act like the science fiction computer systems we think of in "the future".
Just my two-cents guess...
Re:Stephen Wolfram has a solution (Score:2, Informative)
That is not true. Our brains do not use Cellular Automata. Neural networks [wikipedia.org] are a branch of Artificial Intelligence that attempts to model biological nervous systems. There are many reasons why it is not practical to design software such as office suites or whole video games to run on neural nets. Biological networks are trained, not designed. We also have no way of building efficient artificial neural net hardware at present. Wolfram himself does mention in his book that Cellular Automata are not the alpha and the omega; they are merely visually striking simple computing systems which illustrate many of his more general points well.
People scoffed when Wolfram published his tome, and with some justification. They were not arguing against the merit of many of his points; they were pointing out that he has an arrogant tone and is blatantly plagiarising decades of established computer science, restating the ideas of many other scientists as though they were his own. That is a serious breach of scientific ethics, and he deserves to be tarred and feathered for it.
Re:Yeah, if you only run one program at a time.. (Score:5, Informative)
In a word, no.
The more complicated answer is "Yes, in rare cases".
The problem is that programs written in your normal languages (C, C++, Java, C#, basically anything you've ever heard of) are totally synchronous; you cannot proceed to the next statement until the previous one completes.
Thus, trying to parallelize something at the API is virtually worthless. I don't win anything if my "drawWindow" or "displayMPEGFrame" function flies off to another processor to do its work, if I still have to wait for it to complete before I can move on.
(This can be helpful if you have two types of processors, so in fact 3D graphics APIs can be looked at as working just this way. But we already have that.)
You might say, "But there are some operations that I can do that with, like loading a webpage!" We already can do that. It's called asynchronous IO; you fire your IO request, the hardware (with software assist) does its thing, and you get the results later. You might even fire off a lot of things and process them in the non-deterministic order they come back. UNIX has been doing that for about as long as it has been UNIX, via the select call.
The easy stuff has been done. To write programs that actually fill a multi-core CPU's capacity is going to require a paradigm change. Shared-memory threading isn't looking very good (too complex for any human to correctly implement). There are several candidate paradigms, but there is no clear winner at the moment, some of them may never work, and they all have one thing in common: They look nothing like current coding practices with threads (because, as I said, that's looking pretty useless if we can't get it working in the decades we've had to play with it).
The claims I've seen so far:
This isn't exhaustive, it's off the top of my head, and there are endless variations on each of those themes.
If I had to lay money down, I'd go with "a language that used threadlets like Erlang and rigidly enforced no sharing, in an OO environment" winning, which does not really exist yet. (Probably the closest you get today would be Stackless Python with a manual enforcement of sending only immutables across t
Re:Concurrency is hard. (Score:3, Informative)
http://support.microsoft.com/kb/78326 [microsoft.com]
http://support.microsoft.com/kb/79749 [microsoft.com]
Re:You hit the nail right on the head (Score:3, Informative)
Regarding multicore CPUs, there are already plenty of parallelization packages linkable directly into C (e.g. the various MPI implementations [open-mpi.org]). All you have to do is structure your for loops to make use of them. Once you do that, you can run each iteration of the loop on a separate core via MPI or something similar, dramatically improving your code's execution time and making full use of all your cores.
Heck, with your OS's built-in threading calls, you can do this even without MPI, as long as you can make your for loops thread-safe [wikipedia.org]. Interestingly, OS X does this all the time---in Activity Monitor you can see the number of threads your processes have spawned. The kernel usually has the most, and iTunes, iMovies, etc usually have a few as well. Expect this number to go up when cpus with dozens of cores come out. And, of course, linux has the same functionality, though I've seen fewer linux apps that actually make significant use of multiple threads. They'll come soon, as long as the GNOME, etc. authors
I would imagine that if these new multi-multi-core procs are released into the wild in mass numbers, new programming languages will be developed that will enable things to be done more efficiently and easily. Or perhaps a hybrid language: one half of the language is for writing processes for individual cores, while the other half acts as a "hub". Or even better, say you have 16 cores, and then one "central" core that acts like a post office: it doesn't actually create any of the mail, it just makes sure it gets delivered to the correct place.
There is no way the hardware would advance without the programming ability to back it up.
Amdahl's law (Score:2, Informative)
If one truckdriver can drive a truck 50 miles in one hour, how far can two drivers then drive the truck in the same amount of time?
Re:Yeah, if you only run one program at a time.. (Score:3, Informative)
Scientific computing has gone from exhaustive and detailed simulation to exhaustive analyses of an entire parameter space with the advent of new life science branches such as Genomics, Proteomics and all the other broadly based omics-type endeavors. This means embarrassingly parallel computation at a massive scale. At my institution we keep roughly 3000 cores humming around the clock without any difficulty, largely using legacy code.
At home I have a recording studio where the bulk of the processing is happening through plugins. Again, embarrassingly parallel workload, very easy to keep a number of cores working at 100% TODAY.
That's just two examples, but I could mention many more, and so could many of you I'm sure. I fail to see why today's software cannot make productive use of multicore architectures. Unless we're talking productivity apps, which don't use much CPU anyway...
Re:Yeah, if you only run one program at a time.. (Score:3, Informative)
*cough* Fortran *cough*
Fortran has had support for this type of thing since F90 came out 15+ years ago. The language standard defines what are (and are not) asynchronous operations. For example, the WHERE...ELSEWHERE...END WHERE construct is designed to be implemented asynchronously. Dual-core is new enough that the free compilers don't implement parallel operations yet, but the support is already there in some of the commercial compilers.
F90, it's not your Grandfather's FORTRAN...
-JS
Re:Purely Functional Programming... (Score:3, Informative)