IBM's Chief Architect Says Software is at Dead End 334
j2xs writes "In an InformationWeek article entitled 'Where's the Software to Catch Up to Multicore Computing?' the Chief Architect at IBM gives some fairly compelling reasons why your favorite software will soon be rendered deadly slow because of new hardware architectures. Software, she says, just doesn't understand how to do work in parallel to take advantage of 16, 64, 128 cores on new processors. Intel just stated in an SD Times article that 100% of its server processors will be multicore by end of 2007. We will never, ever return to single processor computers. Architect Catherine Crawford goes on to discuss some of the ways developers can harness the 'tiny supercomputers' we'll all have soon, and some of the applications we can apply this brute force to."
rendered deadly slow? (Score:4, Informative)
Concurrency in software (Score:5, Informative)
http://www.gotw.ca/publications/concurrency-ddj.h
Enjoy,
Multi-cores vs. internal parallelism (Score:4, Informative)
Re:Clearing things up a bit (Score:2, Informative)
IBM is not the only company releasing multi-core processors. Single core processors will soon go out the same way the 386SX did when 32-bit computing became the norm.
Concurrency is hard. (Score:5, Informative)
It's what made the Amiga look less reliable than its competitors... if you only ran one native program at a time it was a lot more stable than MacOS or MS-DOS, because the OS provided a much richer set of services so applications didn't have to replicate them... but most people took advantage of the multitasking and when something crashed in the background the lack of memory protection meant the whole thing went down, and non-native software that wasn't written with multitasking in mind could produce the most entertaining crashes.
These days we all have good protected mode multitasking operating systems, but we don't have good easy ways to distribute an application across multiple cores. Until we do, most applications are going to be written to run single-threaded and depend on the OS to use the other cores to speed up the rest of the system, both at the application level and doing things like running graphics libraries on another core.
Until we have so many cores that the OS can't make effective use of them on its own, I don't think most developers will even attempt to use them directly. And then we're going to go through a painful period like the one before Microsoft discovered multitasking.
Re:Yeah, if you only run one program at a time.. (Score:4, Informative)
That is true of multi-core general-purpose processors like the x86, but I don't think it works too well when talking about the Cell processor. The OS can't just assign a PowerPC-compiled app to an SPU and expect it to run. Apps have to be specifically coded to take advantage of the SPUs on the Cell.
Re:Compilers (Score:3, Informative)
Erlang [erlang.org] and Limbo [vitanuova.com] have concurrency primitives built in. Both used CSP as a launching point, and both give the programmer easy-to-use, lightweight processes and message passing. Processes share nothing.
However, neither has built-in support for multiple cores or multiple CPUs at the moment; it's just not a priority for the teams behind them. You can cheat your way to such a setup with Erlang, however, since you can spawn processes on remote machines or remote Erlang instances. If you had two Erlang instances on the same machine, each would run on its own core, so all you'd need to do is spawn a process on each and message-pass between the two.
Re:Clearing things up a bit (Score:3, Informative)
The Cell will live on, but it will create new markets where its inexpensive supercomputing power will open new doors for data analysis and real-time processing.
I just wanted to quote these two sentences you said relevant to my comment. The problem with multicore computing power lies not with the software developers but with the available tools. The problem with having, say, 60 cores able to run in parallel is that our computation methods (Turing-machine-based computation) are built on the basic serial algorithm. The main problem resides in the programming languages. As far as I know (although I am *sure* I must be wrong), all the "major" programming languages are based on this paradigm and only provide "some functionality" for parallel processing. What we need are compilers and (more importantly, and harder to come up with, I think) languages with which we can create programs that are inherently parallel.
Did the submitter even read the article? (Score:2, Informative)
If you read the article, you will see that Mrs. Crawford does not even come close to saying that "Software is at Dead End". She says software needs to catch up with the hardware.
Computers have more and more processors (and different kinds of processors, like GPUs), and currently most software isn't designed for that kind of environment. IBM has developed some clever ways to program these types of systems in a "general purpose" way.
That's the worst summary of a headline that I've ever read.
Concurrency is... (Score:5, Informative)
programming for multi-core architectures (Score:5, Informative)
However what is more relevant to today's non-supercomputing needs is SMP scalability.
One of the challenges with SMP scalability is cache coherency: keeping the caches on the processors synchronized is a costly operation (necessary to ensure that each processor has the same view of a given piece of memory at the same time), normally (always?) done with a cache invalidation.
So the more invalidations you do, the more often the processor has to fetch memory from main memory, and the less it's using its cache. Processing slows down dramatically.
I've tried to design the qore programming language http://qore.sourceforge.net/ [sourceforge.net] to be scalable on SMP systems. The new version (released today) has some interesting optimizations that have resulted in a large performance boost on SMP machines. The optimizations involve reducing the number of cache invalidations to a minimum, which means more than just reducing locking, although that is part of it too: even an atomic update (for example, on Intel, an assembly lock-and-increment) involves a cache invalidation and is therefore an expensive operation on SMP platforms. There is more work to be done, but in simple benchmarks of the affected code paths, the same qore code ran two to three times faster with the optimizations.
Anyway it would be interesting to know if other high-level programming languages have also taken the same approach (or will do so); as we go forward, it's clear that SMP scalability will be an important topic for the future...
Re:Compilers need to be better. (Score:3, Informative)
Re:Clearing things up a bit (Score:3, Informative)
Take 3D programming as an example. Before I can render the screen, I have to run thousands of vertices through a matrix transformation so that they align with where the camera sees them. This is a bulk operation that can be run in parallel by multiple SIMD cores (each transforming 2-4 vertices per instruction) simply by providing each core with a copy of the transformation matrix. Simple, straightforward, and FAST. But utterly useless for general purpose code like multithreaded web servers.
So the problem is not a lack of tools. The problem is that the Cell is a specialized architecture that's very different from traditional multiprocessing. It's designed for long number-crunching applications that have traditionally been the forte of supercomputers. Ergo, it is a supercomputer on a chip.
Re:Clearing things up a bit (Score:5, Informative)
This is an often-repeated misconception. Cell abandons the practice of having different fp, integer, and vector registers... all registers are 128-bit and any instruction can be issued on any of them, and those instructions are generated by a C++ compiler. So saying that programmers code in these SIMD instructions is like saying that "x86's design counts on programmers to shuffle values between the fp stack, integer and vector registers, and code in separate fp, integer, and vector instructions to get the absolute fastest performance".
The reality is that Cell was targeted more at solving the memory problem than just doing SIMD stream processing. Engineers looked around and decided a 32kb L1 cache was silly... not having a cache-snooping DMA engine (or prefetch engine) would be silly. Putting nine cores on a bus with 7 GB/s bandwidth would be silly. Not being able to overlap memory latency with execution is silly. To solve all these problems, you give up having a single coherent address space.
But there is even more power in Java, .NET investments now... It is completely within the realm of possibility to write a runtime that executes your Java thread on SPU, or JITs the .NET to SPU code. It's a nice benefit that these are already handle-based rather than pointer based languages, so the memory-mapping is a task of the runtime and transparent to the code. And IBM is working hard on native C++ code generation that is agnostic of the address space problem.
Re:Concurrency is hard. (Score:3, Informative)
As the number of cores increases different algorithmic approaches will need to be pursued to get the maximum performance. Many algorithms which are great for serial processors will perform poorly on a parallel architecture.
I think many people don't yet realize how big a paradigm shift multi-core really represents. Think of all the billions of lines of legacy code out there that were written for sequential computing. Scalability matters too: code written today is tomorrow's legacy code, and code that isn't scalable will eventually need to be revisited.
Multi-core will probably also require a new look at memory systems in PCs. To keep a lot of cores busy you have to feed them, and that means possible changes to the memory subsystems. It's not so bad now with so few cores per processor, but as they increase to 16, 32, etc., things start to get harder.
In any case, multicore is here to stay and it will be exciting to see what changes come about in the next few years.
Multiple Cores will lead to Neural Net Apps (Score:2, Informative)
1) The number of cores is going to increase
2) The current concept of an artificial neuron, having some sort of value with weights attached to it, is too simple for how our human brains really work. It will likely need to be replaced with a more complex model of values and algorithms, and each such neuron will likely need its own mini-processor, or in this case "a core"
I expect that, given the increasing number of cores, probably growing at a rate similar to the hard disc, processor, or memory increases of the past (a 10MB hard disc growing to 500GB today), we will have thousands or even hundreds of thousands of cores.
As we learn more about how the brain works I believe that 2) will be accepted as true at some point.
So I expect that more and more new software will attempt to be more intuitive, as more and more people come to agree that the software we have now is, in general, crap, in that it doesn't help the layman do their jobs as much as it could.
This intuitiveness will likely be in the form of artificial Neural Nets, paving the way for computing systems to begin to act like the science fiction computer systems we think of in "the future".
Just my two-cents guess...
Re:Stephen Wolfram has a solution (Score:2, Informative)
That is not true. Our brains do not use Cellular Automata. Neural networks [wikipedia.org] are a branch of Artificial Intelligence that attempts to model biological nervous systems. There are many reasons why it is not practical to design software such as office suites or whole video games to run on neural nets. Biological networks are trained, not designed. We also have no way of building efficient artificial neural net hardware at present. Wolfram himself does mention in his book that Cellular Automata are not the alpha and the omega; they are merely visually striking simple computing systems which illustrate many of his more general points well.
People scoffed when Wolfram published his tome, and with some justification. They were not arguing against the merit of many of his points; they were pointing out that he has an arrogant tone and is blatantly plagiarising decades of established computer science, restating the ideas of many other scientists as though they were his own. That is a serious breach of scientific ethics, and he deserves to be tarred and feathered for it.
Re:Yeah, if you only run one program at a time.. (Score:5, Informative)
In a word, no.
The more complicated answer is "Yes, in rare cases".
The problem is that programs written in your normal languages (C, C++, Java, C#, basically anything you've ever heard of) are totally synchronous; you cannot proceed to the next statement until the previous one completes.
Thus, trying to parallelize something at the API is virtually worthless. I don't win anything if my "drawWindow" or "displayMPEGFrame" function flies off to another processor to do its work, if I still have to wait for it to complete before I can move on.
(This can be helpful if you have two types of processors, so in fact 3D graphics APIs can be looked at as working just this way. But we already have that.)
You might say, "But there are some operations that I can do that with, like loading a webpage!" We already can do that. It's called asynchronous IO; you fire your IO request, the hardware (with software assist) does its thing, and you get the results later. You might even fire off a lot of things and process them in the non-deterministic order they come back. UNIX has been doing that for about as long as it has been UNIX, via the select call.
The easy stuff has been done. To write programs that actually fill a multi-core CPU's capacity is going to require a paradigm change. Shared-memory threading isn't looking very good (too complex for any human to correctly implement). There are several candidate paradigms, but there is no clear winner at the moment, some of them may never work, and they all have one thing in common: They look nothing like current coding practices with threads (because, as I said, that's looking pretty useless if we can't get it working in the decades we've had to play with it).
The claims I've seen so far:
This isn't exhaustive, it's off the top of my head, and there are endless variations on each of those themes.
If I had to lay money down, I'd go with "a language that used threadlets like Erlang and rigidly enforced no sharing, in an OO environment" winning, which does not really exist yet. (Probably the closest you get today would be Stackless Python with a manual enforcement of sending only immutables across t
Re:Concurrency is hard. (Score:3, Informative)
http://support.microsoft.com/kb/78326 [microsoft.com]
http://support.microsoft.com/kb/79749 [microsoft.com]
Re:You hit the nail right on the head (Score:3, Informative)
Regarding multicore CPUs, there are already plenty of parallelization packages linkable directly into C (e.g. the various MPI implementations [open-mpi.org]). All you have to do is structure your for loops to make use of them. Once you do that, you can run each iteration of the loop on a separate core via MPI or something similar, dramatically improving your code's execution time and making full use of all your cores.
Heck, with your OS's built-in threading calls, you can do this even without MPI, as long as you can make your for loops thread-safe [wikipedia.org]. Interestingly, OS X does this all the time---in Activity Monitor you can see the number of threads your processes have spawned. The kernel usually has the most, and iTunes, iMovies, etc usually have a few as well. Expect this number to go up when cpus with dozens of cores come out. And, of course, linux has the same functionality, though I've seen fewer linux apps that actually make significant use of multiple threads. They'll come soon, as long as the GNOME, etc. authors
I would imagine that if these new multi-multi-core procs are released into the wild in mass numbers, new programming languages will be developed that will enable things to be done more efficiently and easily. Or perhaps a hybrid language: one half of the language is for writing processes for individual cores, while the other half acts as a "hub". Or even better, say you have 16 cores, and then one "central" core that acts like a post office: it doesn't actually create any of the mail, it just makes sure it gets delivered to the correct place.
There is no way the hardware would advance without the programming ability to back it up.
Amdahl's law (Score:2, Informative)
If one truckdriver can drive a truck 50 miles in one hour, how far can two drivers then drive the truck in the same amount of time?
Re:Yeah, if you only run one program at a time.. (Score:3, Informative)
Scientific computing has gone from exhaustive and detailed simulation to exhaustive analyses of an entire parameter space with the advent of new life science branches such as Genomics, Proteomics and all the other broadly based omics-type endeavors. This means embarrassingly parallel computation at a massive scale. At my institution we keep roughly 3000 cores humming around the clock without any difficulty, largely using legacy code.
At home I have a recording studio where the bulk of the processing is happening through plugins. Again, embarrassingly parallel workload, very easy to keep a number of cores working at 100% TODAY.
That's just two examples, but I could mention many more, and so could many of you I'm sure. I fail to see why today's software cannot make productive use of multicore architectures. Unless we're talking productivity apps, which don't use much CPU anyway...
Re:Yeah, if you only run one program at a time.. (Score:3, Informative)
*cough* Fortran *cough*
Fortran has had support for this type of thing since F90 came out 15+ years ago. The language standard defines what are (and are not) asynchronous operations. For example, the WHERE...ELSEWHERE...END WHERE construct is designed to be implemented asynchronously. Dual-core is new enough that the free compilers don't implement parallel operations yet, but the support is already there in some of the commercial compilers.
F90, it's not your Grandfather's FORTRAN...
-JS
Re:Purely Functional Programming... (Score:3, Informative)