
Multicore Requires OS Rework, Windows Expert Says

alphadogg writes "With chip makers continuing to increase the number of cores they include on each new generation of their processors, perhaps it's time to rethink the basic architecture of today's operating systems, suggested Dave Probert, a kernel architect within the Windows core operating systems division at Microsoft. The current approach to harnessing the power of multicore processors is complicated and not entirely successful, he argued. The key may not be in throwing more energy into refining techniques such as parallel programming, but rather rethinking the basic abstractions that make up the operating systems model. Today's computers don't get enough performance out of their multicore chips, Probert said. 'Why should you ever, with all this parallel hardware, ever be waiting for your computer?' he asked. Probert made his presentation at the University of Illinois at Urbana-Champaign's Universal Parallel Computing Research Center."
  • by Shuh ( 13578 ) on Sunday March 21, 2010 @08:02PM (#31561624) Journal
    It's called Grand Central Dispatch. [wikipedia.org]
  • by Animats ( 122034 ) on Sunday March 21, 2010 @08:40PM (#31561934) Homepage

    A big problem is the event-driven model of most user interfaces. Almost anything that needs to be done is placed on a serial event queue, which is then processed one event at a time. This prevents race conditions within the GUI, but at a high cost. Both the Mac and Windows started that way, and to a considerable extent they still work that way. So any event which takes more time than expected stalls the whole event queue. There are attempts to fix this by having "background" processing for events known to be slow, but you have to know in advance which ones are going to be slow. Intermittently slow operations, like a DNS lookup or something which infrequently requires disk I/O, tend to be bottlenecks.
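
    As a minimal sketch of the failure mode (hypothetical handlers, plain C++11, nothing Mac- or Windows-specific): one intermittently slow handler stalls every event queued behind it.

        #include <chrono>
        #include <functional>
        #include <iostream>
        #include <queue>
        #include <thread>

        int main() {
            // A classic serial GUI event queue: handlers run strictly one at a time.
            std::queue<std::function<void()>> events;

            events.push([] { std::cout << "repaint\n"; });
            events.push([] {
                // An intermittently slow operation (think: DNS lookup that
                // misses the cache) blocks the whole queue while it runs.
                std::this_thread::sleep_for(std::chrono::seconds(5));
                std::cout << "resolved host\n";
            });
            events.push([] { std::cout << "handle click\n"; });  // stuck behind the lookup

            while (!events.empty()) {   // the event loop
                events.front()();       // any slow handler here freezes the UI
                events.pop();
            }
        }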

    Most languages still handle concurrency very badly. C and C++ are clueless about concurrency. Java and C# know a little about it. Erlang and Go take it more seriously, but are intended for server-side processing. So GUI programmers don't get much help from the language.

    In particular, in C and C++, there's locking, but there's no way within the language to even talk about which locks protect which data. Thus, concurrency can't be analyzed automatically. This has become a huge mess in C/C++, as more attributes ("mutable", "volatile", per-thread storage, etc.) have been bolted on to give some hints to the compiler. There's still race condition trouble between compilers and CPUs with long look-ahead and programs with heavy concurrency.
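
    A minimal C++ sketch of that point (invented names): the mutex guards the data only by programmer convention, so the compiler has no way to flag the unlocked access.

        #include <mutex>
        #include <thread>

        std::mutex g_lock;   // by convention, guards g_balance
        int g_balance = 0;   // but the language records no such association

        void deposit_locked(int amount) {
            std::lock_guard<std::mutex> guard(g_lock);
            g_balance += amount;        // fine: lock held
        }

        void deposit_racy(int amount) {
            g_balance += amount;        // data race: compiles without complaint
        }

        int main() {
            std::thread a(deposit_locked, 10);
            std::thread b(deposit_racy, 10);    // nothing stops this being written
            a.join();
            b.join();
        }

    Clang's thread-safety annotations (GUARDED_BY and friends) retrofit exactly this lock-to-data association as attributes, which is the bolted-on-hints pattern again.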

    We need better hard-compiled languages that don't punt on concurrency issues. C++ could potentially have been fixed, but the C++ committee is in denial about the problem; they're still in template la-la land, adding features few need and fewer will use correctly, rather than trying to do something about reliability issues. C# is only slightly better; Microsoft Research did some work on "Polyphonic C#" [psu.edu], but nobody seems to use that. Yes, there are lots of obscure academic languages that address concurrency. Few are used in the real world.

    Game programmers have more of a clue in this area. They're used to designing software that has to keep the GUI not only updated but visually consistent, even if there are delays in getting data from some external source. Game developers think a lot about systems which look consistent at all times, and come gracefully into synchronization with outside data sources as the data catches up. Modern MMORPGs do far better at handling lag than browsers do. Game developers, though, assume they own most of the available compute resources; they're not trying to minimize CPU consumption so that other work can run. (Nor do they worry too much about not running down the battery, the other big constraint today.)

    Incidentally, modern tools for hardware design know far more about timing and concurrency than anything in the programming world. It's quite possible to deal with concurrency effectively. But you pay $100,000 per year per seat for the software tools used in modern CPU design.

  • Re:This is new?! (Score:4, Informative)

    by Bengie ( 1121981 ) on Sunday March 21, 2010 @08:44PM (#31561966)

    Developing server apps to run in parallel is easy; client software is hard. Many times the cost of syncing threads is greater than the work you get out of them, so you leave it single-threaded. The question is: how do you design a framework/API that is very thread-friendly while making sure everything runs in the expected order, all while making it easy for bad programmers to take advantage of it?

    The biggest issue with developing async-threaded programs is logical dependencies that don't allow one part to be loaded/processed before another. If, from square one, you develop an app to take advantage of extra threads, it may be less efficient but more responsive. Most programmers I talk to have trouble understanding the interweaving logic of multi-threaded programming.
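
    A rough sketch of such a dependency in C++ (task names invented): parse() cannot start before load() finishes, so only the genuinely independent work overlaps.

        #include <future>
        #include <string>
        #include <vector>

        std::string load() { return "raw data"; }          // must run first
        std::vector<int> parse(const std::string& s) {     // depends on load()
            return std::vector<int>(s.size(), 1);
        }
        int render_icon() { return 42; }                   // independent of both

        int main() {
            // Independent work can be kicked off immediately...
            auto icon = std::async(std::launch::async, render_icon);

            // ...but the load -> parse chain is inherently serial.
            std::string raw = load();
            auto parsed = std::async(std::launch::async, parse, raw);

            parsed.get();
            icon.get();
        }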

    I guess it's up to MS to make an easy-to-use, idiot-proof threaded framework for crappy programmers to use.

  • For that matter when a copy or move fails in Explorer, why can't I simply resume it once I've fixed whatever the problem is.

    You can as of Vista.

  • Re:This is new?! (Score:2, Informative)

    by NatasRevol ( 731260 ) on Sunday March 21, 2010 @08:59PM (#31562088) Journal

    Well, Apple has Grand Central Dispatch - which means the OS deals with the cores rather than the individual programs. Granted, it's only been there since Snow Leopard, but it's much better than what Windows has.

    http://en.wikipedia.org/wiki/Grand_Central_Dispatch [wikipedia.org]

  • by NatasRevol ( 731260 ) on Sunday March 21, 2010 @09:09PM (#31562182) Journal

    A transaction is copying some files, failing in the middle, and not rolling back the ones already copied??

    Hint: it's not a transaction. It's just a bad piece of software that fails badly at its basic job: handling files.

  • Re:This is new?! (Score:2, Informative)

    by gig ( 78408 ) on Sunday March 21, 2010 @09:09PM (#31562188)

    > Since when have OS designers optimised their code to milk every cycle from the available CPUs?

    Apple has been doing this for years. This is one of the advantages for the user of buying a complete product. Apple can't pretend that someone else will solve the problem for them through bigger hardware or the magic of open source.

    Enabling large-scale multiprocessing is one of the fundamental features of Mac OS X v10.6 Snow Leopard. The feature is called "Grand Central" and enables an app developer to make fairly small modifications to their app which take it from pegging 1 CPU to pegging an unlimited number of CPUs. But multiprocessing has been part of OS X since the beginning. They shipped machines with multiple CPUs a long, long time ago compared to the PC. Even Mac OS 9 had multiprocessing features.

    Apple has also had XGrid going for some time now, which is quick and easy cluster computing.

    Then there's iPhone OS, which has been highly, highly optimized. An iPhone 3GS with a 600MHz CPU outperforms a Nexus One with a 1000MHz CPU. The iPhone 3G with a 400MHz CPU outperforms a Palm Pre with a 600MHz CPU. Those optimizations are part of the reason why Apple is currently undercutting both Android and Palm on price, which is the opposite of what was expected by Palm and Android developers and the entire industry. The iPad, on a 1000MHz CPU, has been described by the people who have used it so far as incredibly fast.

    So you are right if you're talking about Microsoft (and maybe Linux, I don't know) but you're definitely wrong if talking about all OS designers.

  • Re:waiting (Score:3, Informative)

    by DavidRawling ( 864446 ) on Sunday March 21, 2010 @09:16PM (#31562256)
    Yes, as does Windows. I think I should have been more clear - the scaling curve is nice and flat up to 8, 16, maybe 32 logical CPUs. After that, though, doubling CPUs doesn't necessarily double performance (even in heavy compute) - other bottlenecks start to bite, as do scheduler performance and architecture.
  • Re:This is new?! (Score:5, Informative)

    by stevew ( 4845 ) on Sunday March 21, 2010 @09:25PM (#31562338) Journal

    Well - I can tell you that Dave Probert saw his first multiprocessor about 28 years ago at Burroughs Corporation. It was a dual-processor B1855. I had the pleasure of working with the guy way back then. From what I recall he then went on to work at FPS, which made array processors that you could add onto other machines (I think VAXen... but I could be wrong there).

    Anyway - he has been around a LONG time.

  • by Anonymous Coward on Sunday March 21, 2010 @09:36PM (#31562422)

    In Windows Vista and 7, file copy has been improved to avoid these problems, FYI.

  • For that matter when a copy or move fails in Explorer, why can't I simply resume it once I've fixed whatever the problem

    Try TotalCopy [ranvik.net], which adds copy/move entries to the right-click menu; or TeraCopy [codesector.com], a commercial (free version available, supports Win7) complete replacement for the sucky Windows copy system.

    USB/network freezes during file copying aren't the fault of CPU cores like you say; Windows is just a sucky OS. Multicore stuff gets complicated, but it isn't going to be a panacea for Microsoft, just another marketing opportunity.

  • Re:Fist post! (Score:3, Informative)

    by wiredlogic ( 135348 ) on Sunday March 21, 2010 @10:34PM (#31562850)

    Well, I came here to see the fisting. And frankly, so far this site has been a real disappointment.

    You have to read at -1 to see the goatse trolls.

  • by jonwil ( 467024 ) on Sunday March 21, 2010 @10:37PM (#31562864)

    .NET apps DO use a virtual machine, the Common Language Runtime, which executes .NET IL. However, the virtual machine DOES use just-in-time compilation and precompilation to turn the code into native code before it runs.

    Same as any halfway decent desktop Java Virtual Machine implementation does now (mobile JVMs usually use hardware features like ARM Jazelle to run the Java code faster).

  • Re:This is new?! (Score:3, Informative)

    by nine-times ( 778537 ) <nine.times@gmail.com> on Sunday March 21, 2010 @10:57PM (#31563004) Homepage

    I don't know if you had to support Mac users during the years of transition, but it wasn't quite as easy as you made it sound. It was pretty smooth for such a drastic change, but I wouldn't want to repeat it any more than necessary.

  • Re:This is new?! (Score:4, Informative)

    by amRadioHed ( 463061 ) on Sunday March 21, 2010 @11:00PM (#31563018)

    The iPhone certainly doesn't outperform a Nexus One. If you compare browser rendering tests, the Nexus One consistently finishes loading pages quite a bit faster than the iPhone. You are probably thinking of game performance, and while it's true that the iPhone gets better frame rates, you're forgetting that the Nexus One is pushing around 2.5 times more pixels, so that's not exactly an apples-to-apples comparison.

  • by ceoyoyo ( 59147 ) on Sunday March 21, 2010 @11:12PM (#31563116)

    It doesn't make it as different as you seem to think.

    I think GCD is a great idea, and a very useful tool, but it's not a magic bullet. GCD can schedule some things more effectively because it has a system-wide view. The closure extensions and the GCD interface make it reasonably easy for novice programmers to get things actually running in parallel.

    Of the two, the latter has a MUCH bigger impact in terms of actually getting programs to take advantage of multiple cores.

    BUT, it's nothing you can't do (and that hasn't been done) with various multiprocessing libraries, many of which run on Windows, or with good old threads and processes if you've got a moderate level of skill. In order for it to work effectively the programmer still has to a) structure his program in such a way that the parallelism is exposed and b) actually use GCD.

    Contrary to what you seem to suggest, GCD does not really "create and manage threads on its own, even in applications that are not written to be threaded." It creates threads at the (indirect) request of the application and schedules them appropriately. The application MUST be designed to take advantage of multithreading. The only difference is that GCD makes it easier for the programmer to actually get those threads up and running, and it can possibly schedule them more effectively.

  • by Anonymous Coward on Sunday March 21, 2010 @11:17PM (#31563144)

    Fixed in Vista and 7, you can ignore errors and continue copying.

  • Re:This is new?! (Score:3, Informative)

    by dudpixel ( 1429789 ) on Sunday March 21, 2010 @11:43PM (#31563294)

    Come up with the great new OS...

    Hang on - this "new" OS you're referring to is basically UNIX (BSD). It was invented before Windows. Sure, Apple has modified it and put a shiny new layer on top (one that works exceptionally smoothly, mind you), but if you want to get technical, they didn't come up with a new OS; they improved an old one.

  • Re:This is new?! (Score:3, Informative)

    by LtGordon ( 1421725 ) on Sunday March 21, 2010 @11:44PM (#31563300)

    ... but if Google was so hardcore into efficiency, why the hell did they develop a new runtime for their Android that's based on Java?

    Because the Java code gets executed on the user's hardware, not Google's. Google cares about efficiency insofar as it affects their own hardware requirements.

  • by Anonymous Coward on Monday March 22, 2010 @12:55AM (#31563778)

    Apple's Grand Central Dispatch (GCD) solution is really primitive. It's just a simple thread pool: the programmer breaks their program down into tasks that can be executed independently, then queues them for execution by the thread pool.

    GCD is not in the slightest innovative, except for a hack that lets C programmers write tasks slightly more conveniently, by adding limited "closure" support to the language.

    Similar concepts can be found all over the place; just see the "see also" section on the wikipedia article:
        http://en.wikipedia.org/wiki/Grand_Central_Dispatch
    Using any of the libs listed in that "see also" section, you can get GCD equivalent behaviour on unix/windows, and have been able to for years.
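
    As a bare-bones sketch of what such a lib boils down to (toy C++11, not any particular library's API): a task queue drained by a fixed pool of worker threads.

        #include <condition_variable>
        #include <functional>
        #include <mutex>
        #include <queue>
        #include <thread>
        #include <vector>

        class ThreadPool {
            std::vector<std::thread> workers;
            std::queue<std::function<void()>> tasks;
            std::mutex m;
            std::condition_variable cv;
            bool done = false;
        public:
            explicit ThreadPool(unsigned n) {
                for (unsigned i = 0; i < n; ++i)
                    workers.emplace_back([this] {
                        for (;;) {
                            std::function<void()> task;
                            {
                                std::unique_lock<std::mutex> lk(m);
                                cv.wait(lk, [this] { return done || !tasks.empty(); });
                                if (done && tasks.empty()) return;
                                task = std::move(tasks.front());
                                tasks.pop();
                            }
                            task();   // run outside the lock
                        }
                    });
            }
            void submit(std::function<void()> f) {   // the dispatch_async analogue
                { std::lock_guard<std::mutex> lk(m); tasks.push(std::move(f)); }
                cv.notify_one();
            }
            ~ThreadPool() {
                { std::lock_guard<std::mutex> lk(m); done = true; }
                cv.notify_all();
                for (auto& w : workers) w.join();
            }
        };

    From the caller's side, pool.submit(task) per independent chunk is more or less all that GCD's dispatch_async amounts to.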

    There are also languages with far superior parallel-processing abilities, where the effort is done by the compiler/environment, not the programmer. See any functional language, e.g. Haskell or Erlang. Write a program in one of those and the parallel processing happens just about automatically.

    Adding parallelism to the *OS* is quite a different issue, and not one that Apple's GCD addresses.

  • by Animats ( 122034 ) on Monday March 22, 2010 @01:51AM (#31564024) Homepage

    An interesting comment overall, but what relevance does "mutable" have to multi-threaded programming?

    A "const" object can be accessed simultaneously from multiple threads without locking, other than against deletion. A "mutable const" object cannot; while it is "logically const", its internal representation may change (it might be cached or compressed) and thus requires locking.

    Failure to realize this results in programs with race conditions.
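
    A small C++ sketch of the trap (invented class): the getter is const, so callers assume concurrent reads are safe, but the mutable cache mutates underneath them.

        #include <string>

        class Record {
            std::string raw;
            mutable std::string cache;     // internal state changes even though
            mutable bool cached = false;   // the object is "logically const"
        public:
            explicit Record(std::string r) : raw(std::move(r)) {}

            const std::string& pretty() const {  // looks read-only to callers
                if (!cached) {                   // two threads can both get here...
                    cache = "[" + raw + "]";     // ...and race on the shared cache
                    cached = true;
                }
                return cache;
            }
        };

    The fix is a mutex inside pretty(): locking, even though every caller only ever sees a const object.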

  • Re:This is new?! (Score:5, Informative)

    by mjwx ( 966435 ) on Monday March 22, 2010 @01:58AM (#31564048)

    Then if you look at iPhone OS, that has been highly, highly optimized. An iPhone 3GS with a 600MHz CPU outperforms a Nexus One with a 1000MHz CPU. The iPhone 3G with a 400MHz CPU outperforms a Palm Pre with 600MHz CPU

    Citation needed? I think you'll find that the Iphone only appears to outperform Android because Android is doing a lot more than the Iphone. Furthermore, many things that work on Android do not work on the Iphone; Slashdot, for instance, works fine on my HTC Dream or newer Motorola Milestone with the standard browser, and even better with the Dolphin browser.

    This cannot be a fair comparison until the Iphone can do everything that Android phones can, unless you want to compare functionality, where the Iphone is an epic failure.

    Those optimizations are part of the reason why Apple is currently undercutting both Android and Palm on price,

    Now I can tell you're full of it. All prices are inclusive of local taxes, and UK VAT does not apply outside the EU, for those in Australia, Canada and the US.

    UK Expansys
    Motorola Milestone GBP 379 [expansys.com]
    Nexus 1 GBP 599 [expansys.com]
    Iphone 32 GB GBP 799 [expansys.com]

    AU Mobicity
    Motorola Milestone A$659 [mobicity.com.au]
    Nexus 1 A$849 [mobicity.com.au]
    Iphone 16 GB A$959 [mobicity.com.au]

    The cheapest Iphone 3GS available is A$100 more expensive than the newer Motorola Milestone (Droid, for the Yanks) and the Google Nexus One. Not to mention that both the Milestone and Nexus One can do more, as well as lacking the restrictions of the Iphone. But then again, I suspect you were merely looking to confirm your quite obvious bias rather than do an accurate comparison.

    Apple's operating systems are not very well optimised, not even as much as Windows; Apple's OSes pretend to be optimised by giving the OS more hardware than it needs and limiting functionality to prevent any perceived loss of speed. Most people using a Mac or Iphone rarely use the full power of the hardware, ergo an un-optimised OS goes unnoticed by the user. Here is the core of the design (from an engineering perspective): a design does not have to work well, it just has to work. The vast majority of people will ignore tiny flaws if they can get the task done; OTOH, if a computer doesn't do the task, the user will get annoyed no matter how pretty the interface.

    As a good developer friend of mine likes to say, "If given the choice, a user will press the 'I just want it to work today' button". OSX provides this very shiny button but only in a few select places, Windows provides this not so shiny button almost everywhere. This is why Windows is still the number one OS on the planet.

  • Re:This is new?! (Score:4, Informative)

    by julesh ( 229690 ) on Monday March 22, 2010 @04:04AM (#31564518)

    One is that ARM native code is bigger, size-wise, than Dalvik VM bytecode.

    Citation needed. Dalvik is better than baseline Java bytecode, agreed. But so is ARM native code. [http://portal.acm.org/citation.cfm?id=377837&dl=GUIDE&coll=GUIDE&CFID=82959920&CFTOKEN=24064384 - "[...] the code efficiency of Java turns out to be inferior to that of ARM Thumb"]. I can find no direct comparison of ARM Thumb and Dalvik, so I can't tell you which produces the smaller code size.

    So it takes up more memory.

    Even if your first statement is true, this doesn't necessarily follow. VMs add overhead, usually using up somewhat more runtime memory to execute, particularly if a JIT is used (the current version of Dalvik doesn't have one, but the next one apparently will).

  • by Anonymous Coward on Monday March 22, 2010 @05:25AM (#31564774)

    Ok, I bet I get modded troll for this, but I so wish Windows WAS a bloated version of VMS.

    It would have a distributed lock manager, decent file-type support and metadata, baked-in security from the ground up, a scripting language that worked, logical names, a built-in record management system (RMS), a layered RDBMS (Rdb), a distributed file system, and a clustering system that works with 100+ nodes, can be spread across different physical sites, can mix processor architectures, can do rolling upgrades, and has recorded uptimes of 12+ years...

    Basically, Windows provides a bunch of services (Win32 et al.) that work surprisingly well for creating desktop applications, but it can't really do most of the things that VMS can.

    On the other hand, there's a Windows PC on my desk, and a VAX, Alpha and Itanium in the server room, and that's the way it should stay! (Get off my lawn).

  • Re:Answer: Yes (Score:3, Informative)

    by jackharrer ( 972403 ) on Monday March 22, 2010 @06:08AM (#31564938)

    >>What we need is a "you don't want to use C: right now, trust me" signal. Ever tried to use Firefox while copying something big? Why does it take ages to display a webpage when it does not need to use the disk?

    It only works like that on Windows. I think it's mostly bad system design. I have no such issues on my Linux machine, but lots on my wife's Windows one. Both are identical Thinkpad laptops, so the fault can only be on the OS side.

  • by TheRaven64 ( 641858 ) on Monday March 22, 2010 @08:07AM (#31565402) Journal

    I am finding it very difficult to believe that you have actually used GCD. I have, and have read most of the code for the implementation. Creating threads is not hard - it is definitely not what makes parallel programming difficult. The difficult bit is splitting your code into parts that have no interdependencies and so can execute concurrently.

    When you use libdispatch, you still have to do this. All that it does for you is implement an N:M threading model. It allocates a group of kernel threads and then multiplexes them into work queues. The pthread_workqueue_*_np() family of system calls lets the kernel decide the optimum number of kernel threads for the application, depending on system configuration and load. The libdispatch code then executes blocks on these threads. This saves some thread creation time and saves some context switching and cache churn because it runs blocks sequentially in a small number of threads (ideally one per core), rather than running them all concurrently on a separate thread.

    It creates and manages threads on its own, even in applications that are not written to be threaded

    No it doesn't. You must get a dispatch_queue_t and then send it blocks to execute concurrently. You must do this explicitly.
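
    To make that explicitness concrete, a minimal sketch against libdispatch's plain-C function-pointer API (no blocks extension required; assumes Mac OS X with libdispatch): you fetch the queue and hand it your already-independent chunks yourself.

        #include <dispatch/dispatch.h>
        #include <cstdio>

        static void work(void* ctx, size_t i) {   // one independent chunk
            int* data = static_cast<int*>(ctx);
            data[i] *= 2;                         // no shared writes across i
        }

        int main() {
            int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};

            // Step 1: explicitly get a queue...
            dispatch_queue_t q =
                dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

            // Step 2: ...and explicitly submit the work. GCD multiplexes the
            // iterations onto its pool of kernel threads and returns when done.
            dispatch_apply_f(8, q, data, work);

            for (int v : data) std::printf("%d ", v);
            std::printf("\n");
        }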

  • Re:Answer: Yes (Score:3, Informative)

    by mario_grgic ( 515333 ) on Monday March 22, 2010 @09:26AM (#31566494)

    Firefox doesn't behave like that on OS X, so perhaps this is an OS-specific issue?

  • by radish ( 98371 ) on Monday March 22, 2010 @11:07AM (#31568878) Homepage

    iPhone isn't even slightly "instant on" - it takes at least a minute to boot an iPhone from off. What you're seeing most of the time is "screen off" mode. Unsurprisingly, switching the screen on and cranking up the CPU clock doesn't take much time. Likewise, waking my Windows box from sleep doesn't take very long either. Comparing modern OS software running on modern hardware, I see little difference in boot times or wake-from-sleep times - which would indicate that if MS are being lazy, then so are Apple and all the devs in the Linux and BSD worlds. As for why my ST used to boot so much quicker: the lack of disks helped, as did the lack of hardware variance (and thus of drivers to load and start).
