	Multicore Requires OS Rework, Windows Expert Says 631
			
		 	
				alphadogg writes "With chip makers continuing to increase the number of cores they include on each new generation of their processors, perhaps it's time to rethink the basic architecture of today's operating systems, suggested Dave Probert, a kernel architect within the Windows core operating systems division at Microsoft. The current approach to harnessing the power of multicore processors is complicated and not entirely successful, he argued. The key may not be in throwing more energy into refining techniques such as parallel programming, but rather rethinking the basic abstractions that make up the operating systems model. Today's computers don't get enough performance out of their multicore chips, Probert said. 'Why should you ever, with all this parallel hardware, ever be waiting for your computer?' he asked. Probert made his presentation at the University of Illinois at Urbana-Champaign's Universal Parallel Computing Research Center."
		 	
		
		
		
		
			
		
	
Current architecture flawed but workable BUT.... (Score:5, Interesting)
...the implementation sucks.
Why, for example, does Windows Explorer decide to freeze ALL network connections when a single URN isn't quickly resolved? Why is it that when my USB drive wakes up, all Explorer windows freeze? If you are trying to tell me there's no way to implement this using the current abstractions, I say you're mad. For that matter, when a copy or move fails in Explorer, why can't I simply resume it once I've fixed whatever the problem is? You're left piecing together what has and hasn't been moved. File requests make up a good deal of what we're waiting for. It's not the bus or the drives that are usually the limitation. It's the shitty coding. I can live with a hit at startup. I can live with delays if I have to eat into swap. But I'm sick and tired of basic functionality being missing or broken.
reinventing the wheel (Score:5, Interesting)
Microsoft should go back and read some of the literature on parallel computing from 20-30 years ago. Machines with many cores are nothing new. And Microsoft could have designed for it if they hadn't been busy re-implementing a bloated version of VMS.
Multithreading is the problem, not the answer (Score:3, Interesting)
The Problem with Threads [berkeley.edu] (UC Berkeley's Prof Edward Lee)
How to Solve the Parallel Programming Crisis [blogspot.com]
Half a Century of Crappy Computing [blogspot.com]
The computer industry will have to wake up to reality sooner or later. We must reinvent the computer; there is no getting around this. The old paradigms from the 20th century do not work anymore because they were not designed for parallel processing.
Re:Microkernel? (Score:1, Interesting)
As someone who has tried to make Minix 3 suck less: a microkernel doesn't imply being well suited to multiprocessing, but it can help. Minix 3, for example, has disk drivers, network, filesystem etc. as separate processes, but because so many operations depend on the file server, and the file server implementation is mostly synchronous and single-threaded, I/O will cause the entire system to appear to lock up. It would be possible to fix this, of course, but it's not necessarily easy.
Re:This is new?! (Score:3, Interesting)
It doesn't sound easily backwards compatible (but I might be wrong there), and there's a certain simplicity to 'reserve one core for the OS, application developers can manage the rest of them themselves' sort of model like consoles.
Those curious about what life would be like with application developers managing system resources, should try firing up an old copy of Windows 3.1 or MacOS and running 10 or so applications at the same time.
I can only assume TFA is an atrociously bad summary of what he's actually proposing, because it sounds way too boneheaded for someone in that position to be seriously suggesting.
Re:Microkernel? (Score:2, Interesting)
Re:I hate to say it, but... (Score:4, Interesting)
I noticed the same on my Mac. With a set of eight CPU graph meters in the menu bar, they're almost always evenly pitched anywhere from idle to 100%, with a few notable exceptions like Second Life, some Photoshop filters, and Firefox of all things.
When booted into Win, more often than not I have two cores pegged high, and the others idle. Getting even use out of all cores is the exception, not the rule.
This is pretty much completely down to the application mix. Windows has no trouble whatsoever scheduling processes and threads to max out 8 (or 16, or whatever) CPUs, but if the applications are only coded to have, say, 1 or 2 "processing" threads, then there's nothing the OS can do to change that.
Re:Luckily OSX is Already Has MultiCore Tech (Score:5, Interesting)
It seems you are severely underestimating what GCD means to the application developer. I strongly suggest you read parts 12 and 13 of John Siracusa's excellent review [arstechnica.com] very carefully. As Siracusa says,
Those with some multithreaded programming experience may be unimpressed with the GCD. So Apple made a thread pool. Big deal. They've been around forever. But the angels are in the details. Yes, the implementation of queues and threads has an elegant simplicity, and baking it into the lowest levels of the OS really helps to lower the perceived barrier to entry, but it's the API built around blocks that makes Grand Central Dispatch so attractive to developers. Just as Time Machine was "the first backup system people will actually use," Grand Central Dispatch is poised to finally spread the heretofore dark art of asynchronous application design to all Mac OS X developers. I can't wait.
Re:The problem: the event-driven model (Score:5, Interesting)
This has become a huge mess in C/C++, as more attributes ("mutable", "volatile", per-thread storage, etc.) have been bolted on to give some hints to the compiler.
An interesting comment overall, but what relevance does "mutable" have to multi-threaded programming? It is just a way to say that a particular field in a class is never const, even when the object itself is as a whole. There are no optimizations the compiler could possibly derive from that (in fact, if anything, it might make some optimizations non-applicable).
Same goes for "volatile", actually. It forces the code generator to avoid caching values in registers etc., and always do direct memory reads & writes on every access to a given lvalue, but this doesn't guarantee that one core sees a write done by another core. You need memory barriers for that, and ISO C++ "volatile" doesn't guarantee any (nor do most existing C++ implementations; MSVC's documented acquire/release extension for "volatile" is the notable exception).
Microsoft Research did some work on "Polyphonic C#" [psu.edu], but nobody seems to use that.
It's a research language, not intended for production use. Microsoft Research does quite a few of those - e.g. Spec# [microsoft.com] (DbC), or C-omega [microsoft.com] (this is what Polyphonic C# evolved into), or Axum [microsoft.com] (the most recent take at concurrency, Erlang-style).
Those projects are used to "cook" some idea to see if it's feasible, what approach is the best, and how it is taken by programmers. Eventually, features from those languages end up integrated into the mainstream ones - C# and VB. For example, X# became LINQ in .NET 3.5, and Spec# became Code Contracts in .NET 4.0. So, give it time.
Re:This is new?! (Score:4, Interesting)
I don't know if you noticed my sig, but I'm pretty familiar with what Apple have been up to these past few years ;-)
What I was getting at was that, in general, programmers simply don't have the time or money to really optimise their code, and computers are now, for all intents and purposes, fast enough that they don't really have to worry about optimisations.
Apple are doing a lot of good, as you mention, with things like Grand Central Dispatch, but the multiprocessing features in earlier versions of OS X, and even more OS 9, were nothing that was in any major way any better than that offered by, say, Windows or other Unix based OSs. In fact, in the Mac OS 9 days, the multiprocessing capabilities of Mac OS lagged quite far behind that of Windows NT at the time.
Re:This is new?! (Score:4, Interesting)
The fact is, the vast majority of programmers (and their tools) are not going to change virtually everything they do in order to deal with multiple cores. And there's good reason for that: it hugely complicates what could otherwise be fairly simple tasks. As the number of cores expand, it gets worse to the point of simply not being practical. This is a job that properly belongs in the OS or hardware layer.
Is it harder to design a system that decides for itself how to go about threading and multiprocessing, rather than relying on the programmer to know when it is best for that particular program? Yes! But that is irrelevant, because in the long run, that is the way it must be done. There is no other practical choice.
I had to laugh at Intel a few years ago when they called for end-product programmers to start programming for their multicore processors. I say, "No, Intel. It is you who must cater to the programmers. They are your customers, and essential suppliers of your other customers. It is your job to make sure that your processors do what the programmers want, not the other way around!"
Apple's decision to put provision for this in their Snow Leopard OS is a clear demonstration of their forward (and practical) thinking. Where are all the others?
Re:This is new?! (Score:3, Interesting)
Google? I'm a big Google fan (and despite the rest of my comment, also a big Android fan and totally love my Nexus One).. but if Google was so hardcore into efficiency, why the hell did they develop a new runtime for their Android that's based on Java?
Google didn't seem like the best company to praise for efficiency. I would have picked some sort of video game company like id Software (yeah, I realize this is an apples and oranges comparison though).
Re:Current architecture flawed but workable BUT... (Score:3, Interesting)
There is an option, at least as far back as XP, that allows Explorer windows to run as their own tasks. Why it's not enabled by default I have no clue (except that I have seen some issues with custom icons when doing so).
Reminds me of the Cache Kernel. (Score:3, Interesting)
The part of the article where Probert discusses the operating system becoming something like a hypervisor reminds me of the Cache Kernel from a Stanford University paper back in 1994. http://www-dsg.stanford.edu/papers/cachekernel/main.html [stanford.edu]
The way I understand it, the cache kernel in kernel mode doesn't really have built-in policy for traditional OS tasks like scheduling or resource management. It just serves as a cache for loading and unloading things like address spaces and threads and making them active. The policy for working with these things comes from separate application kernels in user mode and kernel objects that are loaded by the cache kernel.
There's also a 1997 MIT paper on exokernels (http://pdos.csail.mit.edu/papers/exo-sosp97/exo-sosp97.html). The idea is separating the responsibility of management from the responsibility of protection. The exokernel knows how to protect resources and the application knows how to make them sing. In the paper, they build a webserver on this architecture and it performs very well.
Both of these papers have research operating systems that demonstrate specialized "native" applications running alongside unmodified UNIX applications running on UNIX emulators. That would suggest rebuilding an operating system in one of these styles wouldn't entail throwing out all the existing software or immediately forcing a new programming model on developers who aren't ready.
Microsoft used to talk about "personalities" in NT. It had subsystems for OS/2 1.x, Win16, and Win32 that would allow apps from OS/2 (character mode), Windows 3.1 and Windows NT to run as peers on top of the NT kernel. Perhaps someday the subsystems will come back, some as OS personalities running traditional apps, and some as whole applications with resource management policy in their own right. Notepad might just run on the Win32 subsystem, but Photoshop might be interested in managing its own memory as well as disk space.
The mid-90s were fun for OS research, weren't they? :)
Energy efficiency will do it (Score:5, Interesting)
If we want efficient code, we have to figure out ways to reward the programmers that write it. I don't see any sign that people anywhere are interested in doing this. Anyone have suggestions for how it might be done?
It's happening, from a source people didn't expect: portable devices. Battery life is becoming a primary feature of portable devices, and a large fraction of that comes from software efficiency. Take your average cell phone: it's probably got a half dozen cores running in it. One in the wifi, one in the baseband, maybe one doing voice codec, another doing audio decode, one (or more) doing video decode and/or 3d, and some others hiding away doing odds and ends.
The portable devices industry has been doing multi-core for ages. It's how your average cell phone manages immense power savings: you can power on/off those cores as necessary, switch their frequencies, and so on. They have engineers who understand how to do this. They're rewarded for getting it right: the reward is it lives on battery longer, and it's measurable.
Yes, you can get lazy and say 'next generation CPUs will be more efficient', but you'll be beaten by your competitors for battery life. Or, you fit a bigger battery and you lose in form factor.
The world is going mobile, and that'll be the push we need to get software efficient again.
Put up or shut up. (Score:2, Interesting)
I'm getting really sick of posting this, but I'll continue to do so until you do.
BUILD A WORKING PROTOTYPE OF THIS "UNIVERSAL BEHAVING MACHINE", OR SHUT THE HELL UP.
Those of us who aren't insane aren't impressed by talk, we're impressed by results. If you spent half as much effort building the thing as you do flapping your damn jaw, you'd be done by now.
(For any uninitiated mods, this fellow is slashdot poster "rebelscience", and maintains a website of the same name. Every time a multiprocessing-related thread comes up, he posts this tripe but has never actually done anything about it. Visit his website, and you'll see why I call him a lunatic.)
Re:waiting (Score:5, Interesting)
Well, with the rise of the SSD, that's no longer as much of a problem.
ORLY!
Let's do some math, shall we? Take a simple 4-core Nehalem running at 2.66 GHz. Let's conservatively assume that it can complete a mere *1* double precision floating point operation per clock cycle, per core. So. How big is a double? 64 bits, or 8 bytes. Now, that's 2.66 billion * 4 = 10.64 BILLION doubles per second, which is 85 GB/s.
The trick to understanding computing is that all computing really *is* at its heart a throughput problem.
Do you see the asymmetry in throughput between the Nehalem and your SSD?
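The arithmetic above can be checked with a short sketch. The CPU figures come straight from the post; the SSD bandwidth is not given anywhere in the thread, so `ASSUMED_SSD_BYTES_PER_SEC` is a placeholder assumption chosen only to make the ratio concrete.

```python
def cpu_demand_bytes_per_sec(clock_hz, cores, doubles_per_cycle=1, bytes_per_double=8):
    """Bytes per second the cores could consume at the stated rate."""
    return clock_hz * cores * doubles_per_cycle * bytes_per_double

# Figures from the post: 4 cores at 2.66 GHz, one 8-byte double per cycle per core.
demand = cpu_demand_bytes_per_sec(clock_hz=2_660_000_000, cores=4)
print(f"CPU demand: {demand / 1e9:.2f} GB/s")

# Assumption, not from the thread: a drive that sustains 250 MB/s.
ASSUMED_SSD_BYTES_PER_SEC = 250_000_000
ratio = demand / ASSUMED_SSD_BYTES_PER_SEC
print(f"The cores could consume data about {ratio:.0f}x faster than the assumed SSD delivers it")
```

Plug in the sustained read rate of whatever drive you actually have; the point of the post survives as long as the ratio stays far above 1.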
C//
Re:This is new?! (Score:2, Interesting)
Question: Does Linux need any retooling? (Score:3, Interesting)
The article in question talks about Winblows.
What about Linux?
Does it need retooling as well?
Data flow languages (Score:5, Interesting)
I've always thought that both data flow languages and Fortran 95 had some innovations for multi-core programming worthy of being copied.
Data flow languages such as "G", which National Instruments sells under its LabVIEW brand, are intrinsically parallel at many levels. They treat a function call as a list of unsatisfied inputs, each waiting for data to arrive and make its variable valid. Once every input is valid, the subroutine fires. Thus every single function is potentially a parallel process; it's just waiting on its data. If you program in a serial fashion then of course those functions get called serially. But with graphical programming in 2D, you are almost never programming serially. You are just wiring the outputs of some functions to the inputs of others. Serial dependencies do arise, but these are asynchronous and confined to localized cliques. Everything else is parallel. Yet you never ever ever actually write parallel code. It just happens automatically. Perl Data Language had a glimpse of this, but it's not the same thing since the language is still Perl and thus not parallel.
Objective-C, with its "message passing" abstraction, is perhaps getting closer to the idea of data flow. One might complain that Objective-C message passing is just a different sugar coating of C, just like C++ is. This would be true from the user's point of view, but it's not as true from the operating system's point of view. In OS X, these messages are passed more like actual socket programming at the kernel level. So there's more to Objective-C on Apple's systems than meets the eye. But I don't know how far you can push that abstraction.
In Fortran there are some rather simple but powerful multi-processor optimizations. First, there are loops like "forall" that declare that the iterations can be done in any order of the loop index, and even in parallel. Then there are vectorized statements built into the language, like matrix multiplies. Those are rather simple things, so they don't solve much, but they do show that you can put a lot of compiler hinting into the language itself without re-inventing the language.
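The "fire when every input has arrived" rule can be sketched in a few lines of Python. This is not how G or LabVIEW is implemented; `run_dataflow` and the node table are hypothetical names made up for illustration, and CPython's global interpreter lock means the threads show the scheduling structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_dataflow(nodes, inputs):
    """nodes: name -> (function, [input names]); inputs: name -> initial value."""
    values = dict(inputs)      # data that has already arrived
    waiting = dict(nodes)      # nodes with unsatisfied inputs
    running = {}               # future -> node name
    with ThreadPoolExecutor() as pool:
        while waiting or running:
            # Fire every node whose inputs are all valid; no ordering is written by hand.
            ready = [n for n, (_, deps) in waiting.items() if all(d in values for d in deps)]
            for name in ready:
                func, deps = waiting.pop(name)
                running[pool.submit(func, *[values[d] for d in deps])] = name
            if not running:
                raise ValueError(f"inputs can never be satisfied for: {sorted(waiting)}")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                values[running.pop(fut)] = fut.result()
    return values

# "sum" and "prod" depend only on a and b, so both fire at once;
# "result" fires only after both of them have produced output.
nodes = {
    "sum": (lambda a, b: a + b, ["a", "b"]),
    "prod": (lambda a, b: a * b, ["a", "b"]),
    "result": (lambda s, p: s - p, ["sum", "prod"]),
}
values = run_dataflow(nodes, {"a": 3, "b": 4})
print(values["result"])
```

Nothing in the node table says what runs in parallel with what; the wiring alone decides it, which is the property the post is describing.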
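As a rough analogue of the "forall" idea, here is a Python sketch in which the caller promises that loop iterations are independent, and the helper is then free to run them in any order. `forall` here is a hypothetical helper, not a Fortran binding, and under CPython's global interpreter lock it illustrates the contract rather than delivering a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def forall(indices, body, max_workers=4):
    """Apply body to every index. Iterations may execute in any order,
    so body must not depend on the result of another iteration."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() may run iterations concurrently but returns results in index order.
        return list(pool.map(body, indices))

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

# Element-wise add: each iteration touches only its own index.
c = forall(range(len(a)), lambda i: a[i] + b[i])
print(c)
```

The hint lives in the choice of construct: an ordinary `for` loop promises sequential order, while calling `forall` is the programmer's statement that order does not matter.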
Re:This is new?! (Score:2, Interesting)
Grand Central Dispatch (GCD) is not some magic bullet that "deals with the cores", as you put it. The big thing it adds is a system-wide tasking and scheduling component accessible to individual applications, making it easier to designate blocks of code (i.e. tasks) that can run in parallel, and to spread those tasks among the available CPUs. Programmers still have the burden of creating task-parallel algorithms to solve their problems, and that is usually the tricky part. Creating a thread pool (GCD-like functionality) component for an application, or using one someone else built (there were many long before GCD), is very easy in comparison, on both Windows and OS X.
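To make the division of labour concrete, here is a minimal sketch using Python's standard thread pool as a stand-in for a GCD-like component (it is not GCD's API). `chunked` and `parallel_sum` are hypothetical names; the chunk size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(seq, size):
    """Split seq into consecutive slices of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def parallel_sum(values, chunk_size=1000, max_workers=4):
    # The programmer's part: decide how the problem splits into independent tasks
    # and how the partial answers recombine.
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The pool's part: hand the tasks to whatever workers are available.
        partials = pool.map(sum, chunks)
        return sum(partials)

total = parallel_sum(list(range(10_000)))
print(total)
```

The pool call is one line. Choosing a decomposition that is correct and worth the overhead is the part no dispatcher does for you, which is the post's point.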
Don't get me wrong, GCD is a nice optimization and has some good features, but it is a relatively small and trivial part of the bigger problem.
Answer: Yes (Score:5, Interesting)
First, the article in question talks about OS architecture, not Windows specifically. He specifically states that what he is speaking about is not something MS is working on. Quite the opposite, many of his MS colleagues disagree with him.
Second, the fundamental problems with OS design are exactly that: fundamental problems with OS design. Nobody is making an OS that truly takes advantage of multiple cores, it's still single-processor thinking with the ability to use more than one processor, and this leads to a number of inherent problems.
The article talks about what an OS might look like if built from scratch specifically for multiple core processing power, and there is nothing on the market like it at the moment. It's basically a hypervisor-based OS, where instead of giving programs slices of CPU time, the OS gives programs actual CPUs and slices of memory to use.
Something like that would be extremely slick. We already do that for virtual machines, and we end up with 8+ full-fledged servers running on the same machine. Why can't you pull that back a little more so it's individual programs assigned to each CPU such that they don't have to interact with the OS at all once they are up and running? Can you imagine?
Re:This is new?! (Score:3, Interesting)
I don't think you understood the point he was trying to make. Windows has had threading since 1993 and a threadpool API [microsoft.com] since before OS X was released [microsoft.com]. The point he was making was not that Windows wasn't good enough for multiple cores, it was that the current paradigm about how OSes and apps relate wasn't good enough.
Back when you only had a single-core CPU, the OS had to share the CPU with all the apps. Thus arose the kernel/user model where the OS ran in kernel mode and the apps ran in user mode. When an app needed some system service it would stop running, the CPU would switch to kernel mode, perform the service, and go back to user mode so the app could resume. When multiple CPUs and then multiple cores per CPU became available, this model was simply expanded so that the OS ran on every CPU core. This is called the SMP (Symmetric MultiProcessing) model because every processor core has the same duties as all the others.
I think what he's saying is that having the OS run on every core means that data structures it uses will have to be shared across all the cores in the system, causing problems like contention and false sharing. It sounds like he is considering what would happen if the OS just ran on some cores and apps ran on others. If an app needs a system service it need not stop running, switch into kernel mode, run the OS, etc. Instead it could send a message to one of the cores that the OS is running on and go about its business, hopefully staying more responsive that way. Obviously the app can't have full control of the CPU because it has to share the computer nicely, but it doesn't need a fully-blown kernel either, so the thin supervisor layer is what he related to a hypervisor.
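The "send a message to an OS core and carry on" idea can be sketched with a thread standing in for the dedicated core. Everything here is hypothetical: `ServiceCore` is an illustration of the messaging pattern, not anything Probert or Windows specifies, and a real design would pass messages between cores rather than Python threads.

```python
import queue
import threading
from concurrent.futures import Future

class ServiceCore:
    """Stand-in for a core dedicated to OS services, fed by a message queue."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            msg = self._inbox.get()
            if msg is None:            # shutdown sentinel
                break
            func, args, reply = msg
            try:
                reply.set_result(func(*args))
            except Exception as exc:   # deliver failures to the caller too
                reply.set_exception(exc)

    def request(self, func, *args):
        """Post a service request and return immediately with a reply handle."""
        reply = Future()
        self._inbox.put((func, args, reply))
        return reply

    def shutdown(self):
        self._inbox.put(None)
        self._thread.join()

core = ServiceCore()
pending = core.request(len, "hello")   # the "system call" becomes a message
local_work = sum(range(10))            # the app keeps running in the meantime
length = pending.result()              # block only when the answer is needed
core.shutdown()
print(local_work, length)
```

Compared with the trap-into-kernel model described above, the caller never stops to run OS code itself; it only waits if it asks for the reply before the service side has finished.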
It may be hard to imagine a 256-core computer because Apple doesn't make any, but Windows can already run on 256 cores. Of course those are huge server boxes, but it won't be long before it's common to have desktop boxes with 256 logical CPUs (2 sockets, 32 cores/socket, 4 threads/core), and then you can imagine that a high-end server might have upwards of 2048 cores. At that point does it even make sense to have the OS running on hundreds or thousands of cores simultaneously? Probably not.
I'm not saying that this guy has the right solution, but he has some interesting ideas worth considering.
dom
Re:Answer: Yes (Score:3, Interesting)
Looks like it's time for me to update my whitepaper on massively parallel OS design [blogspot.com] again? I admit, due to lack of interest I have let it fall a bit out of date, recently.
Among other things, I'm going with the name "Ironfluid" now, as I've finally deconflated the terms "cloud computing" and "fluid computing". Cloud really just means "run by somebody else", while "fluid computing" implies parallel processing and fault tolerance, decoupling the software completely from the hardware. Google, for example, offers both, but does not offer the tools for the common sysadmin to form their own clouds.
I think I'd like to.
Looks like Tanenbaum will have been right after al (Score:2, Interesting)
Re:The problem: the event-driven model (Score:4, Interesting)
Most languages still handle concurrency very badly. C and C++ are clueless about concurrency. Java and C# know a little about it. Erlang and Go take it more seriously, but are intended for server-side processing. So GUI programmers don't get much help from the language.
In particular, in C and C++, there's locking, but there's no way within the language to even talk about which locks protect which data. Thus, concurrency can't be analyzed automatically. This has become a huge mess in C/C++, as more attributes ("mutable", "volatile", per-thread storage, etc.) have been bolted on to give some hints to the compiler. There's still race condition trouble between compilers and CPUs with long look-ahead and programs with heavy concurrency.
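One way to picture "saying which lock protects which data" is to bundle the two so the data is only handed out while the lock is held. The sketch below is in Python, not C or C++, and `Guarded` is a hypothetical wrapper; Python enforces this only by convention, since a caller could still leak the reference out of the `with` block.

```python
import threading
from contextlib import contextmanager

class Guarded:
    """Pairs a value with the one lock that protects it."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value

    @contextmanager
    def locked(self):
        # The value is only reachable inside this block, with the lock held.
        with self._lock:
            yield self._value

counter = Guarded({"hits": 0})

def worker():
    for _ in range(1000):
        with counter.locked() as state:
            state["hits"] += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with counter.locked() as state:
    total = state["hits"]
print(total)
```

Because the association is written down in one place, a tool or a reviewer can at least see which lock goes with which data, which is the kind of analysis the post says plain locks do not allow.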
We need better hard-compiled languages that don't punt on concurrency issues. C++ could potentially have been fixed, but the C++ committee is in denial about the problem; they're still in template la-la land, adding features few need and fewer will use correctly, rather than trying to do something about reliability issues. C# is only slightly better; Microsoft Research did some work on "Polyphonic C#" [psu.edu], but nobody seems to use that. Yes, there are lots of obscure academic languages that address concurrency. Few are used in the real world.
Ada 2005's task model is a real world, production quality approach to include concurrency in a hard-compiled language. Ada isn't exactly known for its GUI libraries (there is GtkAda), but it could be used as a foundation for an improved concurrent GUI paradigm.
This book [google.com] covers the subject quite well.
Re:This is new?! (Score:2, Interesting)
Still, one could hope that people learned from Apple's experience with the Classic VM inside OS X.
The one thing I don't like from the transition is how Carbon/Classic ':'-separated paths are still hanging around in some interfaces.
But between Classic.app on OS X, Cygwin on Windows, and WINE on anything POSIX-ish (source API compatibility, not binary), there's plenty of work out there to serve as a template.
None of which Microsoft can use, I guess, because it's either Apple's or (L)GPLed.
Re:This is new?! (Score:3, Interesting)
Well it's my understanding that Carbon simply wasn't supposed to stick around this long. Cocoa was supposed to replace it, but there were some major developers (e.g. Adobe and Microsoft) who refused to transition.
There was even a dust up in the last year or so when 10.6 was released, and Apple made it clear that they weren't ever going to update Carbon to support 64-bit applications. Adobe pretty much flipped out, and is only now working on migrating over to Cocoa in CS5. Microsoft is finally releasing a Cocoa version of Office in 2010.
So in essence, we're 10 years out and the transition from OS 9 still isn't done.
Don't get me wrong-- in the past 10 years, Apple has transitioned to an entirely new OS and a different chip architecture (PowerPC->x86), and overall both transitions went fine. I still wouldn't want to keep doing it every couple of years.