Removing the Big Kernel Lock
Corrado writes "There is a big discussion going on over removing a bit of non-preemptable code from the Linux kernel. 'As some of the latency junkies on lkml already know, commit 8e3e076 in v2.6.26-rc2 removed the preemptable BKL feature and made the Big Kernel Lock a spinlock and thus turned it into non-preemptable code again. "This commit returned the BKL code to the 2.6.7 state of affairs in essence," began Ingo Molnar. He noted that this had a very negative effect on the real time kernel efforts, adding that Linux creator Linus Torvalds indicated the only acceptable way forward was to completely remove the BKL.'"
Looks like "Worse is Better" all over (Score:5, Insightful)
That's fine, but once you reach maturity you should be trying to do the "right thing" (the exact opposite). And the Linux kernel reached maturity quite a while ago.
I think Linus is right on this.
Re:Fascinating. (Score:3, Insightful)
Hey, I consider myself a code junkie (and yes, I even consider the issue of the BKL somewhat interesting), but I realize that this topic has about as much appeal to the average Slashdotter as mowing the lawn.
Re:I don't understand (Score:4, Insightful)
If BKL-protected code is rarely executed, then the impact on general-purpose performance is minimal and the relative efficiency of a spinlock versus a mutex is irrelevant. If that is not true, then saying it is rarely used is misleading.
However, for real-time use you either do or don't meet a given worst-case latency spec. The fact that a glitch only rarely happens is of little comfort.
It seems like it should have been a no-brainer to leave the preemptable code in for the time being. If there's a clean way to redesign the lock out altogether then great, but that should be a separate issue.
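The spinlock-versus-mutex distinction above comes down to what a waiter does: a spinlock waiter burns CPU until the lock frees, while a mutex waiter sleeps and can be preempted. A rough user-space sketch in Python (toy code only; real kernel spinlocks use atomic test-and-set and disable preemption, and the `SpinLock` class here is invented for illustration):

```python
import threading

class SpinLock:
    """Toy user-space spinlock: waiters busy-wait instead of sleeping."""
    def __init__(self):
        self._inner = threading.Lock()

    def acquire(self):
        # Busy-wait: the waiting thread stays runnable the whole time,
        # which is why long spinlock hold times hurt worst-case latency.
        while not self._inner.acquire(blocking=False):
            pass

    def release(self):
        self._inner.release()

counter = 0

def bump(lock, n):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1
        lock.release()

lock = SpinLock()
threads = [threading.Thread(target=bump, args=(lock, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Either kind of lock gives the same correct answer; the difference that matters for real-time latency is that the spinning waiter occupies a CPU for the entire wait.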
Keep these on Front Page... (Score:5, Insightful)
This is the type of stuff that needs to be kept in the news: the people who post here often have no understanding of it, and the ones who do get the opportunity to explain it, bringing everyone up to a better level of understanding.
Maybe if we did this, real discussions about the designs and benefits of all technologies could be debated and referenced accurately. Or, dare I say, people wouldn't go ape when someone refers to a good aspect of NT's kernel design.
Re:Translation? (Score:4, Insightful)
Keep dreaming ...
With all those processors, you'll want to be saving energy, so you'll be aiming to turn off individual processors until needed, and run the remaining processors at full load, so you'll still need a scheduler, locks, etc.
And yes, it's possible even today to use up more than 4 gig of ram and have to hit swap.
Re:Sounds like the Linux kernel needs some tests.. (Score:5, Insightful)
You talk about unit testing, but how exactly are you going to unit test multi-threading issues? This is not some simple problem that you can run a pass/fail test against. These kinds of things can really only be tested by analysis to prove they can't fail, or by extensive stress testing to get them "good enough".
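The point about stress testing can be made concrete: a race doesn't show up deterministically, so a harness runs the same concurrent scenario many times and checks an invariant after every round. A minimal Python sketch (the bank-transfer scenario is invented for illustration; it is not kernel test code):

```python
import threading

def transfer(src, dst, amount, lock):
    # The invariant (total balance conserved) only holds if the
    # read-modify-write of both accounts happens under the lock.
    with lock:
        src["balance"] -= amount
        dst["balance"] += amount

def stress_round(iterations=1000):
    a = {"balance": 500}
    b = {"balance": 500}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=lambda: [transfer(a, b, 1, lock) for _ in range(iterations)]),
        threading.Thread(target=lambda: [transfer(b, a, 1, lock) for _ in range(iterations)]),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return a["balance"] + b["balance"]

# A single pass proves nothing about a race; repeat the scenario many
# times and check the invariant after each round.
results = [stress_round() for _ in range(20)]
print(all(total == 1000 for total in results))
```

Even this only builds confidence probabilistically, which is exactly the parent's point: passing a stress test is evidence, not proof.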
This is why monolithic kernels do real-time badly (Score:5, Insightful)
This task is not easy at all. 12 years after Linux has been converted to an SMP OS we still have 1300+ legacy BKL using sites. There are 400+ lock_kernel() critical sections and 800+ ioctls. They are spread out across rather difficult areas of often legacy code that few people understand and few people dare to touch.
This is where microkernels win. When almost everything is in a user process, you don't have this problem.
Within QNX, which really is a microkernel, almost everything is preemptable. All the kernel does is pass messages, manage memory, and dispatch the CPUs. All these operations either have a hard upper bound on how long they can take (a few microseconds), or are preemptable. Real-time engineers run tests where interrupts are triggered at some huge rate from an external oscillator, and when the high-priority process handling the interrupt gets control, it sends a signal to an output port. The time delay between the events is recorded with a logic analyzer. You can do this with QNX while running a background load, and you won't see unexpected delays. Preemption really works. I've seen complaints because one in a billion interrupts was delayed 12 microseconds, and that problem was quickly fixed.
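The send/receive/reply message passing described above can be roughly illustrated with plain threads and queues (nothing QNX-specific here; `fs_server`, the message shapes, and the canned reply are all invented for the sketch):

```python
import queue
import threading

# Each "service" runs in its own thread and communicates only by
# messages: a request carries a private reply queue so the sender
# can block waiting for the answer (send/receive/reply).
requests = queue.Queue()

def fs_server():
    while True:
        msg = requests.get()
        if msg is None:          # shutdown sentinel
            break
        op, arg, reply_q = msg
        if op == "stat":
            reply_q.put({"name": arg, "size": 42})  # canned answer

server = threading.Thread(target=fs_server)
server.start()

# Client side: send the request, then block on the private reply queue.
reply_q = queue.Queue()
requests.put(("stat", "/etc/passwd", reply_q))
answer = reply_q.get()

requests.put(None)               # shut the server down
server.join()
print(answer["size"])  # 42
```

Because the server owns all its state and touches it only between messages, there is no shared-memory locking to get wrong, which is the structural advantage being claimed for microkernels.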
As the number of CPUs increases, microkernels may win out. Locking contention becomes more of a problem for spinlock-based systems as the number of CPUs increases. You have to work really hard to fix this in monolithic kernels, and any badly coded driver can make overall system latency worse.
Re:Fascinating. (Score:3, Insightful)
but I realize that this topic has about as much appeal to the average Slashdotter as mowing the lawn.
The funny thing is that the reply quality here is quite high for technical topics, but over time Slashdot management has found that inane political threads are much more popular.
Re:Looks like "Worse is Better" all over (Score:1, Insightful)
Think fast, hot shot. What do you do? What do you do?
Re:Looks like "Worse is Better" all over (Score:2, Insightful)
In fact, none of the people in the discussion seem to think that. They just think it is a huge amount of work (and they should know far better than me, but for what it's worth, I agree).
The problem? Changing legacy code that runs under heavy multitasking in an effectively unbounded number of configurations.
Torvalds' solution? Start doing it and offer it as an experimental config option. By supporting both options, with the BKL as the default, they (a) move the architecture in this direction and (b) let heavy users and distro creators experiment with it and give feedback (and help). Open source at its best.
Re:Translation? (Score:3, Insightful)
I'll respond because it is fantastic to see new people thinking about these issues. But I must agree with twizmer on this: grabbing multiple resources up front might solve the problem, but it is very clumsy. Some resources (e.g. storage) may take milliseconds to complete, whereas others (e.g. graphics) might take only microseconds. Holding up the fast ones while the slow ones complete is very undesirable (for all the reasons twizmer gives).
There are techniques used for problems like deadlock and starvation: changing priorities on the fly, enforcing mutex ordering, or even 'prodding' deadlocked tasks, but they are all somewhat ugly. You'll find chapters on them in any book on OS design.
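Of the textbook techniques mentioned, enforcing mutex ordering is the simplest to sketch: if every task acquires locks in one global order, two tasks can never each hold one lock while waiting for the other's. A minimal Python illustration (the ordering key here, object `id`, is just a convenient stand-in for a real global ranking):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def acquire_in_order(*locks):
    """Take locks in a single global order so circular waits are impossible."""
    ordered = sorted(locks, key=id)
    for lk in ordered:
        lk.acquire()
    return ordered

def release_all(held):
    for lk in reversed(held):
        lk.release()

done = []

def task(first, second, name):
    # Argument order doesn't matter: both tasks end up locking in the
    # same global order, so the classic a-then-b vs b-then-a deadlock
    # cannot occur.
    held = acquire_in_order(first, second)
    done.append(name)
    release_all(held)

t1 = threading.Thread(target=task, args=(lock_a, lock_b, "t1"))
t2 = threading.Thread(target=task, args=(lock_b, lock_a, "t2"))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(done))  # ['t1', 't2']
```

The ugliness shows up in practice: every piece of code in the system has to know and obey the global order, which is hard to retrofit onto a large legacy code base, precisely the BKL situation.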
The essential problem is that semaphores (and mutexes etc.) are a low-level way to coordinate multiple processes; they are analogous to using goto for flow control. There are languages that have attempted to address this (e.g. Occam or Modula) with slightly higher-level constructs, but they have not become popular, and they are not a radical enough departure anyway.
I believe that we will need new programming languages to achieve safer parallelism. My bet would be on a language with message passing primitives (since they fit well with our object oriented models), and perhaps the use of Petri net formalism to prevent deadlocks. I gather that Nokia's phone OS uses this message passing model.
It should be noted that current processor design does not suit efficient message passing (the emphasis is more on an efficient stack, since that corresponds to the procedural flow of control - an exception may be the old Transputer architecture). However, I think the languages need to be developed first, even if they are not efficient to compile; processor development will support the most popular languages (as it has grown to support the use of C and other procedural languages).
Re:Translation? (Score:3, Insightful)
Managed languages have their place (and in fact all my work now is with managed languages)... but they also have to run on something. You think your Java apps will just run without an operating system underneath them providing these kinds of features to support the virtual machine? What language do you think your Java VM is written in? Do you think that's air you're breathing now? Hmm...
"This is innacurate, not correct and misleading at the same time"
But hey at least you prefixed what you had to say with this so we did know.
Re:Sounds like the Linux kernel needs some tests.. (Score:3, Insightful)
All lines of code are not equal.
There's a huge difference between typical application code and system code. Very little application code is as performance-sensitive as system code, because the goal of the system code is to use as little time as possible (consistent with some other goals), to make as much time as possible available to run application code.
OS code is performance-tuned to a degree that application code almost never is, and that focus on performance results in complexity that isn't well-represented by counting lines of code. Further, most application code isn't nearly as multiprocessor-aware as modern OS code, which introduces another huge complexity factor. Finally, the role of OS code is to interact directly with hardware, and if you've ever written on-the-metal code you know how much complexity that adds. Linux, of course, takes that even further by trying to work on a wide variety of hardware platforms, abstracting commonality where possible, but only when it doesn't interfere with performance.
No, I don't think there are many, if any, unit tests.
Altogether, I estimate that system code is an order of magnitude more complex, per line, than application code.
There are a bunch of sets of system tests in place for Linux. They're created and executed by multiple groups of people around the world and the results are made available to the developers (some of whom are the same people executing the tests).
This is a different approach from the one common in the normal, centralized development model, but it's very effective for the sort of decentralized development used by the Linux kernel team. People who are interested in different aspects of Linux create tests designed to evaluate the kernel according to those aspects. When they see problems, or opportunities for improvement, they post their results to LKML, often with patches to address the issue they identified.