Removing the Big Kernel Lock

Removing the Big Kernel Lock 222

Posted by CmdrTaco on Saturday May 17, 2008 @12:13PM from the wait-i-thought-locks-made-it-secure dept.

Corrado writes "There is a big discussion going on over removing a bit of non-preemptable code from the Linux kernel. 'As some of the latency junkies on lkml already know, commit 8e3e076 in v2.6.26-rc2 removed the preemptable BKL feature and made the Big Kernel Lock a spinlock and thus turned it into non-preemptable code again. "This commit returned the BKL code to the 2.6.7 state of affairs in essence," began Ingo Molnar. He noted that this had a very negative effect on the real time kernel efforts, adding that Linux creator Linus Torvalds indicated the only acceptable way forward was to completely remove the BKL.'"

Removing the Big Kernel Lock

This discussion has been archived. No new comments can be posted.

Search 222 Comments Log In/Create an Account

Comments Filter:

Re:I don't understand (Score:5, Interesting)

by Vellmont ( 569020 ) writes: on Saturday May 17, 2008 @12:44PM (#23446302) Homepage

Why did they remove the preemptable BKL?

I'm not a kernel developer, but I'd say it's because there's widespread belief that the preemtable BKL is "the wrong way forward". Statements like these lead me to believe this:
"all this has built up to a kind of Fear, Uncertainty and Doubt about the BKL: nobody really knows it, nobody really dares to touch it and code can break silently and subtly if BKL locking is wrong."

In any large software project there's always a path to get from where you are, to where you want to be. It sounds like any version of BKL is considered ugly and causes problems, and patching it just won't work. In other words, fixing this part of the kernel isn't really possible, so they need to start over and change any code that relies on it to rely on something different entirely.

Re:Translation? (Score:3, Interesting)

by Anonymous Coward writes: on Saturday May 17, 2008 @12:49PM (#23446342)

Anytime you have more than one application running, they could get into an argument about who gets to use the serial port, the video display, memory, or drive storage. This is especially critical in multi-processor systems.

The answer is to allow sections of code to "lock" access for a brief duration -- "I'm working with this right now, don't anyone else touch it." Simple in theory, very difficult in concept.

Note that I'm speaking generically; I'm not an expert on the Linux kernel. Ideally, though, you want locks to be "granular" -- in other words, you only lock that specific hardware and/or portion of memory that you need exclusive access to. Apparently, the "big kernel lock" takes a brick wall and hammer approach, locking access (and claiming exclusive access during the lock, preventing anything from running). It's not granular.

If I'm wrong, someone else here can correct me. Like I said, I'm not an expert on the Linux kernel.

Comment removed (Score:5, Interesting)

by account_deleted ( 4530225 ) writes: on Saturday May 17, 2008 @01:07PM (#23446446)

Comment removed based on user account deletion

Re:I don't understand (Score:3, Interesting)

by kestasjk ( 933987 ) writes: on Saturday May 17, 2008 @01:48PM (#23446722) Homepage

new semaphore code was introduced that simplified locking. Unfortunately in many kernel situations it's proven to affect performance at around something like 40% - which isn't just considerable its disastrous. rather than merge the old locking code back in, and reintroduce the many different locking primitives they had, someone decided to simply reenable the BKL - the downside of which is they have to either fix the regression caused by the simpler semaphore code (not likely, it's very simple and clean - everyone's favourite pet/child) or remove instances of where the semaphore code is likely to be called (the BKL). Matt
Couldn't they just ask the real-time developers to kindly find a real-time kernel to work on? Why try to make a non-preemptible kernel preemptible for the sake of real-time, if it affects non-real-time performance?

Re:Sounds like the Linux kernel needs some tests.. (Score:5, Interesting)

by Alex Belits ( 437 ) * writes: on Saturday May 17, 2008 @02:14PM (#23446858) Homepage

And that's the other thing... if they aren't writing tests for everything they do, then even the code they write today is legacy code. Code without tests can't be easilly checked for correctness when a change is made, can fail silently easilly, and can't be understood as easilly.
On the other hand, code WITH tests also can't be easily checked for correctness when a change is made. There is only very small scope of possible mistakes that a test can detect, and if you will try to make test verify everything, test will grow larger (and buggier, and more incomprehensible) than your code. It's also possible that intended behavior of the code and expected behavior that the test checks for, diverge because of some changed interface. Tests help with detection of obvious breakage, but you can never rely on anything just because it passed them.

In other words:

TESTS DON'T VERIFY THAT YOUR CODE IS NOT BUGGY. YOU VERIFY THAT YOUR CODE ISN'T BUGGY.

Comment removed (Score:5, Interesting)

by account_deleted ( 4530225 ) writes: on Saturday May 17, 2008 @02:17PM (#23446876)

Comment removed based on user account deletion

Re:Fundamental kernel structures such as this... (Score:3, Interesting)

by HiThere ( 15173 ) writes: <charleshixsn@@@earthlink...net> on Saturday May 17, 2008 @02:45PM (#23447016)

I'll agree that this should have been shorted out long since. But it wasn't, and very few people though that it was reasonable to expend time on something so obviously unreasonable. (Multiprocessors were things like Illiac IV, huge monsters that were utterly impractical.)

Time passes, technology changes, and now it's become urgent to deal with this, so now it's being dealt with.

One should, perhaps, wonder what currently unreasonable problem should actually start being addressed RIGHT NOW!! The things I can think of divide neatly into two camps. 1) We don't know enough to even get started, and 2) It really seems utterly implausible, even given this example to work from. Unfortunately, somewhere in there is something that's being overlooked, and I don't know what. Kernel support for Actors? Kernel security to control Actors? Kernel support for Language parsing? They all seem implausible.

What is clearly needed soon is software that facilitates the use of multi-processor environments. Dataflow languages have promise, but there may be other reasonable choices. Possibly some interface that would easily allow different computer languages to work together, but that may be a real impossibility. Or even a language basically like C or C++. but extended with a "foreach" operator that allowed parallel execution of the loop body...but the language would need to be smart enough to tell what needed to be read locked and what needed to be write locked, and what could just be ignored. This implies that use of pointers is *severely* circumscribed! And if you're going to do that, you probably ought to have garbage collection. It might sound like I'm talking about Java, but that would be wrong. This language would need to be close to the metal, so it could adapt itself (at run time!!) to the local machine. And since we want as much efficiency as possible, virtual machines, interpreters, etc. are probably out.

I don't know of any language that meets the specs I've outlined, but I know of many languages that meet large parts of them. Of the languages I know, D (Digital Mars D) comes the closest, but its totally missing on even the parallelization that C/C++ have (as an add-on).

But that doesn't really say where the kernel should be going...except that possibly C isn't the best language to use for a multiprocessor environment. (But C is still the most efficient in most places, and it DOES have add-ons for parallelization...though whether you can use those add-ons in kernel programming isn't something I've investigated.)

Re:Translation? (Score:2, Interesting)

by suck_burners_rice ( 1258684 ) writes: on Saturday May 17, 2008 @03:25PM (#23447256)

Since removing the BKL will cause deadlock situations like the one you describe, perhaps a solution to this problem is to re-think the way locking is implemented. If a program knows that it will need access to resources A, B, and C, it could put in a request to reserve all three of those resources simultaneously. If the three resources are available at that moment, they will all be locked simultaneously, the task will execute, and then they will be unlocked simultaneously. But if one or more of the resources are not available at that moment, that task will simply stop executing (it won't be scheduled) until the first instance that all three become available. This way, a resource doesn't become locked until it is actually going to be used.

Re:I don't understand (Score:3, Interesting)

by QX-Mat ( 460729 ) writes: on Saturday May 17, 2008 @03:46PM (#23447382)

yes I agree. the BKL wasn't fair - which worked. happiness. but hey my views are pointless because I'm pro desktop/nvidia/life (and I have a soul).

The new semaphore implementation gives way in a very fair manor proving that sometimes you can be too fair and Kon was right from the get go (a desktop user's opinion, i don't mean to bait). if something didn't explicitly ask for a time slice, ie: a kernel section was preempted by an interrupt and then queued back in normally, then it doesn't get one until the pope said yes. the BKL as we knew it would allow a section enclosed in [un]lock_kernel to queue jump - at the expense of some extra code trickery.

Sadly the BKL was one of the more demanding locks - if it didn't get what it wanted fast enough, subtle differences in character device operations, which all seem to be strategically linked to the tty and other vfs layers from what I can make of KernelTrap, would work against the overall throughput of the kernel and into usermode (archaic ioctls suffer)

the best "fix" would be to implement the old semaphore types for depreciated BKL-only use (don't export them?) - but the old semaphore code was quite substantial, and required some arch specific implementations which have now... I don't see that much work being reintroduced in it's old form any time soon.

Matt

Re:Fundamental kernel structures such as this... (Score:2, Interesting)

by RiotingPacifist ( 1228016 ) writes: on Saturday May 17, 2008 @03:53PM (#23447424)

while a 2.7 branch would make sense if stability was important, its not, with 2.6 linus decided he couldn't be bothered with the boring part of stability and so told distros to do it themselves. This happens (with varying degrees of success), and has allows the kernel to develop at a much faster pace. Unfortunately this particular change is too much even for the 'unstable' kernel, so a temporary testing branch to get other peoples code sorted is being formed (a virtual 2.7 in effect only its so experimental it will never get released)

Re:(Performance != Speed) // in an RT system (Score:2, Interesting)

by Anonymous Coward writes: on Saturday May 17, 2008 @05:03PM (#23447788)

Yes, and since the kernel can and is branched, they can decline to apply this patch and keep the 2.6.7-2.6.22 or whatever style BKL... or they can help everyone and rewrite various BKL-using code to not use it. I'd rather have a kernel that has low latency AND behaves correctly, but if I have to chose I prefer correct behavior every time.

Re:So when can we run QNX-Ubntu? (Score:3, Interesting)

by Animats ( 122034 ) writes: on Saturday May 17, 2008 @05:40PM (#23448000) Homepage

You can actually run X-windows on QNX, although nobody does. QNX has its own GUI system called Photon, which is oriented towards things one might want in an industrial control panel or an entertainment system, like meters, dials, and graphs.
QNX the OS is fine; it's QNX the company that's difficult to deal with. On the other hand, who else has gone up against Microsoft on x86 for 20 years and is still alive?

Re:Translation? (Score:4, Interesting)

by twizmer ( 1206952 ) writes: on Saturday May 17, 2008 @05:42PM (#23448010)

This is one approach to deadlocks; it would fall generally under "avoidance".

The problem is that if you have any serious contention over the resources, it is entirely possible that the process will _never_ get the resources (because one of them always gets snatched up before another gets released, so all n resources are never available at once to the requesting process). This leads to starvation and general sadness.

If the system has minimal contention (so the normal case is that all three resources are unclaimed) and resources are held very briefly (so if a resource is taken it is likely to be released before another is taken, anyway) then it may work. In real systems these are hard properties to guarantee.

Also, the scheme requires a process to know in advance which locks it will need. A lot of algorithms may discover this on the fly (e.g. if you are traversing a data structure), which becomes a problem. The best you could hope to do is to lock aggressively--taking everything you might need--but this is ugly, and would tend to violate the conditions above (locking everything will lead to contention, locking everything in advance will lead to holding locks for a long time). Alternatively, if you discover that you need a new lock that you don't yet have, you could give up all the locks you do have and then try to lock again (with the new lock added to the set). This is also ugly and increases the chance of starvation (since now you need to lock a bunch of resources several times). Additionally, since you have to unlock in the middle, the algorithm becomes much more complicated. For example, when you discover you need a new lock, you must put the world in a consistent state before unlocking. And when you re-lock, you must check to make sure that the world hasn't been modified under your feet (which is entirely possible, and may very well cause you to need a still-different lock).
Basically it doesn't work that well.

Re:I don't understand (Score:5, Interesting)

by QX-Mat ( 460729 ) writes: on Saturday May 17, 2008 @07:06PM (#23448634)

I'm brave enough to want per CPU microkernels (with a messaging master?). I envisage all multi-CPU systems addressing memory in an non-unified manor soon enough - it'll be like the jump from segmented addressing to protected mode, but for CPUs.

The monolithic design is slowly forming a focal point in performance: something has to do a lot of locked switching - if SMP machines could do what they do best and handle IRQs and threads concurrently without waiting for a lock (they're better spent sending/receiving lockless messages), life would be easier on the scalability gurus.

Re:Translation? (Score:2, Interesting)

by suck_burners_rice ( 1258684 ) writes: on Saturday May 17, 2008 @11:14PM (#23450124)

What would happen if there were two distinct ways of handling the BKL problem, which could be implemented simultaneously: Method number 1, the existing method. This would remain in place to support all the program that use it. Method number 2, a newfangled method that new programs will need to be made aware of. The idea is that "well behaved" programs would, over time, begin to use the second method. The first method would actually use the second method in its implementation, so really there's only one method, although it appears that there are two.
Each resource in the kernel that might potentially be locked receives its own small lock. The idea is that the more finely grained the locking, the less likely it is that more than one program will need the same resource at the same time. The deadlock and resource contention problems will still exist, of course, but they will occur less frequently. Programs that know exactly which locks they'll need over the course of an operation would put in a request for all of those resources. At this moment, the kernel checks if all requested resources are available, and if so, the resources are locked immediately (and simultaneously) and program execution continues into the portion that uses those resources. However, if one or more of the resources are not available, the kernel will queue the request and one of two things will happen at the program's choice: Either the call will block until the resources are available, at which point program execution continues normally (we'll call this "synchronous"); or the program can be notified that "I'll get back to you on that" (we'll call this "asynchronous") in which case it can continue executing some other, unrelated part of its code, such as the part that updates the screen, outputs sound, or whatever. Obviously, this will be a bit trickier for application programmers to implement, but the reward will be programs that appear responsive to a user while they "wait" for locks. There should be an additional option in the locking request: a timeout value (with an option to never timeout). If the request times out, the program can do something appropriate. The kernel function that releases a lock will be modified to check the queue of lock requests, and immediately transfer control of the lock to a waiting program (according to some prioritization method). This means unblocking something that is blocked synchronously, or sending a signal to the process that queued the asynchronous lock request. The BKL will be implemented by requesting all locks, synchronous, never time out. Simple. The lock request function will return a value telling you whether you received the lock or why you didn't receive it. The scheduler "knows" if there are two or more processes in its list that are blocked waiting for a lock. If at least one of them has provided a timeout value other than "never," the scheduler does nothing since the problem will fix itself eventually, a la Office Space. Otherwise, it can examine the lock relationships. If a deadlock situation exists, it will choose one of the processes and return it an answer that it didn't receive the lock for deadlock prevention. That process can then do something appropriate. Code that doesn't know how to handle this situation will crash, but the system won't block indefinitely.

Re:Punchline (Score:1, Interesting)

by Anonymous Coward writes: on Sunday May 18, 2008 @03:25AM (#23451180)

Correct, this is what I have understood as well. Interesting to add might be that Linus has stated that he would prefer to have this in mainline as soon as possible, so that there will be more testing coverage.

So the approach they seem to be taking, is:
- add a WARN_ON in the scheduler when it releases the BKL, so that users get notified via syslog and can report such cases (in mainline)
- identify all cases where BKL is locked and not released, and fix the code (might be done in mainline)
- rewrite the BKL as a recursive spinlock (basically, a lock that can double-lock itself and still un-lock correctly) - might also be done in mainline once they are confident that all call sites have been fixed
- merge it, if needed
- move the locks down into subsystems (as you said)

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Removing the Big Kernel Lock 222

Removing the Big Kernel Lock More Login

Removing the Big Kernel Lock

Re:I don't understand (Score:5, Interesting)

Re:Translation? (Score:3, Interesting)

Comment removed (Score:5, Interesting)

Re:I don't understand (Score:3, Interesting)

Re:Sounds like the Linux kernel needs some tests.. (Score:5, Interesting)

Comment removed (Score:5, Interesting)

Re:Fundamental kernel structures such as this... (Score:3, Interesting)

Re:Translation? (Score:2, Interesting)

Re:I don't understand (Score:3, Interesting)

Re:Fundamental kernel structures such as this... (Score:2, Interesting)

Re:(Performance != Speed) // in an RT system (Score:2, Interesting)

Re:So when can we run QNX-Ubntu? (Score:3, Interesting)

Re:Translation? (Score:4, Interesting)

Re:I don't understand (Score:5, Interesting)

Re:Translation? (Score:2, Interesting)

Re:Punchline (Score:1, Interesting)

Related Links Top of the: day, week, month.

Slashdot Top Deals

Slashdot