Slashdot Log In
Removing the Big Kernel Lock
Posted by
CmdrTaco
on Saturday May 17, @12:13PM
from the wait-i-thought-locks-made-it-secure dept.
from the wait-i-thought-locks-made-it-secure dept.
Corrado writes "There is a big discussion going on over removing a bit of non-preemptable code from the Linux kernel. 'As some of the latency junkies on lkml already know, commit 8e3e076 in v2.6.26-rc2 removed the preemptable BKL feature and made the Big Kernel Lock a spinlock and thus turned it into non-preemptable code again. "This commit returned the BKL code to the 2.6.7 state of affairs in essence," began Ingo Molnar. He noted that this had a very negative effect on the real time kernel efforts, adding that Linux creator Linus Torvalds indicated the only acceptable way forward was to completely remove the BKL.'"
Related Stories
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Full
Abbreviated
Hidden
Loading... please wait.

Looks like "Worse is Better" all over (Score:5, Insightful)
That's fine, but once you reach maturity you should be trying to do the "right thing" (the exact opposite.) And the Linux kernel has reached maturity for quite a while now.
I think Linus is right on this.
Reply to This
Punchline (Score:5, Informative)
Assuming everything is stable and correct, the next step is to break the BKL into locks with finer granularity so that the BKL can go the way of the dodo.
Reply to This
Keep these on Front Page... (Score:5, Insightful)
This is the type of stuff that needs to be kept in the news, as the people who post here often have no understanding of, and the ones that do, have the opportunity to explain this stuff, bringing everyone up to a better level of understanding.
Maybe if we did this, real discussions about the designs and benefits of all technologies could be debated and referenced accruately.. Or even dare say, NT won't have people go ape when someone refers to a good aspect of its kernel design.
Reply to This
This is why monolithic kernels do real-time badly (Score:5, Insightful)
This task is not easy at all. 12 years after Linux has been converted to an SMP OS we still have 1300+ legacy BKL using sites. There are 400+ lock_kernel() critical sections and 800+ ioctls. They are spread out across rather difficult areas of often legacy code that few people understand and few people dare to touch.
This is where microkernels win. When almost everything is in a user process, you don't have this problem.
Within QNX, which really is a microkernel, almost everything is preemptable. All the kernel does is pass messages, manage memory, and dispatch the CPUs. All these operations either have a hard upper bound in how long they can take (a few microseconds), or are preemptable. Real time engineers run tests where interrupts are triggered at some huge rate from an external oscillator, and when the high priority process handling the interrupt gets control, it sends a signal to an output port. The time delay between the events is recorded with a logic analyzer. You can do this with QNX while running a background load, and you won't see unexpected delays. Preemption really works. I've seen complaints because one in a billion interrupts was delayed 12 microseconds, and that problem was quickly fixed.
As the number of CPUs increases, microkernels may win out. Locking contention becomes more of a problem for spinlock-based systems as the number of CPUs increases. You have to work really hard to fix this in monolithic kernels, and any badly coded driver can make overall system latency worse.
Reply to This
Re:Fascinating. (Score:5, Funny)
Reply to This
Parent
Re:Fascinating. (Score:5, Insightful)
Reply to This
Parent
Re:Translation? (Score:5, Funny)
Reply to This
Parent
Re:Translation? (Score:5, Informative)
Now, a group is trying to see if it is possible to weed out all the remaining uses of the BKL and replace them with localized locking for specific sections of the kernel. This is tricky, as there are side-effects of the BKL that are not always obvious.
Reply to This
Parent
Re:Translation? (Score:5, Informative)
There would be thousands of such, and you'll probably never succeed in debugging it.
The approach suggested in the article is to replace the BKL by a true lock, then "pushing it down", which means understanding WHY that code want the BKL, and get smaller locks instead in subroutines.
For instance, one piece of code could take the BKL because it will change 3 data structure. You could then remove the BKL and use, in the 3 part of code that changes those 3 structure, and use a finer grained lock for each of those.
By iterating this way, you should always get a somewhat working kernel, and slowly kill the BKL.
Reply to This
Parent
Re:Translation? (Score:5, Informative)
No.
You know, like, remove that part of the code and code it all over again, see what is broken, and continue this way?
It's not that simple. When it comes to locking, there is no "part of the code" that can be replaced. Locking governs interaction between two pieces of code, sometimes two pieces that are very different but have some small thing in common.
Besides, the kernel is too big to just start throwing parts of it out and redoing them from scratch. It's much better to make incremental improvements, because then the people working on them will actual learn how to solve the problem. The BKL is not just a coding problem, but also a people and project management problem.
Reply to This
Parent
Re:Translation? (Score:5, Informative)
Linux is a preemptive multi-tasking kernel. What this means is that a hardware interrupt like a keyboard click or the system timer will interrupt whatever is currently running on the CPU, and an interrupt handler in the kernel starts running code. In order to make sure that all the states of the kernel are consistent (ie: not corrupt), the different parts of the kernel are supposed to lock the data that they are using or modifying (ie, readlock or writelock) in case another code path gets run at the same time trying to modify the same data. It becomes even more important in a multi-cpu environment where locks have to be atomic (happen at the same time on all CPUs). So what you are supposed to do is only lock the resources you currently need (a file system drivers would only lock parts of the filesystem, not a character device). Because some programmers are lazy, or not sure what they are doing, they just use the big kernel lock which locks pretty much everything in the kernel. This is bad for multi-tasking and multi-processing because it means you can only have one codepath using the lock at a time.
Note: it's been a while since I've done kernel work, so I'm sure this is not 100% true, but hope it helps you understand.
Reply to This
Parent
Re:Translation? (Score:5, Informative)
The granularity is important because you want other threads(jobs) to beable to get something done. At some point there is this thing called a scheduler that assigns your thread to execute, because every job needs a CPU. You get to work until your time allotment is expired or you have to stop because something you need to continue is not availible, because its say "locked".
Think of this like working in a shop along side someone else. You have one set of tools, you need a little screw driver, and a big one to do your work. The other guy needs the little scew driver and a pair of pliers. You want him to put the screw driver down while he is using the pliers so that you can use it if you need to. If he instead puts it in his breast pocket you are going to have to wait to finish your job until he finishes his. Even though its your turn at the work bench(CPU) you can't do anything with it because you don't have what you need. So all you can do is yeild the rest of the time to the other task, and hope he finished up soon.
In the kernel world this really short circuits the work of the scheduler. It might want to give time to other threads and it will but they are going to just turn around and give that up because whichever thread is holding the BKL is likely the only one who can actually do any work. As an end user this means something like data gets read from your network card ok but your sound keeps skipping.
The tricky part with more granular locks though is avoiding circular conditions, these can crash the system. Imagine: Job One needs resource A and B and has A locked, its waiting for B. Job Two needs B and C and has B locked and is waiting on C. Job Three needs A and C and has C locked and is waiting on A. Unless the system can detect this condition which is hard to do in many cases none of these threads will ever be able to run. The kernel contributors likely have some work ahead to eliminate the BKL and not cause these types of problems.
Reply to This
Parent
Re:I don't understand (Score:5, Interesting)
Why did they remove the preemptable BKL?
I'm not a kernel developer, but I'd say it's because there's widespread belief that the preemtable BKL is "the wrong way forward". Statements like these lead me to believe this:
In any large software project there's always a path to get from where you are, to where you want to be. It sounds like any version of BKL is considered ugly and causes problems, and patching it just won't work. In other words, fixing this part of the kernel isn't really possible, so they need to start over and change any code that relies on it to rely on something different entirely.
Reply to This
Parent
Bad interaction with the generic semaphores. (Score:5, Informative)
The consensus was to not revert the generic semaphore patch, but to fix it another way. Linus decided on a path that will make people focus on removing the BKL rather than a workaround in the generic semaphore code. Also, Linus doesn't think that the latency of the non-preemptable BKL is too bad [2].
[1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-05/msg03526.html
[2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8e3e076c5a78519a9f64cd384e8f18bc21882ce0
Reply to This
Parent
Re:I don't understand (Score:5, Informative)
rather than merge the old locking code back in, and reintroduce the many different locking primitives they had, someone decided to simply reenable the BKL - the downside of which is they have to either fix the regression caused by the simpler semaphore code (not likely, it's very simple and clean - everyone's favourite pet/child) or remove instances of where the semaphore code is likely to be called (the BKL).
Matt
Reply to This
Parent
(Performance != Speed) // in an RT system (Score:5, Insightful)
Reply to This
Parent
Re:I don't understand (Score:5, Informative)
Reply to This
Parent
Re:I don't understand (Score:5, Interesting)
so they want to try this other road: make the BKL working as intended, add more debugging information and making each call of the BKL more visible to the kernel developers, and then remove the call to the BKL using other synchronization mechanism, changing the BKL client code to call other primitives. This won't fix the BKL, but renders it useless and removable.
it's good to see those decisions made inside the linux kernel, as being backward compatible is the road to madness that hindered the windows kernel.
Reply to This
Parent
Re:I don't understand (Score:5, Interesting)
the first one: http://www.joelonsoftware.com/articles/APIWar.html [joelonsoftware.com]
for the second one couldn't find any reference,I think I first read it on the russovich blog
the fact is, there are a lot of bug that couldn't be resolved because resolving them would broke backward compatibility, there are a lot of api which couldn't be cleaned for the same reason, for example the filesistem api, which had lead to the curious situation on which each program using different portions of the api shows a different file opener/file save dialog, and so on. There are a lot of strange things happening in windows, all the time: you could look at some of them on this blog: http://blogs.msdn.com/oldnewthing/default.aspx [msdn.com]
Reply to This
Parent
BKL is again a big source of latency (Score:5, Informative)
Unfortunately this caused a 40% regression in the AIM7 benchmark [google.com]. The BKL was now a (slower) semaphore and the high lock contention on it was made worse by its ability to be preempted. As the ability to build a kernel without BKL preemption had been removed [google.com] Linus decided that the BKL preemption would go. Ingo suggested semaphore gymnastics to try and recover performance but Linus didn't like this idea.
As the the BKL is no longer be preemptible [google.com] it is now a big source of latency (since it could no longer be interrupted). People still want low latencies (that's why they made the BKL preemptible in the first place) so they took the only option left and started work to get rid of the BKL.
(Bah half a dozen other people have replied in the time it's taken me to edit and redit this. Oh well...)
Reply to This
Parent
Re:Linux? (Score:5, Informative)
The future.
Reply to This
Parent
Re:Interesting (Score:5, Funny)
Reply to This
Parent
Re:Sounds like the Linux kernel needs some tests.. (Score:5, Interesting)
In other words:
TESTS DON'T VERIFY THAT YOUR CODE IS NOT BUGGY. YOU VERIFY THAT YOUR CODE ISN'T BUGGY.
Reply to This
Parent
Re:Sounds like the Linux kernel needs some tests.. (Score:5, Insightful)
You talk about unit testing, but how exactly are you going to unit test multi-threading issues? This is not some simple problem that you can run a test/fail test against. These kinds of things can really only be tested by analysis to prove it can't fail, or extensive fuzz testing to get it "good enough"..
Reply to This
Parent
Tough to test drivers for hardware you don't have (Score:5, Informative)
This is not to say that there need to be tests for things that can be caught at compile time or run time regardless of hardware but there is only so far you can take it.
It's not like the kernel doesn't have any testing done on it though. There's the Linux Test Project [sourceforge.net] which seems to test new kernel's nightly. If you ever look in the kernel hacking menu of the kernel configuration you will see tests ranging from Ingo Molnar's lock dependency tester [mjmwired.net] (which checks to see locks are taken in the right order at run time), memory poisoning, spurious IRQ at un/registration time, rcu torture testing, softlockup testing, stack overflow checking, marking parts of the kernel readonly, changing page attributes every 30 seconds... Couple that with people like Coverity [coverity.com] reporting static analysis checks on the code. Tools like sparse [kernel.org] have been developed to try and so some of the static checks on kernel developer machines while they are building the code.
But this is not enough. Bugs STILL get through and there are still no go areas of code. If you've got the skills to write tests for the Linux kernel PLEASE do! Even having more people testing and reporting issues with the latest releases of the kernel would also help. It's only going to get more buggy without help...
Reply to This
Parent