Google Finds Hundreds Of Data-Race Conditions In The Linux Kernel (phoronix.com)
Google has been testing the Linux kernel with its "sanitizer" testing software that hunts for memory corruption bugs and undefined behaviors. Now Phoronix reports on Google's newest sanitizer:
Kernel Concurrency Sanitizer (KCSAN) is focused on discovering data-race issues within the kernel code. This dynamic data-race detector is an alternative to the Kernel Thread Sanitizer. In their testing just last month, in two days they found over 300 unique data race conditions within the mainline kernel.
There was a recent discussion about the Kernel Concurrency Sanitizer on the LKML.
Interesting tool (Score:3)
And I hope that the tool output is used properly.
I wonder how much it would find on the Windows kernel.
Re: (Score:2)
To learn whether I made the right choice.
(Nah, not really. I wouldn't switch anyway;-) )
Re: (Score:2)
Re:Interesting tool (Score:4, Interesting)
Because Windows systems are connected to the Internet, which is a shared resource.
Re:Interesting tool (Score:5, Insightful)
Exactly my thoughts.... but also various BSD kernels, Android, and Darwin.
Nothing is perfect, and a good analyzer strengthens what gets fixed properly (and re-tested). It ought to be a mandatory QC test, despite the kernel maintainer egos it would bruise.
Re: (Score:2)
If I'm getting rapped on my fingers by an analyzer, then I can live with that and fix my darn stuff. It's a lot better than having to face an upset project manager or a gang of angry users.
Add Minix and QNX to the list too.
I'm not sure if it's possible to run the tool on OpenVMS though.
Re: (Score:2)
That's the stuff nightmares are made of. Glad I'm not you.
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
>> you're not allowed to complain
Nope. You can complain all you want.
>> Or stay with it and silently eat shit if it does.
Nope. You can also just improve it, to everybody's benefit.
Re:Interesting tool (Score:5, Interesting)
>> And I hope that the tool output is used properly.
Not comparable. The Linux kernel contains a host of drivers (with variable quality levels), the Windows kernel has just a few.
Re: (Score:2)
>> I wonder how much it would find on the Windows kernel.
How much computing power do you have? Which then reminds me of a very old joke: did you hear the new Cray can execute an infinite loop in 5 seconds?
Racism in Linux (Score:5, Funny)
Sigh. Even operating systems now.
Re: (Score:2)
Re: Racism in Linux (Score:2)
But then you are going to trigger the speed-phobic. They are probably a thing in this brave new world.
Re: (Score:2)
Re: (Score:3, Informative)
This would be inaccurate, as race conditions don't actually have to happen at high speed.
It has to do with multiple threads trying to access the same resource at the same time, but technically it has nothing to do with speed, as there could be many milliseconds or even longer between one access to the resource and the next.
A race condition is a failure to handle multiple threads wanting to get to a resource at overlapping times.
Remember that a computer system with an OS is trying to create the illusion for all program
Re: Racism in Linux (Score:2)
Re: (Score:2)
Or 'love speed condition'?
As all threads would love to get access to the resource.
That may however be requiring too much context for people entering the stage.
Re: Racism in Linux (Score:2)
Re: (Score:2)
let's call it "hate speed condition"
Re: (Score:1)
This would be inaccurate, as race conditions don't actually have to happen at high speed. It has to do with multiple threads trying to access the same resource at the same time, but technically it has nothing to do with speed, as there could be many milliseconds or even longer between one access to the resource and the next. A race condition is a failure to handle multiple threads wanting to get to a resource at overlapping times.
Remember that a computer system with an OS is trying to create the illusion, for all programs and the threads of those programs, that they alone are using all the resources of the computer. When that is not handled properly, there is a race condition, regardless of the speed.
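To make the "not about speed" point concrete, here is a minimal Python sketch of a lost update. It is only an illustration, not kernel code: the function names and the shared dictionary are made up for this example, and `threading.Event` is used to force the bad interleaving so the outcome is repeatable. The delay between thread A's read and its write could be arbitrarily long; the update is lost either way.

```python
import threading


def unsafe_increments():
    """Two threads each add 1 to a shared counter, with a forced bad interleaving."""
    shared = {"counter": 0}
    a_has_read = threading.Event()
    b_has_written = threading.Event()

    def thread_a():
        value = shared["counter"]       # A reads 0
        a_has_read.set()
        b_has_written.wait()            # any amount of time may pass here
        shared["counter"] = value + 1   # A writes 1, overwriting B's update

    def thread_b():
        a_has_read.wait()               # run only after A has read
        shared["counter"] = shared["counter"] + 1  # B reads 0, writes 1
        b_has_written.set()

    threads = [threading.Thread(target=thread_a), threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]


def locked_increments():
    """Same two increments, but the read-modify-write is protected by a lock."""
    shared = {"counter": 0}
    lock = threading.Lock()

    def increment():
        with lock:                      # read and write happen as one unit
            shared["counter"] = shared["counter"] + 1

    threads = [threading.Thread(target=increment) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]
```

With the forced interleaving, `unsafe_increments()` ends with a counter of 1 even though two increments ran, while `locked_increments()` ends with 2.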
Illustrated in a different way: if your girlfriend is keeping you and her other lover separated, so you never meet or become aware of each other, then the illusion of owning her is kept for both of you. But if you suddenly one day see her in bed with that other guy (or vice versa), then there is a race condition, and you can't trust the condition you have with her, e.g. is she pregnant with you or him?
So, continuing with 'race condition' is fine with me, and High Speed condition doesn't catch the true issue.
As it is about multiple threads trying to get to a resource, i.e. to 'grab it', getting over racism in Linux might be handled better by calling it a 'grabbing condition'? Then of course, in these dire Trump times, using the word 'grab' may not go over so well with some people.
Re: Racism in Linux (Score:2)
It appears to me that the race conditions are more likely resembling the case where your apartment manager is letting other people use your apartment when you are supposed to be at work. He is supposed to switch out all the furniture while you are gone and switch it back when you return, but he does not check if you are still there in the morning when they arrive for what they think is their turn, so they are using some of your furniture, and vice-versa in the evening.
Re: (Score:2)
why, that's downright offensive, and insulting to the Differently Speeded . . .
Re: (Score:2)
Re: (Score:2)
hey, that's panic shaming! :)
Re:Racism in Linux (Score:5, Funny)
Last year, someone in the /r/formula1 subreddit asked an innocent question about which race people would like to eliminate. They did not anticipate that thread making it to the front page, and grabbing the attention of people who don't follow motor racing.
Re: (Score:2)
Was the highest rated answer "the human race"?
Re: (Score:2)
Re: (Score:2)
Maybe found in Chrome OS first? (Score:1)
My thought exactly - Also Test on QNX (Score:2)
I'd also like to see it used on Windows, but after a quick scan of the code and its build requirements, I think it would take a non-trivial amount of work with the Windows toolchain.
Chrome OS would be interesting, as would seeing it used with QNX. QNX is a high-reliability OS, so this could be an interesting tool to help validate its reliability.
Re:My thought exactly - Also Test on QNX (Score:4, Informative)
It's a tool to find race conditions. A proper microkernel doesn't have any of those, because it doesn't have shared data or locking.
Of course, that says nothing about the possibility of other classes of mistakes, but this tool won't be helpful in finding them.
Re:My thought exactly - Also Test on QNX (Score:5, Insightful)
A proper microkernel shares the hardware. Last I checked, it had plenty of data. And how do you suppose the hardware keeps interrupts from clobbering each other after firing off the drivers? Locking.
Re:My thought exactly - Also Test on QNX (Score:5, Interesting)
>> A proper microkernel shares the hardware
Not in the same sense. Each process runs in its own space, using its own private data.
>> And how do you suppose the hardware keeps interrupts from clobbering each other after firing off the drivers? Locking
Of course. The actual implementation of message passing and scheduling needs a few locks. But then you're talking about one tiny piece of code that is at the heart of the microkernel, and very unlikely to have a mistake in it.
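As a rough illustration of that argument (a Python sketch, not anything taken from QNX or any real microkernel; `run_counter_server` and its parameters are invented for this example): the server thread keeps its state private, clients only send messages, and the one shared object is the message queue, which does its own locking internally.

```python
import queue
import threading


def run_counter_server(num_clients=4, messages_per_client=1000):
    """Clients send messages to one server that owns all the state."""
    inbox = queue.Queue()   # the only shared object; it locks internally
    result = []

    def server():
        count = 0           # private to the server thread, never shared
        while True:
            msg = inbox.get()
            if msg is None:  # sentinel: no more messages
                break
            count += msg
        result.append(count)

    def client():
        for _ in range(messages_per_client):
            inbox.put(1)    # clients never touch the counter directly

    server_thread = threading.Thread(target=server)
    server_thread.start()
    clients = [threading.Thread(target=client) for _ in range(num_clients)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    inbox.put(None)         # all clients are done, tell the server to stop
    server_thread.join()
    return result[0]
```

No client code needs a lock of its own here, which is the point being made above. The locking has not vanished, though: it is concentrated inside `queue.Queue`, which is the part that still has to be right.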
Re:My thought exactly - Also Test on QNX (Score:5, Interesting)
I see you haven't been a programmer for very long. A classic quote: "But I only changed one card, and it was in a different part of the program!"
Re: (Score:2)
It doesn't even need to be a mistake on the part of the programmer. More than once I've found that the code the compiler produces is different when compiled for debugging and for release. I've sent Microsoft plenty of emails detailing incorrect assembly code the compiler has generated in those cases.
Re: My thought exactly - Also Test on QNX (Score:2)
And plus what you said, here's an article with some more detail.
https://lwn.net/Articles/79325... [lwn.net]
Re: (Score:2)
Microkernels are small intentionally. So small that you can easily audit, code review, etc. the code. It's not many lines of code (a few thousand), so any individual programmer can get a sense of how the code works and a mental map. It wouldn't even be very
Re: (Score:2)
Just because a program is small doesn't necessarily mean it is easy to understand. The paper tape loader for the PDP-10 was only 15 words long, yet was fiendishly complex. On the other hand, Literate Programming, as practiced by Don Knuth, can make even a large program approachable.
I have never seen the source code for QNX. Is it easy to read and understand, or does it look like Obfuscated C?
False Positives.... (Score:1)
OK, that's an acceptable problem for an automated tool, but can we get some numbers, please?
Otherwise, it's really not a story.
Re: (Score:2)
They are all false positives.
Re: (Score:2)
I wouldn't be so sure of that. The CIFS layer for instance, is a lot less stable on 16 thread systems.
Use better tools (Score:4, Insightful)
People talk about kernel developers as if they are superhuman, and about C as if it is safe if you are just expert enough. Time and again, these reports expose the myth of the genius programmer.
The code bases are large. Safety does not come from expertise and discipline alone, but needs to be worked into the process with many safeguards. But old habits, old culture and tools are entrenched.
Re: (Score:2)
Spend hours on safeguards, testing, review and code?
Reading a new CoC about what comments will be approved?
Re: Use better tools (Score:2)
These are the same people that rail against any language development that makes it safer and insist that programmers just need to get better if they can't trivially make raw C safe 100% of the time.
Re: (Score:2)
In fact, it illustrates the point that code correctness is one of nature's 'hard' problems. As in, the _only_ way to achieve correctness is to give a mathematical correctness proof for a piece of code. And anyone who's ever tried that knows how hard that is for any piece of non-trivial code (e.g. the textbook examp
Difficult system scenarios (Score:5, Interesting)
On embedded systems, the system shutdown phase can be one of the worst scenarios to get right. The main issue is that a system shutdown request is asynchronous to normal system activity, so it is racy by its very nature.
I see failures where a system shutdown request attempts to force active connections on protocol stacks such as Bluetooth to disconnect, starting at the upper protocol layers. However, the shutdown also takes down lower protocol layers and device drivers, causing protocol stacks to lose communication with the low-level hardware. Races then occur as various protocol layers time out due to the loss of communication while the shutdown procedure continues to remove resources without waiting. These scenarios can be non-deterministic.
In other words, despite having defined protocol disconnection procedures which have timeout periods, a higher priority event such as shutdown may not wait for a nice clean protocol disconnection to take place.
One of my bugbears with the Linux kernel is the widespread use of reference counters, which cause memory objects to be freed when the counter reaches zero. During system shutdown, it would be preferable to just free the memory and not rely on the reference counters, because relying on them causes delay and a risk of premature freeing (reference counters decremented in the wrong sequence). The problem is that reference counters are relative and not absolute.
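A toy Python model of the premature-free risk, for anyone who hasn't been bitten by it. This is an assumption-laden sketch, not the kernel's reference counting code: the `RefCounted` class, its `get`/`put` methods and the `shutdown_out_of_order` scenario are all hypothetical, and "freeing" is just a flag.

```python
class RefCounted:
    """Toy object that is 'freed' when its reference count reaches zero."""

    def __init__(self, name):
        self.name = name
        self.refs = 1          # the creator holds the first reference
        self.freed = False

    def get(self):
        """Take an extra reference; fails if the object is already gone."""
        if self.freed:
            raise RuntimeError(f"use after free: {self.name}")
        self.refs += 1
        return self

    def put(self):
        """Drop a reference; the object is freed when the count hits zero."""
        if self.freed:
            raise RuntimeError(f"double free: {self.name}")
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # stands in for releasing the memory


def shutdown_out_of_order():
    """A shutdown path drops a reference it does not own."""
    conn = RefCounted("connection")
    upper_layer = conn.get()   # upper protocol layer takes a reference: refs == 2
    conn.put()                 # creator drops its reference: refs == 1
    conn.put()                 # bug: shutdown drops the upper layer's reference too
    try:
        upper_layer.get()      # upper layer still believes the object is alive
    except RuntimeError as err:
        return str(err)
    return "ok"
```

The counter only records how many puts are still owed, not who owes them, so one extra decrement in the wrong place frees the object underneath a holder that has done nothing wrong.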
I would welcome any tool that could analyse races during system shutdowns,
Re: (Score:1)
How many have Android parallels? (Score:2)
Or is that "not a thing"
Zero day pwning I guess is.
Impossible (Score:2)
The many eyes theory makes such bugs impossible.
Re: (Score:2)
Indeed, because tools like this can be developed by anybody (many similar tools have also been developed in the past), making sure they get fixed.
Explain to me how I can make a tool like this for a closed OS?