Google Open Source Security Linux

Google Finds Hundreds Of Data-Race Conditions In The Linux Kernel (phoronix.com) 57

Google has been testing the Linux kernel with its "sanitizer" testing software that hunts for memory corruption bugs and undefined behaviors. Now Phoronix reports on Google's newest sanitizer: Kernel Concurrency Sanitizer (KCSAN) is focused on discovering data-race issues within the kernel code. This dynamic data-race detector is an alternative to the Kernel Thread Sanitizer. In their testing just last month, in two days they found over 300 unique data race conditions within the mainline kernel.

There was a recent discussion about the Kernel Concurrency Sanitizer on the LKML.

This discussion has been archived. No new comments can be posted.

  • by Z00L00K ( 682162 ) on Saturday October 05, 2019 @01:44PM (#59273308) Homepage Journal

    And I hope that the tool output is used properly.

    I wonder how much it would find on the Windows kernel.

    • by postbigbang ( 761081 ) on Saturday October 05, 2019 @02:00PM (#59273354)

      Exactly my thoughts.... but also various BSD kernels, Android, and Darwin.

      Nothing is perfect, and a good analyzer strengthens what gets fixed properly (and re-tested). It ought to be a mandatory QC test, despite the kernel maintainer egos it would bruise.

      • by Z00L00K ( 682162 )

        If I'm getting rapped on my fingers by an analyzer then I can live with that and fix my darn stuff; it's a lot better than having to face an upset project manager or a gang of angry users.

        Add Minix and QNX to the list too.

        I'm not sure if it's possible to run the tool on OpenVMS though.

        • a gang of angry users with pitchforks and torches, and you inside your house in the middle of the night wearing just your pajamas and the angry users with pitchforks and torches are demanding refunds while your wife and kid is crying and you are looking all over the house for your checkbook and a pen

          thats the stuff nightmares are made of, glad im not you
          • Here's a refund for your free software.
            • So what you are saying is that if you want peace of mind, stay away from open source because you're not allowed to complain about it if someone gives you shite code that could possibly screw up your system(s) or compromise your security. Or stay with it and silently eat shit if it does.
              • by stooo ( 2202012 )

                >> you're not allowed to complain
                Nope. You can complain all you want.

                >> Or stay with it and silently eat shit if it does.
                Nope. You can also just improve it, to everybody's benefit.

    • Re:Interesting tool (Score:5, Interesting)

      by gweihir ( 88907 ) on Saturday October 05, 2019 @03:24PM (#59273596)

      And I hope that the tool output is used properly.

      Not comparable. The Linux kernel contains a host of drivers (with variable quality levels), the Windows kernel has just a few.

    • by bobby ( 109046 )

      I wonder how much it would find on the Windows kernel.

      How much computing power do you have? Which then reminds me of a very old joke: did you hear the new Cray can execute an infinite loop in 5 seconds?

  • by goombah99 ( 560566 ) on Saturday October 05, 2019 @01:46PM (#59273314)

    Sigh. even operating systems now.

    • We should rename them as "High Speed" conditions to avoid the appearance of racism and competitiveness.
      • But then you are going to trigger the speed-phobic. They are probably a thing in this brave new world.

      • Re: (Score:3, Informative)

        by lalleglad ( 39849 )

        This would be inaccurate, as race conditions don't actually have to happen at high speed.
        They arise when multiple threads try to access the same resource at the same time, but that technically has nothing to do with speed: there could be many milliseconds or even longer between one resource access and the next.
        A race condition is a failure to handle multiple threads wanting to get to a resource at overlapping times.

        Remember that a computer system with an OS is trying to create the illusion for all program
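The point about speed can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `increment`, `run`, and the `"A"`/`"B"` schedule strings are hypothetical, and generators stand in for threads so that the interleaving is chosen by hand rather than by a real scheduler. Nothing here runs fast or slow; only the order of the steps changes.

```python
# Hypothetical illustration: two "threads" each do a read-modify-write on a
# shared counter. Generators let us pick the interleaving by hand, so the
# outcome depends only on ordering, not on how fast anything runs.

def increment(shared):
    """One unsynchronized increment, split into its read and write steps."""
    value = shared["counter"]      # step 1: read the shared resource
    yield                          # a scheduler could switch threads here
    shared["counter"] = value + 1  # step 2: write back a possibly stale value
    yield


def run(schedule):
    """Run two increments, stepping thread 'A' or 'B' in the given order."""
    shared = {"counter": 0}
    threads = {"A": increment(shared), "B": increment(shared)}
    for name in schedule:
        next(threads[name])
    return shared["counter"]


serial = run("AABB")       # A finishes before B starts
overlapping = run("ABAB")  # both read before either writes
print(serial, overlapping)
```

With the serial schedule both increments survive; with the overlapping one, the second write clobbers the first and an update is lost, no matter how much time passes between the steps.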

        • Oh, OK. Let's call it a low speed condition then.
        • This would be inaccurate, as race conditions don't actually have to happen at high speed. They arise when multiple threads try to access the same resource at the same time, but that technically has nothing to do with speed: there could be many milliseconds or even longer between one resource access and the next. A race condition is a failure to handle multiple threads wanting to get to a resource at overlapping times.

          Remember that a computer system with an OS is trying to create the illusion for all programs and threads of those programs that they alone are using all the resources of the computer, and when that is not handled properly, there is a race condition, regardless of the speed.

          Illustrated in a different way, if your girlfriend is keeping you and her other lover separated, so you don't meet or are aware of each other, then the illusion of owning her is kept for the both of you, but if you suddenly one day see her in bed with that other guy (or vice versa) then there is a race condition, and you can't trust the condition you have with her, eg. is she pregnant with you or him?

          So, continuing with 'race condition' is fine with me, and High Speed condition doesn't catch the true issue.

          As it is about multiple threads trying to get to a resource, i.e. 'grab it', getting over racism in Linux might be handled better by calling it a 'grabbing condition'? Then of course, in these dire Trump times, using the word 'grab' may not go so well with some people?

          • It appears to me that the race conditions are more likely resembling the case where your apartment manager is letting other people use your apartment when you are supposed to be at work. He is supposed to switch out all the furniture while you are gone and switch it back when you return, but he does not check if you are still there in the morning when they arrive for what they think is their turn, so they are using some of your furniture, and vice-versa in the evening.

      • by hawk ( 1151 )

        why, that's downright offensive, and insulting to the Differently Speeded . . .

    • by religionofpeas ( 4511805 ) on Saturday October 05, 2019 @03:17PM (#59273576)

      Last year, someone in the /r/formula1 subreddit asked an innocent question about which race people would like to eliminate. They did not anticipate that thread making it to the front page, and grabbing the attention of people who don't follow motor racing.

    • Even OSes have privileged operations.
    • by AHuxley ( 892839 )
      A new CoC will remove all the wrong comments found.
  • Best to ensure one's own product is not included.
    • I'd also like to see it used on Windows but in doing a quick scan of the code and its build requirements, I would think that it would be a non-trivial amount of work with the Windows tool chain.

      Chrome OS would be interesting, as would seeing it used with QNX. As QNX is a high-reliability OS, this could be an interesting tool to help validate its reliability.

      • by religionofpeas ( 4511805 ) on Saturday October 05, 2019 @02:42PM (#59273480)

        It's a tool to find race conditions. A proper microkernel doesn't have any of those, because it doesn't have shared data or locking.

        Of course, that says nothing about the possibility of other classes of mistakes, but this tool won't be helpful in finding them.

        • by gtall ( 79522 ) on Saturday October 05, 2019 @03:00PM (#59273534)

          A proper microkernel shares the hardware. Last I checked, it had plenty of data. And how do you suppose the hardware keeps interrupts from clobbering each other after firing off the drivers? Locking.

          • by religionofpeas ( 4511805 ) on Saturday October 05, 2019 @03:13PM (#59273562)

            A proper microkernel shares the hardware

            Not in the same sense. Each process runs in its own space, using its own private data.

            And how do you suppose the hardware keeps interrupts from clobbering each other after firing off the drivers? Locking

            Of course. The actual implementation of message passing and scheduling needs a few locks. But then you're talking about one tiny piece of code that is at the heart of the microkernel, and very unlikely to have a mistake in it.
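The message-passing argument can be illustrated with a rough Python sketch. It is not microkernel code, and `counter_server`, `client`, and `run_demo` are hypothetical names: one thread owns the counter as private data, and every other thread can only send it messages. The one place that still needs locking is the queue itself (Python's `queue.Queue` takes a lock internally), which plays the role of the small message-passing core described above.

```python
import queue
import threading


def counter_server(inbox, result):
    """Sole owner of the counter; other threads can only send it messages."""
    count = 0  # private state, never touched by any other thread
    while True:
        message = inbox.get()
        if message == "stop":
            break
        if message == "increment":
            count += 1
    result.put(count)


def client(inbox, n):
    """Ask the server to increment n times; no shared data is touched."""
    for _ in range(n):
        inbox.put("increment")


def run_demo(clients=4, per_client=1000):
    """Start one server and several clients, then return the final count."""
    inbox, result = queue.Queue(), queue.Queue()
    server = threading.Thread(target=counter_server, args=(inbox, result))
    server.start()
    workers = [threading.Thread(target=client, args=(inbox, per_client))
               for _ in range(clients)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    inbox.put("stop")  # sent only after every client has finished
    server.join()
    return result.get()


print(run_demo())
```

Because only the server thread ever reads or writes `count`, no increment can be lost, whatever order the clients run in.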

            • ...one tiny piece of code that is at the heart of the microkernel, and very unlikely to have a mistake in it.

              I see you haven't been a programmer for very long. A classic quote: "But I only changed one card, and it was in a different part of the program!"

              • It doesn't even need to be a mistake on the part of the programmer. It's been more than once that I've found that the code the compiler produces is different when compiled for debugging and release. I've sent Microsoft plenty of emails detailing incorrect assembly code the compiler has generated in those cases.

              • by tlhIngan ( 30335 )

                ...one tiny piece of code that is at the heart of the microkernel, and very unlikely to have a mistake in it.

                I see you haven't been a programmer for very long. A classic quote: "But I only changed one card, and it was in a different part of the program!"

                Microkernels are small intentionally. So small that you can easily audit, code review, etc. the code. It's not many lines of code (a few thousand), so any individual programmer can get a sense of how the code works and build a mental map. It wouldn't even be very

                • Just because a program is small doesn't necessarily mean it is easy to understand. The paper tape loader for the PDP-10 was only 15 words long, yet was fiendishly complex. On the other hand, Literate Programming, as practiced by Don Knuth, can make even a large program approachable.

                  I have never seen the source code for QNX. Is it easy to read and understand, or does it look like Obfuscated C?

  • False positives?
    OK, that's an acceptable problem for an automated tool, but can we get some numbers, please?
    Otherwise, it's really not a story.
  • Use better tools (Score:4, Insightful)

    by jma05 ( 897351 ) on Saturday October 05, 2019 @05:41PM (#59273858)

    People talk about kernel developers as if they are superhuman and C as if it is safe if you are just expert enough. Time and again, these reports expose the myth of the genius programmer.

    The code bases are large. Safety does not come from expertise and discipline alone, but needs to be worked into the process with many safeguards. But old habits, old culture and tools are entrenched.

    • by AHuxley ( 892839 )
      That was the computer hobby for kernel developers to do in their free time after work.
      Spend hours on safeguards, testing, review and code?
      Reading a new CoC about what comments will be approved?
    • These are the same people that rail against any language development that makes it safer and insist that programmers just need to get better if they can't trivially make raw C safe 100% of the time.

    • by golodh ( 893453 )
      Excellent point. Code remains something that is handcrafted rather than machined. Besides which, as far as I understand, race conditions are difficult to find and easy to overlook in any event.

      In fact, it illustrates the point that code correctness is one of nature's 'hard' problems. As in, the _only_ way to achieve correctness is to give a mathematical correctness proof for a piece of code. And anyone who's ever tried that knows how hard that is for any piece of non-trivial code (e.g. the textbook examp

  • by skullandbones99 ( 3478115 ) on Saturday October 05, 2019 @06:22PM (#59273924)

    On embedded systems, the system shutdown phase can be one of the worst scenarios to get right. The main issue being that a system shutdown request is asynchronous to the normal system activities so is racy by its very nature.

    I see failures whereby a system shutdown request attempts to force active connections on protocol stacks such as Bluetooth to disconnect, starting at the upper protocol layers. However, the shutdown also takes down lower protocol layers and device drivers, causing protocol stacks to lose communications with the low-level hardware. Then races occur as various protocol layers time out due to loss of communications while the shutdown procedure continues to remove resources without waiting. These scenarios can be non-deterministic.

    In other words, despite having defined protocol disconnection procedures which have timeout periods, a higher priority event such as shutdown may not wait for a nice clean protocol disconnection to take place.
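A toy model of that ordering problem, assuming a three-layer stack with made-up names (`Layer`, `build_stack`, `shutdown`, and the layer names are all hypothetical, not taken from any real Bluetooth stack). A real shutdown race is non-deterministic; here the two orders are fixed by hand so the difference is easy to see.

```python
class Layer:
    """Hypothetical protocol layer that needs the layers below it to talk."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower
        self.up = True

    def disconnect(self):
        """A clean disconnect only works while every lower layer is still up."""
        layer = self.lower
        while layer is not None:
            if not layer.up:
                self.up = False
                return f"{self.name}: timeout ({layer.name} already gone)"
            layer = layer.lower
        self.up = False
        return f"{self.name}: clean disconnect"


def build_stack():
    """Return the layers ordered from the top of the stack to the bottom."""
    driver = Layer("driver")
    transport = Layer("transport", lower=driver)
    profile = Layer("profile", lower=transport)
    return [profile, transport, driver]


def shutdown(order):
    """Disconnect the layers in the given order and collect the outcomes."""
    return [layer.disconnect() for layer in order]


ordered = shutdown(build_stack())               # top-down, one at a time
racy = shutdown(list(reversed(build_stack())))  # lower layers removed first
print(ordered)
print(racy)
```

In the top-down order every layer disconnects cleanly; when the lower layers go first, each upper layer is left to time out against hardware that is no longer there.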

    One of my bugbears with the Linux kernel is the widespread use of reference counters, which cause memory objects to be freed when the counter reaches zero. However, during system shutdown, it would be preferable to just free the memory and not rely on the reference counters, because doing so causes delay and risks premature freeing (reference counters decremented in the wrong sequence). The problem is that reference counters are relative and not absolute.
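What "relative and not absolute" can lead to is sketched below in Python. This is a simplified stand-in, not kernel code: `RefCounted` and its `get`/`put`/`use` methods are hypothetical names, and "freeing" is just a flag. The count records how many references exist but not who holds them, so one unbalanced `put` frees the object while another holder is still using it.

```python
class RefCounted:
    """Hypothetical object that is freed when its count reaches zero."""

    def __init__(self, name):
        self.name = name
        self.refs = 1       # the creator holds the first reference
        self.freed = False

    def get(self):
        """Take an additional reference."""
        if self.freed:
            raise RuntimeError(f"get after free: {self.name}")
        self.refs += 1
        return self

    def put(self):
        """Drop one reference; the object is freed when none remain."""
        if self.freed:
            raise RuntimeError(f"put after free: {self.name}")
        self.refs -= 1
        if self.refs == 0:
            self.freed = True

    def use(self):
        """Touch the object; fails if it has already been freed."""
        if self.freed:
            raise RuntimeError(f"use after free: {self.name}")
        return f"using {self.name}"


def correct_order():
    buf = RefCounted("buffer")
    user = buf.get()   # refs == 2
    user.use()
    user.put()         # refs == 1
    buf.put()          # refs == 0, freed only after the last user is done
    return buf.freed


def wrong_order():
    buf = RefCounted("buffer")
    user = buf.get()   # refs == 2
    buf.put()          # owner drops its reference: refs == 1
    buf.put()          # unbalanced extra put: refs == 0, freed too early
    try:
        return user.use()
    except RuntimeError as error:
        return str(error)


print(correct_order(), wrong_order())
```

The counter in `wrong_order` reaches zero exactly as it does in `correct_order`; nothing in the number itself reveals that the wrong party decremented it.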

    I would welcome any tool that could analyse races during system shutdowns.

    • That's why we don't shutdown those systems, but rather we pull the plug... There you go... No more racing around little car... :J
  • Or is that “not a thing”
    Zero day pwning I guess is.

  • The many eyes theory makes such bugs impossible.

    • by sad_ ( 7868 )

      Indeed, because tools like this can be developed by anybody (many similar tools have been developed in the past), making sure they get fixed.
      Explain to me how I can make a tool like this for a closed OS?
