Google Programming Security

What Happened After Google Retrofitted Memory Safety Onto Its C++ Codebase? (googleblog.com) 81

Google's transition to Safe Coding and memory-safe languages "will take multiple years," according to a post on Google's security blog. So "we're also retrofitting secure-by-design principles to our existing C++ codebase wherever possible," a process which includes "working towards bringing spatial memory safety into as many of our C++ codebases as possible, including Chrome and the monolithic codebase powering our services."

We've begun by enabling hardened libc++, which adds bounds checking to standard C++ data structures, eliminating a significant class of spatial safety bugs. While C++ will not become fully memory-safe, these improvements reduce risk as discussed in more detail in our perspective on memory safety, leading to more reliable and secure software... It's also worth noting that similar hardening is available in other C++ standard libraries, such as libstdc++.

Building on the successful deployment of hardened libc++ in Chrome in 2022, we've now made it default across our server-side production systems. This improves spatial memory safety across our services, including key performance-critical components of products like Search, Gmail, Drive, YouTube, and Maps... The performance impact of these changes was surprisingly low, despite Google's modern C++ codebase making heavy use of libc++. Hardening libc++ resulted in an average 0.30% performance impact across our services (yes, only a third of a percent) ...

In just a few months since enabling hardened libc++ by default, we've already seen benefits. Hardened libc++ has already disrupted an internal red team exercise and would have prevented another one that happened before we enabled hardening, demonstrating its effectiveness in thwarting exploits. The safety checks have uncovered over 1,000 bugs, and would prevent 1,000 to 2,000 new bugs yearly at our current rate of C++ development...

The process of identifying and fixing bugs uncovered by hardened libc++ led to a 30% reduction in our baseline segmentation fault rate across production, indicating improved code reliability and quality. Beyond crashes, the checks also caught errors that would have otherwise manifested as unpredictable behavior or data corruption... Hardened libc++ enabled us to identify and fix multiple bugs that had been lurking in our code for more than a decade. The checks transform many difficult-to-diagnose memory corruptions into immediate and easily debuggable errors, saving developers valuable time and effort.

The post notes that they're also working on "making it easier to interoperate with memory-safe languages. Migrating our C++ to Safe Buffers shrinks the gap between the languages, which simplifies interoperability and potentially even an eventual automated translation."
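
As a rough illustration of what "bounds checking on standard C++ data structures" means in practice, here is a minimal sketch. It is not taken from Google's post: `checked_read` and `hardening_example` are hypothetical names, and the sketch uses `std::vector::at()`, which the standard always checks, as a stand-in for what a hardened library does for `operator[]`. The macros named in the comment are, as far as I know, the opt-in switches for libc++ and libstdc++; verify the exact spelling against your own toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical helper: read element i and report an out-of-range index
// instead of touching memory past the end of the vector.
//
// Assumption: with hardening enabled (for example
// -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST on recent libc++,
// or -D_GLIBCXX_ASSERTIONS on libstdc++), a plain v[i] with a bad index
// is expected to trap rather than be silent undefined behavior.
std::optional<int> checked_read(const std::vector<int>& v, std::size_t i) {
    try {
        return v.at(i);  // at() is bounds-checked by the standard itself
    } catch (const std::out_of_range&) {
        return std::nullopt;
    }
}

// Small usage example for the helper above.
void hardening_example() {
    std::vector<int> v{10, 20, 30};
    assert(checked_read(v, 1) == 20);         // in range
    assert(!checked_read(v, 3).has_value());  // one past the end is rejected
}
```

The point of the hardened library is that existing code using `v[i]` gets a comparable check without being rewritten to call `at()`.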

  • Lots of numbers on number of bugs found, future estimates on bugs prevented but none on how many bugs found per line of code.

    • by bussdriver ( 620565 ) on Sunday November 17, 2024 @12:36AM (#64951165)

      Shouldn't they have been running bounds checking in debug builds and testing all this time? Or made that a lot better? We all could have used way better dev tools all this time, and it took Rust and decades of security holes to finally give security a priority. Finally C++ has been making some official moves after the US gov shifted policy to promote helpful languages.

      I can see how in the past leaving in a ton of memory checks could hurt, but now the CPU is so starved for input they run another virtual cpu during the idle gaps and we're compressing and decompressing RAM... plus we're running virtual machine overhead... (but still not willing to go to a micro kernel! but we'll have a hypervisor and a ton of bloat )

      • still not willing to go to a micro kernel!

        According to some, MacOS is a microkernel. The kernel handles some stuff like memory management that a "real" microkernel like Mach might not, but device drivers and filesystems run in a separate space.

        Hurd is a microkernel. It'll be replacing Linux real soon.

      • Finally C++ has been making some official moves after the US gov shifted policy to promote helpful languages.

        Basically, it bruised Bjarne Stroustrup's ego. What he's doing really isn't enough though. He made this kind of dumb comment about there being more to safety than memory safety, but after you listen for a few minutes it's obvious that he's only talking about temporal pointer safety, which C++ won't ever have without breaking backwards compatibility, which is his entire argument for keeping C++ around to begin with and was the premise behind his "call to arms" speech. So it seems like he's just pretending th

        • It's not stupid to disagree with bogus "safety" checks. That shit killed Java's performance. If you want to put redundant bounds checking everywhere and sprinkle mutexes all over just in case, you'll force an exodus from people who actually know what they're doing and a fork eventually. Those safety checks belong properly in debug builds: If you want them permanently just run debug in live and be done. And stop copying and pasting from copilot, the code it produces is shit and not production ready.
          • It's not stupid to disagree with bogus "safety" checks. That shit killed Java's performance. If you want to put redundant bounds checking everywhere

            While you don't need redundant bounds checks (nobody is asking for that) you should be doing bounds checks everywhere. Anything less is hubris. C++ developers, more than anybody else in my experience, always seem to think they're the only person in the room who never makes mistakes. Yet they do anyways, and typically blame it on somebody else. That's hubris.

            and sprinkle mutexes all over just in case, you'll force an exodus from people who actually know what they're doing and a fork eventually.

            Mutexes everywhere for what? Why would anybody do that? For data race safety? There's no need for this. Anybody who says otherwise doesn't know what the

            • I would really like to know how you do compile-time bounds checking on variable-length runtime data. Like a file of arbitrary size, or a user inputting an arbitrary number of items into a list.

              Also if you really want memory safety, you do things like "hard embedded" and just don't allow dynamic memory at all. That "eliminates a large class of problems."

              The real takeaway should be "If you want to solve problems, you have to actually do engineering, not just write code and iterate quickly until it appears to

            • Mutexes everywhere for what? Why would anybody do that? For data race safety? There's no need for this. Anybody who says otherwise doesn't know what they're doing. Rust guarantees data race safety with only two mechanisms: The borrow checker, and send/sync marker traits. Those carry no runtime cost whatsoever.

              You've heard it here first folks. Rust provides data race safety (Yet no race condition safety) for free with no runtime cost whatsoever. Rust is magic.

          • Initial Java had performance problems, because it was in most implementations a 100% byte code interpreter.
            Java has been JIT compiled for decades, and is pretty much on par with C or C++.

            It is only slower in areas where they intentionally made a trade off, e.g. generational garbage collection, to avoid heap fragmentation. Or having "Generics" instead of templates, despite the fact that a real template implementation, see Pizza compiler, was already done.

            • It's been many years since I was at a Java shop but at the time garbage collection cycles were brutal and could stall our app so badly it effectively took the site down. When Java did a full GC, we found it was way faster to detect it was stuck doing a multi hour GC and restart the app than let it finish. We tried letting it go once just to see. 3 hours on a big system to full GC. I hope they've done a lot of work since then to fix how horrible full GC can be in the larger cases.

                • There is no machine on the planet that has so much memory that a GC can take more than a minute.

                These days GC is usually concurrent and does not stop the VM.

                No idea what your problem was. If it had anything at all to do with memory, then it was a memory leak, and not a GC problem.

                Let me check what the memory bandwidth of a modern CPU / RAM is ...

                One of the fastest Apple Silicon chips has 800GB/s; the low end is 100GB/s.

                So let's take 100GB/s

                To simplify a bit: to spend one single hour in a GC cycle, you wou

                • It was a GC problem. We spent significant amounts of engineering and systems time in it as well as talking to outside consulting firms and companies that claimed to have fixed it in their version of Java. We had several world class engineers working on this as their only job for extended periods of time with full company support. We ended up doing nothing about it. There was no reasonably non kludge technical solution. Each server would hang only under the heaviest usage maybe every other week or so.

                  • by cowdung ( 702933 )

                    Yes. About a decade ago large GC full collections could take more than a minute if you had a huge heap like 128 GB for example. (For example if you kept a huge in-memory cache or processed huge objects in RAM)

                    The G1 collector was invented to solve this problem. These days there is the Shenandoah and Z collectors that are even better for huge heaps. There was also a company that had a special VM that was a zero pause collector.

                    Basically the problem has been solved.

                    But also, modern Java allows people to creat

                  • Have to wonder if some thrashing happened to cause a big delay if some memory was virtualized to a slow disk and needed to be paged in for the GC?

                    I used to tune JVM memory use at NBCUniversal about a decade ago (with Java 6 or so). But those systems were using memory only in dozens of gigabytes -- not in terabytes like you mention. A full GC could take on the order of tens of seconds and potentially create various issues in real-time broadcasting, so we tuned things to avoid a full GC as much as possible.

                • No idea what your problem was. If it had anything at all to do with memory, then it was a memory leak, and not a GC problem.

                  This is the basic problem with general purpose GC languages. GC is a black box and structures that are effectively leaks are indistinguishable by mortals from structures the GC is able to collect without undue delay. You are rolling the dice and crossing your fingers on the operational reliability of the system in exchange for minor programming inconvenience.

                  • A memory leak is a memory leak. Does not matter if you have a GC language or use malloc/free etc.

                    A garbage collector can only collect garbage. Not stuff you hold on to forever somewhere, for whatever reason.

                    A garbage collector simply says: this piece of memory is no longer referenced from anywhere, so it is garbage. I put it into the "free block list"

                    As long as memory is referenced: it is not garbage. It is just like you holding an empty bottle of beer. How should I know your bottle is emp

                    • A memory leak is a memory leak. Does not matter if you have a GC language or use malloc/free etc.

                      A garbage collector can only collect garbage. Not stuff you hold on to forever somewhere, for whatever reason.

                      A garbage collector simply says: this piece of memory is no longer referenced from anywhere, so it is garbage. I put it into the "free block list"

                      As long as memory is referenced: it is not garbage. It is just like you holding an empty bottle of beer. How should I know your bottle is empty? You hold it. It is not in the trash can, you holding a "reference" to it. So: why would I "free" it?

                      The issue I speak of is mismatch in understanding of what is or is not referenced between what developers expect and what the GC does/sees where behaviors are unclear, uncertain or even a-priori unknowable.
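
The "still referenced means not garbage" point in this sub-thread can be sketched in C++, with the caveat that C++ has no tracing collector: `std::shared_ptr` reference counting is used here only as a stand-in for reachability. `Session`, `SessionCache`, `open_session` and `reachability_example` are hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical object that owns some memory.
struct Session {
    std::vector<char> payload;
};

// Hypothetical long-lived cache: anything stored here stays referenced.
using SessionCache = std::unordered_map<std::string, std::shared_ptr<Session>>;

// Creates a session, parks it in the cache, and hands back a weak
// observer that does not keep the session alive on its own.
std::weak_ptr<Session> open_session(SessionCache& cache, const std::string& id) {
    auto session = std::make_shared<Session>();
    session->payload.resize(1024);
    cache[id] = session;  // the cache now holds a reference
    return session;       // converts to weak_ptr; local owner dies on return
}

void reachability_example() {
    SessionCache cache;
    std::weak_ptr<Session> observer = open_session(cache, "a");

    // The caller is done with the session, but the cache still refers to
    // it, so from the runtime's point of view it is not garbage.
    assert(!observer.expired());

    // Only once the last reference is dropped can the memory be reclaimed.
    cache.erase("a");
    assert(observer.expired());
}
```

Whether you call the first state a leak or a caching decision is exactly the disagreement above: the runtime cannot tell the two apart.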

          • There are very few people who "actually know what they're doing". You need to have been not only trained on provable computing techniques and understand how formal verification works, but you also need to religiously apply those principles.

            I've built C++ software in 1998 that still runs flawlessly today. It's never needed a patch. But the effort it took is beyond what most organisations will bear.

            So that is why I do not recommend C++ to anyone. There are easier ways to create performant code that don't give

            • by Dr. Tom ( 23206 )

              "shooting yourself in the foot or head unwillingly"

              I spent a long time, too, creating bullet-proof solutions to problems that don't exist anymore. Just stop using the old broken stuff we spent energy on trying to fix please

              I feel the same as Linus. DON'T PATCH THE KERNEL TO FIX BROKEN HARDWARE

              • But the drivers are commonly part of the kernel project...

                I enjoy Linux, despite it being a Unixlike it is somehow also the most modern OS in common use in that new features are often added to it first because of its combination of openness and popularity. And much which was in the kernel is now in user space, so it arguably displays some characteristics of a microkernel based system now. But I can't help but think that with as fast as even cheap hardware is now, I would be willing to give up a little bit o

            • by dfghjk ( 711126 )

              "But the effort it took is beyond what most organisations will bear."

              Then you're not very good at it. Correct coding isn't "style" that you optionally apply "if the organization will bear it".

              • Depends on what you're working on. When I was at a game company we deployed code we -knew- was buggy as Hell. Why? "We're not a hospital, no one is going to die, we'll patch it next week, let's deploy then get beers". --my CTO at the time

                There are situations and environments where "good enough" really is good enough although I'm not willing to call it a "style" of coding, either.

        • by piojo ( 995934 )

          It seems like there is an endless list of typical programming problems that are prevented in Rust. Just last week (in a memory-safe language) I added a new class member but forgot to utilize it when making object fingerprints and comparisons. In Rust we can trivially prevent that mistake--those methods should start with a destructuring assignment, slurping up all the fields of the object. And Rust fixes so many other potential bugs.

          That's not to say Rust is perfect, though as an amateur all I can say wit

        • it bruised Bjarne Stroustrup's ego. What he's doing really isn't enough though. He made this kind of dumb comment

          LOL. Yet another ignoramus spewing ignorance like it was informed comment.

          You *cannot* fix stupid with code. You cannot fix bad devs with code. Ever. Quit pretending.

          Bounds checking takes zero effort. The fact that it's so prevalent speaks volumes about the number of bad devs. It has nothing to do with the language.

      • Oh and:

        I can see how in the past leaving in a ton of memory checks could hurt

        What's really stupid about that is that these days compilers can actually safely optimize out some bounds checks. Not only that, but the way modern processors do branch prediction and caching makes bounds checking barely even relevant at all these days, and that's before you even get to the fact that the IPS is so insanely high compared to back then.

      • Well, Chrome can't even run in debug builds. So that wouldn't help much.

      • Yes but REAL MEN like me always use C++.

        In fact, in the latest 4 KB intro I coded one month ago I didn't use C/C++ but coded in assembler (NASM)!
  • Back to 40 years ago (Score:5, Interesting)

    by lsllll ( 830002 ) on Sunday November 17, 2024 @12:31AM (#64951163)

    When I was programming in Turbo Pascal in the 80s, we had the compiler flag {$R+} for automatic range checking and {$Q+} for overflow checking, but since we were working on 8088s and 8086s, we used to turn them off to reduce the code size and make the code run faster. It's funny how we've come around.
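
For readers who never used those switches, here is a loose C++ analogue of what range checking and overflow checking do. This is only a sketch: `checked_index`, `checked_add` and `checks_example` are hypothetical names, and C++ has no built-in directive that inserts these checks the way `{$R+}` and `{$Q+}` did.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>

// Roughly what {$R+} did: refuse an index outside the declared range.
template <std::size_t N>
std::optional<int> checked_index(const std::array<int, N>& a, std::size_t i) {
    if (i >= N) return std::nullopt;
    return a[i];
}

// Roughly what {$Q+} did: detect signed overflow before it happens.
// The comparisons are arranged so that they cannot overflow themselves.
std::optional<int> checked_add(int a, int b) {
    if (b > 0 && a > std::numeric_limits<int>::max() - b) return std::nullopt;
    if (b < 0 && a < std::numeric_limits<int>::min() - b) return std::nullopt;
    return a + b;
}

void checks_example() {
    std::array<int, 3> a{1, 2, 3};
    assert(checked_index(a, 2) == 3);
    assert(!checked_index(a, 3).has_value());
    assert(checked_add(2, 3) == 5);
    assert(!checked_add(std::numeric_limits<int>::max(), 1).has_value());
}
```

Each check is a compare and a branch, which is the cost people were trimming on an 8088.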

    • People (mostly C and asm programmers) used to rip on Ada's bounds checking overhead. Being fast and small and cheap was more important than being more reliable.

      • by evanh ( 627108 )

        And it seriously did make a difference on the hardware of the time. CPUs are not the same any longer though. They can chew on massively more bloat without blinking these days. Bounds checking has become insignificant amongst everything else that is serially piling in.

        • That's simply incorrect. Performance still matters when there's real money at stake.
          • by evanh ( 627108 )

            Then the assembler comes out.

            • Bounds checking in C/C++ or assembly is just the same.

              On modern processors it is pretty tough to write assembly code that beats C/C++ in speed.

              However there was an article about a week ago, where they hand coded GPU code. And had extreme performance boosts in some areas.

              • by evanh ( 627108 )

                Exactly, the assembler comes out for those special libs like SIMD functions and the like. Where speed gains can really be made.

                The bounds checks are done ahead. The assembly doesn't touch allocating, or any other calls for that matter, and therefore doesn't need any inline checks. It is the leaf functions.
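
The "check ahead, then run an unchecked leaf" pattern described here might look like the sketch below. The names `dot`, `dot_unchecked` and `leaf_example` are hypothetical, and the plain loop merely stands in for a hand-tuned SIMD or assembly kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Leaf kernel: assumes the caller has already validated n against both
// buffers. No per-element checks, no allocation, no calls.
static float dot_unchecked(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Checked entry point: a single size check up front covers every access
// the leaf will make, so the hot loop carries no bounds checks.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("dot: size mismatch");
    }
    return dot_unchecked(a.data(), b.data(), a.size());
}

void leaf_example() {
    // 1*4 + 2*5 + 3*6 = 32, exactly representable in float
    assert(dot({1.0f, 2.0f, 3.0f}, {4.0f, 5.0f, 6.0f}) == 32.0f);
}
```

The design choice is that safety lives at the boundary: the leaf stays fast only as long as nothing else can call it with unvalidated arguments, which is why it is kept `static` here.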

            • by dfghjk ( 711126 )

              Sure it does, even for someone who thinks CPUs have infinite performance?

              Someone who claims that bounds checking "has become insignificant" doesn't use an assembler and doesn't understand how stupid that statement is.

            • by Anonymous Coward
              Ah, the mythological notion you can magically make code faster by switching to Assembly Language (it's NOT called "Assembler"). It can make certain types of inter-loop, CPU-bound code faster. It will NOT make ALL code faster, especially code where the bounding resource isn't the processor.
          • That's simply incorrect. Performance still matters when there's real money at stake.

            You can say many things about google but not being "real money" isn't one of them. Anyway, they're operating at the kind of scale where it is "real money", but fact is on a modern processor with out of order execution, speculative execution and branch prediction, bounds checking is often completely free. The happy path is predicted correctly and the chances are your code doesn't have quite enough ILP to completely fill up, s

            • by cowdung ( 702933 )

              "Real money" is what buggy software costs.

              Bounds checking allows us to catch bugs quicker and earlier in the software lifecycle.

              There are places in many types of performance intensive software where bounds checking can have a negative effect. But in many other places the effect of finding bugs sooner is going to help you get the software out faster. If they are excessive they can be eliminated from the build later. It's easier to remove excessive bounds checking than to add it later.

              • "Real money" is what buggy software costs.

                Kinda, though apparently not as much as I'd expect given how companies generally seem to treat it.

                There are places in many types of performance intensive software where bounds checking can have a negative effect. But in many other places the effect of finding bugs sooner is going to help you get the software out faster. If they are excessive they can be eliminated from the build later. It's easier to remove excessive bounds checking than to add it later.

                Indeed. I th

          • by lsllll ( 830002 )

            You're right on in regard to money mattering. A friend of mine who worked at a trading company told me this. The stock exchanges have feeds between them, but each exchange provides a feed for local trading companies who host racks in their exchange. The exchange feeds a client's rack with fiber. But they ensure fairness by using the same amount of fiber for each client. So it doesn't matter if you're one rack away from the switch or 10 racks. You both have the same distance, be it a bunch of coiled fi

        • Back when CPU speeds were measured in kHz.
        • On an ARM a bounds check is usually only one instruction, as it includes the branch.
          So if you have a loop with 100 instructions, it adds only a single one.

          No idea about current intel processors. If you can keep the bound (or the last valid address of an array) in a register, and the index as well, this is simple and fast.

          On old 8/16bit hardware, you usually had not so many registers. So for every comparison you had to load the bounds again into the processor.

          I think on a 6502, that would be roughly ten inst

          • I think on a 6502, that would be roughly ten instructions to do a bounds check.

            Instructions or clock cycles?

            You couldn't use a register (not enough), but zero page is available for faster access

            • Instructions.
              You have to compare 4 bytes.
              And load two of them into the Accumulator, Subtract with carry and branch.
              Actually I guess there is a little bit more work.
              I have not programmed a 6502 in 40 years or so.

      • by dfghjk ( 711126 )

        Especially when those "people" wrote good code.

        Defensive coding techniques may not be justified for internal interfaces. Fast and small and cheap are always better when reliability is the same, you know, when code is written correctly.

        • by cowdung ( 702933 )

          Defensive coding techniques are there to find bugs. Your code could be great, but the guy calling your code won't necessarily do it correctly. Best to let him know his call is garbage immediately than wait till data is corrupted to eventually bonk out much later in the code.

          You must be new to programming.

          One problem with the C/C++ culture is this obsession with performance. I know, because I programmed in C++ for 10 years. But often it's better algorithms and data structures that will give you performance

  • Good and bad (Score:4, Interesting)

    by phantomfive ( 622387 ) on Sunday November 17, 2024 @01:24AM (#64951207) Journal
    There's some good stuff in here, like:

    "Thinking in terms of invariants: we ground our security design by defining properties that we expect to always hold for a system"

    Yeah, that's a right approach. Thinking about what you want your code to do, then proving (or demonstrating) that it does it. Unfortunately they also have this:

    "The responsibility is on our ecosystem, not the developer"

    This is false. You need to train your developers (unless they're already skilled). Programming languages give enough power to write security holes, and that's not going to change (for example, a PII leak can be written on almost any API call). A security program that doesn't take developers into consideration is destined to fail.

    • by gweihir ( 88907 )

      A security program that doesn't take developers into consideration is destined to fail.

      And that is just it. Whenever you do engineering (and coding is), engineer skill is the one critical factor you absolutely must keep in mind and must keep high. If you do not, your software (as the example at hand) will be insecure, unreliable and generally badly made.

      These days, security problems stemming from unsafe memory are not even the majority of issues. They just can be especially devastating if the people involved are really incompetent (see the recent Crowdstrike disaster for an example of how

      • These days, security problems stemming from unsafe memory are not even the majority of issues.

        Rust has proven that you can eliminate these issues without runtime overhead. So fixing these issues is now considered by governments to be low-hanging fruit.

    • "The responsibility is on our ecosystem, not the developer"

      This is false. You need to train your developers (unless they're already skilled).

      We are pretty much the only industry that thinks like that. There is no contradiction between "improve ecosystem" and "train developers". All the other industries around us do both.

      We are also pretty unique as an industry in that we watch our products fail and then go "there is nothing we can do about that, sucks that random people were too stupid to write proper code". We urgently need to improve, or we need regulators to step in to make us improve. Code is just getting too important to continue with our

  • If you need memory safety or expect significant benefits from it, use a memory safe language and stop complaining that not all languages are. Some languages are not memory safe by design and that has its place and also has significant benefits in some scenarios.

    • "Use a memory-safe language" is all well and good if you're starting a new project from scratch.

      OTOH if you have an existing C++ codebase that is large enough and important enough, you can't necessarily afford to just throw it out and reimplement everything from scratch. So in that case your options are down to either "do nothing and hope for the best", or "find ways to make it more secure". Google is doing the latter.

  • I know you can't trivially rewrite your hundreds of thousands of c/c++ lines of code in safer languages, so this sort of thing is a decent mitigation strategy.

    But also the very definition of polishing a turd. You are using fundamentally insecure languages (because that was not even a concern at the time, fair enough, I have also written tens of thousands of lines of c and c++ code). It is impossible to ever actually make secure. Normally this does not matter, but it does when you are a big fat target like

    • You are using fundamentally insecure languages (because that was not even a concern at the time, fair enough, I have also written tens of thousands of lines of c and c++ code). It is impossible to ever actually make secure.

      Which language do you write in to make your code secure?

      • by Sarusa ( 104047 )

        Currently, if you need C level performance the only real option for safe code is Rust. There are other wannabes like Zig, Fil-C, and TrapC, but they all have significant caveats - if you actually want production code, use Rust. C++ industrial complex claims you can use the latest version with STL, but no, it's just more turd polishing.

        If you don't need C level performance, there are lots of options like python or c#. In this case the only vulnerabilities are in their VMs and they've had millions of people

        • ok, you're just wrong. Java, Python, and C# and Rust will not make your code secure. A language won't make your code secure.
          • True, but those languages make sure you do not need to worry about the entire class of bugs that are related to memory management.

            That frees up resources to think harder about all the other issues your code might have.

          • by dfghjk ( 711126 )

            He doesn't understand your point, he doesn't know what "secure" means.

    • It is impossible to ever actually make secure.
      What do you mean with secure? Secure as not attackable, or secure as not crashing, aka having memory bugs?

      • What do you mean with secure? Secure as not attackable, or secure as not crashing, aka having memory bugs?

        Memory issues are used to attack your software. Them causing crashes is the lucky case...

        • by dfghjk ( 711126 )

          Code with bugs is not secure, but fixing those bugs does not make the code secure. Safety and security are not the same.

          • by cowdung ( 702933 )

            True.. but the classic example is the buffer overflow.

            You have a char[] and you don't check that the input fits, so the attacker overwrites the return address on the stack and can run arbitrary code.

            I know you know this, but you're being difficult. Lots of old C and C++ code is susceptible to this catastrophic attack. Whereas Rust, Pascal and Java would not be.

            That's the point. You know it. But like to be facetious.
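
To make the classic case concrete, here is a minimal sketch of the check that the vulnerable code leaves out. `copy_if_fits` and `copy_example` are hypothetical names; the unsafe version would simply copy the input into the fixed buffer without comparing sizes first.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Copies input into a fixed-size buffer only if it fits, leaving room
// for the terminating NUL. Oversized input is rejected instead of being
// written past the end of the buffer.
template <std::size_t N>
bool copy_if_fits(std::array<char, N>& buf, std::string_view input) {
    if (input.size() >= N) {
        return false;  // would overflow: refuse
    }
    std::copy(input.begin(), input.end(), buf.begin());
    buf[input.size()] = '\0';
    return true;
}

void copy_example() {
    std::array<char, 8> buf{};
    assert(copy_if_fits(buf, "hello"));
    assert(std::string_view(buf.data()) == "hello");
    assert(!copy_if_fits(buf, "far too long for buf"));
}
```

Languages with checked arrays perform the equivalent of that size comparison on every write, which is why the same mistake there ends in an error rather than a corrupted stack.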

    • by dfghjk ( 711126 )

      "I have also written tens of thousands of lines of c and c++ code). It is impossible to ever actually make secure."
      Why are you mixing "safe" and "secure"? Do you know the difference?

      "Normally this does not matter..."
      There it is! The reason shitty code exists, the people who write it don't care.

    • Google is more than search+ads.

      Off the top of my head:
      -very popular browser
      -email
      -file storage
      -password storage
      -document editing and storage
      -dns

      These are all worth doing right.

  • "Migrating our C++ to Safe Buffers shrinks the gap between the languages, which simplifies interoperability and potentially even an eventual automated translation."

    How does it do that? Adding spatial checks doesn't affect interfaces, and therefore interoperability, at all.

  • by Dr. Tom ( 23206 ) <tomh@nih.gov> on Sunday November 17, 2024 @07:27AM (#64951607) Homepage

    I love it when software engineers create solutions to problems like memory leaks which ignore the fact that you are using a stupid stupid architecture that's prone to failure

    • The problem is retraining people who are experts at that language, and resist moving to another.
      • So the problem is that you got devs that are unwilling or unable to learn new things? That's indeed a problem...

        They will suck out the will to live from any fresh blood that joins their team. Your product is stuck in the past forever, behind a strong wall of "we never did that before".

        • by cowdung ( 702933 )

          In defense of experts, it is hard to go with a new language that is only gaining popularity.

          There are fewer tools, there are fewer total experts, there is less experience in general.

          So being the first company to switch to that language can be very expensive and risky. Especially if in the end the industry chooses something else.

          When Google started C++ or C were their only true choices for high performance infrastructure. Even more so with Amazon and Microsoft. Rust may look now as a possible good option. But
