
Google Says CPU Patches Cause 'Negligible Impact On Performance' With New 'Retpoline' Technique (theverge.com) 120

In a post on Google's Online Security Blog, two engineers described a novel chip-level patch that has been deployed across the company's entire infrastructure, resulting in only minor declines in performance in most cases. "The company has also posted details of the new technique, called Retpoline, in the hopes that other companies will be able to follow the same technique," reports The Verge. "If the claims hold, it would mean Intel and others have avoided the catastrophic slowdowns that many had predicted." From the report: "There has been speculation that the deployment of KPTI causes significant performance slowdowns," the post reads, referring to the company's "Kernel Page Table Isolation" technique. "Performance can vary, as the impact of the KPTI mitigations depends on the rate of system calls made by an application. On most of our workloads, including our cloud infrastructure, we see negligible impact on performance." "Of course, Google recommends thorough testing in your environment before deployment," the post continues. "We cannot guarantee any particular performance or operational impact."

Notably, the new technique only applies to one of the three variants involved in the new attacks. However, it's the variant that is arguably the most difficult to address. The other two vulnerabilities -- "bounds check bypass" and "rogue data cache load" -- would be addressed at the program and operating system level, respectively, and are unlikely to result in the same system-wide slowdowns.
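The quoted post says the KPTI cost depends on how often an application makes system calls. A minimal back-of-the-envelope sketch of that relationship follows. The function name and the per-syscall cost figure are assumptions chosen for illustration, not numbers from Google's post or from any benchmark.

```python
# Illustrative model only: every number used here is a placeholder,
# not a measurement from Google's post or from any benchmark.

def kpti_overhead_fraction(syscalls_per_second: float,
                           extra_ns_per_syscall: float) -> float:
    """Estimate the share of each CPU-second spent on extra syscall cost.

    syscalls_per_second  -- how often the workload enters the kernel
    extra_ns_per_syscall -- assumed added cost per kernel entry/exit (hypothetical)
    """
    if syscalls_per_second < 0 or extra_ns_per_syscall < 0:
        raise ValueError("inputs must be non-negative")
    extra_seconds = syscalls_per_second * extra_ns_per_syscall * 1e-9
    # A core cannot spend more than 100% of its time on overhead.
    return min(extra_seconds, 1.0)


if __name__ == "__main__":
    # Hypothetical workloads: same assumed per-syscall cost, different rates.
    for label, rate in [("compute-bound", 1_000), ("I/O-heavy", 500_000)]:
        frac = kpti_overhead_fraction(rate, extra_ns_per_syscall=200.0)
        print(f"{label}: {frac:.2%} of CPU time")
```

Under this simple model, a workload that rarely enters the kernel sees almost no change, while a syscall-heavy one pays in proportion to its call rate. That is consistent with the post's advice to test in your own environment.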

  • by Anonymous Coward
    This is a hardware-level problem. This will continue to be exploited pretty much indefinitely. In my estimation this is the single biggest security problem ever created. My advice? Mortgage your house, cash out the retirement fund, and dump it all into AMD. Because Intel is going to be destroyed by lawsuit after lawsuit.
    • by Anonymous Coward

      You can fix the microcode. You can also include software workarounds for hardware flaws. An example was the Pentium F00F bug, which was addressed by the operating system.

    • by supremebob ( 574732 ) <themejunky@geoci ... minus physicist> on Thursday January 04, 2018 @08:05PM (#55866159) Journal

      Geez... You make it sound like this is the first time anyone has had to write a software patch to bypass a hardware flaw. Driver developers have had to come up with clever workarounds to hardware defects since the dawn of computing.

      These Intel firmware fixes are just going to become part of yet another security update that will be required to keep systems secure.

      • by 110010001000 ( 697113 ) on Thursday January 04, 2018 @08:18PM (#55866219) Homepage Journal
        Again: there are no Intel firmware fixes for Meltdown. It cannot be fixed without replacing the processor. There are only mitigation workarounds.
        • by AvitarX ( 172628 )

          Based on the summary, this is a fix that dramatically reduces the impact of Meltdown (too lazy to read up as it doesn't directly impact me). If they found a way to keep Meltdown in the lower bound, they're doing alright.

          Lower bound being about 5% (initial patch on a PCID-supporting processor was 7% in an artificial Postgres benchmark that was more prone to slowdown than real life), if they found a way to get ok'd chips to that point, and shave a little bit off their, it dramatically reduces the problem.

          It

          • I don't understand this talk about 'dramatically reducing the problem'. Either there is an exploitable flaw or not. If the fix only makes implementing the type of exploit harder, then it's not going to help at all. Some assembler freak and malware author somewhere in the world will still make it work.

            I'm not claiming that there is no fix, only that mere workarounds may be of limited value. What I've read so far hasn't really reassured me. The same can be said about rowhammer, btw. What's so worrying about

        • Sure, but it's kind of like the Intel Pentium F00F bug. The underlying hardware issue will always be there, but the OS kernel can prevent that instruction from being run on the system.

    • AMD needs desktop-level server chips / IPMI boards, like Intel's Xeon E3.

      Ryzen PRO chips fully support ECC, so we just need a few boards with IPMI.

      Threadripper is a nice workstation system.

        Threadripper boards with IPMI will be nice, as it has higher clocks with fewer cores than EPYC chips.

      A full EPYC board is overkill for smaller site hosts.

    • by Anonymous Coward

      This is a hardware-level problem. This will continue to be exploited pretty much indefinitely.

      Have you looked at the actual retpoline patches rather than simply inserting foot? It is an interesting approach to block speculative fetching by using indirect jumps/calls/returns.

  • time flies (Score:5, Funny)

    by mapkinase ( 958129 ) on Thursday January 04, 2018 @08:00PM (#55866121) Homepage Journal

    Pentium 4.99989 disaster seems like yesterday.

  • by Joe_Dragon ( 2206452 ) on Thursday January 04, 2018 @08:02PM (#55866139)

    Or just Buy AMD & get no slow down with more pci-e lanes.

    • This incident highlights the importance of maintaining vendor diversity in data centers. Modern processors are complex enough that it is not unlikely that any given design has problems waiting to be discovered. It would seem wise for large-scale clients to hedge their bets by having a mix of devices carrying their workload. Imagine the damage if someone discovered a means of bricking Intel processors and added the payload to one of the better viruses.
      • by swb ( 14022 )

        I think this would make sense if you had the vendors at rough sales parity and the virtualization vendors had healthy experience on both platforms so all the gotchas of moving live workloads between CPU vendors were understood and mitigated.

        It might actually not work well or require heterogeneous vendor-specific clusters to avoid CPU feature masking that dumbed both vendor platforms to some lowest common denominator.

        • Google, Microsoft, and Amazon dwarf Intel. They should not be waiting around for sales parity. They should be creating vendors if the vendors they need aren't there.

          In past industries, powerful industries would foster competition amongst their suppliers even if it involved significant loss. It is a necessary business expense that leads to many benefits including competition, diversity in supply (we are vulnerable to terrorists taking out foundries and countries cutting chip supplies today), and diversity in

          • by swb ( 14022 )

            My guess is that the broadest explanation is that Google, Microsoft and Amazon largely want x86 compatibility because of the efficiencies associated with the network effect of a widely adopted processor, both in terms of software availability and in terms of platform stability.

            As AMD (and failed competitors) have shown, a competing platform to Intel's CPUs isn't easy to pull off. Google, et al, could pay a subsidy to AMD to produce a competing product but there's no guarantee they would get one and they wo

    • by AHuxley ( 892839 )
      Think of the problem as a Venn diagram and the two CPU "vulnerabilities" as lists of CPUs within the diagram.
      Some CPU generations will have both issues. Some will have one. Very few will have no problem at all.
  • This isn't a "chip-level" patch. The spin control here is admirable.
    • by nyet ( 19118 )

      I definitely don't see how requiring you to replace GCC and recompile every single binary is "chip-level".

      • Re:More lies (Score:5, Interesting)

        by 110010001000 ( 697113 ) on Thursday January 04, 2018 @08:12PM (#55866183) Homepage Journal
        It isn't "chip level". The Intel PR spin is out in full effect. Meltdown is a major flaw that can only be fixed by removing the flawed Intel processor and replacing it with a processor that doesn't contain the flaw. If you don't do that, the best you can do is mitigate the effects. There is no microcode fix either. What Google is doing is recompiling everything, which is fine, but hackers aren't going to do that.
        • by atrex ( 4811433 )
          Technically, you'd have to replace the motherboard too. Really no such thing as just replacing the chip since all motherboards are pretty much designed for a specific chip series at this point, unless there exists a chip in the same series without the flaw. I certainly couldn't name a single motherboard where you could choose between installing an LGA 2066 Core i9 and a Socket sTR4 Ryzen Threadripper.
    • Exactly, you can't provide a general fix to chip-level security problems by changes to "programs". People can compile their own programs and have root access on VMs that they control.

      However, Google controls the hypervisor and presumably, it's at this level that the attack can be blocked or mitigated.

      • Exactly. The funny thing is these "cloud companies" always control their own infrastructure, so these types of "fixes" make sense. Everyone else is screwed.
  • by JoeyRox ( 2711699 ) on Thursday January 04, 2018 @08:08PM (#55866177)
    Google's technique is to patch binaries so that branches/calls don't use the branch prediction mechanism of the CPU, which has a small performance hit but much smaller than KPTI. I suppose the presumption is that harmful code which uses the technique would have to compile it into their binary since most OS's prevent the self-modification of code segments/TLB entries once they've been placed into memory by the OS loader. But what about code segments generated entirely at runtime, including from interpreters and libraries like libjit?
  • by Anonymous Coward

    Meltdown patch (KPTI) will still hurt applications with lots of syscalls, or lots of userspace->kernel context switches.

  • by PhrostyMcByte ( 589271 ) <phrosty@gmail.com> on Thursday January 04, 2018 @08:47PM (#55866367) Homepage

    Google has created "retpoline", a technique which allows an indirect branch (e.g. a vtable call) to occur in a way that effectively disables speculative execution by isolating branch target prediction into a safe effectless loop. This addresses Variant 2 (aka Spectre).

    Retpoline does not depend on or assist a CPU or an OS patch: it is done purely at the software level, per-app, by a compiler. There is no simple OS-wide patch.

    Google says a retpoline call has performance "within cycles" of a regular old mispredicted branch. The zero-cost predictions we're used to are a thing of the past, because it effectively forces misprediction. I'd be curious to see a benchmark of an indirection-heavy platform like .NET.

    This does not help address or optimize Variant 3, which is what the big kernel patches for Page Table Isolation are needed for. So, your I/O-dependent apps like databases are still going to take a big performance hit. Nor does it address Variant 1.

    • EXACTLY. The summary is horrible. It made it sound like Google invented a novel technique that makes the KPTI/Variant 3 (Meltdown) mitigation slowdown "negligible". But actually the blog post simply says:

      • They invented a technique called Retpoline that mitigates Variant 2, with negligible performance impact; and
      • When testing KPTI/Variant 3 (Meltdown) mitigation on their own workflows, they found the performance impact negligible.
  • by bongey ( 974911 ) on Thursday January 04, 2018 @08:52PM (#55866407)
    Google is dependent on Intel CPUs at the moment and has a vested interest in not saying "well, our cloud just got 5-30% slower."
    • Google is dependent on Intel CPUs at the moment and has a vested interest in not saying "well, our cloud just got 5-30% slower."

      Exactly the same as their competitors, including in-house data centers as well as other cloud providers.

  • These three exploits are instances, not three different principles. The principle is the same, and there is no reason to suspect that there won't be more instances that follow that principle. CPUs speculatively execute code and load cache lines based on that execution. Intel CPUs can furthermore access privileged memory when unprivileged code is executed speculatively. That's the principle. The way the speculatively executed code is guarded and the speculative window is widened differs between the three exp

  • by jspenguin1 ( 883588 ) <jspenguin@gmail.com> on Thursday January 04, 2018 @08:55PM (#55866429) Homepage

    Not only do they misspell the name of the mitigation technique, the "retpoline" technique only protects against the indirect branch variant of Spectre. The fix for Meltdown is still KPTI, with all the same overhead that involves. The "negligible impact on performance" is on top of the KPTI changes.

  • Little or no hit to PassMark performance. I haven't got any games installed to test at the moment, but PassMark's GPU/CPU scores usually give me a good idea where I'd wind up. My VMs are running fine too.
    • SSD I/O seems to get hit the hardest. Check there and see where you're at. On an ancient dual Xeon system I took a 30% hit.

  • How does the RETPOLINE mitigation applied to binaries deal with dynamically (JIT) de-compressed or unencrypted code? The ability for speculative pre-fetching to gather data that's normally off-limits to a process seems like a huge can of worms for code that can be pre-processed by the mitigation.

    /unrelated/ I'm not up-to-speed on webasm, but I can see how a vuln might be crafted from an instruction stream since the assembly generator is (presumably) following a recipe.
  • It would be good to have speculative execution protection as a compiler option rather than as a patch to binaries. This could tune the protection to what is necessary for each specific processor.
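One comment above notes that a retpoline call costs about as much as a mispredicted branch, because it effectively forces misprediction on every indirect call. A rough sketch of that trade-off follows. The function names and all cycle counts are hypothetical placeholders for illustration, not measured values.

```python
# Illustrative cost model only: the cycle counts passed in are assumed
# placeholders, not measurements of any real processor.

def avg_indirect_call_cycles(hit_rate: float,
                             predicted_cycles: float,
                             mispredicted_cycles: float) -> float:
    """Expected cycles per indirect call with normal branch target prediction."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * predicted_cycles + (1.0 - hit_rate) * mispredicted_cycles


def retpoline_call_cycles(mispredicted_cycles: float,
                          thunk_cycles: float = 0.0) -> float:
    """Model a retpoline call as one misprediction plus an assumed thunk cost."""
    return mispredicted_cycles + thunk_cycles


def relative_cost(hit_rate: float,
                  predicted_cycles: float,
                  mispredicted_cycles: float,
                  thunk_cycles: float = 0.0) -> float:
    """Ratio of modeled retpoline cost to modeled predicted-call cost."""
    baseline = avg_indirect_call_cycles(hit_rate, predicted_cycles,
                                        mispredicted_cycles)
    return retpoline_call_cycles(mispredicted_cycles, thunk_cycles) / baseline


if __name__ == "__main__":
    # Hypothetical figures: a well-predicted call site versus a poorly predicted one.
    for hit_rate in (0.99, 0.50):
        print(hit_rate, relative_cost(hit_rate, 1.0, 20.0))
```

In this model the penalty is largest for call sites the predictor used to handle well, which is why indirection-heavy code is the interesting case to benchmark. How often indirect calls occur relative to other work determines whether the total impact stays negligible.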
