
Google Porting All Internal Workloads To Arm (theregister.com)

Google is migrating all its internal workloads to run on both x86 and its custom Axion Arm chips, with major services like YouTube, Gmail, and BigQuery already running on both architectures. The Register reports: The search and ads giant documented its move in a preprint paper published last week, titled "Instruction Set Migration at Warehouse Scale," and in a Wednesday post that reveals YouTube, Gmail, and BigQuery already run on both x86 and its Axion Arm CPUs -- as do around 30,000 more applications. Both documents explain Google's migration process, which engineering fellow Parthasarathy Ranganathan and developer relations engineer Wolff Dobson said started with an assumption "that we would be spending time on architectural differences such as floating point drift, concurrency, intrinsics such as platform-specific operators, and performance." [...]

The post and paper detail work on 30,000 applications, a collection of code sufficiently large that Google pressed its existing automation tools into service -- and then built a new AI tool called "CogniPort" to do things its other tools could not. [...] Google found the agent succeeded about 30 percent of the time under certain conditions, and did best on test fixes, platform-specific conditionals, and data representation fixes. That's not an enormous success rate, but Google has at least another 70,000 packages to port.

The company's aim is to finish the job so its famed Borg cluster manager -- the basis of Kubernetes -- can allocate internal workloads in ways that efficiently utilize Arm servers. Doing so will likely save money, because Google claims its Axion-powered machines deliver up to 65 percent better price-performance than x86 instances, and can be 60 percent more energy-efficient. Those numbers, and the scale of Google's code migration project, suggest the web giant will need fewer x86 processors in years to come.

  • by williamyf ( 227051 ) on Wednesday October 22, 2025 @07:16PM (#65744388)

    ... Google will finish the migration from x86 to ARM just in time to start the migration from ARM to RISC-V. After all, price/performance will go beyond 65% if Google does not have to pay royalties to Arm.

    • by darkain ( 749283 ) on Wednesday October 22, 2025 @07:26PM (#65744400) Homepage

      I'm someone who's handled porting a few hundred packages to ARM, not at Google, but in the F/OSS community. The reality isn't "porting to ARM", it's "porting away from x86": removing or gating x86-isms. The work to port to ARM is 95% the same work needed to port to RISC-V. The main difference between hardware platforms is when dealing with hardware-level crypto instructions or vector instructions, but again these are gated in libraries, and adding the gates for ARM gives you the vast majority of the work already needed for RISC-V.

      • The reality isn't "porting to ARM", it's "porting away from x86": removing or gating x86-isms.

        I'm curious: what sort of x86-isms did you typically encounter?

        Most of the code I write these days is either scripts (Bash and Python) or pure C++. I can't think of the last time I wrote C++ which couldn't just compile for any CPU architecture I had handy.

        Platform dependencies are another thing entirely. I'm writing some C++ code to run as a system service on Ubuntu, macOS, and another weird embedded Linux. Don't get me started on the difference between systemd and launchd, nor apt-vs-yum-vs-brew for instal

        • by darkain ( 749283 ) on Thursday October 23, 2025 @12:55AM (#65744758) Homepage

          Basically the examples I just gave: using hardware crypto or hardware vector instructions is the main thing, or things similar to these. The code usually has raw C/C++ fallbacks anyway, so it's just a matter of getting the proper #ifdef [arch] gates in place, and sometimes adding the crypto/vector instructions for the given platform to ensure it operates optimally.
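          A minimal sketch of the gating pattern being described, with a hypothetical sum_u32 function: one #ifdef gate per architecture's intrinsics, plus a raw C++ fallback that already covers RISC-V or any other ungated ISA:

            #include <cstddef>
            #include <cstdint>

            #if defined(__SSE2__)
              #include <emmintrin.h>     // x86 SSE2 intrinsics
            #elif defined(__aarch64__)
              #include <arm_neon.h>      // AArch64 NEON intrinsics
            #endif

            // Hypothetical example: sum an array of 32-bit integers.
            uint32_t sum_u32(const uint32_t* data, size_t n) {
                size_t i = 0;
                uint32_t total = 0;
            #if defined(__SSE2__)
                __m128i acc = _mm_setzero_si128();
                for (; i + 4 <= n; i += 4)
                    acc = _mm_add_epi32(acc,
                        _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i)));
                alignas(16) uint32_t lanes[4];
                _mm_store_si128(reinterpret_cast<__m128i*>(lanes), acc);
                total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
            #elif defined(__aarch64__)
                uint32x4_t acc = vdupq_n_u32(0);
                for (; i + 4 <= n; i += 4)
                    acc = vaddq_u32(acc, vld1q_u32(data + i));
                total = vaddvq_u32(acc);   // horizontal add across lanes
            #endif
                // Raw C/C++ fallback: handles the tail, and the whole array
                // on any ungated architecture (e.g. RISC-V today).
                for (; i < n; ++i) total += data[i];
                return total;
            }

          Adding the ARM gate forces the fallback and the #ifdef structure to exist, which is most of what a later RISC-V (or RVV) branch would need.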

          • Basically the examples I just gave: using hardware crypto or hardware vector instructions is the main thing, or things similar to these.

            Interesting. Except in a few cases (e.g. I worked on a storage system which experimented with hardware-accelerated compression and encryption), I don't think I've ever worked on a piece of code which used hardware crypto or vectors. Where it did, for the most part the hardware-specific stuff was in a library you could, in theory, swap out.

            Either that, or I was working on an embedded system which was so completely tied to the CPU, memory, busses, and I/O that porting it to some other platform was essentially a re-

        • by _merlin ( 160982 ) on Thursday October 23, 2025 @09:31AM (#65745304) Homepage Journal

          There's some discussion of semantic differences between x86 and classic (32-bit) ARM in Microsoft's porting guides [microsoft.com], although AArch64 (64-bit ARMv8) is a bit different.

          One that I've encountered several instances of recently on AArch64 is code making assumptions about semantics of floating point to integer conversion when the input value is out of range. Here's an example of a fix [github.com] for one such issue (it's ironic that the comment said "round in a cross-platform consistent manner" when the function was unportable).
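          A hedged illustration of that conversion issue (helper name hypothetical): x86's cvttss2si returns INT32_MIN on overflow, while AArch64's fcvtzs saturates, and the C++ cast itself is undefined for out-of-range values, so a portable version has to make the boundary behaviour explicit:

            #include <cmath>
            #include <cstdint>

            // Hypothetical portable float -> int32 conversion. The cast is only
            // defined when the value fits, so handle NaN and clamp first.
            int32_t to_i32_saturating(float f) {
                if (std::isnan(f)) return 0;               // pick a defined result for NaN
                if (f >= 2147483648.0f) return INT32_MAX;  // 2^31 and above can't fit
                if (f <= -2147483648.0f) return INT32_MIN;
                return static_cast<int32_t>(f);            // in range: well-defined truncation
            }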

          There's the usual stuff where you'll trigger different effects of undefined behaviour in the compiler on different platforms. I recently fixed code that was doing something like u | (((f & uint8_t(1)) << 31) >> (s - 1)) where u is uint32_t, f is uint8_t and s is int. The code isn't safe because of the implicit promotion on the left shift - (f & uint8_t(1)) is implicitly promoted to int size. On i686 and x86-64, it was being treated as an unsigned int (32 bits), so the right shift was a logical shift (shifting in zeroes); on AArch64 it was being treated as an int (32 bits), so the right shift was an arithmetic shift (propagating the sign bit). Changing it to u | (uint32_t(f & uint8_t(1)) << (32 - s)) avoided the right shift altogether and fixed it.
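          A compilable sketch of that fix, using the names from the comment (u, f, s; function names hypothetical; assumes 1 <= s <= 32 so the shift count stays valid):

            #include <cstdint>

            // Before: (f & uint8_t(1)) promotes to int, so << 31 can set the
            // sign bit (undefined before C++20), and the following >> of a
            // negative int is implementation-defined: logical shift on one
            // platform, arithmetic shift on another.
            uint32_t merged_buggy(uint32_t u, uint8_t f, int s) {
                return u | (((f & uint8_t(1)) << 31) >> (s - 1));
            }

            // After: force the unsigned type before shifting and avoid the
            // right shift entirely, so the result is identical everywhere.
            uint32_t merged_fixed(uint32_t u, uint8_t f, int s) {
                return u | (uint32_t(f & uint8_t(1)) << (32 - s));
            }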

          • ...for one such issue (it's ironic that the comment said "round in a cross-platform consistent manner" when the function was unportable).

            There's the usual stuff where you'll trigger different effects of undefined behaviour in the compiler on different platforms. I recently fixed code that was doing something like u | (((f & uint8_t(1)) << 31) >> (s - 1))

            Ah, I see it now. Yes, FP boundary conditions and weird interactions between word sizes and bitwise operations could bite you in the a$$. We've been trying to tightly bound the defined and undefined behaviors for years, and to warn about undefined (and thus undependable) behavior, to no avail.

      • by _merlin ( 160982 )

        I write JIT compilers [github.com], you insensitive clod.

      • Once you have a workflow with multiple toolchains, you're on your way to porting to any number of architectures. And automated testing of architecture independence in a codebase becomes practical. That's certainly how it worked out at my company. Once we started building for Sparc and PowerPC, it made other architectures easier to add a decade later. And we had the additional complication that a big chunk of the codebase is drivers. Getting ARM systems with the right hardware into the test automation pool w

    • by AmiMoJo ( 196126 )

      This will help them flush out any x86-specific code in preparation for a migration to RISC-V, if and when the time comes. It will probably be a long way off though, because right now RISC-V is not getting the amount of investment it needs to be competitive with ARM in terms of performance, both raw compute and compute per watt.

    • by juancn ( 596002 )

      Power efficiency is the main driver here.

      The royalties are CapEx; OpEx dominates in this case, so unless there's a RISC-V core that's way more power-efficient than an ARM core, I don't see that happening.

  • by jrnvk ( 4197967 ) on Wednesday October 22, 2025 @07:35PM (#65744414)

    It is great to be platform agnostic. On the other hand, I sincerely doubt that the difference in energy consumption between x86-64 and ARM is significant enough to be a concern at Google, considering their market cap is literally three trillion dollars now.

    • It doesn't need to be. There are other reasons to wean off x86.

    • Re: On one hand (Score:5, Insightful)

      by LindleyF ( 9395567 ) on Wednesday October 22, 2025 @07:53PM (#65744446)
      At Google's scale, a 1% reduction is an extra data center.
    • Google makes so much money even their enormous electricity bill is probably only a small fraction of profit, yes. But if they can cut the amount of power they use, they can add more compute without having to increase the amount of electrical capacity in a data center (or build more data centers), which not only means less capital expense but also means they can increase compute in less time. So it's definitely a concern.

      • Google makes so much money even their enormous electricity bill is probably only a small fraction of profit, yes. But if they can cut the amount of power they use, they can add more compute without having to increase the amount of electrical capacity in a data center (or build more data centers), which not only means less capital expense but also means they can increase compute in less time. So it's definitely a concern.

        This is one of the things that scales with the growth of their business. Power costs are a _significant_ chunk of computing costs. Their datacenter opex is probably not as insignificant as people are assuming here.

        https://www.cnbc.com/2025/07/2... [cnbc.com]
        "In its second quarter earnings, Google reported that cloud revenues increased by 32% to $13.6 billion in the period. The demand is so high for Google’s cloud services that it now amounts to a $106 billion backlog, Alphabet finance chief Anat Ashkenazi said d

        • Google reported using 32.11 million MWh in 2024. Average cost per megawatt hour for industrial customers in the US was $87.50, so we can estimate (with wide error bars, but the right order of magnitude) about $2.8 billion in electricity cost for the year. Google's net income in 2024 was about $100B, so electricity costs are a small fraction of profit. Electricity expenses are included in their cost of revenue, which was a total of $146B.

          I believe the second-order effect of being able to scale faster will

    Businesses don't think like that. They tend to pay attention to their expenses. Spending $1 billion to save $500 million per year (completely made-up numbers) would probably get management approval. This is especially true as this sort of work will be a permanent improvement - so it will save money year after year.

      With this sort of work, there is probably a side-benefit: they (collectively) know their code better and the extra work to analyze existing code will probably also find real bugs. But that

    • by AmiMoJo ( 196126 )

      Their energy bill seems to be enough of a concern to piss money away on fusion startups.

    • It is great to be platform agnostic. On the other hand, I sincerely doubt that the difference in energy consumption between x86-64 and ARM is significant enough to be a concern at Google, considering their market cap is literally three trillion dollars now.

      Power is the biggest cost in a datacenter. They're measured in megawatts. It doesn't matter who builds them, they aren't cheap and they don't get cheaper at scale.
      I'm not sure what the argument is: Google has lots of money, therefore they should be running everything on a pack of mainframes? A flock of big iron UNIX boxes? I guess you could be forgiven after years of AI power-demand pearl-clutching and crypto waste in the news, but what do you think a square meter of plain old boring servers in any

    • It is great to be platform agnostic. On the other hand, I sincerely doubt that the difference in energy consumption between x86-64 and ARM is significant enough to be a concern at Google, considering their market cap is literally three trillion dollars now.

      You don't get to be a three trillion dollar company by saying "We have plenty of money so we don't need to be efficient". You also don't stay a three trillion dollar company that way.

    • It's all about computation per watt. Those AI data centers are sucking up electricity and water.

      So yeah, it will make a big difference in the bottom line.

  • Qualcomm is really missing out by not supporting open source. They could dominate the datacenter business.
    • It is very funny that Qualcomm was bragging about Linux support for their new (at the time) Snapdragon X just before it actually launched (with multiple Windows devices, all small/light portables) https://www.qualcomm.com/devel... [qualcomm.com] . And ever since then... crickets.

      Meanwhile, they also dropped (as in killed) their own devkit (which could have been thought of as a more "regular" form-factor device with these SoCs), and SO hard that they were clearly very rushed so the door doesn't hit them on their way

  • Would love to have Android emulators that run on Windows on Arm chips.
  • by Kludge ( 13653 ) on Wednesday October 22, 2025 @09:43PM (#65744606)

    The most relevant part of this story is that it is incredibly easy for Google to do this kind of porting because THEY RUN LINUX.
    Linux reliably supports these multiple architectures so easily that many major distributions have x86 and ARM versions ready for download.
    Big internet companies have made so much money over the last two decades running Linux that it boggles the mind.

    • Linus should be the richest man on Earth based on his contributions to the world. That is... if things were fair, and contributions, rather than selfishness, were actually rewarded.
      • Interesting, although Linux has created so much value because of its ubiquity, which would not be the case if it had been monetized - like Ultrix, Solaris, HP-UX, and a hundred others.
        • by caseih ( 160668 )

          The fact remains that many people got very rich off of Linus' work. Linus is not one of them. I believe he's okay with that. But it is distasteful that so much money is being made with free software while those who write and maintain it for the enjoyment of it live in relative poverty by comparison.

          • To make it fair, how much of the value generated by Linux would go to Linus personally, vs. all the contributors to the Linux ecosystem? For example to the Apache Foundation? Stallman seems perennially miffed that GNU built a whole environment and then Linux swooped in to host it and took all the glory.
  • The cost savings will end when Arm jacks up licensing fees. Graviton is gonna get hit too. ARM Ltd. is tired of watching their customers rake in all the revenue, and their Qualcomm lawsuit didn't work out. So Amazon, Google, MS, etc. are their next targets.

    • The cost savings will end when Arm jacks up licensing fees. Graviton is gonna get hit too. ARM Ltd. is tired of watching their customers rake in all the revenue, and their Qualcomm lawsuit didn't work out. So Amazon, Google, MS, etc. are their next targets.

      Hence Google's investment in RISC-V. It's not yet competitive, but with some time and money it can become competitive. Also, ARM can't raise prices too much, because x86 is still right there.

      • True (wrt x86 still being present), it's just that ARM may have realized a bit too late that their Neoverse pricing might have been a bit too generous.

  • Despite having multiple built-in systems to save and restore app data, both from Google and from the phone manufacturer (if not Google - for example, Samsung), you still need to perform some manual app-specific data transfer for tons of apps (most will know WhatsApp best, but there are many more of all sorts, from a somewhat complex app like PodcastAddict to very basic clock widget apps and whatnot). Of course, you have no permission to get the data yourself and back it up (despite apps being permitted to wri
