Google Porting All Internal Workloads To Arm (theregister.com) 44
Google is migrating all its internal workloads to run on both x86 and its custom Axion Arm chips, with major services like YouTube, Gmail, and BigQuery already running on both architectures. The Register reports: The search and ads giant documented its move in a preprint paper published last week, titled "Instruction Set Migration at Warehouse Scale," and in a Wednesday post that reveals YouTube, Gmail, and BigQuery already run on both x86 and its Axion Arm CPUs -- as do around 30,000 more applications. Both documents explain Google's migration process, which engineering fellow Parthasarathy Ranganathan and developer relations engineer Wolff Dobson said started with an assumption "that we would be spending time on architectural differences such as floating point drift, concurrency, intrinsics such as platform-specific operators, and performance." [...]
The post and paper detail work on 30,000 applications, a collection of code sufficiently large that Google pressed its existing automation tools into service -- and then built a new AI tool called "CogniPort" to do things its other tools could not. [...] Google found the agent succeeded about 30 percent of the time under certain conditions, and did best on test fixes, platform-specific conditionals, and data representation fixes. That's not an enormous success rate, but Google has at least another 70,000 packages to port.
The company's aim is to finish the job so its famed Borg cluster manager -- the basis of Kubernetes -- can allocate internal workloads in ways that efficiently utilize Arm servers. Doing so will likely save money, because Google claims its Axion-powered machines deliver up to 65 percent better price-performance than x86 instances, and can be 60 percent more energy-efficient. Those numbers, and the scale of Google's code migration project, suggest the web giant will need fewer x86 processors in years to come.
with 70000 packages remaining... (Score:4, Interesting)
... Google will finish the migration from x86 to ARM just in time to start the migration from ARM to RISC-V. After all, the price/performance gain will go past 65% if Google doesn't have to pay royalties to ARM.
Re:with 70000 packages remaining... (Score:5, Interesting)
I'm someone who's handled porting a few hundred packages to ARM, not at Google, but in the F/OSS community. The reality isn't "porting to ARM", it's "porting away from x86", removing or gating x86-isms. The work to port to ARM is 95% the same work needed to port to RISC-V. The main difference between hardware platforms comes when dealing with hardware-level crypto instructions or vector instructions, but again these are gated in libraries, and adding the gates for ARM gives you the vast majority of the work already needed for RISC-V.
Re: (Score:2)
The reality isn't "porting to ARM", it's "porting away from x86", removing or gating x86-isms.
I'm curious: what sort of x86-isms did you typically encounter?
Most of the code I write these days is either scripts (Bash and Python) or pure C++. I can't think of the last time I wrote C++ which couldn't just compile for any CPU architecture I had handy.
Platform dependencies are another thing entirely. I'm writing some C++ code to run as a system service on Ubuntu, macOS, and another weird embedded Linux. Don't get me started on the difference between systemd and launchd, nor apt-vs-yum-vs-brew for instal
Re:with 70000 packages remaining... (Score:4, Interesting)
Basically the examples I just gave: using hardware crypto or hardware vector instructions are the main thing, or things similar to these. The code usually has raw C/C++ fallbacks anyway, so it's just a matter of getting the proper #ifdef [arch] gates in place, and sometimes adding in the crypto/vector instructions for the given platform to ensure it operates optimally.
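To make that concrete, here's a rough sketch of the gating pattern (the function and the saturating-add example are made up for illustration, not taken from any real package):

    #include <cstddef>
    #include <cstdint>

    #if defined(__x86_64__) || defined(__i386__)
    #include <immintrin.h>   // x86 SSE2 intrinsics
    #elif defined(__aarch64__)
    #include <arm_neon.h>    // ARM NEON intrinsics
    #endif

    // Saturating byte-wise add: each architecture gets its own vectorized
    // path, everything else falls back to plain C++.
    void add_saturating_u8(uint8_t* dst, const uint8_t* a, const uint8_t* b, size_t n) {
        size_t i = 0;
    #if defined(__x86_64__) || defined(__i386__)
        for (; i + 16 <= n; i += 16) {
            __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
            __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
            _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_adds_epu8(va, vb));
        }
    #elif defined(__aarch64__)
        for (; i + 16 <= n; i += 16) {
            uint8x16_t va = vld1q_u8(a + i);
            uint8x16_t vb = vld1q_u8(b + i);
            vst1q_u8(dst + i, vqaddq_u8(va, vb));
        }
    #endif
        // Portable scalar fallback (also handles the tail on the vector paths).
        for (; i < n; ++i) {
            unsigned s = unsigned(a[i]) + unsigned(b[i]);
            dst[i] = uint8_t(s > 255 ? 255 : s);
        }
    }

Anything that isn't x86 or AArch64 just takes the scalar loop, which is also exactly what a first RISC-V port would get until someone writes a vector path for it.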
Re: (Score:2)
Basically the examples I just gave: using hardware crypto or hardware vector instructions are the main thing, or things similar to these.
Interesting. Except in a few cases (e.g. I worked on a storage system which experimented with hardware-accelerated compression and encryption), I don't think I've ever worked on a piece of code that used hardware crypto or vectors. When it did, for the most part the hardware-specific stuff was in a library you could, in theory, swap out.
Either that or I was working on an embedded systems which was so completely tied to the CPU, memory, busses, and I/O that porting it to some other platform was essentially a re-
Re:with 70000 packages remaining... (Score:4, Informative)
There's some discussion of semantic differences between x86 and classic (32-bit) ARM in Microsoft's porting guides [microsoft.com], although AArch64 (64-bit ARMv8) is a bit different.
One that I've encountered several instances of recently on AArch64 is code making assumptions about semantics of floating point to integer conversion when the input value is out of range. Here's an example of a fix [github.com] for one such issue (it's ironic that the comment said "round in a cross-platform consistent manner" when the function was unportable).
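For anyone who hasn't run into it: the conversion itself is undefined behaviour in C/C++ when the value doesn't fit, so both architectures are "right". A minimal sketch of the kind of explicitly clamped conversion that sidesteps it (illustrative only, not the code from the linked fix):

    #include <cmath>
    #include <cstdint>
    #include <limits>

    // Out-of-range or NaN double-to-int conversion is undefined behaviour, so the
    // hardware is free to differ: x86's cvttsd2si gives INT_MIN for anything out
    // of range or NaN, while AArch64's fcvtzs saturates and sends NaN to 0.
    // Clamping explicitly makes every platform agree.
    int32_t to_int32_portable(double x) {
        if (std::isnan(x)) return 0;  // pick a NaN policy and make it explicit
        if (x >= static_cast<double>(std::numeric_limits<int32_t>::max()))
            return std::numeric_limits<int32_t>::max();
        if (x <= static_cast<double>(std::numeric_limits<int32_t>::min()))
            return std::numeric_limits<int32_t>::min();
        return static_cast<int32_t>(x);
    }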
There's the usual stuff where you'll trigger different effects of undefined behaviour in the compiler on different platforms. I recently fixed code that was doing something like u | (((f & uint8_t(1)) << 31) >> (s - 1)) where u is uint32_t, f is uint8_t and s is int. The code isn't safe because of the implicit promotion on the left shift - (f & uint8_t(1)) is implicitly promoted to int size. On i686 and x86-64, it was being treated as an unsigned int (32 bits), so the right shift was a logical shift (shifting in zeroes); on AArch64 it was being treated as an int (32 bits), so the right shift was an arithmetic shift (propagating the sign bit). Changing it to u | (uint32_t(f & uint8_t(1)) << (32 - s)) avoided the right shift altogether and fixed it.
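Spelled out as a compilable sketch (the names are invented, the expressions are the ones described above):

    #include <cstdint>

    uint32_t bit_or_buggy(uint32_t u, uint8_t f, int s) {
        // (f & uint8_t(1)) is promoted to (signed) int, so << 31 can set the sign
        // bit, and right-shifting a value with the sign bit set is where compilers
        // and targets are allowed to disagree (arithmetic vs logical shift).
        return u | (((f & uint8_t(1)) << 31) >> (s - 1));
    }

    uint32_t bit_or_fixed(uint32_t u, uint8_t f, int s) {
        // Force an unsigned 32-bit type before shifting and drop the right shift
        // entirely; the result no longer depends on signed-shift behaviour.
        return u | (uint32_t(f & uint8_t(1)) << (32 - s));
    }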
Re: (Score:2)
...for one such issue (it's ironic that the comment said "round in a cross-platform consistent manner" when the function was unportable).
There's the usual stuff where you'll trigger different effects of undefined behaviour in the compiler on different platforms. I recently fixed code that was doing something like u | (((f & uint8_t(1)) << 31) >> (s - 1))
Ah. I'm seeing it now. Yes, FP boundary conditions and weird interactions between word sizes and bitwise operations, yeah, those could bite you in the a$$. We've been trying to tightly bound the defined and undefined behaviors for years, and warn about undefined and thus undependable behavior to no avail.
Re: (Score:2)
I write JIT compilers [github.com], you insensitive clod.
Re: (Score:2)
Once you have a workflow with multiple toolchains, you're on your way to porting to any number of architectures. And automated testing of architecture independence in a codebase becomes practical. That's certainly how it worked out at my company. Once we started building for Sparc and PowerPC, it made other architectures easier to add a decade later. And we had the additional complication that a big chunk of the codebase is drivers. Getting ARM systems with the right hardware into the test automation pool w
Re: (Score:2)
This will help them flush out any x86 specific code in preparation for migration to RISC-V, if and when the time comes. It will probably be a long way off though, because right now RISC-V is not getting the amount of investment it needs to be competitive with ARM in terms of performance. Both raw compute performance, and compute per watt.
Re: (Score:3)
Power efficiency is the main driver here.
The royalties are CapEx, and OpEx dominates in this case, so unless there's a RISC-V core that's way more power-efficient than an ARM core, I don't see that happening.
On one hand (Score:3)
It is great to be platform agnostic. On the other hand, I sincerely doubt that the difference in energy consumption between x86-64 and ARM is significant enough to be a concern at Google, considering their market cap is literally three trillion dollars now.
Re: (Score:1)
It doesn't need to be. There are other reasons to wean off x86.
Re: On one hand (Score:5, Insightful)
Re: (Score:3)
Google makes so much money even their enormous electricity bill is probably only a small fraction of profit, yes. But if they can cut the amount of power they use, they can add more compute without having to increase the amount of electrical capacity in a data center (or build more data centers), which not only means less capital expense but also means they can increase compute in less time. So it's definitely a concern.
Re: (Score:2)
Google makes so much money even their enormous electricity bill is probably only a small fraction of profit, yes. But if they can cut the amount of power they use, they can add more compute without having to increase the amount of electrical capacity in a data center (or build more data centers), which not only means less capital expense but also means they can increase compute in less time. So it's definitely a concern.
This is one of the things that scales with the growth of their business. Power costs are a _significant_ chunk of computing costs. Their datacenter opex is probably not as insignificant as people are assuming here.
https://www.cnbc.com/2025/07/2... [cnbc.com]
"In its second quarter earnings, Google reported that cloud revenues increased by 32% to $13.6 billion in the period. The demand is so high for Google’s cloud services that it now amounts to a $106 billion backlog, Alphabet finance chief Anat Ashkenazi said d
Re: (Score:3)
Google reported using 32.11 million MWh in 2024. Average cost per megawatt hour for industrial customers in the US was $87.50, so we can estimate (with wide error bars, but the right order of magnitude) about $2.8 billion in electricity cost for the year. Google's net income in 2024 was about $100B, so electricity costs are a small fraction of profit. Electricity expenses are included in their cost of revenue, which was a total of $146B.
I believe the second-order effect of being able to scale faster will
Re: (Score:3)
Businesses don't think like that. They tend to pay attention to their expenses. Spending $1 billion to save $500 million per year (completely made up numbers) would probably get management approval. This is especially true as this sort of work will be a permanent improvement - so it will save money year after year.
With this sort of work, there is probably a side-benefit: they (collectively) know their code better and the extra work to analyze existing code will probably also find real bugs. But that
Re: (Score:2)
Their energy bill seems to be enough of a concern to piss money away on fusion startups.
Re: (Score:2)
It is great to be platform agnostic. On the other hand, I sincerely doubt that the difference in energy consumption between x86-64 and ARM is significant enough to be a concern at Google, considering their market cap is literally three trillion dollars now.
Power is the biggest cost in a datacenter. They're measured in megawatts. It doesn't matter who builds them, they aren't cheap and they don't get cheaper at scale.
I'm not sure what the argument is, Google has lots of money, therefore they should be running everything on a pack of mainframes? A flock of big iron UNIX boxes? I guess you could be forgiven after years of all the AI power demand pearl-clutching and crypto waste in the news, but what do you think a square meter of plain old boring servers in any
Re: (Score:2)
It is great to be platform agnostic. On the other hand, I sincerely doubt that the difference in energy consumption between x86-64 and ARM is significant enough to be a concern at Google, considering their market cap is literally three trillion dollars now.
You don't get to be a three trillion dollar company by saying "We have plenty of money so we don't need to be efficient". You also don't stay a three trillion dollar company that way.
Re: On one hand (Score:2)
It's all about computation per watt. Those AI data centers are sucking up electricity and water.
So yeah, it will make a big difference in the bottom line.
Qualcomm is Missing Out (Score:2)
Re: (Score:2)
Which mainstream AI packages don't support Strix Halo?
Re: (Score:2)
Vulkan backends tend to work, but not perform great.
ROCm on strix is a fucking nightmare.
Re: (Score:2)
Looks like it's a bit more complicated than the ac was letting on. But that has little to do with ARM so I'll leave it at that.
Re: (Score:2)
The ROCm ecosystem is just kind of a disaster.
As Vulkan kernels get better, that's less of a problem.
Re: (Score:2)
It is very funny that Qualcomm was bragging about Linux support for their new (at the time) Snapdragon X, just before it was actually launched (with multiple Windows devices, all small/light portables) https://www.qualcomm.com/devel... [qualcomm.com] . And then ever since that ... crickets.
Meanwhile they also dropped (as in killed) their own devkit (which could have been thought as a more "regular" form factor device with these SoCs), but SO hard that clearly they were very rushed so the door doesn't hit them on their way
How about emulators for Windows on Arm? (Score:1)
Left out the most relevant part of the story! (Score:4, Informative)
The most relevant part of this story is that it is incredibly easy for Google to do this kind of porting because THEY RUN LINUX.
Linux supports multiple architectures so reliably that many major distributions have x86 and ARM versions ready for download.
Big internet companies have made so much money over the last two decades running Linux that it boggles the mind.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
The fact remains that many people got very rich off of Linus' work. Linus is not one of them. I believe he's okay with that. But it is distasteful that so much money is being made with free software while those who write and maintain it for the enjoyment of it live in relative poverty by comparison.
Re: (Score:2)
Enjoy it while it lasts (Score:2)
The cost savings will end when Arm jacks up licensing fees. Graviton is gonna get hit too. ARM Ltd. is tired of watching their customers rake in all the revenue. And their Qualcomm lawsuit didn't work out. So Amazon, Google, MS, etc. are their next targets.
Re: (Score:2)
Their licensing model isn't complicated. You either get a design license or you license cores/core families. Amazon, for example, is (or has been) using fairly bog standard Neoverse setups with small-ish L3. If they intend to license newer iterations of Neoverse then they will pay accordingly.
Anyone with a full design license (such as Apple) gets access to the entire version. Apple fairly recently updated to a v9 license (M4, probably others) so they're good for awhile. Each individual license may incl
Re: (Score:2)
The cost savings will end when Arm jacks up licensing fees. Graviton is gonna get hit too. ARM Ltd. is tired of watching their customers rake in all the revenue. And their Qualcomm lawsuit didn't work out. So Amazon, Google, MS, etc. are their next targets.
Hence Google's investment in RISC-V. It's not yet competitive, but with some time and money it could become competitive. Also, ARM can't raise prices too much because x86 is still right there.
Re: (Score:2)
True (wrt x86 still being present), it's just that ARM may have realized a bit too late that their Neoverse pricing might have been a bit too generous.
LOL you can't migrate Android to (same) Android (Score:1)
Despite having multiple built-in systems to save and restore app data, both from Google and from the phone manufacturer (if not Google, for example Samsung), you still need to perform manual, app-specific data transfer for tons of apps (most will know WhatsApp best, but there are many more of all sorts, from a somewhat more complex app like PodcastAddict to very basic clock widget apps and whatnot). Of course, you have no permission to get the data yourself and back it up (despite apps being permitted to wri