

Microsoft Cloud Computing System Suffering From Global Shortage (theinformation.com)
Due to a confluence of crises, the second-largest cloud provider has been operating in the yellow zone, meaning its data centers have a less-than-normal level of servers available. From a report: In March 2020, Microsoft's Azure cloud buckled under the strain of companies around the world shifting to remote work, causing service outages and forcing some customers to wait to launch and update applications. Microsoft put a positive spin on the situation, characterizing it as a temporary issue that stemmed from the surging usage of its Teams collaboration software and the rapid growth in adoption it was seeing for Azure services broadly. But over two years later, more than two dozen Azure data centers in countries around the world are operating with limited server capacity available to customers, according to two current Microsoft managers contending with the issue and an engineer who works for a major customer. And in more than half a dozen Azure data centers -- including a key one in central Washington state and others in Europe and Asia -- server capacity is expected to remain limited until early next year, said one of the Microsoft managers.
Break the habit of a lifetime (Score:1)
server capacity is expected to remain limited until early next year
Time for Microsoft to start writing their apps with a view to efficiency of operation, not speed of delivery?
Re: (Score:2)
Downside (Score:4, Interesting)
Re: (Score:3)
Obviously during the age of VPS [wikipedia.org] this wasn't true. ;-) Or really the entire internet since that's a "third party" that one has little to no control over.
Re: (Score:2)
Re: (Score:2)
These big cloud providers are, I fear, a heartbeat away from big failure.
I doubt that but I do expect there are some dragons when it comes to the assumptions people are operating under.
Lots of big businesses have turned to the cloud for auto scaling, and the implicit assumption seems to be that Microsoft/Amazon will be able to deliver whatever capacity we need when we need it. I am waiting for an F500, or some big financial services site, or government site to have a special event of some kind and discover the capacity they expect and need isn't there.
Not that is new or different from an
Re: (Score:1)
Re: (Score:2)
Shall we play a game?
Re: (Score:2)
due to limited cpu power the only sites you can play are
1. USSR
2. USA
Know your limits (Score:3, Insightful)
If you have a unified contract with Microsoft and a CSA you should be able to "know your limits". Part of their job is to be able to talk about capacity and scaling.
They should also be able to talk about capacity reservation to guarantee resources as well.
Re: Know your limits (Score:2)
Should.
Re: (Score:3)
I imagine customers who are on the higher tier packages with SLAs in place are not having problems. It's the lower "best effort" tiers that feel the pain.
Microsoft's problems are probably at least in part due to unavailability of parts. It's not easy to buy large numbers of servers these days, and prices are still high.
Re: (Score:2)
Thankfully the worst of the pandemic-related supply chain perturbations didn't line up with any significant server or switch refreshes for us; but given the ridiculous shenanigans we've had from normally c
Re: Downside (Score:3)
Private data centers have similar problems. Maybe your management doesn't see the need to invest in new servers, and is happy to let the data center run in the yellow? Maybe you can't get the hardware you want because there is a shortage? If there is a shortage the vendors are probably selling the hardware to their big customers like Microsoft first, and there is nothing left for you.
Re: (Score:2)
On the other hand, the upside is you're relying on the reliability of a third party who likely has better processes, more robust facilities, skilled staff, and access to hardware and software through multiple distributors.
Re: (Score:2)
This is one of the downsides of the infrastructure as a service model of computing. You're entirely relying on the reliability of a third party. These big cloud providers are, I fear, a heartbeat away from big failure.
No. It just happened to one provider under very specific circumstances, a damned global pandemic. Every damned thing got affected, not just cloud infrastructure. It's a function of things first affected by a supply shock, then meeting a sudden demand shock.
It's a damned miracle Amazon or GCP didn't buckle under the sudden and abnormal demand shock. We architect things to handle reasonable expectations of growth with graceful degradation during occasional blips. We don't architect things to survive a pand
How about not selling it then? (Score:2)
Who wants to bet that the salesdroids aren't even slowing down?
Re: How about not selling it then? (Score:3)
We had a 2-hour sales pitch from Microsoft on their recently acquired Metaswitch products. They talked at length about moving the ecosystem into Azure, but were clear they couldn't do it today, and were looking at a 12 to 18 month timeline. I remember thinking it was odd that one of their reasons for this was that they were concerned about being able to provide 99.99% uptime. Resource availability would seem to be a good reason for that because the phone switches require more dedicated resources than
Re: (Score:2)
This is the truth about their ideal sales pitches -- display capacity planning in their own internal processes akin to the types of capacity planning practices they equally expect from their clients.
Within Microsoft, nothing happens in an instant (especially when it comes to resource allocations due to internal business procedures) and their best measuring sticks are the trailing-30-days metrics. By understanding the next-month impacts of decisions today, they can
Re: (Score:2)
Who wants to bet that the salesdroids aren't even slowing down?
It never stopped them in the past when it came to marketing newer versions of Windonts... why should it stop them now?
Less than normal? (Score:1)
Re: (Score:2)
Or would it be "fewer"? I'm no English nerd but this sounds wrong.
In this instance I'd opt for "lower than normal" or "below normal".
"Less" is for uncountable nouns (gas, food, baggage, etc) whereas "fewer" is for countable nouns (marbles, dogs, doors, etc).
So you have less food, not fewer food, and you have more marbles, not greater.
Less baggage, not fewer baggage, except when referring to the singular collective, i.e. "bags" vs "baggage", "dog" vs "dogs".
So you can have fewer bags, but not less bags.
Anyway, cheerio.
Been happening for years (Score:2)
Back in 2012 MS literally couldn't build data centers fast enough to meet the demand for Azure, so they went on a rampage and started buying up every suitable building they could find, gutting it, and making a high-density server farm out of it.
They're all over the fucking place; there's probably one or more within 10 miles of you right now.
They had all sorts of designs for rapid scaling and deployment, literally plug-and-play sea crates with racks of servers stacked inside, drop-in-place "communication spi