Cloud Microsoft

Microsoft Office 365 Experienced Two Major Outages Within 3 Days (crn.com) 67

On Monday long-time Slashdot reader TorinEdge wrote that Microsoft "appears to have botched an internal Office365 cloud services rollout today, with outages confirmed up and down the West Coast of North America. Confirmed roll backs were good early omens, but in the end did not appear to be successful... Symptoms may include: All 365-related services flaking out, borking, alternately approving logins and confirming they definitely do not exist."

CRN reported service was impacted for five hours. But on Thursday some users were now intermittently unable to access Microsoft Exchange from 12:52 a.m. until 10:50 p.m., "according to a Microsoft email update to Office 365 administrators..."

"Some partners believe the tech giant is grappling with a DevOps crisis." "It looks like they are pushing out software updates that are causing the outages," said a channel source impacted by one of the outages. "They have so much going on right now, rolling Teams out at a breakneck pace. I think they are running into an issue where code tested out fine but there is a configuration problem when they deploy it."

DevOps is a set of practices that, according to the Wikipedia definition, shortens the systems development life cycle and provides continuous delivery of code with high software quality... A senior executive for one of Microsoft's top partners, who did not want to be identified, said he sees both recent outages as clearly DevOps-related... "Microsoft is a development first company, well known in general for DevOps, so the question is: why is this happening?" said the executive. "I love Microsoft but why is a company that paid $7.5 billion for Github, the leading source code repository company in the world, getting taken down by code that is not being well tested or has a single point of failure. That is ridiculous. If we caused this kind of production outage for a customer we would be fired and possibly blacklisted from the ecosystem. We have to bat 1,000 as a partner."

The lesson from the outages may well be that a company's DevOps is only as "good as the humans who configure it and execute upon it," said the executive. The executive said the outages will definitely have a ripple effect in the channel. "I bet the Google G Suite sales reps threw a party when they saw this," he said.

"No cloud vendor is immune to downtime," Microsoft says in a statement quoted by CRN. "Our number one priority is to get to resolution as quickly as possible and ensure our customers stay updated along the way, as was the case here.

"We continuously invest in the resilience of our platform and focus on learning from these incidents to ultimately reduce the impact of inevitable outages..."

  • Good (Score:4, Funny)

    by backslashdot ( 95548 ) on Saturday October 03, 2020 @03:47PM (#60568982)

    People may finally have managed to get some work done.

  • Just as Redmond is about to drop update support for the old, home-based version of Office the new, cloud-based version starts crashing. Is it the virus?

    • by phantomfive ( 622387 ) on Saturday October 03, 2020 @04:36PM (#60569086) Journal

      . Is it the virus?

      Worse, it's a subscription.

    • by EvilSS ( 557649 )

      Just as Redmond is about to drop update support for the old, home-based version of Office the new, cloud-based version starts crashing. Is it the virus?

      What are you even talking about? Office 2019 is still under mainstream support, 2016 goes EOMS this month but is still under extended support until 2025, and Microsoft announced Office 2021 will also be available next year with a perpetual, non-subscription license as well.

        • That's 2010, a ten-year-old product:

          Support for Office 2010 will end on October 13, 2020 and there will be no extension and no extended security updates. All of your Office 2010 apps will continue to function. However, you could expose yourself to serious and potentially harmful security risks.

          Mind you 365 is no better:

          All of your Office 365 apps might continue to function, except for when we fuck things up yet again. In addition you will be exposing yourself to serious and potentially harmful availability risks.

          I'm so glad I'm on the offline version of 2016.

  • Innovation (Score:5, Insightful)

    by cahuenga ( 3493791 ) on Saturday October 03, 2020 @03:59PM (#60569006)
    Someday I would like to actually meet one of the masses who have been clamoring for subscription productivity software.
    • by sjames ( 1099 )

      That may be a bit difficult. The violently disturbed ward tends to have significant restrictions on visitation.

    • You can find them when you work towards the Tax and Finance departments.

      You save $$$$ if you're a business and also a college student with tax write-offs. It's a rotten deal for individuals as the only reason to use Office is for setting up meetings with bigwigs in Outlook and sharing documents which don't make you look retarded as LibreOffice renders tables and fonts differently than Word.

    • Comment removed based on user account deletion
  • by rossdee ( 243626 ) on Saturday October 03, 2020 @04:00PM (#60569010)

    Does MS know this is a Leap Year ?

    • lol, i'd mod you up if I had the points

    • It would have been funny to see Office 365 down for 24 hours on that leap day.

    • After the screw up in 2016... I think, (certificate rotation failed due to trying to set an expiration date of 2/29/2017), you'd better believe that there was a lot of focus on that issue. Had a full team dedicated to ensuring everything went smoothly (now the team does code scanning to find similar ticking time bombs).
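For readers unfamiliar with this class of bug: taking a February 29 date and simply bumping the year yields a date that does not exist in a non-leap year, which matches the failure the comment above describes. The sketch below is only an illustration of the pitfall, not Microsoft's code; the helper name `add_years` and the choice to fall back to February 28 are assumptions made for the example.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Return d shifted by a whole number of years (hypothetical helper).

    Naively calling d.replace(year=...) raises ValueError when d is
    Feb 29 and the target year is not a leap year, so that case is
    handled explicitly by falling back to Feb 28 (an assumed policy).
    """
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reachable for Feb 29 landing in a non-leap year.
        return d.replace(year=d.year + years, day=28)


if __name__ == "__main__":
    issued = date(2016, 2, 29)
    print(add_years(issued, 1))  # expiry one year after a leap-day issue date
```

Whether to clamp to February 28 or roll forward to March 1 is a policy decision; the point is that the code has to pick one instead of constructing an invalid date.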

  • by I'mjusthere ( 6916492 ) on Saturday October 03, 2020 @04:00PM (#60569012)
    Excuses for days off!!

    "Sorry boss, MICROSOFT Office was down. I couldn't do anything or communicate with you."

    "Uh, there was personal email."

    "Boss, that is forbidden by company policy."

  • by I75BJC ( 4590021 ) on Saturday October 03, 2020 @04:01PM (#60569018)
    "No cloud vendor is immune to downtime," Microsoft says in a statement...

    Now that's an excuse and a rationalization...We're-Only-As-Bad-As-The-Other-Guys doesn't address the problem.

    A Pretty Lame Excuse at that! IMHO
    • That sort of rationalization will only lead to more downtime (if it's the team doing it, not just a PR person).
    • by dwywit ( 1109409 )

      Redmond needs to suspend those ads claiming resilience.

      • by gmack ( 197796 )

        Doubly so since office 365 is now being pushed into mission critical environments.

        That multi state 911 outage that happened at the same time wasn't a coincidence.

  • No way they guaranteed 99.99% uptime. I was actually part of our rollout of Teams, and the motivation was to not manage and host the service ourselves. (I hate the idea of outsourcing to the "cloud".) If you really don't want downtime, pay the big bucks to host and manage it yourself.
    • If you really don't want downtime, pay the big bucks to host and manage it yourself.

      That works but it depends on the quality of your IT team.

      • If you really don't want downtime, pay the big bucks to host and manage it yourself.

        That works but it depends on the quality of your IT team.

        But the quality of your IT team depends on your willingness to pay the big bucks (to them).

    • Their current SLA for O365 is 99.9%, so just over 8 hours downtime.

      • 99.9% uptime is a funny thing. What is the denominator - 24h? 7 Days? 30 days? ... 1 Yr?

        This is key information that tells one what duration of downtime is tolerable for a given uptime percentage. ---- 8 hrs is for one year at 99.9%; over 30 days it is only about 43 minutes. This is terrible for email software.
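The denominator question in this thread comes down to simple arithmetic: allowed downtime is the measurement window multiplied by (1 - availability). The sketch below works that out for a few windows; the function name `allowed_downtime` is hypothetical, and the windows chosen are illustrative assumptions, not the terms of any actual Microsoft SLA.

```python
from datetime import timedelta


def allowed_downtime(availability_percent: float, window: timedelta) -> timedelta:
    """Downtime budget for a given availability over a given window.

    availability_percent is e.g. 99.9; window is the period the
    percentage is measured over (an assumption the caller must supply).
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return window * (1 - availability_percent / 100)


if __name__ == "__main__":
    for label, window in [
        ("1 day", timedelta(days=1)),
        ("30 days", timedelta(days=30)),
        ("365 days", timedelta(days=365)),
    ]:
        budget = allowed_downtime(99.9, window)
        print(f"99.9% over {label}: {budget.total_seconds() / 60:.1f} minutes")
```

At 99.9% this gives roughly 43 minutes per 30 days and about 8.8 hours per 365 days, so a multi-hour outage fits inside a yearly budget but far exceeds a monthly one.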

  • by fahrbot-bot ( 874524 ) on Saturday October 03, 2020 @04:19PM (#60569048)

    "We continuously invest in the resilience of our platform and focus on learning from these incidents to ultimately reduce the impact of inevitable outages..."

    Gene Kranz had that in his notes but eventually went with, "Failure is not an option." [wikipedia.org] when speaking to his Apollo 13 Mission Control team. (Which makes me think of this: Houston [xkcd.com].)

    • by sjames ( 1099 )

      Sadly, only 20 years later, we'd just appreciate it if it wasn't the default for a change.

  • by bmimatt ( 1021295 ) on Saturday October 03, 2020 @04:30PM (#60569066)
    As a practitioner, automating Windows environments is not easy. It did get a lot better in the recent few years. Still, the paradigm of having to reboot to pick up almost any meaningful changes does not lend itself to reliable hands-off automation. I will not even go into acrobatics of dealing with Domain Controllers and AD. That's almost black magic to people coming from *nix/Linux world :)
    • OSX seems to have recently gone down the road of "every update is a reboot." I'm not really sure why.
    • I help to run 37 ADs :). With that in mind, I cannot for the life of me understand how SED and AWK work. I have to google it every time and pray the end result works.
    • Try working with a corrupted SCCM and an unhealthy A.D. schema. It's a nightmare to get anything done and our CIO won't let us have working tools as he promised the CEO 0 outages.

  • ....people will trust "The Cloud" and continue moving everything to "The Cloud", almost as if nothing had happened.

    Yes, the public really is that conditioned.

  • came on a disk. Ahh .... The good ol days.

    • When was that? Oh wait, you must mean the first version to come out on CD-ROM, gotcha.

      ----
      For you young'uns reading this: Even in the old days before the beginning of Eternal September, Microsoft Office was a "big" program. It came on multiple floppy disks. What's a "floppy disk" you may ask? Well, ask Siri or Alexa, see if they know. What's "Eternal September?" If you don't know, you probably weren't born yet on August 31st.

  • by Rosco P. Coltrane ( 209368 ) on Saturday October 03, 2020 @04:36PM (#60569090)

    "No cloud vendor is immune to downtime,"

    And that's why you should stay away from the cloud if you value your uptime. That way you don't have unknown code monkeys disrupting your company's operations willy-nilly, and you have the ability to do something about it when shit happens.

    Even Microsoft knows it. They ain't stupid: they don't put their stuff in the cloud :)

    • And that's why you should stay away from the cloud if you value your uptime

      The cloud has better uptime than my local hardware.

      • by King_TJ ( 85913 )

        Yeah.... I'm really not a fan of this move to cloudify everything. But I have to also admit that the things people are demanding today from their application software are a bit different than what we had to support traditionally in I.T. Primarily, there's this desire to collaborate on documents as a team.

        I don't see how anyone is realistically going to say they'd rather keep everything on their local hardware and still support ability for a large group to open/view a spreadsheet or other document from var

      • by gweihir ( 88907 )

        And that's why you should stay away from the cloud if you value your uptime

        The cloud has better uptime than my local hardware.

        My local hardware has redundancy the cloud seems to lack. Even on a power-outage, I can still work for a few hours.

      • If you run your own HW, you can track down the idiot/incompetent boob/scapegoat and make an example of him. Microsoft looks dimly on outsiders firing their own people. That's something they'd rather do.
    • You're silly. Your local servers will have downtime too. How about this, you're probably not saving babies, your shit isn't that important and there will be outages. Continue your work when the stuff is back up.

      • At least if it's your servers, you can do something about it. With some of these large cloud vendors, you're just one customer of many, and if you aren't a big or important one then their response can end up being along the lines of "yeah, yeah, when we get around to it....".

        • No, they have service guarantees. If it's your servers you can wait for repair or for a major backbone provider to unfutz itself (like the recently renamed "Lumen", formerly known as CenturyLink, that is trying to get away from their shame of dropping 3.5% of the globe's internet last month).

      • > You're silly. Your local servers will have downtime too. How about this, you're probably not saving babies, your shit isn't that important and there will be outages. Continue your work when the stuff is back up.

        We have redundant fail-over machines on mission critical hardware.
        UPSes and after 2 min a big diesel generator kicks in. Spare jerrys of fuel.
        2 separate internet connections running on different pipes.

        No we do not save babies, but people do rely on us, and many do go to the hospital, doc
        • Yes, I've worked at such places too that had all that, and they still sometimes had outages for various reasons. Your failover machines are at the same place, or merely two locations? You're going to go down at times. No better than cloud for uptime.

  • continuous delivery of code with high software quality

    Throwing more developers at a project does not make it go faster after a certain point.

    • continuous delivery of code with high software quality

      Throwing more developers at a project does not make it go faster after a certain point.

      Fred Brooks expressed that idea well in The Mythical Man-Month https://en.wikipedia.org/wiki/... [wikipedia.org] .

      • by gweihir ( 88907 )

        continuous delivery of code with high software quality

        Throwing more developers at a project does not make it go faster after a certain point.

        Fred Brooks expressed that idea well in The Mythical Man-Month https://en.wikipedia.org/wiki/... [wikipedia.org] .

        It has also been proven in practice time and again by now. Yet most "managers" are unaware of this basic fact.

  • by PPH ( 736903 )

    I remember the good old days. When I could turn to my boss and say, "Sorry. I can't finish that assignment. The mainframe is down."

  • That's what makes it cloudy. All these companies are doing the exact opposite and putting everything on the same service from the same location, under the same tornado that just blew the roof off. Say goodnight...

  • Now they know how end users feel every time a Windows 10 update is pushed out which borks their system and they have to spend time trying to figure out how to get their system stable and usable again.

    I would suggest doing testing before they roll code out into production systems, but like KISS, we don't do that here.

    • It's great for certain uses. However, I did find the charting of spreadsheet data to be either very broken or hard to use.

  • No cloud vendor is immune to downtime.

    This is completely true, no matter what the cloud vendor does by way of reliability and resiliency. Which is why your applications should absolutely never be dependent on a single cloud vendor (unless you like uncontrollable outages).

  • "A distributed system is one in which I cannot get something done because a machine I've never heard of is down."

    Whoever thought the cloud would be a good idea for mission-critical software is a fool and unaware of history.

    • Whoever thought the cloud would be a good idea for mission-critical software is a fool and unaware of history.

      That would be Accounting, they share with Middle Management. First elevator on the left, floors 13-18. Pick one and have a nice rest of your day...

      • by gweihir ( 88907 )

        Whoever thought the cloud would be a good idea for mission-critical software is a fool and unaware of history.

        That would be Accounting, they share with Middle Management. First elevator on the left, floors 13-18. Pick one and have a nice rest of your day...

        Ah, yes. That place of the building that, if it falls off the face of the earth, first nobody notices, and then people begin to find things are working suspiciously well.

  • Seriously, I found a Like button in the web interface of Outlook for each received message; it was never there before. Do they count the mail messages that users liked? Do they build lists of pleasing mailings for the receiver? Do they return a notification to the sender that their message was liked?

    Too big a break from the traditional concept of mail to comprehend easily.

    Then, first messages in the subject threads were not appearing this week, another illustration of basic functionality getting broken, while unnecess

  • I use LibreOffice instead of Microsoft Office since I retired. I had Microsoft Office but deleted it. For my needs now LibreOffice works just fine and I never have any "downtime" to deal with.

