Microsoft IT Technology

Microsoft Hits Back at Delta in Clash Over System Breakdown (bloomberg.com) 166

Microsoft said Delta Air Lines turned down repeated offers for assistance following last month's catastrophic system outage, echoing claims by CrowdStrike in an increasingly contentious conflict between the carrier and its technology partners. From a report: Microsoft employees reached out to Delta to give technical support every day from July 19 through July 23, and "each time Delta turned down Microsoft's offers to help," according to a letter Tuesday from the technology giant's attorneys to Delta's representatives. Microsoft Chief Executive Officer Satya Nadella also personally emailed Delta CEO Ed Bastian and never heard back. "Even though Microsoft's software had not caused the CrowdStrike incident, Microsoft immediately jumped in and offered to assist Delta at no charge," according to the letter, which was signed by Mark Cheffo of Dechert LLP. The claims, in response to Delta's hiring of attorney David Boies, heighten the tension after Delta suggested it would try to seek compensation for a breakdown it expects to cost it $500 million this quarter. The airline was slower to recover than competitors after an errant software update from CrowdStrike affected Microsoft systems, creating a cascading effect that led Delta to cancel thousands of flights over several days.
This discussion has been archived. No new comments can be posted.

  • Makes me wonder (Score:5, Interesting)

    by jenningsthecat ( 1525947 ) on Tuesday August 06, 2024 @11:23AM (#64685304)

    What might Delta have to hide, that they turned down clear offers from a company that was in a position to provide good, useful help? To me this smells like something a bit shadier than garden-variety c-suite incompetence.

    • Re:Makes me wonder (Score:4, Informative)

      by nightflameauto ( 6607976 ) on Tuesday August 06, 2024 @11:31AM (#64685322)

      What might Delta have to hide, that they turned down clear offers from a company that was in a position to provide good, useful help? To me this smells like something a bit shadier than garden-variety c-suite incompetence.

      Or it could be as simple as a contact flub. Some support personnel at Microsoft makes the auto-generated call request to Delta, gets the wrong support monkey and the conversation is as follows:

      "This is Microsoft support contacting you regarding incident (blah blah blah) and checking if you need any assistance."
      Support Monkey that's on the phone, because they aren't involved in attempting to rectify the current clusterfuck: "Uh, I don't see anybody around to ask, so I guess we're good?"
      "OK." Click.

      I mean, I'm sure it could be something nefarious, but assuming so seems counter to the way the real world works. It's probably just too many levels of people too separated from one another to have any idea WTF is going on as it's happening. A post-mortem would be interesting, but being all private companies I doubt the public will ever find out the truth.

      • Re:Makes me wonder (Score:4, Insightful)

        by DarkOx ( 621550 ) on Tuesday August 06, 2024 @11:40AM (#64685346) Journal

        It's also not always simple to just bring in outside people. What do policies say about exposing confidential data to non-employees? Depending on the industry, there may be legal or regulatory hurdles.

        Keep in mind too the decision makers who are in a position to suspend such policies, are very busy with aspects of crisis management. You might be some Director in IT, getting a call from Microsoft Support, the VP you need to allow you to "read them in" is currently doing CNN interviews..

        Nothing is easy to do in the middle of that kind of shit show. That is WHY everyone makes disaster recovery run-books, etc. Which logically don't include "and then Microsoft hopefully calls us and offers help."

        • by tysonedwards ( 969693 ) on Tuesday August 06, 2024 @11:45AM (#64685374)
          It is also worth calling out: we sent them an email asking if they needed help - while they were suffering from a multi-day long full system outage - and they did not respond. It seems very plausible that the people involved were very busy, and the normal communication channels were impacted.
          • Aw, there you go raining on my conspiracy theory. Makes sense though.

          • by Bahbus ( 1180627 )

            They were only very busy because the people are incompetent. So incompetent that they can't even check their emails on any other device except their work computers.

        • It's also not always simple to just bring in outside people. What do policies say about exposing confidential data to non-employees? Depending on the industry, there may be legal or regulatory hurdles.

          Keep in mind too the decision makers who are in a position to suspend such policies, are very busy with aspects of crisis management. You might be some Director in IT, getting a call from Microsoft Support, the VP you need to allow you to "read them in" is currently doing CNN interviews..

          Nothing is easy to do in the middle of that kind of shit show. That is WHY everyone makes disaster recovery run-books, etc. Which logically don't include "and then Microsoft hopefully calls us and offers help."

          Makes perfect sense - no idea why you were modded down.

      • Or it could be as simple as a contact flub.

        According to the blurb, the head of Microsoft emailed the head of Delta and never heard back.

        I mean, I'm sure it could be something nefarious,

        Such as incompetence? Stupidity? Ego? Pride? Take your pick.
        • by gtall ( 79522 )

          Head of Delta: Hmmm....a message from the Head of Microsoft. Looks like a phishing attempt to me. Underling, has anyone from MS called me?

          Underling: Nope, we checked the logs, nada.

          Head of Delta: Well, if he wants to get a hold of me, he can just use a phone like a normal person instead of sending something as dubious as an email message. Underling, if we get any calls from MS, make sure you find out who it is and we will call them back, we cannot trust an AI-infected world anymore.

        • Or it could be as simple as a contact flub. According to the blurb, the head of Microsoft emailed the head of Delta and never heard back. I mean, I'm sure it could be something nefarious, Such as incompetence? Stupidity? Ego? Pride? Take your pick.

          Heads of such diverse companies rarely expect contact from one another. For me, not that I'm a CEO, but anything from a higher up at Microsoft goes right to the spam filter. In fact, Microsoft's spam filter would probably catch it before mine did.

          • Yes, but this is microsoft and delta. I'm sure ceos had multiple contacts in the course of developing their relationship. I'm sure they're in each other's mobile. I'm also sure, if they really thought it was a microsoft failure, ceo would be reaching out with a big ol WTF?

        • by taustin ( 171655 )

          Or an assumption that it was a phishing attack.

        • by klubar ( 591384 )

          Probably emailed something like info @ delta.com or executives @ delta.com. Their call center is probably overwhelmed with emails to executives about refund issues and cancelled flights.

          There probably is a senior Microsoft account executive who has a real email address for the Delta CTO.

    • by Anonymous Coward on Tuesday August 06, 2024 @11:34AM (#64685324)

      The eight most terrifying words in the English language are "We're from Microsoft, and we're here to help."

    • What can MS do in this situation? It all comes down to man hours. Delta has who knows how many thousand machines and every one needs a person to sit there and enter the bitlocker code so the machine can boot from a flash drive.

    • Re: Makes me wonder (Score:4, Informative)

      by topham ( 32406 ) on Tuesday August 06, 2024 @11:56AM (#64685412) Homepage

      If Delta is seriously considering suing they cannot bring in Microsoft or CrowdStrike like this. You don't invite the fox into the henhouse.

    • Microsoft wouldn't have offered to do it for free. What I'm wondering: what was the price tag?

    • Comment removed based on user account deletion
      • by gweihir ( 88907 )

        Exactly. The "argument" MS is trying to make here is completely worthless. They are just posturing for the press.

      • by dnaumov ( 453672 )

        Or it could simply be that Delta, holding Microsoft at least partly to blame for creating the problem, didn't want to take a chance that Microsoft would exacerbate rather than ameliorate their difficulties?

        One could only hope Delta is not THAT much of a trainwreck of a company to have a person holding these kinds of opinions in a position of any sort of power.

        This was not and is not, at any point, "a Microsoft issue".
        You are beyond probably even professional medical help if you genuinely believe the people who designed the OS will somehow make your situation worse.

    • by XXongo ( 3986865 )

      What might Delta have to hide, that they turned down clear offers from a company that was in a position to provide good, useful help?

      "Microsoft support" calls me every couple of weeks telling me that they can fix my computer over the phone. Even though I don't have a Microsoft computer. Don't think it's good, useful help, though.

      They probably just hung up on them.

    • by tlhIngan ( 30335 )

      Perhaps Delta's infrastructure is in worrying shape - turning down Microsoft and CrowdStrike might be a way to avoid showing third parties how bad it actually is.

      I mean, there is usually an order to restoring services - usually you'd get the Active Directory server up and running first as directory services is foundational to getting the rest of the systems active and permissions and such.

      But then, Microsoft and CrowdStrike will likely ask - where are your AD servers? Are you using a cloud AD as well (which

    • What might Delta have to hide, that they turned down clear offers from a company that was in a position to provide good, useful help?

      Their own hubris? I've worked for plenty of companies who think they know better than vendors when it comes to the vendor's own products.

  • Fingers be flying! (Score:5, Insightful)

    by RitchCraft ( 6454710 ) on Tuesday August 06, 2024 @11:28AM (#64685316)

    All three companies are at fault here. Delta for having an insufficient IT infrastructure and disaster recovery plan. Microsoft for its abysmal operating system offering a vector for these types of catastrophes. CrowdStrike for its incredible ineptness and handling of updates and QA procedures. Hell, big tech in general is an absolute shit show and needs regulation now or this is just going to keep happening. This was a big storm but the perfect storm is still on the horizon heading our way.

    • by gtall ( 79522 )

      Bingo, too many people these days assume there is a single cause for something, a cause they can honk on about 'til we are all sick of hearing from them.

    • by gweihir ( 88907 )

      Exactly. But the human race has this inability to prevent large-scale catastrophes due to abject stupidity. Hence we will get that urgently needed regulation only when the world experiences a week (or more) of shutdown of _everything_. All that crap built on MS trash and around it is incredibly fragile.

    • Microsoft for its abysmal operating system offering a vector for these types of catastrophes.

      By offering a vector this means allowing kernel drivers to be installed by third parties. I don't agree that places Microsoft at fault for anything. A bug in the Falcon kernel module for Linux could easily have resulted in the exact same outcome.

    • by dnaumov ( 453672 )

      All three companies are at fault here. Delta for having an insufficient IT infrastructure and disaster recovery plan. Microsoft for its abysmal operating system offering a vector for these types of catastrophes.

      How about you stop with the ridiculous FUD? If Microsoft is to blame for offering "a vector for these types of catastrophes", just how much do you hate Linux? After all, you do realize Linux offers a whole ton MORE of these kinds of vectors and they are publicly available to anyone?

    • Microsoft for its abysmal operating system offering a vector for these types of catastrophes.

      Yeah sure. Let's just lock you out of your own system so that you can't ever install any software that messes with your machine even if you're an administrator on that machine. That's what you want, isn't it? A walled garden where you're protected from yourself?

  • They had a partnership with MS and had WinMo devices for all their cabin crew. That died. I think there is no love here.
  • by TheNameOfNick ( 7286618 ) on Tuesday August 06, 2024 @11:37AM (#64685344)

    We have detected a problem with your machiine.

  • by Murdoch5 ( 1563847 ) on Tuesday August 06, 2024 @11:44AM (#64685368) Homepage
    Microsoft can't claim innocence, they absolutely were the cause of the issue. They might not have been the root cause of the issue, but their software shit the bed, and locked up. Crowd Strike released a buggy update, we all accept that, but Windows should have handled it better.

    I'll give credit that maybe Microsoft didn't know this was possible, or, some safety failed, or any number of weird IT / Developer voodoo happened. Even if Microsoft had no knowledge of lockup being possible, they still caused it.

    The best thing Microsoft can do, own it. Microsoft needs to jump in front of the bullet, admit they share at least 50% of the blame, and provide a real solution going forwards. Microsoft should take this event and turn themselves around, they have an excellent open door to move from hobby / creepy uncle peephole software, to a professional powerhouse software company.

    Why should Delta have accepted the offer of help? Has anyone honestly gotten even passable customer service from Microsoft? Look at how they're handling the issue after the fact, and given that they can't accept ownership, do you think they would have actually helped?
    • Re:Not really (Score:4, Insightful)

      by ArchieBunker ( 132337 ) on Tuesday August 06, 2024 @11:59AM (#64685416)

      CrowdStrike was writing equally shitty kernel modules that crashed Linux. https://www.theregister.com/20... [theregister.com]

      How do you mitigate against third party kernel modules other than outright denying access?

      This would have absolutely happened if Delta had 10,000 instances of Red Hat.

      • I'm not going to suggest this is a Windows-only issue, it's not, but Microsoft's response has been terrible. How do you protect against bad data in the kernel? Well, how do you protect against bad data anywhere else? Now that the problem is known, Microsoft should add additional protections to ensure that in such a case going forwards, Windows mitigates it and keeps running in a usable state. If Windows had just disabled Crowd Strike, because of the broken service, then fair enough, they'd be clear and clean.
      • Re:Not really (Score:4, Informative)

        by test321 ( 8891681 ) on Tuesday August 06, 2024 @12:15PM (#64685508)

        The crash on linux was a linux bug that apparently is now corrected. From your link:
        "Updated to add on July 24
        We understand now that CrowdStrike's software on Linux crashed due to a kernel bug involving BPF, which will need to be patched as per advisories from distro makers. Falcon Sensor code running at the kernel level was not affected; code at the user level using BPF to do its work was affected."

        • Which one? There's two listed in TFA, one is related to an eBPF call that was a kernel bug, and the other is related to loading of a kernel module. Or do you think that it's not possible to kernel panic a Linux machine by loading a kernel module?

          That's the whole point of the discussion. Do you want the ability to run code on your machine, or do you want your hardware to be a walled garden of code blessed by the OS vendor (though reading through the comments here people are expecting less walled garden and m

      • Re: (Score:2, Insightful)

        by gweihir ( 88907 )

        Only that on Linux, this did not prevent boot and you had a few minutes to fix things. Not comparable.

        • Pretty sure a kernel panic will interrupt your boot process.

        • Only that on Linux, this did not prevent boot and you had a few minutes to fix things. Not comparable.

          Depends on which one you're talking about. Crowdstrike has had a history of multiple Linux related crashes and one of them definitely happened on boot, literally right as their kernel module (you know kernel modules right? as in low level code capable of fucking up anything) was loaded.

          Even if you were right and one of the Linux cases didn't result in a kernel panic on boot, are you actually crediting them for what was pure luck, claiming it wouldn't happen when it did?

    • I read that a very similar thing happened to Debian systems a few months back. It doesn't seem Windows specific, just more systems impacted.

      • Yes, something happened on Linux, but that doesn't change anything. The Linux Kernel runs into a problem, anyone can go in and provide a fix, and hopefully they have, but Microsoft is the only one who can fix their side, and they're still refusing to accept accountability.
        • The larger problem here was still having to enter a 48 digit bitlocker key on every affected system. Yeah MS or CrowdStrike had a quick fix but Delta had to have people physically enter a unique key on every machine. Everything from scheduling to the signage displays at airports.

          Delta has the biggest blame here for allowing every single box to auto update at once.

          • Sure, Delta should have run a smaller test update, but Windows still shouldn't have locked up the way it did.

            BitLocker is one of those software products that you honestly have to wonder how anyone thought it was a good idea. BitLocker is a mess of an encryption software, it's trigger-happy, it's unstable, and it can enable itself with little warning about what should do. Being fair, again, if you had to reboot a Linux system with LUKS active, you'd need to enter the password, but at least then you have r
            • Sure, Delta should have run a smaller test update, but Windows still shouldn't have locked up the way it did.

              This is incorrect. Windows did the correct thing by catching the unhandled exception in a kernel mode driver and halting operation. Continuing after such a failure has occurred would be wholly unacceptable behavior for a general purpose operating system.

              • I disagree, it should have halted the service / driver, since it's third party, and kept on going.
                • I disagree, it should have halted the service / driver, since it's third party, and kept on going.

                  This behavior would be disastrous as ignoring exceptions would compromise the integrity of the system. In this case it is doubly so because the halted service imposes security related operational constraints that would no longer be enforced.

                  • Not really. As I just stated in another thread, take the related subsystems off-line to make sure you've failed safe, and then at least you're booted, bootable and people can easily fix the issue without having to jump through hoops. I can see why disabling security software is a bad idea, but not disabling it, in this case, was the major issue.
        • by Targon ( 17348 )

          The fix would be to remove CrowdStrike from the drivers of the machine, but that would require access to the filesystem. Now, if you had an encrypted filesystem for Linux, you'd have the same issue of, "can't fix it if you are loading drivers into the kernel that are bad".

          • You can protect the run space, so that the driver failing doesn't take out everything. Regardless of how it happened, Microsoft has made no honest, direct statement that they'll prevent the underlying issue from happening again.
      • by gweihir ( 88907 )

        Not actually similar. On Linux you could still boot and then fix the issue.

      • by bill_mcgonigle ( 4333 ) * on Tuesday August 06, 2024 @12:53PM (#64685660) Homepage Journal

        Linux had a bug, Windows has a fundamental design flaw.

        That's markedly different.

        Microsoft would have to resurrect the NT 3.51 driver model that they ditched.

        It's possible that everybody would prefer this in 2024, especially for servers, on modern hardware, especially with worker architectures and distributed databases.

        The gamers could keep their dangerous drivers for performance.

        This is basically what has to happen at this point. Even using virt assist with drivers now.

        Crowdstrike should move to Win-eBPF which they use on linux.

    • by bill_mcgonigle ( 4333 ) * on Tuesday August 06, 2024 @12:48PM (#64685648) Homepage Journal

      > I'll give credit that maybe Microsoft didn't know this was possible

      Nah - NT 3.51 had separation between drivers and kernel for robustness. It was roughly GUI VMS and got an Orange Book certification.

      NT4 moved drivers into kernel space for performance.

      • Ya, but they might have thought it would handle itself better, so I'll still give them some wiggle room. Regardless, they still have to fix the underlying issue.
    • Microsoft can't claim innocence, they absolutely were the cause of the issue. They might not have been the root cause of the issue, but their software shit the bed, and locked up.

      I'll give credit that maybe Microsoft didn't know this was possible, or, some safety failed, or any number of weird IT / Developer voodoo happened. Even if Microsoft had no knowledge of lockup being possible, they still caused it.

      This makes no sense. Why would Microsoft be responsible for failures in other people's code? Windows did exactly what it was designed to do: catch an unhandled exception in someone else's flawed kernel code and halt execution.

      • They're not responsible for other people's code, they're responsible for their kernel. The kernel shit the bed, and it might have shit the bed because of someone else, but now Microsoft have to make sure in the future the same kind of error, only takes out the offending product.
        • They're not responsible for other people's code,

          Other people's code caused the problem.

          they're responsible for their kernel. The kernel shit the bed, and it might have shit the bed because of someone else,

          The kernel did not fail. It did exactly what it was designed to do.

          but now Microsoft have to make sure in the future the same kind of error, only takes out the offending product.

          Let's for the sake of argument assume Microsoft snapped its fingers and got rid of third party kernel drivers altogether or went with some sort of radically different microkernel architecture able to fully isolate consequences of failure.

          Is it your contention the acceptable course of action would have been for the system to continue operation without the benefit of the failed security software? It shoul

          • What should have happened is the system disabling Crowd Strike, and then carrying on like normal. If carrying on like normal is too risky, then disable the related subsystems like networking, but keep the system up and bootable. The entire issue is that Microsoft entered a failed state where nothing worked, and where the solution required jumping through a lot of headache.
            • What should have happened is the system disabling Crowd Strike, and then carrying on like normal.

              Uncaught exceptions are fatal in kernel space. The only possible correct answer is halting operation.

              One could simply ignore reality and instead argue the equivalent of while approaching a cliff vehicles should automatically sprout wings and fly to safety rather than plummeting to its doom. Never mind this was never part of the deal and nobody was ever at any point in time confused that it was.

              If carrying on like normal is too risky, then disable the related subsystems like networking, but keep the system up and bootable.

              Well shit if only the networking stack was disabled Delta wouldn't have suffered any downtime.

              The entire issue is that Microsoft entered a failed state where nothing worked, and where the solution required jumping through a lot of headache.

              Yea like totally...

              • What's your argument exactly? That Windows should have bricked itself, because in 2024 Microsoft can't handle a simple memory exception or driver failure? The best option was and is to disable the offender, Crowd Strike would have been disabled, and anything stemming from that, would be the fault of Crowd Strike.

                At least then the systems are up, and debugging the issue is simpler than locking into a blue screen and bleeping the system access. However, either way, what happened, happened and now Micr
                • What's your argument exactly?

                  I think I've made myself clear the only acceptable response to an uncaught exception in kernel mode is to halt operation.

                  That Windows should have bricked itself

                  Windows didn't brick itself.

                  because in 2024 Microsoft can't handle a simple memory exception or driver failure?

                  That's right it can't and neither can Linux. The reason for that is kernel level operations (including code executed by third party kernel drivers) have global access and so when something unexpected happens there is no way to isolate the consequences or reason about the implications of the unexpected failure. Your choice is to cross your fingers and hope for

    • The software has to lock up. A security program like CrowdStrike needs to run at Ring zero to be effective. And if it does that and it does something it's not supposed to, then it's just as likely that the software has been compromised as that it's just defective. As a result, for this kind of software under normal operating parameters you would want it to blue screen rather than just keep humming along.

      The one at fault is crowd strike for not testing. Now I suppose the argument could be made that the compa
  • Delta was forced into this position. why should they be further forced to accept help from the companies that fucked up their systems in the first place???

    • by Bahbus ( 1180627 )

      Delta drunkenly walked their own asses into this position. And if one of the companies is responsible, offers free help, and I refuse it or ignore it, then I don't deserve lawsuit money from them either. Everyone that works at Delta is a grade A moron.

  • by smooth wombat ( 796938 ) on Tuesday August 06, 2024 @11:48AM (#64685384) Journal

    All those calls made by Microsoft should be preserved online for everyone to see. The email Nadella sent should be posted online (minus the final email address). Microsoft should be showing everything they did to help and Delta either refused or ignored them.

    While not a fan of Microsoft, in a case like this it is entirely believable they did what they could to offer assistance. Showing their efforts would put everything to rest in a heartbeat and throw it back on Delta to explain their side of the story.

  • Microsoft: it's your own fault, you should have bought even more of our crap.

  • They turned down the offers because the only solution available to them, as far as I know, was to manually update each affected machine with some physical presence. Not sure they can accept much help in that self-inflicted case, right? Or did I miss something here?

    • by Targon ( 17348 )

      I don't know about Delta, but the servers would have been primarily virtual machine based, and those wouldn't need physical presence to get to. Individual workstations would be the next thing to work on, but one person per location could get it done in a reasonable amount of time. Even with Bitlocker, as long as the keys were available, getting through to knock out the CrowdStrike drivers and remove them doesn't take THAT long.

      I'm not saying the effort is trivial, but the, "how do you respond and recove

  • Offering help after you cause the failure means nothing for liability. That's like lighting a building on fire due to negligence and saying it's not your fault because you offered to help put out the fire after the building has burned to the ground.
    • The liability is a trickier issue here, actually.

      There is liability on several sides.

      CrowdStrike is certainly liable for the initial issue, but they had a fix issued within a couple hours. News reports have said about 60% of Fortune 500 businesses had outages, and more than half had resolved the outages within 12 hours, most within 24 hours. The other airlines around the globe had resolved the technical outage on a similar scale, about 12-24 hours depending on the business to get the technical side figure

  • If Microsoft's security was better, would a third party security solution be needed?
  • Our 55-employee company was hit too. We never got any offer to help from Microsoft.

    I guess Microsoft only reaches out to help customers when they're big enough to generate bad PR for Microsoft.

    • Well, yeah. It boils down to exposure: Delta going off-line for an extended time is going to cost Delta - and MS, by extension - a metric shit-ton of money. Your 55 person company, while very important to the 55 employees and their customers, won't have the same impact. Can you even afford to sue MS for the lost productivity and related costs for the downtime?
  • "Satya Nadella also personally emailed Delta CEO Ed Bastian and never heard back"

    Our CEO has been WhatsApping random employees with "I need your attention for a quick task"
    Everyone's ignoring him, because security.

  • I buy a fleet of Fords equipped with a LoJack system that disables the car when reported stolen. This LoJack has a flaw which could fry the Ford's electrical system. Something happens at LoJack, and not only are all my cars disabled, but their electrical systems are fried too. Should I just go after LoJack, or should I go after Ford too because they "should've built an absolutely bulletproof unfryable electrical system"? All OSes have vulns, save for the very bare metal minimalist "must be 100%" OSes that
    • - and the LoJack was something I have installed as an aftermarket item without Ford even knowing about it, just to tie up loose ends.
