Google Cloud

How an 'Unprecedented' Google Cloud Event Wiped Out a Major Customer's Account (arstechnica.com) 50

Ars Technica looks at what happened after Google's answer to Amazon's cloud service "accidentally deleted a giant customer account for no reason..."

"[A]ccording to UniSuper's incident log, downtime started May 2, and a full restoration of services didn't happen until May 15." UniSuper, an Australian pension fund that manages $135 billion worth of funds and has 647,000 members, had its entire account wiped out at Google Cloud, including all its backups that were stored on the service... UniSuper's website is now full of must-read admin nightmare fuel about how this all happened. First is a wild page posted on May 8 titled "A joint statement from UniSuper CEO Peter Chun, and Google Cloud CEO, Thomas Kurian...." Google Cloud is supposed to have safeguards that don't allow account deletion, but none of them worked apparently, and the only option was a restore from a separate cloud provider (shoutout to the hero at UniSuper who chose a multi-cloud solution)... The many stakeholders in the service meant service restoration wasn't just about restoring backups but also processing all the requests and payments that still needed to happen during the two weeks of downtime.

The second must-read document in this whole saga is the outage update page, which contains 12 statements as the cloud devs worked through this catastrophe. The first update is May 2 with the ominous statement, "You may be aware of a service disruption affecting UniSuper's systems...." Seven days after the outage, on May 9, we saw the first signs of life again for UniSuper. Logins started working for "online UniSuper accounts" (I think that only means the website), but the outage page noted that "account balances shown may not reflect transactions which have not yet been processed due to the outage...." May 13 is the first mention of the mobile app beginning to work again. This update noted that balances still weren't up to date and that "We are processing transactions as quickly as we can." The last update, on May 15, states, "UniSuper can confirm that all member-facing services have been fully restored, with our retirement calculators now available again."

The joint statement and the outage updates are still not a technical post-mortem of what happened, and it's unclear if we'll get one. Google PR confirmed in multiple places it signed off on the statement, but a great breakdown from software developer Daniel Compton points out that the statement is not just vague, it's also full of terminology that doesn't align with Google Cloud products. The imprecise language makes it seem like the statement was written entirely by UniSuper.

Thanks to long-time Slashdot reader swm for sharing the news.
This discussion has been archived. No new comments can be posted.

  • by DrMrLordX ( 559371 ) on Saturday May 18, 2024 @04:43PM (#64481769)

    They're great at deleting products and services. No surprises here!

    • by Anonymous Coward

      There ALWAYS is a reason things happen.

      • The chance of a business-destroying cloud failure is 100%, given enough time, for every business relying on the cloud.

        All it takes is for Bank of America, Wells Fargo, Citigroup, Goldman Sachs, Travelers Insurance, or any of a host of too-big-to-fail companies to fail due to the cloud.

        If Bank of America cannot vouch that their counterparty hedging billions of low credit loans can pay, Bank of America will have to declare itself insolvent. And the 2008 financial crisis begins again.

        The chain of cross-hedging bet

  • "Google Cloud is supposed to have safeguards that don't allow account deletion, but none of them worked apparently" --- sounds like a major code flaw OR someone who had the access to do it did it. Either way, it's a very bad look for Google.
    • Yep, you're a conspiracy theorist.

      If there were any suspicion on the part of UniSuper that it was intentional, they'd be suing Google for a lot more than actual damages. Those kinds of events *always* leave a trail.

    • by Zarjazz ( 36278 ) on Saturday May 18, 2024 @07:31PM (#64481953)

      I read elsewhere that because it was such a big and complex account there were some manual elements involved in the original account's creation. It was that manually set configuration that eventually led to this clusterfuck and why it was a "one off" event - such things aren't done or allowed anymore. So my guess is that some engineer years ago either bypassed or failed to set some flag(s) that the normal tools would configure.

    • It's called an account manager. Google just finished laying off a few hundred of them.

  • by billybob2001 ( 234675 ) on Saturday May 18, 2024 @04:56PM (#64481793)

    How an 'Unprecedented' Google Cloud Event Wiped Out a Major Customer's Account

    Precedented [slashdot.org]

  • by AcidDan ( 150672 ) on Saturday May 18, 2024 @05:10PM (#64481805)

    Seriously though, I swear Google's corporate ADHD is getting... Oh look here's some AI!

  • by Anonymous Coward

    Never trust that your backup is secure on someone else's server!

    • There are no "backups on a server". What you have "on a server" is a local copy. A "backup" is a copy that is kept off-site, hopefully on a piece of media that is safe from accidental over-writing.

      • by Anonymous Coward

        There are no "backups on a server". What you have "on a server" is a local copy. A "backup" is a copy that is kept off-site, hopefully on a piece of media that is safe from accidental over-writing.

        There is one and only one single requirement that differentiates a backup from a copy.
        That is: a backup taken in the past cannot ever be affected by changes to the source in the present or future.

        If you make separate copies every day, those copies are not copies but backups.
        If you make one copy that is overwritten, for any reason, but especially by being replaced, that is a copy, not a backup.

        The device copying the bits around does not matter. A server can make backups just the same as a desktop, laptop, or

        • In the cloud, you do not have backups, you have redundancies. But if you delete stuff in part a, it will get replicated to part b and c of the datacenter. Depending on how much you pay you may or may not get some local (as in not at the same datacenter) or buy it twice to get geographical redundancies.

  • by redback ( 15527 ) on Saturday May 18, 2024 @05:25PM (#64481825)

    Kinda glad this happened to someone with enough weight to throw around to make Google care.

    Imagine if they did this to your small business. Good luck getting a joint statement from the CEO then.

  • by gweihir ( 88907 ) on Saturday May 18, 2024 @06:15PM (#64481865)

    And if they do crappy stuff and automate things that should never be automated, arbitrary things can happen to your data and infrastructure. All of it vanishing without warning, for example.

  • by PubJeezy ( 10299395 ) on Saturday May 18, 2024 @07:01PM (#64481919)
    What happens when you connect a high trust system to a low trust system? Do you get a medium trust system? Nope. You get a low trust system. What if the C-Suite at Google is a low-trust system?

    Their entire business model is based on an outdated and predatory view on data collection. They quite simply cannot be trusted. Their rap sheet includes 43 offenses and $2 billion in fines: https://violationtracker.goodj... [goodjobsfirst.org]

    Industry-darling tech bloggers keep having tiny conversations about specific products failing at specific times instead of having the real conversation. Corporate structuring and liability shielding have removed any incentive for Google to run an honest and secure business. The security flaw isn't in the software or middle management. Google corporate leadership is a low trust system. Any resource, platform or product connected to Google's board of directors is going to be insecure. That's their job.
  • by Anonymous Coward

    Google PR confirmed in multiple places it signed off on the statement

    Since when do you have to get permission from a company to tell the world about them fucking you over?

  • by spazmonkey ( 920425 ) on Saturday May 18, 2024 @07:29PM (#64481947)

    Replace the term "Cloud" with "Other People's Computers". This cures many of the worst impulses C-level mooks have about cloud storage magically solving all problems.

    • The cloud is what saved them. They had an off-site backup at a different cloud provider.

      If they were working out of an on-premises data center and it failed, they still would have needed an off-site backup to restore from.

      • by sjames ( 1099 )

        True, but on the other hand, they probably wouldn't have accidentally deleted the data center.

    • Replace the term "Cloud" with "Other People's Computers".

      Heaven sounds like a weird place now.

  • I try to make it a habit to download all my bank, brokerage, etc. monthly statements. It's classic recordkeeping just like in the days of putting paper statements in file folders. Those are your records and evidence if and when something like this happens and you need to prove what's yours.

    • Sounds like a hoarding issue. Who keeps all that paper these days? Do you keep every single statement for 10 years? Ugh, no thanks.
    • Completely agree with you Mean Variance !!

      There's a particularly annoying [to me at least] advert running in the UK at present.

      It features an office type staring at a PC alone in the office after hours - (a stereotype of a working all hours banker type) who hears a colleague using a shredder. He leaps up saying "Don't do it, that document's the only proof we've got!" to which his mate replies "We don't need this, everyone's gone paperless these days". The advert finishes with a voice over that TV Licences ne

  • Doesn't the US keep a copy of all Google data?
    The pension fund should ask the US for the missing data.

  • The wrong lesson (Score:2, Interesting)

    by radarskiy ( 2874255 )

    Know what the difference is between this and an internally-run data center failing and taking its local backup with it?

    Nothing.

    There's a lot of smooth-brains that will squawk that "other people's computer" line, which is by far the dumbest thing jwz has ever said.

    The reality is that having your working copy of your data in the cloud is no more or less dangerous than having your working copy on-premises. The danger is having all of your copies in one location.

    Note that in this scenario there were backups at

    • Re:The wrong lesson (Score:4, Interesting)

      by mrfaithful ( 1212510 ) on Sunday May 19, 2024 @06:13AM (#64482561)

      There is one fundamental difference and that's the management of the server configuration and the data it controls. On premises it's your own staff; on cloud it's not Google staff, it's a few thousand lines of Python and YAML or whatever. Someone makes a change, tests it against their test data, gets all the green OKs and pushes it to production. And it promptly screws over a handful of customers whose systems need special attention and processes not properly documented.

      I had this happen to me twice. We have very simple requirements, defaults are usually good enough, but we have the problem that once it's set up we rarely have to talk to the hosting company. It just keeps on trucking. And so nobody is looking at all that old configuration data. And it's all good until their IaaS platform needs updating and we find out that our defaults are not the current defaults and oh shit, your config is hosed.

      This is the reason I want to get away from a managed service. Because their management is concerned about different things to our management and move-fast-and-break-things is at odds with our do-it-once-and-use-it-forever approach.

  • by joshuark ( 6549270 ) on Sunday May 19, 2024 @03:46AM (#64482429)

    And then Google posted that "We take our customer's data seriously and value their patronage..." in a cue from the Microsoft playbook...

    The corporate bafflegab for "Whoops! But it's all your fault..."

    JoshK.

    • by Briareos ( 21163 )

      And then Google posted that "We take our customer's data seriously and value their patronage..." in a cue from the Microsoft playbook...

      I think there's some punctuation missing:

      "We take our customer's data, seriously - and value, their patronage..."

      Maybe tack a Zuckerbergian "dumb fucks..." at the end...

      • Quite. Your point is well taken, indeed.

        One might use a LISP (lith-p) and say "th-eriously"

        "We take our customer's data, th-eriously - and value, their patronage..."

        Yes, good addendum, the mother-zucker comment, or the Enron "asshole" while the mic is still on...

        JoshK.

  • How do you screw that up? Deleting everything permanently? I'm glad the company gave a shit about a real DR plan and used a secondary cloud provider.
