
Seattle Data Center Outage Disrupts E-Commerce

Posted by ScuttleMonkey
from the no-sigmas-for-you dept.
1sockchuck writes "A major power outage at Seattle telecom hub Fisher Plaza has knocked payment processing provider Authorize.net offline for hours, leaving thousands of web sites unable to take credit cards for online sales. The Authorize site is still down, but its Twitter account attributes the outage to a fire, while AdHost calls it a 'significant power event.' Authorize.net is said to be trying to resume processing from a backup data center, but there's no clear ETA on when Fisher Plaza will have power again."

  • by Cysgod (21531) on Friday July 03 2009, @01:29PM (#28573285) Homepage

    Apparently Verizon also has a single point of failure in this building for much of its FiOS service in the metro areas of western Washington state, so FiOS customers are offline right now too.

    • Clownshoes: Have no failover plan and be singly homed.
    • Meh: Have a failover plan.
    • Good: Have a failover plan that requires humans and exercise it regularly.
    • Better: Have a failover plan that is automated and exercise it regularly.
    • Best: Eliminate single points of failure so failover is turning off the flake or fail and going back to drinking a beer.

    Hot/Hot is always a more ideal solution than Hot/Warm or Hot/Cold for disaster recovery (and increasing equipment utilization/ROI), and this event demonstrates why.
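
    As a rough illustration of the hot/hot idea, here is a minimal Python sketch. All names in it (healthy_sites, pick_site, the site labels, the probe callback) are hypothetical and not taken from any system mentioned in this thread. The assumption is that every site serves live traffic, so a failed site simply drops out of the rotation and nothing has to be "switched over".

```python
from typing import Callable, Iterable, List, Optional


def healthy_sites(sites: Iterable[str], probe: Callable[[str], bool]) -> List[str]:
    """Return every site whose health probe passes, in the original order."""
    alive = []
    for site in sites:
        try:
            if probe(site):
                alive.append(site)
        except Exception:
            # A probe that raises is treated the same as a failed probe.
            continue
    return alive


def pick_site(sites: Iterable[str], probe: Callable[[str], bool],
              request_id: int) -> Optional[str]:
    """Spread requests round-robin across all healthy sites (hot/hot).

    Returns None when no site is healthy, i.e. a total outage.
    """
    alive = healthy_sites(sites, probe)
    if not alive:
        return None
    return alive[request_id % len(alive)]
```

    With a probe that marks one site as down, every request lands on the remaining site; with a probe that fails everywhere, pick_site returns None, which is the case where both the primary and the backup are affected.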

  • by Cysgod (21531) on Friday July 03 2009, @01:39PM (#28573381) Homepage

    From the Twitter comments, it looks like Verizon has finished its failover, since people's FiOS is coming back now.

  • Re:Heh (Score:3, Informative)

    by Anonymous Coward on Friday July 03 2009, @01:41PM (#28573395)

    It's interesting how many companies assume they have redundancy in place but never take the time to test it properly. They figure that once a disaster happens, everything will automatically work because their vendor or staff said so. To achieve true redundancy, a company needs to test semi-frequently to ensure that everything is working properly. Authorize.net might have had what it assumed was a redundant system in place, but once the disaster happened they soon realized their system wasn't designed or configured properly. It is expensive and time-consuming to test redundancy, let alone to pay for the redundant equipment, staff, etc., but times like this show how one gets one's money's worth by doing so.

  • by Anonymous Coward on Friday July 03 2009, @01:41PM (#28573405)

    Fisher Plaza is supposed to be a regional telecomm / communications / medical care hub for the Seattle area. It was designed and built to *not* crash, even in a magnitude 9.5 quake. Sounds like they've got work to do ...

  • System failure (Score:5, Informative)

    by ErkDemon (1202789) on Friday July 03 2009, @01:44PM (#28573431) Homepage
    There are four main factors that can take a part of a society's key infrastructure offline.

    1: ACTS OF GOD
    Meteor strike, lightning strike, extreme weather ...

    2: ACTS OF MALICE
    War, terrorism, extortion, employee sabotage, criminal attacks ...

    3: WEAK INFRASTRUCTURE
    Underpowered networks, inadequate UPS backups, skeleton staffing, the shaving of safety margins as an efficiency exercise, inadequate rate of replacing old hardware ...

    4: MANAGEMENT ARSINESS
    This is when a problem starts, and the people in charge either don't know how to react, don't care, or prioritise face-saving over actual problem-solving. This happens when you get an outage, and instead of system management promptly calling all their critical clients to inform them, and warn them that there's maybe twenty minutes of UPS capacity in the routers if the system's not fixed by then, they instead cross their fingers and hope that things'll work out, and worry about what to tell the clients afterwards.

    Fisher Plaza seems to have suffered from a case of #4 recently, so it's not surprising that they've gone down again. The first time should have been the wakeup call to show them that their human systems were in need of an overhaul. Without that overhaul, you're setting up a dynamic in which the second time it happens, things are even worse (because now people are locked into defensive mode).

    No matter how advanced your technological systems, if the people running it have the wrong mindset, you're gonna go down. And when you go down, you're gonna go down far far harder than necessary.

  • by johnncyber (1478117) on Friday July 03 2009, @01:46PM (#28573461)
    ...except it failed as well. From their twitter:

    "@gotwww The backup data center was impacted too. Don't have info as to why. The team is solely focused on getting us back up for now."
  • Geocaching.com too (Score:5, Informative)

    by dickens (31040) on Friday July 03 2009, @01:56PM (#28573555) Homepage

    And on a holiday. Bummer. :(

  • by Anonymous Coward on Friday July 03 2009, @01:57PM (#28573559)

    Not just FiOS, it looks like; I was wondering why my DSL was offline. Nearly all network services, I would guess.

  • by PPH (736903) on Friday July 03 2009, @02:17PM (#28573741)

    ... whose broadcast facilities reside in this building (they were broadcasting from a park on Queen Anne hill this morning), it was due to a transformer vault fire. The resulting sprinkler operation rendered their backup generator inoperable.

    Being in the power biz, I'd say this sort of thing is to be expected in typical office buildings. Sometimes the power goes out. Live with it. What really puzzles me is how someone can take such a structure, install a raised floor and some big A/C units on the roof, and sell it as a data center. This kind of crap goes on all the time; I've even seen purpose-built data centers go down because of single points of failure.

  • by Achromatic1978 (916097) <robert AT chromablue DOT net> on Friday July 03 2009, @04:36PM (#28574853)
    Come on, the guy's sig is a link to some comic rant about "its versus it's" which, whilst it annoys me no end, is most definitely a good indicator that he is, no doubt, an insufferable pedant.
  • Not the first time (Score:1, Informative)

    by Anonymous Coward on Friday July 03 2009, @05:11PM (#28575151)

    This is the 2nd fire since 2008... Apparently Internap rent the power from the building so they have no control over the quality/maintenance of these generators and UPSes.

    The fire, which started around 11:30 PM (or maybe earlier, but the first signs were around that time), badly damaged some of the electrical risers, so they are unable to get power back to some parts of the datacenter. According to their last update, they're getting external generators to bypass the damaged equipment and power up the rest of the datacenter, which should be completed late this evening... At best it's going to be a nearly full-day outage for some of their customers.

  • by funkboy (71672) on Friday July 03 2009, @08:51PM (#28576641) Homepage

    An auto-switching power Y-cable with two inputs and one output? I've never seen or heard of these. Do you have a manufacturer or part number?
    I'd definitely like some.

    Well, it ain't just a Y cable and they're not super-cheap, but still affordable if you're running anything that needs anywhere near the level of redundancy that they provide.

    It's called a static transfer switch [apc.com] and can be had for a few hundred bucks from most APC dealers (and MGE dealers, now that the merger is complete).

    What's nice about them is that unlike a UPS, colo providers don't mind if you stick an STS in your rack, as a UPS removes the colo provider's ability to completely shut off everything in the datacenter with their automated power systems if the shit really hits the fan (trust me, if there's a fire in the datacenter, you'd much rather have your servers suffer a cold shutdown than sucking in smoke and FM200 and all the other tasty stuff in the air, not to mention fanning or even directly contributing to an electrical fire if it's in your rack). An STS still enables them to completely kill the juice in an emergency while providing good & economic redundancy for single-feed machines, not to mention being close to 100% efficient.
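
    The feed-selection behavior of a transfer switch can be sketched in a few lines of Python. This is a toy model only, not APC's firmware or any real product API: the names feed_ok and select_feed, the 120 V nominal voltage, and the 10% tolerance are all assumptions chosen for illustration.

```python
from typing import Optional


def feed_ok(volts: float, nominal: float = 120.0, tolerance: float = 0.10) -> bool:
    """A feed counts as usable if it is within +/- tolerance of nominal.

    The nominal voltage and tolerance are assumed values, not vendor specs.
    """
    return abs(volts - nominal) <= nominal * tolerance


def select_feed(volts_a: float, volts_b: float, preferred: str = "A") -> Optional[str]:
    """Choose which input feed drives the single output.

    Stay on the preferred feed while it is usable, fall back to the other
    feed when it is not, and return None when both feeds are dead.
    """
    usable = {"A": feed_ok(volts_a), "B": feed_ok(volts_b)}
    alternate = "B" if preferred == "A" else "A"
    if usable[preferred]:
        return preferred
    if usable[alternate]:
        return alternate
    return None
```

    Because the result is None when both feeds are out of range, nothing in the rack stays powered once the provider cuts both feeds, which is the property a UPS takes away.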

