Explosion At ThePlanet Datacenter Drops 9,000 Servers

An anonymous reader writes "Customers hosting with ThePlanet, a major Texas hosting provider, are going through some tough times. Yesterday evening at 5:45 pm local time an electrical short caused a fire and explosion in the power room, knocking out walls and taking the entire facility offline. No one was hurt and no servers were damaged. Estimates suggest 9,000 servers are offline, affecting 7,500 customers, with ETAs for repair of at least 24 hours from onset. While they claim redundant power, because of the nature of the problem they had to go completely dark. This goes to show that no matter how much planning you do, Murphy's Law still applies." Here's a Coral CDN link to ThePlanet's forum where staff are posting updates on the outage. At this writing almost 2,400 people are trying to read it.
This discussion has been archived. No new comments can be posted.

  • 9000::7500?

    So I guess a "customer" in this case is a company or business, not an individual? Unless many of the individuals have several servers each.
    • Re: (Score:3, Insightful)

      by ChowRiit ( 939581 )
      Only a few people need to have a lot of servers for there to be 18 servers for every 15 customers. To be honest, I'm surprised the ratio is so low, I would have guessed most hosting in a similar environment would be by people who'd want at least 2 servers for redundancy/backup/speed reasons...
      • by 42forty-two42 ( 532340 ) <`bdonlan' `at' `'> on Sunday June 01, 2008 @02:38PM (#23618971) Homepage Journal
        Wouldn't people who want such redundancy consider putting the other server in another DC?
        • Re: (Score:3, Informative)

          by bipbop ( 1144919 )
          At my last job, BCP guidelines required both: a minimum of four servers for anything, two of which must be at a physically distant datacenter.
        • Re: (Score:3, Insightful)

          by cowscows ( 103644 )
          I think it depends on just how mission critical things are. If your business completely ceases to function if your website goes down, then remote redundancy certainly makes a lot of sense. If you can deal with a couple of days with no website, then maybe it's not worth the extra trouble. I'd imagine that a hardware failure confined to a single server is more common than explosions bringing entire data-centers offline, so maybe a backup server sitting right next to it isn't such a useless idea.

          • Yep (Score:4, Insightful)

            by Sycraft-fu ( 314770 ) on Monday June 02, 2008 @02:50AM (#23623877)
            For example someone like probably has a redundant data centre. Reason being that if their site is down, their income drops to 0. Even if they had the phone techs to do the orders nobody knows their phone number and since the site is down, you can't look it up. However someone like probably doesn't. If their site is down it's inconvenient, and might possibly cost them some sales from people who can't research their products online, but ultimately it isn't a big deal even if it's gone for a couple of days. Thus it isn't so likely they'd spend the money on being in different data centres.

            You are also right on in terms of type of failure. I've been at the whole computer support business for quite a while now, and I have a lot of friends who do the same thing. I don't know that I could count the number of servers that I've seen die. I wouldn't call it a common occurrence, but it happens often enough that it is a real concern and thus important servers tend to have backups. However I've never heard of a data centre being taken out (I mean from someone I know personally, I've seen it on the news). Even when a UPS blew up in the university's main data centre, it didn't end up having to go down.

            I'm willing to bet that if you were able to get statistics on the whole of the US, you'd find my little sample is quite true. There'd be a lot of cases of servers dying, but very, very few of whole data centres going down, and then usually only because of things like hurricanes or the 9/11 attacks. Thus, a backup server makes sense, however unless it is really important a backup data centre may not.
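The server-vs-datacenter argument above can be put in rough numbers. A back-of-the-envelope sketch (the failure probabilities are invented for illustration, not measured statistics):

```python
# Back-of-the-envelope availability comparison. Both rates below are
# made-up illustrative assumptions, not real failure statistics.

p_server_down = 0.01   # chance a given server is down at any moment
p_dc_down = 0.0001     # chance an entire datacenter is dark

# Two servers in the SAME datacenter: service is lost if the whole DC
# goes dark, or (DC up) both servers happen to be down at once.
same_dc = p_dc_down + (1 - p_dc_down) * p_server_down ** 2

# Two servers in DIFFERENT datacenters: each side is down if its DC or
# its server is down; service is lost only if both sides are down.
p_side_down = p_dc_down + (1 - p_dc_down) * p_server_down
diff_dc = p_side_down ** 2

print(f"same DC:      {same_dc:.6%}")
print(f"different DC: {diff_dc:.6%}")
```

With these made-up rates, the same-DC pair's downtime is dominated by the datacenter term itself, which matches the comment's intuition: a local spare covers the common case (dead server), while a remote spare is only worth it when the rare whole-facility failure would actually hurt.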
      • Re: (Score:3, Insightful)

        I'm guessing that most of the customers are virtual-hosted, and therefore have only a fraction of a server, but some customers have many servers.
    • by p0tat03 ( 985078 ) on Sunday June 01, 2008 @02:42PM (#23619011)
      ThePlanet is a popular host for hosting resellers. Many of the no-name shared hosting providers out there host at ThePlanet, amongst other places. So... Many of these customers would be individuals (or very small companies), who in turn dole out space/bandwidth to their own clients. The total number of customers affected can be 10-20x the number reported because of this.
      • a bit wrong (Score:3, Insightful)

        by unity100 ( 970058 )
        its not the 'no name' hosting resellers who host at the planet. no name resellers do not employ an entire server, they just use whm reseller panel that is being handed out by a company which hosts servers there.
  • by Anonymous Coward on Sunday June 01, 2008 @02:11PM (#23618759)
    Electricity is a fickle mistress, one moment she's gently caressing your genitals through gingerly applied electrodes the next she's blowing up your data centers.
  • by QuietLagoon ( 813062 ) on Sunday June 01, 2008 @02:12PM (#23618761)
    ... for posting frequent updates to the status of the outage.
    • Re: (Score:3, Interesting)

      by imipak ( 254310 )
      Little-known fact: The Planet were the first ever retail ISP offering Internet access to the general public - from 1989. Hmmm, so the longest-established ISP in the world is not only working hard to get that DC back online, they're also posting pretty open summaries of the state of play... coincidence? I don't think so.
    • by larien ( 5608 ) on Sunday June 01, 2008 @02:47PM (#23619065) Homepage Journal
      It's probably less effort to spend a few minutes updating a forum than it would be to man the phones against irate customers demanding their servers be brought back online.
      • by QuietLagoon ( 813062 ) on Sunday June 01, 2008 @03:22PM (#23619347)
        man the phones against irate customers

        It does not sound like the type of company that thinks of its customers as an enemy, as your message implies.

    • Update 11:14 PM CST (Score:4, Informative)

      by Solokron ( 198043 ) on Monday June 02, 2008 @12:33AM (#23623161) Homepage
      As previously committed, I would like to provide an update on where we stand following yesterday's explosion in our H1 data center. First, I would like to extend my sincere thanks for your patience during the past 28 hours. We are acutely aware that uptime is critical to your business, and you have my personal commitment that The Planet team will continue to work around the clock to restore your service.

      As you have read, we have begun receiving some of the equipment required to start repairs. While no customer servers have been damaged or lost, we have new information that damage to our H1 data center is worse than initially expected. Three walls of the electrical equipment room on the first floor blew several feet from their original position, and the underground cabling that powers the first floor of H1 was destroyed.

      There is some good news, however. We have found a way to get power to Phase 2 (upstairs, second floor) of the data center and to restore network connectivity. We will be powering up the air conditioning system and other necessary equipment within the next few hours. Once these systems are tested, we will begin bringing the 6,000 servers online. It will take four to five hours to get them all running. We have brought in additional support from Dallas to have more hands and eyes on site to help with any servers that may experience problems. The call center has also brought in double staff to handle the increase in tickets we're expecting. Hopefully by sunrise tomorrow Phase 2 will be well on its way to full production.

      Let me next address Phase 1 (first floor) of the data center and the affected 3,000 servers. The news is not as good, and we were not as lucky. The damage there was far more extensive, and we have a bigger challenge that will require a two-step process. For the first step, we have designed a temporary method that we believe will bring power back to those servers sometime tomorrow evening, but the solution will be temporary. We will use a generator to supply power through next weekend, when the necessary gear will be delivered to permanently restore normal utility power and our battery backup system.

      During the upcoming week, we will be working with those customers to resolve issues. We know this may not be a satisfactory solution for you and your business, but at this time it is the best we can do. We understand that you will be due service credits based on our Service Level Agreement. We will proactively begin providing those following the restoration of service, which is our number one priority, so please bear with us until this has been completed.

      I recognize that this is not all good news. I can only assure you we will continue to utilize every means possible to fully restore service. I plan to have an audio update tomorrow evening.

      Until then,
      Douglas J. Erwin
      Chairman & Chief Executive Officer
  • explosion? (Score:5, Funny)

    by Anonymous Coward on Sunday June 01, 2008 @02:14PM (#23618775)
    Lesson learned: don't store dynamite in the power room.
  • by z_gringo ( 452163 ) <<z_gringo> <at> <>> on Sunday June 01, 2008 @02:20PM (#23618805)
    At this writing almost 2,400 people are trying to read it. Posting it on slashdot should help speed it up.
  • by Scuzzm0nkey ( 1246094 ) on Sunday June 01, 2008 @02:20PM (#23618809)
    I wonder what the dollar value of the repairs will run? I'm sure insurance covers this kind of thing, but I'd love to see hard figures like in one of those mastercard commercials:

    Structural damage: $15,000
    Melted hardware: $70,000
    Halon refill: $however much halon costs
    Real-Life Slashdot effect: Priceless
    • Not to mention the cost of pulling all those consultants in, overnight, on a weekend... Also, only the electrical equipment (and structural stuff) was damaged - networking and customer servers are intact (but without power, obviously).
      • Re:Recovery costs (Score:5, Insightful)

        by macx666 ( 194150 ) * on Sunday June 01, 2008 @02:30PM (#23618893) Homepage

        Not to mention the cost of pulling all those consultants in, overnight, on a weekend...

        Also, only the electrical equipment (and structural stuff) was damaged - networking and customer servers are intact (but without power, obviously).
        I read that they pulled in vendors. Those types would be more than happy to show up at the drop of a hat for some un-negotiated products that insurance will pay for anyway, and they'll even throw in their time for "free" so long as you don't dent their commission.
        • The support thread talks about both, so I'd assume they (or their insurance, anyway) is paying out the nose for dozens of contractors to come in on short notice right about now.
    • Re: (Score:3, Funny)

      by Geak ( 790376 )
      Maybe they'll just haul the servers to another datacenter:

      Dollies - $500, Truck rentals - $5,000, Labour - $10,000, Sending internets on trucks - Priceless
  • by Izabael_DaJinn ( 1231856 ) * <slashdot.izabael@com> on Sunday June 01, 2008 @02:21PM (#23618819) Homepage Journal
    Clearly this is bad karma resulting from all their years of human rights violations....especially Tiananmen Square...oh wait--
  • What does a datacenter have that can explode like this? All I can think of are all those cheap electrolytic caps. They really do put on quite a show, don't they? Put the transformer up on the roof, ok?
    • by Hijacked Public ( 999535 ) on Sunday June 01, 2008 @02:30PM (#23618895)
      Probably less traditional explosion and more Arc Flash [].
  • by Pyrex5000 ( 1038438 ) on Sunday June 01, 2008 @02:22PM (#23618835)
    I blame Kevin Hazard.
  • by quonsar ( 61695 ) on Sunday June 01, 2008 @02:22PM (#23618839) Homepage
    At this writing almost 2,400 people are trying to read it

    and as of this posting, make that 152,476.
    • by Amigori ( 177092 ) *
      and you know it's bad when the Coral Cache is running slower than the nearly slashdotted forum itself. 3100+ users right now in the official forum.
  • by PPH ( 736903 ) on Sunday June 01, 2008 @02:26PM (#23618855)

    Being in the power systems engineering biz, I'd be interested in some more information on the type of building (age, original occupancy type, etc.) involved.

    To date, I've seen a number of data center power problems, from fires to isolated, dual-source systems that turned out not to be. It raises the question of how well the engineering was done for the original facility, or the refit of an existing one. Or whether proper maintenance was carried out.

    From TFA:

    electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding their electrical equipment room.
    Properly designed systems should never allow any fault to become uncontained in this manner.
    • by gmack ( 197796 )
      Yeah but as we both know in these days of excessive growth that infrastructure tends to lag behind more visible changes.

      I'm sure at one point it was well designed.. but that was, I'm guessing, a few years ago and at a lot lower current and more than a few modifications ago.

      That's of course not counting the possibility of contractor stupidity.

      I don't know what makes people so freaking stupid when it comes to electricity. But then I'm still annoyed by a roofing contractor having two employees' lives
    • The short happened in a conduit (behind a wall, I'm assuming), FWIW.
    • Re: (Score:3, Insightful)

      by aaarrrgggh ( 9205 )
      This isn't that uncommon with a 200kAIC board with air-power breakers, if there is a bolted fault and the breakers have instantaneous delays. Newer insulated-case style breakers all have an instantaneous override which will limit fault energy.

      The other possibility was that a tie was closed and the breakers over-dutied and could not clear the fault.

      Odd that nobody was hurt though; spontaneous shorts are very rare-- most involve either switching or work in live boards, either of which would kill someone.
  • Explosion? (Score:4, Insightful)

    by mrcdeckard ( 810717 ) on Sunday June 01, 2008 @02:30PM (#23618891) Homepage

    The only thing that I can imagine that could've caused an explosion in a datacenter is a battery bank (the data centers I've been in didn't have any large A/C transformers inside). And even then, I thought that the NEC had some fairly strict codes about firewalls, explosion-proof vaults and the like.

    I just find it curious, since it's not unthinkable that rechargeable batteries might explode.

    mr c
  • by martyb ( 196687 ) on Sunday June 01, 2008 @02:30PM (#23618897)

    Kudos to them for their timely updates as to system status. Having their status page listed on /. doesn't help them much, but I was encouraged to see a Coral Cache link to their status page. In that light, here's a link to the Coral Cache lo-fi version of their status page:

  • by Zymergy ( 803632 ) * on Sunday June 01, 2008 @02:30PM (#23618901)
    I am wondering what UPS/Generator Hardware was in use?
    Where would the "failure" (Short/Electrical Explosion) have to be to cause everything to go dark?
    Sounds like the power distribution circuits downstream of the UPS/Generator were damaged.

    Whatever vendor provided the now vaporized components are likely praying that the specifics are not mentioned here.

    I recall something about Lithium Batteries exploding in Telecom DSLAMs... I wonder if their UPS system used Lithium Ion cells?
    • Re: (Score:2, Informative)

      by Anonymous Coward
      If you'd read the linked status report, you'd see that there was a short in a high voltage line. They are dark because the fire department told them not to power up their back-up generators.
  • kaboom (Score:2, Funny)

    by rarel ( 697734 )
    Clearly these Sony batteries had to be replaced one way or another...
  • by Anonymous Coward
    I have 5 servers. Each of them is in a different city, on a different provider. I had a server at The Planet in 2005.

    I feel bad for their techs, but I have no sympathy for someone who's single-sourced, they should have propagated to their offsite secondary.

    Which they'll be buying tomorrow, I'm sure.

    • by aronschatz ( 570456 ) on Sunday June 01, 2008 @08:28PM (#23621645) Homepage
      Yeah, because everyone can afford redundancy like you can.

      Most people own a single server that they make backups of in case of it crashing OR have two servers in the same datacenter in case one fails.

      I don't know how you can easily do offsite switch over without a huge infrastructure to support it which most people don't have the time and money to do.

      Get off your high horse.
  • by 1sockchuck ( 826398 ) on Sunday June 01, 2008 @02:46PM (#23619059) Homepage
    Data Center Knowledge has a story on the downtime at The Planet [], summarizing the information from the now Slashdotted forums. Only one of the company's six data centers was affected. The Planet has more than 50,000 servers in its network, meaning that nearly one in five is offline.
  • by cptnapalm ( 120276 ) on Sunday June 01, 2008 @03:06PM (#23619209)
    They need to build the building out of whatever they build the servers out of.
  • by Sentry21 ( 8183 ) on Sunday June 01, 2008 @03:27PM (#23619379) Journal

    electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding their electrical equipment room.
    But the fourth wall stayed up! And that's what you're getting, son - the strongest data centre in all of Texas!
  • by JoeShmoe ( 90109 ) <> on Sunday June 01, 2008 @03:45PM (#23619515)

    Everyone loves firemen, right? Not me. While the guys you see in the movies running into burning buildings might be heroes, the real-world firemen (or more specifically fire chiefs) are capricious, arbitrary, ignorant little rulers of their own personal fiefdoms. Did you know that if you are getting an inspection from your local fire chief and he commands something, there is no appeal? His word is law, no matter how STUPID or IGNORANT. I'll give you some examples later.

    I'm one of the affected customers. I have about 100 domains down right now because both my nameservers were hosted at the facility, as is the control panel that I would use to change the nameserver IPs. Whoops. So I learned why I obviously need to have NS3 and NS4 and spread them around, because even though the servers are spread everywhere, without my nameservers none of them currently resolve.
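That nameserver lesson is checkable in a few lines. A sketch of such a check (the hostnames and addresses are made-up documentation examples, and sharing a /24 is only a crude proxy for "same facility" — real diversity checks would look at AS numbers and physical locations):

```python
from itertools import combinations

# Crude nameserver-diversity check: NS hosts that share a /24 almost
# certainly sit in the same facility. All names/IPs below are
# hypothetical examples using documentation address ranges.
nameservers = {
    "ns1.example.net": "203.0.113.10",
    "ns2.example.net": "203.0.113.11",   # same /24 as ns1 -- suspicious
    "ns3.example.net": "198.51.100.7",
}

def slash24(ip: str) -> str:
    """Return the /24 network an IPv4 address belongs to."""
    return ".".join(ip.split(".")[:3]) + ".0/24"

warnings = [
    (a, b) for (a, b) in combinations(nameservers, 2)
    if slash24(nameservers[a]) == slash24(nameservers[b])
]
for a, b in warnings:
    print(f"WARNING: {a} and {b} share {slash24(nameservers[a])}")
```

With the example data, only the first pair trips the warning; the point is simply that the check is cheap compared to a day of unresolvable domains.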

    It sounds like the facility was ordered to cut ALL power because of some fire chief's misguided fear that power flows backwards from a low-voltage source to a high-voltage one. I admit I don't know much about the engineering of this data center, but I'm pretty sure the "Y" junction where AC and generator power come together is going to be as close to the rack power as possible to avoid lossy transformation. It makes no sense why they would have 220 or 400 VAC generators running through the same high-voltage transformer when it would be far more efficient to have 120 or even 12VDC (if only servers would accept that). But I admit I could be wrong, and if it is a legit safety issue...then it's apparently a single point of failure for every data center out there because ThePlanet charged enough that they don't need to cut corners.

    Here's a couple of times that I've had my hackles raised by some fireman with no knowledge of technology. The first was when we switched alarm companies and required a fire inspector to come and sign off on the newly installed system. The inspector said we needed to shut down power for 24 hours to verify that the fire alarm would still work after that period of time (a code requirement). No problem, we said, reaching for the breaker for that circuit.

    No no, he said. ALL POWER. That meant the entire office complex, some 20-30 businesses, would need to be without power for an entire day so that this fing idiot could be sure that we weren't cheating by sneaking supplementary power from another source.


    We ended up having to rent generators and park them outside to keep our racks and critical systems running, and then renting a conference room to relocate employees. We went all the way to the county commissioners pointing out how absolutely stupid this was (not to mention, who the HELL is still going to be in a burning building 24 hours after the alarm's gone off) but we were told that there was no override possible.

    The second time was at a different place when we installed a CO alarm as required for commercial property. Well, the inspector came and said we need to test it. OK, we said, pressing the test button. No no, he said, we need to spray it with carbon monoxide.

    Where the HELL can you buy a toxic substance like carbon monoxide, we asked. Not his problem, but he wouldn't sign off until we did. After finding out that it was illegal to ship the stuff, and that there was no local supplier, we finally called the manufacturer of the device, who pointed out that the device was void the second it was exposed to CO because the sensor was not reusable. In other words, when the sensor was tripped, it was time to buy a new monitor. You can see the recursive loop that would have developed if we actually had tested the device and then promptly had to replace it and get the new one retested by this idiot.

    So finally we got a letter from the manufacturer that pointed out the device was UL certified and that pressing the test button WAS the way you tested the device. It took four weeks of arguing before he finally found an excuse that let him save face and
    • Look, when I go into a building in gear and carrying an axe and an extinguisher, breathing bottled air, wading through toxic smoke I couldn't give crap number one about your 100 sites being down.

      I have a crew to protect. In this case, I'm going into an extremely hazardous environment. There has already been one explosion. I don't know what I'm going to see when I get there, but I do know that this place is wall to wall danger. Wires everywhere to get tangled in when it's dark and I'm crawling through the smoke. Huge amounts of current. Toxic batteries everywhere that may or may not be stable. Wiring that may or may not be exposed.

      If it's me in charge, and it's my crew making entry, the power is going off. It's getting a lock-out tag on it. If you won't turn it off, I will. If I do it, you won't be turning it on so easily. If need be, I will have the police haul you away in cuffs if you try to stop me.

      My job, as a firefighter -- as a fire officer -- is to ensure the safety of the general public, of my crew, and then if possible of the property.

      NOW -- As a network guy and software developer -- I can say that if you're too short-sighted or cheap to spring for a secondary DNS server at another facility, or if your servers are so critical to your livelihood that losing them for a couple of days will kill you but you haven't bothered to go with hot spares at another data center, then you, sir, are an idiot.

      At any data center - anywhere - anything can happen at any time. The f'ing ground could open up and swallow your data center. Terrorists could target it because the guy in the rack next to yours is posting cartoon photos of their most sacred religious icons. Monkeys could fly out of the site admin's [nose] and shut down all the servers. Whatever. If it's critical, you have off-site failover. If not, you're not very good at what you do.

      End of rant.
  • by Animats ( 122034 ) on Sunday June 01, 2008 @03:58PM (#23619595) Homepage

    They supposedly had a "short in a high-volume wire conduit." That leads to questions as to whether they exceeded the NEC limits [] on how much wire and how much current you can put through a conduit of a given size. Wires dissipate heat, and the basic rule is that conduits must be no more than 40% filled with wire. The rest of the space is needed for air cooling. The NEC rules are conservative, and if followed, overheating should not be a problem.

    This data center is in a hot climate, and a data center is often a continuous maximum load on the wiring, so if they do exceed the packing limits for conduit, a wiring failure through overheat is a very real possibility.

    Some fire inspector will pull charred wires out of damaged conduit and compare them against the NEC rules. We should know in a few days.
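The 40% fill rule is simple to check arithmetically. A toy version of that check (the conduit and conductor diameters are illustrative guesses, not NEC table values; a real check uses the dimensions in NEC Chapter 9):

```python
import math

# Toy NEC-style conduit fill check. The 40% limit for three or more
# conductors comes from NEC Chapter 9; the dimensions below are
# illustrative assumptions, not actual table entries.

def circle_area(diameter: float) -> float:
    """Cross-sectional area of a circle from its diameter."""
    return math.pi * (diameter / 2) ** 2

conduit_id = 2.067   # inches, approximate inner diameter of 2" conduit
wire_od = 0.45       # inches, assumed outer diameter of one conductor
n_wires = 9

fill = n_wires * circle_area(wire_od) / circle_area(conduit_id)
verdict = "OVER the 40% limit" if fill > 0.40 else "OK"
print(f"fill: {fill:.1%} -> {verdict}")
```

With these assumed numbers the run comes out just over 40%, which is exactly the marginal situation Animats describes: a few extra conductors packed into an existing conduit, continuous datacenter load, and a hot climate.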

  • by Animats ( 122034 ) on Sunday June 01, 2008 @04:09PM (#23619667) Homepage

    YouTube's home page is returning "Service unavailable". Is this related? (Google Video is up.)

  • _The_ Power Room? (Score:3, Insightful)

    by John Hasler ( 414242 ) on Sunday June 01, 2008 @04:13PM (#23619691) Homepage
    > ...they claim redundant power...

    How the hell could they claim redundant power with only one power room?
    • Re: (Score:3, Insightful)

      by CFD339 ( 795926 )
      Redundant power they have. Redundant power distribution grids they do not. This is common. The level of certification in redundancy on power for fully redundant grids is (I think) called 2N, where they only claim N+1 -- which I understand means failover power. It's more than enough 99.9% of the time. To have FULLY redundant power plus distribution from the main grid all the way into the building through the walls and to every rack is ridiculously more expensive. At that point, it is more sensible to
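The N+1 vs 2N distinction can be sketched as a simple count of survivable failures (the component counts are illustrative; and as this incident shows, redundant generation doesn't help if it all funnels through one distribution room):

```python
# N+1 vs 2N, sketched as "how many unit failures can you survive?"
# Counts are illustrative. Note the unmodeled catch from TFA: N+1 on
# generation means nothing if every path shares ONE distribution room.

def n_plus_1(needed: int, spares: int = 1) -> dict:
    """One shared system with a spare unit or two."""
    return {"units": needed + spares, "survives_failures": spares}

def two_n(needed: int) -> dict:
    """A full second path; either side alone can carry the load."""
    return {"units": 2 * needed, "survives_failures": needed}

print("N+1:", n_plus_1(needed=4))
print("2N: ", two_n(needed=4))
```

The cost asymmetry the comment describes falls straight out of the unit counts: going from N+1 to 2N nearly doubles the installed equipment (and duplicates all the distribution in between) to cover failures that are rare to begin with.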
  • My servers dropped off the net yesterday afternoon, and if all goes well they'll be up and running late tonight. At 1700PST they're supposed to do a power test, then start bringing up the environmentals, the switching gear, and blocks of servers.

    My thoughts as a customer of theirs:

    1. Good updates. Not as frequent or clear as I'd like, but mostly they didn't have much to add.

    2. Anyone bitching about the thousands of dollars per hour they're losing has no credibility with me. If your junk is that important, your hot standby server should be in another data center.

    3. This is a very rare event, and I will not be pulling out of what has been an excellent relationship so far with them.

    4. I am adding a failover server in another data center (their Dallas facility). I'd planned this already but got caught being too slow this time.

    5. Because of the incident, I will probably make the new Dallas server the primary and the existing Houston one the backup. This is because I think there will be long term stability issues in this Houston data center for months to come. I know what concrete, drywall, and fire extinguisher dust does to servers. I also know they'll have a lot of work in reconstruction ahead, and that can lead to other issues.

    For now, I'll wait it out. I've heard of this cool place called "outside". Maybe I'll check it out.
  • by fm6 ( 162816 ) on Sunday June 01, 2008 @05:53PM (#23620489) Homepage Journal

    This goes to show that no matter how much planning you do, Murphy's Law still applies.
    I am so tired of hearing that copout. Does the submitter know for a fact that ThePlanet did everything it could to keep its power system from exploding? I don't have any evidence one way or the other, but if they're anything like other independent data center operators, it's pretty unlikely.

    The lesson you should be taking from Murphy's Law is not "Shit Happens". The lesson you should be taking is that you can't assume that an unlikely problem (or one you can con yourself into thinking unlikely) is one you can ignore. It's only after you've prepared for every reasonable contingency that you're allowed to say "Shit Happens".
