


Army DNS ROOT Server Down For 18+ Hours

An anonymous reader writes "The H-Root server, operated by the US Army Research Lab, spent 18 hours out of the last 48 being a void. Both RIPE's DNSMON and the site show this. How, in this day and age of network engineering, can we even entertain one of the thirteen root servers being unavailable for so long? I mean, the US Army doesn't even seem to make the effort to deploy more sites. Look at the other root operators who don't have the backing of the US government money machine. Many of them seem to be able to deploy redundant instances. Even the much-maligned ICANN seems to have managed to deploy 11 sites. All these root operators that have only one site need a good swift kick, or maybe they should pass the responsibility to others who are more committed to ensuring the Internet's stability."


  • An Oxymoron indeed!

    • Re: (Score:1, Insightful)

      by Anonymous Coward

      It was probably outsourced to the cheapest bidder. Either that or some incompetent idiots got the winning bid
      by greasing a few palms.

      • by Mr2cents ( 323101 ) on Saturday October 02, 2010 @09:43AM (#33769814)

        Don't be so harsh on the US military. They only have a trillion dollar budget, you know? How are you ever going to set up redundant systems if all you get is pocket change? You have to cut corners somewhere. Maybe it's time to increase their funding a bit more.

        • Re: (Score:3, Interesting)

          by sumdumass ( 711423 )

          Actually, given the size and scope of the US military, you are right, 1 trillion dollars is about pocket change to most people.

          I'm for increasing their budget more too. But I'm not sure that this outage wasn't planned. How better to test the ability to withstand a "cyber attack" than to lose your DNS servers and see if your departments can fully function without them? This ability would greatly decrease the time needed to change to an alternative system if ever needed or more likely regroup resources an

        • Re: (Score:2, Funny)

          Careful - don't lump all the military together. It's the ARMY under discussion. My navy has problems, to be sure, but my navy can keep a server up and running. Not to mention, the navy wrote the book on repetitive redundancy. I think congress should take the server away from the army, and give it to the navy. Overall security should improve, and physical security will most certainly improve. Our marines haven't lost a server yet!
          • Re: (Score:3, Insightful)

            Er... the Navy has outsourced to HP. In fact, to get out of the agreement they are having to pay to even receive information about the network configuration.

        • Did anyone actually notice the outage?
          • Not really, there is some redundancy in the system. But about 7.7% (1 of 13) of the DNS system was down. We should definitely increase the number of people who control the DNS system. Preferably all over the world. Heck, let's give Iran a couple too and be fair.

  • by Anonymous Coward on Saturday October 02, 2010 @09:35AM (#33769778)

    So the Internet worked as it should, and routed around this disruption. The other root servers were unaffected, and still functioned fine. So what exactly is the problem?

    • by jayhawk88 ( 160512 ) on Saturday October 02, 2010 @11:34AM (#33770360)

      Because it's Saturday, and we don't have anything else to get upset about! WE HAVE TO HAVE SOMETHING TO GET UPSET ABOUT, DON'T YOU UNDERSTAND?! How can I be expected to face the day if I'm not pissed off about something that doesn't directly affect me in any meaningful way?

      • Re: (Score:1, Interesting)

        by Anonymous Coward

        Umm--having retired from the military, and having also been a networking professional for fifteen years in education and industry, I have less than a great respect for the products of the U.S. Army Signal School, which happens to operate that server. I was activated for service in Iraq, and watched a fellow captain, a graduate of that school, and someone with at least five years' experience, insist that Ethernet Cat 5 had a maximum single link distance of 185 meters. And he designed his network around that p

        • Re: (Score:3, Insightful)

          We've all made links in cat5 > 200 meters that work perfectly fine. Granted, perfect reliability is something else, but for a backup link in a datacenter that charges an arm and a leg for fiber connections and < 10% of that price for copper ... I've even been known to stick that link in a 10G copper interface card to see if it'd work (it didn't). But I've had reliable gigabit copper links over > 250 meters operational for years. It helps a lot if they're the only ethernet link in a metal

          • by Dravik ( 699631 )

            Ever had an ethernet link inside a bundle of VDSL links?

            Were you running shielded cable? That probably would have helped a lot more than going to Cat 6.

      • Re: (Score:1, Offtopic)

        by NekSnappa ( 803141 )
        I'm upset that the Jaguar XKRs aren't doing well at Petit Le Mans, and the Europeans seem to be making a comeback at the Ryder Cup.
    • I think the problem is something not exactly unlike Army's root server was down for 18 hours.
      and all you can say is DON'T PANIC !?
      wait, what was the question again?
    • by sjames ( 1099 )

      EXACTLY. That's why we have more than just

  • They didn't want YOU to access their servers?

  • by sjs132 ( 631745 ) on Saturday October 02, 2010 @09:40AM (#33769796) Homepage Journal

    Because they don't have redundancy? Everyone gets mad because the USA wants to control the internet, but let something go bad and then someone wants to point fingers? Really? I just don't get the mentality of "We want you to do this for free" and then people turn around and B&M about the service being down for a bit.

    • by Sprouticus ( 1503545 ) on Saturday October 02, 2010 @09:57AM (#33769860)

      It has nothing to do with this being a US Army server. It has everything to do with bad design. The people given the responsibility of a root server should NOT take that responsibility lightly.

    • If the US wants to "control the internet", which we do 'cuz there's Internet Money to be had, then we have the responsibility to keep the infrastructure up and running. How are we going to combat a 'cyber attack' without redundancy ... I mean really, even /. has a backup site, right?
    • by Joce640k ( 829181 ) on Saturday October 02, 2010 @10:27AM (#33770000) Homepage

      Rest assured, the government isn't holding back. Those non-redundant Army servers already cost an order of magnitude more than everybody else's redundant servers.

    • by amorsen ( 7485 )

      It would probably be reasonably easy to get someone else to run the H server cluster. The DNS protocol itself limits the number to 13, quite by accident, and there was no grand design when it was decided who was getting them.

      If the US army can't run their server properly, they should offer the slot to someone else.

      • Army:"No, we can't let you do that!"
        Army:"National Security [i.e. PR]. If you don't shut up now, we'll give your name to the FBI!" ...
        Do you really expect any other result?

      • by JWSmythe ( 446288 ) on Saturday October 02, 2010 @01:17PM (#33770876) Homepage Journal

            Actually, most of the root "servers" are "anycast" now (9 of 13), so a single site failure doesn't matter. The US DoD runs two (G and H). G is anycast. H isn't. There was no clarification of what the issue was. It's easy to jump to "oh they suck", but shit happens sometimes. That's part of why we don't run on just one root nameserver. :)

            For all we know, it could have been a planned outage. I kinda doubt it with that size window, but who knows. It was only 1 of 13, which makes it more like 1 of an awful lot since 9 of the "servers" are really servers distributed world wide. I was doing some monitoring a while back, showing how our traffic moved, and that included monitoring the root servers. It made some really screwy routes, where one check would be in the US, and the next one would be somewhere in Europe.

        • Re: (Score:2, Interesting)

          by amorsen ( 7485 )

          I know most are anycast. I still think DoD should give up their slot to someone else, especially since they have 2. There is no reason why any organisation should have two slots; the only reason for that is historical.

    • Re: (Score:1, Troll)

      by darkpixel2k ( 623900 )

      Because they don't have redundancy?

      What do you mean they don't have redundancy? Last time I checked there were something like 13 root servers. The entire purpose of having multiple root servers is to keep the internet up when one or even a few go down.

    • I just don't get the mentality of "We want you to do this for free"

      You must be new here. The US Army should provide the internet for free, and make its money by doing live gigs in the Middle East, and selling action figures.

  • by Anonymous Coward

    What's the problem? The point of redundancy isn't to keep all redundant instances up all the time. The system is designed to allow for downtime of quite a few servers.

  • Lowest bidder (Score:4, Insightful)

    by pixiekhatt ( 1344865 ) on Saturday October 02, 2010 @09:41AM (#33769800)
    This is what happens when you give contracts to the lowest bidder. The military may have tons of money, but that doesn't mean they spend it wisely. Even if it's not a contracted company taking care of these servers, and it's government employees (there's a difference), a LOT of those employees get their jobs based on keywords and general qualifications, and several have an 'I did my time in the military and retired, they owe me this for all the hard work I did before' attitude. Not everyone is like that, and I've met some government employees (in the tech field) who really did know their stuff... and not all contracts are bad, but they can turn sour when a company steps in, says they'll do all that and more for this much less, and they really don't know what they're doing. I've seen that happen too. And if it's managed by soldiers... well. They always told us, you're a soldier first, and a 'whatever your job is' after. Most technically trained soldiers don't know how to do their job well, or even at all. They just tough it out until they're an NCO, and then they're supposed to be a leader and tell their underlings to do the work.
    • Re:Lowest bidder (Score:5, Interesting)

      by Isao ( 153092 ) on Saturday October 02, 2010 @10:19AM (#33769948)
      There are two main approaches to government contracting: Lowest Cost and Best Value. Contrary to popular belief, Lowest Cost is not always the one chosen, by a long shot. I also previously misunderstood "Close enough for government work." Turns out most "government work" has very specific requirements and specifications, or you don't get paid. If you see something different, please call Waste, Fraud & Abuse.
      • Re: (Score:1, Insightful)

        by Anonymous Coward
        To tack on, oftentimes those extremely expensive "military spec" tools and such are expensive because they have to meet standards that would normally be considered ridiculous. The reason is that it's usually not too hard to head off and buy a new one, but in the middle of a war zone, it's both time consuming and risky to assume you can get a new one.
      • by AK Marc ( 707885 )
        If you see something different, please call Waste, Fraud & Abuse.

        So every government project that runs over budget, I should report it as the fraud it is? There are only a few firms that have mastered the hoops to get a government contract, and they seem to bid low, spend high, and send the bill to the feds. And the feds pay it and come back for more. Are you saying that reporting them for that will have any effect? Because from where I sit, a disproportionate number of contracts are over budget an
        • The military industrial complex probably helps conceal a lot of fraud in much the same way that lobbyists and politicians schmooze.

          No politician is going to give up a cushy position at a company by burning a bridge through enforcement.

    • by hsmith ( 818216 )
      Lowest bidder does not equate to getting the contracting job. Not in this day and age.
    • Re: (Score:3, Insightful)

      by John Hasler ( 414242 )

      > This is what happens when you give contracts to the lowest bidder.

      Because they'd obviously get better results by giving them to the highest bidder...

      Try to get your head around concepts like "requirements", "specifications", and "lowest qualified bid". You not only do not get paid if you don't do the job you agreed to do, you may even have to pay the extra cost of having someone else do it over.

      • by sjames ( 1099 )

        That in connection with sometimes screwy requirements is why the job always goes to large corporations whose primary skill is convincing government agencies that they'd better sign off on the half-assed job that was done.

  • by Anonymous Coward on Saturday October 02, 2010 @09:59AM (#33769870)

    Hardware fails. That's just how it is. Even with the highest end hardware available today, outages can happen. This is why there are 13 root servers to start with. So long as they don't all go down at once, all is good. As far as 18 hours to recover, why is that bad? With 12 others to pick from, should this one be a high priority? I think not. Getting one's panties in a bunch because a server fails and takes some time to recover makes you sound like a silly management type. Most of us lived at least a large part of our lives without any root servers - or any servers at all. It's not the end of the world if DNS goes down. It will be ok, I promise.

    • by horatio ( 127595 )
      Too bad you posted as A/C, because you make a good point. Further, quoting the summary:

      ...or maybe they should pass the responsibility to others who are more committed to ensuring the Internet's stability.

      Maybe, before opening your fat mouth and posting on /. something you have no facts on, while seeming able to confidently state that the US Army has "acted stupidly", you should research what went wrong and then pass judgement. The parent is correct - there are 13 root servers so that one or two or three CAN go down - either because of a failure or for maintenance - without killing the whole interwebs.

    • by Bengie ( 1121981 )

      18 hours down is only 99.6% uptime averaged over the year assuming no other failures. A well maintained server can have 99.99%-99.999% uptime. They should have a virtual server that can fail over to other hardware without an end user noticing. Each server should have multiple network connections in case a NIC or switch fails. Not to mention a major server should have one admin on hand 24/7 and a recovery plan that can get the server back up and running in MUCH less than 18 hours. We're not talking about som

      • 18 hours down is only 99.6% uptime averaged over the year assuming no other failures. A well maintained server can have 99.99%-99.999% uptime.

        And here you just went and mixed two different time periods. Do you seriously believe a 99.99% or 99.999% uptime is measured over a single year? Let's look at the last time the Army root server went down. Was it any time within the last two years? If not, then they have 99.99% uptime.

      • by sjames ( 1099 )

        Why, do we just have money to burn?

        I can't speak for everyone, but I didn't even notice that it was down, so how much can it be worth to make sure it never happens again?

      • A well maintained server can have 99.99%-99.999% uptime.

        And I do believe that the DNS Root Server meets that, if not betters it. You'll notice that while the one node was down, the other 12 were up and running. The DNS root system is, when you get right down to it, a 13-node HA cluster. The system stayed up and serviceable, even though one specific node was down. Functioned as designed.
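The uptime numbers traded in this sub-thread are easy to check. The sketch below computes what a single 18-hour outage means over one and two years, how long a window 99.99% would need, and, treating the 13 root server letters as a 13-node cluster, how likely it is that at least 12 are up or that all 13 are down at once. The function names are illustrative only, and the cluster figures assume every letter fails independently with the same availability; a shared cause (such as the storm mentioned elsewhere in this thread) would correlate failures, so read those numbers as an optimistic bound.

```python
from math import comb

HOURS_PER_YEAR = 365 * 24  # 8760 h; ignores leap years
N_ROOTS = 13               # root server letters A through M


def uptime_percent(downtime_h: float, window_h: float) -> float:
    """Share of the measurement window during which the service was up."""
    return 100.0 * (1.0 - downtime_h / window_h)


def window_needed_h(downtime_h: float, target_percent: float) -> float:
    """Shortest window over which this much downtime still meets the target."""
    return downtime_h / (1.0 - target_percent / 100.0)


def p_all_down(n: int, availability: float) -> float:
    """P(all n nodes down at once), ASSUMING independent, identical nodes."""
    return (1.0 - availability) ** n


def p_at_least_k_up(n: int, k: int, availability: float) -> float:
    """P(at least k of n nodes up), under the same independence assumption."""
    return sum(
        comb(n, i) * availability**i * (1.0 - availability) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    outage_h = 18.0
    print(f"uptime over 1 year:  {uptime_percent(outage_h, HOURS_PER_YEAR):.2f}%")
    print(f"uptime over 2 years: {uptime_percent(outage_h, 2 * HOURS_PER_YEAR):.2f}%")
    years = window_needed_h(outage_h, 99.99) / HOURS_PER_YEAR
    print(f"window for 99.99% with one 18 h outage: ~{years:.1f} years")

    # Hypothetical per-letter availability: one 18-hour outage per year.
    a = uptime_percent(outage_h, HOURS_PER_YEAR) / 100.0
    print(f"P(at least 12 of 13 up): {p_at_least_k_up(N_ROOTS, 12, a):.6f}")
    print(f"P(all 13 down):          {p_all_down(N_ROOTS, a):.1e}")
```

Under these assumptions one 18-hour outage works out to about 99.79% uptime over a year and about 99.90% over two years; meeting 99.99% with that much downtime would take a window of roughly two decades.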

  • by Anonymous Coward on Saturday October 02, 2010 @10:06AM (#33769882)

    They're sticking to their motto and deploying an Army of one.

  • wow (Score:5, Insightful)

    by buddyglass ( 925859 ) on Saturday October 02, 2010 @10:13AM (#33769928)
    Whine much?
  • by Antique Geekmeister ( 740220 ) on Saturday October 02, 2010 @10:15AM (#33769936)

    I've seen numerous instances where the monitoring system, itself, was confused or detached. The results on a chart are then quite confusing, unless you know how to backfill the data in the chart.

    Why, no, I've never been asked to do that for a 99.999% uptime SLA monitored site when some confused person in the offsite monitoring station put a bad IP address in /etc/hosts. No, no, no, couldn't happen.

    • by Anonymous Coward on Saturday October 02, 2010 @02:12PM (#33771190)
      Classification: UNCLASSIFIED
      Caveats: NONE

      > FYI, the H root server is currently experiencing an outage
      > due to a SONET ring outage possibly caused by flooding from
      > the tropical storm on the east coast. No estimated repair time.

      H root returned to service at 12:30 UTC today. Fiber cut due to downed
      utility poles. Repair was delayed due to high water.

  • They are too busy getting blocked by my PeerBlock application to deploy more DNS sites.

  • I think you are overreacting a little bit. The expectation always was that one or more root servers would be unavailable at any one time - hence why there are 13 different root server systems available. More than one can be unavailable for days, and due to redundancy and caching it won't affect anything - as expected, nobody has really noticed this blip.

    There should be a good mix of technologies used in the different root server systems - different architectures, OS, etc. Some sites use anycast which gives
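The caching half of that argument can be made concrete. A resolver that has already learned a TLD's nameservers keeps them until the record's TTL expires, and until then it does not consult a root server for that TLD at all. The sketch below is a hypothetical minimal cache: the two-day TTL is an assumption (a commonly seen value for TLD delegations in the root zone), and `ask_root` is a stand-in for a real query, not a real library call.

```python
import time
from typing import Callable, Dict, List, Tuple

DELEGATION_TTL_S = 172_800  # ASSUMED: two days, a common TTL for TLD delegations


class DelegationCache:
    """Hypothetical TTL cache mapping a TLD to its nameserver names."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self.root_queries = 0  # how many times we had to ask a root server

    def lookup(self, tld: str, ask_root: Callable[[str], List[str]]) -> List[str]:
        now = self._clock()
        cached = self._entries.get(tld)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no root server involved
        self.root_queries += 1
        servers = ask_root(tld)
        self._entries[tld] = (now + DELEGATION_TTL_S, servers)
        return servers


if __name__ == "__main__":
    fake_now = [0.0]
    cache = DelegationCache(clock=lambda: fake_now[0])

    def fake_root(tld: str) -> List[str]:
        return [f"ns1.example-{tld}.test"]  # placeholder answer, not real data

    cache.lookup("com", fake_root)  # miss: goes to a root server
    fake_now[0] = 18 * 3600         # 18 hours later
    cache.lookup("com", fake_root)  # hit: a root outage would be invisible here
    print(cache.root_queries)       # prints 1
```

The injectable clock is only there so the expiry behaviour can be exercised without waiting two days.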

  • Non-story (Score:3, Interesting)

    by A beautiful mind ( 821714 ) on Saturday October 02, 2010 @11:20AM (#33770278)
    You have to realise that the layout of the root dns server hierarchy is historical. It is composed of organizations that are vastly different now than they were 20 years ago. The H root server people don't seem to care about things very much and there are a couple of other root servers where the organizations operating them don't put too much effort into things.

    Luckily, the internet doesn't really depend on them, as there are a couple of big organizations with heavy investment in making sure the root servers stay accessible all the time, like RIPE or Verisign. They operate thousands of physical machines at dozens of geographically distributed locations, each root server's machines sharing one IP address via anycast. The result is that one logical root server can outweigh another in terms of physical boxes by at least 100:1, if not more.

    My last information about the Verisign-operated root servers, from a couple of years ago, is that they are ridiculously overprovisioned, operating well under 1% of capacity even when subjected to a fairly large DDoS. As far as I know, the common DNS servers all support RTT banding: they pick randomly among the servers for a given resource whose latency falls below a threshold, so they wouldn't really notice the H root being down.
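The server-selection behaviour described above can be sketched as follows. This is a simplified illustration of the idea as the comment describes it (choose at random among servers whose smoothed RTT is within a band of the fastest, and skip servers that have stopped answering), not BIND's or any other resolver's actual implementation. The band width, the `UNRESPONSIVE` marker, and the RTT figures are all hypothetical.

```python
import math
import random
from typing import Dict, Optional

UNRESPONSIVE = math.inf  # ASSUMED marker for a server that has stopped answering


def pick_server(
    srtt_ms: Dict[str, float],
    band_ms: float = 50.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick a server at random from those within `band_ms` of the best smoothed RTT."""
    best = min(srtt_ms.values(), default=UNRESPONSIVE)
    if best == UNRESPONSIVE:
        raise RuntimeError("no responsive servers")
    # An unresponsive server has infinite RTT, so it never falls inside the band.
    candidates = sorted(name for name, rtt in srtt_ms.items() if rtt - best <= band_ms)
    if rng is None:
        return random.choice(candidates)
    return rng.choice(candidates)


if __name__ == "__main__":
    # Hypothetical smoothed RTTs, in milliseconds, as seen from one resolver.
    srtt = {"a": 22.0, "h": UNRESPONSIVE, "k": 41.0, "m": 190.0}
    rng = random.Random(2010)
    picks = {pick_server(srtt, rng=rng) for _ in range(1000)}
    print(sorted(picks))
```

With these made-up numbers only "a" and "k" are ever chosen: "h" is skipped because it is not answering, and "m" falls outside the 50 ms band, which is why an outage of one letter goes largely unnoticed by clients.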
  • by Xemu ( 50595 ) on Saturday October 02, 2010 @11:32AM (#33770350) Homepage

    Could this simply be a part of the Cyber Storm III information warfare exercise?

    • It's either planned or SNAFU.

      I'd lean toward planned. Somewhere there have to be some infographics showing the Internet doing its thing, reorganizing around a small hole in the DNS.

      I've heard stories of three-letter .gov agencies alligator-clipping batteries to the power cords of servers in order to move them "uninterrupted", so I think they've got the right stuff to keep the hardware going.

  • ..and which Microsoft Product are you running?
  • *Unplugs toaster oven and plugs back in server*

  • My guess is that since this root server is designed to operate on MILNET after disconnecting from the Internet, they may have been running a drill to do just that. Also, I highly doubt that this is the only root server on MILNET. I expect that they have multiple sites and plenty of redundant locations, but they only give out the Maryland location for security reasons.

  • by Klync ( 152475 ) on Saturday October 02, 2010 @02:27PM (#33771282)

    > All these root operators that have only one site need a good swift kick...

    Alright, anonymous coward, I nominate YOU to be the one to go and give the US Army a "good swift kick". See ya when you get back!

  • How, in this day and age of network engineering, can we even entertain one of the thirteen root servers being unavailable for so long?

    Just a small, minor issue there... 1 of the root dns went down... only another 12 still up. Not really a problem even if it had been down for a week.

    The whole reason for having 13 root servers is that you can lose a fair few of them before anyone needs to start worrying.

  • I heard through the grapevine that a cable at ARL was cut. I can't find anything to substantiate this other than a slightly related "unscheduled network maintenance" notice.
