Army DNS ROOT Server Down For 18+ Hours
An anonymous reader writes "The H-Root server, operated by the US Army Research Lab, spent 18 hours out of the last 48 being a void. Both RIPE's DNSMON and the h.root-servers.org site show this. How, in this day and age of network engineering, can we even entertain one of the thirteen root servers being unavailable for so long? I mean, the US Army doesn't even seem to make the effort to deploy more sites. Look at the other root operators who don't have the backing of the US government money machine. Many of them seem to be able to deploy redundant instances. Even the much-maligned ICANN seems to have managed to deploy 11 sites. All these root operators that have only one site need a good swift kick, or maybe they should pass the responsibility to others who are more committed to ensuring the Internet's stability."
Army Intelligence? (Score:2, Funny)
An Oxymoron indeed!
Re: (Score:1, Insightful)
It was probably outsourced to the cheapest bidder. Either that or some incompetent idiots got the winning bid by greasing a few palms.
Re:Army Intelligence? (Score:5, Funny)
Don't be so harsh on the US military. They only have a trillion dollar budget, you know? How are you ever going to set up redundant systems if all you get is pocket change? You have to cut corners somewhere. Maybe it's time to increase their funding a bit more.
Re: (Score:3, Interesting)
Actually, given the size and scope of the US military, you are right, 1 trillion dollars is about pocket change to most people.
I'm for increasing their budget more too. But I'm not sure that this outage wasn't planned. How better to test the ability to withstand a "cyber attack" than to lose your DNS servers and see if your departments can fully function without them? This ability would greatly decrease the time needed to change to an alternative system if ever needed or more likely regroup resources an
Re: (Score:2)
How about, hmm, I don't know, reducing the scope? Did you ever see President Eisenhower's farewell speech? If you didn't, you can always find it online.
And no, one trillion dollars isn't pocket change. That's an insane amount of money. Just take a calculator and calculate the cost per capita. The "trillion" figure is a unit people aren't very familiar with, and it looks deceptively small. It isn't.
Re: (Score:2)
http://en.wikipedia.org/wiki/Military_budget_of_the_United_States#Budget_Breakdown_for_2011 [wikipedia.org]
I know nothing, heh? Fucktard.
Re: (Score:2, Funny)
Re: (Score:3, Insightful)
Er... the Navy has outsourced to HP. In fact, to get out of the agreement they are having to pay to even receive information about the network configuration.
Hmmm (Score:1)
Re: (Score:1)
Not really, there is some redundancy in the system. But 7.5% of the DNS system was down. We should definitely increase the number of people who control the DNS system. Preferably all over the world. Heck, let's give Iran a couple too and be fair.
So the Internet worked as it should... (Score:5, Insightful)
So the Internet worked as it should, and routed around this disruption. The other root servers were unaffected, and still functioned fine. So what exactly is the problem?
Re:So the Internet worked as it should... (Score:5, Funny)
Because it's Saturday, and we don't have anything else to get upset about! WE HAVE TO HAVE SOMETHING TO GET UPSET ABOUT, DON'T YOU UNDERSTAND?! How can I be expected to face the day if I'm not pissed off about something that doesn't directly affect me in any meaningful way?
Re: (Score:1, Interesting)
Umm--having retired from the military, and having also been a networking professional for fifteen years in education and industry, I have less than a great respect for the products of the U.S. Army Signal School, who happens to operate that server. I was activated for service in Iraq, and watched a fellow captain, a graduate of that school, and someone with at least five years experience, insist that Ethernet Cat 5 had a maximum single link distance of 185 meters. And he designed his network around that p
Re: (Score:3, Insightful)
We've all made links in cat5 > 200 meters that work perfectly fine. Granted, perfect reliability is something else, but for a backup link in a datacenter that charges an arm and a leg for fiber connections and < 10% of that price for copper ... I've even been known to stick that link in a 10G copper interface card to see if it'd work (even if it didn't work). But I've had reliable gigabit copper links over 250 meters operational for years. It helps a lot if they're the only ethernet link in a metal
Re: (Score:2)
Ever had an ethernet link inside a bundle of VDSL links?
Were you running shielded cable? That probably would have helped a lot more than going to Cat 6.
Re: (Score:1, Offtopic)
Re: (Score:2)
and all you can say is DON'T PANIC !?
wait, what was the question again?
Re: (Score:2)
EXACTLY. That's why we have more than just root-server.net.
Maybe.. just maybe.. (Score:2)
They didn't want YOU to access their servers?
Why is it their problem? (Score:3, Insightful)
Because they don't have redundancy? Everyone gets mad because the USA wants to control the internet, but let something go bad and then someone wants to point fingers? Really? I just don't get the mentality of "We want you to do this for free" and then people turn around and B&M about the service being down for a bit.
Re:Why is it their problem? (Score:5, Insightful)
It has nothing to do with this being a US Army server. It has everything to do with bad design. The people given the responsibility of a root server should NOT take that responsibility lightly.
Re: (Score:2)
"backing of the US government money" (Score:5, Insightful)
Rest assured, the government isn't holding back. Those non-redundant Army servers already cost an order of magnitude more than everybody else's redundant servers.
Re: (Score:2)
It would probably be reasonably easy to get someone else to run the H server cluster. The DNS protocol itself limits the number to 13, quite by accident, and there was no grand design when it was decided who was getting them.
If the US Army can't run their server properly, they should offer the slot to someone else.
Re: (Score:2)
Army:"No, we can't let you do that!" ...
Me:"Why?"
Army:"National Security [i.e. PR]. If you don't shut up now, we'll give your name to the FBI!"
Do you really expect any other result?
Re:Why is it their problem? (Score:5, Informative)
Actually, most of the root "servers" are "anycast" now (9 of 13), so a single site failure doesn't matter. The US DoD runs two (G and H). G is anycast. H isn't. There wasn't any clarification as to what the issue was. It's easy to be quick to say "oh they suck", but shit happens sometimes. That's part of why we don't run on just one root nameserver. :)
For all we know, it could have been a planned outage. I kinda doubt it with that size window, but who knows. It was only 1 of 13, which makes it more like 1 of an awful lot, since 9 of the "servers" are really servers distributed worldwide. I was doing some monitoring a while back, showing how our traffic moved, and that included monitoring the root servers. It made some really screwy routes, where one check would be in the US, and the next one would be somewhere in Europe.
Re: (Score:2, Interesting)
I know most are anycast. I still think DoD should give up their slot to someone else, especially since they have 2. There is no reason why any organisation should have two slots; the only reason for that is historical.
Re: (Score:1, Troll)
Because they don't have redundancy?
What do you mean they don't have redundancy? Last time I checked there were something like 13 root servers. The entire purpose of having multiple root servers is to keep the internet up when one or even a few go down.
Re: (Score:2)
You must be new here. The US Army should provide the internet for free, and make its money by doing live gigs in the Middle East, and selling action figures.
One down, several dozens up (Score:2, Insightful)
What's the problem? The point of redundancy isn't to keep all redundant instances up all the time. The system is designed to allow for downtime of quite a few servers.
Lowest bidder (Score:4, Insightful)
Re:Lowest bidder (Score:5, Interesting)
Re: (Score:1, Insightful)
Re: (Score:2)
So every government project that runs over budget, I should report it as the fraud it is? There are only a few firms that have mastered the hoops to get a government contract, and they seem to bid low, spend high, and send the bill to the feds. And the feds pay it and come back for more. Are you saying that reporting them for that will have any effect? Because from where I sit, a disproportionate number of contracts are over budget an
Re: (Score:2)
The military industrial complex probably helps conceal a lot of fraud in much the same way that lobbyists and politicians schmooze.
No politician is going to give up a cushy position at a company by burning a bridge through enforcement.
Re: (Score:2)
Re: (Score:3, Insightful)
> This is what happens when you give contracts to the lowest bidder.
Because they'd obviously get better results by giving them to the highest bidder...
Try to get your head around concepts like "requirements", "specifications", and "lowest qualified bid". You not only do not get paid if you don't do the job you agreed to do, you may even have to pay the extra cost of having someone else do it over.
Re: (Score:2)
That in connection with sometimes screwy requirements is why the job always goes to large corporations whose primary skill is convincing government agencies that they'd better sign off on the half-assed job that was done.
There are 12 others - pick one. (Score:5, Insightful)
Hardware fails. That's just how it is. Even with the highest end hardware available today, outages can happen. This is why there are 13 root servers to start with. So long as they don't all go down at once, all is good. As far as 18 hours to recover, why is that bad? With 12 others to pick from, should this one be a high priority? I think not. Getting one's panties in a bunch because a server fails and takes some time to recover makes you sound like a silly management type. Most of us lived at least a large part of our lives without any root servers - or any servers at all. It's not the end of the world if DNS goes down. It will be ok, I promise.
Re: (Score:2)
...or maybe they should pass the responsibility to others who are more committed to ensuring the Internet's stability.
Maybe, before opening your fat mouth and posting on /. something you have no facts on, but seem to confidently be able to state that the US Army has "acted stupidly" - you research what went wrong and then pass judgement. The parent is correct - there are 13 root servers so that one or two or three CAN go down - either because of a failure or for maintenance - without killing the whole interwebs.
Re: (Score:2)
Calm down, it's not Facebook that went down. The redundancy is designed specifically so these machines are allowed to go down from time to time, and designed that way in a time when redundancy behind the address was not very likely.
The net effect of this was a little more traffic on the other servers.
Re: (Score:3, Interesting)
Go ahead and rub your nose in it until you get over your "how DARE you claim incompetence within the Army" offense.
First, let me start by saying that the guy you replied to was rude, and I don't see why he needed to insult you to make his point. However...
What went wrong is that a server that's not supposed to ever go down went down.
Your argument seems circular. Your assumption is that this root server is never supposed to go down. In this physical world, that's a pretty huge assumption to make.
And no, saying that the server went down is no proof positive that it should never have happened. The fact is, there was redundancy and the redundancy kicked in as it was supposed to. Now we're saying t
Re: (Score:2)
18 hours down is only 99.6% uptime averaged over the year assuming no other failures. A well maintained server can have 99.99%-99.999% uptime. They should have a virtual server that can fail over to other hardware without an end user noticing. Each server should have multiple network connections in case a NIC or switch fails. Not to mention a major server should have one admin on hand 24/7 and a recovery plan that can get the server back up and running in MUCH less than 18 hours. We're not talking about som
Re: (Score:2)
18 hours down is only 99.6% uptime averaged over the year assuming no other failures. A well maintained server can have 99.99%-99.999% uptime.
And here you just went and mixed two different time periods. Do you seriously believe a 99.99% or 99.999% uptime is measured over a single year? Let's look at the previous time the Army root server went down. Was it anytime within the last two years? If no, then they have 99.99% uptime.
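For anyone who wants to check the percentages being thrown around in this exchange, here is a minimal sketch of the arithmetic. It assumes a single 18-hour outage, no other downtime, and a 365-day year; the function names are made up for illustration. Under those assumptions, 18 hours comes to roughly 99.79% over one year and roughly 99.90% over two, so neither the 99.6% figure nor the two-year 99.99% figure quite holds.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def uptime_percent(downtime_hours: float, window_hours: float) -> float:
    """Percentage of the measurement window during which the service was up."""
    return 100.0 * (1.0 - downtime_hours / window_hours)


def window_years_for(downtime_hours: float, target_percent: float) -> float:
    """Years of otherwise perfect service needed to reach target_percent."""
    allowed_fraction = 1.0 - target_percent / 100.0
    return downtime_hours / allowed_fraction / HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"over 1 year:  {uptime_percent(18, HOURS_PER_YEAR):.3f}%")
    print(f"over 2 years: {uptime_percent(18, 2 * HOURS_PER_YEAR):.3f}%")
    print(f"years needed for 99.99%: {window_years_for(18, 99.99):.1f}")
```

The last line is the interesting one: with 18 hours already spent, the window has to stretch to about two decades before the average reaches 99.99%.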
Re: (Score:2)
Why, do we just have money to burn?
I can't speak for everyone, but I didn't even notice that it was down, so how much can it be worth to make sure it never happens again?
Re: (Score:2)
And I do believe that the DNS Root Server meets that, if not betters it. You'll notice that while the one node was down, the other 12 were up and running. The DNS root system is, when you get right down to it, a 13-node HA cluster. The system stayed up and serviceable, even though one specific node was down. Functioned as designed.
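The "13-node cluster" view can be put into rough numbers. The sketch below assumes each root letter fails independently with the same availability, which is a simplification (real outages can be correlated, and most letters are many anycast sites rather than one box); the function name is hypothetical.

```python
from math import comb


def prob_at_least_k_up(availability: float, nodes: int, k: int) -> float:
    """Probability that at least k of `nodes` independent nodes are up.

    Assumes every node has the same availability and fails independently.
    """
    return sum(
        comb(nodes, i) * availability**i * (1.0 - availability) ** (nodes - i)
        for i in range(k, nodes + 1)
    )


if __name__ == "__main__":
    # Deliberately pessimistic per-node availability of 99%.
    print(prob_at_least_k_up(0.99, nodes=13, k=1))   # at least one root answers
    print(prob_at_least_k_up(0.99, nodes=13, k=10))  # at least ten roots answer
```

Even with a pessimistic per-node figure, the chance of all 13 being down at once under these assumptions is (1 - availability) to the 13th power, which is why a single node's outage says little about the availability of the system as a whole.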
Re:There are 12 others - pick one. (Score:5, Insightful)
Meh. It's just one of 13 roots. Almost nobody queries it directly. If I have my DNS pointing to my ISP DNS, or to Google DNS, or to my own recursive caching DNS server which uses one of those as an upstream, all 13 root servers could be down for literally days and it's likely that almost nobody would ever notice. Most DNS servers will retain large caches of most domains. If something freaks out when the roots disappear, a few small ISPs might need to make some quick configuration changes. Some DNS changes wouldn't propagate properly until the DNS root servers were back online. But, frankly, life would go on. Making all of DNS go away would be pretty much impossible, short of taking out every node on the Internet.
Yes, if *All 13* root servers suddenly died, there would be a few people who would get a late night at the office, but I certainly wouldn't see the effects directly.
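The caching argument above can be sketched as a toy TTL cache. This is not how any particular resolver is implemented; `TtlCache`, `resolve_tld`, and the `query_root` callback are hypothetical names, and the two-day TTL (172800 seconds) in the usage example is an assumption about typical root-zone delegations.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    value: str
    expires_at: float  # seconds, on the same clock as `now`


class TtlCache:
    """Toy cache: an entry is usable until its TTL runs out."""

    def __init__(self):
        self._records = {}

    def put(self, name, value, ttl_seconds, now):
        self._records[name] = CachedRecord(value, now + ttl_seconds)

    def get(self, name, now):
        record = self._records.get(name)
        if record is None or now >= record.expires_at:
            return None
        return record.value


def resolve_tld(cache, tld, now, query_root=None):
    """Return the TLD's name server, preferring the cache.

    query_root is a hypothetical callable returning (value, ttl_seconds);
    passing None models every root server being unreachable.
    """
    cached = cache.get(tld, now)
    if cached is not None:
        return cached  # answered locally, no root query needed
    if query_root is None:
        return None  # cache miss during the outage: the lookup fails
    value, ttl = query_root(tld)
    cache.put(tld, value, ttl, now)
    return value


if __name__ == "__main__":
    cache = TtlCache()
    cache.put("com.", "a.gtld-servers.net.", ttl_seconds=172800, now=0)
    print(resolve_tld(cache, "com.", now=18 * 3600))  # still cached
    print(resolve_tld(cache, "com.", now=3 * 86400))  # expired, roots down
```

In this toy model a resolver with a warm cache does not notice an 18-hour outage at all; only lookups for names it has never seen, or whose entries have expired, would need a root.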
Re: (Score:1, Informative)
It's just one of 13 roots.
Actually, it's one out of over 200. There are only 13 IP addresses, but behind most of these addresses (anycast) there are multiple sites.
Re: (Score:2)
Ummmmm, no. Not at all. Not even close. Not even remotely comparable in any way. Not a comparison that survives even one second of rational thought. Not a sentence that I was able to finish without thinking the very phrase you started your post with.
Re: (Score:1, Troll)
Re: (Score:2)
How many e-mails were dropped as a result of the one (out of 13) DNS server that was down? How many web sites were unavailable? Did you even know there was a problem before seeing it here on /.?
Redundancy is there for a reason - to make sure that things continue even if one or more systems are unavailable.
Step back, take a breath, and get on with your weekend.
Re: (Score:2)
Re: (Score:2, Informative)
Right at the beginning of the thread you fricken moron.
Re: (Score:1, Insightful)
What is the cost of a missed email?
1 phonecall
How about a thousand of them?
Less spam. (seriously, can you think of anyone else needing to send 1000 mails in one hour?)
If one non-spam email in 10,000 contains an urgent piece of information what's the cost of missing an hour's worth?
1. SMTP does not guarantee timely delivery.
2. Sending an email now does not give you any assurance on when it will be read.
Anyone using email for time-critical information transport runs a risk -- that can be foreseen.
How about purchases?
"Oh noes! Amazon is offline! Where do I shop now?!?!?"
What if every internet based store, currency trading mechanism, bond exchange and commodity exchange lost an hour's income?
What if? Seriously: what if???
Some companies making money don't do so for 1 hour. Oh dear.
Oh wait, I'm not paid by money-making companies
Re: (Score:2)
You left off "for one whole hour" from the end of every sentence. Of those 1,000,000 phone calls, all the important ones would be made once the lines are back up. Ditto the research, the telecommuters, most purchases, email, etc. The tow trucks would get there. The markets wouldn't collapse if you couldn't trade for one hour; in fact, if no one could trade, there would be no change at all.
It would cost money, but not anywhere near the ridiculous claim you made. Destroying a city is permanent, you never
Re: (Score:2)
Except it won't happen that way. If the DNS is all down, the mail servers will just retry and find it back up later. Email is only required to either deliver the mail or report it undeliverable within FIVE DAYS. If you actually depend on faster delivery, you're using the wrong service.
If every online store goes down for an hour, do you know what will REALLY happen? Everyone will try to order again later and it will go through just fine. A bit of an anti-climax really.
Besides that, nothing remotely like that
Re: (Score:1, Offtopic)
From the small to the great the world is online now and even an hour's outage of the internet would be a disaster comparable in economic and social cost to the complete destruction of a small city somewhere in the world.
And I would gladly watch it all crash and burn.
Re: (Score:1)
We have a billion people using social networks for hours.
Well maybe we'll gain productivity if they get to work instead of wasting time!
Re: (Score:2)
"Now we have everything from business to business transactions" so sad. Businesses might have to pick up a phone.
"stock trading" Aw, the high frequency traders will have to take a day off!
"government bonds to consumer purchases" Government bonds are pretty slow turnover. A one day holiday would be no big deal. If you've just gotta have your bonds, there's always the phone. Or actually going to a bank! As for consumer purchases, a day off from that wouldn't hurt anyone either. And there's always getti
Really, I'm going to be the first? (Score:4, Funny)
They're sticking to their motto and deploying an Army of one.
Re: (Score:3, Funny)
When the movie comes out, will it be Steven Spielberg, James Cameron, or Mel Brooks?
wow (Score:5, Insightful)
Was it the monitoring system? (Score:3, Interesting)
I've seen numerous instances where the monitoring system, itself, was confused or detached. The results on a chart are then quite confusing, unless you know how to backfill the data in the chart.
Why, no, I've never been asked to do that for a 99.999% uptime SLA monitored site when some confused person in the offsite monitoring station put a bad IP address in /etc/hosts. No, no, no, couldn't happen.
Re:Was it the monitoring system? (Score:5, Informative)
https://lists.dns-oarc.net/pipermail/dns-operations/2010-October/006142.html
Classification: UNCLASSIFIED
Caveats: NONE
> FYI, the H root server is currently experiencing an outage
> due to a SONET ring outage possibly caused by flooding from
> the tropical storm on the east coast. No estimated repair time.
H root returned to service at 12:30 UTC today. Fiber cut due to downed
utility poles. Repair was delayed due to high water.
mod parent up - it has actually *gasp* information (Score:3, Funny)
Wish I had mod points...
Of the 64 comments I see in full, only this one has actual pertinent information about the downtime.
...
I must be new here. :)
The Army Research Lab? (Score:1)
They are too busy getting blocked by my PeerBlock application to deploy more DNS sites.
Meant to happen (Score:2)
I think you are overreacting a little bit. The expectation always was that one or more root servers would be unavailable at any one time - hence why there are 13 different root server systems available. More than one can be unavailable for days, and due to redundancy and caching it won't affect anything - as expected, nobody has really noticed this blip.
There should be a good mix of technologies used in the different root server systems - different architectures, OS, etc. Some sites use anycast which gives
Non-story (Score:3, Interesting)
Luckily, the internet doesn't really depend on them, as there are a couple of big organizations with heavy investment into making sure the root servers stay accessible all the time, like RIPE or Verisign. They operate thousands of physical machines at dozens of geographically distributed locations, all structured under one IP address, via anycast. This results in the situation where one logical root server outweighs the other one in terms of physical boxes at least 100:1, if not more.
My last information about the Verisign-operated root servers, from a couple of years ago, is that they are ridiculously overprovisioned, operating well under 1% used capacity, even when subjected to a fairly large DDoS. As far as I know, the common DNS servers all support RTT banding, so basically using a random list of DNS servers for a given resource that fall below a threshold of latency; therefore they wouldn't really notice the H root being down.
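A rough sketch of what RTT banding might look like follows. It is an illustration, not the algorithm of any specific resolver: `pick_server` and `band_ms` are hypothetical names, the band here is measured relative to the fastest reachable server (one possible reading of "below a threshold of latency"), and the RTT figures in the usage example are invented.

```python
import random


def pick_server(rtt_ms, band_ms=30.0, rng=random):
    """Pick at random among servers whose RTT is within band_ms of the fastest.

    rtt_ms maps a server name to its measured round-trip time in
    milliseconds, or to None if the server is not answering.
    """
    reachable = {name: rtt for name, rtt in rtt_ms.items() if rtt is not None}
    if not reachable:
        raise RuntimeError("no reachable servers")
    best = min(reachable.values())
    candidates = sorted(
        name for name, rtt in reachable.items() if rtt <= best + band_ms
    )
    return rng.choice(candidates)


if __name__ == "__main__":
    # Invented measurements; "h" is down and therefore never selected.
    measured = {"a": 20.0, "f": 35.0, "h": None, "m": 180.0}
    print(pick_server(measured))
```

With selection done this way, a server that stops answering simply drops out of the candidate set, and queries spread across whichever other servers are currently fast.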
It's just a drill: Cyber Storm III (Score:4, Interesting)
Could this simply be a part of the Cyber Storm III information warfare exercise?
http://www.military-technologies.net/2010/09/29/test-of-first-us-cyber-blitz-response-plan-begins/ [military-t...logies.net]
Re: (Score:2)
It's either planned or SNAFU.
I'd lean toward planned. Somewhere there has to be some infographics showing the Internet doing its thing in reorganizing around a small hole in the DNS.
I've heard stories of .gov with 3 letter names alligator clipping batteries to the power cords of servers in order to move them "uninterrupted". So I think they got the right stuff to keep the hardware going.
And which Product? (Score:2)
You must be mistaken, check again (Score:2)
--BOFH
Re: (Score:2)
Probably not unplanned (Score:2)
My guess is that since this root server is designed to operate on MILNET after disconnecting from the Internet, they may have been running a drill to do just that. Also, I highly doubt that this is the only root server on MILNET. I expect that they have multiple sites and plenty of redundant locations, but they only give out the Maryland location for security reasons.
A good swift kick? (Score:3, Funny)
> All these root operators that have only one site need a good swift kick...
Alright, anonymous coward, I nominate YOU to be the one to go and give the US Army a "good swift kick". See ya when you get back!
Errr.. (Score:2)
Just a small, minor issue there... 1 of the root DNS servers went down... only another 12 still up. Not really a problem even if it had been down for a week.
The whole reason for having 13 root servers is that you can lose a fair few of them before anyone needs to start worrying.
Fibre was cut (Score:2)
I heard through the grapevine that a cable at ARL was cut. I can't find anything to substantiate this other than a slightly related "unscheduled network maintenance" notice here [hpc.mil]
Re: (Score:2)
Re: (Score:2)
For the most part, I can confirm that, but there was an 18+ hour period where a percentage of my queries simply reported that Mike Muuss was unknown. Odd.
Re:Not the biggest problem out there.,,, (Score:5, Interesting)
Agreed.
From the offending server's website: "BRL volunteered to host one of the original root servers ... to provide a root server for the MILNET in the event that MILNET had to be disconnected from the Internet."
The purpose of the G/H servers is not to support the greater good (that's a side benefit), but to ensure that the MILNET can function if the DoD cuts itself off from the rest of the internet.
And besides, if my math is correct, there are a total of 205 redundant root sites (http://www.root-servers.org/), so imagine going up asking for funding...
[IT Guy] "General, we need money to add another redundant root server site, if all the sites go down the internet collapses!"
[General] "That sounds bad! How many redundant sites are there now?"
[IT Guy] "Only 205"
[General]
Re: (Score:2)
Re: (Score:1, Funny)
I'm betting that some muscle bound, Rambo GI Joe type was trampling around inside of your rectum last night with his third leg.
Re: (Score:2)
Re: (Score:2)
This is what we need for the DNS system - a decentralized distributed directory.
Oh wait that's EXACTLY WHAT IT IS! Notice how there was no outage because the H-root was down.
And like sibling jojoba points out- p2p DNS would be HORRIBLE from a trust perspective.
Re: (Score:2)
Re: (Score:2)
*Looks up map of U.S. on Wikipedia*
Nope, I guess the I, K and M servers are safely controlled by NON-US organizations.
I'll agree the U.S. probably has too many of them, but aside from the military I'll assume the other orgs know what they're doing.
Stop spreading FUD.
Re: (Score:2)
all are in the same monkey league, and league's governance is in usa
The US is by far the most powerful of the Western allies, and therefore it sets the agenda. No conspiracy there.
any country which joined nato, had to get this organization's equivalent.
They also had to commit troops and put them under US command, they were scrutinized by US intelligence, and they had to allow US troops to be stationed on their soil, imagine that!
They still wanted to join, voluntarily, because of the Soviet threat and becau
Re: (Score:2)
and, the dns root servers have no impact on politics
Re: (Score:2)
You or anybody can set their DNS service to whatever they like; it's less than a minute. You can even run your own DNS service. Most people just happen to prefer the DNS services running in "American satellites", probably because they also do their shopping, their socializing, and their traveling in those "American satellites".
You already have that problem (Score:2)
And like sibling jojoba points out- p2p DNS would be HORRIBLE from a trust perspective.
Actually, what it means is that we would have to fix, once and for all, the identity/trust/reputation problem that the Internet already engenders. Unless you use https for everything, signed emails, etc., you are already trusting people all over the place.
Re: (Score:2)
I'm betting it had more to do with the Tropical Storm that hit the US East Coast, as referenced in the announcement and "back online" emails sent from the US Army. Maybe they're in on the conspiracy.