The Internet Bug

Entire .SE TLD Drops Off the Internet (207 comments)

Posted by timothy
from the absolut-typo dept.
Icemaann writes "Pingdom and Network World are reporting that the SE tld dropped off the internet yesterday due to a bug in the script that generates the SE zone file. The SE tld has close to one million domains that all went down due to missing the trailing dot in the SE zone file. Some caching nameservers may still be returning invalid DNS responses for 24 hours."
This discussion has been archived. No new comments can be posted.

  • No big deal (Score:2, Informative)

    by RPoet (20693)

    The downtime lasted 30 minutes, and most domains were probably cached by nameservers anyway.

    • Re:No big deal (Score:4, Informative)

      by wsanders (114993) on Tuesday October 13, 2009 @11:11AM (#29732973) Homepage

      Yeah, been there done that. *My* fumble only brought 10,000 domains down for about 10 minutes, and no one noticed. (I think all the domains hosted only cat pictures anyway.)

      Sorry, that's as big a responsibility as any employer has ever deemed suitable for my incompetent ass.

    • by eldavojohn (898314) * <eldavojohnNO@SPAMgmail.com> on Tuesday October 13, 2009 @11:11AM (#29732987) Journal

      The downtime lasted 30 minutes, and most domains were probably cached by nameservers anyway.

      I once viddied an animated documentary about a small town in Colorado that lost the internet for 22 minutes [wikipedia.org]. It was not pretty. Our hearts and minds go out to you, people of Sweden. I cannot even fathom what that would be like ... I hope the looting and rioting has died down with the restoration of the internet.

      • Nah. In Sweden, when you want to see hot chicks, you just have to go outside. Even looking out the window might suffice. ^^

        • Yes, but all that damn annoying ABBA and Ace of Base is so distracting you can't DO anything with(to) the hot chicks.
    • Re: (Score:3, Insightful)

      by scott_karana (841914)

      While the impact of this is no big deal, it's still kind of scary that the people running a decently-sized ccTLD would make such a novice mistake on their zonefile.

      • Re: (Score:3, Insightful)

        by MrMista_B (891430)

        You expect them to be absolutely perfect all the time no matter what, forever and ever? /That's/ unrealistic.

    • by CorporateSuit (1319461) on Tuesday October 13, 2009 @11:45AM (#29733417)

      The downtime lasted 30 minutes, and most domains were probably cached by nameservers anyway.

      I didn't notice the DNS freak out, but I did notice the internet's smug meter had dropped about 30%.

    • Re:No big deal (Score:5, Insightful)

      by eln (21727) on Tuesday October 13, 2009 @11:55AM (#29733561) Homepage
      The actual downtime is no big deal, but the reason it happened is. Evidently, the registrar for an entire country's domain likes to roll out changes to the primary zone file without any sort of testing or syntax checking first. Simply having a small network (one or two computers) running a test root server, and running your scripts against that first, would have discovered the bug.

      DNS is very simple, but it's just as prone to human error as anything else. If you're responsible for the records of a large number of domains (like, say, an entire country), you probably ought to take some time to develop proper testing and change control procedures before you fiddle with it. It sounds like these guys didn't take it seriously enough and got burned. I hope they'll learn their lesson from this and change their procedures.
      • by CorporateSuit (1319461) on Tuesday October 13, 2009 @01:35PM (#29734951)

        DNS is very simple, but it's just as prone to human error as anything else.

        Are you kidding? I've been programming DNS for a long time, and if theirs one thing I learned, its that programmers like me don't make errors.

        • Re: (Score:2, Redundant)

          by marsu_k (701360)

          Are you kidding? I've been programming DNS for a long time, and if theirs one thing I learned, its that programmers like me don't make errors.

          If one doesn't count spelling errors, apparently.

    • by corbettw (214229)

      No big deal? No big deal??? Where the hell else am I supposed to go to look at pictures of hot Swedish women hitting the nightclub scene (in a way that's at least a little SFW) if I can't get to http://www.thelocal.se/ [thelocal.se]?

    • Re: (Score:3, Insightful)

      by mcgrew (92797) *

      I wish browsers would store the IP address of the page as well as the domain name in bookmarks. That way if the DNS server goes down you could still get to the site. Of course, the primary lookup should still be the domain name, since a site can have its address changed; the browser would only look at the IP if the DNS lookup failed.

      • by dissy (172727)

        That feature would be very handy.

        The main reason one can't simply record host/IP pairs right now is name-based virtual web servers.
        Even if you put in the IP manually, without sending the correct domain in the http request, you won't get the proper page.
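To make the virtual-host point concrete, here is a minimal sketch of connecting to a known IP address while still sending the site's name in the `Host` header. The function names, the address and the hostname are hypothetical, and this covers plain HTTP only; HTTPS would also need the name for the TLS handshake.

```python
import socket


def build_request(hostname: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 request that names the virtual host explicitly."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_by_ip(ip: str, hostname: str, path: str = "/", timeout: float = 5.0) -> bytes:
    """Connect to a remembered IP (no DNS lookup) and request `hostname` from it."""
    with socket.create_connection((ip, 80), timeout=timeout) as sock:
        sock.sendall(build_request(hostname, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Hypothetical usage, with an IP saved alongside a bookmark:
#   raw = fetch_by_ip("192.0.2.10", "www.example.se")
```

Without the `Host` line, a server hosting many sites on one address has no way to tell which one was wanted.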

        Having the IP as a separate field in the bookmarks would let the browser connect to any IP you put there (be it cached, or manually changed when a server is renumbered), but it would still have the needed data to send in the http request to make the

  • by Anonymous Coward on Tuesday October 13, 2009 @11:04AM (#29732903)

    Goat.se

    • Re: (Score:3, Funny)

      by Tetsujin (103070)

      Goat.se

      Huh... that's interesting. I've never heard of that one before... I think, though, that based on your recommendation I'll share the link with the rest of the office. I've seen a lot of your posts here in Slashdot, Anonymous Coward, and all the ones I've seen have been pretty highly rated, so I'm guessing you wouldn't link me to a website that wasn't interesting.

      • (humor)
        The satellite Microsoft Retro Fan Site Windows98.se also went down.

        And look. My sig this month is all about your joke.
        (No Closing tag. The humor never ends.)

    • Don't worry, there's plenty of mirrors......unfortunately.

    • Re: (Score:3, Funny)

      by AliasMarlowe (1042386)

      Goat.se

      Arrgh... the horror... http://goat.se/cx [goat.se] You'll want to claw your eyes out!

  • by SuperBanana (662181) on Tuesday October 13, 2009 @11:18AM (#29733063)

    I seriously hope someone is fired or loses a contract over this. Where was the validation, change control, etc? I would expect that at the TLD level, a change to a configuration file would have to be inspected by someone AND run through some syntax-checking scripts...

    As for the person who was modded up for saying "hey, no big deal, fixed in 30 minutes!", not quite. DNS servers (and individual computers!) cache negative results. Anything anyone did a query on during those 30 minutes will be negatively cached by their system and their local DNS server. Granted, a whole lot of local Swedish ISPs and network providers have probably flushed their DNS server caches, but it's still going to seriously impact traffic to many, many sites, especially for everyone outside Sweden.

    • by Aphoxema (1088507) *

      It really isn't a big deal. The mistake was made, the world has the opportunity to learn from it and the economic impact was probably small but scalable enough to take seriously.

      Now if it happened again I'd hope action were taken... don't be so vengeful, SuperBanana!

    • Re: (Score:3, Insightful)

      by e2d2 (115622)

      I'll go one better and say we should try him in a military tribunal and sentence him to hard time in ADX. That will send the world a message - NO MISTAKES OR ELSE.

      Get real man, this is a human error. Your struggle for perfection baffles my monkey brain.

    • by Mathness (145187) on Tuesday October 13, 2009 @11:50AM (#29733467) Homepage

      I seriously hope someone is fired or loses a contract over this.

      You'll be happy to know that the person responsible has been found. The person in question was described as having unusual bushy eyebrows and speaking in a thick Swedish accent. His last comment about the incident, before being dragged away, was "bork bork bork".

    • I seriously hope someone is fired or loses a contract over this.

      It seems a silly idea to fire somebody just after having invested $(whatever_this_snafu_is_supposed_to_have_cost) into his education.

      • by vlm (69642)

        I seriously hope someone is fired or loses a contract over this.

        It seems a silly idea to fire somebody just after having invested $(whatever_this_snafu_is_supposed_to_have_cost) into his education.

        Disagree... Obviously that file was being maintained by hand, BS press releases about scripts to the contrary. So the failure was at the management level for allowing such a crazy working procedure with no testing infrastructure at all. The only "education" the peon got was "typos cause problems", not exactly a Nobel prize winning contribution to human knowledge (although in comparison to a recent winner...) Since management doesn't make mistakes, and someone has to be the fall guy... the excuse will pro

    • by RabidMonkey (30447) <canadaboyNO@SPAMgmail.com> on Tuesday October 13, 2009 @12:30PM (#29734063) Homepage

      As a DNS admin myself, touching high value zones, let me tell you, missing a stupid dot happens all the time. All the change control in the world doesn't help when you just don't type one little period. Even more helpfully, most tools won't notice and the zone will pass a configuration check because missing the trailing "." is syntactically correct.

      Let me add as well that "change management" that you want is just fantastic .. no making changes during core hours. When you run a 24/7 business, non-core hours means something like 2am. at 2am, I, and most mammals, are not at their mental best, so missing a single dot isn't horribly hard.

      The only thing I'd suggest they do is use an offline test box for zones, then promote that change to prod. Then, you can load all the mistakes you want, do your digs, and if stuff works, THEN you move it to prod. I never ever make changes on production servers, they are done offline, tested, then put into prod with scripts. It makes it a lot harder for missing periods to make it into production.

      Finally, this is a good reason why negative caching should have low TTLs. If you run a DNS server that can't handle low neg-caching TTLs, it's time to upgrade from a 386.

      Cheers.
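For readers wondering why a missing dot is "syntactically correct": in the RFC 1035 master-file format, a name that does not end in a dot is relative, and the parser appends the current origin to it. A minimal sketch of that rule follows; the function name and the name-server names are made up for illustration.

```python
def expand_name(name: str, origin: str) -> str:
    """Return the fully qualified name as a zone-file parser would read it.

    `origin` is assumed to be absolute already (ending in a dot), e.g. "se.".
    """
    if name == "@":
        return origin              # "@" is shorthand for the origin itself
    if name.endswith("."):
        return name                # absolute name: used as written
    return f"{name}.{origin}"      # relative name: the origin is appended


origin = "se."
print(expand_name("ns1.example.se.", origin))  # ns1.example.se.
print(expand_name("ns1.example.se", origin))   # ns1.example.se.se.
```

Both inputs are valid as far as the grammar goes, which is why a plain syntax check lets the second one through even though it points at a name that does not exist.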

      • by drinkypoo (153816)

        I think the big failure here is that anyone is ever editing the file by hand. It should be created programmatically and edited only with a tool so that an error like this can never happen. (Of course, other errors are possible; now you have to vet your code. But the tool need not be complex, and in fact should be small enough to be provable if you so desire.)

      • Re: (Score:3, Insightful)

        by Chris Mattern (191822)

        Even more helpfully, most tools won't notice and the zone will pass a configuration check because missing the trailing "." is syntactically correct.

        Not if the configuration check you wrote checks for the trailing "." anyways. And if it doesn't, you need to rewrite it.
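A check along those lines could look something like the sketch below. It is a policy lint, not a parser: it assumes one record per line, numeric TTLs, and no parentheses or `$INCLUDE`, and every name in it (function, constants, sample records) is hypothetical. Relative targets are legal in a zone file, so this only makes sense for zones where every target is meant to be fully qualified.

```python
# Record types whose final RDATA field is a domain name.
NAME_TARGET_TYPES = {"NS", "CNAME", "MX", "PTR", "DNAME", "SRV"}
CLASSES = {"IN", "CH", "HS"}


def find_relative_targets(zone_text: str) -> list:
    """Return (line_number, target) for each target name lacking a trailing dot."""
    problems = []
    for lineno, raw in enumerate(zone_text.splitlines(), start=1):
        line = raw.split(";", 1)[0].rstrip()       # strip comments
        if not line.strip() or line.lstrip().startswith("$"):
            continue                               # blank line or directive
        fields = line.split()
        if not line[0].isspace():
            fields = fields[1:]                    # drop the owner name
        # Skip an optional numeric TTL and class, in either order.
        while fields and (fields[0].isdigit() or fields[0].upper() in CLASSES):
            fields = fields[1:]
        if len(fields) < 2 or fields[0].upper() not in NAME_TARGET_TYPES:
            continue
        target = fields[-1]
        if not target.endswith("."):
            problems.append((lineno, target))
    return problems


SAMPLE = """\
$ORIGIN se.
example   IN NS ns1.example.se.
example   IN NS ns2.example.se
mx        IN A  192.0.2.10
"""

print(find_relative_targets(SAMPLE))  # [(3, 'ns2.example.se')]
```

Run as a gate before the generated file is published, a non-empty result would block the push.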

      • by rs79 (71822)

        It's not "a" dot, it's "every" dot. A bad script and DNSSEC are to blame. Note that this is version 4 (5?) of dnssec. The earlier ones just didn't work.

        And there's a real bad gotcha in the current one they haven't found yet that has still to raise its ugly head. In time.

      • by Blakey Rat (99501)

        at 2am, I, and most mammals, are not at their mental best,

        I'm a black-footed ferret, you insensitive clod!

      • by Eil (82413)

        I'm no DNS expert, but I can't fathom why negative responses are cached at all. You have many, many more requests for valid domains than you do for invalid ones and the vast majority of the invalid ones are one-off typos. I just don't see what the benefit is. We could do away with an entire class of sysadmin headaches if all resolver software configuration and network policies defaulted to not caching negative responses.
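For context on how long a negative answer can stick around: under RFC 2308 the zone owner controls it, since the negative TTL is the smaller of the SOA record's own TTL and its MINIMUM field, and a resolver may cap it further. A small sketch, with an assumed 3-hour resolver cap and illustrative SOA values rather than the real .se ones:

```python
def negative_cache_ttl(soa_ttl: int, soa_minimum: int, resolver_cap: int = 10800) -> int:
    """Seconds a resolver may cache an NXDOMAIN/NODATA answer.

    RFC 2308: min(TTL of the SOA record, SOA MINIMUM field).
    `resolver_cap` is an assumed local limit (3 hours here), not part of the RFC formula.
    """
    return min(soa_ttl, soa_minimum, resolver_cap)


# Illustrative values: SOA TTL of 2 days, MINIMUM of 2 hours.
print(negative_cache_ttl(172800, 7200))   # 7200
# A large MINIMUM is clipped by the resolver-side cap.
print(negative_cache_ttl(172800, 86400))  # 10800
```

So a low MINIMUM in the SOA already gets most of the benefit of not caching negatives, without every typo hitting the authoritative servers again.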

    • by Burdell (228580)

      Obviously, it passed syntax-checking, or the server wouldn't have loaded it. What you are looking for is semantic-checking, which is much more difficult. I expect that the generation scripts will be expanded to check for more things; that's generally what happens (you check for what you can think of, and expand the checking when someone thinks of a better way to break things).

      Negative caching (in BIND anyway) tops out at 3 hours (it looks like .se has it set to 2 hours). The NS record TTL is 2 days, so o

    • by Krneki (1192201)
      Chill out dude. Go get a beer or a coffee, life is too good to waste it complaining about problems.

      And if you get so emotional for 30min of Internet downtime you will probably die out of stress too soon.
    • "I would expect that at the TLD level, a change to a configuration file would have to be inspected by someone AND run through some syntax-checking scripts..."

      Expect price and time-to-activation increase for second level domains way beyond current status then.

      "DNS servers (and individual computers!) cache negative results."

      Yeah, but in practice only for individual resources, not whole domains, since negative answers from authoritative sources must include SOA references as per RFC2308.

      "Anything anyone did a

  • by 6Yankee (597075) on Tuesday October 13, 2009 @11:21AM (#29733093)
    ...borked!
    • by vandelais (164490)

      I'm chopping up the zone files if that's ok with you (tosses random shyte over shoulder)
      We'll scoop up all the trailing dots and put them in the stew

      BORKBORKBORK!

    • by Verdatum (1257828)
      I can't believe I had to scroll through this many comments to find the first BORK joke! I was starting to get nervous!
  • One missing character, repeated a whole lot of times, results in an entire TLD going offline. Awesome.
  • by nimbius (983462) on Tuesday October 13, 2009 @11:36AM (#29733295) Homepage
    an admin has popped back from lunch and asked, "hey guys did someone turn my computer off while i was gone? there was a file i was working on......"
  • DNS is the problem (Score:5, Interesting)

    by cthulhuology (746986) on Tuesday October 13, 2009 @11:37AM (#29733315) Homepage
    It still boggles my mind that anyone thought zone files are a good idea. The file format is so damn brittle, that a single byte can spell disaster. On top of that, the hierarchical naming structure presents an inherent systemic risk for all sub-domains as exhibited by this .se fiasco. Nevermind the injection attacks, Pakistan taking out Youtube, and the rest, you have organizations like Verisign which profit immensely off of keeping the system broken. And don't even bother mentioning DNSSEC, as it still doesn't resolve this fundamental issue. The next systemic fuckup will simply be a signed fuckup.
    • by mypalmike (454265)

      And your robust solution to a scalable global directory of name-to-ip address mapping is... ?

    • by upside (574799) on Tuesday October 13, 2009 @11:49AM (#29733449) Journal

      Except the Pakistan affair was about the BGP routing protocol. I agree the file format is nutty, though.

      I can't think of a better alternative to the hierarchical system, perhaps you have a suggestion. A flat namespace would be an administrative impossibility, not to mention the stress it would put on name servers. Increasing the number of TLDs would lessen the impact of a single failure, though.

    • Re: (Score:3, Insightful)

      by RalphSleigh (899929)

      Pakistan taking out Youtube had absolutely nothing to do with DNS, they wrongly propagated a BGP announcement for the youtube IPs outside of Pakistan, so about 1/3 of the internet routed traffic into their black hole instead of to Youtube. Pretty effective blocking had they kept it internal, but they didn't.

    • Re: (Score:3, Informative)

      by Skuld-Chan (302449)

      Well in the 1980's when the RFC was written for zone files (1034/1035) it probably sounded like a perfectly sound way to configure this sort of thing, same with DNS in general (RFC's for which were also written in the 1980's).

      If it were invented from scratch today I'm sure it would resemble something like LDAP.

      The fact we haven't had more mass DNS failures like this is actually surprising.

    • It still boggles my mind that anyone thought zone files are a good idea. The file format is so damn brittle, that a single byte can spell disaster. On top of that, the hierarchical naming structure presents an inherent systemic risk for all sub-domains as exhibited by this .se fiasco. Nevermind the injection attacks, Pakistan taking out Youtube, and the rest, you have organizations like Verisign which profit immensely off of keeping the system broken. And don't even bother mentioning DNSSEC, as it still doesn't resolve this fundamental issue. The next systemic fuckup will simply be a signed fuckup.

      Yes, it's a shame you were still in diapers when this solution was developed. They could have benefited from your vast wisdom. Or maybe not, if you think the problem with YouTube in Pakistan was due to DNS rather than BGP.

    • Re: (Score:2, Insightful)

      by bwalling (195998)
      You do recognize that most of the protocols and specifications running the Internet are decades old, right? The fact that they've lasted this long is really rather impressive.

      Besides, if we redesigned it now, it would be insanely complex and bloated, not to mention never fully implemented (CSS? ha!), as there would be too many parties "contributing".
    • by photon317 (208409) on Tuesday October 13, 2009 @12:38PM (#29734165)

      Part of the problem with DNS these days, which your post exemplifies, is that from very early on "BIND's implementation of DNS", and "DNS The Protocol" have been mashed together and confused by the RFC authors (who were involved with the BIND implementation and had motive to encourage the world to think only in BIND terms) and basically everyone who ever used DNS in any capacity. Zonefiles are not implicit in DNS address resolution (neither for authoritative servers nor for recursive caches). They really aren't any part of the wire DNS protocol for resolving names. They *are* part of a wire protocol for secondary servers that slave zonefiles from primary servers, but even in that case it's really more a "BIND convention" than a necessity. Ultimately how you transfer a zone's records from a master server to a slave server is up to however those two servers and their administrators agree to do so. You can skip the AXFR protocol that uses zonefiles and instead do something else that works for both of you. Inventing a new method of slaving zone data is easy and doesn't involve much complicated rollout. Some people just rsync zonefiles for instance instead of using AXFR today.

      It's really frustrating (believe me, I've done it) when you try to implement a new DNS server daemon from scratch from the RFCs, and you have to wade through this mess of "what's a BIND convention that doesn't matter and what's important to the actual DNS protocol for resolving names on the wire".

      • by rs79 (71822)

        BIND was the spec for DNS for a while. But recently Vixie has washed his hands of that mess by saying "Don't use BIND as a spec".

        Like that helps Paul.

        • Re: (Score:3, Interesting)

          It gets worse. In 2007, Paul Vixie wrote an article in ACM Queue [acm.org] basically praising the vagueness of the DNS protocol specifications:

          From this overview, it is possible to conclude that DNS is a poorly specified protocol, but that would be unfair and untrue. DNS was specified loosely, on purpose. This protocol design is a fine example of what M.A. Padlipsky meant by “descriptive rather than prescriptive” in his 1984 thriller, The Elements of Networking Style (Prentice Hall). Functional interoperability and ease of implementation were the goals of the DNS protocol specification, and from the relative ease with which DNS has grown from its petri dish into a world-devouring monster, it’s clear to me that those goals were met. A stronger document set would have eliminated some of the “gotchas” that DNS implementers face, but the essential and intentional looseness of the specification has to be seen as a strength rather than a weakness.

          • by bertok (226922)

            It gets worse. In 2007, Paul Vixie wrote an article in ACM Queue [acm.org] basically praising the vagueness of the DNS protocol specifications:

            From this overview, it is possible to conclude that DNS is a poorly specified protocol, but that would be unfair and untrue. DNS was specified loosely, on purpose. This protocol design is a fine example of what M.A. Padlipsky meant by “descriptive rather than prescriptive” in his 1984 thriller, The Elements of Networking Style (Prentice Hall). Functional interoperability and ease of implementation were the goals of the DNS protocol specification, and from the relative ease with which DNS has grown from its petri dish into a world-devouring monster, it’s clear to me that those goals were met. A stronger document set would have eliminated some of the “gotchas” that DNS implementers face, but the essential and intentional looseness of the specification has to be seen as a strength rather than a weakness.

            Correlation does not imply causation.

            DNS didn't grow to be huge because it was designed loosely, it happened to grow big because coincidentally the Internet took off and became huge, and the Internet happened to use DNS. It would be a bit of a stretch to say that the Internet became the size it is today because one of the many underpinning protocols and standards was loosely specified.

            The Internet could have used any number of alternate name lookup systems, and it would have grown to its current size just f

    • Re: (Score:3, Interesting)

      by Kynde (324134)

      The file format is so damn brittle, that a single byte can spell disaster.

      You know what, so is ELF. Who said you should write zonefiles by hand let alone without any kind of syntax verification.

      Input syntax is never really an issue. You only ever lack the necessary tools or you are unable to use them properly. It can always be hidden behind a precompiler or whatever necessary.

      Hmmm... wait, termcap. I stand corrected.

  • This is why MaraDNS [maradns.org] (my open-source DNS server) uses a special zone file format.

    MaraDNS uses a zone file format that, for the most part, resembles BIND zone files. However, the zone file format has some minor differences so the common "Forgot to put a dot at the end of a hostname" and the "forgot to update the SOA serial number" problems do not happen; a domain name without a dot at the end is a syntax error in MaraDNS' zone file parser; if you want to end a hostname with the name of the zone in questio

    • Re: (Score:3, Insightful)

      by grumbel (592662)

      Can MaraDNS handle IPv6 now? Last time I used it I had to ditch it in the end as IPv6 support was lacking.

  • by 93 Escort Wagon (326346) on Tuesday October 13, 2009 @12:45PM (#29734263)

    Wi nøt trei a høliday in Sweden this yer?

    See the løveli lakes

    The wonderful telephøne system

    And mani interesting furry animals

  • That's how it looked like in Thunderbird's RSS reader.

  • Because Unix admins never test-run their code.

  • Everybody set all your TTLs to 1.
