
How a Router's Missed Range Check Nearly Crashed the Internet

Barlaam writes "A bug by router vendor A (omitting a range check from a critical field in the configuration interface) tickled a bug from router vendor B (dropping BGP sessions when processing some ASPATH attributes with length very close to 256), causing a ripple effect that caused widespread global routing instability last week. The flaw lay dormant until one of vendor A's systems was deployed in an autonomous system whose ASN, modulo 256, was greater than 250. At that point, the Internet was one typo away from disaster. Other router vendors, who were not affected by the bug, happily propagated the trigger message to every vulnerable system on the planet in about 30 seconds. Few people appreciate how fragile and unsecured the Internet's trust-based critical infrastructure really is — this is just the latest example." Vendor A, in this case, is a Latvian router vendor called MikroTik.
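To make the summary's "modulo 256" arithmetic concrete, here is a minimal Python sketch. It assumes, which the summary does not confirm, that the unchecked field held a prepend count in a single unsigned byte, so a mistyped ASN silently wraps. The function names and the limit of 16 are hypothetical; this is not MikroTik code.

```python
# Hypothetical illustration only; not MikroTik RouterOS code.
# Assumption: the configuration field holding a prepend count is stored
# in a single unsigned byte, so larger values silently wrap modulo 256.

PREPEND_LIMIT = 16  # hypothetical sane upper bound for a prepend count


def store_prepend_count_unchecked(value: int) -> int:
    """Keep only the low 8 bits, as an unchecked byte-sized field would."""
    return value & 0xFF  # same as value % 256 for non-negative integers


def store_prepend_count_checked(value: int, limit: int = PREPEND_LIMIT) -> int:
    """Reject out-of-range input instead of letting it wrap."""
    if not 0 <= value <= limit:
        raise ValueError(f"prepend count {value} is outside 0..{limit}")
    return value


if __name__ == "__main__":
    typed_by_mistake = 47868  # arbitrary example ASN; 47868 % 256 == 252
    print(store_prepend_count_unchecked(typed_by_mistake))  # 252
    try:
        store_prepend_count_checked(typed_by_mistake)
    except ValueError as err:
        print(err)  # prepend count 47868 is outside 0..16
```

With the range check in place, the mistyped value is refused at configuration time instead of quietly becoming a count above 250.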
  • by Anonymous Coward on Sunday February 22, 2009 @02:31AM (#26946791)

    Is this related to the story posted that stated:

    "One Broken Router Takes Out Half the Internet?"

    http://tech.slashdot.org/article.pl?sid=09/02/16/2233207 [slashdot.org]

    It just amazes me how differently presented this story is compared with the previous.

    In fairness, there is much more information about this 'outage' now.

    This news is alarming. Thanks for not making it alarmist this time.

    • by Anthony_Cargile ( 1336739 ) on Sunday February 22, 2009 @02:54AM (#26946897) Homepage

      It just amazes me how differently presented this story is compared with the previous.

      Previous story: kdawson. Current story: Timothy. Do you need any more explanation than that?

      • Mmm. We should get rid of kdawson. (Of course, /.'s board of corporate overlord directors probably likes all the ad revenue that he brings in. :/ )

      • by Anonymous Coward

        ...A Slashdot "Editor" notices these posts and mods them into oblivion.

        But is that better or worse than having them modded down by sycophantic Slashdot readers?

        My Slashdot login - a four-digit userid - is worthless now.

        It's been stuck on Karma:-1, Terrible for a couple of years.

        What did I do to deserve that terrible fate?

        My sin was to post a message critical of dear Michael Sims and his editing methods and practices here on Slashdot.

        • Re: (Score:3, Informative)

          by Wakko Warner ( 324 )

          That happened to my account once when I bitched about an editor too, almost ten years ago now. (Within a week of pretty simple, thought-free karma-whoring comments, I was back posting at +2.)

  • Vendor B (Score:5, Informative)

    by CSFFlame ( 761318 ) on Sunday February 22, 2009 @02:32AM (#26946799)
    Vendor B is Cisco btw. Dunno why they were being vague.
    • Re:Vendor B (Score:5, Insightful)

      by mysidia ( 191772 ) on Sunday February 22, 2009 @02:42AM (#26946841)

      It seems like we live in a world now where media go ridiculously out of their way to soften the blow and protect the parties who screwed up and shipped software that had mistakes in it, by playing PR on their behalf and hiding their name.

      They had a bug; they deserve to be called on that fact, authors should be honest and direct, and always mention them by name. ESPECIALLY in this case, so people who bought their product KNOW they need to update, even if they didn't notice the fact that they were impacted by the bug (not everyone impacted necessarily knows what caused their problems, a lot of people may still be wide open to the bug but not know about it).

      Seriously, if you develop an implementation of an exterior routing protocol that untrusted devices participate in BY DESIGN...

      How do you justify NOT taking basic steps to validate what happens in your implementation if another party decides to play dirty, and hit you with a ridiculously long or corrupt entry in a field (like AS path)?

      How does your QA team miss the potential consequences of how such a case can impact your re-advertisements of that long path? And miss testing that the result you send is still valid, or that you at least block it properly.

      It doesn't mean they're totally inept; I'm sure their QA team does a lot of good work. But something fundamental seems to be missing if these sorts of elementary bugs slip through the cracks.

      It may be hard on them PR wise, but the public deserves to know the facts, without the names being changed to protect the guilty.

      • Re:Vendor B (Score:5, Insightful)

        by Shakrai ( 717556 ) on Sunday February 22, 2009 @03:00AM (#26946917) Journal

        It seems like we live in a world now where media go ridiculously out of their way to soften the blow and protect the parties who screwed up and shipped software that had mistakes in it, by playing PR on their behalf and hiding their name.

        Well that may be the case but in this case the criticism doesn't really seem deserved. For better or worse /. generally posts exactly what was written by the person who submitted the article. Blame that person for trying to "soften" the blow.

        • by Lars T. ( 470328 )

          Well that may be the case but in this case the criticism doesn't really seem deserved. For better or worse /. generally posts exactly what was written by the person who submitted the article. Blame that person for trying to "soften" the blow.

          But it was timothy who felt the need to point out who vendor A was, but not vendor B.

      • Re: (Score:2, Interesting)

        by troll8901 ( 1397145 ) *

        They had a bug; they deserve to be called on that fact, authors should be honest and direct, and always mention them by name.

        The writer is probably trying to facilitate discussions, instead of playing the blame game.

        Names trigger emotions in us (right brain). Identifiers trigger logic in us (left brain).

        The writer is probably relying on us to suggest how to get top-level ISPs to implement filtering. It's a human and business issue ... not a technical issue.

      • Re:Vendor B (Score:5, Informative)

        by afidel ( 530433 ) on Sunday February 22, 2009 @04:00AM (#26947099)
        The Cisco bug had been fixed for about forever so anyone running an affected version probably had a million other known bugs as well, just most didn't bring their primary function to a screeching halt. Some of the time admins choose to run with the devil they know rather than finding all the new bugs waiting in new code, this time it bit a bunch of them hard and hence bit their customers. They will now upgrade to newer software or implement a workaround for this bug, if they upgrade their customers will probably have some additional downtime while the new bugs are found and worked around. Unfortunately this is how IT works, it's a complex web of systems built, programmed, and administered by fallible humans.
        • Re:Vendor B (Score:4, Insightful)

          by Anonymous Coward on Sunday February 22, 2009 @09:13AM (#26948033)

          Actually, no. The problem is that you need to pay big bucks to have access to IOS updates, and too many people just buy the router, whatever IOS comes with it, and NEVER want to hear from Cisco's overpriced services ever again.

          Really, critical internet infrastructure needs to be *easy* (as in low cost and not many technical pitfalls) to keep up-to-date, and we need to start doing Very Bad Things to those that don't implement BCP-38 (you're a danger to all your customers and downstream if you don't), egress filtering (good neighborhood requirements), automated up-to-date bogon filtering (or you will cause troubles for everyone that gets a new block of IP space freshly handed to an RIR), and strict BGP filtering...

          Cisco's IOS update policies REALLY have a part of the blame on this.
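BCP 38 (ingress filtering), the first item in the comment's list, is the easiest to illustrate. A minimal Python sketch, assuming the provider knows which prefixes it assigned to a customer: accept a packet from that customer only if its source address falls inside one of them. The function name and the example prefixes (documentation ranges) are hypothetical; real routers do this with ACLs or uRPF, not Python.

```python
import ipaddress
from typing import Iterable


def bcp38_permit(source_ip: str, customer_prefixes: Iterable[str]) -> bool:
    """Return True if source_ip lies inside one of the customer's prefixes.

    A mismatch means the customer is sending packets with a spoofed or
    misconfigured source address, which BCP 38 says to drop at the edge.
    Comparing an IPv4 address with an IPv6 network simply yields False.
    """
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(p) for p in customer_prefixes)


if __name__ == "__main__":
    assigned = ["198.51.100.0/24", "2001:db8:1::/48"]  # documentation ranges
    print(bcp38_permit("198.51.100.7", assigned))  # True: legitimate source
    print(bcp38_permit("203.0.113.5", assigned))   # False: drop it
```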

          • Cisco update policy? (Score:3, Interesting)

            by TheLink ( 130905 )
            Cisco update policy? Isn't that called Juniper or Huawei?

            Cisco used to be the best option (they weren't that great in product terms, but everyone else was worse, and Cisco had good service and support).

            They're getting squeezed from both the top and bottom.
        • by Darkk ( 1296127 )

          You hope Cisco will provide the firmware update for free for those who don't have a current service contract on their routers.

          Cisco charges for everything from stupid cables to firmware updates.

      • Dude, relax. People make mistakes, no harm was done... breathe into a paper bag for a few minutes and come back to us when you've calmed down, ok? You're going to hurt yourself with all this outrage over an almost-trivial software bug.

    • Re:Vendor B (Score:5, Interesting)

      by thsths ( 31372 ) on Sunday February 22, 2009 @04:29AM (#26947171)

      Should be obvious, hm? Because Vendor B is the one really to blame: as far as I can see, one router from Vendor A misbehaved, but thousands or more from Vendor B did. Unfortunately, Vendor B is also the one with deep pockets for legal action, so you cannot possibly put the blame on them. Oops, hope I do not get sued.

    • And as I understand it the bug was pre-IOS 12.0-something.

      Looks like the Net needed a good round of forklift upgrades anyway.

  • by Mrs. Grundy ( 680212 ) on Sunday February 22, 2009 @02:33AM (#26946803) Homepage

    I'm sure nobody here would argue with me if I suggested that the internet would be a much safer place without routers.

  • If people had upgraded their routers this wouldn't have happened. Newsflash: software has bugs. Not upgrading your software will bite you in the ass eventually, especially if this software runs critical systems like your routers.
    • Re: (Score:3, Insightful)

      by vux984 ( 928602 )

      Newsflash: software has bugs. Not upgrading your software will bite you in the ass eventually, especially if this software runs critical systems like your routers.

      Newsflash: software has bugs. Upgrading your software will bite you in the ass eventually, especially if this software runs critical systems like your routers.

      See? The statement is true either way... update or don't update. It doesn't matter. One way you'll get bitten by dormant bugs in the old version, the other way will bite you with bugs introdu

      • by Shakrai ( 717556 ) on Sunday February 22, 2009 @02:45AM (#26946853) Journal

        From long experience most people agree... if it isn't broken, don't fix it.

        Reminds me of an old "offensive" fortune quote: Working computer hardware is a lot like an erect penis. It stays up as long as you don't fuck with it.

        If you have no clue what offensive fortunes are try 'fortune -o'. They are great when you are stoned, drunk or just bored at work. If you don't have fortune installed then you are clearly on the wrong website ;)

      • Re: (Score:3, Insightful)

        if it isn't broken, don't fix it.

        That also implies, if it is broken, fix it.

        From long experience, we all get bitten sooner or later. I would say we most often remember the upgrades as being more hazardous, because we blame ourselves for those -- should've known better than to use that new, untrusted code. At least with inaction (not patching), it's negligence, rather than active incompetence -- harder to blame yourself, or for others to blame you.

        But this should not be about escaping blame, it should be about minimizing risk.

        • "But this should not be about escaping blame, it should be about minimizing risk."

          Sadly enough, in too many places not upgrading until something is obviously broken *is* a risk-minimizing strategy... Minimizing employment risk, I mean.

    • by Skinkie ( 815924 )
      If this kind of software were 'free' because you bought an appliance that should actually work, instead of being upgraded to a different set of bugs, then you might have a point... I honestly think the firmware that gets deployed lacks critical review by outsiders, but then again I was raised with the open source spirit; Cisco bought its way into it.
    • Re: (Score:2, Insightful)

      Did you RTFA? The problem was due to a router misconfiguration - a human error - and a worldwide ISP tendency of not reading/filtering garbage from what they pass along. Not bugs, not upgrades.

      • by seifried ( 12921 ) on Sunday February 22, 2009 @03:29AM (#26947029) Homepage

        Speaking of RTFA'ing you should maybe take your own advice:

        As it turns out, the reason for all those routing resets and general instability was due to a previously unknown Cisco bug involving AS paths close to 255 in length. If you try to prepend to a long path that you receive and by doing so, create a path longer than 255, you are toast. So the maps we gave in our last blog were more of an indication of Cisco market share (at least among prependers), rather than the propensity of outdated routers. Kudos to Ivan for figuring this out.

        • by ThePromenader ( 878501 ) on Sunday February 22, 2009 @04:02AM (#26947101) Homepage Journal

          The Cisco 'bug' is an oversight - with its own configuration system (where the actual AS path is written out, not an algorithm treating the same set earlier in a variable), there can be no problem. Cisco does not take into account possible errors (garbage) created by the configuration of other-type routers, thus the problem. True, this also reveals a laziness on the part of network engineers who assume that all routers use the dominant Cisco-ish configuration language, which they don't. So what is needed is a means of filtering erroneous garbage from all platforms and sources, and this job would be most efficient were it undertaken by ISPs.

          • by pyite ( 140350 )

            The Cisco 'bug' is an oversight

            How is this an "oversight"? There really is a bug. Cisco Bug ID CSCdr54230, to be exact. The bug was fixed in various code versions, but that doesn't change the fact that by Cisco's own admission it is classified as "1 - catastrophic" (in red letters, even).

            Normal measures like blocking routes with an as-path length greater than n (for some reasonable value of n) stop you from passing it on to others, but if you ran an affected IOS, it would still hurt you.
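The "as-path length greater than n" filter can be sketched in a few lines of Python. This is a hypothetical illustration, not router configuration syntax; the `Route` type, the function name, and the cutoff of 50 are all assumptions standing in for "some reasonable value of n".

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Route:
    prefix: str
    as_path: List[int] = field(default_factory=list)


def filter_long_as_paths(routes: List[Route],
                         max_len: int) -> Tuple[List[Route], List[Route]]:
    """Split routes into (accepted, rejected) by AS-path length.

    Every AS in the path counts, including repeated (prepended) ones,
    which is where abnormally long paths come from.
    """
    accepted: List[Route] = []
    rejected: List[Route] = []
    for route in routes:
        if len(route.as_path) > max_len:
            rejected.append(route)
        else:
            accepted.append(route)
    return accepted, rejected


if __name__ == "__main__":
    normal = Route("192.0.2.0/24", [65010, 65020, 65030])
    bloated = Route("198.51.100.0/24", [65040] * 252 + [65050])
    ok, dropped = filter_long_as_paths([normal, bloated], max_len=50)
    print([r.prefix for r in ok])       # ['192.0.2.0/24']
    print([r.prefix for r in dropped])  # ['198.51.100.0/24']
```

As the comment notes, a policy like this only keeps the route from being passed on; a router whose own code mishandles the update is hurt before the filter ever gets a say.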

        • by Tony Hoyle ( 11698 ) * <tmh@nodomain.org> on Sunday February 22, 2009 @09:41AM (#26948127) Homepage

          It wasn't 'previously unknown' it was fixed over 3 years ago.

          A router that hasn't been updated in 3 years has problems - including a couple of security holes that have been discovered in the interim.

          • Trouble is, you can't just go and download Cisco updates... Even if you own their hardware, they make it difficult to download anything... You need a support contract and valid account to download most stuff, and their website is absolutely horrendous to navigate.
            It's pretty stupid, just about every other vendor makes the updates freely downloadable.

            • by ScrewMaster ( 602015 ) on Sunday February 22, 2009 @01:07PM (#26949363)

              Trouble is, you can't just go and download Cisco updates... Even if you own their hardware, they make it difficult to download anything... You need a support contract and valid account to download most stuff, and their website is absolutely horrendous to navigate. It's pretty stupid, just about every other vendor makes the updates freely downloadable.

              Cisco is where they are because they monetize everything.

    • by davester666 ( 731373 ) on Sunday February 22, 2009 @02:48AM (#26946877) Journal

      I wonder why the summary went out of its way to use company A & B, then tagged a small Latvian vendor for their range-check bug, but didn't name the much larger vendor that also has a range-check bug, namely Cisco...

    • by Kaboom13 ( 235759 ) <kaboom108@@@bellsouth...net> on Sunday February 22, 2009 @03:46AM (#26947071)

      You have to have a support agreement with Cisco to get the latest IOS. They won't even give you the last version when your support contract ran out. Also, older routers do not always have upgrades available for various reasons, either they do not have enough space or hardware limitations or Cisco End-of-Lifed it and hasn't bothered.

      There's also the "if it isn't broke don't fix it" mentality in the networking world. A new version may fix some bugs but it might add some bugs as well. An upgrade, even if minor, generally means a lot of work testing and reconfiguring before you roll it out. Network engineers are expensive and that time isn't free. Sometimes the devil you know is better than the devil you don't.

      In an ideal world it wouldn't be an issue, but when it comes to networking it's NEVER an ideal world. There's always too much to do and never enough budget/manpower to do it. Every network admin probably has 10 things on his mental wishlist right now, upgrades he would like to make, redundant hardware he would like to purchase, failover contingencies he needs to test, etc. Upgrading IOS on an old router in a rack somewhere (and hoping it doesn't blow up in your face) can be pretty far down the list.

      • by geirnord ( 150896 ) on Sunday February 22, 2009 @05:57AM (#26947399)

        Untrue. Cisco TAC will give you the latest firmware for free, provided you tell them you need it due to security flaws discovered in your current version. You may need to point to their bulletin about the bug, but that should be trivial (http://www.cisco.com/en/US/products/products_security_advisories_listing.html).

        Since Cisco almost exclusively patches current versions due to security bugs, all their IOS are belong to us for free.

        • Re: (Score:3, Insightful)

          by Bert64 ( 520050 )

          Which is a lot more hassle than the update mechanisms offered by pretty much every other vendor.

  • ...so ISPs should filter AS paths!

    • by Shakrai ( 717556 )

      ...so ISPs should filter AS paths!

      I always thought they did. Back in my ISP days we had multihomed connections and all three of our uplink providers filtered what we sent to them. It just seems like common sense. What's the reason for not doing it? Laziness?

      • Re: (Score:2, Interesting)

        by tomstorey ( 1444585 )

        I always thought they did.

        Most already do. The problem was not the ASPATH itself, it was the length of it. The affected routers did not handle updates for a prefix that required more than one AS_SEQUENCE segment to carry the full AS path. The existence of the additional AS_SEQUENCE segment is what triggered the bug: the receiving router treated the update as invalid and dropped the BGP session.
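Why a second segment appears at all: in BGP's AS_PATH attribute (RFC 4271), each path segment starts with a one-octet type and a one-octet count of ASes, so a single AS_SEQUENCE segment holds at most 255 ASes. The Python sketch below assumes the 2-octet AS numbers of the time and builds only the segment bytes, not a full UPDATE message; the function names are hypothetical.

```python
import struct
from typing import List

AS_SEQUENCE = 2         # path segment type code defined in RFC 4271
MAX_SEGMENT_ASES = 255  # the segment length field is a single octet


def encode_as_sequence(path: List[int]) -> bytes:
    """Encode 2-octet ASNs as one or more AS_SEQUENCE segments."""
    out = bytearray()
    for start in range(0, len(path), MAX_SEGMENT_ASES):
        chunk = path[start:start + MAX_SEGMENT_ASES]
        out += struct.pack("!BB", AS_SEQUENCE, len(chunk))  # type, AS count
        out += struct.pack(f"!{len(chunk)}H", *chunk)       # the ASNs themselves
    return bytes(out)


def count_segments(data: bytes) -> int:
    """Walk the encoded bytes and count the segments they contain."""
    offset, segments = 0, 0
    while offset < len(data):
        as_count = data[offset + 1]  # second octet of the segment header
        offset += 2 + 2 * as_count   # header plus 2 octets per AS
        segments += 1
    return segments


if __name__ == "__main__":
    path = [65001] * 252 + [65002, 65003, 65004, 65005]  # 256 ASes in total
    encoded = encode_as_sequence(path)
    print(count_segments(encoded))  # 2
    print(len(encoded))             # 516
```

A path of 255 ASes fits in one segment; one more AS forces the second segment that the affected routers choked on.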

  • by gad_zuki! ( 70830 ) on Sunday February 22, 2009 @02:50AM (#26946887)

    except in the kdawson style it was a single link to a message board posting about a router "taking out half the internet." Dupe? Correction? I don't care as long as kdawson is kept away from the site for a while.

    • Re: (Score:2, Interesting)

      by timmarhy ( 659436 )
      "timothy" is actually kdawson's alter ego from which he posts the same crap
      • by Bryan Ischo ( 893 ) on Sunday February 22, 2009 @03:22AM (#26946993) Homepage

        That explains a lot.

        I complained to CmdrTaco a year ago or so about kdawson's terrible editing and article judgement. The site would be SOOO much better without him. But CmdrTaco stood up for him, arguing that he does "a pretty good job".

        I lost a lot of faith in Slashdot that day. I only continue to read out of habit. But I skip more articles now and I get a chuckle when I see lame stories posted by lame editors with sub-100 comments. I only wish that *no one* would read and comment on the lame stories (I should be taking my own advice here!) so that maybe the Slashdot editor cabal would get the hint.

        • by ion.simon.c ( 1183967 ) on Sunday February 22, 2009 @04:17AM (#26947135)

          You should check out alterslash.org. It's an excellent way to sort through the shitty /. comments and get to some decent threads.

        • Re: (Score:2, Interesting)

          by troll8901 ( 1397145 ) *

          But CmdrTaco stood up for him, arguing that he does "a pretty good job".

          I see the old "should a boss side with his subordinates or customers" argument.

          I only wish that *no one* would read and comment on the lame stories (I should be taking my own advice here!) so that maybe the Slashdot editor cabal would get the hint.

          What's the reason for not filtering out kdawson and timothy in Preferences > Index > Authors? (I'm not saying you're a complainer, I'm just wondering if "not wanting to miss out on the news" is the reason.)

          Of course, I agree that it's important to present a better Slashdot with higher quality news to the casual visitor.

          • Re: (Score:3, Insightful)

            by Bryan Ischo ( 893 )

            As you speculated, it's a "not wanting to miss out on the news" thing. I filtered kdawson for about a day but got paranoid that I was missing some interesting stories.

            kdawson is a terrible editor, and makes poor choices about which articles to post to Slashdot, but of course he sometimes posts good stories too. The problem is that the signal to noise ratio is so low with him. It's irritating to have to scan through so many crappy summaries just to find the few good ones. But I don't want to miss out on

        • This is Slashdot. News for Nerds etc. Most readers should be able to use the filtering.

          In the past, I believe many of us filtered out JonKatz.

          Just because a vocal minority complain about kdawson doesn't mean the rest care that much.
    • by makomk ( 752139 )
      It's called a "follow-up". You see, when there's a news story, often relevant details don't become known until some days later - as in this case. If this happens, obviously the readers would like to be told, which means a second, updated story. (Of course, even in real-world newspapers, this can border on a dupe if done gratuitously. In this case, though, there really is new info.)
  • by twistah ( 194990 ) on Sunday February 22, 2009 @03:16AM (#26946975)

    I don't know about it nearly crashing the Internet. How many people actually noticed a difference that day, for that matter?

    A lot of admins, especially after the alert went out over the NANOG list, set their routers to reject long ASPATHs (or so I assume from what I saw on that list; I am not a BGP admin myself). Many routers simply rejected these ASPATHs as well; correct me if I'm wrong, but weren't old versions of IOS the only ones affected? It was a serious issue, but I'm not sure it came anywhere near a disaster scenario.

  • FTA (Score:4, Funny)

    by drDugan ( 219551 ) on Sunday February 22, 2009 @03:22AM (#26946999) Homepage

    "The Internet was back to normal in short order."

    Well, not completely normal [slashdot.org], not yet.

  • by tick-tock-atona ( 1145909 ) on Sunday February 22, 2009 @03:46AM (#26947069)

    Few people appreciate how fragile and unsecured the Internet's trust-based critical infrastructure really is - this is just the latest example.

    Yeah. Like how everyone is trusted not to google "google".

  • laf (Score:4, Interesting)

    by maitai ( 46370 ) on Sunday February 22, 2009 @03:50AM (#26947079)

    When I worked for *unnamed nw regional backbone here* we had peering agreements with everyone we connected to except uunet, and it was pretty well known that if we spat out a bad BGP route we could bring down the whole net by hitting enter ('cept uunet, although I'm pretty sure uunet would have gone down from everyone else routing around them to us)

    How is this new? That was the '90s, back when we spent 100k+ on a Cisco 7513 with 64 megs of RAM so it could hold the BGP tables...

    We even wrote our own manual ('cause none existed) on how to deal with BGP tables so junior admins working for us wouldn't fuq it up. (and on top of that, we wouldn't let them touch the routers either)

    -meetme room in the westin in Seattle-

  • by DeadboltX ( 751907 ) on Sunday February 22, 2009 @04:04AM (#26947109)
    The critical bug is with the Cisco routers; a Mikrotik router merely nearly triggered the bug.
    It would be possible to trigger this bug with any routing software that does not do range checking on the number of times the ASN is prepended.

    The summary is spreading FUD by making Mikrotik, the only named vendor in the summary, look like the vendor at fault.
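The range check described above can be sketched in Python. This is a hypothetical policy, not any vendor's implementation: it refuses to prepend if the result would exceed 255 ASes, keeping the path within what one AS_SEQUENCE segment holds. Longer paths are legal BGP; the cap is an arbitrary conservative choice.

```python
from typing import List

MAX_PATH_LEN = 255  # hypothetical cap: what one AS_SEQUENCE segment can hold


def prepend_checked(as_path: List[int], asn: int, times: int,
                    max_path_len: int = MAX_PATH_LEN) -> List[int]:
    """Prepend asn `times` times, refusing results longer than max_path_len."""
    if times < 0:
        raise ValueError("prepend count cannot be negative")
    if len(as_path) + times > max_path_len:
        raise ValueError(
            f"path would grow to {len(as_path) + times} ASes "
            f"(limit {max_path_len})"
        )
    return [asn] * times + list(as_path)


if __name__ == "__main__":
    print(prepend_checked([65002, 65003], 65001, 3))
    # [65001, 65001, 65001, 65002, 65003]
    try:
        prepend_checked([65002] * 5, 65001, 252)  # 257 ASes: refused
    except ValueError as err:
        print(err)  # path would grow to 257 ASes (limit 255)
```

Refusing at this point means an oversized path never leaves the router, whatever downstream implementations would have done with it.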
    • Re: (Score:2, Informative)

      by Crackez ( 605836 )

      On the other hand, MikroTik devices do suck.

      Ever had the pleasure of dealing with one of these pieces of garbage?

      Not that Cisco doesn't have problems (FWIW, I admin a fair sized Cisco network), but MikroTik routers give me a feeling in my gut that it's just about to break, any minute now... I could build a better router out of a PC and some NICs (and have - love OpenBSD)...

      Disclaimer: my experience with MikroTik is from dealing with a particular Indian Contracting firm that uses them, and they also happen t

  • by Korey Kaczor ( 1345661 ) on Sunday February 22, 2009 @04:10AM (#26947115)
    The next time someone needs you to fix a computer problem and asks what went wrong, simply give them this article's summary as the reason why, replacing "router" and "Internet" with the defective part in question. You're guaranteed to look a bit sharper, too.

    "A bug by power supply vendor A (omitting a range check from a critical field in the configuration interface) tickled a bug from power supply vendor B (dropping BGP sessions when processing some ASPATH attributes with length very close to 256), causing a ripple effect that caused widespread global routing instability last week. The flaw lay dormant until one of vendor A's systems was deployed in an autonomous system whose ASN, modulo 256, was greater than 250. At that point, the power supply was one typo away from disaster. Other power supply vendors, who were not affected by the bug, happily propagated the trigger message to every vulnerable system on the planet in about 30 seconds. Few people appreciate how fragile and unsecured the power supply's trust-based critical infrastructure really is -- this is just the latest example."
  • GPL violators (Score:5, Informative)

    by Anonymous Coward on Sunday February 22, 2009 @04:18AM (#26947139)

    Mikrotik are known GPL violators who use a modified Linux (they re-branded it as "RouterOS") and a terribly bad implementation of the BGP protocol.

    In some custom community network, where MikroTik has been deployed internally, that stolen-Linux is being hacked to use Quagga instead of MikroTik's BGP.

    In short: that "RouterOS" has been highly unsuitable for the Internet. I can't believe somebody was so stupid as to trust it.

    • Re:GPL violators (Score:4, Informative)

      by transporter_ii ( 986545 ) on Sunday February 22, 2009 @10:15AM (#26948273) Homepage

      I used Mikrotik for quite some time and I'm not sure they are "known GPL violators." I guess it sounds good to kdawson them and all, but they offer the changes made to GPLed software:

      To get a CD with the corresponding source code for the GPL-covered programs in this distribution, wire transfer $45 to MikroTikls SIA, Pernavas 46, Riga, LV-1009, Latvia. Please contact MikroTikls SIA for our current account information and wire transfer instructions. Offer valid until 2010. This CD will only include the source code of the following programs according to the license requirements. This CD will not include MikroTikls proprietary SOFTWARE.

      In reading through their posts on their forums, they claim that there aren't many changes to GPL software, and that they aren't required to release proprietary software code (true). And it seems they do make some attempt to release the code to what little GPL they do change (see above).

      Personally, I think Mikrotik is awesome. But to me, they are a little bit in a TiVo-type of area here.

      Why on earth they didn't just use FreeBSD instead of Linux, I will never understand. Then they could have done whatever they wanted with FreeBSD and not been made to look bad over it.

      transporter_ii

    • In short: that "RouterOS" has been highly unsuitable for the Internet.

      Really, that should be highly unsuitable for what appears to be a high-end backbone use on the Internet.

      Assuming they don't do themselves in with GPL violations, Mikrotik is in a position to blow Cisco out of the water some day.

      We used them for internet use all the time, just internally, where it couldn't take down the whole Internet.

      I can tell you right now they aren't ready for prime time. But you guys better look out when they are.

      Mikrotik's configuration software, winbox.exe, is about as cool as it gets.

    • by KZigurs ( 638781 )

      I actually recall downloading their source about a year ago; I couldn't find the link on the spot, though, but it certainly is there. Not to mention the fact that they are the ultimate solution if you just want to repurpose an old box at a network entry point.

      go figure, it seems.

  • by ShakaUVM ( 157947 ) on Sunday February 22, 2009 @04:36AM (#26947199) Homepage Journal

    Reminds me of a story that Keith Marzullo told our class in a graduate level reliability class. This was back in the days of using UUCP to send email, and the vendor that he worked for had just released a "failsafe" product they were very proud of -- essentially, it was a mail router that could detect if a path went down, and would try an alternate router instead. The company touted it as a bulletproof solution.

    So they go to a conference, and set up some routers, unplug some of them, etc., and everything is going fine until they ask an audience member for his UUCP address. UUCP addresses are in the form of host1!host2!host3!username, with the routing for the username explicitly specified... the addresses could thus get quite long. In this case, the guy's email address was over the buffer limit the company's routers used.

    Guess what happened?

    The mail server tried sending an email to the next router in the chain. The router's buffer overflowed and it crashed. The reliable server then tried another router... and crashed it. It then went through the entire network and crashed every single one of the nodes, turning a bug that would have been a single point of failure into a total network collapse.

    =)

    Yeah, one of my favorite stories from UCSD.
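A toy Python simulation of the failure mode in that story. Everything here is hypothetical: the router names, the 64-character limit, and modeling an overflow as a node "crashing" are stand-ins for the unchecked buffer described above. It shows why failover alone spread the damage, and how one length check before retrying stops the cascade.

```python
from typing import List, Optional, Tuple

MAX_ADDRESS_LEN = 64  # hypothetical buffer size in each mail router


def parse_bang_path(address: str) -> Tuple[List[str], str]:
    """Split a UUCP address like host1!host2!user into (hops, user)."""
    parts = address.split("!")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a valid bang path: {address!r}")
    return parts[:-1], parts[-1]


def deliver(address: str, routers: List[str], validate_first: bool,
            max_len: int = MAX_ADDRESS_LEN) -> Tuple[Optional[str], List[str]]:
    """Try routers in order; return (router that accepted, routers crashed)."""
    if validate_first and len(address) > max_len:
        return None, []  # bounce the message once; no router is touched
    crashed: List[str] = []
    for router in routers:
        if len(address) > max_len:
            crashed.append(router)  # unchecked copy overflows: node goes down
            continue                # "failsafe" logic retries the next router
        return router, crashed
    return None, crashed


if __name__ == "__main__":
    long_addr = "!".join(f"host{i}" for i in range(20)) + "!username"
    routers = ["r1", "r2", "r3"]
    print(parse_bang_path("host1!host2!host3!username"))
    print(deliver(long_addr, routers, validate_first=False))  # (None, ['r1', 'r2', 'r3'])
    print(deliver(long_addr, routers, validate_first=True))   # (None, [])
```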

    • by DarkOx ( 621550 )

      I have seen bugs in spanning-tree do similar things on my network. This seems to be a recurring problem with "HA systems". Lots of stories like this out there. It's a hard problem to solve, though.

      • Re: (Score:3, Insightful)

        by Bert64 ( 520050 )

        Make your backup device different from the main one... If you use 2 different vendors, the chances of a bug affecting both are significantly reduced. It also means that the devices have to actually use standard interoperable protocols to handle the failover.

        • Re: (Score:3, Funny)

          by Darkk ( 1296127 )

          Bad idea. Generally you want to stick to one vendor that you can trust to support your products either be Cisco or some other company.

          This way you'll have identical hardware for redundancy. If a bug is found in the firmware, you just have to bug the vendor for a fix, or threaten to stop buying their products and go with a different vendor.

    • by Darkk ( 1296127 )

      Classic!

      It would have been nice to see the vendor's faces when they watched the whole network crash to its knees right after claiming their product was "bulletproof".

      Can't really blame them, though; they couldn't have anticipated that their own software would cause a total network collapse.

  • by Anonymous Coward on Sunday February 22, 2009 @05:06AM (#26947265)

    Maybe if they had updated their IOS back in 2003, when Cisco came out with the fix, they wouldn't have these problems. You wouldn't give an XP user a pass for not updating in six years and then having a problem, so don't give these upstreams one either.

    -zifr

  • Summary reads like the script for a bad disaster movie.

  • by yotto ( 590067 ) on Sunday February 22, 2009 @07:54AM (#26947755) Homepage

    At that point, the Internet was one typo away from disaster.

    I wonder how long that took?

  • Hmm... (Score:3, Interesting)

    by OneSmartFellow ( 716217 ) on Sunday February 22, 2009 @08:25AM (#26947839)
    A bug by device vendor A (twiddling a framis panel instead of sparting the glinbo interface) patted a bug from device vendor B (elevating ALP packets when deferring some GALAS modifiers with size beneath 176), yielding a domino effect that caused widespread universal switching instability last week. The flaw lay dormant until one of vendor A's systems was deployed in an autonomous system whose LKM, divisor 965, was less than 1250. At that point, the Internet was one typo away from disaster. Other router vendors, who were not affected by the bug, happily propagated the trigger message to every vulnerable system on the planet in about 30 seconds. Few people appreciate how fragile and unsecured the Internet's trust-based critical infrastructure really is -- this is just the latest example.

    Reads just about the same to me. I can't make any sense of either description of the bug.
    • twiddling a framis panel instead of sparting the glinbo interface

      That's very (ahem!) creative. May I have some of whatever it is you have in your pipe?

  • So then you just have to set up secure connections, where everyone personally knows everyone else before you connect.

  • Heh.

    In the late 90s I worked for a large American test equipment manufacturer. We had developed an embedded system for performing parametric testing of telephone lines when they were not in use (the test would be rescheduled if the line was needed).

    It was great for detecting cables that were about to fail or had already failed, and it could pinpoint (by TDR) where they had likely failed.

    It worked like a charm, except for one little nuisance: downloading new firmware to the thousands of remote units usually failed. It to

  • At that point, the Internet was one typo away from disaster. ... Few people appreciate how fragile and unsecured the Internet's trust-based critical infrastructure really is -- this is just the latest example."

    At that point, the Internet as a whole remained largely unaffected for the majority of users. Few people appreciate how robust the Internet's trust-based critical infrastructure really is, or how well it reroutes traffic dynamically through the remaining nodes even after losing a significant portion of the net -- this is just the latest example.

  • to prepend your own ASN multiple times in an outgoing advertisement?

    bgp-prepend (integer: 0..16) - number which indicates how many times to prepend AS_NAME to AS_PATH

    Unless there really is a legitimate reason for it, this seems stupid. The only reason I can think of to put your own ASN in more than once would be to artificially increase the AS_PATH length and lower other ASNs' preference for routing through you. But BGP has lots of other ways to accomplish that same goal.
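    A minimal Python sketch of the mechanism described here, assuming AS_PATH is a plain list of ASNs (real BGP encodes it in typed segments) and using ASNs from the documentation range 64496 to 64511. It applies the 0..16 range check from the option quoted above and shows why a longer AS_PATH makes a route less attractive. The function names are hypothetical, not MikroTik's.

```python
# Range quoted from the bgp-prepend option above.
MIN_PREPEND, MAX_PREPEND = 0, 16


def advertised_as_path(own_asn, received_path, prepend):
    """Return the AS_PATH sent to an eBGP peer.

    Assumption: `prepend` counts extra copies of our ASN on top of the one
    copy eBGP always adds; the option text doesn't say which it means.
    """
    if not MIN_PREPEND <= prepend <= MAX_PREPEND:  # the range check
        raise ValueError(
            f"bgp-prepend must be {MIN_PREPEND}..{MAX_PREPEND}, got {prepend}"
        )
    return [own_asn] * (1 + prepend) + list(received_path)


def shortest_path_neighbor(routes):
    """Pick the neighbor whose AS_PATH is shortest.

    Real BGP best-path selection checks other attributes (e.g. LOCAL_PREF)
    first and has more tie-breakers; only the AS_PATH length step is shown.
    """
    return min(routes, key=lambda neighbor: len(routes[neighbor]))
```

    With `prepend=2`, AS 64500 would advertise `[64500, 64500, 64500, 64501]` (length 4), so a peer comparing it against a two-hop path such as `[64502, 64501]` learned elsewhere would prefer the other route, provided the earlier tie-breakers are equal.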

    Why would MikroTik have this as a requi

