Lessons Learned From Skype’s Outage

Posted by CmdrTaco on Thursday December 30, 2010 @11:17AM from the don't-die-during-christmas dept.

aabelro writes "On December 22th, 1600 GMT, the Skype services started to become unavailable, in the beginning for a small part of the users, then for more and more, until the network was down for about 24 hours. A week later, Lars Rabbe, CIO at Skype, explained what happened in a post-mortem analysis of the outage."

This discussion has been archived. No new comments can be posted.

Deployed Soldiers. (Score:5, Insightful)

by puterg33k ( 1920022 ) writes: on Thursday December 30, 2010 @11:18AM (#34710822) Homepage

For us it's nearly our only way to speak to our loved ones at home. I'm just glad it's back up...

- - - Re: (Score:3)
      
      by Ihmhi ( 1206036 ) writes:
      
      "Doubles" refers to the last two digits in your post number (22 in this case).
      Every post on 4chan is numbered, with each forum having its own individual counter. So while something small like /int/ (International) might have tens of thousands of posts, something more popular like /v/ (Video Games) or /b/ (Random, the sewage drain of the Internet) have millions.
      There are often posts such as "doubles/triples/quads names my dog", or games wherein events are determined by post numbers like a roll of the dice. D
Blogspam (Score:5, Informative)

by ralf1 ( 718128 ) writes: on Thursday December 30, 2010 @11:22AM (#34710876)

Not sure why you didn't link to the actual article on Skype http://blogs.skype.com/en/2010/12/cio_update.html [skype.com] Instead of the blogspam site.

- Re: (Score:2, Informative)
  
  by commodore64_love ( 1445365 ) writes:
  
  Not sure why you didn't link to the actual article on Skype http://blogs.skype.com/en/2010/12/cio_update.html [skype.com] Instead of the blogspam site.
  Here's why: "Your organization's Internet use policy restricts access to this web page.
  "Reason:
  "Internet Telephony is filtered." - So I'm glad slashdot linked to the blog so I'd be able to read what was going on. My workplace is so backwards they still use old-fashioned telephone lines rather than internet phones. Oh and hot water radiators with that classic "thunk thunk thunk" sound when they turn on. Feels like I'm living in the 1930s. ;-)
  - Re:Blogspam (Score:5, Insightful)
    
    by John Hasler ( 414242 ) writes: on Thursday December 30, 2010 @12:33PM (#34711772) Homepage
    
    My workplace is so backwards they still use old-fashioned telephone lines rather than internet phones.
    And consequently you had reliable service while all the "modern, forward thinking" Skype users were down.
    
    - Re: (Score:3)
      
      by iluvcapra ( 782887 ) writes:
      
      Logic: We need enhanced 911 service and reliable telephony during power outages, therefore block connections to skype.com on port 80.
- Re:Blogspam (Score:5, Funny)
  
  by Monkeedude1212 ( 1560403 ) writes: on Thursday December 30, 2010 @12:06PM (#34711410) Journal
  
  We didn't want to Slashdot Skype and cause any more issues.
  
- - Re:Blogspam (Score:5, Insightful)
    
    by Jurily ( 900488 ) writes: <jurily.gmail@com> on Thursday December 30, 2010 @11:52AM (#34711246)
    
    But how else will aabelro promote his own site on Slashdot?! It's just good business sense.
    And people wonder why we don't RTFA.
    
    - Re: (Score:2)
      
      by statusbar ( 314703 ) writes:
      
      ... Because there IS no "A".
      --jeffk++
December 22th? (Score:5, Funny)

by colinRTM ( 1333069 ) writes: on Thursday December 30, 2010 @11:23AM (#34710886)

Seriously?

you are kidding me (Score:5, Interesting)

by alphatel ( 1450715 ) * writes: on Thursday December 30, 2010 @11:25AM (#34710908)

If you are a node-based company worth several billion, charge for services, and don't even run enough of your own supernodes and monitor them in such a way that they cannot handle an outage effectively, you need serious help.

- Re:you are kidding me (Score:5, Insightful)
  
  by TubeSteak ( 669689 ) writes: on Thursday December 30, 2010 @11:56AM (#34711288) Journal
  
  If you are a node-based company worth several billion, charge for services, and don't even run enough of your own supernodes and monitor them in such a way that they cannot handle an outage effectively, you need serious help.
  No one expects 40% of a globally distributed network to crash at once. No one.
  FTFA:
  The initial crashes happened just before our usual daily peak-hour (1000 PST/1800 GMT), and very shortly after the initial crash, which resulted in traffic to the supernodes that was about 100 times what would normally be expected at that time of day.
  Not even a multi-billion dollar company would have a disaster plan that provisions 100x capacity as a hot/cold spare.
  Though I bet their new plan includes automatic spawning of nodes on EC2 or some other distributed CDN.
  
  - Re: (Score:2)
    
    by localman57 ( 1340533 ) writes:
    
    I agree. But it wasn't an initial 100x surge, right? It was a cascading failure where eventually supernodes were up 100% because there were fewer and fewer of them. It's a matter of prevention, not cure.
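A rough toy model of that kind of cascade, for illustration only. It assumes the total load is split evenly over whichever supernodes are still up, and that each supernode shuts itself down once its share exceeds its own limit. The capacities and load figures are made up, and nothing here reflects Skype's actual protocol.

```python
def simulate_cascade(survivor_capacities, total_load):
    """Return the number of supernodes still up after each round.

    survivor_capacities: the maximum load each surviving supernode tolerates
        before its (hypothetical) overload protection shuts it down.
    total_load: total request load, shared evenly by whoever is still up.
    """
    alive = list(survivor_capacities)
    history = [len(alive)]
    while alive:
        per_node_load = total_load / len(alive)
        still_up = [cap for cap in alive if cap >= per_node_load]
        if len(still_up) == len(alive):
            break  # every node copes with its share: the network is stable
        alive = still_up
        history.append(len(alive))
    return history


# 10 supernodes normally carry 1.0 unit of load each (total 10.0).
# A client bug removes 3 of them; the 7 survivors have varying headroom.
print(simulate_cascade([1.3, 1.4, 1.5, 1.6, 1.8, 2.0, 2.5], 10.0))

# All 10 healthy, each with 30% headroom: nothing shuts down.
print(simulate_cascade([1.3] * 10, 10.0))
```

In the first case the survivor count goes 7, 5, 2, 0: each shutdown raises the per-node share and pushes the next-weakest nodes over their limit. No 100x surge is needed up front for the whole thing to fall over.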
  - Back up... (Score:2)
    
    by msauve ( 701917 ) writes:
    
    a client (or even many) crashing shouldn't cause the server to, too. That's just bad design/software.
    
    Skype seems clueless. They're thinking of using "processes for providing ‘automatic’ updates to our users so that we can help keep everyone on the latest Skype software. We believe these measures will reduce the possibility of this type of failure occurring again." Contrariwise - this would only make the matter worse. What if the _current_ version were the one with the problem, and an automated
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
    - Re: (Score:2)
      
      by pagedout ( 1144309 ) writes:
      
      Ah, but it's a brave new world where the client/server relationship is becoming fuzzier all the time. The part I think you are missing is that if you read the actual post, it is obvious that everything that was crashing was an application on a client's computer. It appears that some clients are promoted to server status to handle routing requests.
      
      As for bad design/software I would instead say they had features without consideration of consequences. Here are where their problems are from what I can see.
      
      1. No
  - Re: (Score:3)
    
    by TubeSteak ( 669689 ) writes:
    
    No one expects 40% of a globally distributed network to crash at once. No one.
    Oops. I made a mistake.
    It's 40% of 50%. So actually ~20% of global users crashed.
    The problem was that those ~20% of global users represented 25%~30% of active supernodes.
    Either way, losing 20%~30% or 40% of a globally distributed network is still the kind of stuff that only the RAND corporation and the Pentagon make plans for.
    If Skype hadn't included circuit breakers (so that the client would go easy on your bandwidth and CPU), their network might have stayed up.
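The corrected arithmetic, spelled out as a quick check. This assumes the "50%" is the share of all users affected by the bug and the "40%" is the fraction of those whose clients crashed; the variable names are only illustrative.

```python
share_of_users_affected = 0.50     # assumed meaning of the "50%" above
crash_rate_among_affected = 0.40   # assumed meaning of the "40%" above

crashed_share_of_all_users = share_of_users_affected * crash_rate_among_affected
print(f"{crashed_share_of_all_users:.0%} of all users crashed")
```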
  - Re: (Score:2)
    
    by ToasterMonkey ( 467067 ) writes:
    
    If you are a node-based company worth several billion, charge for services, and don't even run enough of your own supernodes and monitor them in such a way that they cannot handle an outage effectively, you need serious help.
    No one expects 40% of a globally distributed network to crash at once. No one.
    FTFA:
    The initial crashes happened just before our usual daily peak-hour (1000 PST/1800 GMT), and very shortly after the initial crash, which resulted in traffic to the supernodes that was about 100 times what would normally be expected at that time of day.
    Not even a multi-billion dollar company would have a disaster plan that provisions 100x capacity as a hot/cold spare.
    Though I bet their new plan includes automatic spawning of nodes on EC2 or some other distributed CDN.
    It was their own widely deployed buggy software that caused the big chunk to go offline. Any other organization with a big "deploy everywhere" button would understand the importance of an equally big "roll back" button, and of heavy testing before pressing either. I guess "Skype's clients are also their servers, so they have no control" is an excuse? Is it a good one?
- Re: (Score:2)
  
  by blackraven14250 ( 902843 ) writes:
  
  The last time I checked, the only service they charge for is IP-based to a standard phone connection, not any PC-to-PC stuff.
- - Re:you are kidding me (Score:4, Interesting)
    
    by marcosdumay ( 620877 ) writes: <marcosdumay@g[ ]l.com ['mai' in gap]> on Thursday December 30, 2010 @12:30PM (#34711736) Homepage Journal
    
    "China isn't deluded about itself like America"
    
    I'll believe that when I hear a Chinese person (one who hasn't been out of the country for decades) saying that China will rule the world for any reason but because they are a superior race or culture. China is quite deluded, even more so than the US. Half the world (the Occident) is helping them get even more deluded, and the other half (the Orient) is too afraid to help them cut any kind of delusion.
    That doesn't mean, of course, that China isn't becoming a superpower. They may be, or may not, I don't know the future. Militarily, they already are...
    
lesson (hopefully) learned... (Score:5, Insightful)

by smash ( 1351 ) writes: on Thursday December 30, 2010 @11:27AM (#34710948) Homepage Journal

... relying on dodgy peer to peer VOIP telephony for business purposes is retarded.
we've got people bitching at work about how it doesn't work from time to time, and why I've blocked its ability to do voice/video at the firewall. If you want VOIP, use something that uses standard SIP or some other documented, configurable traffic.

- Re:lesson (hopefully) learned... (Score:5, Interesting)
  
  by commodore64_love ( 1445365 ) writes: on Thursday December 30, 2010 @11:44AM (#34711144) Journal
  
  Ahh so YOU'RE the one blocking my skype. ;-)
  I don't understand why Net Admins (such as yourself) block useful tools like Skype. Or streaming radio. I don't see any harm in letting those things into the office space, and it provides a more pleasant working environment (to distract from the boredom of sitting at a desk all day).
  
  - Re:lesson (hopefully) learned... (Score:5, Informative)
    
    by smash ( 1351 ) writes: on Thursday December 30, 2010 @11:55AM (#34711276) Homepage Journal
    
    Why do I block skype? Because the only way to have it work properly through most firewalls is to allow ALL outgoing ports. Which means you allow any random program to do any random shit through your firewall to the outside network. Its a massive, massive security issue you could drive an oil tanker through.
    Also, many companies pay for bandwidth. I don't want all of my bandwidth chewed up on video calls instead of mission critical apps.
    Its not just because we're nazis, its because skype protocol is completely fucked when it comes to the ability of your admin to control resources. Want voip/video? Use something else.
    
    - Re:lesson (hopefully) learned... (Score:5, Insightful)
      
      by smash ( 1351 ) writes: on Thursday December 30, 2010 @11:59AM (#34711318) Homepage Journal
      
      Just let me clarify: corporate networks are different to your home network. your home network? fine, use skype. in the office, where you've got several hundred PCs that may/may not have malicious software and/or users at the helm - allowing all outgoing connections is just begging for trouble.
      Egress filtering is a good thing.
      Making your day at work "less boring" by enabling you to do non-work related shit with company resources is not what my job is about. It is about ensuring the continued operation of the company's network - and skype is a liability.
      
      - Re: (Score:3)
        
        by BobMcD ( 601576 ) writes:
        
        Making your day at work "less boring" by enabling you to do non-work related shit with company resources is not what my job is about. It is about ensuring the continued operation of the company's network - and skype is a liability.
        Careful there, BOFH. Here I'll help:
        Making your day at work "less boring" by enabling you to do non-work related shit with company resources is none of my business. Get it requested through the proper channels and you can have it. I don't make the business decisions here, I just do what the company needs done to be successful.
        
        Re: (Score:2)
        
        by ImprovOmega ( 744717 ) writes:
        
        Look, I'm all for business driven IT, but sometimes you have to save your managers from themselves. It's not being a BOFH to look out for the corporate network. You were hired to have the expertise to make recommendations and keep things as secure as possible. If it gets shoved through anyway then it may be time to start looking for someplace that actually values your skills.
        
        Re: (Score:3)
        
        by BobMcD ( 601576 ) writes:
        
        Good luck with that. Welcome to 2010's economy.
        Meanwhile, CYA and collect your paycheck. Let those with the MBA's make the calls and take the heat, and NEVER bicker with the end user. You're not paid enough to deal with their crap.
        
        Re: (Score:2)
        
        by smash ( 1351 ) writes:
        
        It's still not going to be allowed through. They want skype, they can have a 3g service for their laptop and run skype through that.
        I've explained to management the security problems with skype when it was originally requested and have support to block it.
        
        Re: (Score:2)
        
        by BobMcD ( 601576 ) writes:
        
        Then you're either enjoying bickering with the end users or this is an imaginary scenario...
        
        Re: (Score:2)
        
        by smash ( 1351 ) writes:
        
        No, they just figure out skype doesn't work, come see me, i tell them it is not supported and to pick up the telephone.
        
        Re: (Score:2)
        
        by BobMcD ( 601576 ) writes:
        
        And, as mentioned elsewhere in this thread, there are many people / governments with the ability to decrypt it. So both for licensing reasons and for security reasons it had to be prohibited.
        Because these people/governments lack the ability to intercept your copper/GSM/other types of calls??
    - Re: (Score:2)
      
      by don_carnage ( 145494 ) writes:
      
      Deep packet inspection is the only way to block Skype (or so I've heard). The real danger lies not in the voice/videoconferencing but in the potential for tunneling and/or circumvention of data loss prevention controls.
    - Not true. (Score:3)
      
      by nuckfuts ( 690967 ) writes:
      
      Why do I block skype? Because the only way to have it work properly through most firewalls is to allow ALL outgoing ports.
      Skype lists three other firewall configurations [skype.com] that work, including two that only require egress on a single port that's almost always open anyway.
      Its a massive, massive security issue you could drive an oil tanker through.
      Oh, come on. Sure, egress filtering is a polite thing to do, but it's inbound connections that put you at risk. And chances are, if you do fall victim to some nefarious piece of malware that's making unwanted outbound connections, simple packet filtering will be useless anyway because it will fall back to TCP 80, or TCP 443, or even UDP 53, to tunnel out. Just l
    - - Re: (Score:2)
        
        by Duradin ( 1261418 ) writes:
        
        Have you listened to music at 16kbps? 96k is about as low as I'll go. Somethings aren't tolerable under 256k or 320k. Low bitrates are fine for talk but not music.
      - Re:lesson (hopefully) learned... (Score:5, Informative)
        
        by smash ( 1351 ) writes: on Thursday December 30, 2010 @12:31PM (#34711754) Homepage Journal
        
        Because skype wasn't written that way. You want standard voice/video, use a SIP program. Skype was written deliberately by the developers to allow it to talk to anywhere and everywhere through your network so it can route other people's calls, and connect to random other nodes for your own call routing. That free lunch you're eating? Paid for by other's use of your bandwidth.
        Multiply 500 users by 48kbit. That's 24 megabit in streaming audio. That you can get off that fucking $10 FM radio on your desk. Now I'm not sure how expensive bandwidth is where you are, but a business-grade 24 meg METERED (say, 300 gigs) internet connection here is about 5-10 grand a month. The business is not going to wear the cost of 5-10k per month for our users to listen to shitty quality streaming MP3. That's before you take into account the increased latency to mission critical apps, or remote end points on crappy satellite connections paying anywhere up to $7 per MEG of data
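For anyone who wants to check the numbers, here is a back-of-the-envelope sketch. It assumes decimal units (1 Mbit = 1000 kbit, 1 GB = 1000 MB) and every stream running at once; the function names are made up for the example.

```python
def aggregate_mbit_per_second(users, kbit_per_stream):
    """Total bandwidth if every user runs one stream at the same time."""
    return users * kbit_per_stream / 1000


def hours_to_exhaust_cap(cap_gigabytes, mbit_per_second):
    """How long a metered allowance lasts at a constant transfer rate."""
    megabytes_per_second = mbit_per_second / 8
    seconds = cap_gigabytes * 1000 / megabytes_per_second
    return seconds / 3600


total = aggregate_mbit_per_second(users=500, kbit_per_stream=48)
print(f"{total} Mbit/s of streaming audio")
print(f"A 300 GB cap lasts about {hours_to_exhaust_cap(300, total):.1f} hours at that rate")
```

Under those assumptions the 500 streams do add up to 24 Mbit/s, and a 300 GB allowance would be gone in a bit under 28 hours of sustained listening.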
        
        Re: (Score:2)
        
        by bdenton42 ( 1313735 ) writes:
        
        My impression is that it is just the directory and signalling information which runs through these supernodes, not voice traffic, so the load shouldn't be too high.
        Your point on streaming audio is correct... it's even worse when people sit there with streaming video (CNN, ESPN) going.
        
        Re: (Score:2)
        
        by acid06 ( 917409 ) writes:
        
        Any decent company I've ever worked with would have separate internet links for the "mission-critical" stuff and the regular internet traffic. They would have a dedicated link to the servers but users would have access to the internet through regular consumer broadband. Works great, you get the best of both worlds. Maybe you should leave your BOFH nest and consider this option and try to become less hated by your users (I know I would hate you).
        
        Re: (Score:2)
        
        by Cwix ( 1671282 ) writes:
        
        The system admin's job isn't to be loved by his users.
      - Re: (Score:2)
        
        by Belial6 ( 794905 ) writes:
        
        While I think that comparing banning streaming music to record burnings is a bit over the top, you do make a good point about bandwidth. The cost of the bandwidth for audio streaming is trivial on a per user basis. Decent companies spend dramatically more than that to try to make work a pleasant place to be. Even crappy places to work often spend more than that. The claim "It's company equipment, so you shouldn't be using it for personal things." is basically a company statement that working for them shoul
  - Re: (Score:2)
    
    by Duradin ( 1261418 ) writes:
    
    Back in the day I worked at a place that banned streaming audio because one day there wasn't enough bandwidth for the actual business applications to go about their business when everyone was listening to their streamed music.
    Skype can eat a lot of bandwidth.
    - Re: (Score:2)
      
      by QuantumBeep ( 748940 ) writes:
      
      In places where DSL or cable internet is cheap, it seems basic common sense to have a "toy" internet connection with a wireless router. That's like $25 a month per 100 users (that's what we have where I work).
      Note that I'm not suggesting 100 people could actually use it at the same time, but out of 100 people actually working, maybe 100 use any real bandwidth at once.
      - Re: (Score:2)
        
        by Duradin ( 1261418 ) writes:
        
        Note the "back in the day". Past tense, what is doesn't affect what was.
  - Re: (Score:2)
    
    by noc007 ( 633443 ) writes:
    
    Within the network I manage, it boils down to bandwidth, security, and slacking off.
    We have two large offices and a few small offices. All of the internet traffic is routed through the WAN to the main office that has a 10Mb link which is shared with our internet facing servers. The other large office acts only as a backup and has a 5Mb internet connection. The WAN links are 3Mb with the exception of the main office having a 6Mb one. Regular business WAN traffic is a steady 1Mb across the board with the usua
    - Re: (Score:2)
      
      by smash ( 1351 ) writes:
      
      Exactly as above. People get DSL at home and think they have the equivalent at work (for each and every employee). It simply doesn't work that way.
  - Re: (Score:2)
    
    by Ephemeriis ( 315124 ) writes:
    
    I don't understand why Net Admins (such as yourself) block useful tools like Skype. Or streaming radio.
    Well, we don't block Skype here... Though we do block streaming radio. I can give you a couple good reasons for both.
    1) Bandwidth. A service like Skype or streaming some radio station may not actually take all that much bandwidth itself... But if you've got 10 or 100 or 1,000 folks using it simultaneously the bandwidth requirements get quite steep. And it's un-necessary bandwidth. You could pick up your phone and not hit the Internet, you could turn on a regular radio and not hit the Internet. Busin
How are supernodes defined? (Score:2)

by fantomas ( 94850 ) writes:

Sorry if this is off topic or an ignorant question, but how does Skype define supernodes? Does the company just randomly choose users who are online a lot and declare them supernodes without the owner's knowledge, or is there some other process?
cheers
- Re: (Score:2)
  
  by circletimessquare ( 444983 ) writes:
  
  "Does the company just randomly choose users who are online a lot and declare them supernodes without the owner's knowledge"
  yes, that's exactly what they do. and yes, that's retarded for a company like skype
  - Re: (Score:2)
    
    by smash ( 1351 ) writes:
    
    Well no, it's not really retarded for skype. it's retarded for skype users to actually agree to those terms of service.
    - Re: (Score:3)
      
      by circletimessquare ( 444983 ) writes:
      
      that's right, because everyone who wants to use VOIP should review the source code and familiarize themselves with the relevant RFC specs
      classic "if you aren't a computer scientist you shouldn't use the internet" ignorant geek snobbery. how's that standard of behavior working for you?
      - Re: (Score:2)
        
        by smash ( 1351 ) writes:
        
        I was merely suggesting that its just fine and dandy as far as SKYPE the company goes to rip people's bandwidth off. If you cbf reading the license and just click OK for the free shit then you deserve whatever raping you get. Nothing is free.
        
        Re: (Score:2)
        
        by PReDiToR ( 687141 ) writes:
        
        The information is there for you to read, should you care to look it up.
        
        This article will be "google-able" within a week by 100% of the internet.
        
        People who have read about this outage will be more informed, should anyone care to ask around "about this Skype thing I've heard about" and information from geeks is immensely free-flowing (sometimes you can't shut us up).
        
        Don't go shouting and swearing at the GP who is trying to point out that lusers - don't fucking care - about the stuff we try and tell th
        
        Re: (Score:2)
        
        by circletimessquare ( 444983 ) writes:
        
        the lusers as you call them are whom the internet is for. the point is to make the internet to their standards: none and few, rather than making it to your standards: computer science majors only. the internet is not an exclusive club for the technically sophisticated
        think of it as an engineering exercise in robustness and hardiness and elasticity in the face of abuse. because your current inferior attitude that some sort of technical proficiency is required to use the internet is a standard that will simpl
Obvious problem.... (Score:5, Interesting)

by dstar ( 34869 ) writes: on Thursday December 30, 2010 @11:28AM (#34710958)

Hmm. Seems to me their biggest problem is that they allowed clients with a known bug to become supernodes; if 50% of the network had upgraded, they should only have been creating supernodes from the upgraded clients.
And in hindsight (I don't know that they should be blamed for not considering this before), the number of supernodes should probably be ~100-150% more than needed to service expected load. That way, if a third of them die, they _still_ have more than needed to handle the expected load. (And thus, hopefully, more than needed to handle the excessive load without causing them to shut down).
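A quick sanity check of that headroom figure, as a sketch. It assumes capacity scales linearly with the number of supernodes and that "needed" means exactly enough for the expected load; the function name is invented for the example.

```python
from fractions import Fraction


def remaining_capacity_ratio(extra_fraction, failed_fraction):
    """Capacity left after failures, as a multiple of what expected load needs."""
    provisioned = 1 + extra_fraction
    return provisioned * (1 - failed_fraction)


one_third = Fraction(1, 3)
for extra in (Fraction(1), Fraction(3, 2)):  # 100% and 150% extra supernodes
    ratio = remaining_capacity_ratio(extra, one_third)
    print(f"{float(extra):.0%} extra, a third lost -> {float(ratio):.2f}x needed")
```

With 100% to 150% extra, losing a third still leaves roughly 1.33x to 1.67x of what the expected load needs. That margin only covers the expected load, though, not a surge from crashed clients all reconnecting at once.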

- Re: (Score:2)
  
  by BobMcD ( 601576 ) writes:
  
  Hmm. Seems to me their biggest problem is that they allowed clients with a known bug to become supernodes; if 50% of the network had upgraded, they should only have been creating supernodes from the upgraded clients.
  If they had the power to stop bugged clients from becoming supernodes, why not just use that same power to make them patch? You're sort of assuming that they ever imagined that this could have happened. It's pretty clear that they didn't...
  It's subtle, but it's there at the bottom where they admit 'we need to test our crap first and we need some way of making people patch' - which is kind of a known thing in the modern software world.
- Re: (Score:2)
  
  by roman_mir ( 125474 ) writes:
  
  Seems to me their biggest problem is that they allowed clients with a known bug to become supernodes
  OK, FTFA
  Approximately 40% of all Skype users that were online crashed, taking down around 30% of all supernodes.
  - so supposedly this means that 30% of the supernodes went offline due to the bug, is this correct?
  But look at the number: 40% of ALL Skype users went offline! That's insane, that's almost half. At the same time ONLY 30% of the supernodes went offline due to this bug, right?
  Something does not add up.
  FTFA:
  Clients that continued to be up and running, and clients that restarted the application had their network searches directed to the supernodes still running, leading to an overload of those. Since Skype has in place a protection when a supernode is overloaded, so it would not consume too much of a client’s system’s resources, the supernodes started to shutdown automatically one after another, leading to a generalized failure of the network.
  - so the sequence of events is supposedly this:
  1. Bug causes 40% of all Skype clients to stop functioning, this includes 30% of all supernodes.
  2. The remaining 60% of all Skype clients relied o
- Re: (Score:2)
  
  by sco08y ( 615665 ) writes:
  
  Hmm. Seems to me their biggest problem is that they allowed clients with a known bug to become supernodes
  Isn't the biggest problem the monolithic app design?
  Look at this bug: it's due to counting the number of voicemail messages. *Why* did that take out the node completely?
  This makes a pretty good argument for modularizing a GUI into discrete tools. Not only does it protect me from bugs in one tool, but I also don't have to run stuff I'm not interested in.
I don't understand this. (Score:4, Interesting)

by commodore64_love ( 1445365 ) writes: on Thursday December 30, 2010 @11:29AM (#34710964) Journal

"At its core, Skype relies on a third generation P2P network that has lots of peer nodes and a number of supernodes, one for several hundreds of nodes. Since Skype does not have a centralized directory to support finding routes between two or more nodes that want to communicate, the virtual network uses supernodes as directories. When a client enters Skype, it registers itself with a supernode, giving its IP address so it can be found by other clients who might want to establish a communication."
Skype is a peer-to-peer network? Like torrent? So the supernode is like a tracker website, to connect peers to one another? No supernode==no tracker==no calls going through. Hmmmm. Maybe they should try DHT.
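To make the tracker analogy concrete, here is a minimal in-memory sketch of "supernode as directory". The class, method names and address are all hypothetical and are not Skype's actual protocol or API; it only shows why a client cannot be found once the supernode holding its registration is gone.

```python
class Supernode:
    """Toy directory node: remembers where registered clients can be reached."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.directory = {}  # user name -> "ip:port" string

    def register(self, user, address):
        self.directory[user] = address

    def lookup(self, user):
        return self.directory.get(user)


def find_peer(supernodes, user):
    """Ask each reachable supernode in turn; None means the call cannot be set up."""
    for node in supernodes:
        if node.online:
            address = node.lookup(user)
            if address is not None:
                return address
    return None


sn1, sn2 = Supernode("sn1"), Supernode("sn2")
sn1.register("alice", "203.0.113.7:5000")  # documentation-range IP, made up

print(find_peer([sn1, sn2], "alice"))  # found via sn1
sn1.online = False                     # sn1 crashes or shuts itself down
print(find_peer([sn1, sn2], "alice"))  # alice's registration is unreachable
```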

TL;DR version: (Score:5, Interesting)

by The MAZZTer ( 911996 ) writes: <(moc.liamg) (ta) (tzzagem)> on Thursday December 30, 2010 @11:30AM (#34710974) Homepage

Lots of users were using an old outdated buggy version of Skype, lots of client crashes at once bringing down big chunks of the P2P network, remaining network couldn't handle the load and went down too, took a while for Skype to put its own supernodes up to help get the network self-sustaining again.
They're considering an auto-update feature now since such a feature could have kept this from happening. Personally I think old versions should be blocked from making or receiving calls too, so users would be encouraged to update (works for Team Fortress 2). Of course auto updates would make updating super easy anyway so impact from that would be minimal.

- Re: (Score:2)
  
  by spxero ( 782496 ) writes:
  
  The problem with the auto-update feature in Skype vs. gaming is that most gaming computers will be close to top-of-the-line. Most computers used for Skyping will not be top of the line.
  From experience, the 5.0 version of Skype doesn't work as well as the 3.8 branch. Switching between windowed and full-screen video on the 5.0 branch takes ~4 sec to accomplish, with the audio becoming choppy at the same time. In addition, the video is choppy and audio quality is scratchy at best. The 3.8 branch doesn't have t
- - Re: (Score:2)
    
    by localman57 ( 1340533 ) writes:
    
    That's why I don't install new versions until they've been around for awhile
    Isn't that part of what caused this? :-)
Never makes sense to upgrade working software... (Score:5, Interesting)

by syousef ( 465911 ) writes: on Thursday December 30, 2010 @11:36AM (#34711038) Journal

...unless you need something in the newer version (feature, security update etc.). Of course us geeks like to have the latest to fiddle with, but for the average Joe end-user, if it ain't broke, don't fix it. There is always the risk that the newer software will contain new bugs. At one point the buggy version of the Skype software was the latest version and was what users were being pushed to upgrade to. If the crash had happened then, I wonder if they'd find a new way to scapegoat users.
By the way, new versions breaking existing functionality isn't theoretical, or rare. I'm currently installing software on my new laptop. I've had to downgrade both ZoneAlarm and VirtualBox. The former broke remote desktop. The latter broke file sharing. No idea why, but in each case uninstalling and installing an older version I knew worked fixed the issue for me.

- Re: (Score:2)
  
  by Enderandrew ( 866215 ) writes:
  
  The problem is that it is broke, you just often don't realize it. Older doesn't mean more secure or more stable inherently. New versions fix bugs discovered in old versions. If everyone did update immediately, then everyone would have had the bug fix and this outage wouldn't have happened.
  - Re: (Score:2)
    
    by BobMcD ( 601576 ) writes:
    
    You're suffering from sample bias. Newer software is also 'broke' and you also don't know that. I think the point would be, if it is 'broke' but not impacting you in a way that you'd know it, do you care? In some cases yes, in other cases no.
    - Re: (Score:2)
      
      by Enderandrew ( 866215 ) writes:
      
      Newer software can introduce bugs just as easily as it fixes them. But the assumption that older is always more secure and stable is flawed.
      In reality, the best solution is to review changelogs and make informed decisions when upgrading. But avoiding all upgrades isn't the solution.
- Re: (Score:2)
  
  by eulernet ( 1132389 ) writes:
  
  ..unless you need something in the newer version (feature, security update etc.).
  
  Especially when the update is a 20-megabyte file. In effect, we have to reinstall the whole program every time.
  Why such a lame updating system?
  - Re: (Score:2)
    
    by QuantumBeep ( 748940 ) writes:
    
    The going answer is "why waste time and effort making updates smaller?"
- - Re: (Score:2)
    
    by John Hasler ( 414242 ) writes:
    
    > And that's exactly why this happened.
    It happened because their system is vulnerable to cascading failure. They've managed to combine the disadvantages of a centralized system with those of a decentralized one.
Supernode Software (Score:5, Interesting)

by varmittang ( 849469 ) writes: on Thursday December 30, 2010 @11:37AM (#34711060)

How about they release some supernode-only software that people can set up on a server, and possibly the ability to set up Skype to use a preferred supernode? That way a business can set up a supernode of its own and point its users to it. That supernode would still be part of the collective of supernodes and route Skype connections for everyone else too. This would hopefully give Skype more supernodes that are up 24/7, rather than desktop computers routing the traffic.

client crashes should not - server crashes (Score:2)

by RichMan ( 8097 ) writes:

If problems with the client can lead to problems with the server, then the server system lacks robustness. For applications like this, the servers should be practically immune to any client state muck-ups.
Seems to me Skype needs to work on their server-side state machines.
- Re: (Score:2)
  
  by smash ( 1351 ) writes:
  
  You missed the point. With Skype, the clients ARE the servers ("randomly" selected supernodes, i.e. well-connected clients that are not behind NAT).
- Re: (Score:2)
  
  by nedlohs ( 1335013 ) writes:
  
  Do you know what peer to peer means?
  Here's a hint: there are no servers, they just use the bandwidth and CPU of random clients to do that work.
- Re: (Score:2)
  
  by QuantumBeep ( 748940 ) writes:
  
  There's an exception to the client-server divide, and this is a classic example: if your mistake causes a big chunk of your client base to DoS your infrastructure, it's going to go down, no matter how good your infrastructure is.
Article Summary [sarcastic] (Score:5, Funny)

by Ukab the Great ( 87152 ) writes: on Thursday December 30, 2010 @11:44AM (#34711148)

"We expected a LimeWire topology to be as reliable as a phone company topology and oddly enough that bit us in the ass."

- Re: (Score:2)
  
  by Lloyd_Bryant ( 73136 ) writes:
  
  "We expected a LimeWire topology to be as reliable as a phone company topology and oddly enough that bit us in the ass."
  Yeah - I mean, with a phone company topology, it'd be impossible for, say, 50% of AT&T's long distance network to be shut down by a software bug [everything2.com], wouldn't it?
Skype Win 5.0 client sucks (Score:5, Interesting)

by scorp1us ( 235526 ) writes: on Thursday December 30, 2010 @11:59AM (#34711328) Journal

The QA of this release is way down. On top of that, Skype auto-updated people from 4.0 to 5.0. Within a few days, the buggy 5.0 had enough penetration (50%) to bring them down.
The Windows client has widely been reported to:
consume 2x as much CPU (33% to 60% on mine after upgrade)
leak RAM (starts out OK, but after some use it needs over 1.5 GB)
have a slow GUI, so the fade effects on some computers (mine) cause video tearing. It is no longer possible to run full-screen. (320x240 is all I get before tearing sets in)
render the fonts in the video area incorrectly
It should be noted that I have an AMD X2 1.6 and a Radeon 1200 card in this computer. It's not shabby. But the 5.0 client brought it to its knees.
It plays SCII just fine (albeit on the lowest setting).
It comes at a bad time, when they are trying for more corporate agreements but can't run on my 3-year-old hardware.
I uninstalled 5.0 and installed 4.0 and it's back to normal.

- Re: (Score:3)
  
  by smash ( 1351 ) writes:
  
  Maybe you're a supernode? :)
Public Post-Mortem (Score:5, Insightful)

by Enderandrew ( 866215 ) writes: <<moc.liamg> <ta> <werdnaredne>> on Thursday December 30, 2010 @12:07PM (#34711424) Homepage Journal

You can bitch they didn't QA the release. You can bitch that you don't like a P2P topology. But it is nice to see a public post-mortem.

Missed opportunity for open source (Score:2)

by DCFusor ( 1763438 ) writes:

Back when I was doing one of the first VOIP solutions (this one mostly for LAN use) we dreamed up something like Skype, that would work in similar fashion. The big advantage is that it could be done by any reasonably large group of users and no phone company at all need be involved -- no charge to anyone, no control over anyone by some big monolithic corp. It could still be done, and I wonder why no one in the open source area has managed? Critical mass issue; selling the first phone is a bear -- who you
Forced auto updates are not the solution. (Score:5, Interesting)

by mario_grgic ( 515333 ) writes: on Thursday December 30, 2010 @12:15PM (#34711542)

I hate it when apps run auto-update daemons. This is precisely the reason why I don't use any Google desktop software on my computers.

The proper thing to do in this case is simply to refuse the login, with a message telling users they need to upgrade their client if they want to continue using the app. That is simple to do, and better than each app running a daemon. Soon enough there will be a hundred update daemons on each user's computer, eating resources, connecting online all the time and bogging down the user experience. Thanks but no thanks. I refuse to use any of those.

Sounds similar to the AT&T crash (Score:2)

by bdenton42 ( 1313735 ) writes:

About 20 years ago now... sent out code with a bug in the fault recovery code, then a problem in one node cascaded throughout the network. http://www.phworld.org/history/attcrash.htm [phworld.org]
Supernodes shut down when overloaded? (Score:2)

by GeckoAddict ( 1154537 ) writes:

"We believe that increased load in supernode traffic led to some of these parameters exceeding normal limits, and as a result, more supernodes started to shut down"

Maybe I'm missing something, but why are supernodes coded to shut down during increased load instead of simply throttling requests? It seems like the idea of 'too many requests, shut down' is what caused the cascade. Can someone enlighten me as to why this is the preferred overload handling mechanism?
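
The difference between the two overload policies can be sketched with a toy model. This is not Skype's actual algorithm: the function name, the per-node capacities and the load figure are all hypothetical, and load is assumed to be split evenly across whatever supernodes are still up.

```python
def surviving_supernodes(capacities, total_load, throttle):
    """Toy model of overload handling (hypothetical, not Skype's real logic).

    capacities: max load each supernode can serve
    total_load: demand, split evenly across the nodes still up
    throttle:   True  -> overloaded nodes stay up and reject the excess
                False -> overloaded nodes shut down, pushing load onto the rest
    Returns (nodes still up, load actually served).
    """
    alive = list(capacities)
    while alive:
        share = total_load / len(alive)
        if throttle:
            # Every node serves what it can and drops the remainder.
            return len(alive), sum(min(share, c) for c in alive)
        if all(c >= share for c in alive):
            return len(alive), total_load  # stable, nobody is overloaded
        # Shutdown policy: overloaded nodes leave, the survivors inherit the load.
        alive = [c for c in alive if c >= share]
    return 0, 0.0


# Ten hypothetical supernodes, then the same network after 30% of them crash.
capacities = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
after_crash = capacities[:7]

print(surviving_supernodes(capacities, 900, throttle=False))
print(surviving_supernodes(after_crash, 900, throttle=False))
print(surviving_supernodes(after_crash, 900, throttle=True))
```

In this model the healthy network carries the full load, the shutdown policy loses every remaining node after the initial crash, and the throttling policy keeps all seven nodes up while dropping only part of the demand.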
- Re: (Score:2)
  
  by gbjbaanb ( 229885 ) writes:
  
  It's called cheap, crappy developers.
  Assume your socket connections will always work, and don't bother handling errors, throttling or connection requests; it's the cheapest, easiest way after all. It's probably not even "too many requests, shut down" but "too many requests, crash". Once there - ship and let your users be damned.
  Only in this case, the company found out why you should hire the best devs you can and not the cheapest. If your business is software, you need to treat it like an asset, not a cost.
- Re: (Score:2)
  
  by flyingfsck ( 986395 ) writes:
  
  They are using Windows clients. "c:\> nice skypesupernode" ain't gonna do it.
Autoupdates (Score:3)

by ThePhilips ( 752041 ) writes: on Thursday December 30, 2010 @12:46PM (#34711928) Homepage Journal

One important lesson to be learned is this: many users do not update their software if they don’t have to. Skype had a newer version in place, without the triggering bug, but most users had the buggy one.

Yeah. Right. Because all recent Skype updates (starting with version 3(?)) were known to contain mostly one of two things: more ads or more UI bloat. And the occasional breakage.
So why would they expect users to update it regularly?

Compromise (Score:2)

by jklovanc ( 1603149 ) writes:

There is an option between "auto-update" and "update when you want": deprecated versions. If a version has a known major bug that could compromise the system, require updates for only those versions. That way only the bad version will be replaced and we won't be updating everyone at every release. The main advantage is that the system is kept safe without unnecessary updates.
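
A minimal sketch of that idea, assuming the login service can see the client's version string. The version number in the deprecated set and the names `DEPRECATED_VERSIONS` and `login_decision` are made up for illustration; nothing here reflects how Skype actually gates logins.

```python
def parse_version(text):
    """Turn a dotted version string such as "5.0.0" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical list of releases with a known major bug.
DEPRECATED_VERSIONS = {parse_version("5.0.0")}


def login_decision(client_version, deprecated=DEPRECATED_VERSIONS):
    """Refuse only clients on a deprecated release; everyone else logs in."""
    if parse_version(client_version) in deprecated:
        return "update_required"
    return "ok"


print(login_decision("5.0.0"))  # the flagged release is told to update
print(login_decision("4.2.0"))  # older but unflagged releases are left alone
```

Only the flagged release is forced to update, so users on older but healthy versions are not pushed through every release.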
Short answer... (Score:2)

by Junta ( 36770 ) writes:

NAT is evil. Skype needs to build an overly complex networking protocol because too many people are behind NAT gateways. Skype *could* probably get away with their basic available hardware if only they got to design for a NAT free world.
One could also say they were trying to cheap out and not invest in as much hosting as is required to assure the reliability of their chosen networking architecture.
Of course, on the flip side, Skype as a service would be nearly useless in a NAT-free world. No need for a coordinating e
Where is the built-in redundancy (Score:2)

by Sara Chan ( 138144 ) writes:

Quote from TFA:
Approximately 40% of all Skype users that were online crashed, taking down around 30% of all supernodes. Clients that continued to be up and running, and clients that restarted the application had their network searches directed to the supernodes still running, leading to an overload of those. Since Skype has in place a protection when a supernode is overloaded, so it would not consume too much of a client’s system’s resources, the supernodes started to shutdown automatically one
The importance of the story (Score:2)

by mhollis ( 727905 ) writes:

Here is what really happened.
A non-telephone company had a cascading problem with its ad-hoc peer-to-peer networking that provides telephony and video services at costs way below any telephone (or cable) company. The company is profitable enough to make its own way in this world.
This story was broadcast pretty-much worldwide by all media.
The non-telephone company was embarrassed and released a statement to the media about how this happened as a means by which it might encourage everyone to download new, fr
- Re: (Score:2)
  
  by ThatMegathronDude ( 1189203 ) writes:
  
  Where else are you going to find a free, distributed, encrypted by default text/voice/video chat service?
  - Re: (Score:2)
    
    by chipperdog ( 169552 ) writes:
    
    A bunch of us should put up Asterisk servers and polish up some open source SIP clients (SIP can support video and text also)
- - Re: (Score:3)
    
    by rjstanford ( 69735 ) writes:
    
    Google video chat, perhaps? Or maybe acknowledge that it's nearly impossible to provide both 100% uptime and free video chat at the same time, without the resources of a major player behind you to promote goodwill?
    Seriously, they were down for some percentage of the people for 1% of one year, during which time many competitive products were available. This is not an earth-shattering catastrophe.
    - Re: (Score:2)
      
      by tenex ( 766192 ) writes:
      
      I think we're talking about better up-time than that for Skype. If we believe the outage numbers presented on their Wikipedia page http://en.wikipedia.org/wiki/Skype [wikipedia.org], they've had a total of 72 hours down time since the initial release in 2003--and assuming a 100% outage in all cases (which was not the case here)--their up-time minutes work out to something like:
      99.88%
      Seven years and 72 hours of total down-time... It might not be five nines, but does seem a pretty respec
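
The arithmetic can be checked with a few lines, assuming seven years of 365.25 days each and 72 hours of complete outage (the helper name `uptime_percent` is just for illustration):

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length, leap days included


def uptime_percent(downtime_hours, years):
    """Percentage of the period during which the service was up."""
    total_hours = years * HOURS_PER_YEAR
    return 100.0 * (1.0 - downtime_hours / total_hours)


print(f"{uptime_percent(72, 7):.4f}%")  # prints 99.8827%
```

Under those assumptions the figure is just short of three nines.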
      - Re:Lessons Learned From Skype's Outage (Score:2)
        
        by TaoPhoenix ( 980487 ) writes:
        
        (Satire)
        Sorry, no. In Today's Post 911 World, rational decision making can never be the same again. We have to Respond to an Event like this. Remember the Day That Skype Was Down forever!
        In other censorship news, all discussions of Averages and Means have been blocked, because 7 years of past performance will never matter again.
        (/Satire)
      - Re: (Score:2)
        
        by John Hasler ( 414242 ) writes:
        
        Seven years and 72 hours of total down-time... It might not be five nines, but does seem a pretty respectable up-time percentage.
        By POTS standards it's abysmal.
      - Re: (Score:2)
        
        by John Hasler ( 414242 ) writes:
        
        The uptime of Skype to the user is the product of Skype's uptime, that of the user's Internet service, that of her electrical service, and that of her hardware. That product might exceed one 9 but it won't come near 5 9s.
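
As a rough illustration of that product rule, here is a sketch that treats the four pieces as independent. Every availability figure below is an invented placeholder, not a measurement.

```python
from math import prod


def end_to_end_availability(components):
    """Availability seen by the user if all components must be up at once.

    Assumes failures are independent, so the availabilities simply multiply.
    """
    return prod(components.values())


# Hypothetical per-component availabilities (fractions of time up).
example = {
    "skype": 0.9988,
    "internet_service": 0.995,
    "electrical_service": 0.999,
    "hardware": 0.99,
}

combined = end_to_end_availability(example)
print(f"{combined:.4%}")
```

With these placeholder numbers the combined figure is lower than any single component, comfortably above one nine and nowhere near five.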
    - Re:Lessons Learned From Skype's Outage (Score:2)
      
      by BrokenHalo ( 565198 ) writes:
      
      Well said. Skype is primarily a piece of technology aimed at the individual consumer. It is made completely clear at the outset that it doesn't claim to be a landline replacement, so anyone who lost business as a result of the outage doesn't get much sympathy from me.
      
      The downtime period for me was about a day and a half, which amounts to 0.41% of the year. No biggie, I have SIP and mobile alternatives. Or both if I run a SIP client over my wireless internet dongle or phone tether.
      
      I get very tired of tho
      - Re: (Score:2)
        
        by zach_the_lizard ( 1317619 ) writes:
        
        They are starting to roll out enterprise service. Skype for SIP now available in Beta [skype.com].
        Skype For SIP is the perfect way to integrate Skype with your existing PBX, allowing the communications from your PBX to be complemented by Skype functionality – head over to the Business blog to find out more about the Beta programme.
        Somehow I don't think PBX interoperability is aimed at the consumer market. (though SIP support might help some consumers)
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
- - Re: (Score:2)
    
    by BobMcD ( 601576 ) writes:
    
    It's sheer laziness to not patch your software. Yes, sometimes, a buggy update is unleashed upon the world. However, this is a case in point against running unpatched software.
    No, commodore64 is right. There needs to be a reason to patch and that reason needs to outweigh both the hassle of doing it AND the risk that something new will be broken.
    If you're not handing over fresh new dollar bills for a piece of software, expect it to be assembled with the bare minimum effort. This includes all patches. The likelihood that one of these will suck worse than the problem they're attempting to fix is very, very high.
- Re: (Score:2)
  
  by BobMcD ( 601576 ) writes:
  
  Sample bias again. TFA says 20% were affected, not 1%. Just because it didn't happen to you and your friends doesn't mean that the people who actually analyzed the problem suck at math.
