How Facebook Keeps Messenger From Crashing On New Year's Eve (ieee.org)

Posted by EditorDavid on Saturday December 29, 2018 @06:44PM from the crashing-the-party dept.

Wave723 quotes IEEE Spectrum: On New Year's Eve, millions of people will use Facebook's Messenger app to wish friends and family a 'Happy New Year!' If everything goes smoothly, those messages will reach recipients in fewer than 100 milliseconds, and life will go on. But if the service stalls or fails, a small team of software engineers based in the company's New York City office will have to answer for it.
The article says the team "tested and tweaked the app throughout the year and will soon face their biggest annual performance exam," since Messenger's 1.3 billion monthly active users send more messages on New Year's Eve than any other day of the year. Many of them hit "send" at the exact moment when their clock strikes midnight, "and people often try to resend messages that don't appear to make it through right away, which piles on more requests."

The solution appears to be load testing, redirecting traffic, batching messages, discarding "read receipts," and temporarily disabling other minor Facebook functions -- or, more generally, what their engineering manager describes as "graceful degradation."
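
As a rough illustration of what "graceful degradation" with batching and dropped read receipts could look like, here is a minimal Python sketch. Everything in it (the `DegradingQueue` class, the `Priority` levels, the 80% threshold) is hypothetical and chosen for illustration; the article does not describe Facebook's actual implementation.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Priority(IntEnum):
    MESSAGE = 0       # core function: keep accepting as long as there is room
    READ_RECEIPT = 1  # optional extra: first thing to be dropped under load


@dataclass
class Request:
    kind: Priority
    payload: str


@dataclass
class DegradingQueue:
    """Hypothetical queue that sheds optional work once it is nearly full."""
    capacity: int = 100
    shed_threshold: float = 0.8  # assumed cut-off, not a figure from the article
    items: list = field(default_factory=list)
    dropped: int = 0

    def offer(self, req: Request) -> bool:
        load = len(self.items) / self.capacity
        # Under heavy load, silently drop anything that is not a real message.
        if req.kind != Priority.MESSAGE and load >= self.shed_threshold:
            self.dropped += 1
            return False
        # A completely full queue has to reject even real messages.
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(req)
        return True

    def drain_batch(self, max_batch: int = 10) -> list:
        # Batching: hand several queued requests to the backend in one go.
        batch, self.items = self.items[:max_batch], self.items[max_batch:]
        return batch
```

The point of the design is that the core function (delivering messages) keeps working while the extras quietly disappear, instead of everything failing at once.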


So... (Score:2)

by Anonymous Coward writes:

"The solution appears to be ..." Stuff we've known since 1999?
- Re: So... (Score:1)
  
  by Anonymous Coward writes:
  
  Release the hounds, Smithers
- How glib (Score:5, Insightful)
  
  by SuperKendall ( 25149 ) writes: on Saturday December 29, 2018 @07:04PM (#57876790)
  
  "The solution appears to be ..." Stuff we've known since 1999?
  It's one thing to say you know how to do it...
  Quite another when literally BILLIONS of people are using your services all at once - especially around NYE where it's not even spread through the day, it's a huge DDOS equivalent with billions of messages at midnight exactly...
  Planning for that kind of load and super-extreme bursting is not easy, at all. No matter how much you "know".
  
  - - Re: (Score:1)
      
      by Anonymous Coward writes:
      
      One DDoS per hour then
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
    - Money doesn't (fully) help (Score:2)
      
      by SuperKendall ( 25149 ) writes:
      
      Yeah, it's called money. All you need is money.
      How many years did we all have to suffer through Twitter Fail Whales while they were flush with cash?
      There are plenty of examples of giant well funded enterprises with websites that utterly suck and can handle just about no load - especially if you look at websites where tech is secondary, any kind of unexpected load and BAM they are usually down.
      Money can indeed help to buy the servers you may really need to handle load. Money can even help hire the people th
  - Re: (Score:2)
    
    by KiloByte ( 825081 ) writes:
    
    The vast majority of messages avoid that peak: hardly anyone waits for the exact midnight to send a message. So the load gets smeared onto quite a chunk of time.
    The engineering problem boils down to: send short messages between pairs of arbitrary sources and destinations (although usually the source and destination are close to each other), with message size usually within 50-100 bytes. Let's be generous and say that with metadata they fit within 1500 bytes. Hmm... I wonder, have we seen such a problem b
    - Think again, your numbers are absurdly low (Score:3)
      
      by SuperKendall ( 25149 ) writes:
      
      The vast majority of messages avoid that peak: hardly anyone waits for the exact midnight to send a message. So the load gets smeared onto quite a chunk of time.
      Look around you at the next NYE party and you will see just how wrong you are. Most people queue them up ahead of time and lots of people are hitting Send as the ball drops... (hint to devs, if someone has typed a partial message transmit that to the server in case they come back and hit send later - course Facebook was just screwed by that recent
      - Re: (Score:2)
        
        by KiloByte ( 825081 ) writes:
        
        Look around you at the next NYE party and you will see just how wrong you are. Most people queue them up ahead of time and lots of people are hitting Send as the ball drops...
        Ouch. At least there are no "smart"phone zombies so bad anywhere near me, neither among low-tech nor high-tech friends.
        hint to devs, if someone has typed a partial message transmit that to the server in case they come back and hit send later
        
        And that'll speed up that 14 words message... how?
        On a *normal* day, Messenger and WhatsApp process over 60 billion messages a day
        Thanks for the correction, I based my estimates on numbers in the article's summary.
        Come on man, you know that modern web APIs are not that compact, and we are talking Facebook here. You are off by an order of magnitude at least, way more when you stop to think that on NYE way more people are sending images also... One single response to a post on Facebook I just did with 14 words had a 9.5 KB body going out, and a 21.2 KB response.
        I'm talking about the problem to solve not their implementation. Of course Facebook runs a PHP script that runs a bunch of NPM modules to produce a 1.5MB response, but after you cut down that bloat, you can get the same result with orders of magnitude less
        
        Re: (Score:2)
        
        by SuperKendall ( 25149 ) writes:
        
        Ouch. At least there are no "smart"phone zombies so bad anywhere near me, neither among low-tech nor high-tech friends.
        I am highly doubtful the people around you are as pure as you claim.
        And that'll speed up that 14 words message... how?
        Read again about actual message sizes instead of fixating on content size alone. You want to get that traffic up to the server ASAP and send only a trigger signal. Even if it WERE just 14 words it would still be... rather nice to have a few billion 14 word messages already t
        
        Re: (Score:2)
        
        by KiloByte ( 825081 ) writes:
        
        Article is literally about Facebook. That is the problem to solve for, given how it is built.
        That's an XY problem -- if the transport has scaling issues, instead of throwing more hardware at it at some point it's good to take a step back and see if there are better approaches. And the core functionality is so simple that replacing just that part while keeping parts of the several-hundreds-of-megabytes-per-phone bloat they insist so much on having intact is a viable proposition.
        
        Re: (Score:2)
        
        by kriston ( 7886 ) writes:
        
        Of course Facebook runs a PHP script that runs a bunch of NPM modules to produce a 1.5MB response
        You honestly don't believe each request to a server spools up a new PHP script instance like it's still 2006, do you?
        
        Re: (Score:2)
        
        by KiloByte ( 825081 ) writes:
        
        They forked PHP as HHVM, optimized the hell out of it, and do some recompilation to C++, yeah. But you can optimize it only so much.
        PHP is a shithouse -- and I don't mean a building you defecate in, I mean one whose structural material is dried excrement (as still done by some tribes). It might have been adequate for a literal "Personal Home Page" with little traffic, but trying to throw more hardware at it to get it to scale to modern Facebook workloads is a fool's errand. It's kind of like banks running un
        
        Re: (Score:2)
        
        by kriston ( 7886 ) writes:
        
        My point is that every click doesn't spawn a new PHP instance in their architecture unlike regular PHP.
        COBOL is running just fine--its only problem is the lack of human knowledge in the marketplace.
        And, at least with IBM COBOL, today's IBM z/OS runs COBOL programs originally compiled on the System/360 in the 1960s.
    - - Re: (Score:2)
        
        by KiloByte ( 825081 ) writes:
        
        How long would it take if one of those FB-generated movies about you and your friend were sent?
        I'm pretty sure there's not a single device with Fecesbook not blocked on multiple levels within ten meters from my current position. On the other hand, what friend? I spent the NYE fitting a dual-slot external-power-needed graphics card into a board on which the PCIe x4 slot takes most of the board's length -- the stereotypes about our kind won't reinforce themselves :p
  - Re: How glib (Score:1)
    
    by donstenk ( 74880 ) writes:
    
    Actually, it _is_ spread throughout the day. It's not NYE at the same time in the world ;-)
- Re: (Score:1)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- Re: (Score:2)
  
  by jrumney ( 197329 ) writes:
  
  I find it hilarious that the fix for overloading caused in part by people resending messages which don't appear to have gone through, is to discard read receipts. Did anyone in the New York office that is going to be held responsible for the inevitable outage think this through? I wish you luck, Facebook.
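
One common way to keep resends from turning into duplicate deliveries is to de-duplicate on a client-generated message ID. The sketch below assumes the client reuses the same ID when it retries a pending message; the `Inbox` class and its method names are hypothetical, and nothing in the article says Messenger works this way. It also would not help when a user retypes the message from scratch, since that would get a new ID.

```python
class Inbox:
    """Hypothetical server-side inbox that ignores repeated sends."""

    def __init__(self):
        self.seen_ids = set()   # IDs of messages already accepted
        self.delivered = []     # message texts actually delivered

    def receive(self, client_msg_id: str, text: str) -> str:
        # A resend carries the same ID, so it is acknowledged but not redelivered.
        if client_msg_id in self.seen_ids:
            return "duplicate"
        self.seen_ids.add(client_msg_id)
        self.delivered.append(text)
        return "accepted"


inbox = Inbox()
first = inbox.receive("a1", "Happy New Year!")
retry = inbox.receive("a1", "Happy New Year!")  # impatient resend, same ID
```

The retry still costs a request, but the recipient sees the message once and the expensive delivery work is not repeated.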
Facebook's New Slogan (Score:1)

by Anonymous Coward writes:

That should be Facebook's new corporate slogan: "Graceful Degradation."
Simple (Score:4, Insightful)

by Sebby ( 238625 ) writes: on Saturday December 29, 2018 @06:55PM (#57876768) Journal

How Facebook Keeps Messenger From Crashing On New Year's Eve
Simple, do awful things that will make people avoid using any of your services.

- Re: (Score:2)
  
  by KiloByte ( 825081 ) writes:
  
  Simple, do awful things that will make people avoid using any of your services.
  Facebook has been trying that for years; doesn't help.
Just don't (Score:3)

by jwhyche ( 6192 ) writes: on Saturday December 29, 2018 @07:23PM (#57876844) Homepage

Want to keep it from crashing on New Year's Eve? Just don't load the damn thing. There, simple. Problem solved.

"graceful degradation." (Score:2)

by grep -v '.*' * ( 780312 ) writes:

what their engineering manager describes as "graceful degradation."
If they'd just use SystemD their problems would be solved! For that matter though, I wish FaceBook would gracefully degrade to /dev/null.

Good luck to them though, it's a good engineering textbook problem. Stupid, yet necessary. (We have specific peak load times because we just do. Same thing with water supply and SuperBowl breaks, or 8AM/5PM rush-hour traffic.)

FB should also offer a "delivery within 100ms or your money back!" guarantee. See? The timestamp says it was _delivered_ to _our_ servers i
Loadbalancing... (Score:4, Funny)

by shabble ( 90296 ) writes: <metnysr_slashdot@shabble.co.uk> on Saturday December 29, 2018 @07:28PM (#57876858)

... couldn't they simply split their users up into, say, 24 groups, and reduce the load that way?

- Re: (Score:1)
  
  by sj26 ( 850595 ) writes:
  
  https://infiniteundo.com/post/... [infiniteundo.com]
  56. There are only 24 time zones
translation: (Score:2)

by Lehk228 ( 705449 ) writes:

facebook messenger is brittle, poorly tuned garbage that cannot handle an ordinary upsurge in human use. AIM never had to be reengineered to survive new years eve without crashing, and it wasn't really all that good; it just wasn't a flaming heap of shit
- Re: (Score:2)
  
  by Actually, I do RTFA ( 1058596 ) writes:
  
  FB Messenger is harder to handle loads, because they need to run machine learning on all the messages to build better profiles of their users.
- Re: (Score:1)
  
  by helpfulcorn ( 668048 ) writes:
  
  No, it just crashed at other points. The service where buddies were stored (feedbag) had a disastrous set of databases and there were times they all just went down *or* the whiscer (and others, essentially presence) services went down, and no buddies would show, and so people would sign off and on repeatedly trying to fix it slowing down the entire thing. It was all held together with gun tape and faith, at least in the late 90s and very early 2000s.
  
  There was never a test like that for AIM because people
By stealing dimes from the Elves (Score:1)

by Anonymous Coward writes:

By stealing dimes from the Elves?
- Re: (Score:2)
  
  by AndyKron ( 937105 ) writes:
  
  I still have an old rotary phone hanging in my basement complete with a Mr. Yuk sticker.
I don't understand (Score:2)

by AndyKron ( 937105 ) writes:

There's something like 26 midnights (timezones) around the world. Where's the problem?
WTF (Score:5, Interesting)

by ledow ( 319597 ) writes: on Saturday December 29, 2018 @08:53PM (#57877148) Homepage

Millions of people.. sending a small TCP packet... containing a couple of hundred characters...
Wow. Gosh. The infrastructure that must take to handle...
Like... a couple of servers in a rack and a few gigabits of uplink at worst.
Honestly, has modern technology come to this?
One single YouTube video probably has more bandwidth, more data transferred, more CPU usage and less latency.
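
For a sense of scale, the figures quoted elsewhere in this thread can be turned into a back-of-envelope bandwidth estimate. The sketch below uses the commenters' numbers as-is (60 billion messages on a normal day, a 1500-byte per-message estimate, and a 9.5 KB request plus 21.2 KB response measured on a Facebook post rather than on Messenger). It assumes 1 KB = 1000 bytes and computes a daily average, not the midnight peak. None of these inputs are verified.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_gbits_per_second(messages_per_day: float, bytes_per_message: float) -> float:
    """Average throughput if the traffic were spread evenly over a whole day."""
    bytes_per_second = messages_per_day * bytes_per_message / SECONDS_PER_DAY
    return bytes_per_second * 8 / 1e9


MESSAGES_PER_DAY = 60e9  # figure quoted in the thread, unverified

scenarios = {
    "1500-byte estimate": 1500,
    "9.5 KB request + 21.2 KB response": (9.5 + 21.2) * 1000,  # assumes 1 KB = 1000 bytes
}

for label, size in scenarios.items():
    rate = average_gbits_per_second(MESSAGES_PER_DAY, size)
    print(f"{label}: {rate:.1f} Gbit/s on average")
```

With these inputs the averages come out to roughly 8 Gbit/s and 170 Gbit/s respectively, before accounting for any burst at midnight, images, or fan-out to multiple recipients.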

WTF Commenters (Score:1)

by Anonymous Coward writes:

There's no way to reply to all the misinformed commenters here, but I'm really surprised at how naive the majority are. Clearly most of you have never worked with problems at this scale...this is a far more difficult problem to solve than you all think.
Graceful degradation (Score:5, Insightful)

by Bengie ( 1121981 ) writes: on Sunday December 30, 2018 @02:13PM (#57879756)

"Graceful degradation" is the unsung hero of properly engineered systems.

By pissing people off? (Score:2)

by BishopBerkeley ( 734647 ) writes:

Another factor may be that FB pissed so many people off by abusing their privacy that they deleted Messenger altogether. I did, anyway. Come on, people. Invite your friends for a gathering or accept another friend or family member's invitation. A messenger greeting blast has about as much impact and is about as memorable as a highway billboard encountered at 80 miles per hour. Do something meaningful.
