CloudFlare Was Hit By Leap Second, Causing Its RRDNS Software To 'Panic' (silicon.co.uk) 24
Reader Mickeycaskill writes: The extra leap second added on to the end of 2016 may not have had an effect on most people, but it did catch out a few web companies who failed to factor it in. Web services and security firm CloudFlare was one such example. A small number of its servers went down at midnight UTC on New Year's Day due to an error in its RRDNS software, a domain name service (DNS) proxy that was written to help scale CloudFlare's DNS infrastructure; the failure limited web access for some of its customers. As CloudFlare explained, a number went negative in the software when it should never have dropped below zero, causing RRDNS to "panic" and affect the DNS resolutions to some websites. The issue was confirmed by the company's engineers at 00:34 UTC on New Year's Day and the fix -- which involved patching the clock source to ensure it normalises if time ever skips backwards -- was rolled out to the majority of the affected data centres by 02:50 UTC. Cloudflare said the outage only hit customers who use CNAME DNS records with its service. Google works around leap seconds with a so-called "smearing" technique -- running clocks slightly slower than usual on its Network Time Protocol servers.
Was the Go prog lang at fault? Would Rust help? (Score:1, Interesting)
The blog post about this incident [cloudflare.com] says:
and then later it says:
Re: (Score:1)
I don't know if you can blame the language, the devs should have added their own checks if the language didn't have a guarantee.
Re: (Score:1)
Why would you even think of switching programming languages due to the simple and sadly common 'bug' of programmers not verifying parameters match a function's documented pre-conditions? My only guess is you're paid to promote Rust. Lazy programmers will write bugs in every language.
Re: Was the Go prog lang at fault? Would Rust help (Score:2)
>RRDNS is written in Go
Their bugs are in HR department.
Who in the world hired people who are dumb enough to use an experimental language in production?
Re: (Score:2)
Unit test those edge cases (Score:1)
Re: (Score:3)
Read the article then. It shows it pretty plainly: https://blog.cloudflare.com/ho... [cloudflare.com]
I was going to try to guess what they were doing, but they have some actual code snippets.
AFAICT, a unit test wouldn't have caught this either (unless they planned for this sort of error, in which case the code wouldn't have been broken either). From TFA:
RRDNS doesn’t just keep a single measurement for each resolver, it takes many measurements and smoothes them. So, the single measurement wouldn’t cause RRDNS to think the resolver was working in negative time, but after a few measurements the smoothed value would eventually become negative.
So, a unit test with one negative example (which may have been difficult to mimic anyway, due to the direct usage of time.Now()) probably wouldn't have triggered the issue o
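To make the "one bad sample isn't enough" point concrete, here is a rough Go sketch. It assumes the smoothing is an exponentially weighted moving average with a made-up weight of 0.25, which the quoted passage does not confirm. math/rand's Int63n is used only as a standard-library call that is documented to panic on a non-positive argument; which call actually panicked in RRDNS is not stated in this thread. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// smooth folds one new sample into an exponentially weighted moving average.
// alpha is the weight given to the new sample (0 < alpha <= 1).
func smooth(avg, sample, alpha float64) float64 {
	return (1-alpha)*avg + alpha*sample
}

// samplesUntilNegative feeds the same sample repeatedly and reports how many
// updates it takes for the smoothed value to drop below zero, or -1 if that
// does not happen within limit updates.
func samplesUntilNegative(avg, sample, alpha float64, limit int) int {
	for i := 1; i <= limit; i++ {
		avg = smooth(avg, sample, alpha)
		if avg < 0 {
			return i
		}
	}
	return -1
}

// pickPanics reports whether rand.Int63n panics for the given bound.
func pickPanics(n int64) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	rand.Int63n(n)
	return false
}

func main() {
	// A healthy 10 ms average followed by a run of -5 ms measurements.
	fmt.Println("updates until negative:", samplesUntilNegative(10, -5, 0.25, 100))
	fmt.Println("panics on -1:", pickPanics(-1), "panics on 1000:", pickPanics(1000))
}
```

With these made-up numbers the average survives three bad samples and only goes negative on the fourth, so a test that feeds a single negative measurement would pass. A test would have to feed a run of them, or assert directly that the smoothed value never drops below zero.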
Re: (Score:2)
I'm still left wondering whether the decision to put a leap second on the night tech support staff are most likely to be over halfway through a bottle of JD was A) some intentional attempt to catch edge cases where leap seconds happen during a year change or B) some tinfoil conspiracy where we'll find out billions of dollars were stolen from a system where that particular edge case could be exploited or C) just made by people so socially isolated that they don't realize just how hard it is to fix crashed bo
My internet died... (Score:2)
...at exactly midnight, while I was playing Chivalry. I kept getting laggier... and laggier... and then everyone "froze" and the client-side prediction took over. I was recording video and it was pretty funny. Everyone just kept walking forward, until they were in a wall, and kept trying to walk forwards.
It was interesting what the client prediction would let you do. You could change weapons. You could swing your weapon. You could throw axes (of which you have two) and they flew through the air, stuck in pe
Re: (Score:2)
PoGo players (and Ingress, I'd guess) have been known to capture indoor critters by dashing at building walls in the meatspace, then huddling over their phone to block signal. The game extrapolates position
Re: (Score:2)
I've been researching network game architecture lately and was actually planning to do some video-recorded analysis of how various commercial games' network models behave when latency, jitter, out-of-order delivery, and other errors occur. Extreme latency is a great way to "reveal" what's going on under the hood.
So this time, I got video and I didn't even have to set up artificial lag!
the gift that keeps taking (Score:2)
Do we really need to compensate? (Score:1)
We lose or gain a second here or there, who cares? The difference has been so far 27 seconds over the past 44 years or, extrapolated out, 1 MINUTE over 97 YEARS.
Are we really going to notice if the sun goes down a minute earlier every century? We already have to screw around with daylight saving & leap years, so why not just make February the 29th 24 hours and 1 minute long once a century and have done with it?
Echoes of time changes days gone by (Score:2)
I always remember time changes as busy nights in support when I worked for a large bank. The spring forward was usually a breeze, just a matter of a lot of server verifications and log checks, but the fall back was usually a messy night. Much harder to deal with and resolve issues involving duplicate timed log entries and transaction logs. I don't really miss those days...
Goes to show the old adage is true (Score:2)
Don't use services whose names are terribly ironic in times of failure.
Flare, Flame, Burn, Drop, etc., etc.
The universe just loves to throw a wrench at such forms of unintentional hubris just for the LOLs.