Cloudflare Comes Clean On Crashing a Chunk of the Web Earlier This Month
Cloudflare has published a detailed and refreshingly honest report into precisely what went wrong earlier this month when its systems fell over and took a big chunk of the internet with it. The Register reports: We already knew from a quick summary published the next day, and our interview with its CTO John Graham-Cumming, that the 30-minute global outage had been caused by an error in a single line of code in a system the company uses to push rapid software changes. [...] First up the error itself -- it was in this bit of code: .*(?:.*=.*). We won't go into the full workings as to why because the post does so extensively (a Friday treat for coding nerds) but very broadly the code caused a lot of what's called "backtracking," basically repetitive looping. This backtracking got worse -- exponentially worse -- the more complex the request and very, very quickly maxed out the company's CPUs.
The impact wasn't noticed for the simple reason that the test suite didn't measure CPU usage. It soon will -- Cloudflare has an internal deadline of a week from now. The second problem was that a software protection system that would have prevented excessive CPU consumption had been removed "by mistake" just weeks earlier. That protection is now back in although it clearly needs to be locked down. The software used to run the code -- the expression engine -- also doesn't have the ability to check for the sort of backtracking that occurred. Cloudflare says it will shift to one that does. The post goes on to talk about the speed with which it impacted everyone, why it took them so long to fix it, and why it didn't just do a rollback within minutes and solve the issue while it figured out what was going on.
You can read the full postmortem here.
Why link to Register (Score:5, Informative)
and not the actual report?
https://blog.cloudflare.com/de... [cloudflare.com]
Re: Why link to Register (Score:2, Insightful)
Because it's slashfuckingdot. Coerced the liveleak moderators to migrate, and went down in quality.
Thanks for the link.
Re: (Score:2)
Because the person submitting the story didn't do correct research, the editors just pick the bones thrown at them and look at the votes for submissions.
Re:Is that regex? (Score:5, Informative)
Re: (Score:2)
That you have to resort to referring to the TIOBE index, which puts the combination of Delphi and Object Pascal at #18 and dropping, even being beaten by Visual Basic, shows how dead it is. You use Delphi, not object pascal, and the Delphi extensions to object pascal do indeed support automatic object creation every time you drop an object on a form, and automatic deletion when the form is destroyed.
It’s right there in the manuals if you had bothered to read them instead of being a software pirat
Re: (Score:1)
You just keep acknowledging that I really got your goat after you kept pissing everyone off a decade ago with your Hosts File spam. Nobody likes a spammer, and I obviously totally owned you because almost a decade later you’re still bringing it up.
You have never had a real job as a programmer, your Delphi knowledge is even less in demand than Visual Basic, nobody is going to donate any money to you to maintain your Hosts File or throw you a bone or a gig because all your self-promotion degenerates
Re: (Score:1)
Delphi is still dead. Nobody hires Delphi programmers, but you already know that because you’ve never had a real job. Your earlier hopes to leverage Hosts files into getting contacts for paying gigs died a long time ago. You don’t know c or c++, you can’t use FreeBSD because the install isn’t GUI-based, and you’ve been mooching off your relatives for so long that they have given up.
So, no skill in c, no skill in Java, so no commercial demand for you. Just another incel base
Re: (Score:2)
Nobody believes you. You have never had a real programmer job. And real c programmers use malloc() and free() for character string operation, not some shitty framework to hold your hand.
You are obsolete. Unemployable. Useless. How do you justify your existence in today’s world, when your go-to language of choice ranks lower than even Visual Basic? (Hey, you’re the one who pointed its low ranking out - typical APK own goal).
You continue to be SO easy to troll, stupid troll. You’
Re: (Score:2)
If you can’t guard against buffer overflow/underflow in c you’re doing it wrong. Same thing for memory leaks. Either you learn from your mistakes or you run to a framework to hold your hand. I learned to avoid memory leaks, buffer overrun, and dangling pointers, you obviously never reached that level. No wonder you use an obsolete language like Delphi. You probably can’
Re: (Score:2)
Re: (Score:2)
The libraries you use better be allowing for Unicode, because it’s out there and buffer overflows are a thing. And I’ve written string-handling code that doesn’t call strlen(). Pointer arithmetic is quicker.
Re: (Score:2)
Nobody uses 255 byte Pascal strings in a Unicode world. Yo
Re: (Score:2)
Beginning with Windows 7, a kernel name resolution feature allows kernel-mode components to perform protocol-independent translation between Unicode host names and transport addresses
Host names can be in Unicode, and end up taking more than 255 bytes. 1 ch
Re: (Score:3)
It is a poorly written regex. It is easy to prevent the backtracking by constraining the character values.
They should have written [^:]*(?:[^=]+=[^=]+) which doesn't match exactly the same set of patterns, but is probably what they wanted, and will not backtrack.
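To make the "doesn't match exactly the same set" caveat concrete, here is a small sketch using Python's re module. Python is only a stand-in here (the thread does not say which engine Cloudflare ran), and the sample strings and the compare helper are made up for illustration.

```python
import re

# The pattern fragment quoted in the story, and the constrained rewrite
# suggested in the comment above.
ORIGINAL = re.compile(r".*(?:.*=.*)")
CONSTRAINED = re.compile(r"[^:]*(?:[^=]+=[^=]+)")

# Hypothetical inputs, chosen to show where the two patterns disagree.
SAMPLES = ["name=value", "=", "name=", "=value", "no equals sign"]


def compare(text: str) -> tuple[bool, bool]:
    """Return (original matches, constrained matches), anchored at the start."""
    return (
        ORIGINAL.match(text) is not None,
        CONSTRAINED.match(text) is not None,
    )


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), compare(sample))
```

The rewrite demands at least one non-'=' character on each side of the equals sign, so empty names or values that the original accepted no longer match. Whether that is "what they wanted" is a guess, as the comment says.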
Relevant XKCD [xkcd.com].
Re: (Score:2)
This is the one! [xkcd.com]
Re: (Score:2)
Sensible idea to exclude the equals sign in the wildcard group, but note that you're not preventing backtracking with your suggested regex, only limiting its growth to linear complexity in the case of a positive match. Using a regex debugger such as https://regex101.com/ [regex101.com] one can see that your regex requires 101 steps to match the string "x=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", which it does only once it has exhausted all backtracking scenarios. Switching from the greedy matching of * and + to the lazy *? and +?
Re:Is that regex? (Score:4, Funny)
But What Was the Bug's Carbon Footprint? (Score:1)
Re: (Score:2)
Why is this story both "news for nerds" and "stuff that matters"? Why haven't you shoehorned in some awkward reference to global warming?
Well, if all the CPUs in all of CloudFlare’s data centers were at 100% utilization, as described, they were plowing through electricity like nobody’s business - and actually did have a significant carbon footprint during that time period.
But back to the actual story... now I feel a little better about some of the stupid regex mistakes I’ve made.
Re:But What Was the Bug's Carbon Footprint? (Score:4, Funny)
now I feel a little better about some of the stupid regex mistakes I’ve made.
Yeah, I've taken out a process and been really embarrassed, but half the internet?! Impressive regex.
Re: (Score:2)
Yeah, I've taken out a process and been really embarrassed, but half the internet?! Impressive regex.
Well, like they say, if you have a problem to solve, just write a regex. Now you've got twal@##$... NO CARRIER
Re: (Score:2)
Should have hired me instead, assholes!
Team player, charming personality, yeah I can't see why they wouldn't want you.
Re:What is backtracking? (Score:5, Informative)
It's normal behavior. Backtracking means to find a potential match, start to parse it out fully, then discover it's not the best match so you have to go back to the beginning and start over skipping the match you just found looking for a better one. In this case it's the ".*=.*" that gives it away. That's the regexp pattern to match things like cookie name/value pairs in an HTTP request line, or at least it's an attempt at it. The problem is that given a string like
COOKIE1=VALUE1 COOKIE2=VALUE2 COOKIE3=VALUE3
you have a lot of potential matches for both ".*" constructs which the parser has to iterate through. Using parentheses to outline the portions that'd fall into each ".*",
(COOKIE1)=(VALUE1)
is an obvious first match but so is
(COOKIE1)=(VALUE1 COOKIE2=VALUE2)
and
(COOKIE1=VALUE1 COOKIE2)=(VALUE2)
They probably shouldn't've been using plain ".*" in the first place, rather limiting the set of characters for each one to the legal set for eg. cookie names and values (assuming that's what they were trying to match, the contents of a Cookie header). This is an easy rookie mistake to make, you have to work with complex regexes for a few years and get your fingers burned a few times to learn it from personal experience, or ideally get your fingers smacked with a ruler a few times by a senior dev so you learn from his personal experience not your own and do it in dev or QA rather than production.
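For anyone who wants to watch the blow-up without a regex debugger, here is a minimal sketch in Python. It is a toy backtracking matcher, not the engine Cloudflare ran: it only understands ".*" and literal characters, it lets "." match any character, and count_steps and the token-list representation are invented for illustration. The step counts are the toy's own and will not line up with what a real engine or regex101 reports.

```python
def count_steps(tokens: list[str], text: str) -> tuple[bool, int]:
    """Greedy backtracking match of tokens against text, anchored at the start.

    Each token is either ".*" or a single literal character.
    Returns (matched, number of match attempts made).
    """
    steps = 0

    def match(ti: int, pos: int) -> bool:
        nonlocal steps
        steps += 1
        if ti == len(tokens):
            return True  # whole pattern consumed
        tok = tokens[ti]
        if tok == ".*":
            # Greedy: grab as much as possible, then give characters back
            # one at a time until the rest of the pattern matches.
            for end in range(len(text), pos - 1, -1):
                if match(ti + 1, end):
                    return True
            return False
        if pos < len(text) and text[pos] == tok:
            return match(ti + 1, pos + 1)
        return False

    return match(0, 0), steps


if __name__ == "__main__":
    for n in (10, 20, 40):
        text = "x=" + "x" * n
        one = count_steps([".*", "=", ".*"], text)
        two = count_steps([".*", ".*", "=", ".*"], text)  # like .*(?:.*=.*)
        print(n, one, two)
```

With a single leading ".*" the toy's work grows roughly in step with the input length; with two of them it grows roughly with the square, because every position the first ".*" gives back makes the second one retry everything after it. Constraining the character sets, as suggested above, removes most of those retries.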
Re: What is backtracking? (Score:1)
Re: (Score:2)
No. The backtracking's because the regexp allows '=' and ' ' as part of the name and value portions of the segment it's matching. The regexp gets fairly complex for the full-on cookie syntax, but a stripped-down version would resemble:
([A-Za-z0-9]+=[^\s]*\s*)*
Syntax here limits the name to one or more alphanumeric characters and the value to a sequence of zero or more non-whitespace characters, the whole repeated zero or more times. Making it handle the full syntax of a cookie header... you begin to see
Re: (Score:2)
Gotta love it (Score:2)
Trashed once again by shitty software, welcome to the future.
"Refreshingly honest report" (Score:2)
"Don't worry, we're still the #1 MiTM service in the world and growing. Government-funded agencies still love us."
This is why we can't have nice things (Score:4, Informative)
The risk of this kind of issue is why, for example, Rust and Golang's regular expressions don't support backreferences, which can make them such a PITA to use.
Although I reluctantly admit that not crashing the internet and stuff could be more important.
Re: (Score:2)
If the pattern is actually a regular expression (as this one appears to be) then it can be matched in O(1) space and O(n) time with respect to the length of the input. True regular expressions never require backtracking. The problem is that many languages implement Perl-inspired "regular expressions", which are not actually regular due to misfeatures like backreferences that can't be represented as deterministic finite automata. Rather than detecting the common cases involving proper regular expressions and
Regex running in nonlinear time? (Score:1)
The real stunner is how there are even regex libraries out there that do NOT run in linear time. The blog post itself links to a classic article describing how to compile a regex to a nondeterministic finite state automaton that can match in linear time (the size of the automaton might be another issue, but that is discovered at compile time).
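The state-set idea behind linear-time matching fits in a few lines. What follows is a rough Python sketch under heavy simplifying assumptions, not the linked article's code and not any real library's implementation: the pattern is a list of tokens that are either ".*" or one literal character, "." matches any character, and nfa_match is a made-up name.

```python
def nfa_match(tokens: list[str], text: str) -> tuple[bool, int]:
    """Track every pattern position that could be active, one pass over text.

    Matching is anchored at the start and succeeds as soon as the whole
    pattern has been consumed. Returns (matched, state visits performed).
    """
    accept = len(tokens)

    def closure(states: set[int]) -> set[int]:
        # ".*" may match zero characters, so the position after it is
        # reachable without consuming any input.
        result = set(states)
        stack = list(states)
        while stack:
            i = stack.pop()
            if i < accept and tokens[i] == ".*" and i + 1 not in result:
                result.add(i + 1)
                stack.append(i + 1)
        return result

    states = closure({0})
    steps = 0
    for ch in text:
        if accept in states or not states:
            break
        nxt: set[int] = set()
        for i in states:
            steps += 1
            if tokens[i] == ".*":
                nxt.add(i)        # ".*" consumes ch and stays put
            elif tokens[i] == ch:
                nxt.add(i + 1)    # literal matched, advance
        states = closure(nxt)
    return accept in states, steps


if __name__ == "__main__":
    pattern = [".*", ".*", "=", ".*"]  # same shape as .*(?:.*=.*)
    for n in (10, 100, 1000):
        print(n, nfa_match(pattern, "x" * n))  # no '=', so no match
```

Because the set of active positions can never be larger than the pattern, the work per input character is bounded by the pattern length and nothing is ever retried. The cost is that features like backreferences do not fit this model.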
30 year old observations - still valid! (Score:2)
If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization
And it will probably remain a valid observation. Right up until civilisation is destroyed.
Re: (Score:2)
And when future archaeologists dig down to that layer ... "Yup, just as we thought -- a regular expression with misconfigured backtracking. Those poor people -- they never had a chance against Skynet."
metaphysical HTTP error code 600 (Score:2)
Metaphysical HTTP error code 600: CPU exhausted, not enough Ken Thompson.
"You have a problem, so you decide to use a regex" (Score:4, Informative)
...now you have 2 problems.
Expression Engine? (Score:2)
Surely they don't mean this thing [expressionengine.com]?!
Damnit again! (Score:2)