AWS Load Balancer Sends 2 Million Netflix API Reqs To Wrong Customer
rsk writes "Amazon Web Services' Elastic Load Balancer is a dynamic load balancer managed by Amazon. Load balancers are regularly swapped around with each other, which can lead to surprising results, like getting millions of requests meant for a different AWS customer. Using ELBs can result in AWS unintentionally introducing a man-in-the-middle (attack) into your application environment. Most AWS users do not realize this can happen and have not secured against it."
TTL value (Score:2)
Re: (Score:3)
Browsers are sometimes forced to disregard TTL values to prevent certain types of attacks which involve quickly changing DNS records.
Re: (Score:2)
Browsers are sometimes forced to disregard TTL values to prevent certain types of attacks which involve quickly changing DNS records.
No, they are not "forced" to do so. They have chosen an improper method to "work around" a security issue, one that violates other internet standards and causes problems because they are not implementing DNS resolution in a valid way.
The TTL in DNS is not an "advisory" value, it is a time after which the old RR in the previous authoritative DNS response must be expunged, a TTL o
Re: (Score:2)
The browser makers playing fast and loose with standards outside of HTML sucks! They all suck; try to find a browser that does PASV FTP *correctly*. Either as part of a very misguided security attempt, or based on the assumption that FTP servers are behind NAT and can't be configured to send a correct address in the PASV response, they all ignore the address value returned and stupidly use the control socket's remote address instead.
That breaks all but the very most common use case and all the browsers do it.
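For reference, here is a minimal sketch of what using the PASV response "correctly" would involve, assuming the usual RFC 959 reply format of six comma-separated numbers (h1,h2,h3,h4,p1,p2). The function name `parse_pasv` is made up for illustration and is not taken from any browser's code.

```python
import re
from typing import Tuple


def parse_pasv(reply: str) -> Tuple[str, int]:
    """Extract (host, port) from an FTP 227 reply.

    Example reply: '227 Entering Passive Mode (192,168,1,10,19,137)'
    """
    # Six comma-separated decimal fields: four address bytes, two port bytes.
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    fields = [int(group) for group in match.groups()]
    if any(value > 255 for value in fields):
        raise ValueError(f"field out of range in PASV reply: {reply!r}")
    host = ".".join(str(value) for value in fields[:4])
    # The port is sent as high byte, then low byte.
    port = fields[4] * 256 + fields[5]
    return host, port


if __name__ == "__main__":
    print(parse_pasv("227 Entering Passive Mode (192,168,1,10,19,137)"))
```

A client following the standard would open the data connection to the returned host and port, rather than substituting the control connection's remote address.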
Re: (Score:2)
Security is NOT an issue with The Cloud. (Score:2, Funny)
Wait a minute. I'm a manager, and I've been reading a lot of case studies and watching a lot of webcasts about The Cloud. Based on all of this glorious marketing literature, I, as a manager, have absolutely no reason to doubt the safety of any data put in The Cloud.
The case studies all use words like "secure", "MD5", "RSS feeds" and "encryption" to describe the security of The Cloud. I don't know about you, but that sounds damn secure to me! Some Clouds even use SSL and HTTP. That's rock solid in my book.
An
Re: (Score:1)
... and here I am without any mod points.
Pretend that I marked you Two Thumbs Way Up!, Mr. PHB.
PS: For those of you without an irony chip installed ... pretend I started my post with </irony>
Re: (Score:2)
... and here I am without any mod points.
Pretend that I marked you Two Thumbs Way Up!, Mr. PHB.
PS: For those of you without an irony chip installed ... pretend I started my post with </irony>
Pretend you started your post with Irony, off?
-AI
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
A single tear rolled down my cheek as I compared this satire against real, starry-eyed reactions of my company's management with "the cloud".
You know, this mythical beast that solves all scalability and maintenance issues while simultaneously having absolutely zero downsides...
Re: (Score:2)
cool story, bro [google.com] - but maybe it was submitted once and some faulty load balancer spread it out.
Re: (Score:1)
They don't need their own resolver to cause problems. Many popular programs cache DNS responses far longer than is appropriate. Firefox, for one, caches DNS records internally (some versions on some platforms even for HOURS beyond the TTL unless you restart it), and so does Mac OS X itself.
Re: (Score:2)
Good point. There are APIs that provide TTL information (such as res_query), but Firefox does not seem to use them. Interesting.
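As a rough sketch of what a TTL-respecting client cache could look like: Python's standard `socket.getaddrinfo` does not expose TTLs either, so this assumes a hypothetical `resolve` callable that returns the addresses together with the record's TTL in seconds. The class name `TtlCache` is also just illustrative.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical resolver signature: name -> (addresses, ttl_seconds).
Resolver = Callable[[str], Tuple[List[str], int]]


class TtlCache:
    """Cache DNS answers, but never past the TTL the resolver reported."""

    def __init__(self, resolve: Resolver,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._clock = clock
        # name -> (addresses, expiry time on the clock's scale)
        self._entries: Dict[str, Tuple[List[str], float]] = {}

    def lookup(self, name: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh, serve from cache
        # Missing or expired: discard the old answer and ask again.
        addresses, ttl = self._resolve(name)
        self._entries[name] = (addresses, now + ttl)
        return addresses
```

The clock is injectable only so the expiry logic can be exercised without waiting; a real client would also want to bound the cache size.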
Re: (Score:3)
It looks more like some clients aren't respecting the DNS TTL value, so technically it's not Amazon's fault.
"Technically", no. But two people pointing a finger at each other and saying "He did it!" doesn't solve anything, and all the customer gets is the finger.
Re: (Score:2)
It looks more like some clients aren't respecting the DNS TTL value, so technically it's not Amazon's fault.
"Technically", no. But two people pointing a finger at each other and saying "He did it!" doesn't solve anything, and all the customer gets is the finger.
Thus Elastic Load Balancer's other name, Erratic Load Balancer.
Re: (Score:3)
If the customer's getting the finger, wouldn't that make it more of an Erotic Load Balancer?
Re: (Score:2)
Pointing the finger and screaming very loudly would be very good, especially if Amazon does it, as it will help bring attention to broken behavior in DNS and browser software.
I will agree it hurts Amazon, but it helps the community, for large players like Amazon to help bring attention to broken software, so that it can be fixed.
Re:TTL value (Score:5, Interesting)
From what I've seen, it's frequently the client's DNS servers, not the client itself.
I've used a short TTL (5m) for quite a while. It's intentional, because I've needed to switch things rather quickly in the past, and it's better for it to "just work", rather than waiting hours for everyone to pick up the change.
I used to work for a place that had a huge traffic load. Our slow days were still millions of unique visitors. When we took a machine out of DNS (DNS round robin between 15+ machines), we'd see the traffic drop significantly in the first 5 minutes. When AOL finally saw our change, it would drop more. There would still be lingering people for about an hour, and then it would finally be idle.
That was a pretty regular thing for us to do. We scaled our traffic to our various datacenters this way. We'd also load test lines and individual servers with it. If it looked like we were running into a bandwidth limitation, I'd throw a few hundred Mb/s down the line, and see how it performed. If it really was, we'd then switch everything away from it to other datacenters until the provider fixed it.
In all those circumstances, in 5 minutes most (but not all) of the traffic moved. An hour from the change, the remainder had moved.
I've seen this with my home provider. I let them handle DNS for my home machine, rather than doing it myself. I've made changes, and they don't respect it within 30 minutes. Within about an hour, the new DNS records show properly.
Google's public DNS servers seem to do pretty well in that respect. Our changes are reflected properly there in just a few minutes. AOL, TimeWarner/RoadRunner, and a few others are pretty bad. I know why they do it (reducing load on their DNS servers), but it becomes a pain in the ass for places that need to make changes quickly.
Re: (Score:2)
Re: (Score:2)
There are some clients that cache DNS records until they're restarted. I've removed internet-facing VIPs from DNS, and weeks later there are still 100+ clients making connections; the only thing that would stop them is a client restart.
Re: (Score:2)
I think that was the primary motivation for Google setting up their public DNS servers (8.8.8.8, 8.8.4.4).
http://code.google.com/speed/public-dns/ [google.com]
Why no proxy? (Score:2)
Why doesn't Amazon use a reverse proxy which performs additional checks and routes the requests to the right customer? (With Server Name Indication, that would work for TLS, too.) Without that, it's simply not possible to switch IP addresses quickly between non-cooperating targets.
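To make the idea concrete, here is a minimal sketch of the kind of check such a proxy (or the customer's own backend) could apply: look at the HTTP Host header and only forward requests for hostnames it knows about. The routing table, hostnames, and backend addresses are invented for illustration, and nothing here reflects how ELB is actually implemented.

```python
from typing import Dict, Optional

# Hypothetical routing table: expected hostname -> backend address.
ROUTES: Dict[str, str] = {
    "api.example-a.test": "10.0.1.10:8080",
    "www.example-b.test": "10.0.2.20:8080",
}


def normalize_host(host_header: str) -> str:
    """Lower-case the Host header value and drop any port suffix."""
    host = host_header.strip().lower()
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. "[::1]:8443".
        return host.split("]")[0] + "]"
    return host.split(":")[0].rstrip(".")


def pick_backend(host_header: Optional[str],
                 routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the backend for this request, or None if it is not ours.

    A caller could answer None with an error such as
    HTTP 421 Misdirected Request instead of processing the request.
    """
    if not host_header:
        return None
    return routes.get(normalize_host(host_header))


if __name__ == "__main__":
    print(pick_backend("API.example-a.test:443"))  # known host
    print(pick_backend("api.netflix.com"))         # someone else's traffic
```

The same check on the receiving side means a misdirected request is refused rather than answered, whatever the load balancer in front is doing.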
Re: (Score:1)
Re: (Score:2, Insightful)
On top of that, their "Elastic Load Balancer" (just another bullshit "cloud" marketing term for their cluster of F5 load balancers at each availability zone) is just, as I mentioned, an array of F5 load balancers. Either a) they don't support the functionality OP is speaking about, or, more likely, b) Amazon chooses not to handle traffic that way in order to simplify operations.
Re: (Score:2)
Re: (Score:2)
Does this really help if ELB misdirects requests? Or would this setup result in stable ingress IP addresses, so that ELB worked perfectly?
Re: (Score:2)
Simple. You most likely still pay for misdirected traffic in that case.
Charge both ways! (Score:2)
2. Sell it to customers until it breaks
3. Patent software anomaly
Profit!
Re: (Score:2, Informative)
Actually, they didn't write the load balancer. They just bought F5s and integrated them with their infrastructure to change their configurations programmatically.
Re: (Score:2)
Which exactly is what they do, using Xen instances. Duh. RedHat built out their environment for them. This is not rocket science, and is all out on the web if you know how to Google, use LinkedIn, etc.
Re: (Score:2)
Just googled it - if Amazon were using F5, F5 don't know about it [f5.com]. And even if the original design was just using spare capacity, that simply is not the case now (after all, that would imply that if Amazon itself needed to ramp up demand it could - and would - simply annex the entire EC2 capacity to cover it. This is, obviously, not the case).
Re: (Score:2)
They could've migrated away from them as part of their platform. My knowledge about it is 18-24 months old.
Re: (Score:2)
If they were F5s, they'd actually work. We use F5 here, and from looking at the config, Amazon would have to be literally incompetent to get such basic functionality wrong.
Easy fix below (Score:2)
Re: (Score:2)
Use rewrite rules to do a 301 redirect to goatse.cx when the host is api.netflix.com!
Why do that when the person who erroneously receives the traffic could maybe do something profitable with it? Such as co-opting the Netflix API calls and displaying "video" or "messages" to convince the user to subscribe to a different service, netting $$ for the unintended target who received Netflix's requests.
IPv6... (Score:2)
In this scenario, IPv6 would alleviate the need to reuse IP addresses so aggressively.
Of course, given the high amount of traffic, one wonders if Amazon is needlessly changing addresses. They should probably make more of an effort to keep addresses persistent, even beyond the 'promise' of the TTL. It's sort of like how, in most DHCP servers, even when your lease expires you'll still often get the last address you had, because the DHCP server retained it anyway unless pool exhaustion forces a chan
DNS caches for 4 days. (Score:2)
AWS charges based on load (Score:2)
Whodunnit? (Score:2)
Does this story come with any indication that their isn't a mixup on Netflix's part?
Re: (Score:2)
... "there" isn't a mixup on their part. Honestly, it'd be great if the Slashdot API reacted in the same year that I clicked on preview.
Re: (Score:2)
The preview goes via Netflix.