Banks, Brokerages, PSN, the Steam Store, and More Are Down in Massive Internet Outage (theverge.com)
Many websites -- including banking pages, brokerages, and gaming services -- have been affected by what looks to be a major internet outage. From a report: As website owners and companies that run services that provide the backbone of the web scramble to solve the issue, consumers have been left unable to access services like Ally Bank, Fidelity, Sony's PlayStation Network, Airbnb, and more. Several airline sites are also having issues: Delta, British Airways, and Southwest's sites are all having major issues. At the moment, it's unclear what's causing the outage, though DownDetector reports that both AWS and Akamai, a pair of content delivery networks that host much of the internet, are both experiencing issues. Akamai's status page reports that the company is currently investigating an issue with its DNS service. Cloudflare's CEO has chimed in to say that its service isn't to blame.
The Cloud (Score:1, Informative)
The Cloud is just other people's computers that you rent.
Re: (Score:3, Insightful)
The whole internet is "other people's hardware". Sheesh, talk about beating a meme to death.
Re:The Cloud (Score:5, Insightful)
Did you miss the part about CDN? Do you not understand how the internet works? This isn't the 60's.
Re: (Score:2)
Because the customers didn't pay for failover to aws-west or aws-europe or wherever, I would assume.
Re: (Score:3)
If people are doing it right, they have a skeleton framework of their stuff ready to go in another region, or some infrastructure-as-code (Terraform, etc.) to get them there in a real hurry. It's called "Disaster Recovery" and it's something every major business goes to the trouble of planning for if they even have a clue.
That being said, there are still a few services on AWS that require the us-east-1 region - notably CloudFront, the AWS CDN.
Re: (Score:2)
You do realize that these major CDN and cloud providers have an uptime that few, if any, large companies can maintain on their own, right?
This! "Someone else's computer" is almost universally owned and run by someone who is better at looking after their computer than you are!
Re: (Score:2)
I have run large numbers of systems and yeah, it's tricky and expensive... let's do a quick horseback cost analysis.
One company messes up and has an outage. They take the hit for however many hours they screwed up. Cost is to one company and their users.
A CDN or some other highly centralized service screws up (anyone remember DynDNS a few years back, or the great SFDC foul-up where they gave access to everyone across domains and had to shut the whole thing down for over a day to fix it?) and EVERY company us
Re: (Score:2)
It costs a lot to use cloud providers as well. In my experience maintaining a DR on-premise hardware solution and extending yourself into various cloud providers is the sweet spot for medium to large companies. There are lots of variables to juggle like legal holds and backup systems that are different for each provider but could be consolidated easily with an on-premise solution.
Cloud allows you to scale quickly and easily, if you're doing it to save money I think you're doing it wrong. You can craft all
Re: (Score:2)
I've worked at MSPs for the last 17 years and other than some really awful clients with total garbage environments, nobody was down days per year, and downtime only decreased as clients moved from a rack with 10 servers on it to 2-3 VM hosts.
I can probably count on one hand weird incidents where down time longer than a couple of hours happened, usually the result of some significant external problem. Like a wide-area power outage due to weather, but it was almost irrelevant because dark buildings with no H
Re: (Score:2)
That's nice. What does that have to do with what I posted?
Engineering against failure is not as simple as just having a reliable server any more. When everything is "somebody else's network" then using "someone else's server" really doesn't add much unreliability. OTOH, since, through a CDN, you get access to lots of other people's servers in lots of different places, most of the time, even if some part of the network fails, one of those other people's computers will likely be somewhere where your customer can connect to it. That means that even if your machine
Re: (Score:3)
That's nice. What does that have to do with what I posted? It used to be that something like the "Steam Store" would go down and Steam IT people would fix it because it was their computers that ran the store. Now everything goes down at once because they are all someone else's computers. You are putting everything in a single point of failure: a bunch of datacenters containing other people's computers in Ashburn, Virginia.
It looks like this outage was caused by a DNS misconfiguration at Akamai. A CDN like Akamai can provide distribution, load balancing, DOS protection, and redundancy way cheaper and more effectively than I can do so for myself due to economies of scale.
Is it possible to do better on your own hardware? Of course, but most businesses cannot beat Akamai's SLA in-house for a competitive price. So they choose Akamai or some other CDN. This means that when Akamai or another big CDN or cloud hosting provider goe
Re: (Score:2)
Hey look, someone that thinks of "The Cloud" as it was 10 years ago!
AWS has had more than the us-east-1 region for at least 8 years now. If you must have always-online then you had better be multi-regional, and one of those regions should probably be on a different continent. Good luck doing that easily with your own datacenter without taking a flamethrower to your budget.
However, "the cloud" is only tangentially related to what we're talking about here, which is CDNs. If you are running a network of the
Hrmm, let's see.... (Score:1)
Didn't AWS just recently kick NSO group off their platform? Nahh... couldn't be
Re: (Score:1)
All your websites are belong to us
Everything is breaking down right now (Score:2)
/.-ing (Score:2)
At least mrsmash did not post links to all those sites in TFA - the only thing that was missing was a massive slashdotting on top of the current issues. ...
Netcraft confirms-it
Re: (Score:2)
and why? Because CDNs, load balancers, and multiple-origin setups. Usually provided by cloud service companies.
If someone was able to be "slashdotted" - it's in quotes because Slashdot hasn't been the pre-eminent internet stampede for a decade or so - they should be lambasted for NOT using auto-scaling solutions that are brain-dead easy to implement due to the maturity of cloud offerings.
DNS. (Score:5, Funny)
It's not DNS.
There's no way it's DNS.
It was DNS.
Re: (Score:2)
That should be a tshirt. So true.
Cloudflare hasn't learned... (Score:1)
At least they didn't just blame "the network" (Score:2)
Re: (Score:3)
My experience is that more often than not those of us working with hardware in the field need to tell the Networking people what is wrong with their infrastructure. Our access control hardware throws alarms if it's offline for more than a second, I have yet to see networking monitors configured to throw alarms in less than 10 minutes, and normally they seem to be set for an hour or more.
Redundancy, Redundancy, Redundancy, Redundancy... (Score:5, Interesting)
A key design point of TCP/IP is that packets can be routed along alternate paths even if a portion of the infrastructure is lost.
Arpanet, the precursor to the current Internet, was built on a military budget as a network that would keep functioning if a portion of the US was nuked.
However, with ISPs monopolizing many areas and getting sole contracts on infrastructure, with companies being cheap, and with the Internet graph trending toward a bow-tie pattern, we are getting a lot of single points of failure, where a small issue can cause wide-scale problems.
If being connected to the Internet is critical to your business, you had better have redundant lines from different carriers that follow different routing paths. If you use cloud services, you should probably be using their competitors too, in case they go down.
I would also note, you should be keeping local backups yourself as well with remote backups stored in a different area of the world.
Redundancy is a lot of work, but it is the only really safe way to protect yourself.
Re: (Score:3)
Redundancy is a lot of work, but it is the only really safe way to protect yourself.
If you're on AWS and running your own database, redundancy across regions is not too much work. You should consider setting it up today.
From there, if you really want redundancy, setting up the web servers on different cloud providers is not too hard. The difficult part is the database.
Re: (Score:3)
We have numerous database replication technologies out there, that have been available for a while now.
One rather simple approach is to write an ODBC driver that pushes every data-changing command to multiple sources while reading from only one. If one source goes down, queue up the commands and replay them when it comes back; in the meantime, switch your data reader to the backup location.
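To make that idea concrete, here is a minimal sketch in Python rather than an actual ODBC driver. `Replica` and `FanOutWriter` are made-up names, and the in-memory replica stands in for a real database connection. It only shows the fan-out, queue, and replay logic, not conflict resolution or durability of the queue itself.

```python
from collections import deque
from typing import Deque, Dict, List


class Replica:
    """Hypothetical in-memory stand-in for one database endpoint."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.rows: List[str] = []

    def apply(self, command: str) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.rows.append(command)

    def read_all(self) -> List[str]:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        return list(self.rows)


class FanOutWriter:
    """Pushes every write to all replicas and queues what could not be applied."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        # One ordered backlog per replica, so replay preserves write order.
        self.backlog: Dict[str, Deque[str]] = {r.name: deque() for r in replicas}

    def _drain(self, replica: Replica) -> None:
        queue = self.backlog[replica.name]
        while queue:
            try:
                replica.apply(queue[0])
            except ConnectionError:
                return  # still down; keep the rest queued
            queue.popleft()

    def write(self, command: str) -> None:
        for replica in self.replicas:
            self.backlog[replica.name].append(command)
            self._drain(replica)

    def flush(self) -> None:
        """Retry queued commands, e.g. after a replica comes back."""
        for replica in self.replicas:
            self._drain(replica)

    def read(self) -> List[str]:
        # Read from the first replica that is reachable and fully caught up.
        for replica in self.replicas:
            if self.backlog[replica.name]:
                continue
            try:
                return replica.read_all()
            except ConnectionError:
                continue
        raise ConnectionError("no up-to-date replica is reachable")
```

If the primary goes down, `read()` falls through to the backup, and `flush()` replays the queued writes once the primary is reachable again.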
Re: (Score:2)
It definitely can be done. If you want it to be solid, you need to use transactions and handle timeouts in the code. Unfortunately, there aren't very many teams that use transactions correctly (partly because a lot of new people don't really learn about them, and partly because experienced people don't test their transactions).
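For anyone who hasn't seen it done, a small sketch of "use transactions and handle timeouts" using Python's built-in sqlite3 module. The `accounts` table and the helper names (`open_demo_db`, `transfer`, `balances`, `with_retry`) are invented for illustration; a real multi-region setup would use a networked database, but the shape is the same.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    # timeout=5.0: wait up to 5 seconds on a locked database before
    # sqlite3 raises OperationalError.
    conn = sqlite3.connect(":memory:", timeout=5.0)
    with conn:
        conn.execute(
            "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
        )
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the two updates land together or not at all.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def with_retry(operation, attempts: int = 3):
    """Retry an operation that timed out (sqlite3.OperationalError)."""
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError:
            if attempt == attempts - 1:
                raise
```

The part teams tend to skip is testing the failure path: make a transfer fail halfway and check that neither balance moved.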
On and off again (Score:5, Funny)
I hate myself but, have they tried turning it off and on again?
Hey (Score:2)
s/then/they/
Not interested in your personal pronouns buddy.
P.S. Isn't it time to push for Regex support in pronouns?
Re: (Score:1)
Actually meant as humor, as I found the thought of a substitution command being used as a pronoun to be rather humorous. :-)
Re: (Score:2)
That's outsourcing for you.
Is it even plugged in? (Score:2)
Cleaning staff pulled the power cord *again*. We really need to talk to them.
Who cares... (Score:2)
Re: (Score:1)
When you see a dupe, you know /. is back to normal
The bigger the basket (Score:2)
All services are up and running. (Score:3)
Sorry, sorry (Score:5, Funny)
Backup to DNS: direct IP links (Score:2, Interesting)
Why are so many major organizations crippled by DNS outages?
Where are the direct IP links and basic pages with direct IP links that could be used as backup?
Where are the primary-paths that don't rely on DNS that could be used as backups?
DNS is not the end-all of the internet, simply a simplification of addressing.
DNS may be the end-all for general public usage, but us IT folks should be smarter than that.
We have non-DNS basic functionality set up for backup; don't other, more major organizations?
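A rough sketch of what such a non-DNS backup path could look like on the client side, in Python. The hostname, the fallback table, and the function name are all hypothetical, and the address is from the 192.0.2.0/24 documentation range. Note that for HTTPS the client still has to present the original hostname for certificate checks, and addresses behind a CDN can change, so a pinned list like this needs maintenance.

```python
import socket
from typing import Callable, Dict, List

# Hypothetical last-known-good addresses, maintained out of band.
FALLBACK_IPS: Dict[str, List[str]] = {
    "intranet.example.com": ["192.0.2.10"],
}


def resolve_with_fallback(
    hostname: str,
    port: int = 443,
    resolver: Callable = socket.getaddrinfo,
    fallback: Dict[str, List[str]] = FALLBACK_IPS,
) -> List[str]:
    """Resolve via DNS first; if that fails, use the pinned address list."""
    try:
        infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address.
        return sorted({info[4][0] for info in infos})
    except socket.gaierror:
        if hostname in fallback:
            return list(fallback[hostname])
        raise
```

The `resolver` parameter is only there so the lookup can be swapped out in tests; by default it is the standard library's `socket.getaddrinfo`.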
Ironically, Prime Video was not affected (Score:2)
Covid becomes SkyNet (Score:2)
Well... (Score:2)
DEFCON (Score:2)
My opinion (Score:1)