Google Takes Blame For Internet Disruption Across Japan (theregister.co.uk)
An anonymous reader shares a report: Google on Saturday accepted responsibility for the widespread internet disruptions Japan experienced the previous day. The search engine giant apologized for the trouble, saying it was caused by an errant network setting that was corrected within eight minutes of its discovery. Google did not say whether human error or a technical malfunction was to blame. The disrupted services used internet connections provided by NTT Communications Corp. and KDDI Corp., both of which said Friday that the issues were caused by a change in the flow of data traffic. From a report on The Register: The trouble began when Google 'leaked' a big route table to Verizon, the result of which was traffic from Japanese giants like NTT and KDDI was sent to Google on the expectation it would be treated as transit. Since Google doesn't provide transit services, as BGP Mon explains, that traffic either filled a link beyond its capacity, or hit an access control list, and disappeared. The outage in Japan only lasted a couple of hours, but was so severe that Japan Times reports the country's Internal Affairs and Communications ministries want carriers to report on what went wrong.
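For readers unfamiliar with the term, a "route leak" is usually described as re-announcing routes to a neighbor that should never have received them. A minimal sketch of the conventional export rule follows (routes learned from customers go to everyone, routes learned from peers or providers go only to customers). The relationship labels and function names are hypothetical and only illustrate the general rule; they are not a statement of how Google's or Verizon's routers are configured.

```python
# Hypothetical relationship labels for a BGP neighbor.
SELF, CUSTOMER, PEER, PROVIDER = "self", "customer", "peer", "provider"


def may_export(learned_from: str, export_to: str) -> bool:
    """Conventional "valley-free" export rule (illustrative sketch only).

    Own routes and customer routes may be announced to any neighbor.
    Routes learned from a peer or a provider may only go to customers.
    """
    if learned_from in (SELF, CUSTOMER):
        return True
    return export_to == CUSTOMER


def is_leak(learned_from: str, export_to: str) -> bool:
    """An announcement that breaks the export rule is a route leak."""
    return not may_export(learned_from, export_to)


if __name__ == "__main__":
    # Re-announcing a peer's routes to another peer or to a provider
    # makes the announcing network look like a transit path.
    print(is_leak(PEER, PROVIDER))
    print(is_leak(CUSTOMER, PEER))
```

Under this rule, a network that does not sell transit should never end up carrying traffic between two networks it merely peers with, which is the situation the summary describes.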
Hmm, actually we *knew* it was them the moment it started. Most non-joke internet network engineers refuse to fly blind, so there are probes and monitors everywhere in the Internet control plane (DFZ BGP routing). E.g. read: https://bgpmon.net/bgp-leak-causing-internet-outages-in-japan-and-beyond/
Also, most internet network engineers will place a lot of the blame on *Verizon*, not Google. Route leaks are a *fact of life*; they will happen at least once to everyone. You *MUST* filter the routing plane so that you do not accept crap from other autonomous systems you peer with, and Verizon *utterly failed* at doing it. Had they filtered, they'd have rejected the bogus routing from Google and avoided most, if not all, of the damage.
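To make the filtering point concrete, here is a rough sketch of what an inbound filter on a peering session does: accept only prefixes the peer is expected to announce, and drop everything if the peer suddenly sends far more routes than expected. The allow-list, the limit, and the function names are all hypothetical, and the prefixes are documentation ranges rather than anyone's real allocations. Real routers express this in vendor configuration, not Python.

```python
import ipaddress

# Hypothetical allow-list of prefixes this peer is expected to originate
# (documentation ranges, not real allocations).
ALLOWED = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
MAX_PREFIXES = 3      # hypothetical per-session limit
MAX_PREFIX_LEN = 24   # assumed cut-off: ignore anything longer than a /24


def accept_route(prefix: str, allowed=ALLOWED) -> bool:
    """Accept a prefix only if it falls inside the peer's allow-list."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > MAX_PREFIX_LEN:
        return False
    return any(net.version == a.version and net.subnet_of(a) for a in allowed)


def filter_announcements(prefixes, allowed=ALLOWED, max_prefixes=MAX_PREFIXES):
    """Apply the prefix filter, plus a crude max-prefix safety valve."""
    if len(prefixes) > max_prefixes:
        # Far more routes than expected: treat the whole batch as bogus.
        return []
    return [p for p in prefixes if accept_route(p, allowed)]


if __name__ == "__main__":
    batch = ["192.0.2.0/24", "203.0.113.0/24"]
    print(filter_announcements(batch))
```

With a filter along these lines in place, a leaked table either fails the allow-list check prefix by prefix or trips the max-prefix limit, and the bogus routes never make it into the routing table.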
So, Google might have publicly taken the blame since it was their operational error that triggered the damage in the first place, but they are at most responsible for half of it... and *everyone in the field* knows it, Google included.
How did a company that does not provide transit services even manage to send a route table that was accepted for use? Just seems like a very exploitable issue there. Does Google have authority or permissions that allow this, even though they don't actually have the capabilities?
Multihoming. I'd imagine Google provides transit service, but only for their own IP blocks. Each Google datacenter almost certainly has multiple Internet connections to the world. As a result, they have multiple netblocks provided by multiple ISPs.
If, through some unlucky DNS accident, a client on ISP A looks up google.com and gets an IP address provided by ISP B, it would take many more hops to reach that server via public Internet routes than by sending traffic to that datacenter's nearest router (on ISP A) and asking that router to forward traffic through the datacenter to the other set of IPs.
This happens because their peers were too lazy or otherwise unable to set up and maintain proper route filters for Google's ASNs. Problem is that with Google Cloud and Google Fiber (for business), Google actually sometimes *does* provide transit, so it would be difficult for all of their upstream providers to keep those filters continuously updated -- instead they think "Well, they are Google. They are probably doing that all on their end and won't ever fuck this up."
BGP routing has no authority mechanism. Anyone can publish any route. This is not the first time this has happened [bgpmon.net, bgpmon.net, networkworld.com, pcworld.com], nor will it be the last.
