Cisco's Network Bugs Are Front and Center in Bankruptcy Fight (bloomberg.com)
Reader Dharkfiber writes: Bloomberg is covering a story today about a hosting business that has filed for Chapter 11 bankruptcy, blaming bugs in a switch. Good, bad, or ugly, is it time to admit that business really can't continue without IT? When will IT training become formal curriculum in schools? An excerpt from the Bloomberg report: There's buggy code in virtually every electronic system. But few companies ever talk about the cost of dealing with bugs, for fear of being associated with error-prone products. The trial, along with Peak Web's bankruptcy filings, promises a rare look at just how much or how little control a company may have over its own operations, depending on the software that undergirds it. Think of the corporate computers around the world rendered useless by a faulty update from McAfee in 2010, or of investment company Knight Capital, which lost $458 million in 30 minutes in 2012 -- and had to be sold months later -- after new software made erratic, automated stock market trades.
Peak Web, founded in 2001, had worked with companies including MySpace, JDate, EHarmony, and Uber. Under its $4 million-a-month contract with Machine Zone, which began on April 1, 2015, it had to keep Game of War running with fewer than 27 minutes of outages a year, court filings show. According to Machine Zone, the hosting service couldn't make it a month without an outage lasting almost an hour. Another in August of that year was traced to faulty cables and cooling fans, according to the publisher.
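To put the contract clause in the excerpt in perspective: 27 minutes of downtime a year works out to roughly 99.995% availability, between four and five nines, so a single outage of almost an hour uses up the annual budget roughly twice over. A minimal sketch of the arithmetic, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def availability(downtime_minutes_per_year: float) -> float:
    """Fraction of the year a service is up, given its annual downtime."""
    return 1 - downtime_minutes_per_year / MINUTES_PER_YEAR


def downtime_budget(target: float) -> float:
    """Minutes of downtime per year that an availability target allows."""
    return (1 - target) * MINUTES_PER_YEAR


contract = availability(27)            # the "fewer than 27 minutes a year" clause
five_nines = downtime_budget(0.99999)  # for comparison
print(f"27 min/year -> {contract:.5%} availability")  # 99.99486%
print(f"five nines  -> {five_nines:.2f} min/year")     # 5.26
```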
When will IT training become formal curriculum (Score:3, Insightful)
Never? That's all being outsourced, duh.
Re: (Score:2)
Plus, there are plenty of students who don't want to go into that industry. They would rather be lawyers, bankers, artists, or athletes, and those kinds of careers really should stay out of IT.
Re: (Score:1)
"High school"? They insist that this stuff be taught in pre-school. On the other hand I see no problem if it replaces "women studies" in those ivy league colleges.
point and click
Re: (Score:1)
Which would you rather interact with? A CISCO Nexus 3000 or a woman?
Re: (Score:1)
I wouldn't know, I've never had a Cisco...
Re: (Score:2)
Re: (Score:2)
Re: When will IT training become formal curriculum (Score:1)
Re: When will IT training become formal curriculum (Score:1)
Re: When will IT training become formal curriculum (Score:2)
The nature of coding has changed tremendously over the years. What they learn today will have to be unlearned, because by then it will have become the wrong way to do it. And that's if there is still a need to do it at all.
Cars don't need a tune-up and chass
Re: (Score:2)
True, up to a point. It's also true, up to a point, that the more IT goes offshore, the worse this is going to get. Fortunately, it looks like some companies are waking up and realizing that, contrary to what the IT service provider is telling them, IT isn't "just following procedures, which anyone could do".
Re: (Score:1)
Re: (Score:2)
I think that's true in some cases, but I've met people who couldn't apply the scientific method to a problem to save their lives. (Form a theory, devise a test, perform the test, collect results, revise the theory.) There are apparently people who don't understand this at a fundamental level and, even when they learn it as a procedure, are at a loss as to how to apply it. System administration is a skill that not everyone has, and one that can't be taught to just anyone. You get outsourced IT on scratchy phone lines who when
Re: (Score:2)
I have encountered those types as well.
They can *perfectly* follow a list of instructions, even those with branches, so long as those branches describe exactly what they see.
Those people are invaluable in an HVM (high-volume manufacturing) testing environment where it's:
* Load trays in tester
* Push run button
* Unload passed and failed parts, put on appropriate shelves
* If tester jams like picture A do worksheet FOO
* If tester jams like picture B do worksheet BAR
All is well. BUT if the tester jams and it's not like A || B they are
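The checklist above is essentially a lookup table with no default branch. A toy sketch (the jam names and the escalation string are made up for illustration) of what adding the missing default looks like; the point is that an unrecognized case gets handed to someone who can reason about it instead of falling through:

```python
from typing import Dict

# Hypothetical jam signatures mapped to worksheets, following the
# "picture A -> worksheet FOO, picture B -> worksheet BAR" checklist.
RUNBOOK: Dict[str, str] = {
    "jam_A": "worksheet FOO",
    "jam_B": "worksheet BAR",
}


def handle_jam(observed: str) -> str:
    """Return the action for an observed jam; anything unlisted is escalated."""
    # The explicit fallback is the branch a pure checklist tends to leave out.
    return RUNBOOK.get(observed, "escalate to an engineer")


for jam in ("jam_A", "jam_B", "jam_C"):
    print(jam, "->", handle_jam(jam))
```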
Re: (Score:3)
You are right, for processes that can be described in a small enough number of steps. You definitely don't want to take up the time of a big brain with lots of experience doing the same operation over and over. But I would submit that a modern enterprise installation is just too big and too varied to expect any practical set of procedures to cover, say, 80% of issues. The fallback in my experience is typically to (a) apply patches, and (b) hope the problem goes away.
I think this is why, when a company d
business-critical means you need hot sparing (Score:3)
Or at least a cabinet full of new, plug-ready parts. That means the HDAs need to be pre-formatted, for instance, cables tested, and configurations stored on a server for TFTP loading behind your firewall.
Things that cost money. Things that suits have no clue about.
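Stored configurations only help if they are actually there and current when the spare goes in. A minimal sketch of an audit that flags switches with a missing or stale saved config; the `<hostname>.cfg` naming, the seven-day threshold, and the hostnames are assumptions for illustration, not any vendor convention:

```python
import tempfile
import time
from pathlib import Path
from typing import Iterable, List

MAX_AGE_DAYS = 7.0  # assumption: how old a saved config may be before it counts as stale


def missing_or_stale(tftp_root: Path, hostnames: Iterable[str],
                     max_age_days: float = MAX_AGE_DAYS) -> List[str]:
    """Return hosts whose saved config is absent or older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    problems = []
    for host in hostnames:
        cfg = tftp_root / f"{host}.cfg"  # hypothetical naming convention
        if not cfg.is_file() or cfg.stat().st_mtime < cutoff:
            problems.append(host)
    return problems


# Demo against a throwaway directory standing in for the TFTP root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tor-a1.cfg").write_text("hostname tor-a1\n")
    report = missing_or_stale(root, ["tor-a1", "tor-b1"])

print(report)  # ['tor-b1']
```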
Re: (Score:1)
Re: (Score:3)
This is the exact same reason why companies don't train their (IT) staff. What is the point in spending money to make it easier for them to leave you?
Re: (Score:2)
https://xkcd.com/908/
All Cisco users had this problem? (Score:5, Insightful)
All systems have bugs; not all data centers have this kind of crap uptime record.
Smart IT people build data centers out of heterogeneous hardware and set it up to degrade gracefully when something fails. You won't get this if you just hire A+/Net+ staff.
Blame the PHB/CTO not the hardware.
Re:All Cisco users had this problem? (Score:5, Interesting)
Agreed. I've worked at places that kept five nines of availability. By "normal" IT standards, it was massively overbuilt: multiple sets of gear, clustered in failover mode, with a separate redundant setup elsewhere in the data center, on an entirely different power feed (as I recall, we had at LEAST 4 independent power feeds).
We also had cabinets full of spare parts, entire pieces of gear on the shelf, and an entire library of config files on the TFTP server, plus a duplicate copy on a laptop that lived in one of the cabinets. It took a LOT of labor and gear, and it was not cheap.
And we constantly had to explain the man-hour and spares costs to the suits.
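The overbuilding described above is what the redundancy arithmetic demands. A rough sketch with illustrative availability figures (not numbers from the thread): two 99.9% boxes in failover come out near 99.9999%, but only if they fail independently, and the model ignores the time failover itself takes.

```python
from math import prod  # Python 3.8+


def parallel(*availabilities: float) -> float:
    """Redundant components where any single survivor keeps the service up.

    Assumes independent failures, which is what separate gear and separate
    power feeds are trying to buy.
    """
    return 1 - prod(1 - a for a in availabilities)


def series(*availabilities: float) -> float:
    """Components that must all be up at once (e.g. a switch and its power)."""
    return prod(availabilities)


box = 0.999                                   # illustrative single-box availability
failover_pair = parallel(box, box)            # about 0.999999
with_power = series(failover_pair, parallel(0.9999, 0.9999))  # pair behind two feeds
print(f"{box:.4%} -> {failover_pair:.6%} -> {with_power:.6%}")
```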
Re: (Score:3)
I work with and build them all the time. Mind you, I really no longer think you can get any complex service to 5+ nines without the application being part of the solution; reliability is not a bolt-on thing, it's baked into the design.
The story is laughable: faulty fans, buggy firmware on the Nexus 3Ks. Those are ToR switches; they should be extremely easy to replace and always used in redundant setups. They probably got suckered into vPC and similar. Guess what, I don't care what they say: all stacks share a single
Re: (Score:2)
They probably got suckered into vPC and similar. Guess what, I don't care what they say: all stacks share a single failure domain. Don't get me wrong, they are great, but you need at least A+B stacks.
Yeah, and reading release notes is an easy way to convince yourself that, unless you need something like VPS or are cheesing license limits on a management platform, stacking should just be purged entirely from your configurations.
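The "single failure domain" point can be put in numbers. A sketch with illustrative figures: a stack duplicates the hardware, but anything the members share (control plane, firmware) takes both down at once, so the stack can never beat that shared component, while two separate A and B stacks can, if they really fail independently.

```python
def one_stack(member: float, shared: float) -> float:
    """Two stacked switches: redundant hardware, but one shared failure
    domain (control plane, firmware) that takes both members down at once."""
    return shared * (1 - (1 - member) ** 2)


def a_plus_b(member: float, shared: float) -> float:
    """Two separate stacks, A and B, each its own failure domain; the
    service survives as long as either stack is up (independence assumed)."""
    stack = one_stack(member, shared)
    return 1 - (1 - stack) ** 2


member, shared = 0.999, 0.9995  # illustrative figures only
print(f"single stack: {one_stack(member, shared):.6%}")
print(f"A+B stacks:   {a_plus_b(member, shared):.6%}")
```

The independence assumption does a lot of work here: if A and B run the same firmware and the same traffic triggers the same bug, they fail together and behave like one stack again, which is the argument for heterogeneous hardware made upthread.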
Re: (Score:2)
I'll take TRILL over full stacking any day, but I still think of TRILL as a single failure domain.
Re: All Cisco users had this problem? (Score:2)
The Internet has contributed greatly to this mentality. It's a lot cheaper to have your users download a patch than to ship a CD, floppies, or tape.
Re:All Cisco users had this problem? (Score:4, Funny)
Suit: "Explain the man-hour and spares costs to me."
Engineer: "Certainly." (*brains him with a fried 24-port managed switch*) "Would you like it explained again?"
Re: (Score:2)
I was thinking "our contracts specify that if something goes wrong we have to fix it in under 30 minutes or we pay a lot of money. The only way to do that is to have the spare parts and people on site." but I like your method :)
Re: (Score:1)
Re: (Score:2)
This is really the answer.
Buggy or not, if you provide an SLA (service level agreement), then you are ultimately responsible for it.
You do what you have to do to provide that SLA. ...
Test the equipment you plan to use.
Add a lot of redundancy and failovers.
SLAs cost money.
Heck, one silly line in the article is:
"The entire network often has to go down in order to patch -- very disruptive in the best of times."
I really have to wonder what kind of network these guys are running. There should be failover nodes to tak
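The usual alternative to taking the whole network down to patch is a rolling upgrade across redundant nodes. A minimal sketch of the control loop; `Node`, `rolling_patch`, and the health-check callback are hypothetical stand-ins, not any vendor's tooling, and the "upgrade" is just a version field change:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    version: str
    in_service: bool = True


def rolling_patch(nodes: List[Node], target: str,
                  healthy: Callable[[Node], bool]) -> List[str]:
    """Upgrade nodes one at a time so the others keep carrying traffic.

    Halts at the first node that fails its post-upgrade health check and
    leaves that node drained rather than putting it back into service.
    """
    log: List[str] = []
    for node in nodes:
        if node.version == target:
            continue
        # Refuse to drain a node if nothing else is left to carry traffic.
        if not any(n.in_service for n in nodes if n is not node):
            raise RuntimeError(f"no failover capacity while patching {node.name}")
        node.in_service = False   # drain: peers take over its traffic
        node.version = target     # stand-in for the real upgrade step
        if not healthy(node):
            log.append(f"{node.name}: failed health check, left drained")
            break
        node.in_service = True    # verified, so return it to service
        log.append(f"{node.name}: patched to {target}")
    return log


fleet = [Node("core-a", "1.0"), Node("core-b", "1.0")]
steps = rolling_patch(fleet, "1.1", healthy=lambda n: True)
print(steps)
```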
Re: (Score:2)
Yeah, this sounds to me like management overselling/overpromising and underfunding the back end IT staff and hardware.
It just sounds like their network was not properly architected with redundancies, perhaps with the hope of getting the contract first and building up the infrastructure later.
Then, when they can't deliver, they look for scapegoats... oh look, Cisco has some money... it's their fault!
Re: (Score:2)
Re: (Score:2)
Smart IT people build data centers out of heterogeneous hardware and set it up to degrade gracefully when something fails.
That would always be a preferred model, if you have that kind of budget, but...
Blame the PHB/CTO not the hardware.
...I'd say the equipment vendors should share some of that blame. I don't work anywhere near five-nines territory, and even I find some really appalling feature bugs introduced by even routine patch-level upgrades. Stuff like the combination of DHCP snooping/ARP inspection/source lockdown on a port, which is the right way to configure access ports for anyone who gives a flip about security in depth, suddenly blocking all traffic after
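For readers who haven't configured these features: all three hang off one table. DHCP snooping records which IP was leased to which MAC on which port, ARP inspection drops ARP packets that contradict that table, and source lockdown drops IP traffic from addresses not in it. A deliberately simplified model (not any vendor's implementation; trusted ports, static bindings, and rate limits are ignored, and the port name is made up) shows why a bug that empties or fails to populate the table fails closed and blocks everything on the port:

```python
from typing import Dict, Tuple

# (port, mac) -> leased IP, as DHCP snooping would learn it on untrusted ports.
Bindings = Dict[Tuple[str, str], str]


def learn_dhcp_ack(table: Bindings, port: str, mac: str, ip: str) -> None:
    """DHCP snooping: remember which address the server leased to this port/MAC."""
    table[(port, mac)] = ip


def arp_allowed(table: Bindings, port: str, sender_mac: str, sender_ip: str) -> bool:
    """ARP inspection: the ARP sender MAC/IP pair must match a known lease."""
    return table.get((port, sender_mac)) == sender_ip


def ip_allowed(table: Bindings, port: str, src_ip: str) -> bool:
    """Source lockdown (IP-only variant): the source IP must be leased on this port."""
    return any(p == port and ip == src_ip for (p, _mac), ip in table.items())


bindings: Bindings = {}
learn_dhcp_ack(bindings, "port5", "00:11:22:33:44:55", "10.0.5.20")
legit_arp = arp_allowed(bindings, "port5", "00:11:22:33:44:55", "10.0.5.20")
spoofed_arp = arp_allowed(bindings, "port5", "00:11:22:33:44:55", "10.0.5.1")
bindings.clear()  # simulate an upgrade bug that loses the binding table
after_bug = ip_allowed(bindings, "port5", "10.0.5.20")
print(legit_arp, spoofed_arp, after_bug)
```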
Re: All Cisco users had this problem? (Score:1)
Re: (Score:2)
Priceless: Having the whole network vulnerable to a single exploit.
All the vendors want you to go 'one stop shopping'. That should make you a little uncomfortable.
Read the warranty card... (Score:3)
"Disclaimer of Liabilities - Limitation," page 16, states (condensed): all liability shall not exceed the price paid for the software, or the price of the product that includes the software.
And to use the equipment and Cisco software, you agree to the terms of service.
http://www.cisco.com/c/en/us/t...
So, at best, they can recover the costs of the switches involved. . .
Re: (Score:2)
How else do you expect this to turn out? I suppose that Cisco could pay up to keep them quiet and out of a courtroom but that sets a precedent for writing checks if a company can somehow blame them for their failure.
There is already a long history of people getting fitness-for-purpose claims tossed out of court. I don't believe that Cisco has much to worry about here.
This is timely (Score:5, Interesting)
I'm a photographer, and I sell my work through a web service. They bring together the finishing providers (prints, calendars, t-shirts, etc) and take care of payment, and all I have to do is provide content and manage sales. When I finish post-processing on a new photo, the tool I use (Adobe Lightroom) automatically uploads to the web service in the album I select. I cover events, so there's often a massive number (600 or so) of photos to upload.
Yesterday I was getting sporadic "service not available" messages from the service. After doing some triage to verify the problem was not at my end, I contacted customer support. Mind you, this was 10:30 PM PST. But that's the way it is with photographers -- we often take photos during the day and process them at night, which is somewhat the opposite of a standard use case. (And should be borne in mind when said services schedule maintenance. Just sayin'.)
Browsing the service's forum, I saw others were seeing the same error message, and people were starting to get excited. (This is our livelihood, after all.)
I got an answer to my service ticket in less than 30 minutes, saying that they were struggling with network problems with one of their service providers (probably a cloud service). I got a followup shortly after that they thought the service was up now but they were still testing. And I got another followup at 6:30 AM that the problem had been resolved and they had put steps in place to ensure it would not happen again. They also implemented a "status page" that we could consult in the future (which should have already existed, but live and learn).
Now, *that's* the way to handle an incident like this. Very commendable. But it does point up the problems a business sometimes has when they rely too much on external services. Just my opinion, but the main difference I can see between in-house and outsourced is one of motivation. If you're providing an online service, your employees realize in their heart of hearts that outages can easily result in business failure and loss of jobs. But if you're renting all the pieces of your service from outside vendors, you soon find that those vendors may be concerned about their contract with you, and the money they make off you, which isn't at the same level in the hierarchy of needs as the live-or-die situation you are in.
Re: (Score:2)
> What exactly does telling the level zero answering machines what the problem is have to do with corporate management understanding the MAINTENANCE does NOT mean RUN IT 'TILL IT BREAKS.
I'm having trouble parsing that. I'm going to assume the third "the" is supposed to be "that". Ok, now it scans.
Ok NOW, I'm having trouble relating what you wrote to what I wrote.
> You sir are NOT an IT support person.
I, um, sir, have been an IT support person since 1984, the date of my first post to net.news.newsite
Re: (Score:2)
Reply All podcast had a good episode on a cloud photo provider that had massive tech problems and people lost contact with their photos. Worth a listen.
Cloud just means a computer you can't control.
Re: (Score:2)
Absolutely true. I pay for a professional account on the service I use, and one of the bennies is the opportunity to keep all my original images on their cloud so that I don't have to worry about storing them locally or backing them up.
No. Not only no, but Hell No. Not on your friggin' life. Ok maybe as a backup, but putting the only copy of an original photograph on *someone else's* cloud? It is to laugh.
My original images reside on a local hard disk, periodically backed up to *another* hard disk, whi
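A copy you have never compared against the original is only a hope. A minimal sketch of checking the second disk against the first by content hash; the directory layout and demo file names are made up, and this is one way to verify a backup, not a description of the poster's setup:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large raw files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unverified(originals: Path, backup: Path) -> List[str]:
    """Relative paths of originals that are missing or differ on the backup."""
    bad = []
    for src in sorted(p for p in originals.rglob("*") if p.is_file()):
        rel = src.relative_to(originals)
        copy = backup / rel
        if not copy.is_file() or sha256_of(copy) != sha256_of(src):
            bad.append(rel.as_posix())
    return bad


# Demo with throwaway directories standing in for the two disks.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    src_dir, dst_dir = Path(a), Path(b)
    for name in ("IMG_0001.dng", "IMG_0002.dng"):
        (src_dir / name).write_bytes(name.encode())
    (dst_dir / "IMG_0001.dng").write_bytes(b"IMG_0001.dng")  # good copy
    problems = unverified(src_dir, dst_dir)                  # IMG_0002 is missing

print(problems)
```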
IT in schools? (Score:3)
Good, bad, or ugly, is it time to admit that business really can't continue without IT? When will IT training become formal curriculum in schools?
Good, bad, or ugly, is it time to admit that business can't really continue without Patents/Accounting/Negotiations/Advertising/Sales/1000 other things?
When will patent law/banking/economics/marketing become formal curriculum in schools? That's about the time when IT should become a part of the formal curriculum as well.
High school shouldn't be about training for a job that only a fraction of the students will eventually do. If businesses can't survive without IT, then they hire people who are specially trained in IT; an HS course won't train people well enough to solve any hard IT problems anyway.
Re: (Score:1)
Re: (Score:2)
Well, to be fair, almost all professionals will touch a computer of some kind at some point.
So learning *generally* about IT is not a bad idea. Basic principles of how a computer works, how networks communicate, how the Internet functions. All good things to learn.
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
It prevents the PHB's foot from being covered in feces when he puts it up some IT dweeb's butt after unscheduled downtime.
Last time that dweeb will ever try and tell a manager 'I told you so'.
Re: (Score:2)
Re: (Score:2)
I meant to indicate that most people don't understand the biological support mechanism of the foot, and how that interacts with shoes. The arch of the foot and the ankle work together to provide shock-absorption, protecting the knee and hips; a shoe provides arch support, and how and to what degree it supplies this support affects the health of these anatomical components. So does the shoe's profile (raised heel?). The sole is made of complex rubber compounds which affect traction (e.g. some shoes can w
Re: (Score:2)
Re: (Score:2)
I agree it probably does not need to be part of schooling, but there are a lot of businesses that STILL treat IT like an unimportant part of the company, right up there with Janitorial Services etc., instead of what it really is: a mission-critical part of the company.
Re: (Score:2)
Re: (Score:2)
I know very very few schools that even OFFER shop even as an elective (My son's school does - one section, and it is mostly an engineering design class - they spend a lot of time on FEA etc - more computer time than tool time). Offering Computers? Sure!! Mandating it? Nope.
Seriously, I also know of no school that offers Home Ec, Cheerleading, Marching Band as CLASSES - After school activities? Yes. Gym ? Yeah NYC requires it, but you'll find more than 50% of the kids don't take the required amount th
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Re: (Score:3)
IT training? (Score:4, Insightful)
People will figure out IT training is important when they realize that they can't make stupid statements like "IT training" as if it means something.
What even IS I.T.?
Are you talking about server management? Network management? DevOps? What skill sets do you need?
It's like saying we need more brain surgeons, so we need MOAR BIOLOGY TRAINING!
MBAs, or people in general, will never appreciate just how complex some work can be, because of Dunning-Kruger. They don't know or understand how complex IT is, therefore they are unable to *appreciate* how complex IT is. Just like they are unable to appreciate anything else that is complicated, whether it's medicine, physics, etc.
Re: (Score:1)
Re: (Score:2)
IT is the burger-flipping job of the future. It's the people who rack switches, wire up networks, and generally do grunt work. High school students are IT worker stock.
This is Slashdot, where we confuse Computer Science--the elites, the engineers, the mathematicians, the people who can't do their job because they don't know linear algebra--with IT, simply because most burger-flipping server jockeys have figured out how to use awk and perl, kind of, and think that makes them the same thing as a software
Re: (Score:2)
Wow. That's a bleak and insulting picture of both the future and of slashdot. And you're clearly suffering from Dunning-Kruger -- or more accurately, the rest of us find you insufferable due to Dunning-Kruger.
I admit, awk and perl pushed everything about linear algebra and a whole boatload of other things I learned in school out of my brain. But that doesn't make me stupid -- I simply know practical things for my particular and current situation. I have no doubt I could pick up linear algebra quickly shou
Re: (Score:2)
It has to be wrong to be ignorant; and it has to be at least almost-true to be insulting.
Someone on Slashdot started talking about IT degrees when I mentioned self-studying Computer Science. If you've ever looked at the two programs, you'd facepalm at warp speed: IT degrees include "general IT", network administration, and IT security; Computer Science is essentially a mathematics discipline based on exploring what can be computed (and how). They're not really equivalent in any sense.
You talk a lot a
Re: (Score:2)
Re: (Score:2)
I didn't say it was a bad thing. Why do you think cell phones cost so little now? Just cell phone service for 2 hours of voice per week would cost $550/month right now if it had only followed inflation since 1984 (the first commercially-available cell phones cost $4,000--over $9,000 in today's dollars--and yes, that's what the rates were like).
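The "$4,000, over $9,000 in today's dollars" figure is a simple price-index ratio. A sketch of that conversion; the CPI values are approximate U.S. CPI-U annual averages supplied here as assumptions (taking 2017 as "today"), not figures from the thread:

```python
def adjust_for_inflation(amount: float, cpi_then: float, cpi_now: float) -> float:
    """Express a past price in later dollars using the ratio of price-index values."""
    return amount * cpi_now / cpi_then


# Assumed inputs: approximate U.S. CPI-U annual averages (1982-84 = 100),
# taking 2017 as "today".
CPI_1984 = 103.9
CPI_2017 = 245.1

phone_then = 4000
phone_now = adjust_for_inflation(phone_then, CPI_1984, CPI_2017)
print(f"${phone_then:,} in 1984 is about ${phone_now:,.0f} in 2017 dollars")
```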
You have to remember: businesses don't pay wages. Businesses get revenue from product sales. In the end, what isn't taken by profit or taxes somewhere along t
Re: (Score:1)
"faulty cables and cooling fans" (Score:3)
If they're contractually bound to deliver that sort of uptime, and their system isn't designed to tolerate these kind of failures, they deserve to fail.
Re: (Score:2)
To a degree. What if there is a serious bug or hardware flaw in a sourced component? Remember when HP bought motherboard components (faulty capacitors from a supplier who had tried to steal the formula from another company and had stolen fake docs) about 10 years ago? Their laptops and desktops had about a 40% failure rate in the first year as a result. Is that on the consumer's shoulders for having purchased a machine with bad motherboard capacitors that were sourced by HP? They should have met the specific
Re: (Score:2)
Re: (Score:2)
Bad Chinese electrolytic caps were _everywhere_ for a few years there. It's been more than 10 years, about 20. Tempus fugit.
The world continued to turn. Not everything was replaced in the bad cap window, not every OEM pinched that penny and not all bad caps failed when new.
Re: (Score:2)
Re: (Score:2)
Not sure I can recognize a surface mount cap at a glance. I bet they are still there. Sure the busses are better tuned, but at the end of the day, caps fix many things.
Re: (Score:2)
Whilst I agree in principle, in practice we know that the 5* service offered by a budget provider will not be equal to the 5* service offered by a reputable provider.
A decent datacenter wouldn't be taken down by shoddy cables & ventilation.
There's no problem with choosing a low-cost datacenter...as long as you factor that into your infrastructure design and put the saved money into redundancy. Done right, spreading your risk over several low-cost options can provide a stronger service than putting all of
Re: (Score:2)
So the consumer is at fault for expecting lower failure rates?
Depends on the failure rate. Considering high end server hardware has a very low failure rate and yet in critical applications with contractual uptimes they still get clustered or put in redundant pairs, yes I would say it still is the consumer's fault for not being able to handle the failure gracefully.
SOP (Score:2)
Sounds like the suits took a contract but did not want to pay for the back end infrastructure to really support it.
I can't tell you the number of times I've seen this mentality -
From Banks to Airlines to Healthcare to "Service" Providers....
Usually it seems to be a combination of cheap C-level people and a layer of "yes" men between them and IT.
Unfortunately the deciders in chief don't feel the pain when deals like this cause the company to implode....
Re: (Score:2)
Coding and IT aren't exactly the same thing (Score:1)
Re: (Score:2)
Don't depend on what you can't see (Score:3)
The company’s Nexus 3000 switches began to fail after trying to improperly process a routine computer-to-computer command, and because Cisco keeps its code private, Peak Web couldn’t figure out why.
...
Finally, late in October, came the 10 hours of darkness. Three people familiar with Peak Web’s operations say the lengthy outage gave the company time to deduce that the troublesome command was reducing the switches’ available memory and causing them to crash. The company alerted Cisco.
So they ended up black-box debugging the vendor's own problem for them. I wish I could say I am unfamiliar with that...
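What Peak Web reportedly worked out the hard way (a command steadily eating switch memory until the box crashed) is the kind of trend that shows up early if someone graphs free memory. A minimal sketch, assuming (time, free-memory) samples are already being collected somehow (the collection side is out of scope, and the sample values are made up), that fits a least-squares line and extrapolates when memory runs out. A real leak need not be linear, so treat the estimate as a warning, not a prediction:

```python
from typing import List, Optional, Tuple


def time_to_exhaustion(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Fit free_memory = a + b*t by least squares; return t where it reaches 0.

    samples are (time, free_memory) pairs. Returns None if there is no
    downward trend to extrapolate.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (m - mean_m) for t, m in samples) / var_t
    if slope >= 0:
        return None  # flat or growing: no leak trend
    intercept = mean_m - slope * mean_t
    return -intercept / slope


hourly = [(0, 812.0), (1, 790.5), (2, 771.0), (3, 748.5)]  # hypothetical MB free per hour
eta = time_to_exhaustion(hourly)
print(f"about {eta:.1f} hours until free memory hits zero")
```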
Re: (Score:2)
Re: (Score:2)
After taking training for Cisco certification, I can think of many such commands. These switches are not the kind you find at Best Buy. These switches will communicate with other devices on the network about how to route traffic. Ethernet does its own forwarding, much like IP does routing, just at a different layer. For this to be efficient, every device needs to know something about where on the network the other devices are located.
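For a plain Ethernet switch, "knowing where the other devices are" means its MAC address table: it learns which port each source address arrives on, forwards frames only to the port where the destination was last seen, and floods when it doesn't know yet. A toy model of that learning behavior (no aging, VLANs, or spanning tree, and nothing Cisco-specific; port names and addresses are made up):

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy transparent bridge: learns source MACs per port, floods unknowns."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports
        self.mac_table: Dict[str, str] = {}  # MAC address -> port last seen on

    def receive(self, in_port: str, src: str, dst: str) -> List[str]:
        """Handle one frame and return the ports it is sent out of."""
        self.mac_table[src] = in_port             # learn or refresh the sender
        out_port = self.mac_table.get(dst)
        if dst == BROADCAST or out_port is None:
            return [p for p in self.ports if p != in_port]  # flood
        if out_port == in_port:
            return []                             # destination is behind the same port
        return [out_port]


sw = LearningSwitch(["p1", "p2", "p3"])
first = sw.receive("p1", "00:00:00:00:00:aa", "00:00:00:00:00:bb")  # bb unknown: flood
reply = sw.receive("p2", "00:00:00:00:00:bb", "00:00:00:00:00:aa")  # aa learned on p1
print(first, reply)
```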
An equipment failure, a poorly planned network, and improperly traine
Re: (Score:2)
Re: (Score:2)
Is that how you think you should respond to people trying to answer your questions? Is it that hard to comprehend that an Ethernet switch is a kind of computer? And that to make sure that the network is maintained that this computer needs to send commands to another similar computer? And do so "routinely"?
Therefore, if the switch fails to respond to those commands, and the network was not planned well, and competent people aren't there to fix the problem, then the network fails. Which is what brought
Re: (Score:2)
One obvious possibility is standard routing/switching control messages like ICMP and IGMP. The former is used for all kinds of routing and error-reporting purposes, and the latter for helping equipment keep track of which IPs need which multicast messages. ARP is perhaps technically another: that's the protocol network hardware/drivers use to map IP addresses to hardware MACs. But there are all kinds of other messages going around down under the application layer.
These are the kinds of things you'd expec
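To make the ARP part concrete: for Ethernet/IPv4, an ARP packet is a fixed 28-byte structure (RFC 826). A minimal sketch that builds and parses one with the standard `struct` and `socket` modules; it only illustrates the protocol, and the article does not say which command actually tripped up the Nexus switches:

```python
import socket
import struct
from typing import NamedTuple

# Ethernet/IPv4 ARP layout: htype, ptype, hlen, plen, oper,
# sender MAC, sender IP, target MAC, target IP (network byte order, 28 bytes).
ARP_FORMAT = "!HHBBH6s4s6s4s"


class Arp(NamedTuple):
    oper: int  # 1 = request, 2 = reply
    sender_mac: str
    sender_ip: str
    target_mac: str
    target_ip: str


def mac_str(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_arp(payload: bytes) -> Arp:
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        ARP_FORMAT, payload[:28])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        raise ValueError("not an Ethernet/IPv4 ARP packet")
    return Arp(oper, mac_str(sha), socket.inet_ntoa(spa),
               mac_str(tha), socket.inet_ntoa(tpa))


# A "who has 10.0.0.1? tell 10.0.0.2" request, built and parsed back.
request = struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, 1,
                      bytes.fromhex("001122334455"), socket.inet_aton("10.0.0.2"),
                      bytes(6), socket.inet_aton("10.0.0.1"))
parsed = parse_arp(request)
print(parsed)
```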