Businesses Software

Cisco's Network Bugs Are Front and Center in Bankruptcy Fight (bloomberg.com) 103

Reader Dharkfiber writes: Bloomberg is covering a story today about a hosting business that is now filing Chapter 11 due to bugs in a switch. Good, bad, or ugly, is it time to admit that business really can't continue without IT? When will IT training become formal curriculum in schools? An excerpt from the Bloomberg report: There's buggy code in virtually every electronic system. But few companies ever talk about the cost of dealing with bugs, for fear of being associated with error-prone products. The trial, along with Peak Web's bankruptcy filings, promises a rare look at just how much or how little control a company may have over its own operations, depending on the software that undergirds it. Think of the corporate computers around the world rendered useless by a faulty update from McAfee in 2010, or of investment company Knight Capital, which lost $458 million in 30 minutes in 2012 -- and had to be sold months later -- after new software made erratic, automated stock market trades. Peak Web, founded in 2001, had worked with companies including MySpace, JDate, EHarmony, and Uber. Under its $4 million-a-month contract with Machine Zone, which began on April 1, 2015, it had to keep Game of War running with fewer than 27 minutes of outages a year, court filings show. According to Machine Zone, the hosting service couldn't make it a month without an outage lasting almost an hour. Another in August of that year was traced to faulty cables and cooling fans, according to the publisher.
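To put the contract's numbers in perspective, here is a rough sketch of the arithmetic behind an uptime target. It assumes a 365-day year, and the function names are illustrative only; nothing here comes from the article or the court filings beyond the 27-minute figure and the roughly hour-long outage.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def availability_from_downtime(downtime_minutes_per_year: float) -> float:
    """Fraction of the year a service is up, given its allowed downtime."""
    return 1.0 - downtime_minutes_per_year / MINUTES_PER_YEAR


def downtime_budget(availability: float) -> float:
    """Allowed downtime in minutes per year for a given availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# The contract figure quoted above: fewer than 27 minutes of outages a year.
contract = availability_from_downtime(27)
print(f"27 min/year implies about {contract:.4%} availability")

# "Five nines" (99.999%) is a common benchmark for comparison.
print(f"five nines allows about {downtime_budget(0.99999):.2f} min/year")

# A single outage of roughly an hour already exceeds the whole yearly budget.
print("one 60-minute outage blows the budget:", 60 > 27)
```

The point of the sketch is that a 27-minute yearly budget sits between four and five nines, so a single long outage consumes more than a year's allowance.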
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Friday September 09, 2016 @12:39PM (#52855867)

    Never? That's all being outsourced, duh.

    • Plus, it's a foolish thing to do. It doesn't make any sense to forego a class like history or math just so high school students can learn how to configure a network switch. Save that as an option for trade school!

      Plus, there are plenty of students who don't want to go into that industry. They would rather be lawyers, bankers, artists, athletes, and those kind of careers really should stay out of IT.
      • "High school"? They insist that this stuff be taught in pre-school. On the other hand I see no problem if it replaces "women studies" in those ivy league colleges.

        point and click

      • High School is not to train kids. If it were then McDonalds, Walmart and AppleBees would be the top classes. That's not the proper way to educate the future of your nation.. stop being so stupid and argumentative just to be stupid and argumentative. One thing we don't need is more angry little trolls who want to hate on everything and be critical without taking the time to think or knowing how to use words well. We should have more coding and computer classes, which I think most people would call an IT cla
    • True, to a certain point, and also to a certain point, it's true that the more IT goes offshore, the worse this is going to get. Fortunately, it looks like some companies are waking up, and realizing that contrary to what the IT service provider is telling them, IT isn't "just following procedures, which anyone could do".

      • by umghhh ( 965931 )
        They need some time to learn it too. In essence any job is 'just following procedures'. Only 'following procedures' may be as simple as 'if A then B' or quite complex with quite some freedom to develop ad hoc 'procedures'. You need to have a good and motivated team and a good manager to do that. My ex boss always claimed that the cleaning lady could do my job too. I always agreed, pointing out that it could take the time to learn the 'process', languages used and the language spoken by customer (documentati
        • I think that's true in some cases, but I've met people who couldn't apply the scientific method to a problem to save their lives. (Form a theory, devise a test, perform test, collect results, revise theory.) There are apparently people who don't understand this at a fundamental level, and even learned as a procedure, are at a loss as to how to apply. System administration is a skill that not everyone has, and one that can't be taught to just anyone. You get outsourced IT on scratchy phone lines who when

          • I have encountered those types as well.
            They can *perfectly* follow a list of instructions, even those with branches, so long as those branches describe exactly what they see.
            Those people are invaluable in a HVM testing environment where it's:
            * Load trays in tester
            * Push run button
            * Unload passed and failed parts, put on appropriate shelves
            * If tester jams like picture A do worksheet FOO
            * If tester jams like picture B do worksheet BAR

            All is well. BUT if the tester jams and it's not like A || B they are

            • You are right, for processes that can be described in a small enough number of steps. You definitely don't want to take up the time of a big brain with lots of experience doing the same operation over and over. But I would submit that a modern Enterprise installation is just too big and too varied to expect any practical set of procedures to cover, say, 80% of issues. The fall back in my experience is typically to (a) apply patches, and (b) hope the problem goes away.

              I think this is why, when a company d

    • or at least a cabinet full of new plug-ready parts. that means the HDAs need to be pre-formatted, for instance. cables tested. configurations stored on a server for tftp loading behind your firewall.

      things that cost money. things that suits have no clue about.

      • "Suits" don't want to/need to know the details. An effective IT leader can communicate this in a way that good leaders (emphasis on "good") can understand. You shouldn't expect C-suite executives (outside of the CIO/CISO/CTO/Chief-IT-Leader-Guy) to understand or care about IT related concepts. Accountants don't expect a CEO to understand how a double-entry bookkeeping system was implemented or the details about its implementation, merely that it is a GOOD thing for the company to have because of reasons A,B
    • It will never hit the curriculum because schools could never retain IT-competent teachers. As soon as they were sufficiently highly skilled to teach any sort of IT class that was relevant, they'd be off to work in IT, rather than remain a teacher.

      This is the exact same reason why companies don't train their (IT) staff. What is the point in spending money to make it easier for them to leave you?

  • by HornWumpus ( 783565 ) on Friday September 09, 2016 @12:47PM (#52855945)

    All systems have bugs, not all data centers have this kind of crap uptime record.

    Smart IT people build data centers out of heterogeneous hardware and set it up to degrade gracefully when something fails. You won't get this if you just hire A+/Net+ staff.

    Blame the PHB/CTO not the hardware.

    • by Salgak1 ( 20136 ) <salgak@speakea s y .net> on Friday September 09, 2016 @01:01PM (#52856079) Homepage

      Agreed. I've worked at places that kept five nines of availability. By "normal" IT standards, it was massively overbuilt: multiple sets of gear, clustered in failover mode, with a separate redundant setup elsewhere in the data center, on an entirely different power feed. . . (as I recall, we had at LEAST 4 independent power feeds)..

      We also had cabinets full of spare parts, entire full pieces of gear on the shelf, and an entire library of config files on the TFTP server. Plus duplicated on a laptop that lived in one of the cabinets. Took a LOT of labor and gear, and was not cheap.

      And we constantly had to explain the man-hour and spares costs to the suits . .

      • I work with and build them all the time. Mind you, I really no longer think you can get any complex service into 5+ 9's without the application being part of the solution; reliability is not a bolt-on thing, it's baked into the design.

        The story is laughable: faulty fans, buggy firmware on the Nexus 3Ks. Those are TOR switches, which should be extremely easy to replace and are always used in redundant setups. They probably got suckered into VPC and similar; guess what, I don't care what they say, all stacks share a single

        • by skids ( 119237 )

          They probably got suckered into VPC and similar; guess what, I don't care what they say, all stacks share a single failure domain. Don't get me wrong, they are great, but you need at least A+B stacks.

          Yeah and reading release notes is an easy way to convince yourself that unless you need something like VPS or are cheesing license limits on a management platform, stacking should just be purged entirely from your configurations.

        • Until we stop lying to ourselves by calling them "bugs", as if they crawled into the code on their own, software will continue to be full of errors. "Oh, we'll just patch the bug" does not fix the underlying problem, which is coders and everyone else up the food chain having an "if we fuck up, we will say it's a bug and just patch it" and wait for the next bug.

          The Internet has contributed greatly to this mentality. It's a lot cheaper to have your users download a patch than to ship a CD, floppies, or tape

      • by ewhac ( 5844 ) on Friday September 09, 2016 @02:26PM (#52856985) Homepage Journal

        And we constantly had to explain the man-hour and spares costs to the suits . .

        Suit: "Explain the man-hour and spares costs to me."

        Engineer: "Certainly." (*brains him with a fried 24-port managed switch*) "Would you like it explained again?"

        • by suutar ( 1860506 )

          I was thinking "our contracts specify that if something goes wrong we have to fix it in under 30 minutes or we pay a lot of money. The only way to do that is to have the spare parts and people on site." but I like your method :)

      • Bad IT Directors like that screwed it up for everyone after them by bringing down upper management on IT. Now a lot of upper management doesn't trust or want to fund IT because they let IT Directors just run wild. It's a mutual mistake, but the IT Directors knew they were not being cost effective and that's their job. IT needs to think of itself as an amp or turbo booster for business profits. Good IT brings massively increased production and automation. Good IT raises revenue so much more than it costs that
      • This is really the answer.
        Buggy or not, if you provide an SLA (service level agreement), then you are ultimately responsible for it.

        You do what you have to do to provide that SLA.
        Test the equipment you plan to use.
        Add a lot of redundancy and failovers ...

        SLA's cost money.
        Heck one silly line in the article is
        "The entire network often has to go down in order to patch -- very disruptive in the best of times,"

        I really have to wonder what kind of network these guys are running. There should be failover nodes to tak

    • Yeah, this sounds to me like management overselling/overpromising and underfunding the back end IT staff and hardware.

      It just sounds like their network was not properly architected with redundancies, perhaps with the hope of getting the contract first and building up the infrastructure later.

      Then, when they can't deliver, they look for scapegoats... oh look, Cisco has some money... it's their fault!

    • by skids ( 119237 )

      Smart IT people build data centers out of heterogeneous hardware and set it up to degrade gracefully when something fails.

      That would always be a preferred model, if you have that kind of budget, but...

      Blame the PHB/CTO not the hardware.

      ...I'd say the equipment vendors should share some of that blame. I don't work anywhere near the 5 9's area and even I find some really appalling feature bugs introduced on even routine patchlevel upgrades. Stuff like the combination of DHCP-snooping/arp-inspection/source-lockdown on a port, which is the right way to configure access ports for anyone who gives a flip about security in depth, suddenly blocking all traffic after

  • by Salgak1 ( 20136 ) <salgak@speakea s y .net> on Friday September 09, 2016 @12:49PM (#52855967) Homepage

    "Disclaimer of Liabilities - Limitation" Page 16, states that (condensed) : all liability shall not exceed the price paid for the software, or of the price of the product which includes the software.

    And to use the equipment and Cisco software, you agree to the terms of service.

    http://www.cisco.com/c/en/us/t... [cisco.com]

    So, at best, they can recover the costs of the switches involved. . .

  • This is timely (Score:5, Interesting)

    by roc97007 ( 608802 ) on Friday September 09, 2016 @01:01PM (#52856075) Journal

    I'm a photographer, and I sell my work through a web service. They bring together the finishing providers (prints, calendars, t-shirts, etc) and take care of payment, and all I have to do is provide content and manage sales. When I finish post-processing on a new photo, the tool I use (Adobe Lightroom) automatically uploads to the web service in the album I select. I cover events, so there's often a massive number (600 or so) of photos to upload.

    Yesterday I was getting sporadic "service not available" messages from the service. After doing some triage to verify the problem was not at my end, I contacted customer support. Mind you, this was 10:30 PM PST. But that's the way it is with photographers -- we often take photos during the day and process them at night, which is somewhat the opposite of a standard use case. (And should be borne in mind when said services schedule maintenance. Just sayin'.)

    Browsing the service's forum, I saw others were seeing the same error message, and people were starting to get excited. (This is our livelihood, after all.)

    I got an answer to my service ticket in less than 30 minutes, that they were struggling with network problems with one of their service providers (probably a cloud service). I got a followup shortly after that they thought the service was up now but they were still testing. And I got another followup at 6:30 AM that the problem had been resolved and they had put steps in place to ensure it would not happen again. They also implemented a "status page" that we could consult in the future (which should have already existed, but live and learn).

    Now, *that's* the way to handle an incident like this. Very commendable. But it does point up the problems a business sometimes has when they rely too much on external services. Just my opinion, but the main difference I can see between in-house and outsourced is one of motivation. If you're providing an online service, your employees realize in their heart of hearts that outages can easily result in business failure and loss of jobs. But if you're renting all the pieces of your service from outside vendors, you soon find that those vendors may be concerned about their contract with you, and the money they make off you, which isn't at the same level in the hierarchy of needs as the live-or-die situation you are in.

    • Reply All podcast had a good episode on a cloud photo provider that had massive tech problems [gimletmedia.com] and people lost contact with their photos. Worth a listen.

      Cloud just means a computer you can't control.

      • Absolutely true. I pay for a professional account on the service I use, and one of the bennies is the opportunity to keep all my original images on their cloud so that I don't have to worry about storing them locally or backing them up.

        No. Not only no, but Hell No. Not on your friggin' life. Ok maybe as a backup, but putting the only copy of an original photograph on *someone else's* cloud? It is to laugh.

        My original images reside on a local hard disk, periodically backed up to *another* hard disk, whi

  • by Kwyj1b0 ( 2757125 ) on Friday September 09, 2016 @01:09PM (#52856143)

    Good, bad, or ugly, is it time to admit that business really can't continue without IT? When will IT training become formal curriculum in schools?

    Good, bad, or ugly, is it time to admit that business can't really continue without Patents/Accounting/Negotiations/Advertising/Sales/1000 other things?
    When will patent law/banking/economics/marketing of these become formal curriculum in schools? That's about the time when IT should become a part of the formal curriculum as well.

    High school shouldn't be about training for a job that only a fraction of the students will eventually do. If businesses can't survive without IT, then they hire people who are specially trained in IT - a HS course won't train people enough to solve any hard IT problems anyway.

    • by Anonymous Coward
      Now that ITT Tech has been closed down, we're expecting a massive shortage in IT talent.
    • Well, to be fair, almost all professionals will touch a computer of some kind at some point.

      So learning *generally* about IT is not a bad idea. Basic principles of how a computer works, how networks communicate, how the Internet functions. All good things to learn.

      • Explain to me how a shoe functions.
        • It prevents the PHB's foot from being covered in feces when he puts it up some IT dweeb's butt after unscheduled downtime.

          Last time that dweeb will ever try and tell a manager 'I told you so'.

        • I don't know, I kind of feel like I have understanding of shoes. I've even built them. Lousy shoes to be sure, but I think I get the basics.
          • I meant to indicate that most people don't understand the biological support mechanism of the foot, and how that interacts with shoes. The arch of the foot and the ankle work together to provide shock-absorption, protecting the knee and hips; a shoe provides arch support, and how and to what degree it supplies this support affects the health of these anatomical components. So does the shoe's profile (raised heel?). The sole is made of complex rubber compounds which affect traction (e.g. some shoes can w

            • I don't think shoes are about support....that is a secondary (and probably rather recent) problem. The primary purpose of shoes is to protect your feet from the environment.
    • by CharlieG ( 34950 )

      I agree it probably does not need to be part of schooling, but there are a lot of businesses that STILL treat IT like an unimportant part of the company, right up there with Janitorial Services etc, instead of what is really a mission critical part of the company

      • Do you guys not remember High School? GYM is a fucking class. You're sitting here, like argumentative clowns, telling people we don't need more computer classes in high school because that's not real education or that's not going to pay off? Yeah but Home Economics, Gym, Cheerleading, Marching Band, shop class are all totally practical. If you want to TRAIN people to fill a role instead of educating them to live up to their potential, you have a serious lack of understanding of how society really works and t
        • by CharlieG ( 34950 )

          I know very very few schools that even OFFER shop even as an elective (My son's school does - one section, and it is mostly an engineering design class - they spend a lot of time on FEA etc - more computer time than tool time). Offering Computers? Sure!! Mandating it? Nope.
          Seriously, I also know of no school that offers Home Ec, Cheerleading, Marching Band as CLASSES - After school activities? Yes. Gym ? Yeah NYC requires it, but you'll find more than 50% of the kids don't take the required amount th

      • by swalve ( 1980968 )
        How about getting some industrial engineers into IT to bring some kind of order to the fat, zit faced chaos? Until then, we are just worse smelling janitors and plumbers.
    • We should be teaching coding, especially scripting and automation in school. They are universally useful in all fields. Many fields WISH they had more field specific coders.. aka engineers who are also coders or scientists who are also coders. These people can help you build better apps faster than most anyone else. Since most of the important professions use computers we want to train kids to be good at computers and coding. Robotics, Automation and Coding is the future. Business and management is still
  • IT training? (Score:4, Insightful)

    by ilsaloving ( 1534307 ) on Friday September 09, 2016 @01:31PM (#52856391)

    People will figure out IT training is important, when they realize that they can't make stupid statements like "IT training" as if it means something.

    What even IS I.T.?

    Are you talking about server management? Network management? DevOps? What skill sets do you need?

    It's like saying we need more brain surgeons, so we need MOAR BIOLOGY TRAINING!

    MBAs, or people in general, will never appreciate just how complex some work can be, because of Dunning-Kruger. They don't know or understand how complex IT is, therefore they are unable to *appreciate* how complex IT is. Just like they are unable to appreciate anything else that is complicated, whether it's medicine, physics, etc.

    • by umghhh ( 965931 )
      This has not much to do with D-K: your job is simple and I do not understand why it takes so much time and why you failed to deliver on time etc because it is so easy etc. - because I have never done it. Also: you are stupid a-e because you do not appreciate how difficult and tricky my current task is. These two work most of the time. There is also a genius part, I had few of those in my team some years back - really good developers. They could not understand the whole idea of a bigger team. I took our git
    • IT is the burger-flipping job of the future. It's the people who rack switches, wire up networks, and generally do grunt work. High school students are IT worker stock.

      This is Slashdot, where we confuse Computer Science--the elites, the engineers, the mathematicians, the people who can't do their job because they don't know linear algebra--with IT, simply because most burger-flipping server jockeys have figured out how to use awk and perl, kind of, and think that makes them the same thing as a software

      • Wow. That's a bleak and insulting picture of both the future and of slashdot. And you're clearly suffering from Dunning-Kruger -- or more accurately, the rest of us find you insufferable due to Dunning-Kruger.

        I admit, awk and perl pushed everything about linear algebra and a whole boatload of other things I learned in school out of my brain. But that doesn't make me stupid -- I simply know practical things for my particular and current situation. I have no doubt I could pick up linear algebra quickly shou

        • It has to be wrong to be ignorant; and it has to be at least almost-true to be insulting.

          Someone on Slashdot started talking about IT degrees when I mentioned self-studying Computer Science. If you've ever looked at the two programs, you'd facepalm at warp speed: IT degrees include "general IT", network administration, and IT security; Computer Science is essentially a mathematics discipline based on exploring what can be computed (and how). They're not really equivalent in any sense.

          You talk a lot a

      • by swalve ( 1980968 )
        And we have done it to ourselves. We used to be the guys in the IBM suits that worked for NASA. We rebelled against that and now we are cable tv installers.
        • I didn't say it was a bad thing. Why do you think cell phones cost so little now? Just cell phone service for 2 hours of voice per week would cost $550/month right now if it had only followed inflation since 1984 (the first commercially-available cell phones cost $4,000--over $9,000 in today's dollars--and yes, that's what the rates were like).

          You have to remember: businesses don't pay wages. Businesses get revenue from product sales. In the end, what isn't taken by profit or taxes somewhere along t

    • So you think our future CXX types couldn't have used some basics in ITIL when they were younger? Certainly would have served some of them better than a logic class.
  • by bmk67 ( 971394 ) on Friday September 09, 2016 @01:49PM (#52856569)

    If they're contractually bound to deliver that sort of uptime, and their system isn't designed to tolerate these kind of failures, they deserve to fail.

    • To a degree. What if there is a serious bug or hardware flaw from a sourced component? Remember when HP bought motherboard components (faulty capacitors - from a supplier who had tried to steal the code from another company and had stolen fake docs) about 10 years ago? Their laptops and desktops had about a 40% failure rate in the first year as a result. Is that on the consumer's shoulders to have purchased a machine with bad motherboard capacitors that were sourced by HP? They should have met the specific

      • Bad Chinese electrolytic caps were _everywhere_ for a few years there. It's been more than 10 years, about 20. Tempus fugit.

        The world continued to turn. Not everything was replaced in the bad cap window, not every OEM pinched that penny and not all bad caps failed when new.

        • by swalve ( 1980968 )
          It was also shitty engineering, or at least that's my position. You notice that motherboards don't have banks of capacitors anymore, don't you?
          • Not sure I can recognize a surface mount cap at a glance. I bet they are still there. Sure the busses are better tuned, but at the end of the day, caps fix many things.

      • Whilst I agree in principle, in practice, we know that the 5* service offered by a budget provider will not be equal to the 5* service offered by a reputed provider.
        A decent datacenter wouldn't be taken down by shoddy cables & ventilation.

        There's no problem with choosing a low-cost datacenter...as long as you factor that into your infrastructure design and put the saved money into redundancy. Done right, spreading your risk over several low-cost options can provide a stronger service than putting all of

      • So the consumer is at fault for expecting lower failure rates?

        Depends on the failure rate. Considering high end server hardware has a very low failure rate and yet in critical applications with contractual uptimes they still get clustered or put in redundant pairs, yes I would say it still is the consumer's fault for not being able to handle the failure gracefully.
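The intuition behind clustering or redundant pairs can be sketched with a little arithmetic. This assumes component failures are independent, which real deployments often violate (shared power, shared firmware bugs, stacked switches in one failure domain), so treat the results as an optimistic bound. The function names and the 99% figure are illustrative assumptions, not numbers from the story.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of N redundant copies when any one surviving copy is enough.

    Assumes failures are independent, which is the optimistic case.
    """
    return 1.0 - (1.0 - component_availability) ** copies


def series_availability(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


single = 0.99  # hypothetical availability of one device
pair = parallel_availability(single, 2)
chain = series_availability(single, single, single)

print(f"single device:   {single:.4%}")
print(f"redundant pair:  {pair:.4%}")
print(f"chain of three:  {chain:.4%}")
```

Under these assumptions a redundant pair cuts the failure probability from 1% to 0.01%, while chaining three unprotected devices makes things worse than any single one, which is why contractual uptime targets tend to force redundancy at every layer.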

  • Sounds like the suits took a contract but did not want to pay for the back end infrastructure to really support it.

    I can't tell you the number of times I've seen this mentality -

    From Banks to Airlines to Healthcare to "Service" Providers....

    Usually it seems to be a combination of cheap C-level people and a layer of "yes" men between them and IT.

    Unfortunately the deciders in chief don't feel the pain when deals like this cause the company to implode....

    • by swalve ( 1980968 )
      It seems to me that the C-level people usually just say "do this thing for the least amount of money" and it's the boot lickers in the middle who really louse things up by not having the smarts and/or balls to actually get the thing done.
  • We really just need more coders and engineers, for now. AI will replace a lot of coders too. The continued move to the cloud will replace a lot of IT personnel. We always need good managers and customer relations/sales if you prefer more stable fields. Coding can be hard work for the money. I don't think people get that. Most IT is a lot of ass sitting waiting for something to happen. Coding is more like real work unless you own the product and can mostly do bug and feature requests. If you planned to work for
    • by swalve ( 1980968 )
      You can't say that and then also say things like "ALL code has mistakes!" Want more money? Be better at your job.
  • by T.E.D. ( 34228 ) on Friday September 09, 2016 @02:42PM (#52857169)
    The basic moral here is not to bet your life on systems you have no capability of fixing, if you can at all avoid it.

    The company’s Nexus 3000 switches began to fail after trying to improperly process a routine computer-to-computer command, and because Cisco keeps its code private, Peak Web couldn’t figure out why.

    ...

    Finally, late in October, came the 10 hours of darkness. Three people familiar with Peak Web’s operations say the lengthy outage gave the company time to deduce that the troublesome command was reducing the switches’ available memory and causing them to crash. The company alerted Cisco.

    So they ended up black-box debugging the vendor's own problem for them. I wish I could say I am unfamiliar with that...

    • by swalve ( 1980968 )
      What "routine" computer to computer command? It's a fucking switch.
      • After taking training for Cisco certification I can think of many such commands. These switches are not the kind you find at Best Buy. These switches will communicate with other devices on the network about how to route traffic. Ethernet does do routing much like how IP does routing, just at a different layer. For this routing to be efficient every device needs to know something about where on the network the other devices are located.

        An equipment failure, a poorly planned network, and improperly traine

        • by swalve ( 1980968 )
          Which has fuck-all to do with computer to computer commands, whatever those are.
          • Is that how you think you should respond to people trying to answer your questions? Is it that hard to comprehend that an Ethernet switch is a kind of computer? And that to make sure that the network is maintained that this computer needs to send commands to another similar computer? And do so "routinely"?

            Therefore, if the switch fails to respond to those commands, and the network was not planned well, and competent people aren't there to fix the problem then the network fails. Which is what brought

      • by T.E.D. ( 34228 )

        One obvious possibility is standard routing/switching handling messages like ICMP [wikipedia.org] and IGMP [wikipedia.org]. The former is used for all kinds of routing and error reporting purposes, and the latter for helping equipment keep track of which IPs need which multicast messages. ARP is perhaps technically another. That's the protocol network hardware/drivers use to map IP addresses to hardware MACs. But there are all kinds of other messages going around down under the application layer.

        These are the kinds of things you'd expec
