Failed Win XP Upgrade Wipes Out UK Government Agency
Lurker McLurker writes "The BBC and the Register report that the UK Government's Department for Work and Pensions attempted to upgrade seven PCs from Windows 2000 to Windows XP, and ended up with BSODs on over 60,000 machines. I wonder if the National Health Service is regretting awarding Microsoft a £500 million contract now." The Guardian also has a good story.
This is typical of our government. (Score:5, Insightful)
I can imagine it now
Intern: "Sir, Microsoft have brought out Windows XP Service Pack 2. It's had numerous bug reports of dying PCs and software not working anymore. THIS is the time to upgrade to Windows XP, then upgrade to SP2, because Windows Update won't stop bugging the hell out of us until we do!"
Boss: "You mean we could cock something up, and it might not even be our fault for a change?! Let's pay someone vast amounts of money to do it!"
The Guardian reports it was a week-long outage. Now, I may be completely wrong here, but surely all they had to do was restore those PCs back to their previous Windows 2000 state using the daily backups they do... I mean, it's only common sense to do backups on such a critical syst...oh, wait, never mind.
</cynical>
*sigh* (Score:3, Insightful)
Come on now (Score:5, Insightful)
Local Government (Score:2, Insightful)
Although we have several XP boxes (mainly used by my development team, along with Windows 2000 Pro ones), there is no way this IT department is going to roll out XP across the entire authority (approximately 400 machines) until at least mid quarter 2005; there are far, far too many problems to even contemplate it.
Heck, half the staff haven't even figured out the difference between a wallpaper and a screensaver yet, let alone what to do with more fancy gadgets.
Contractor (Score:3, Insightful)
The installation and update of operating systems is so easy nowadays that a blind one-armed monkey masturbating could do it.
I've worked with EDS people, and the one-armed monkey would be a godsend compared to most of them that I've had the "fortune" of working with...
Re:Too slow. (Score:5, Insightful)
Re:Come on now (Score:2, Insightful)
Re:EDS again (Score:5, Insightful)
Because Accenture is the other choice!
This sort of cockup would have been impossible with the ex-Arthur Andersen crowd. They would still be struggling to get the shrink wrap off the CDs without wrinkling their suits.
Seriously, the problem is government procurement procedures. The contract goes to the lowest bidder and a record of past f****ups is not taken into account.
Re:EDS again (Score:2, Insightful)
Re:Not a nail for Microsoft. (Score:3, Insightful)
You guys are amazing! (Score:3, Insightful)
I wish I could take one of you Linux "experts" up on your idea. "Here, upgrade these 2000 PCs, all of which are from different manufacturers and different configurations, to Linux. I need it done in the off hours and I need everything to work like it did before."
*crickets*
Of course someone will reply and say "ok!" knowing it won't happen. It's not because I don't have the ability to make that decision but it's because I know better than to get real information/insight about IT from most /. posters.
It's painfully obvious that a scant few here actually have a clue about running a business that relies on IT. It's more than ripping CDs and DVDs, kids. Sure, the company that made the mistake is at fault, but the problem is not in the chosen OS, it's in the chosen technicians and management.
Re:Not a nail for Microsoft. (Score:5, Insightful)
Doesn't anyone do risk analysis anymore?
What a big surprise (Score:3, Insightful)
"This patch caused the desktops to BSOD and made recovery rather tricky as they couldn't boot to pick any further patches or recalls. I gather that MS consultants have been flown in from the US to clear up the mess."
So, even more of the money I pay in tax is being diverted to M$ then...
Re:Too slow. (Score:5, Insightful)
Now, assume Microsoft bails EDS out, and there is no reason why not, because you can bet they'll send a bunch of temps to every DWP office at EDS' expense if they have to. In a nutshell, Microsoft gets a PR coup: "We've just bailed out a leading *cough* solution provider! Now imagine that had been, say, a Linux deployment... Who could EDS have called then?" Given the excellent grasp of PR, spin and FUD Microsoft has, I don't think this is going to help break the Microsoft stranglehold at all.
Re:TCO costs rise scarily with Windows XP failures (Score:4, Insightful)
I once knew a bean-counter (quite senior) on nearly 3 times my engineer's salary. He was sat there in front of a spreadsheet adding up a column of numbers on a pocket calculator.
Welcome to the UK Public Sector. That was your tax money.
Re:Uh-oh... (Score:5, Insightful)
I'm sure the government has perfectly good reasons for continuing to hand contracts to EDS. It's just probably not a reason they want to tell you, because it involves (bribery|nepotism|stupidity|all of the above).
Jedidiah.
Re:Another nail? (Score:2, Insightful)
Re:Not a nail for Microsoft. (Score:5, Insightful)
It's true that Microsoft's robustness is rather mirage-like, but there's a thing called human error, and that can bring down any system. All the software did was follow human instructions, after all: that's why we need IT people with brains to decide who is doing what.
However, PXE boot and a server with HDD images ready would've been helpful...
Re:Too slow. (Score:3, Insightful)
It's still Microsoft's fault, because they designed a system that accepts updates for the wrong system, and after that update is installed, it's damned near impossible to back it out. EDS is at fault here too, but let's face it, they couldn't have screwed the pooch nearly as well with a non-MS based system.
Hey! let's be fair here, ok? (Score:5, Insightful)
So, admin stupidity can also be blamed on MS, it's part of the TCO studies that make the decision to buy MS.
Aside from that, a point-and-click update should not be able to fail so miserably. A script made by the admin, of course, should be able to, because you can assume that someone smart (and bold) enough to make a little script should be responsible for their decisions. Some guy clicking checkboxes shouldn't be allowed by those means to break 60000 computers, through a
GUIs for dummies should have enough checks to prevent such undesirable effects; they have a sufficiently constrained domain to be able to do so. If the guy wanted to do a legal task that the tools don't allow, he could always write some Visual Basic Script, and then he would be on his own. Bringing down an organization by mis-clicking checkboxes is the responsibility of the guy that provided the checkboxes, too.
Re:Hope they all lose their jobs tomorrow (Score:5, Insightful)
They emigrated, most likely. One of the problems with incompetence is that it's self-reinforcing: the competent get more and more fed up with having to deal with incompetence all day and find something better to do with their time.
Re:You guys are amazing! (Score:5, Insightful)
You know that (re)installing Windows on a large number of systems of different types, for example when an upgrade fails, is a total fucking nightmare, yes?
At least Linux comes with 99% of drivers pre-installed. With Windows you have to find them on the net first, then find some way of getting them to the target system (because you don't have a NIC driver, remember?).
Re:Too slow. (Score:5, Insightful)
They could have called Novell or IBM.
Apart from that though - any setup can be screwed up by an admin; no currently available OS can protect you from that. So for a TCO estimate at least we would have to look at the total loss due to screw-ups like this, and weigh it against the number of installations. Using a single data point can't be valid. That said, my gut feeling is that Linux provides considerably better TCO.
Re:The reason for the upgrade (Score:5, Insightful)
Looks like they got a deal; they got the version that also blocks viruses, worms, and abuse of Solitaire!
Writing article about Free iPod [tinyurl.com]. Please help out.
They probably wanted to block assholes who disguise 'Free iPod' links in the sigs. 'TinyUrl' my ass. If you want an iPod, ask your parents to raise your allowance. Otherwise, I heartily encourage you to fuck off.
Re:What should be done first... (Score:3, Insightful)
Re:Another nail? (Score:2, Insightful)
Re:If this was in the private sector... (Score:3, Insightful)
All systems are prone to failure (Score:2, Insightful)
We've got to educate the people spending our money on large computer systems to spend part of that money on more testing!
Re:TCO costs rise scarily with Windows XP failures (Score:5, Insightful)
Yes. It's not like the upgrade could detect the version of the program it's being applied to, and only install if the version matches the version it is intended for. That is completely unheard of, and would be impossible technically.
This was sarcasm, FYI.
This situation is more analogous to a wrong signal causing the door to open and then jam. And yes, such a door manufacturer deserves to be blamed.
Re:Too slow. (Score:3, Insightful)
Somebody else in the thread mentioned this - if you overwrite your Linux kernel with a botched version, your system's hosed. If you didn't keep a backup, it's damned near impossible to back it out.
Nobody can protect an incompetent admin from him / herself.
Re:We need to educate the decision makers (Score:2, Insightful)
Re:We need to educate the decision makers (Score:2, Insightful)
This is one we should lobby our representatives on to ensure they don't do it at all [no2id.net]. The fact that they will piss away several billion quid of taxpayers' money is by-the-by when there is no reason other than sheer control-freakery to want this database in the first place.
Re:This is typical of our government. (Score:3, Insightful)
or have I missed something?
Perhaps I'm just missing something here.... (Score:5, Insightful)
Re:TCO costs rise scarily with Windows XP failures (Score:3, Insightful)
However, I don't know of any reports which consider Total Cost of Ownership Assuming Your IT Department Is A Bunch of Blathering Idiots. Most seem to assume a certain degree of competence.
Re:Hope they all lose their jobs tomorrow (Score:5, Insightful)
Re:The reason for the upgrade (Score:2, Insightful)
If you are seriously writing an article on free ipod crap, why don't you link to a page that explains a bit about your project? Otherwise you're no different from the 12 year old kids who are internet begging for ipods (I'm assuming that you're older than 12, could be a bad assumption).
Will you give away the ipods you get, or will you keep them? With Xmas coming up there are a lot of poor children out there who aren't going to get much else other than AOL and live Linux CDs...
And for the record... the article is not going to have favorable things to say about the free ipod experience.
Is your writing going to be biased from the start?
Or are you writing about how the free-ipod fad is causing a lot of REALLY ANNOYING internet begging? "I want an ipod, please give up your privacy so I can have an ipod". Please note that "I want a free ipod so I can write bad things about free ipods, please give up your privacy so I can have an ipod" is in no way any different.
Oh wait, I've seen a post where you accuse somebody of being home from high school, so you're most likely 12 years old.
Sheesh! Really... get a grip!
Re:Another nail? (Score:2, Insightful)
The usual patches from Windows Update do detect operating systems. So if a patch did land on the wrong OS, it looks like someone rolled their own patches (easy to do: you can extract the patches from the Windows Update MSIs, then bundle them into one file) and didn't do an OS check.
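For what it's worth, here is a rough sketch of the sort of OS check a home-rolled patch bundle would need, written in Python purely for illustration. The Patch class and the is_applicable and install functions are hypothetical names, not anything from Windows Update or from whatever tool EDS used; the only real library calls are platform.system() and platform.release() from the Python standard library, and the exact release strings they return on a given machine are an assumption you would want to verify.

```python
import platform
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical description of a patch and the OS it was built for."""
    name: str
    target_system: str   # e.g. "Windows"
    target_release: str  # e.g. "XP"


def is_applicable(patch, system, release):
    """Return True only when the machine matches the patch's target OS."""
    return patch.target_system == system and patch.target_release == release


def install(patch, system=None, release=None):
    """Refuse to install when the OS check fails; otherwise report success.

    The actual installation step is deliberately left out: this sketch only
    shows the guard that should run before anything is written to disk.
    """
    # Fall back to the local machine's values when none are supplied.
    system = platform.system() if system is None else system
    release = platform.release() if release is None else release
    if not is_applicable(patch, system, release):
        return (f"skipped {patch.name}: built for "
                f"{patch.target_system} {patch.target_release}, "
                f"found {system} {release}")
    return f"installed {patch.name}"


if __name__ == "__main__":
    xp_patch = Patch("xp-only-fix", "Windows", "XP")
    print(install(xp_patch, "Windows", "2000"))  # wrong OS: skipped
    print(install(xp_patch, "Windows", "XP"))    # right OS: installed
```

The point is only that the check is a couple of lines and runs before the payload touches the machine, so skipping it is a process failure rather than a technical limitation.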
Re:TCO costs rise scarily with Windows XP failures (Score:5, Insightful)
Granted, they should actually have an install script that checks the OS before it actually dumps the install package on there, but hey.
Not normally an MS apologist, but this isn't really Microsoft's problem. It's the contracted company that made the update package failing to assign it to the right download group.
So, the analogy. It's like some perfectly good system being installed, and someone presses the button marked 'open all doors' instead of simply 'open door 7'.
I don't see anyone really blaming the door manufacturer here (Microsoft or the contractors), although I'd hazard a guess that the person who skipped over the part of the process that said 'double check the groups you assign this patch to' will be sorely chastised...
Re:EDS again (Score:3, Insightful)
Please realize that I'm not defending them. I'm just pointing out that, as someone who works in IT, management never sees it when things go flawlessly, but they will not hesitate to throw your ass to the wolves should something go wrong.
Re:RTFA! (Score:4, Insightful)
Any decent Windows Admin should know (Score:5, Insightful)
This is first day stuff.
Windows? Or EDS? (Score:3, Insightful)
As much as I'm sure the zealots among us would like to make this seem like a Windows failure, it looks like it's more of an example of how outsourcing leads to disconnected, incompetent, and unmotivated IT staff. And that, of course, leads to mishaps like this.
Either way, if you work for a company that brings EDS in house in any way, drop your shit and run. And don't look back. The flash could be blinding.
Re:TCO costs rise scarily with Windows XP failures (Score:5, Insightful)
In my experience (having worked for both), in terms of inefficiency and stupidity, there's only one thing worse than the British Public sector and that's the British Private sector.
My company used to be part of a large public sector concern and was sold off. Since then we seem to spend nearly all of our time/money:
Changing company logo and name every 6-12 months
Adding a new problem management system which we have to learn every 6 months (we currently have about 5 each of which was supposed to replace all the others).
Paying huge bonuses to upper management.
Paying huge car allowances to middle management including those who refuse to drive.
Not giving any rises under the so-called performance related pay scheme for 4 years despite meeting profit targets because all the money has gone on the above 2 items.
Making skilled people redundant then recruiting at vast expense people with the same skills 2 months later.
Making skilled people redundant then reemploying them at twice the pay as contractors for the next 2 years because they're still needed.
Repeatedly shuffling kit from datacenter to datacenter around the country at vast expense and disruption to our customers.
Ordering expensive buffets for management meetings, 95%+ of which get thrown away.
Managers having a schedule involving meetings all over the country which means that they spend about 25 hours out of 40 driving.
Managers refusing to use video-conferencing for meetings even in the light of the above.
How many of these things happened when I was in the public sector? Virtually none. We didn't have the money to throw around on such things. We were forced to be efficient.
Also, if this private sector company I'm referring to were atypically inefficient, presumably it would do so badly it would collapse or be taken over. So this implies that many private sector companies are like this.
It's very easy to slag off the public sector if you use stereotypes, generalizations and distortions.
Fundamental Architectural Issue Here (Score:3, Insightful)
The fundamental error here is deep-seated and architectural - they have 80,000 user interface devices which are stateful. By putting the wrong device on the desktop they have set this situation up.
In the olden days when clerks in government agencies used green screens this problem wouldn't happen. If a green screen failed, it would be replaced as a FRU. Today's equivalent is something like a SunRay - the user interface device holds only enough configuration to bootstrap itself and, again, is a FRU.
The situation at the DWP is different: the user interface device is a stateful device which holds configuration itself, and requires this configuration to be consistent before it gets enough connectivity to be remotely managed. The toolkits discussed, which are used to push config around these UI devices, are probably most excellent, but there should be no need for this sort of malarkey.
So while I don't necessarily blame Microsoft for this incident, I do blame them for creating a monoculture where this sort of architecture is deployed. I expect the trials underway in government using SunRay devices as the user interface will be watched with more interest after this debacle.
A final question - how on earth do DWP recover 60,000 unbootable PCs?
Re:All systems are prone to failure (Score:5, Insightful)
If someone can manage this by selecting the "wrong checkbox" then the system is broken by design.
Microsoft sell a complex system with the claim that idiots can administer it. The DWP employ/contract idiots to administer a complex, but vital, system. Neither of these are "innocent parties".
No choice for specific industries (Score:1, Insightful)
It's about processes (Score:3, Insightful)
It's this willingness to say "Localised error. That's all. Nothing to see here" that gives IT its bad reputation. With properly designed processes and appropriate tools, localised error cannot have catastrophic consequences. In a system like this, I can see no excuse for pushing something out to 60K desktops in a nightly update without at least one, and probably both of:
a) Pushing it out to (say) 600 representative desktops a night or two before and monitoring
b) Having a cast-iron, regularly practiced and tested, process for pulling it back again.
Look at somewhere like the group that writes the Space Shuttle flight control software, which is rated at the top level of the SEI's Capability Maturity Model. It cannot go wrong and it doesn't. Why? Because they have processes! There are checks and testing and simulation and code walk-throughs and whatever, and if a problem NEARLY makes it through, and is caught in late testing or whatever, there are processes to look back and see how it got that far and make sure that the processes are improved so it doesn't happen again. The process writes the software and the people carry out the various roles prescribed by the process. There are processes for monitoring and improving the processes, etc.
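To make points a) and b) concrete, here is a minimal sketch in Python of a staged push: patch a small random sample first, check it, and pull the patch back if anything looks unhealthy. Everything here is hypothetical. staged_rollout, apply_patch, is_healthy and rollback are made-up names standing in for whatever the real deployment tool provides, and the sketch assumes a rollback that still works on a sick machine, which is exactly what a machine that won't boot does not give you.

```python
import random


def staged_rollout(machines, apply_patch, is_healthy, rollback,
                   canary_size=600, seed=None):
    """Patch a random sample first; only continue if every canary is healthy.

    machines    -- list of hashable machine identifiers
    apply_patch -- callable(machine): pushes the patch to one machine
    is_healthy  -- callable(machine) -> bool: post-patch health check
    rollback    -- callable(machine): undoes the patch on one machine
    Returns a (status, failed_canaries) tuple.
    """
    rng = random.Random(seed)
    # Step a): pick a sample of desktops to receive the patch early.
    canaries = rng.sample(machines, min(canary_size, len(machines)))
    canary_set = set(canaries)
    remainder = [m for m in machines if m not in canary_set]

    for machine in canaries:
        apply_patch(machine)

    failed = [m for m in canaries if not is_healthy(m)]
    if failed:
        # Step b): the practised, tested process for pulling it back again.
        for machine in canaries:
            rollback(machine)
        return "rolled_back", failed

    # Canaries are fine, so the rest of the estate gets the patch.
    for machine in remainder:
        apply_patch(machine)
    return "complete", []


if __name__ == "__main__":
    patched = set()
    fleet = list(range(60000))
    # Simulate a bad patch: every machine that receives it fails its check.
    status, failed = staged_rollout(fleet, patched.add, lambda m: False,
                                    patched.discard, canary_size=600, seed=1)
    print(status, len(failed), len(patched))
```

With a bad patch, the damage in this sketch stops at the 600 canaries instead of reaching the whole estate. A random sample is only a stand-in for "representative"; a real process would pick machines to cover each hardware and OS combination.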
Lower TCO = Lower Cost of Labor (Score:2, Insightful)
Re:TCO costs rise scarily with Windows XP failures (Score:3, Insightful)
Company policy stated that everyone should always turn off their PCs when they left for the day, and you'd get moaned at if you didn't. The Radia team told everyone they must keep their PCs on at all times, but this was never company policy.
Every morning it would take 20 mins or so for Radia to install all the night's patches and reboot the PCs a couple of times. At random times during the day it would also reboot your PC automatically for you if you didn't notice what was happening quickly enough to stop it.
Various PCs were being used as servers but not officially classed as such (due to the excessive hurt and pain involved in that process), and they also would reboot themselves randomly, causing outages on whatever they were doing.
Some PCs were still on Windows 95 and Radia would never manage to install anything on them; it would just keep crashing and rebooting indefinitely.
In the end I managed to delete enough of it that it stopped working and gave me some peace of mind.
I think the lesson here is not to just deploy cool new tools willy-nilly without assessing their place in your working practices.
Re:FAT CLIENT (Score:3, Insightful)
Cheers,
Adolfo
Re:The reason for the upgrade (Score:5, Insightful)
But a lot of them don't. I would say most state employees work their asses off doing pointless things, rather than screwing off. The problem is more with upper management than with the rank and file... though the problem does bleed over into the lower level employees because, after all, how long can you pour your energy into a task that you know is only necessary because incompetent managers fail to streamline the operation and give you more real, productive work, before you start to take the job much less seriously?
So those petty state officials who shirk work do so largely because they are beaten down, disillusioned, and tapped out from trying to do something about it in the face of a "front row" that doesn't like to listen to comments from its inferiors.
When I was working for the state, I considered myself very lucky to be involved in a project that was doing something meaningful, being productive and, while mistakes were made here and there, was relatively efficient overall. I could see how this was not the case in the departments working beside ours.
Eventually, though, the egos of the upper echelon managed to intrude even into our well-defended (by caring managers) little island of fortitude and competence, and I had to say screw it. Now, unlike most of the rest of my friends that got laid off and sucked the government unemployment insurance tit, I am fending for myself with the money I saved by not buying useless crap.
So when people try to say I was overpaid at 60% of my fair private-industry salary, I don't shirk from the criticism. Yeah, the benefits were better than the private sector and the environment more permissive, but at least I didn't go looking for a handout like others so they could keep up the credit card payments for their DVD collections and car loan for their gas guzzling S.U.V.
At least I, one of those loathsome, lazy, state workers, had the good conscience not to apply my talents to better the careers of a gaggle of idiots who aren't overseen adequately by the legislature that created their positions. If you want the state sector fixed, aim at the top. The clock punchers at the bottom are just a symptom of a management that preserves itself by not giving their underlings enough of a reason to revolt.
Re:The reason for the upgrade (Score:2, Insightful)
And you didn't see that coming because... ???
Re:We need to educate the decision makers (Score:3, Insightful)
Why? If British people can be encouraged to interfere with the American political process [csmonitor.com], then why can't Americans do the same to the Brits?
They shouldn't have upgraded (Score:3, Insightful)
Re:This is typical of our government. (Score:3, Insightful)
Guess what? EDS chose to do it themselves using a third party product rather than use the much more mature and safe existing update tools.
Now whose fault is that?
Re:EDS again (Score:3, Insightful)
The idea is that you pay one company to come up with a box of requirements, then send it out to tender and get several boxes back from a few large companies like EDS. Then these get sent off to the company contracted to deal with the subcontracting/tendering process. A haggling process commences between bunches of lawyers on both sides, usually resulting in only one or two possibilities; the cheapest one is then selected and fucks it up. Nowadays most reputable companies don't even tender a bid, cos of the cost and the fact they know it will be wasted money, cos some company renowned for its failures, like EDS, will just undercut them.
My particular favourite was when penalty clauses against downtime for an NHS system were introduced because the system was so critical. But the company involved, rather than implementing a backup system, decided it would be more cost-effective to insure against a system failure.
Perhaps one day the out-sourcing-sub-contracting-legal-wrangling craziness will stop.
Re:It's about processes (Score:3, Insightful)
Exactly, and IT earned every bit of it. No one wants to pay for processes, no one wants to expend the extra effort for processes, and no one does. People in IT are more comfortable taking the intellectually lazy route and, because it works 80% of the time, they become quite comfortable doing it. For that other 20% or whatever, they figure out how to rationalize it as a "software glitch", even when it is their own fault, but the people they are explaining it to are so ignorant they will accept any explanation as the absolute truth. Management in IT must be the most ill-prepared and gullible bunch in any industry anywhere. The fact that accountants can't even match up trends in hardware and software costs with associated labor costs doesn't help (who here is still working on a 300MHz Pentium II with a buzzing hard drive and a 60Hz monitor, when the new hire in the next cube gets a 5GHz gold-plated dream machine and then wonders why you can't run their favorite dev tool of the week?).