Software Upgrade Crashes UK Air Traffic Control System
pitpe writes "Earlier today the computer system controlling most of the UK's airspace failed, after tests in preparation for an upgrade failed. The original failure occurred at the West Drayton centre, which is an old (70's) system, as opposed to the new system at Swanage, which has had its own problems. A system wide reboot to fix the system resulted in the entire system being taken down temporarily."
alternative system (Score:1, Funny)
Damn... (Score:2, Funny)
Re:Damn... (Score:3, Funny)
Re:Damn... (Score:4, Funny)
Hmmm .... (Score:2)
Run away
=)
Re:Damn... (Score:3, Funny)
It looks like you're trying to direct a pilot to land a plane. Would you like to:
- Have the pilot land at Cirque du Soleil and tell him it's Denver International Airport [ntu.edu.tw]?
- Redirect the plane to Chicago, but send the luggage on to Orlando?
- Adjust the ground level to send the plane and all aboard to fiery doom like that scene in Die Hard 2?
Re:Damn... (Score:2, Funny)
"Microsoft Air-traffic Control Software?" Shudder...
I can see it now...
Pilot: Air Traffic control (ATC) we've lost our nav con please advise!
ATC: Please remain calm and click on start then control panel. Once in the control panel double-click(TM) on game controller - we need to reconfigure your yoke before we enter admin mode on the nav con system.
Pilot: I don't see a "start" button.
ATC: Ok sir, I need you to find the windows key on your system's keyboard.
Pilot: What's a windows key?
Re:Damn... (Score:2)
Would that be Macs you're talking about then?
Re:Damn... (Score:4, Funny)
"I was directing air traffic on my ATC computer, giving a flight path to a KLM flight into Heathrow, when suddenly it was like *beep beep beep beep* and half of my planes were gone.
"I was like, 'Uunh?'
"It DEVOURED my flight path.
"It was a really good flight path.
"And then I had to send it again, and it wasn't as good 'cause I had to do it fast. And the KLM crashed short of Heathrow.
"It was... a bummer.
"My name is Ellen Feiss, and I'm an air-traffic controller."
Ph33r.
Re:Damn... (Score:2)
Actually, I believe they were using Microsoft Air, aka Longhorn.
Re:Damn... (Score:2)
Re:Damn... (Score:2)
Three fingers (Score:3, Funny)
Re:Three fingers (Score:4, Funny)
this is a UNIX system, I know this!!!
Re:Three fingers (Score:4, Informative)
Re:Three fingers (Score:2)
Software doesn't rust... (Score:5, Insightful)
Re:Software doesn't rust... (Score:2, Informative)
Re:Software doesn't rust... (Score:5, Insightful)
What is implied is that it's being pushed to its limits, e.g. it was designed for 100 flights a day, when today there are 1200 flights a day.
Those small things which you could get away with before start to become factors in usability and stability.
Re:Software doesn't rust... (Score:4, Insightful)
(Which I find strange, 'cause testing in a system as critical as this should be done in a separate environment.)
I assume you've had no previous experience in maintaining a 'vintage' system like that? The code is probably written by a lot of different programmers, each with his own style, poorly documented, and thus very hard to read and understand.
Software doesn't rust, but it clutters up and gets dirty over the years. It won't come apart by itself, but by the hands of a developer writing a necessary upgrade.
maybe they don't have a 30 year old spare (Score:2)
Re:maybe they don't have a 30 year old spare (Score:2)
If nothing else, they just proved Finagle's Law... If something can go wrong, it will...
Re:maybe they don't have a 30 year old spare (Score:2)
Deja vu - this just happened in Ireland (Score:2)
Re:Software doesn't rust... (Score:3, Insightful)
Companies who develop software wanted the release yesterday. Development tools focus on allowing fast development of applications. M$ is taking a lot of criticism on itself by delaying Longhorn.
You can no longer work on a product until 'it's done'. It has to ship, whether it's stable or not. If not, you issue a patch a week later. This is especially visible in games
Re:Software doesn't rust... (Score:4, Informative)
Sadly, it really is running on ~30 year old hardware, at least in part. I've spoken to some of the service engineers.
Re:Software doesn't rust... (Score:2)
Well, let's see. Would you like to use a word processor from the 70's, "ed" perhaps? How about a nice video game, let's see we have Pong and Asteroids. Or you could go out on the Internet, I hear there are almost 100 sites hooked up now.
Software *has* improved a lot since the 70s. Yes, I'm aware of the so-called "software crisis." My only question is where do people get the unrealistic expectations that make realit
Re:Software doesn't rust... (Score:4, Informative)
The 'fridge size boxes are 70's vintage (I suspect bits have been replaced over the years). The CPUs are only about five years old. The system consists of two identical computers for hot failover and they had to get two custom CPUs made by the original manufacturer (IBM, I think) to deal with Y2K.
As for the software? Written in some weird language called Jovial, and continually repatched - never rewritten.
BTW, where the heck is Swanage? The new NATS center is in Swanwick!
Re:Software doesn't rust... (Score:4, Informative)
Muahaha. Languages from the stone-age. Jovial is an ancient semi-descendant of Algol, originally written especially for avionics systems. I'm not nearly old enough to have worked with it myself - Jovial's heyday was the mid-'70's or so - but I used to work with a couple of DoD greybeards who had done so, although even they hadn't touched the thing in years, as it's mostly been supplanted by Ada these days. The USAF can tell you a bit more about Jovial [af.mil] if you're having a slow day today ;)
Re:Software doesn't rust... (Score:2)
It seems that way, doesn't it? But it's really not so - keep in mind that military procurement is an extremely lengthy and drawn-out process. We think of the F-22 Raptor - one of the systems listed on the Jovial page - as being "new" because it's only now entering active-duty service this year, here in 2004. But the first concept definition studies for the Advanced Tactical Fighte
Re:Software doesn't rust... (Score:2, Interesting)
Yeah but the Y2K problem was "discovered" way back in the 70s. Banks doing 25 year mortgages in 1975 would extrapolate into 2000 and "whoops!" Any place which had Y2K problems gets no sympathy from me.
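To make the mortgage arithmetic concrete, here is a minimal Python sketch. It assumes a system that stores years as two digits; the function names are hypothetical and not taken from any real banking software.

```python
def maturity_year_two_digit(start_yy: int, term_years: int) -> int:
    """Add a loan term to a two-digit year, keeping only two digits.

    Assumption: the system stores only the last two digits of the year.
    """
    return (start_yy + term_years) % 100


def maturity_year_four_digit(start_year: int, term_years: int) -> int:
    """The same calculation with the full four-digit year."""
    return start_year + term_years


start_yy, term = 75, 25  # a 25 year mortgage written in 1975
end_yy = maturity_year_two_digit(start_yy, term)

print(f"two-digit maturity year: {end_yy:02d}")
print("loan appears to end before it starts:", end_yy < start_yy)
print("four-digit maturity year:", maturity_year_four_digit(1975, term))
```

With two digits the maturity year wraps to 00, which sorts before 75, so any comparison or date subtraction on that field goes wrong as soon as the loan is entered, decades before 2000 actually arrives.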
bits rot one another (Score:2)
Lucky in the US... (Score:5, Informative)
Hopefully the UK will get the new system tested and online before it causes more problems!
Re:Lucky in the US... (Score:2, Informative)
Re:Lucky in the US... (Score:2)
The only minor problem is that the new system is, if anything, more likely to cause problems than the old one. Especially if it follows the same pattern as the majority of the other big systems our useless government has thrown money at.
Re:Lucky in the US... (Score:5, Informative)
The other one at Swanage handles the ATC for everywhere else. This was replaced with a new system in 2002.
But, by 2006 hopefully all ATC in the UK will be running on new systems.
Re:Lucky in the US... (Score:3, Informative)
Swanwick not Swanage (Score:2)
Swanage is in Wales.
Re:Lucky in the US... (Score:2)
That's a statement based on a totally false premise. Simply because it was old or used tubes did not mean it was a bad design, in fact, the simplicity of the design and the shortness of the source code made it a very easy system to debug and program. It wasn't as pretty, but it worked fine.
The simple fact is that there's more b
Re:Lucky in the US... (Score:3, Informative)
More problems... (Score:5, Informative)
It seems they have been having problems with their computer systems since 2001 when it was "privatized".
"The air traffic service has been beset by problems since it was partially privatized in 2001. A $484 million center at Swanwick in southern England opened five years late in 2002.
The opening was delayed by problems with computer software, and the glitches continued for months afterward, as controllers misread aircraft altitudes and destinations because of hard-to-decipher computer screens. In at least one case, controllers mistook the Scottish city of Glasgow for Cardiff in Wales.
Now, that seems like a pretty big mistake to me, especially for an air traffic controller to make. However, the article later states that:
"Transport Secretary Alistair Darling said Thursday's problem did not lie at Swanwick but at the older West Drayton center, which is due to be closed by 2007."
Thank goodness that old one is closing, however it doesn't sound like its replacement is doing any better!
"If you want to know what is wrong with transport in this country it is that over decades successive governments did not spend enough on the infrastructure and air traffic control is no different," Darling told BBC radio."
Excellent quote! While terrorism is on everyone's mind, we sometimes forget that safety of transportation should also be just as high. I couldn't imagine pilots relying on themselves to fly airplanes amid the thousands of others without the aid of traffic controllers and their computers.
Re:More problems... (Score:5, Interesting)
A dutch friend of mine once remarked that she didn't understand the mentality of the British. "You" she said, "have an amazing tendency to run things into the ground and then get around to fixing them rather than spending money on continually maintaining them so they never fall apart."
It's a very good point.
Re:More problems... (Score:5, Insightful)
<rant>
I blame Margaret fucking Thatcher, who let the hospitals fall apart and flogged off the viable bits of the infrastructure to her friends (at well below market value). [We're still feeling the effects of this on the railways, which the private sector has run into the ground] Corrupt old bitch.
Re:More problems... (Score:2)
Corrupt old bitch.
Gosh, how enlightened and woman-respecting you leftists are!
Re:More problems... (Score:2)
Re:More problems... (Score:3, Interesting)
I'd say the UK has been letting the infrastructure maintenance slide since at least WW2, maybe earlier. We inherited a fantastic installed base from the Victorians - the fact that it took 50 years of neglect to rot away is a tribute to how well they built - but the sad fact is this stuff was put together by a world-spanning Empire at the top of its game. What with paying for a couple of world
Re:More problems... (Score:2, Informative)
Re:More problems... (Score:3, Insightful)
Funny, I noticed this about the U.S. system. But I figured it out. It has to do with the fact that civil maintenance is done by civil-service people with a union and a contract, while new equipment and construction contracts
Re:More problems... (Score:3, Interesting)
Or the management mentality of 'Oh, security is too expensive right now we'll ship it and fix it later'.
Politicians only look to the next election and managers only look to the next quarter. It is a typical attempt by non technical types to ignore entropy, expressed quite nicely in the old saying 'rust never sleeps.' If you want a bridge to last, paint it today, not after it has
Re:More problems... (Score:2)
To be fair, it's really hard to estimate the economic costs and probability of a security breach.
The insurance industry is in this business, and they have to maintain and analyze *masses* of data to pull this off.
Re:More problems... (Score:2)
Then I'm afraid you lack imagination: the idea that aircraft need to be controlled by people on the ground is a large part of the problem... not only is there no real need for such a system now that technologies like GPS can allow aircraft to communicate and ensure they're not going to collide with any other aircraft nearby, but 'air traffic control' inc
Re:More problems... (Score:2)
What a nice little non sequitur. The system is having problems because it was privatised, but the Swanwick Center has nothing to do with that, since it was five years late one year after the privatization.
Sounds like the non-privatised system was having problems all on its own....
What WAS the System that crashed? (Score:5, Insightful)
It would help to reduce the coming surge of Microsoft jokes, which is very likely not relevant here.
Re:What WAS the System that crashed? (Score:5, Informative)
The hardware is an IBM 9020 family mainframe; the application is written in Jovial (one of, if not THE, first algebraic languages) and BAL assembler (for the monitor mostly). The monitor is the operating system, so it is effectively a custom-written operating system for this application.
Although MVS is also used for testing. The I/O capabilities of the mainframe are superb which means it can handle 2000+ flights with only 14 Megs of RAM (if I remember rightly).
I believe the NAS application came as a freebie from IBM when the UK purchased the hardware and was the same NAS (National Airspace System) application used all over the US. It has been continuously developed since then (no mean feat when you consider that all variables are global in Jovial, it uses Holleriths instead of ASCII, and you are limited to 5 or 6 characters per variable name). The hardware has also been upgraded several times over its lifetime.
It doesn't often go down, last time was 2002 sometime, and you can tell how important it is because everyone screams when it does go down. The people I worked with are extremely dedicated to their job, but one cannot test a system like this for absolutely every eventuality. No doubt some patch was applied and some special case came up that caused a FLOP (functional loss of operation). It happens, Radar is usually unaffected, so the safety implications are not large, but flow is affected.
The UK approach to handling NAS is much different to the US: the US tends not to touch the NAS software and instead develops external systems that enhance the usage of airspace, whereas the UK tends to delve into NAS and improve things directly in NAS. Jovial is a very interesting language; it has been used heavily by the US military and exists in such applications as cruise missiles and many other aircraft and missile systems. Read about Jovial here if you are interested.
I can't say too much about it for various NDA reasons (OSA), but I think most of the above is in the public domain.
HTH.
Re:What WAS the System that crashed? (Score:3, Interesting)
Re:What WAS the System that crashed? (Score:2)
Re:What WAS the System that crashed? (Score:2, Informative)
I think the system which crashed was only responsible for admitting new flight plans to the whole complex. Any flightplan already filed could carry on; it is just that no-one could file a new plan for the next flight.
Links for reference (Score:4, Informative)
They have a press release http://www.nats.co.uk/news/news_stories/2004_06_0
Lazy? Click links here... (Score:2)
http://www.nats.co.uk/services/index.html [nats.co.uk]
and
http://www.nats.co.uk/news/news_stories/2004_06_03.html [nats.co.uk]
So what? (Score:5, Insightful)
At least there should be. Computers crash, break, have bugs, etc. They're a tool - a more efficient and convenient tool to be sure.
But when they break, there are contingencies so that planes can still take off and land, and won't just fall out of the sky.
This is also why Y2K was such a bunch of stupidity. We really aren't as reliant on computers as people think. We know they crash and are prepared to handle it when they do.
Re:So what? (Score:2)
The redundant systems can't replace the speed and accuracy of a computer.
Computers are a tool. But how do you access the radar system and translate its information without a computer?
Re:So what? (Score:4, Informative)
In scenarios like this, where load has increased whilst the computer systems were in place, we *are* reliant on them.
Think of banks - time was when you had to almost plead on your knees to get a bank account, and they charged you for running it. This was because every account was written down manually in a book, and any calculations were performed by hordes of clerks. Then - computers. Now your new account is just one more record in a table somewhere, so the banks give out accounts to anyone who wants one, and do it for free. If for some reason your bank's computer system goes AWOL, there is no way they can process a month's interest calculations on the millions of balances and transactions - not to mention actually applying the transactions that would now come in on bits of paper.
I do agree that in a lot of cases, there remains a perfectly useable manual method, but where the computer system has enabled geometric increases in capacity over the manual system (which has been taken up) then, if you'll excuse the pun, it won't fly.
You're right about the Y2K thing - I worked on a contract for a railway maintenance company in 1999 and the Y2K coordinator guy was tearing his hair out at the thousands of questions he got monthly such as "so, these nails, are they Y2K compliant?" He actually had solid steel track components called "chairs" that the rails sit on that had Y2K compliance stickers on them from the manufacturer. Presumably, they got fed up explaining it too, and decided it was easier to just stick the stickers on everything they made...
Re:So what? (Score:2)
Same in Ireland! (Score:5, Informative)
week in Dublin [ireland.com]
And the Wizard said: (Score:5, Funny)
[x] Allow Windows to detect new hardware ?
[ ] Allow planes to circle in uncertainty ?
[x] Show this window at all airports
This happened here in Houston about a month ago: (Score:2, Informative)
Bug (Score:2)
It could have been a lot worse... (Score:2, Interesting)
as for the system crashing in the first place, it's unfortunate, but a good thing that they were able to cope and keep everyone safe - that's the main thing, right? (it's certainly my main concern)
and as for the software not being up to the job, it may well not be. after all, air traffic has increased ever so slightly since the 1970's - is it reasonable to expect a program presumably designed for 70's hardware, and 70's air traffic loads to cope with heathrow in 2004?
Swanwick not Swanage! (Score:5, Informative)
Swanwick, not Swanage! (Score:4, Informative)
Swanage is a pleasant little seaside resort. I know it well and stayed there a few nights when on my honeymoon.
Finding Swanwick and Swanage on a map of southern England is left as an exercise. Hint: Mapquest [mapquest.co.uk] may be a good place to start.
Paul
Reboot took the system down? (Score:2)
err, if you keep your fingers crossed, that is!
Re: (Score:2)
Re:What's the problem? (Score:2)
Hang on a second... (Score:3, Interesting)
"The FDP was being tested overnight for a future upgrade. The system was successfully returned to service but at 06.03 errors were detected in the distribution of flight data between Centres. As a precaution, we decided to restart the FDP (known as a cold restart) causing an interruption to full service. The data processing system was restored at 06.42 and declared fully operational at 07.03. Flight capacity restrictions were lifted at 08.05. The system is now fully operational and we are confident that it is stable.
Through the response team at West Drayton, we have been working with airports and airlines to clear the delayed departures, and expect the backlog to be cleared quickly.
Our investigation into the cause of the problem is continuing."
Let me get this straight: they ran a test on the FDP. The FDP glitched. They rebooted the FDP. They are still investigating the problem.
Now, unless I am mistaken, I can only infer from their statement above that they are now running the FDP which is still susceptible to the problems highlighted by the test.
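For reference, the timings in the quoted statement can be worked out with a quick Python sketch. It assumes the 06.03 error detection is the start of the incident, since the statement does not say when the cold restart actually began; the helper name and event labels are made up for illustration.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day times written as HH.MM."""
    fmt = "%H.%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Times as given in the NATS statement quoted above.
errors_detected = "06.03"
milestones = {
    "data processing restored": "06.42",
    "declared fully operational": "07.03",
    "capacity restrictions lifted": "08.05",
}

for label, time in milestones.items():
    elapsed = minutes_between(errors_detected, time)
    print(f"{label}: {elapsed} minutes after errors were detected")
```

On those figures the data processing system was back 39 minutes after the errors were spotted, was declared fully operational after an hour, and capacity restrictions stayed in place for just over two hours.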
Re:Hang on a second... (Score:4, Insightful)
That's not the way I understand it. From their report, I understand the events went something like this:
The Hardware ... (Score:2)
article [computerweekly.com]
Re:The Software ... (Score:2)
It was/is definitely a defense industry language. The dollar is not a statement delimiter, it's information for the billing system
Golden rules.. (Score:5, Insightful)
Rus
Re:Golden rules.. (Score:2)
Re:Golden rules.. (Score:2)
Re:Golden rules.. (Score:3, Interesting)
Platform (Score:2)
Windows Update (Score:3, Funny)
ATC software is scary (aka, Know Your Userbase) (Score:4, Insightful)
The first version of the software was built using standard current interface guidelines and widgets and the testing group that had no experience with older ATC systems were wowed at how simple and yet powerful it was. Pretty much any random person off the street could look at the screen and easily figure out what was going on and how to do various basic tasks. When that version was demoed to the ATC union the union freaked out at how different it was and thus began a cycle of making it more and more backwards.
So, nowadays the next gen ATC software almost exactly replicates the UI of the old non-computerized and semi-computerized systems. On-screen toggle switches and dials, that sort of thing. The FAA and the ATC union have decided that retraining all of their ATCs to use modern computer interfaces would be a Bad Thing. When the computer screen doesn't exactly replicate the interface of the 50+-year-old systems, they freak out and scream bloody murder. On the flip side, kids coming into the field today that have been using computers most of their lives are finding the interface to be counterintuitive to the point of being almost unusable. Middle-aged workers who are both highly proficient ATCs and home computer users report that switching between the two types of interfaces each night when they go home requires conscious effort on their part, since they are so orthogonal.
So who wins? Historical inertia, of course. Why fix the problem today when you can wait for your successors to fix it in 25 years?
Re:ATC software is scary (aka, Know Your Userbase) (Score:2)
The average ATC is a retired enlisted man or woman. They're trained to react, not to think, and any changes to the user interface make them very nervous. On the other hand, they're very good at what they do.
Re:ATC software is scary (aka, Know Your Userbase) (Score:2)
Not Swanage - but Swanwick (Score:2)
Re:This wouldnt've happened...... (Score:2, Funny)
Re:This wouldn't have happened... (Score:4, Insightful)
This might have happened even if they were running linux. If the software that is used for the air traffic controlling was written badly it still could have crashed.
Re:This wouldn't have happened... (Score:4, Interesting)
Without this structure, Linux would probably fail at an unacceptable rate too.
Re:This wouldn't have happened... (Score:4, Funny)
No, the air traffic controllers would still be figuring out how to cut/copy/paste while a 747 is on its final approach.
Re:A new meaning: (Score:2, Troll)
Blue sky...? In England...?
Are you mad...?
Re:A new meaning: (Score:2)
Re:A new meaning: (Score:2)
Re:A new meaning: (Score:2)
Moderators,
Parent isn't a troll. UK weather is notorious for being cloudy.
Enjoy,
Re:new linux distro idea (Score:4, Funny)
Check out gflightcontrol-0.01, then run the usual:
Of course, it requires gnome 2.6 and all deps. Planes will have to circle while everything emerges.
Re:new linux distro idea (Score:2, Funny)
Re:I always wondered... (Score:3, Interesting)
It does, however, carry the potential to introduce errors in various systems.
Would you want the altimeter to read 200 feet too high, or have an uncommanded left turn, because some numbnuts is yakking on the cellphone?
"DC-9 flight crew experienced an involuntary turn [cio.com] by the autopilot during cruise. Autopilot reacted normally after the captain asked passengers to turn off any personal electronic devices. Crew later learned that
Re:I always wondered... (Score:2)
When you're standing on the ground, your cell phone (usually) connects to one or two transmitters that are fairly nearby. You obviously don't need exact line-of-sight, but general-direction-of-sight is necessary for reception. This is how the system was designed to operate.
When you're in a plane at FL330, you're line-of-sight to
Re:Dangers of open source? (Score:2)
In actual fact it is connected to the internet ( albeit through SSL encryption ). This is as a result of a drive to cut sick days amongst ATC staff by allowing them to work from home, or from coffee shops or pubs, using specially adapted web browsers and their mobile phones or WAP access points.