Software Upgrade Crashes UK Air Traffic Control System
pitpe writes "Earlier today the computer system controlling most of the UK's airspace failed, after tests in preparation for an upgrade failed. The original failure occurred at the West Drayton centre, which is an old (70's) system, as opposed to the new system at Swanage, which has had its own problems. A system wide reboot to fix the system resulted in the entire system being taken down temporarily."
Lucky in the US... (Score:5, Informative)
Hopefully the UK will get the new system tested and online before it causes more problems!
More problems... (Score:5, Informative)
It seems they have been having problems with their computer systems since 2001 when it was "privatized".
"The air traffic service has been beset by problems since it was partially privatized in 2001. A $484 million center at Swanwick in southern England opened five years late in 2002.
The opening was delayed by problems with computer software, and the glitches continued for months afterward, as controllers misread aircraft altitudes and destinations because of hard-to-decipher computer screens. In at least one case, controllers mistook the Scottish city of Glasgow for Cardiff in Wales."
Now, that seems like a pretty big mistake to me, especially for an air traffic controller to make. However, the article later states that:
"Transport Secretary Alistair Darling said Thursday's problem did not lie at Swanwick but at the older West Drayton center, which is due to be closed by 2007."
Thank goodness that old one is closing; however, it doesn't sound like its replacement is doing any better!
"If you want to know what is wrong with transport in this country it is that over decades successive governments did not spend enough on the infrastructure and air traffic control is no different," Darling told BBC radio.
Excellent quote! While terrorism is on everyone's mind, we sometimes forget that transportation safety should be just as high a priority. I couldn't imagine pilots relying on themselves to fly airplanes amid the thousands of others without the aid of traffic controllers and their computers.
Re:Software doesn't rust... (Score:2, Informative)
Links for reference (Score:4, Informative)
They have a press release http://www.nats.co.uk/news/news_stories/2004_06_0
Re:Lucky in the US... (Score:2, Informative)
Same in Ireland! (Score:5, Informative)
week in Dublin [ireland.com]
This happened here in Houston about a month ago: (Score:2, Informative)
A string of failures (Score:0, Informative)
Hurray another British triumph!
Swanwick not Swanage! (Score:5, Informative)
Swanwick, not Swanage! (Score:4, Informative)
Swanage is a pleasant little seaside resort. I know it well and stayed there a few nights when on my honeymoon.
Finding Swanwick and Swanage on a map of southern England is left as an exercise. Hint: Mapquest [mapquest.co.uk] may be a good place to start.
Paul
Re:Lucky in the US... (Score:5, Informative)
The other one at Swanage handles the ATC for everywhere else. This was replaced with a new system in 2002.
But, by 2006 hopefully all ATC in the UK will be running on new systems.
Vacuum Tubes (Score:1, Informative)
I got news for you. All the air traffic control centers in the US are *still* running on vacuum tubes. What do you think the CRT displays in all the radars and computers are?
In fact, the computer I'm sitting at right now depends upon a vacuum tube as one of its most important parts, without which it would be rather worthless.
Re:Software doesn't rust... (Score:4, Informative)
Sadly, it really is running on ~30 year old hardware, at least in part. I've spoken to some of the service engineers.
Re:Three fingers (Score:4, Informative)
Re:Lucky in the US... (Score:3, Informative)
The only way this could be described as the entire country's ATC system is when you consider that London D&D (Distress and Diversion) is [in]conveniently located at West Drayton, and would be down during this outage. London D&D is responsible for monitoring 121.5 ("guard" frequency) across the whole country (Scottish and possibly Shannon FIR included) and becomes responsible for handling just about any aircraft emergency before handoff to local ATCOs.
Whilst D&D wouldn't be affected technically, regulations require that the information provided by the HCS at West Drayton is always available to D&D, thus forcing a D&D shutdown in the event of the HCS crash. There is enough contingency to negate the use of D&D in an emergency, and hand off to local controllers immediately instead, but I don't think NATS and the CAA were willing to operate without D&D due to regulations and risk of litigation. Many GA flights (even those operating locally and un-filed under VFR) were also grounded due to the lack of D&D. It's a stupid system where one system needlessly relies on another then gets scared into submission when the other goes tits up.
Re:So what? (Score:4, Informative)
In scenarios like this, where load has increased whilst the computer systems were in place, we *are* reliant on them.
Think of banks - time was when you had to almost plead on your knees to get a bank account, and they charged you for running it. This was because every account was written down manually in a book, and any calculations were performed by hordes of clerks. Then - computers. Now your new account is just one more record in a table somewhere, so the banks give out accounts to anyone who wants one, and do it for free. If for some reason your bank's computer system goes AWOL, there is no way they can process a month's interest calculations on the millions of balances and transactions - not to mention actually applying the transactions that would now come in on bits of paper.
I do agree that in a lot of cases, there remains a perfectly useable manual method, but where the computer system has enabled geometric increases in capacity over the manual system (which has been taken up) then, if you'll excuse the pun, it won't fly.
You're right about the Y2K thing - I worked on a contract for a railway maintenance company in 1999 and the Y2K coordinator guy was tearing his hair out at the thousands of questions he got monthly such as "so, these nails, are they Y2K compliant?" He actually had solid steel track components called "chairs" that the rails sit on that had Y2K compliance stickers on them from the manufacturer. Presumably, they got fed up explaining it too, and decided it was easier to just stick the stickers on everything they made...
Re:Software doesn't rust... (Score:4, Informative)
The 'fridge size boxes are 70's vintage (I suspect bits have been replaced over the years). The CPUs are only about five years old. The system consists of two identical computers for hot failover, and they had to get two custom CPUs made by the original manufacturer (IBM, I think) to deal with Y2K.
As for the software? Written in some weird language called Jovial, and continually repatched - never rewritten.
BTW, where the heck is Swanage? The new NATS center is in Swanwick!
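For anyone unfamiliar with the "two identical computers for hot failover" idea mentioned above, here is a minimal sketch in Python. It is not how the West Drayton system works; the `Unit` and `HotFailoverPair` names and the message strings are made up for illustration, and a real hot standby would also keep the second machine's state in sync, which this toy skips.

```python
class Unit:
    """One of two identical computers (hypothetical model, not the real system)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def process(self, message):
        # A failed unit refuses work; a healthy one reports what it handled.
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return f"{self.name} handled {message}"


class HotFailoverPair:
    """Sends work to the active unit and swaps to the standby if it fails."""

    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby

    def process(self, message):
        try:
            return self.active.process(message)
        except RuntimeError:
            # Swap roles and retry once; if the standby is also down,
            # the error propagates to the caller.
            self.active, self.standby = self.standby, self.active
            return self.active.process(message)


pair = HotFailoverPair(Unit("A"), Unit("B"))
first = pair.process("plan-1")    # handled by unit A
pair.active.healthy = False       # simulate a failure of unit A
second = pair.process("plan-2")   # pair swaps to unit B and retries
```

Note that a pair like this only covers one box failing on its own. If both boxes are down, the call still fails.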
Re:Lucky in the US... (Score:3, Informative)
Swanwick (formerly known as New EnRoute Centre - NERC) has just gone live after a very painful birth. This was largely because the system was based on the US AAS project, which was cancelled before NERC was started.
Re:What WAS the System that crashed? (Score:5, Informative)
The hardware is an IBM 9020 family mainframe, the application is written in Jovial (one of, if not THE, first algebraic languages), and BAL assembler (for the monitor mostly). The monitor is the operating system, so it is effectively a custom-written operating system for this application.
Although MVS is also used for testing. The I/O capabilities of the mainframe are superb which means it can handle 2000+ flights with only 14 Megs of RAM (if I remember rightly).
I believe the NAS application came as a freebie from IBM when the UK purchased the hardware and was the same NAS (national airspace system) application used all over the US. It has been continuously developed since then (no mean feat when you consider that all variables are global in Jovial, it uses Holleriths instead of ASCII, and you are limited to 5 or 6 characters per variable name). The hardware has also been upgraded several times over its lifetime.
It doesn't often go down (the last time was sometime in 2002), and you can tell how important it is because everyone screams when it does go down. The people I worked with are extremely dedicated to their job, but one cannot test a system like this for absolutely every eventuality. No doubt some patch was applied and some special case came up that caused a FLOP (functional loss of operation). It happens. Radar is usually unaffected, so the safety implications are not large, but flow is affected.
The UK approach to handling NAS is much different to the US: the US tends not to touch the NAS software and instead develops external systems that enhance the usage of airspace, whereas the UK tends to delve into NAS and improve things directly. Jovial is a very interesting language; it has been used heavily by the US military and exists in such applications as cruise missiles and many other aircraft and missile systems. Read about Jovial here if you are interested.
I can't say too much about it for various NDA reasons (OSA), but I think most of the above is in the public domain.
HTH.
Re:What WAS the System that crashed? (Score:2, Informative)
I think the system which crashed was only responsible for admitting new flight plans to the whole complex. Any flightplan already filed could carry on; it is just that no-one could file a new plan for the next flight.
Re:More problems... (Score:2, Informative)
Re:Software doesn't rust... (Score:4, Informative)
Muahaha. Languages from the stone-age. Jovial is an ancient semi-descendant of Algol, originally written especially for avionics systems. I'm not nearly old enough to have worked with it myself - Jovial's heyday was the mid-'70's or so - but I used to work with a couple of DoD greybeards who had done so, although even they hadn't touched the thing in years, as it's mostly been supplanted by Ada these days. The USAF can tell you a bit more about Jovial [af.mil] if you're having a slow day today ;)