Boeing 787s Must Be Turned Off and On Every 51 Days To Prevent 'Misleading Data' Being Shown To Pilots (theregister.co.uk)
The U.S. Federal Aviation Administration has ordered Boeing 787 operators to switch their aircraft off and on every 51 days to prevent what it called "several potentially catastrophic failure scenarios" -- including the crashing of onboard network switches. The Register reports: The airworthiness directive, due to be enforced from later this month, orders airlines to power-cycle their B787s before the aircraft reaches the specified days of continuous power-on operation. The power cycling is needed to prevent stale data from populating the aircraft's systems, a problem that has occurred on different 787 systems in the past. According to the directive itself, if the aircraft is powered on for more than 51 days this can lead to "display of misleading data" to the pilots, with that data including airspeed, attitude, altitude and engine operating indications. On top of all that, the stall warning horn and overspeed horn also stop working.
This alarming-sounding situation comes about because, for reasons the directive did not go into, the 787's common core system (CCS) -- a Wind River VxWorks realtime OS product, at heart -- stops filtering out stale data from key flight control displays. That stale data-monitoring function going down in turn "could lead to undetected or unannunciated loss of common data network (CDN) message age validation, combined with a CDN switch failure." Solving the problem is simple: power the aircraft down completely before reaching 51 days. It is usual for commercial airliners to spend weeks or more continuously powered on as crews change at airports, or ground power is plugged in overnight while cleaners and maintainers do their thing.
Windows (Score:5, Funny)
I thought Windows had a clear indicator that it was not to be used for mission critical systems.
SNL [Re:Windows] (Score:5, Funny)
Rebooting a plane during a flight could make a fun SNL skit.
Copilot: "It's asking if we want to also upgrade now."
Pilot: "No, just finish a plain reboot! We're losing altitude."
Copilot: "Uh, now it's asking for a license key code."
Pilot: "Crap, I don't remember where I wrote it down. Let me check around..."
Copilot: "I got an idea ... [cabin speaker] Attention passengers, this is your copilot, does anybody have a working Microsoft Windows key code we can borrow?"
Pilot: "This is a goddam jet, not a laptop. You'll make them panic."
Copilot: "Worth a try, you got a better idea?..."
Re:SNL [Re:Windows] (Score:5, Funny)
Re: SNL [Re:Windows] (Score:2)
On the other hand, the passengers only suffer the worst consequences once.
Re: (Score:3)
Re: (Score:2)
I actually lost near a day's work because a critical machine forced a reboot and needed the BIOS password to complete it (firmware upgrade), which IT couldn't find. When they did eventually find it Windows decided that Bitlocker wasn't happy and needed the recovery key to boot, so they had to go find that too.
Re: (Score:2)
Re:Windows (Score:5, Interesting)
VxWorks was good enough for Mars Pathfinder but you can write crap software for any OS.
Re: (Score:2)
Today I found a Fintech website that is vulnerable to XSS. Not only was it vulnerable, an employee at the company explicitly had to make it vulnerable by calling a function labeled "insecure." This sort of stuff is really demoralizing.
Re: (Score:2)
eh, I hit a website for a website design company in 2012 that showed the date as '5/719112', a Y2K bug TWELVE years after.
Best of all, there was no reason for them to show a date at all, they just decided to insert it in a bar on the top of the page.
Re: (Score:2)
ugh, "5/7/19112" that is.
Re: (Score:2)
i found one in 2016 on a calendar widget. I was doing some tests and got to write "Y2K bug detected..."
Re: (Score:3, Funny)
Press 4 for English
Re:Windows (Score:5, Insightful)
With the real time systems, the OS is generally pretty solid. It is the code that sits on top of it that is almost always the culprit when things are buggy. Part of the problem is that a lot of safeguards aren't in the RTOS because they can be slow or bulky. So you need programmers who understand this sort of thing, but those programmers are aging out of the system without up and coming devs to replace them. Add to that a tendency to cut corners and speed up testing,
Re: (Score:2, Funny)
With the real time systems, the OS is generally pretty solid. It is the code that sits on top of it that is almost always the culprit when things are buggy. Part of the problem is that a lot of safeguards aren't in the RTOS because they can be slow or bulky. So you need programmers who understand this sort of thing, but those programmers are aging out of the system without up and coming devs to replace them. Add to that a tendency to cut corners and speed up testing,
~Darinbob
REMark
The challenge of annotating code with the chatter of its author(s) is having articulated a procedure in its most elegant expression within the sublime constraints of syntax and variable of one language feels redundant and burdensome with one's native tongue of a general purpose.
"It's right there! It works! Can't you see how it works! If I have to explain everything, the remarks will be longer than the damn code!"
Re:Windows (Score:5, Insightful)
The older I get, the more I would prefer developers to stop trying to have elegant code and instead just have readable code that does what it says, and comments that match the code. Anyone trying to be clever needs to stop doing that unless it can be justified (trying to cram more stuff into the 320 bytes of code you're given). Anything sublime should be left at home as a hobby because at the end of the day the code is not supposed to be a programmer's personal artistic expression, it's supposed to be something that works and that other programmers can read and modify and improve upon.
This is something I've discovered too, that 99% of all work that programmers do is maintaining existing code. Writing new code is a luxury. Instead the job is about fixing bugs, adding features, re-architecting to make it do something it wasn't originally designed to do. Even on new projects you almost never get a chance to ask for three years so that you can build a marvel from the ground up. So the goal should always be to make your code so that someone else can modify it after you, even if that person is yourself 5 years in the future.
Just to add to this (Score:2)
Comment removed (Score:4, Interesting)
Re: (Score:3)
I've been lucky enough to be mostly writing software for awhile, but who knows what is next. (Next may be soon, given disruptions at work.) That being said, if I possibly can I try to rewrite sections of code that I come back to and find hard to follow, since I know if I find it hard to follow, even if it works, the next guy, not having written it, or thought quite the same way, is likely to find it harder to follow. One thing I find more useful than comments is simply the usual documentation of exactly what functions do. You try to always make the name say what it does, but there is no substitute for a sentence or two, particularly when you're trying to reference it, and wondering how it works. (The documentation will auto popup in Visual Studio.) That plus keeping your functions short, if you possibly can, handles a lot of things. Of course, if you must make them longer then more commenting is usually required, but simply documenting functions and objects is I think more important than documenting inside a function. Ideally your function should be something, that if it has a description some other competent programmer can within a short period of time say, yes, this does that without a lot of additional comments.
Definitely agree. I also try to make it a point when I write function documentation like you describe that I write it to describe what the function does without referencing how it's implemented. Oddly enough I do things that way because that's how I was taught to do things in high school English class of all places.
Re: (Score:3)
64 bit counters have their own issues though. The main one is that on 32 bit systems they are not atomic by default. A 32 bit value will always read/write in one atomic operation, but a 64 bit value is two at least. When you have interrupt code or hardware counters or pre-emptive multitasking it can be a major problem and for some reason few developers seem to understand it.
Re: (Score:2)
Is this true for 64 bit processors ?
Somehow, intuitively, that doesn't make sense
Re: (Score:2)
It depends on the external bus width on some architectures but generally no, 64 bit systems can access 64 bit words atomically.
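A minimal sketch of the torn read described above, simulated in Python rather than run on real 32-bit hardware. `SplitCounter`, `naive_read` and `safe_read` are hypothetical names made up for illustration. The "read high, read low, re-read high and retry" loop is one common mitigation, not necessarily what any avionics code does.

```python
MASK32 = 0xFFFFFFFF


class SplitCounter:
    """A 64-bit tick counter stored as two 32-bit words, as on a 32-bit CPU."""

    def __init__(self, value=0):
        self.hi = (value >> 32) & MASK32
        self.lo = value & MASK32

    def tick(self):
        """Increment by one; on real hardware this could be an interrupt handler."""
        self.lo = (self.lo + 1) & MASK32
        if self.lo == 0:  # low word wrapped, so carry into the high word
            self.hi = (self.hi + 1) & MASK32


def naive_read(counter, interrupt=None):
    """Read hi then lo; `interrupt` simulates a tick landing between the reads."""
    hi = counter.hi
    if interrupt:
        interrupt()
    lo = counter.lo
    return (hi << 32) | lo


def safe_read(counter, interrupt=None):
    """Read hi, lo, then hi again; retry if the high word changed in between."""
    while True:
        hi1 = counter.hi
        if interrupt:
            interrupt()
            interrupt = None  # the simulated tick fires only once
        lo = counter.lo
        hi2 = counter.hi
        if hi1 == hi2:
            return (hi1 << 32) | lo


c = SplitCounter(0xFFFFFFFF)
torn = naive_read(c, interrupt=c.tick)    # old high word combined with new low word

c2 = SplitCounter(0xFFFFFFFF)
good = safe_read(c2, interrupt=c2.tick)   # retries once, then returns a consistent value
```

In this simulation the naive read returns a value the counter never held, while the retry loop returns the value after the tick.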
Re: (Score:2, Funny)
Re: (Score:2)
Painstakingly making every data member private, and giving a public get method and protected set method has bailed me out so many times.
When I see new coders casually make my base class methods public, they get a very serious talking-to. Nothing nasty or insulting. Just a serious talk, involving the guy/gal's boss. Do it three or four times, you get a reputation in the company and your class libraries live unmolested. Rarely when the boss
Re: (Score:2)
Yes I agree 100%.
The older I get, the more I would prefer developers to stop trying to have elegant code and instead just have readable code that does what it says [...] This is something I've discovered too, that 99% of all work that programmers do is maintaining existing code. Writing new code is a luxury. Instead the job is about fixing bugs, adding features, re-architecting to make it do something it wasn't originally designed to do.
And this is why the world needs more senior devs. I've worked at a place
Re: (Score:2)
Ok Boomer. Don't you know everything can be fixed with a simple:
npm install github.com/random_weed_dude/washigh/tarball/v0.0.0.0.0.1
Re: (Score:2)
Ok Boomer. Don't you know everything can be fixed with a simple:
npm install github.com/random_weed_dude/washigh/tarball/v0.0.0.0.0.1
haha well played :)
Oh god NPM. I mean the ease of hacking is appealing I guess (I don't write much JS, so I'm speculating here) but the thought of using it in production makes me kind of horrified. There is so much of that.
Re: (Score:3)
I'm confused here. I never said comments were bad, I was advocating for MORE comments and less inscrutable code that is treated as a puzzle for the reader.
Re: (Score:3)
I'm confused here. I never said comments were bad, I was advocating for MORE comments and less inscrutable code that is treated as a puzzle for the reader.
You're chatting with a bot.
Re: (Score:2)
Oh, that's good news, actually.
Re: (Score:2)
Oh, that's good news, actually.
Hardly. They're getting better. Good enough to troll, advertise and spam, but not good enough to be game master for a DND session.
Re: (Score:2)
Re: (Score:2)
On the other hand, very often the rich catalog is full of stuff you can't use. I remember one person at work ranting that we didn't support DTLS (he was big on having one of each standard included in each product). I pointed out that the smallest DTLS library I could find (I think it had "mini" in the name) still took up 3/4 of our available code space. Which didn't seem to placate him. The stuff that's out there for easy use is sometimes unsuitable for use in many environments.
On the other hand there ar
0 0 */50 * * (Score:5, Funny)
Re: (Score:3)
This isn't a UNIX/Linux situation....
Re: 0 0 */50 * * (Score:4, Funny)
Anacron (Score:2)
isn't that what anacron is for?
This sounds a bit like the patriot missile bug that did something like count microseconds to determine the time but the time base wasn't precise so it drifted. That resulted in the disaster in the Kuwait war.
Re: (Score:2)
The operating system will crash.
Re: (Score:3)
Just set up a cron job to shut down the main power on the plane every 50 days.
Please ensure your reboot script checks that the plane is not currently in flight, or on the run/taxi-way ...
Re: 0 0 */50 * * (Score:5, Funny)
Re: (Score:2)
Just set up a cron job to shut down the main power on the plane every 50 days.
Please ensure your reboot script checks that the plane is not currently in flight, or on the run/taxi-way ...
Nah it's just easier to check if the plane is on, then run the cron script.
Re: (Score:2)
If in flight, autopilot should countermand pilot input and accelerate to the ground, which is a known safe place. If the pilot tries to resist, just cycle over and over until the meat bag tires out. Boeing, Boeing, Bang.
Re: 0 0 */50 * * (Score:5, Informative)
Re: (Score:2)
Re: (Score:2)
Oh, and better do a
set message.pagesize.minimum("A-14")
as well.
Re: (Score:2)
You don't open Windows at Warp speed or 37,000 feet either.
Did you try turning it off and then on again (Score:5, Funny)
Re: (Score:2)
Re: (Score:2)
Neither is "reboot or die."
Re: (Score:2)
Isn't it, "Abort, Reboot, Die?"
hawk
Re: (Score:2)
I've heard it from an industrial control system vendor... and honestly after all the stories about Boeing I just assumed that people would be turning them off and on again daily just to be on the safe side.
Re:Did you try turning it off and then on again (Score:5, Informative)
It's a really stupid error too. Many developers will recognise 51 days. It's the time it takes a 32 bit unsigned millisecond counter to overflow.
Re: (Score:2)
Advice to programmers worldwide: when storing timestamps, always spring for the extra 32 bits. It's totally worth it, just to avoid problems like this.
Re: (Score:3, Informative)
That would be 49.71 days, actually, or 24.85 days for a signed 32 bit counter to go negative.
Now, for a different real world example, Microsoft IIS (3.x, I think it was) had a bug where the date/time fields in W3C format log files would stop incrementing after only 40 days. Wonder what size counters they were using?
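The overflow times quoted in this sub-thread are easy to check. This is a quick sketch, assuming a counter that ticks once per millisecond; `days_until_wrap` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_wrap(bits, ticks_per_second):
    """Days until a counter with `bits` usable bits wraps at the given tick rate."""
    return 2 ** bits / ticks_per_second / SECONDS_PER_DAY


unsigned_32_ms = days_until_wrap(32, 1000)  # unsigned 32-bit millisecond counter
signed_32_ms = days_until_wrap(31, 1000)    # signed 32-bit goes negative after 2**31
unsigned_64_ms = days_until_wrap(64, 1000)  # the "extra 32 bits" suggested earlier

print(f"unsigned 32-bit ms: {unsigned_32_ms:.4f} days")
print(f"signed 32-bit ms:   {signed_32_ms:.4f} days")
print(f"unsigned 64-bit ms: {unsigned_64_ms / 365.25 / 1e6:.1f} million years")
```

The 64-bit case is why widening the counter is usually treated as a permanent fix: the wrap moves from weeks to hundreds of millions of years.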
Re: (Score:2)
Re: (Score:2)
I had to power cycle my king ant's HP OfficeJet printer yesterday morning (couldn't even shut it down with its power button). I never had to power cycle a printer before. Geez.
Re: Did you try turning it off and then on again (Score:2)
Re: (Score:3, Interesting)
That's actually a surprisingly common troubleshooting technique on pretty much any large airplane. Primary Flight Control Computer failed? Just turn it off and on again. No longer failed? Good to go!
Especially on Airbus this usually solves most problems. We're constantly pulling and resetting circuit breakers. Boeing is actually a bit better in that regard, mainly because the systems are simpler and less interconnected. Either it works, or you need a mechanic to physically repair it.
I also used to fly the D
Comment removed (Score:5, Funny)
Re: (Score:2)
Re: (Score:3)
A warning before 3, the pilot must close all open windows before rebooting,
It's Boeing, not Airbus.
(for the less aeronautically inclined, some of the windows on Airbuses do in fact open http://s.wsj.net/public/resour... [wsj.net] )
Re: (Score:2)
Don't plug a USB Death Cart into the plane: https://books.google.com/books... [google.com]
32 bit counter (Score:5, Informative)
51 days is pretty close to 2**32 milliseconds.
Sounds like an overflow of a 32 bit counter.
Resetting that would avoid a Microsoft style 'turn it off and on again' reboot. But there may be more than one of them so a power cycle to set them back to zero sounds a safer (if less convenient) way
Re:32 bit counter (Score:5, Interesting)
That's only 49.7 days in milliseconds. Probably a 42-bit counter counting microseconds, which is much more appropriate resolution given how fast a plane travels - and that's 50.9 days until turnover, which is much closer to the estimate.
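A quick check of the two candidates side by side. The 42-bit microsecond counter is this poster's guess, not something the directive states.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Candidate 1: 32-bit counter of milliseconds
days_32bit_ms = 2 ** 32 / 1_000 / SECONDS_PER_DAY

# Candidate 2 (the guess above): 42-bit counter of microseconds
days_42bit_us = 2 ** 42 / 1_000_000 / SECONDS_PER_DAY

print(f"32-bit ms counter wraps after {days_32bit_ms:.2f} days")
print(f"42-bit us counter wraps after {days_42bit_us:.2f} days")
```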
Re: (Score:2)
Perhaps 49.7 days plus a worst case mid-flight cushion.
But why shut it off only every 50 days... (Score:2)
Makes you wonder why they don't recommend turning it off every 2 weeks. Then someone would have to screw up this basic maintenance three times in a row for it to be a problem.
Re: (Score:2)
Re: (Score:2)
And the tick speed is probably something like
1 tick = (1024/1000) ms
Counts per day: (1000*1000/1024 ) * 24*60*60 = 84375000
UINT32 Max 0xFFFFFFFF = 4294967295
4294967295
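The arithmetic above can be checked with a short sketch. It assumes a tick period of 1.024 ms (1024 microseconds), which is what the per-day count of 84375000 implies. It also shows that a 32-bit counter of such ticks wraps at the same moment as a 42-bit counter of microseconds, so the two guesses in this thread describe the same rollover.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TICK_US = 1024  # assumed tick period in microseconds (1.024 ms)

# Ticks per day at one tick every 1024 microseconds
ticks_per_day = 1_000_000 * SECONDS_PER_DAY // TICK_US

# Days until a 32-bit tick counter wraps
days_to_wrap = 2 ** 32 / ticks_per_day

# Same rollover expressed as a 42-bit microsecond counter: 2**32 * 1024 == 2**42
days_42bit_us = 2 ** 42 / (1_000_000 * SECONDS_PER_DAY)

print(ticks_per_day, f"{days_to_wrap:.2f}", f"{days_42bit_us:.2f}")
```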
Re: (Score:2)
If you're gonna pretend to be a nerd, learn how binary works.
And you should learn the difference between milliseconds and microseconds before opening your big, dumb, gaping maw.
Re: (Score:2)
Not the first time.... (Score:5, Informative)
The really funny thing is that this isn't the first bug in the 787 requiring a reboot. I was thinking that this was a dupe, because I remembered reading a similar story a while back. Turns out that the previous bug occurred at 248 days [theguardian.com] and was even more serious....
Oh, and another one at 22 days [popularmechanics.com].
It's almost like Boeing rushed this thing into production. Oh, wait....
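For what it's worth, the 248-day figure lines up with a signed 32-bit counter of hundredths of a second. That is an inference from the number alone, not something confirmed in this thread, and the 10 ms tick below is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TICKS_PER_SECOND = 100  # assumed: the counter increments every 10 ms

# A signed 32-bit counter goes negative after 2**31 ticks
days_signed_32_cs = 2 ** 31 / TICKS_PER_SECOND / SECONDS_PER_DAY

print(f"{days_signed_32_cs:.2f} days")
```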
Ha ha (Score:2)
"This ruins my uptime!" (Score:5, Funny)
Pilot 1:"So what's your plane's uptime? Mine's 2305hrs and counting..."
Pilot 2:"I've got a 787, so mine's at 1223.89hrs.... OH SHIT! Hold on one min..."
Should have used QNX (Score:2)
Good time to air out bad news (Score:2)
With COVID news taking up most of the air time and attention, NOW is the time for companies to put out bad news! It will slip under most radar and be forgotten soon.
By the way, how many 787 are still flying and not being grounded?
Re: (Score:2)
Why not use OS/2 (Score:2)
I tell my users (Score:2)
I'm fairly sure... (Score:4, Insightful)
...that DO-178C's software requirements, not to mention other vehicular coding standards, NASA's Power of Ten, and unit testing that's supposed to look for non-fatal state corruption bugs of this sort should prevent cumulative errors and stale data.
Of course, that assumes Boeing sticks to standards.
The MAX8 incidents, along with reported computer issues with the 777, make me think Boeing are winging it.
I'm increasingly of the opinion that if you put any unexplained situation involving a modern Boeing down to a computer glitch, you've an excellent chance of being right.
The question is, do we have too many standards? Incoherent/Unusable standards? Conflicting standards?
There are even standards designed for specific projects, such as the Joint Strike Fighter.
What's clear is that we've no shortage of tools to prevent these sorts of bugs and that Boeing (and to some extent Airbus) aren't using any of them.
51 days exactly (Score:2)
Mid air over the Pacific sounds like as good a place as any to reboot the thing.
On a 787 now (Score:2)
I'm posting this using the aeroplane's wifi.
Oh look, I've found it's running an ssh server.
jeremyp@Magenta ~ % ssh 192.168.1.2 -l pilot
Password:
Last login: Fri Apr 3 13:54:54 2020
pilot@787 ~ % uptime
13:56 up 50 days, 23:59, 3 users, load averages: 2.73 2.38 2.23
pilot@787 ~ %
Oh, shi
Re: (Score:2)
Don't likely keep the engines running for 51 days but the computers running the thing would likely never be turned off until some maintenance is required. Well, they just added a new maintenance requirement.
Same thing happened on Mars (Score:2)
The 1996 Pathfinder lander/Sojourner rover mission ran VxWorks and was disabled because of a priority-inversion bug. I think they had to wait until the system timed out and reset itself.
Re: (Score:2)
News to me. I had no idea that they would keep their engines running that long! I wonder what the record is.
As noted, the engines aren't necessarily on all the time as the electrical/electronic systems can also be powered from the ground.
Re:51 days? (Score:5, Informative)
Shutting down the plane != shutting down the engines.
This is, AFAIK, about some of the avionics computer hardware. I haven't looked at the 787's electrical architecture specifically, but I am assuming it is at least similar at a high level to some of its predecessors. If so, then each of the computers is connected to multiple power buses, at least one of which gets power from onboard batteries, and at least one of which switches to external power while parked at a gate (via conversion from the AC bus, I think).
You have to shut the computers down explicitly, or else they just keep running even when the engines are powered down.
Re: (Score:2)
Re: 51 days? (Score:2)
I suspect that they're using a 4 byte unsigned integer count of milliseconds in their filter. At 50 days, they have a value below 2^32 and at 51 it's above 2^32.
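A sketch of how a wrapped millisecond counter could break a message-age filter, purely to illustrate the guess above; it is not based on any knowledge of the actual 787 code, and `uptime_ms_u32`, `age_naive` and `age_modular` are hypothetical names. Note that under this assumption the wrap falls between day 49 and day 50, since 50 days of milliseconds already exceeds 2^32.

```python
MASK32 = 0xFFFFFFFF
MS_PER_DAY = 24 * 60 * 60 * 1000


def uptime_ms_u32(days):
    """Millisecond uptime as it would appear in a 32-bit unsigned counter."""
    return (days * MS_PER_DAY) & MASK32


def age_naive(now, stamp):
    """Plain subtraction: goes negative once `now` has wrapped past `stamp`."""
    return now - stamp


def age_modular(now, stamp):
    """Subtraction modulo 2**32: correct while the true age is below 2**32 ms."""
    return (now - stamp) & MASK32


stamp = uptime_ms_u32(49)  # message time-stamped on day 49, before the wrap
now = uptime_ms_u32(50)    # checked on day 50, after the counter has wrapped

naive = age_naive(now, stamp)      # negative, so an `age > limit` test calls it fresh
modular = age_modular(now, stamp)  # one day in milliseconds, clearly stale

print(naive, modular)
```

With plain subtraction, a day-old message looks younger than anything just received. The modular version reports the true age as long as messages are checked within one wrap period.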
Re: (Score:2)
You know how your car radio doesn't ask you for the unlock code every time you turn the engine off but does when you remove the battery? Think of planes the same way.
Re: (Score:2)
QA did the 50 day test and it worked within acceptable parameters...
Re: (Score:2)
Re:Welcome back to 1995! (Score:4, Informative)
Any Windows 95 computer left running continuously would've melted onto the desk long before it reached 49.7 days.
Re: (Score:2)
Any Windows 95 computer left running continuously would've melted onto the desk long before it reached 49.7 days.
I'm not sure mine ran for 49.7 days without a reinstall.
[sits back in rocking chair]
Back in the day I had a Win 95 PC. It was a bit of an oddball I bought from a friend of mine assembled from parts. It was a P133, with a whopping 72M of RAM, a Riva 128 graphics card and both a 3.5 and 5.25" floppy drives. The hard disk was a respectable 700M. When browsing in the local bookstore one day, I came a
Re: (Score:2)
Re: (Score:2)
Wow, I wish I'd thought of that... that's clever. Whole partition backups, from your other OS!
Thanks! It was pretty nice and really convenient. I can't claim any special trickery, it was a combination of relatively rare things. At that point CD-Rs were not common, I got a nice high end one which was reliable and fast enough at both reading and writing to be useful. It was the most expensive part of my PC at that point, though I had the cash because I didn't upgrade the CPU or RAM. The hard disk was small, a
Re: Welcome back to 1995! (Score:2)
Re: (Score:2)
I had to switch it off and on every few days to clear the memory.
~ Hey_Jude_Jesus
It takes me back to the Sinclair's ZX-80 with molded plastic inked with heat vents. 1k RAM applying 4k ROM is a harsh mistress.
Re: (Score:2)
Seriously. I can't overstate how fucking garbage this is. Even unpatched copies of Motherfucking Windows 95 could make it to 64 days.
Re:Do they even know what a code review is? (Score:4, Interesting)
Code reviews can't help you if the requirements are wrong. I'm guessing the code was reviewed thoroughly** by whatever subcontractor implemented that avionics module, but the requirements said nothing about maximum up time, etc.*
It's a very common omission in requirements, sadly.
*To be fair, a really good subcontractor would review the requirements before implementing and say "hey, this is under-specified." However, that kind of experience-based knowledge is difficult to codify.
**I'm surprised, though, this was not caught by a checklist. "Check for integer overflow, and check for floating point underflow" are pretty common checklist items.
Re: (Score:2)
Boeing talked the FAA into self certifying these systems. Then Boeing contracted out the work to India for pennies on the dollar. This was simply a matter of greed. In more civilized countries the CEO and board members would all be rotting in jail.
Re: (Score:2)
Does Boeing even know what a fucking code review is?
Sure. One of the things you do not do because they cost money...