Transportation Programming

'How the Boeing 737 Max Disaster Looks to a Software Developer' (ieee.org) 388

Slashdot reader omfglearntoplay shared this article from IEEE's Spectrum. In "How the Boeing 737 Max Disaster Looks to a Software Developer," pilot (and software executive) Gregory Travis argues Boeing tried to avoid costly hardware changes to their 737s with a flawed software fix -- specifically, the Maneuvering Characteristics Augmentation System (or MCAS): It is astounding that no one who wrote the MCAS software for the 737 Max seems even to have raised the possibility of using multiple inputs, including the opposite angle-of-attack sensor, in the computer's determination of an impending stall. As a lifetime member of the software development fraternity, I don't know what toxic combination of inexperience, hubris, or lack of cultural understanding led to this mistake. But I do know that it's indicative of a much deeper problem. The people who wrote the code for the original MCAS system were obviously terribly far out of their league and did not know it.

So Boeing produced a dynamically unstable airframe, the 737 Max. That is big strike No. 1. Boeing then tried to mask the 737's dynamic instability with a software system. Big strike No. 2. Finally, the software relied on systems known for their propensity to fail (angle-of-attack indicators) and did not appear to include even rudimentary provisions to cross-check the outputs of the angle-of-attack sensor against other sensors, or even the other angle-of-attack sensor. Big strike No. 3... None of the above should have passed muster. None of the above should have passed the "OK" pencil of the most junior engineering staff... That's not a big strike. That's a political, social, economic, and technical sin...

The 737 Max saga teaches us not only about the limits of technology and the risks of complexity, it teaches us about our real priorities. Today, safety doesn't come first -- money comes first, and safety's only utility in that regard is in helping to keep the money coming. The problem is getting worse because our devices are increasingly dominated by something that's all too easy to manipulate: software.... I believe the relative ease -- not to mention the lack of tangible cost -- of software updates has created a cultural laziness within the software engineering community. Moreover, because more and more of the hardware that we create is monitored and controlled by software, that cultural laziness is now creeping into hardware engineering -- like building airliners. Less thought is now given to getting a design correct and simple up front because it's so easy to fix what you didn't get right later.

The article also points out that "not letting the pilot regain control by pulling back on the column was an explicit design decision. Because if the pilots could pull up the nose when MCAS said it should go down, why have MCAS at all?"

"MCAS is implemented in the flight management computer, even at times when the autopilot is turned off, when the pilots think they are flying the plane."

  • by Anonymous Coward

    I don't know what toxic combination of inexperience, hubris, or lack of cultural understanding led to this mistake.

    Funny. Everyone else does. Outsource to cheapest overseas software firm available. You get what you pay for.

    • by Anonymous Coward

      Thanks Trump. We never saw any 737-Max airplanes plunging from the sky under Obama's watch.

      Trump owns this.

      • by PPH ( 736903 )

        under Obama's watch

        Because the 737 Max wasn't in service during Obama's administration. It was certified during that time. But then the FAA's overdependence on Boeing's analysis got its start during the Reagan administration.

        • by Richard_at_work ( 517087 ) on Saturday April 20, 2019 @09:31PM (#58465686)

          The 737MAX received its certification in March 2017, under Trump's administration. That doesn't change the fact that the certification program was conducted under Obama's administration, nor that there have been no significant changes in FAA certification policy under the past several administrations.

          • by Anonymous Coward

            YEP!

            I called this BS when I first saw it here on Slashdot. Boeing seriously shet the bed and took both its own people and regulators down the drain with it.

            There were aviation "experts" saying Boeing didn't do anything wrong and that this was all down to pilot error. The amount of sycophancy going on was and still is INSANE!

            Boeing needs a full, systematic cleaning of its house or else to be forced to shut down completely. Too many lives were lost and too many people trying to deflect blame for it to ever be tr

      • by Applehu Akbar ( 2968043 ) on Saturday April 20, 2019 @09:53PM (#58465722)

        And furthermore, Notre Dame stood unburned during the whole Obama administration.

  • by SlaveToTheGrind ( 546262 ) on Saturday April 20, 2019 @08:43PM (#58465548)

    And pilots can stall an airplane. But they generally don't, because they're trained to properly use the equipment.

    Having the entire control envelope available is important, not so pilots can do stupid stuff, but so that pilots can respond properly in situations that egghead PhDs and/or software jocks can't predict. Including this one.

    Because if the pilots could pull up the nose when MCAS said it should go down, why have MCAS at all?

    Disturbing.

    • by AK Marc ( 707885 ) on Saturday April 20, 2019 @09:12PM (#58465636)

      Because if the pilots could pull up the nose when MCAS said it should go down, why have MCAS at all?

      Disturbing.

      To put this in armchair quarterback terms, what would have happened if US Airways Flight 1549 had automatically pulled up because "ground collision" was imminent? I'll give you a hint: it would have turned a no-loss-of-life event into a loss-of-life event.

      Of course, had Air France Flight 447 forced the nose down against the pilot's wishes, that could have saved lives.

      Historically, airplanes have deliberately erred on the side of giving control to the pilots. Until AI is smarter than a human, the pilot should be the final arbiter of decision. This means that some bad pilots will kill, and some good pilots will save, but all pilots will have the power to make the difference.

      • In normal control law, AF447 would have forced the nose down to prevent the stall - it's the very fact that AF447 was not in normal control law that meant that protection was not available to it.

      • by bobby ( 109046 ) on Saturday April 20, 2019 @10:07PM (#58465764)

        Historically, airplanes have deliberately erred on the side of giving control to the pilots. Until AI is smarter than a human, the pilot should be the final arbiter of decision. This means that some bad pilots will kill, and some good pilots will save, but all pilots will have the power to make the difference.

        Thank you thank you thank you.

        And even if the AI is smarter than a human, it's all based in physical hardware, getting inputs from hardware sensors, going through wires, etc., and hardware can fail.

        And the more I'm reading about these crashes, the more I see where the pilots tried to turn electric trim off, autopilot on and off, and still could not get control of the plane. Maybe some other thing was wrong, and everything, not just MCAS, needs to be examined.

      • This means that some bad pilots will kill, and some good pilots will save, but all pilots will have the power to make the difference.

        Human-caused deaths are preferable to AI-caused deaths. Right.

      • by Kjella ( 173770 )

        Historically, airplanes have deliberately erred on the side of giving control to the pilots. Until AI is smarter than a human, the pilot should be the final arbiter of decision.

        Except this had nothing to do with smartness; the system got bad data and was responding "correctly" to the faulty value. The problem was that it didn't check both sensors and disable itself if they disagreed. That would have been giving control to the pilots. As it was, they had to manually disable a system that didn't realize it was faulty and undo the damage. To use a car analogy, a cruise control can't work with a faulty speed reading. It'll then either speed up or slow down suddenly and you have to fix

        • It can be fixed with AI smartness though. If the AoA sensor shows that there is supposed to be a stall, check whether other sensors confirm it (if the airplane is climbing then it is probably not stalled, etc). After all, this is what the pilots do. But that would make the system extremely complex.

          With your car analogy it's the same: you can make a smarter cruise control that uses the speed sensor, compares it with GPS and accelerometers, and tries to visually calculate the speed of the car in order to function correctly ev

    • Not only that but his strike number one and two are both bogus. Many modern fighter jets aren't stable and are only made so by the software used to operate them.
    • by msauve ( 701917 )
      The summary/article is an exercise in hindsight. Shoulda, coulda, woulda. The root problem isn't a lack of redundant inputs, but the belief that any system should be able to override a pilot. That's several pay grades above any software developer.
  • So what then, is the life of a passenger worth?
    • Boeing may be fined and will most likely be sued by the families of the victims (and the airlines that are now losing money because the MAX is grounded). Calculate the total amount of money Boeing will pay because of this, add the cost of the fix and divide the total by the number of people who died.

      That's how much the life of a passenger was worth in this case.

    • by Z00L00K ( 682162 )

      Just read up on the "Ferengi Rules of Acquisition".

  • It can take inputs from multiple sensors, but that's an upgrade. Third world airlines can't afford it unlike the US and European carriers.

    And unstable airframes aren't anything new. They've been around since the 1970s when the F-16 and other now obsolete fighters were first developed.

    • It can take inputs from multiple sensors, but that's an upgrade

      You are lying.

    • It can take inputs from multiple sensors, but that's an upgrade. Third world airlines can't afford it unlike the US and European carriers.

      This to me is key right here. Who was it that decided it was OK to make that feature optional? That feels like a mix of marketing and engineering, but why would engineering agree to go along with reducing the inputs to a dangerous level? Was that ever really reviewed?

      I don't think this error is wholly on software, at all. Of course I would tend to defer to people who

      • It wasn't optional; it was nonexistent. He may be thinking of a different feature entirely, or he may just be making it up; either way there was never any option to "upgrade" the MCAS in any way.

        • by Luthair ( 847766 ) on Saturday April 20, 2019 @10:49PM (#58465864)

          I assume this is what he's referring to:

          The two safety features in question were an “angle of attack indicator” and an “angle of attack disagree light”, both of which were not included in the aircraft by Boeing as standard safety features

          article [theguardian.com]

          No idea whether it applies or not since I know nothing about plane systems.

          • It doesn't; both of those are display options in the cockpit and have nothing to do with MCAS.

            They're also not safety features since there's nothing in the aircraft operating procedures which would change based on either of those indicators. Which is why they were optional, and why some airlines chose not to purchase them. They're a nice-to-have, nothing more.

        • Mr. c6gunner, stop it.

          You are simply wrong. No idea why you think 20 people here who explained to you THE EXACT SAME THING now 20 times in the same words are all wrong???

      • No. What was optional was an angle-of-attack indicator, i.e. a display which shows the angle of attack of the aircraft as read by the sensors.

    • by Dunbal ( 464142 ) *
      Or it can take input from a faulty sensor multiple times.
    • by sjames ( 1099 )

      The first point is unclear. There is an 'upgrade' that warns the pilot if the sensors disagree, and an upgrade that displays each sensor's readings, but I haven't seen anything that indicates an upgrade to get MCAS to actually look at both sensors (other than as a proposal for the fix after causing 2 fatal crashes).

      As for the second point, neutral stability is for fighters and aerobatic planes, not cargo and passenger planes. Negative stability is for fighters only.

    • The upgrade is just an informative upgrade that tells you the AoA sensors are out of whack. It helps in that if the warning light is on, you taxi back and do not take off. Once you take off with a bad AoA sensor the warning light cannot save you. If MCAS misbehaves you have 40 seconds to switch it off. The warning light may give you a head start but if you miss the 40 second window you are dead - upgrade or no upgrade.
      What Boeing is doing now is changing the MCAS software to switch off if the AoA sensors disagree. This

      • If MCAS misbehaves you have 40 seconds to switch it off .... if you miss the 40 second window you are dead

        Wow. That's a new one. How in the world did you come up with that fantasy?

  • by magzteel ( 5013587 ) on Saturday April 20, 2019 @08:55PM (#58465596)

    From TFA

    "When MCAS senses that the angle of attack is too high, it commands the aircraft’s trim system (the system that makes the plane go up or down) to lower the nose. It also does something else: It pushes the pilot’s control columns (the things the pilots pull or push on to raise or lower the aircraft’s nose) downward"

    The "trim system" is not the system that makes the plane go up and down. From "https://www.skybrary.aero/index.php/Trim_Systems"

    "Trim Systems are considered to be a "secondary" flight control system. By definition, to "trim" an aircraft is to adjust the aerodynamic forces on the control surfaces so that the aircraft maintains the set attitude without any control input. "

    So the pilots use the trim setting so they can stop pulling on the yoke. It's kind of like an attitude cruise control.

    In this instance MCAS is auto-trimming the plane incorrectly due to a bad sensor reading. And then the pilots did not follow their memory procedures for a runaway trim, shut it off, and use the cranks to manually set the trim. It is possible they tried to use the cranks and could not due to the extremely high speed causing the jack screw to bind. In this instance they are supposed to go nose-down to relieve the pressure but either they were too low already or too freaked out trying to go nose-up to manually go nose-down.

    • by c6gunner ( 950153 ) on Saturday April 20, 2019 @09:21PM (#58465656) Homepage

      To be fair, the fact that the manual trim system can be rendered inoperative in certain flight conditions is in itself a rather large safety concern. It's also an issue which precedes the MAX variant; it has apparently been a known problem for many decades. The only reason it hasn't caused a crash previously is because runaway trim is pretty rare, and runaway trim occurring specifically during very low-level flying would be even more rare.

      Yes, there were ways that both of the MAX aircrews could have recovered their aircraft but - at least in the case of the Ethiopian Airlines crash - I can't fault the aircrew much given what we know now. They seem to have done everything according to the book, but simply didn't have the altitude they needed to fix the problem. They could still likely have recovered the aircraft by going outside the manual and doing some very unorthodox things, but blaming them for not doing so would be foolish.

    • Shouldn't the autotrim move the trim wheels rather than the yokes?

      • Shouldn't the autotrim move the trim wheels rather than the yokes?

        The autotrim doesn't move the yokes. It moves the trim wheels, same as when the pilot hits the trim switch on the yoke.
        The pilot though will feel the difference in the yoke because as the trim is adjusted the pilot will no longer have to pull to maintain the same attitude.

    • No. They disabled the auto-pilot and attempted to use manual trim as described in the documentation. But they couldn't compensate for the MCAS changes in time because the manual trim was too slow. So they re-engaged the system and tried to use the automatic trim which was supposedly faster. But the automatic trim was also slower than the changes the MCAS made.

      • No. They disabled the auto-pilot and attempted to use manual trim as described in the documentation. But they couldn't compensate for the MCAS changes in time because the manual trim was too slow. So they re-engaged the system and tried to use the automatic trim which was supposedly faster. But the automatic trim was also slower than the changes the MCAS made.

        This could be true, but I read a different possibility. The plane was flying at a very high speed. As the trim reached its extreme position the high speed would cause the stabilizer to impart a lot of force on the jack screw, making it very difficult to manually turn it. Pilots are trained to go nose-down to reduce the force, but maybe it still didn't help. It appears also that when they re-engaged the system the MCAS re-engaged a few seconds later.

  • by BlueStrat ( 756137 ) on Saturday April 20, 2019 @09:17PM (#58465648)

    ...When you have MBAs in charge cutting costs, like hiring software developers without sufficient relevant experience in flight control systems, limiting testing/simulation/crosschecking, etc etc.

    This was entirely preventable and even predictable.

    Boeing owns this. Hope they've saved plenty of cash for the lawsuits and other legal troubles that will be incoming.

    Strat

    • Have to agree here. Blaming software developers is dumb, as they never have any autonomy except maybe at a startup. They don't design systems, and they rarely understand the full details of what they're designing anyway; they're given a task (process sensor input) and not asked "have you checked all the numbers for the aerodynamic design?" And engineering in general may have built the thing, but under the direction of management. I suspect most engineers didn't notice any flaws because they're compartment

  • by Goldenhawk ( 242867 ) on Saturday April 20, 2019 @09:25PM (#58465670) Homepage

    "Big strike #1" is totally incorrect. "Boeing produced a dynamically unstable airframe" is not the case. Rather, the engine change slightly reduced the stability to less than the minimum required by FAA regulations, thus requiring a compensation system to artificially increase the stability back to the minimum required. It was never unstable, PERIOD. It's still quite stable even without MCAS - just not quite as stable as required by regulation.

    I cannot disagree with the incredulity of designing this system with just one AOA sensor as an input. I also cannot fathom how they could possibly design it to NOT have a practical upper limit of its authority, or without an extremely visible notification of the action of the MCAS system. In the name of "we won't have to retrain the pilots" it violates a key tenet of automation: when you change the mode of operation, you notify the operator or user.

    FWIW, I am an aircraft flight test engineer with a specialty in stability and control, with 29 years of experience in the field, and over 10 years of testing on Boeing-derived commercial-class aircraft and autopilot systems. I've flown simulator variants of the 737 and its autopilot, and know exactly how confusing automation can be, especially when it does something unexpected. From the cockpit in real flight, I've watched trained, highly experienced test pilots completely lose their ability to focus on where the airplane is headed because they're trying to troubleshoot a relatively unimportant system that just messed with their sense of expectation. I have a lot to say about this crash, and none of it is good for Boeing.
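The two safeguards named in the comment above, a practical upper limit on the automation's authority and a visible notification whenever it acts, can be sketched as follows. This is only an illustration under stated assumptions: the class name, the 2.5-degree cumulative limit, and the callback interface are all hypothetical and do not describe Boeing's actual design or its fix.

```python
from typing import Callable


class LimitedTrimAutomation:
    """Toy model of an automatic trim function with bounded authority.

    Hypothetical sketch: the limit value and the interface are assumptions
    made for illustration, not taken from any real flight control system.
    """

    def __init__(self, max_total_deg: float = 2.5,
                 notify: Callable[[str], None] = print) -> None:
        self.max_total_deg = max_total_deg  # cumulative authority limit
        self.applied_deg = 0.0              # nose-down trim applied so far
        self.notify = notify                # stands in for a cockpit annunciation

    def command(self, requested_deg: float) -> float:
        """Grant as much of the request as the remaining authority allows."""
        remaining = self.max_total_deg - self.applied_deg
        granted = max(0.0, min(requested_deg, remaining))
        self.applied_deg += granted
        if granted > 0:
            # Every automatic action is announced to the operator.
            self.notify(f"AUTO TRIM: {granted:.1f} deg nose down")
        if granted < requested_deg:
            # The automation says so when it has run out of authority.
            self.notify("AUTO TRIM: authority limit reached")
        return granted
```

With the default limit, three successive requests of 2.0 degrees are granted as 2.0, 0.5, and 0.0 degrees, and each grant or refusal produces a message, so the automation can neither accumulate unbounded trim nor act silently.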

    • 1-The designers could have auto-trimmed the aircraft to match engine thrust levels instead of angle of attack. That would make the aircraft fly like the previous design. 2-The trim toggles should work always, instead of forcing pilots to crank the trim up or down. 3-Use both angle of attack sensors and compare, the data is already in the computer.
      • 1-The designers could have auto-trimmed the aircraft to match engine thrust levels instead of angle of attack. That would make the aircraft fly like the previous design.

        No, it wouldn't, since the issue isn't due strictly to "thrust levels" but also things like airspeed and angle of attack.

        2-The trim toggles should work always, instead of forcing pilots to crank the trim up or down.

        Yeah, the electrical trim system should work after you turn it off. Good plan.

        3-Use both angle of attack sensors and compare, the data is already in the computer.

        Only smart advice. Good news: it's part of the fix.
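The "use both angle of attack sensors and compare" point above can be sketched in a few lines. This is a minimal illustration, not Boeing's logic: the function name and both thresholds are hypothetical values chosen only to make the example concrete.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
MAX_DISAGREE_DEG = 5.0   # largest tolerated difference between the two vanes
HIGH_AOA_DEG = 14.0      # angle of attack above which nose-down trim is considered


@dataclass
class TrimDecision:
    active: bool   # whether automatic nose-down trim may be commanded
    reason: str    # text that would be surfaced to the crew


def decide_nose_down_trim(aoa_left_deg: float, aoa_right_deg: float) -> TrimDecision:
    """Cross-check two angle-of-attack readings before acting on either."""
    if abs(aoa_left_deg - aoa_right_deg) > MAX_DISAGREE_DEG:
        # The sensors disagree, so neither can be trusted: stand down.
        return TrimDecision(False, "AOA DISAGREE: automation inhibited")
    # The sensors agree; act on the lower (more conservative) reading.
    aoa = min(aoa_left_deg, aoa_right_deg)
    if aoa > HIGH_AOA_DEG:
        return TrimDecision(True, "high angle of attack confirmed by both sensors")
    return TrimDecision(False, "angle of attack within limits")
```

For example, `decide_nose_down_trim(74.5, 15.3)` inhibits the automation because the readings differ by far more than the tolerance, while `decide_nose_down_trim(15.0, 15.5)` allows it because both sensors agree on a high angle of attack.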

    • by Luthair ( 847766 )

      It's still quite stable even without MCAS - just not quite as stable as required by regulation.

      A definition of unstable is not stable; if there is a requirement for something to be considered stable and it isn't met, then by definition it is unstable.

    • by jonwil ( 467024 )

      There are 3 big mistakes Boeing made. The first was the decision that (in order to save money and to have something out fast to compete with the A320neo) the new airplane had to have the same general size and shape (body, wings, tail, landing gear etc) as the old 737 rather than properly designing it so it could take the new engines without causing stability problems (hence the decision to fix the stability with software rather than redesign the hardware to make it go away).

      The second was (again to save money a

    • by JoeyRox ( 2711699 ) on Saturday April 20, 2019 @11:13PM (#58465932)
      The 737 Max test pilots who work for Boeing disagree with you. They were the ones who pushed to have MCAS engage in more flight scenarios after they found the plane's handling less than acceptable:

      "After the test flights began in early 2016, Boeing pilots found that just before a stall at various speeds, the Max handled less predictably than they wanted. So they suggested using MCAS in those instances, too, according to one former employee..."

      Source: NYT: Changes to Flight Software on 737 Max Escaped F.A.A. Scrutiny [nytimes.com]
    • #1 jumped out at me too, but for a different reason: isn't this common? I'm not an aircraft flight test engineer, so you'll have to correct me if I'm off base here, but aren't there a lot of intrinsically unstable designs which are used anyway thanks to automated stabilization? Rockets come immediately to mind, and flying wings. I imagine the Osprey as well.
    • The Max is not unstable. It just flies differently than the NG. Which means pilots need to do difference training. But Boeing wanted to sell it as it flies the same as NG (and requires no new training) so they added MCAS to artificially make it seem to fly the same. MCAS was never needed from a safety perspective. For a marketing need an extraneous safety feature was added and that feature malfunctioned and crashed the plane.

  • by fahrbot-bot ( 874524 ) on Saturday April 20, 2019 @09:28PM (#58465680)

    It is astounding that no one who wrote the MCAS software for the 737 Max seems even to have raised the possibility of using multiple inputs, ...

    It was on the list of things to do "tomorrow" but the scrums kept running long, so ...

  • by Anonymous Coward

    The OP is out of his league. It sounds like slander to me.

    I haven't heard anything about software being buggy.

    If the engineers and analysts provided the wrong specs, that's not the programmer's fault.

    If the software tester didn't find any bugs, then it's not the programmer's fault.

    If management knew the software was buggy, and let the product ship, then that was management's fault.

    None of this points to the programmer.

  • by richieb ( 3277 ) <richieb@g m a i l.com> on Saturday April 20, 2019 @10:02PM (#58465748) Homepage Journal
    Nice discussion by an airline pilot, also showing how this situation would look in a simulator [youtube.com]
  • by BobC ( 101861 ) on Saturday April 20, 2019 @10:06PM (#58465760)

    I've written software and been a systems engineer for aircraft instrumentation, and I'm very familiar with FAA standards at all levels, particularly at the certification level. I'm also familiar with the front-end of the process, the gathering and analysis/refinement of requirements.

    Part of the Boeing problem has been assigned to the existence of DERs, independent consultants/contractors who are certified to act on the FAA's behalf.

    Some seek "friendly" DERs willing to grease the certification path. My employer was different, instead pursuing good professionals who were total assholes when it came to FAA certification. We fired more DERs than we kept when they didn't know their shit. It's the difference between an accountant who will help you cheat on your taxes in ways the IRS won't see, and a CPA who's more ethical.

    We actually hired two DERs: one very senior (and expensive) as an auditor, and a junior one he recommended who was willing to work in the trenches with us. She was a real trooper.

    Our goal was to learn how to do FAA certification both faster and better (less wasted effort). Not faster and cheaper or easier or sleazier. We were a small company, and one mistake would be the end of us. Our DERs helped us completely redesign our internal certification system, costly the first time around, and a bargain thereafter. Lots of work, but great results. The FAA loved us.

    Boeing views FAA certification as just another step in another process. It's shameful that people have to die for a company to change its status quo.

  • Agile! (Score:5, Insightful)

    by McLae ( 606725 ) on Saturday April 20, 2019 @10:48PM (#58465862) Homepage
    This is the exact thing Agile is supposed to do.

    Do the first version fast to get it out the door making money.

    Fix the bugs in the next version.

    Creeping into every software project with the promise of faster to market, for more profits.

    They made their release date. Now time for next version with fixes.

  • It is astounding that no one who wrote the MCAS software for the 737 Max seems even to have raised the possibility of using multiple inputs, including the opposite angle-of-attack sensor, in the computer's determination of an impending stall. As a lifetime member of the software development fraternity, I don't know what toxic combination of inexperience, hubris, or lack of cultural understanding led to this mistake. But I do know that it's indicative of a much deeper problem. The people who wrote the code f

  • It's an engineering problem and one of simple greed. They stuck an engine under it that was too large for the airframe. As a result they moved it up, to where the intake was above the leading edge of the wings. This caused the plane to inherently pitch up under thrust. They should never have put that engine on that plane, and it wouldn't have required so much tweaking of the software on the plane. All of this could have been overcome if they had not decided to make critical instrumentation a paid-for upgrade.

  • by account_deleted ( 4530225 ) on Sunday April 21, 2019 @02:11AM (#58466262)
    Comment removed based on user account deletion
