News for nerds, stuff that matters

Richard Feynman, the Challenger, and Engineering

Posted by CmdrTaco on Wed Feb 20, 2008 11:28 AM
from the this-is-not-warm-fuzzies-on-a-cold-morning dept.
An anonymous reader writes "When Richard Feynman investigated the Challenger disaster as a member of the Rogers Commission, he issued a scathing report containing brilliant, insightful commentary on the nature of engineering. This short essay relates Feynman's commentary to modern software development."

  • I'm a software developer. I would like to think of myself as an engineer but to me that's a higher title that belongs to people who actually engineer original ideas.

    The problem with the shuttle disasters (both of them, really) is external pressures that are not in any way scientific. The pressure from your manager at Morton Thiokol to perform better, faster and cheaper. The pressure from the government to beat those damned ruskies into space at all costs.

    So this is really a case of engineering ethics: when do you push back? As a software developer, I never push back. Me: "There's a bug that happens once every 1,000 uses of this web survey, but it would take me a week to pin it down and fix it." My Boss: "Screw it, the user will blame that on the intarweb, just keep moving forward." But could I say the same thing in good conscience about a shuttle with people's lives at stake? No, I could not.

    So when an engineer at Morton Thiokol said that they hadn't tested the O-ring at that temperature on that fateful day, and that information was either not relayed or lost on its way up to the people at NASA who were about to launch, it wasn't a failure of engineering; it was a failure of ethics. External forces had mutated engineering into a liability, not an asset.

    And there's a whole slew of them [wikipedia.org] I studied in college:

    * Space Shuttle Columbia disaster (2003)
    * Space Shuttle Challenger disaster (1986)
    * Chernobyl disaster (1986)
    * Bhopal disaster (1984)
    * Kansas City Hyatt Regency walkway collapse (1981)
    * Love Canal (1980), Lois Gibbs
    * Three Mile Island accident (1979)
    * Citigroup Center (1978), William LeMessurier
    * Ford Pinto safety problems (1970s)
    * Minamata disease (1908-1973)
    * Chevrolet Corvair safety problems (1960s), Ralph Nader, and Unsafe at Any Speed
    * Boston molasses disaster (1919)
    * Quebec Bridge collapse (1907), Theodore Cooper
    * Johnstown Flood (1889), South Fork Fishing and Hunting Club
    * Tay Bridge Disaster (1879), Thomas Bouch, William Henry Barlow, and William Yolland
    * Ashtabula River Railroad Disaster (1876), Amasa Stone

    So I agree with Feynman's comments in relation to engineering and the further comments on software development. But I don't find them to be a fault in the nature of engineering, just a fault in our ethics. What do capitalism and competitiveness drive us to do? Cut corners, often.
    • by DBCubix (1027232) on Wednesday February 20 2008, @11:44AM (#22489542)
      The Kansas City Hyatt Regency walkway collapse was an engineering problem. The contractor asked to take a shortcut (instead of threading a nut up a three-story threaded rod, they asked to cut the rod and offset it several inches) and the engineers rubber-stamped it without checking what the ramifications would be. The engineering part was not originally flawed, but it was when they approved the change order.
      • by Sanat (702) on Wednesday February 20 2008, @12:49PM (#22490620)
        I stayed at this Hyatt over several different weekends while there was dancing and music on the ground floor. What would happen is that several individuals would get the walkways to start swaying and then reinforce the sway by shifting their bodies at the right instant, causing additional sway from the positive feedback. It was not unusual to experience 3 to 4 inches of sway.

        Although this swaying is not normally mentioned in the articles about the construction of the Hyatt, it went a long way towards weakening and stressing the connectors supporting the floors.

        Two of my friends were dancing on the floor when the walkways gave way and both were killed.

         
      • Re: (Score:3, Interesting)

        Same thing happened with the Citibank building in NYC - fortunately that error was caught by a student studying the plans!
      • by russotto (537200) on Wednesday February 20 2008, @01:45PM (#22491552) Journal

        > The Kansas City Hyatt Regency walkway collapse was an engineering problem. The contractor asked to take a shortcut (instead of threading a nut up a three-story threaded rod, they asked to cut the rod and offset it several inches) and the engineers rubber-stamped it without checking what the ramifications would be. The engineering part was not originally flawed, but it was when they approved the change order.
        Right, except that the original design wouldn't have worked, as the integrity of the threads could not have been maintained during construction and thus the nut could not have been put on. So in software terms it was a last-minute patch to fix a show-stopper, which wasn't adequately unit-tested.
        • I've just looked at the Wikipedia article (and sketch) showing the defect. As an engineer (yes, a real one, albeit mechanical discipline), all I can say about what was done is: what an unimaginative solution.

          They could have still split the threaded rod under the upper walkway, and re-joined it with a threaded coupling, just below the nut supporting the upper walkway. If the nut can support the upper walkway, then the threaded coupling could easily support the lower walkway.

          In my experience, the solution u
        • Threading also removes material from the rod, and improper threading will cause fissures in the material, which under stress cause the material to fail.
          This was a combination failure. Like most failures, it required many things to come into alignment before the disaster occurred. The Space Shuttle and the Sky Bridge did not fail because of one thing; several factors came together simultaneously, and then the disaster occurred. If any one of these factors were to be mitigated or removed then t
    • by Vicious Penguin (168888) on Wednesday February 20 2008, @11:48AM (#22489604)
      > What does capitalism and competitiveness drive us to do? Cut corners, often.

      Maybe, but remember what your own example shows: what is the cost/benefit of fixing or preventing an error? Is a week of debug time worth missing your target ship date? Maybe, maybe not; it depends on the error.

      A blanket indictment of capitalism is quite unfair. You would still have the same cost/benefit analysis regardless of the economic system you toiled under.

      It is not possible to engineer against all eventualities; trying to do so will usually keep you from ever getting off the ground.
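The cost/benefit question above can be made concrete with a rough expected-value comparison. This is a minimal sketch, not anyone's actual method: it assumes a defect fails independently at a fixed rate per use and that each failure has a single average cost. The function names and every number below are hypothetical.

```python
def expected_failure_cost(failure_rate: float, uses: int, cost_per_failure: float) -> float:
    """Expected cost of shipping with the defect: (failures per use) * uses * cost per failure.

    Assumes failures are independent and every failure costs the same on average.
    """
    return failure_rate * uses * cost_per_failure


def worth_fixing(failure_rate: float, uses: int, cost_per_failure: float, fix_cost: float) -> bool:
    """Return True if the expected cost of leaving the defect in exceeds the cost of fixing it."""
    return expected_failure_cost(failure_rate, uses, cost_per_failure) > fix_cost


if __name__ == "__main__":
    # Hypothetical numbers: a 1-in-1,000 defect, 10,000 uses, one week of work valued at 5,000 units.
    rate, uses, fix = 1 / 1000, 10_000, 5_000.0
    # Web survey: a failure costs ~50 units of annoyance, about 500 expected, so keep moving.
    print(worth_fixing(rate, uses, 50.0, fix))
    # Same defect where a failure costs ~1,000,000 units: about 10,000,000 expected, so fix it.
    print(worth_fixing(rate, uses, 1_000_000.0, fix))
```

With the same failure rate, the decision flips purely on what a failure costs, which is the "depends on the error" point; for safety-critical systems the cost-per-failure term can dwarf any realistic fix cost.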
      • by Protonk (599901) on Wednesday February 20 2008, @12:01PM (#22489800) Homepage
        This is true to an extent, but safety concerns can and should be engineered for. You are absolutely right that there is no direct correspondence between debugging software for some non-critical application and meeting safety margins for a critical product. However, some software IS critical: flight software (this portion of Feynman's essay about NASA's flight software is amazing), software for hospital applications (pharmacy, PCAs, microsurgery), ABS/suspension control software. Those are applications with VERY critical outcomes. Safety concerns need to be built into the process.

        But I do agree that tradeoffs occur under any system. Those tradeoffs just let us make better decisions under capitalism whereas we can't allow the information from those tradeoffs to inform us economically in a socialist system.

          • by Protonk (599901) on Wednesday February 20 2008, @04:59PM (#22494504) Homepage
            It's not a random assertion at all. It's a foundation of economics. The world is full of information particular to place and time, in other words, the nitty-gritty. If you were to make a statistical model of part of the world, that stuff would get buried in the "other" term. Unfortunately, where there is a lot of "other" it becomes hard to model. Take, for instance, who to give cars to. Should I hold a survey and have the outcome determine who gets the car? Should I give the car to someone who needs it the most or will use it the most effectively? How do I judge that? How do I stop people from lying to me? I could, alternately, just sell the car to someone for an agreed-upon price. That means I learn at least how much it is worth to them (it may be worth more) and the car goes somewhere. Prices transmit information and preferences better than any five-year plan or government study. Sometimes markets have failures and those need to be dealt with, but that is not what I am talking about.
    • Re: (Score:3, Interesting)

      There are other disasters that don't stem from the profit motive:

      Loss of the USS Thresher during initial sea trials.

      Steam Line Rupture on the USS Iwo Jima.

      Both of those were caused by engineering faults (the first) or procurement faults (the second).

      The Thresher was lost with all hands due to (among other things) a failure in modeling the high-pressure air system and inappropriate welds on seawater systems.

      The Iwo Jima suffered a steam line rupture that killed a few guys because the wrong material was used on a high pres/temp
    • Re: (Score:3, Informative)

      There is a point you miss there, I think. It is the top-to-bottom design philosophy vs. the bottom-to-top one. The first gives objectives first, then designs every part so that it fulfills the general objective. The latter focuses on designing simple elements and assembling them into more complex elements with defined capacities and known weaknesses.

      This article states that the second approach is inherently better than the top-to-bottom approach. This is clearly an engineering problem. I am not sure I agree wi
    • In order to have a real sense of the "nature" of engineering, you have to look at more than the failures. You have to look at the successes that occurred in the midst of these same pressures. I'd start by looking into the Manhattan Project, in which Feynman played a part. The exercise of finding other examples is left to the reader.
    • I'm a software engineer too. However, I've worked on projects where a software failure could get people killed or destroy hundreds of millions of dollars of "stuff". For example, the software might be processing radar data inside a little gadget that flies at Mach 4 and carries an explosive warhead. In those cases you don't just say "the user will blame the bug on The Internet" and let it go.

      The thing with software is that it is such a wide field. If you are writing a web-based survey program, so what i
    • Re: (Score:3, Insightful)

      > I'm a software developer. I would like to think of myself as an engineer but to me that's a higher title that belongs to people who actually engineer original ideas.

      Well I know I'm missing the point of your post with this, but a quick google comes up with this description of an engineer:

      a person who uses scientific knowledge to solve practical problems

      I think your higher title should be an 'inventor'. Engineers are the guys that generally plod away using well tested mechanical or other scientific knowledge to get everyday jobs done (just like a software engineer really?). I work as IT support/coder for a bunch of engineers here and while they sometimes may be using old ideas in new ways, most of their work is just that plodding awa

      • Don't tell that to the "professional engineers", though. Their heads will fly off if they're among the 80% of them who think that "software engineer" is tantamount to blasphemy.

    • Re: (Score:3, Insightful)

      Blaming the shuttle disaster on capitalism is erroneous. I do not necessarily disagree with your assessment in general, but capitalism was not at fault in that particular instance. What was at fault was bureaucrats trying to look good to their superiors and present a positive public image at the cost of real engineering.

      I would say that, in general, is the meta-problem, not capitalism. In its current form in the US, capitalism has caused the existence of many large entities that use hierarchical systems of

      • by esocid (946821) on Wednesday February 20 2008, @12:09PM (#22489946) Journal
        Apparently you've never taken engineering ethics. It was the first class I had to take as a general engineering major. Needless to say, I changed majors, but I still got a hell of a lot out of that ethics class. The parent was right. These were all cases of cutting corners, either in terms of cost or time. Managers wanted it done quickly and cheaply, whether that meant mixing concrete improperly, buying sub-par materials, or just ignoring what the engineers were telling them. It always came down to about 95% managerial error and the rest engineering error.
  • wow (Score:5, Funny)

    by loconet (415875) on Wednesday February 20 2008, @11:36AM (#22489394) Homepage
    For a second there I thought I read "Rogers Communications" and "brilliant" and "engineering" in the same sentence. I thought I had been kicked to an alternate universe where I wouldn't be able to escape. I am glad to be back.
  • Did anyone get through before the story hit the front page? I'd be interested in reading, but Google doesn't have a cached version of the story.
  • by StarfishOne (756076) on Wednesday February 20 2008, @11:39AM (#22489454)
    "A future essay relates Feynman's commentary to modern web hosting, load balancing and the so-called Slashdot effect."
  • Mirror (Score:3, Informative)

    by fishdan (569872) on Wednesday February 20 2008, @11:40AM (#22489480) Homepage Journal
    http://duartes.org.nyud.net/gustavo/blog/post/2008/02/20/Richard-Feynman-Challenger-Disaster-Software-Engineering.aspx [nyud.net] As a side note, could someone make a Greasemonkey script to make all links from /. run through Coral? It just makes sense.
  • by Protonk (599901) on Wednesday February 20 2008, @11:44AM (#22489538) Homepage
    To be fair, the Challenger disaster actually preceded NASA's slogan and procurement policy of "faster, better, cheaper" by a bit. More to the point, Feynman's article should be a cautionary tale to ANYONE in an engineering field. It isn't a matter of one field being subject to unscientific pressures and another field being immune. No technology or industry is immune from the pressures and problems that caused the Challenger disaster. Anyone who claims to be so well adapted to safety concerns that they need not spend lots of time and effort on them is foolish. The nuclear industry still has to practice strong QC on parts, procedures and maintenance, and CONTINUE that practice. Same with commercial aviation, acute medical care, etc. Constant vigilance is rewarded only with another uneventful day. That is the fundamental problem. Vigilance is expensive and time-consuming. These are not pressures from the profit motive. They apply to government as well as civilian ventures.
    • Yes, like it or not, cost analysis and time to market are integral to engineering. Finding the correct balance is what makes a great engineer.
      • Right, but these tradeoffs exist everywhere. If the engineering team had been allowed to do their work appropriately, they would have assessed the 2 pc O-ring and rejected it based on safety concerns. The fact that it came from a lower bidder isn't really prima facie evidence that capitalism caused the Challenger accident. :)
  • by sphealey (2855) on Wednesday February 20 2008, @11:44AM (#22489540)
    (I will refrain from a four-step Profit post). Standard technique: latch on to an essay by a brilliant and insightful person. Extend the insights of that person slightly into a different field with usual compare-and-contrast, brand-extension writing techniques. Claim that resulting essay (and self) are as insightful as the original essayist.

    It doesn't work 99.994% of the time, generally because very few people are as insightful as the original brilliant person.

    sPh
    • Re: (Score:3, Interesting)

      Good point. I would suggest reading up on Dr. Feynman as a precursor. Or, for those who prefer the flickering screen, there are several video interviews with the great man. One from Horizon called "The Pleasure of Finding Things Out" is VERY watchable. Also his book "Surely You're Joking, Mr. Feynman!" is a hoot! Highly recommended. Richard Feynman was one of the greatest safecrackers who ever lived and one of the top 10 minds of the 20th century.
    • While most commentaries on brilliant analysis are not brilliant, a few are.

      Edward Tufte's analysis of Dr. Feynman's brilliant analysis is brilliant, warranting a full chapter in Visual Explanations [amazon.com]. What makes it special is that it is not "hey, yeah, that's a good idea, I'm smart too" but instead a study of why Dr. Feynman's analysis is brilliant.
      • Re: (Score:3, Informative)

        I don't have my copy of Visual Explanations handy, but I've read it and I was at a talk Tufte gave on this subject, and my recollection of it is rather different. Without directly criticizing Feynman, Tufte actually comes up with a significantly superior analysis of the root cause of the disaster. Feynman spread the blame around many places, finding bad science, bad engineering, inaccurate statistics, poor procedures and documentation, politics influencing design, and most importantly and famously, a disconn
  • Hm. (Score:5, Insightful)

    by gardyloo (512791) on Wednesday February 20 2008, @11:51AM (#22489630)
    The blog post makes a nice contribution by linking to Feynman's original thoughts (for example, here: http://www.ranum.com/security/computer_security/editorials/dumb/feynman.html [ranum.com] ), ones I haven't read for a long time (and was happy to be reminded of). However, the author makes the mistake of thinking that the original thoughts need to be interpreted and summarized for the reader. Feynman's words by themselves are simple to understand, are concise, and contain just the tone for which geeks go gaga. Anyone interested in the subject will be able to make his or her own judgements about the engineering and politics involved in the Shuttle development, engineering in general, and the extensions to software development.
    • Re: (Score:3, Insightful)

      This is a very good point. Feynman has the unique quality of startling intelligence, curiosity, and straightforwardness. Some authors need to be summarized. Feynman just needs to be trotted out every generation or so.
      • Re: (Score:3, Informative)

        I agree, and tried not to summarize at all. Mostly I just tried to link what Feynman said to software, rather than make a fool of myself paraphrasing him. That's also why the entry is really short, and basically tells people to go read the source :) cheers.
        • Oh absolutely. I can't read the article right now. :) But I'm not going to crucify you for making the parallels. I remember reading the chapters about NASA's flight software testing and getting goosebumps. It's THAT good. I think you are right for making that parallel and suggesting its relevance. There are a fair number of coders alive today who weren't adults when Mr. Feynman was alive, sadly.
  • And here I was on the verge of releasing my twin papers on how the 9/11 Commission Report can be applied to software development, and how the Warren Commission Report on the Kennedy assassination applies to P2P.
  • Surely You're Joking (Score:3, Interesting)

    by Yoweigh116 (185130) <yoweigh AT gmail DOT com> on Wednesday February 20 2008, @12:00PM (#22489782) Homepage Journal
    Offtopic, but I highly recommend Surely You're Joking, Mr. Feynman [amazon.com], the autobiography he narrated on his deathbed. It's got some great stories in it, like when he surreptitiously went around picking locks at Los Alamos or his personal recollections of the Trinity nuclear tests.
    • It isn't really off-topic. I think the essay in question comes from the other volume (What Do You Care What Other People Think?). Both are outstanding books and well worth the shelf space.
  • I'm not sure if he is stating that a bottom up testing method is readily available in all situations, but it sure is a hell of a lot easier with data rather than with physical designs. Scanning and testing code is much easier than building a CPU and testing it from the bottom up (not that I ever have). He does make the distinction that it is less costly in the long run, and I'd probably agree with him, not from experience with this particular application, but experience in general with preventative maintena
    • For critical applications, bottom up design is not impractical. It is impractical for non-critical applications. Even with physical applications, bottom up design has some clear advantages.

      I do not personally feel that one of those advantages is overall cost savings. I think that most top-down design programs are cheaper overall than their bottom-up counterparts (all things being equal). However the benefit in terms of clear and understandable safety margins is almost impossible to replicate.

      Easy exampl
  • by Martin Spamer (244245) on Wednesday February 20 2008, @12:18PM (#22490080) Homepage Journal
    The biggest problem is that most software developers are NOT chartered professional software engineers, so they have no personal, professional or legal responsibility for their work. That is why IT is full of cowboys and trust is nearly nonexistent. Software engineering must become a chartered-only profession, so that people who are not chartered are not allowed to practice.

    To qualify as Professional Engineers, we should place good practice above short-term gains. Professional Engineers should be truthful and objective and have no tolerance for deception or corruption. Professional Engineers only work in areas where they are competent. Professional Engineers build their reputation on merit, their skills through continual learning, and the skills of their charges through ongoing mentoring.

    We wouldn't have to put up with the shoddy work of cowboys, because they wouldn't be allowed to practice. We wouldn't have to put up with orders that counteract professional ethics or good practice, because legal responsibility trumps commercial pressures. The professional wouldn't be undermined by fast-to-market but poor-quality work. We could place trust in third-party tools, software and services, and we would not have to put up with EULAs that disavow responsibility for damage.
    • Your heart's in the right place, but it would not and cannot work.

      Why? Simply - an excess of demand and a shortage of resources. There is simply too much demand for software development and there aren't enough Computer Science curricula in existence to meet that demand.

      And this is coming from a degreed engineer. Not a licensed professional, however. Yeah, I took and passed the EIT, but never went for the PE. Why? In my original field, telecommunications, there never was any requirement at any of my employer
  • They said that the management at NASA didn't want to cancel the flight of the Challenger because it was such a high-profile launch, even though they were warned about the O-rings.
  • Maybe there will be some sunny day when I will listen to what Linus Pauling says about vitamin C, what Fomenko [wikipedia.org] says about history [wikipedia.org] and what Richard Feynman says about programming.

    But that day is not today.
  • by gosand (234100) on Wednesday February 20 2008, @01:43PM (#22491510) Homepage
    I've been in software quality and testing for 14 years. I've worked at very large corporations as well as startups. There is a WIDE gap in software development process in our industry. Many people like to call themselves software engineers when they are developers. There is a huge difference. Engineering is a discipline that follows well-defined rules, and it usually takes time. But I think the very important thing to point out is that some software requires engineering and other software does not. If I go into a startup company that is trying to develop a blog/wiki site and try to implement a NASA-like software development methodology, they will fail. On the other hand, software to control a heart monitor should be engineered and closely controlled. Sometimes quality and perfection are the goal; other times it might be time-to-market that is critical. You have to fit the process to your business. A bridge is a bridge, and they should all be engineered pretty much in the same way. You can't say the same thing about software.

    I think that this is a very key point to software development. I have seen companies who spent entirely too much time and money trying to eliminate all defects from their software when it wasn't the critical part of their business. Yes, we should always strive to eliminate defects, but you can't get them all. You have to know when to pick your battles, and when to accept the risks. If we're talking about life-or-death software, or security, or other very critical things - you need to focus on those.

    There's a grid I have seen used that is a great tool when doing projects.
    Schedule, Cost, Quality, Scope.
    One can be optimized, one is a constraint, and the other two you have to accept. Period. It is a more useful version of the "fast, good, cheap - pick two" rule.
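The grid can be read as a rule that any project plan must satisfy. A minimal sketch, assuming a plan is simply a mapping from each of the four dimensions to one of three roles; the `Role` enum, `validate_plan` and `all_valid_plans` are hypothetical names, not from any real tool:

```python
from enum import Enum
from itertools import permutations


class Role(Enum):
    OPTIMIZE = "optimize"      # the one dimension you push as hard as possible
    CONSTRAINT = "constraint"  # the one dimension that is fixed and cannot move
    ACCEPT = "accept"          # dimensions whose outcome you take as it comes


DIMENSIONS = ("schedule", "cost", "quality", "scope")


def validate_plan(plan: dict[str, Role]) -> None:
    """Raise ValueError unless exactly one dimension is optimized and exactly one is the constraint.

    With four dimensions and three roles, that forces the remaining two to be ACCEPT.
    """
    if set(plan) != set(DIMENSIONS):
        raise ValueError(f"plan must assign a role to each of {DIMENSIONS}")
    roles = list(plan.values())
    if roles.count(Role.OPTIMIZE) != 1:
        raise ValueError("exactly one dimension can be optimized")
    if roles.count(Role.CONSTRAINT) != 1:
        raise ValueError("exactly one dimension must be the constraint")


def all_valid_plans() -> list[dict[str, Role]]:
    """Enumerate every plan allowed by the grid: choose an ordered (optimize, constraint) pair."""
    plans = []
    for optimized, constrained in permutations(DIMENSIONS, 2):
        plan = {d: Role.ACCEPT for d in DIMENSIONS}
        plan[optimized] = Role.OPTIMIZE
        plan[constrained] = Role.CONSTRAINT
        plans.append(plan)
    return plans


if __name__ == "__main__":
    # Hypothetical example: a startup that optimizes schedule while holding cost fixed.
    startup = {"schedule": Role.OPTIMIZE, "cost": Role.CONSTRAINT,
               "quality": Role.ACCEPT, "scope": Role.ACCEPT}
    validate_plan(startup)
    print(len(all_valid_plans()))  # 12
```

Enumerating the valid assignments gives 4 × 3 = 12 possible project profiles. The blog startup and the heart monitor from the comment would simply pick different ones, for example optimizing schedule in one case and quality in the other.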
  • by wannabegeek2 (1137333) on Wednesday February 20 2008, @03:37PM (#22493264)
    I work in the aerospace industry, specifically an airline, as a manager of an Engineering subgroup. (if "manage" is what you call what I do)

    One of the first things I have a new hire do is read Feynman's appendix to the Challenger Report. Primarily to instill a respect for dealing with data, not desires or pressures, and to (re)enforce the concept that "it worked last time", does NOT make it right or safe to do the same thing again.

    The pressure / desire from above or from parallel organizations within the company is constant, and usually precipitated by the latest operational interruption. All too frequently the refrain is along the lines of "but last time you authored a deviation, this is only a little bit more". When I feel the pressure is starting to cause situational ethics creep, I pull out Feynman's appendix and read it myself, or have the affected person on my staff read it.

    It is amazing how effective it is in restoring sanity, and a healthy respect for the ability of the hardware to kill you (and / or your customers).

    Richard Feynman gave many things to this world, and especially certain segments of it. It's my opinion however that one of his best and most unsung gifts was the Challenger Report Appendix. It should be required reading for ANYONE who will ever touch or direct action on hardware that could even remotely present a potential for injury or death.

    The message was not rocket science, but as the Columbia accident proved the rocket scientists still can't get it right.
    • I scanned TFA and I'm not sure he has a clue about Linux, IMHO.
      He appears in the 5th panel of this very cool webcomic [dresdencodak.com] and you don't!

          • Re: (Score:3, Interesting)

            I had never heard of Dresden Codak before this post but am now getting hooked while going through the archive. I think it's hilarious, but then I grew up in Los Alamos...

            The linked comic is funny in a postmodern way (wondertwins vs. historical quantum theory) and the art is fantastic. A lot better than I could ever do.