Transportation IT

The Shameful Open Secret Behind Southwest's Failure? Software Shortcomings (nytimes.com) 159

Computer programmer Zeynep Tufekci now writes about the impact of technology on society. In an opinion piece for the New York Times, Tufekci writes on "the shameful open secret" that earlier this week led Southwest Airlines to suddenly cancel 5,400 flights in less than 48 hours. "The recent meltdown was avoidable, but it would have cost them."

Long-time Slashdot reader theodp writes that the piece "takes a crack at explaining 'technical debt' to the masses." Tufekci writes: Computers become increasingly capable and powerful by the year and new hardware is often the most visible cue for technological progress. However, even with the shiniest hardware, the software that plays a critical role inside many systems is too often antiquated, and in some cases decades old. This failing appears to be a key factor in why Southwest Airlines couldn't return to business as usual the way other airlines did after last week's major winter storm. More than 15,000 of its flights were canceled starting on Dec. 22, including more than 2,300 canceled this past Thursday — almost a week after the storm had passed.

It's been an open secret within Southwest for some time, and a shameful one, that the company desperately needed to modernize its scheduling systems. Software shortcomings had contributed to previous, smaller-scale meltdowns, and Southwest unions had repeatedly warned about it. Without more government regulation and oversight, and greater accountability, we may see more fiascos like this one, which most likely stranded hundreds of thousands of Southwest passengers — perhaps more than a million — over Christmas week.

And not just for a single company, as the problem is widespread across many industries.

"The reason we made it through Y2K intact is that we didn't ignore the problem," the piece argues. But in comparison, it points out, Southwest had already experienced another cancellation crisis in October of 2021 (while the president of the pilots' union "pointed out that the antiquated crew-scheduling technology was leading to cascading disruptions.") "In March, in its open letter to the company, the union even placed updating the creaking scheduling technology above its demands for increased pay."

Speaking about this week's outage, a Southwest spokesman concedes that "We had available crews and aircraft, but our technology struggled to align our resources due to the magnitude and scale of the disruptions."

But Tufekci concludes that "Ultimately, the problem is that we haven't built a regulatory environment where companies have incentives to address technical debt, rather than passing the burden on to customers, employees or the next management.... For airlines, it might mean holding them responsible for the problems their miserly approach causes to the flying public."
This discussion has been archived. No new comments can be posted.

  • by skogs ( 628589 ) on Saturday December 31, 2022 @06:49PM (#63171194) Journal

    I don't think we need to press some stupid government oversight into this. This is typical mismanagement, and happens to punt them square between the legs as it should. Yes customers were impacted ... but more useful than a thousand angry phone calls and a government writeup or fine is the fact that they literally made ZERO dollars during this busy time of the year. They lost money. Aircraft that aren't moving don't make money.
    Refunds.
    Hotel reimbursements.
    Lost luggage deliveries.

    I even had a guy pop on a plane in Colorado Springs for a short hop to Denver. From Denver he was supposed to go to Texas. The Texas trip didn't happen. He sat at a gate for HOURS watching as the plane he was supposed to get on sat a few hundred yards back from the gate, because they didn't have the brainpower available to sit a pilot into the seat of the other aircraft and move it the hell out of the way ... or use a different gate to disembark passengers on. He ended up cancelling the whole thing, refunding, and paying for a damned Uber to drive him an hour south back home.

    This is industry correcting itself. You screw up big enough and it hurts your wallet. Ignore it and it hurts more later. Then Southwest simply goes out of business and fire-sales all their assets to a reasonable company that can manage to...manage...properly.

    • by youngone ( 975102 ) on Saturday December 31, 2022 @07:13PM (#63171224)
      Southwest got $7 billion in bailouts during covid and spent it on share buybacks like all those big companies did. They also gave their CEO a massive pay rise.
      Let's hope they're allowed to go broke.
      • Re: (Score:3, Informative)

        by david.emery ( 127135 )

        Those payments were to keep the employees on the job. How many layoffs did Southwest have during Covid times?

        Sure, I'd definitely go for some more regulation that specifically governs responsibilities, even in the face of weather or FAA problems, perhaps catching up to the EU's more comprehensive set of regulations governing these kinds of situations and airline responsibilities. But that's -independent- of the money that was pumped to keep the airline industry and their employees going.

        I do NOT want the

        • by thegarbz ( 1787294 ) on Saturday December 31, 2022 @09:01PM (#63171434)

          Those payments were to keep the employees on the job. How many layoffs did Southwest have during Covid times?

          Irrelevant. A share buyback is a good indication that they didn't need the extra money to pay employees.

          But to your EU comment, there are no regulations in the EU that govern this specific issue. There are airlines in Europe with just as flaky management systems in place, and there have been plenty of spectacular failures like this as well. The EU places more financial pressure on airlines though since they are on the hook if you get to your destination more than 3 hours late due to the airline's fault.

          • Re: (Score:2, Interesting)

            by david.emery ( 127135 )

            I'm advocating for passenger rights. You're advocating for government management of a business. Not the same.

            I do take your point about how Southwest spends its money, and that's more an indictment on the lack of provisions on the government funding than it is about how the company spent the funds.

            • You're advocating for government management of a business.

              I'm advocating for nothing; try to pay attention to who you are talking to, we're not logged in as AC. The EU's regulations are passenger rights and they are a good thing that the USA should follow.

              But they do nothing for this problem other than provide some very minor (in the grand scheme of risk management) financial incentives.

        • I do NOT want the US to start following the EU's rules for 'how to design systems.' Southwest will suffer enough in both direct costs and reputation, they don't need government to start managing Southwest's internal IT.

          The EU mandates compensation for delayed or cancelled flights - although I don't know what happens if the weather or a volcano is the direct cause - but they certainly don't make any rules as to how the IT systems are designed. One exception here, they get very upset if personal information

      • Re: (Score:3, Insightful)

        by CWCheese ( 729272 )
        Whoever modded the comment Insightful is delusional; those funds were for maintaining employment, not for capital improvement. Had SWA spent the 7B on a new scheduling system, the same complainers would be complaining they misspent money that could have saved jobs during the #WuhanCoronaVirus BioWeapon scare.
        • Re: (Score:2, Troll)

          by Rick Zeman ( 15628 )

          Whoever modded the comment Insightful is delusional; those funds were for maintaining employment, not for capital improvement. Had SWA spent the 7B on a new scheduling system, the same complainers would be complaining they misspent money that could have saved jobs during the #WuhanCoronaVirus BioWeapon scare.

          So the stock buybacks were to preserve the finance department and the lawyers' jobs?

          • ...those funds were for maintaining employment, not for capital improvement..

            And yet they spent lots of it on share buybacks.

    • by Local ID10T ( 790134 ) <ID10T.L.USER@gmail.com> on Saturday December 31, 2022 @07:15PM (#63171230) Homepage

      yeah... the conclusion in the article/summary of "We need MOAR GOVERNMENT REGULATION" is bullshit.

      More government regulation does not solve technical debt.

      We all know technical debt builds up because it costs money to dedicate developer time to fix the issues. So they get ignored until they become critical.

      Southwest just got proof that this problem is critical. It cost them a lot of real money, and a big reputation hit -which will cost even more money to correct.

      Every business has a problem with technical debt. Most just get lucky and don't get publicly smacked in the face with theirs.

      • by brxndxn ( 461473 )

        Totally agreed.. Maybe we need some more government regulation about how the media (including fucking Slashdot) tends to push government expansion propaganda in the news. In a true free market, the government would regulate something like safety standards, publish scheduling and pricing stats, and that's about it.. Then, we, the consumers, would be able to choose our flights based on price and airline performance.

      • Comment removed based on user account deletion
      • That argument only goes so far, how many routes are only served by one airline? Obviously they tend to be the smaller ones but if I'm travelling between Hicksville and End-of-the-world, I want to use one airline if possible so that it is their responsibility to sort it out if a flight is delayed or cancelled.
        As an extreme example, Delta use (or used?) CIN as a major hub. That was fine for US nationals but a total PITA for foreigners entering the country because US Immigration only ran two lines for non-ci

      • More government regulation does not solve technical debt

        Indeed. The government itself is a poster child for technical debt, running its own systems on antiquated hardware and software.

      • Government regulation does not mean micromanaging an airline.

        Govt, through the FAA, can specify the rights of passengers: from being bumped off a flight, to delays, to cancellations, to minimum seat width/pitch, to restrooms in the plane and the availability of drinking water and food on longer flights. Just specify these are the rights of the passengers, and the airline must carry third party insurance to meet the liabilities.

        We can even think of credit card companies and other travel services companies that pay o

      • Technical debt is more than just fixing the issue. Technical debt is very often from really crappy designs done too quickly. Sometimes in the startup days when staffing was based on hiring your out of work friends, sometimes it's from wanting to beat the rest of the crowd to be first to market. But the end result is very often crap that is treated like precious scriptures that shall not be modified.

        Sometimes this is just normal business thinking: new features that are a bit shoddy will make us money, fixing

    • Absolutely!! Indeed, the mere fact that people are calling for regulation here literally just made me substantially more skeptical of regulation.

      This is essentially the *ideal* example of a case where the free market is working well. We have a very competitive market with little consumer lock-in where the individuals hurt by bad service are the people deciding what flight to purchase. People are going to be substantially less likely to buy tickets on Southwest for a while after this even ignoring all the lost revenue for those two days.
      Government regulation of this kind of technical backend would risk introducing a single point of failure and making things worse.

      If you want to impose government regulation in a situation like this then what shouldn't be regulated?

    • by BetterSense ( 1398915 ) on Saturday December 31, 2022 @07:22PM (#63171244)
      The problem is, the airlines as a group have no effective competition and they know it. Due to economic policy that hollowed out passenger rail in the 20th century exactly when it should have been invested in, the US is the only "developed" country without passenger rail, even for extremely train-friendly routes like Dallas to Houston, which city pair alone is the perfect distance for a train, and has over 100 flights a day, but we still can't build a train, because the US has billions for roads and roads only. So if you want to get somewhere, you can either drive on increasingly rickety, dangerous and crowded road infrastructure, or fly. The airlines aren't competing against anything. You are going to fly anyway, and they know it.

      Due to our government's radical insistence on funding roads and roads only, our only method of travel for medium and long distances is to burn jet fuel, even on medium distance routes that would be literally faster by train to start with. Anything happens to the oil supply, and our country is fucked. Not to mention from a climate POV we have committed to burning megatons of jet fuel when we could be riding in electric trains on routes that favor them. Brought to you by the same incompetence-as-smokescreen-for-corruption economic geniuses that decided you should lose your medical insurance if you get fired.
      • Passenger rail, as in Amtrak? That was so bloody inefficient!
      • Given the spectacular boondoggle that the LA San Francisco line has proved to be,

        https://www.theguardian.com/us... [theguardian.com]

        It's not clear that America can do trains. You're right of course in theory, but replicating the French experience with the TGV is obviously not easy.

        However it's easy to ignore the trains that do work in the US; the Boston DC route is a clear success - though could be better - whilst the amount of commuter rail always comes as a surprise. And of course Canada's record on long distance passenger

    • by fermion ( 181285 )
      Software is written to express business rules. Business rules prioritize regulations for which a company can be held accountable. So when we were stuck on a plane for hours on a tarmac, rules changed to make airlines accountable for that.

      Airlines are not accountable for weather. It is important for SW to say that staff and planes were available to push this only as a weather issue. But to prevent future cascading failures, there has to be some accountability for not minimizing the effect of weather on fl

    • Govt should not be dictating or micromanaging stuff.

      But it does have a role to play in preventing the race to the bottom.

      FAA can specify minimum seat pitch and minimum seat width and minimum arm rest width.

      FAA can specify the rights of the passengers when it comes to over booking, bumping off and rescheduling based on cancellations. It can demand airlines to carry liability insurance to pay compensation to passengers. Let the private insurance company and the airlines negotiate how best to ensure pass

  • It's true (Score:5, Interesting)

    by rmdingler ( 1955220 ) on Saturday December 31, 2022 @07:03PM (#63171214) Journal

    The system had '90s software running, according to some reports, and the airline was unable to contact flight personnel when the implosion was at its peak. Pilots and flight staff were stuck in airports with stranded passengers.

    Southwest, though, created a niche market for itself by deviating from the Hub and Spoke [wikipedia.org] model of most mainstay carriers. TLDR: If you were flying from a medium sized market to another medium sized market, you could fly direct without a layover in Houston, Chicago, Dallas, Washington, or NY/NJ.

    The fly in the ointment, as it were, is that when the arctic storm hit right before the holidays, Southwest's employees were scattered to East Hell and back, rather than concentrated around a few major airports. Logistically, they were proper fucked.

    • So the software is antiquated, is it?

      But if their entire operations model depended on getting crews to "commute" to far-flung airports, and these flight attendants and pilots were themselves stranded far from where they were supposed to be, along with all of the other storm-affected travellers, how is some fancy, not-30-year-old software supposed to get them out of this logjam?

      This carping about software on Slashdot is like professional groups for civil engineers who design bridges and tunnels screami

      • Of course. Folks bitch and moan about the things to which they are best acquainted. Don't get me started on the brother-in-law.

        Reconnecting with employees and rerouting stranded passengers would've been easier with updated software, but the house had lost too many cards to recover from the disaster.

        The major airlines (Delta, American, United) have expensive agreements in place to expedite the placement of a disenfranchised passenger on a competitor's flight in the event of an individual stranding. Southw

      • They were unable to take advantage of crew and equipment where it sat to either reduce the magnitude of the problem or reposition remaining crew to where they needed to be. Essentially they waited a week until people should generally be where they were supposed to be to re-start. They front-ran a few flights though which helped them reposition crew that were in the wrong spots and get a few passengers moving early.

        On Monday the reaction in the industry was there isn't much they can do until Friday.

        Better

    • by skogs ( 628589 )

      This is a bullshit excuse. There were no emergency landings due to snow.

      You want to know where your pilots, ground crew, etc are? Right where they landed. The only real issue was that they let a busted ass computer tell them what to do. You're a pilot, you look outside and see a plane you're authorized to fly....get the hell in it and fly. That was the 60s style...and it wasn't hub/spoke then either.

    • Nothing wrong with 90s era software. Banks use software written in the 1960s. It’s been thoroughly vetted. Flight numbers are six digits long because of the IBM S/360 mainframe that managed them.

      • I was gonna say... I didn't know bits and digits could get old.

      • by Bert64 ( 520050 )

        Depends how many other things have changed since the software was written, often things that were never even considered at the time.
        You can quite often reach a situation where software no longer works, and requires major changes in order to keep functioning. Even a lot of small incremental changes can result in a messy and unstable patchwork.

        • For Southwest, the issue appears to be that they just don't have software to handle basic tasks, it's manual.

          Even if you aren't tracking a crew's location, it would be pretty easy to have a self-service portal where they can report their location instead of having to make a phone call to the department that inputs that data and sit on hold for hours (per reports). It looks like they just ignored this kind of automation.
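As a rough illustration of the self-service reporting idea above, here is a minimal sketch in Python. Everything in it is hypothetical: the `CrewLocationBoard` class, its method names, and the crew IDs are invented for illustration and say nothing about how Southwest's systems actually work. It only shows that "latest self-reported location per crew member" is a small data structure rather than a phone queue.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class LocationReport:
    crew_id: str
    airport: str          # IATA code such as "DEN" (assumed format)
    reported_at: datetime


class CrewLocationBoard:
    """Hypothetical store keeping only the newest self-report per crew member."""

    def __init__(self):
        self._latest = {}

    def report(self, crew_id, airport, when=None):
        when = when or datetime.now(timezone.utc)
        current = self._latest.get(crew_id)
        # Ignore stale reports that arrive out of order.
        if current is None or when >= current.reported_at:
            self._latest[crew_id] = LocationReport(crew_id, airport.upper(), when)

    def where_is(self, crew_id):
        report = self._latest.get(crew_id)
        return report.airport if report else None

    def crew_at(self, airport):
        airport = airport.upper()
        return sorted(r.crew_id for r in self._latest.values() if r.airport == airport)
```

A scheduler could then ask `board.crew_at("DEN")` instead of waiting for someone to key in a phoned-in position. A real system would also need authentication, persistence, and a way to reconcile self-reports with flight data, none of which is sketched here.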
  • I don't really quite understand the problem. Yes, I get that airline reservation and booking systems need to be able to manage many simultaneous clients and that the various locking and other issues can get complex. So I understand why upgrading the whole system to more modern equipment might be pretty expensive.

    But I don't understand why it would cause a problem specifically in the recent situation. Here it seems like the problem was just coming up with an initial assignment of planes to routes and peop

    • From here, it looks like they were too optimized: they lacked rested pilots who could bring other pilots to where they needed to be in time for their next scheduled duty flights. When you have no spare crews and overbooked planes, it's easy to lock up such a system. Ask those of us who've done complex and optimized routing tables, especially at peak load of an unmaintained system.

    • It could have been manually scheduled if they had the procedures in place for employee reporting to support it. The storm and holiday put too much of the staff in abnormal positions and the airline literally had no idea where their staff was and if they were legal to fly. That last part is really important...

    • There are strict federal regulations for how flight crews are to be scheduled since the Colgan Air disaster. Sure, having a manual schedule at the start of the day would have been nice if weather didn't cause a cascade of delays, each of which required the scheduling software to verify that the now delayed flight was not going to work the crew beyond the federal limit. This software could handle a few dozen flights at a time and could take it several minutes to process this information. Multiply this by 1000s

    • From the insider reports: Southwest can't do the reschedule unless it knows where the planes and the crews are, and they don't track that data automatically. The crew have to make a phone call to report their position and the plane's location, but due to volume they were on hold for anywhere from a few hours to double digit hours, just to report position.
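The replies above describe two inputs a rescheduler needs: where each crew member is, and whether a delay pushes them past a legal duty limit. The second check can be sketched as follows. This is an assumption-laden toy: the 13-hour `MAX_DUTY` value, the function names, and the flight numbers are made up for illustration, and real duty rules depend on far more than a single number.

```python
from datetime import datetime, timedelta

# Hypothetical limit for illustration only; actual regulations are more detailed.
MAX_DUTY = timedelta(hours=13)


def is_legal(duty_start, scheduled_arrival, delay, max_duty=MAX_DUTY):
    """Return True if the crew member still finishes within max_duty after the delay."""
    new_arrival = scheduled_arrival + delay
    return new_arrival - duty_start <= max_duty


def crews_needing_replacement(assignments, delays):
    """Return sorted crew ids whose delayed flight pushes them past the limit.

    assignments: {crew_id: (flight, duty_start, scheduled_arrival)}
    delays:      {flight: timedelta}
    """
    out = []
    for crew_id, (flight, duty_start, arrival) in assignments.items():
        delay = delays.get(flight, timedelta(0))
        if not is_legal(duty_start, arrival, delay):
            out.append(crew_id)
    return sorted(out)
```

Each check here is constant time per crew member, so re-verifying after a delay costs one pass over the affected assignments. The hard part the thread points at is not this arithmetic but having trustworthy inputs for it.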
  • by christoban ( 3028573 ) on Saturday December 31, 2022 @08:13PM (#63171352)
    OK, I was part of HP's effort to rewrite the entire industry's software, ground to top, including the scheduling software, about 10 years ago. It was union demands to spend less on software at the time that caused that entire effort to be abandoned. Had it been given another several months (after years of work), none of this would have been an issue.
    • You didn't even read the summary.

      • You're assuming the summary is an accurate reflection of the real state of affairs. I vaguely remember HP hawking some software to the airline I was working for at the time (and this was more like 7 years ago) but some other provider's software was held to be superior.
        The process was of no interest to me so I never bothered finding out more.

    • by maladroit ( 71511 ) on Sunday January 01, 2023 @12:36AM (#63171684) Homepage

      a) Southwest was never part of SABRE, and probably wasn't part of this effort. They have their own stack.
      b) This sure seems to resemble second-system syndrome.
      c) Unions? Really? That sounds like folklore, not fact.

      I really, really doubt that some magical system that was just months (MONTHS!) away from solving all of the industry's problems was just canceled due to some budgetary pressure.

      More likely: HP's customers took a look at what was being produced and said ... nope.

      • I really, really doubt that some magical system that was just months (MONTHS!) away from solving all of the industry's problems was just canceled due to some budgetary pressure.

        I remember hearing that British Airways did exactly that. Some software package was running late (it always does) but almost complete, then a financial crisis hit the airline industry and BANG, cancelled. As to when this happened, I think it was the fallout from 9/11.

    • LoL,

      In your haste to blame TEH UNIONS on everything, you've come off looking like a complete fuckwit.

      With this level of ineptitude, I suspect the reason whatever software you were writing was cancelled was down to your own incompetence.

  • by 140Mandak262Jamuna ( 970587 ) on Saturday December 31, 2022 @08:28PM (#63171376) Journal
    The FAA has been issuing so many rules about pilot and crew rest periods, total time over a rolling period, exceptions and additional rules. They have been adding rules rather simplistically, repeatedly scanning for pilots who would qualify for each flight and updating availability tables.

    It is quite easy to mess up the logic and make it O(N^3) or O(N^4). Back in 2000 or 2010 adding an outer loop to scan the whole table looked easy, and probably reduced testing time to roll out the upgrade or implementation of a new rule. But as the number of flights and crews expanded the scaling issues are coming back to bite them.

    Often developers implement a bad solution, knowing full well the solution is bad and does not scale. But redoing the architecture would mean a great deal of testing to prove correctness, whereas adding one more loop lets the developer explain to the legal side that the implementation meets the legal requirements. Moore's law saved their asses for a while. One blogger claimed Southwest can only schedule 300 or 400 flights a day! After two days, all the pilots and crew have been idled for 48 hours and the pool of available crew increases, and the scanner hits available pilot/crew sooner in the loop and they are able to finish the whole schedule six or seven loops deep.
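To make the scaling point above concrete, here is a deliberately simplified sketch. It is not Southwest's algorithm; the function names and data shapes are hypothetical, and it shows only a quadratic case rather than the O(N^3) or O(N^4) the comment speculates about. The naive version rescans the whole crew table for every flight, while the indexed version groups crews by airport once.

```python
from collections import defaultdict, deque


def assign_naive(flights, crews):
    """For every flight, rescan the whole crew list: O(flights * crews)."""
    used, result, scans = set(), {}, 0
    for flight_id, origin in flights:
        for crew_id, airport in crews:
            scans += 1
            if crew_id not in used and airport == origin:
                used.add(crew_id)
                result[flight_id] = crew_id
                break
    return result, scans


def assign_indexed(flights, crews):
    """Group crews by airport once, then pop one per flight: O(flights + crews)."""
    by_airport = defaultdict(deque)
    for crew_id, airport in crews:
        by_airport[airport].append(crew_id)
    result = {}
    for flight_id, origin in flights:
        if by_airport[origin]:
            result[flight_id] = by_airport[origin].popleft()
    return result
```

Both functions pick the first available crew member at the flight's origin, so they produce the same assignment. The difference is that each extra rule bolted on as another full scan multiplies the cost of the naive version, which matches the comment's observation that the scan finishes sooner once more crew are available early in the table.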

    • It's not about adding "one more outer loop." It's about testability.

      I work in the mortgage industry, where there are thousands of interacting regulatory requirements, that vary from state to state. We manage these by implementing prioritized rule sets, and test using ordinary unit testing.

      In the 90's, when Southwest's code was written, unit testing wasn't a thing. Nobody had heard of the "single responsibility principle." Code tended to be written in extremely long blocks with many conditional expressions,
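A prioritized rule set of the kind described above might look like the following sketch. The rule names, thresholds (10 hours of rest, 60 hours per week), and dictionary keys are all hypothetical placeholders chosen to fit this thread, not real regulatory values or anyone's production code. The point is that each rule is a small function that can be unit tested on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int                            # lower number is evaluated first
    check: Callable[[dict], Optional[str]]   # returns a rejection reason, or None


def min_rest(crew):
    # Hypothetical threshold, for illustration only.
    return "insufficient rest" if crew["rest_hours"] < 10 else None


def weekly_cap(crew):
    # Hypothetical threshold, for illustration only.
    return "weekly hours exceeded" if crew["hours_last_7_days"] > 60 else None


RULES = sorted(
    [Rule("weekly_cap", 2, weekly_cap), Rule("min_rest", 1, min_rest)],
    key=lambda r: r.priority,
)


def first_violation(crew, rules=RULES):
    """Return (rule name, reason) for the highest-priority failing rule, else None."""
    for rule in rules:
        reason = rule.check(crew)
        if reason:
            return rule.name, reason
    return None
```

Adding a new regulation then means adding one `Rule` and its tests, rather than threading another conditional through a long block of existing code.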

  • "But Tufekci concludes that "Ultimately, the problem is that we haven't built a regulatory environment where companies have incentives to address technical debt, rather than passing the burden on to customers, employees or the next management.... For airlines, it might mean holding them responsible for the problems their miserly approach causes to the flying public.""

    It's not up to the government to hold them responsible; that's the stockholders' and the board of directors' job. With the amount of money th

  • Uh, right. (Score:5, Insightful)

    by msauve ( 701917 ) on Saturday December 31, 2022 @08:43PM (#63171400)
    >Without more government regulation and oversight,

    Because government is an exemplar of fine software engineering. Just look at the IRS and DHS [slashdot.org]!
  • Be careful (Score:4, Informative)

    by pz ( 113803 ) on Saturday December 31, 2022 @08:53PM (#63171412) Journal

    Be careful, I say, be careful: Shiny and new is orthogonal to capable.

    From posts I've read written by SW pilots on another forum, the software at SW is utterly inept. That has nothing to do with how old it is. SW did not know where its pilots were. If your scheduling software doesn't know where the critical personnel are, and whether they are legal to fly, it doesn't matter if it's decades old or just installed.

    So the article, derisively criticising SW's software just because it is old, needs better perspective.

    • It's true that new software can be just as inept as old software. But antiquated software is _guaranteed_ to be inept.

      In the 90's, unit tests were not a thing. No one had heard of the single responsibility principle. Segmenting software into layers wasn't well understood. Functions and procedures often ran into hundreds of lines of code with many conditional branches, nobody knew what those functions actually did. Code side effects were common practice, leading to a tangled mess.

      Failing to modernize such c

  • of the article aside I hope that this doesn't hurt their overall business too much and they're able to fully recover.. Times are difficult enough as it is.
  • by LindleyF ( 9395567 ) on Saturday December 31, 2022 @09:50PM (#63171528)
    The engineer who steps up and fixes a tech debt problem BEFORE it causes a disaster will get a pat on the head at best, or a demerit for working on the wrong thing at worst. That same engineer doing the same work AFTER the disaster gets promoted.
    • Smart engineer notices the problem, tests the fix, holds it back till all hell breaks loose. Then comes in, knight in shining armor, fixes, collects bonus and retires
  • Is almost always the biggest technical debt. Companies have code that has been running for decades that has no version control and would be hard to match with its source code and compile defines. Even if you did have all the source code you would have to carefully reverse engineer it to know what the code is supposed to do. A lot of this code is bad. If it is written in C you might have a chance at reverse engineering it. If it is badly written in an object oriented language good luck.
    I suspect
  • see the BlancoLirio channel on youtube for a little different view. A few more details are added.

  • by jenningsthecat ( 1525947 ) on Sunday January 01, 2023 @04:25AM (#63171904)

    "Ultimately, the problem is that we haven't built a regulatory environment where companies have incentives to address technical debt, rather than passing the burden on to customers, employees or the next management.... For airlines, it might mean holding them responsible for the problems their miserly approach causes to the flying public."

    What's needed is very strong DIS-incentives to NOT address technical debt. There are already plenty of incentives - maintaining market share, not having to refund tickets, having happy shareholders, etc. But such incentives clearly aren't enough.

    I've said it before, and I'll keep saying it - the only short-term solution to badly behaving corporate persons is to put the asses of the responsible human c-suite persons in a non-club-Fed prison. Of course, the long-term solution is to dispense with the corporate personhood fiction and find another structure that serves both companies and society.

  • This sort of thing is inevitable, especially with a short time horizon. Any of us who have tried to justify costs of an overhaul or replacement of an old but mostly working system will be familiar: The first automation of a previously manual system pays out tremendously, so much so that people expect such payouts with all IT projects.

    Maintenance or replacement of an IT system cannot have the same payout -- the system still runs just as well as it did on day one and the probability/costs/length of failure

  • Everyone keeps telling us that in the future code will be written by AI. If that happens, we'll have meltdowns every day.

  • What was the company that just eliminated customer service, seeing it as cheaper to lose customers than to try to retain them?
