
The July Galileo Outage: What Happened and Why (berthub.eu) 49

New submitter Myself writes: There's a funny thing about a global satellite system that beams signals down to anyone to use: It also means anyone can monitor the performance thereof. So when such a system suffers a crippling days-long outage and the operators are tight-lipped about why, look no further than Bert Hubert (who you may know from the PowerDNS project) to scramble together a bunch of code and a worldwide network of volunteers, to analyze exactly what happened. This is the story of how and why the Galileo GNSS network was down for a whole week.
This discussion has been archived. No new comments can be posted.


  • Summary (Score:5, Insightful)

    by lgw ( 121541 ) on Thursday November 07, 2019 @01:27PM (#59391254) Journal

    TFS was weak even by Slashdot standards. Here's the good bit from TFA:

    The outage in the ephemeris provisioning happened because simultaneously:

    * The backup system was not available
    * New equipment was being deployed and mishandled during an upgrade exercise
    * There was an anomaly in the Galileo system reference time system
    * Which was then also in a non-normal configuration

    Every disaster hangs from the end of a long chain of fuckups.

    • I will reduce your accurate reply (and the linked article) to this:

      Three Stooges Syndrome with a dash of bureaucracy and a side order of too many contractors.

    • Re:Summary (Score:4, Funny)

      by Calydor ( 739835 ) on Thursday November 07, 2019 @01:44PM (#59391326)

      On the other hand it's pretty amazing it took this many fuckups of this magnitude to actually disable the system.

      • by lgw ( 121541 )

        Just goes to show: no matter how good the engineering, you can't idiot proof a system, because they will always invent a bigger idiot.

  • by Zoxed ( 676559 ) on Thursday November 07, 2019 @01:38PM (#59391300) Homepage

    To avoid unfair judgement readers should remember that the system is not yet live.

    • I'm sure the EU will get around to it someday.

      • The original schedule was for 24+6 satellites in 2020. Though I'm skeptical they will reach that goal, it is currently still the goal.

    • Re:Not yet live (Score:5, Insightful)

      by Myself ( 57572 ) on Thursday November 07, 2019 @01:41PM (#59391316) Journal

      They say that, but they also advertise that it's better than GPS.

      That needs a big fat "*when it works at all".

      • by BAReFO0t ( 6240524 ) on Thursday November 07, 2019 @01:56PM (#59391366)

        The fact that it had an outage already implies it worked the entire rest of the time.

        Remember that this "series of fuck-ups" extremely likely only happened because, when you work on a staging system (as opposed to a shipped system) whose explicit purpose is to be worked on, and you can save work by shutting it down, then that is what you do!

        Don't tell me you treat your test installation of the software you write like a stable live version. Because then you missed the point, or are abusing the words and really have an actual testing installation somewhere else.
        Anyone who ever had to work on a live system ... be it a high availability electrical power system or a server that half the planet depends on, knows it is a fucking nightmare, an order of magnitude more expensive and time-consuming (if done right), and you want to avoid it at nearly any cost. So you have a test system.
        (And if it is mission-critical, you always have everything thrice. So three simultaneous live systems, three hot spares, and three cold spares, on three sites, using three independent implementations. (Fuck unit tests!) And only on top of that comes your staging/testing setup. Again in threes of threes.)

        • by Myself ( 57572 )

          I don't think the article is actually critical of the downtime, more so the doublespeak about it. Claiming that the system is "better than GPS" but has an availability target of 77% is just weird. Claiming that one person caused it, when quite a lot of dominoes had been set up just waiting to be knocked over, is also disingenuous.

          And FYI, I worked in telecom for ten years and had one service-affecting outage the whole time, which was contained to the maintenance window. I know a thing or two about redundant
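To put the 77% figure in perspective, an availability target can be converted into the downtime it permits. The following is a minimal sketch, assuming the target is measured over a 365-day year (the comment above does not say over what period it applies); the function name and the comparison targets are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year as the measurement period


def downtime_hours_per_year(availability: float) -> float:
    """Return the hours of downtime per year that an availability target allows."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # 0.77 is the figure quoted in the thread; the others are common
    # "three nines" and "five nines" targets, shown only for comparison.
    for target in (0.77, 0.999, 0.99999):
        hours = downtime_hours_per_year(target)
        print(f"{target:.5f} -> {hours:.2f} h/year ({hours / 24:.2f} days)")
```

Under that assumption, a 77% target tolerates roughly 2,015 hours of outage per year, about 84 days, so a week-long outage would still fall within it.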

          • Yes, I have to agree. That is just ... wtf?

          • by Slayer ( 6656 )

            The very ineffective communication probably has a simple reason: politics. If one senior member speaks up, then this senior member seems to have taken over the entire system, which runs counter to the claimed status as "international cooperation". If a representative of a smaller contributor speaks up, the larger contributors will be pissed, and if someone from a large participating country speaks up, accusations of taking over the whole system will pop up. This leaves the European Commission as the only c

        • by GrahamJ ( 241784 )

          Netflix would like a word with you.

        • Where was my literacy when I wrote this?

          I am truly sorry. There is no excuse.

      • Non-live doesn't yet meet its spec. Hardly needs a qualifier?

    • To avoid unfair judgement readers should remember that the system is not yet live.

      No, it's not yet at full operational capacity. It is however very much live and providing location services to many receivers, and has been for a few years already.

  • by Pyramid ( 57001 ) on Thursday November 07, 2019 @03:18PM (#59391708)

    It would seem the Galileo system is quite analogous to the EU itself... Fragmented, loaded with bureaucracy, suffering from serious communications issues and generally rife with fiefdoms within fiefdoms.

    • It would seem the Galileo system is quite analogous to the EU itself.

      If you remove the "herp dep teh EWWWWW" slant, you simply get:

      It would seem that [large organisation] is quite analogous to [other large organisation].

      • by Pyramid ( 57001 )

        "If you remove the "herp dep teh EWWWWW" slant, you simply get..."

        So we're both in agreement. Except for some reason, my statement left you butt-hurt and you felt the need to restate it in emotional terms.

        • I think everyone is in agreement. The fact you used it as a hit piece on the EU rather than focusing on all governments of the world, and that you were corrected for your stupid bias doesn't make the GP butthurt, it just makes you petty. To the Americans complaining about the EU government: Physician, heal thyself!

        • Well, no, not really. The fact that you singled out the EU and Europe is in itself bias. Then again, I certainly expect drooling stupidity from anti-Europe people. Your use of "butthurt" is consistent.

  • by ahu ( 4707 ) on Thursday November 07, 2019 @03:27PM (#59391738) Homepage

    I should really visit /. more often, very happy to have hit the news here! Managed to revive my absolutely ancient /. account too. Thanks for all the visits, and if there are any GPS/Galileo/GLONASS/BeiDou questions, fire away!

    • by Myself ( 57572 )

      Holy crap, a 4-digit! ;) Hey Bert!

      I was chuckling at the "new submitter" moniker, since I've been here since about 1997, just never got much into submitting stories, I guess.

    • Welcome back!

      Couple of questions:

      - What kind of equipment is used to monitor the GNSS systems? Off-the-shelf receivers, or do you use anything special?

      - Saw in the article that a "cold boot" takes some time to get the system to full precision again. Could you measure the progress while it happened, or did you only have data after it happened? (I saw that the outage is what got you interested, but I don't know if you analyzed data from the outage itself or only from after it.)

      - Does your monitoring show that the galileo

      • by Myself ( 57572 )

        I can speak to the equipment question: Any receiver that can both receive Galileo and output raw data frames can theoretically be supported. In practice, the Ublox receivers are ubiquitous and cheap, everyone's using them, so they're the only ones supported right now. (Needs to be an 8-series or 9-series, I believe.)

        I just bought an F9P and I'm hoping to use its multiple communication ports to act as both a Galmon receiver and an RTK base at the same time. We'll see! I just got shipping confirmations so I d
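For readers wondering what "output raw data frames" looks like at the byte level: u-blox receivers speak a binary protocol called UBX, in which every frame starts with the sync bytes 0xB5 0x62, followed by a class byte, an ID byte, a little-endian 16-bit payload length, the payload, and a two-byte 8-bit Fletcher checksum. The following is a minimal sketch of validating one such frame. It is not taken from Galmon's code, and the function names are hypothetical.

```python
UBX_SYNC = b"\xb5\x62"  # the two sync bytes that open every UBX frame


def ubx_checksum(body: bytes) -> bytes:
    """8-bit Fletcher checksum over class, ID, length, and payload bytes."""
    ck_a = ck_b = 0
    for byte in body:
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return bytes((ck_a, ck_b))


def parse_ubx_frame(frame: bytes) -> tuple[int, int, bytes]:
    """Validate a single complete UBX frame; return (class, id, payload)."""
    if len(frame) < 8 or frame[:2] != UBX_SYNC:
        raise ValueError("not a UBX frame")
    msg_class, msg_id = frame[2], frame[3]
    length = int.from_bytes(frame[4:6], "little")
    if len(frame) != 8 + length:
        raise ValueError("frame length does not match the length field")
    body = frame[2:6 + length]  # everything the checksum covers
    if ubx_checksum(body) != frame[6 + length:8 + length]:
        raise ValueError("checksum mismatch")
    return msg_class, msg_id, frame[6:6 + length]


if __name__ == "__main__":
    # A zero-payload frame with class 0x06, ID 0x00, used purely as test data.
    sample = bytes.fromhex("b562060000000618")
    print(parse_ubx_frame(sample))
```

A real monitoring tool would additionally have to find frame boundaries in a continuous serial stream and decode the navigation payloads, which this sketch leaves out.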

  • The article was a fantastic read. Whatever your preferences and politics, the science of navigation satellites is close to a black art.
    • Saying it's like a black art implies something special... it really is not, just wait until SpaceX broadcast their position...

      Billions and they still can't sort out a GPS augmentation system...

      • by Pyramid ( 57001 )

        To my knowledge, the Starlink sats have no onboard atomic clock nor does the ground based infrastructure that drives them provide anything close to the precision necessary to generate accurate ephemeris data needed for useful locationing.
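The reason clock precision matters so much: a receiver turns signal travel time into distance by multiplying by the speed of light, so any timing error becomes a ranging error at about 30 cm per nanosecond. The following is a minimal sketch of that arithmetic; the function name and the sample error values are illustrative, not measurements of any real system.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def range_error_m(clock_error_s: float) -> float:
    """Return the ranging error in metres caused by a given timing error."""
    return SPEED_OF_LIGHT_M_PER_S * abs(clock_error_s)


if __name__ == "__main__":
    # Illustrative timing errors: one nanosecond, one microsecond, one millisecond.
    for error_s in (1e-9, 1e-6, 1e-3):
        print(f"{error_s:.0e} s -> {range_error_m(error_s):,.3f} m")
```

A one-microsecond timing error already corresponds to roughly 300 m of range error, which is why navigation satellites need clocks and time references held to nanosecond-level accuracy.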
