Linus Torvalds Defends Windows' Blue Screen of Death (itsfoss.com) 70

Linus Torvalds recently defended Windows' infamous Blue Screen of Death during a video with Linus Sebastian of Linus Tech Tips, in which the two built a PC together. It's FOSS reports: In that video, Sebastian discussed Torvalds' fondness for ECC (Error Correction Code) memory. (Last names are used here because one Linus would otherwise be confused with the other.) This is where Torvalds says: "I am convinced that all the jokes about how unstable Windows is and blue screening, I guess it's not a blue screen anymore, a big percentage of those were not actually software bugs. A big percentage of those are hardware being not reliable."

Torvalds further mentioned that gamers who overclock get extra unreliability. Essentially, Torvalds believes that having ECC in a machine makes it more reliable and makes you trust it; without ECC, the memory will go bad sooner or later. In his view, it is often hardware, more than software bugs, that is behind Microsoft's blue screen of death.
You can watch the video on YouTube (the BSOD comments occur at ~9:37).


Comments Filter:
  • by jhoegl ( 638955 ) on Saturday December 06, 2025 @08:42AM (#65839295)
    The BSoD was telling you what was going on, but they made it difficult to understand what to do. When you are the OS that normal people use, you need to make it clear what is going on.

    Error logs and crash reports could tell you a lot if you knew how to get to them. But since MS didn't make it easy or help the end user, it turned into "it's an MS problem and MS sucks."

    Every time I got a BSoD and debugged it, it was always a driver or hardware.

    This has nothing to do with today's BS Windows crapware on their systems, just the general history of the BSoD.

    With that I agree: the BSoD was blamed by default because it was telling you the problem in a terrible way.
    • by Insanity Defense ( 1232008 ) on Saturday December 06, 2025 @08:58AM (#65839317)

      I did the following things, which reduced BSODs massively.

      1/ UPS with brownout protection.

      2/ Put the swap file on its own partition.

      3/ Move all applications and data to their own partition leaving the C: drive with only Windows itself on it.

      My experience was that most BSODs fell into three categories: power fluctuations, programs (including Windows itself) interfering with the swap file, and programs interfering with Windows on the hard drive. Also, after that, defrags were much less needed, as the only partition that badly fragmented was the C: drive, with Windows fragmenting itself.

      • None of that makes any sense unless you have a drive that is woefully unreliable and starts corrupting shit in flight.

        There's no "interfering". You either write to the windows directory or you don't. You either write to the swap file or you don't (actually, no, you just don't, full stop; software has no basis for nor ability to interfere with the swap file without operating under elevated privileges). It doesn't matter where windows files are, they are always in the same place: %windir%, and software doesn't ma

        • Yeah, most BSODs I've seen were from shitty drivers of cheap hardware. Sometimes shitty drivers of expensive hardware. Sometimes shitty software with too many privileges.

          Hard to think of something else, but then I've been away from windoze for over 2 decades now.

        • Ever dive deep into your network, even at home? Seen Layer 1 errors? Why? It's just wire, wtf?

          But it happens. Even if you're not twisting the cables to see what happens.

          Wi-Fi of course is expected to have Layer 1 imperfections.

          • Of course that stuff happens, the point is that simply using a different partition doesn't have any impact on the result of those errors.

            • My point was that in an environment where random errors occur you may minimize the errors with reliable processes. But the unspoken caveat... If you have everything on one device, that becomes your point of failure. Partitioning a drive doesn't give you multiple points of physical failure. Logical points, perhaps...

      • I haven't seen a blue screen in 10 years, except deliberately when trying new boundaries for overclocking.

    • by Tony Isaac ( 1301187 ) on Saturday December 06, 2025 @10:16AM (#65839371) Homepage

      I don't think it's fair to blame Microsoft for cryptic BSOD logs. When a memory chip goes bad, the OS doesn't have any way to know whether the chip is bad, or whether some driver bug caused the memory checksums to fail. If a component overheats and starts spitting out garbage, how is the BSOD supposed to diagnose that? If hardware is installed that isn't quite compatible, is the BSOD supposed to be able to display a nice, human-friendly message telling you that the model number of your component is a mismatch?

      A lot of errors that cause BSODs *are* technical and require a knowledgeable professional to diagnose. This is not unlike a doctor who has to take obscure human symptoms and piece together what is going wrong in a human body.

      • by AmiMoJo ( 196126 )

        If there were better APIs they could do that. An API for sensors, with a maximum safe limit supplied for each one. When a BSOD happens, it could do a little memory test on the affected area.

        • And how exactly would the OS know what the "affected area" was, or that the issue was bad memory in the first place? It could just as well be a bad controller, or an overheated chip, or a poorly-seated RAM chip, or any of a million other things.

          • by AmiMoJo ( 196126 )

            To crash the OS it would need to be in the kernel, so do a quick test of that RAM using something like March C.
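For reference, a March C- style test is just a fixed sequence of ascending and descending read-then-write passes over memory. Below is a minimal userspace sketch in Python over a plain byte sequence; a real test would run in the kernel against physical addresses, so this only illustrates the march elements themselves:

```python
# Minimal March C- style memory test sketch.
# Illustrative only: "mem" is any mutable byte sequence, not real RAM.

def march_c_minus(mem) -> list:
    """Run the six March C- elements; return indices of failing cells."""
    bad = set()
    n = len(mem)

    # M0: ascending, write 0
    for i in range(n):
        mem[i] = 0x00
    # M1: ascending, read 0 then write 1
    for i in range(n):
        if mem[i] != 0x00:
            bad.add(i)
        mem[i] = 0xFF
    # M2: ascending, read 1 then write 0
    for i in range(n):
        if mem[i] != 0xFF:
            bad.add(i)
        mem[i] = 0x00
    # M3: descending, read 0 then write 1
    for i in reversed(range(n)):
        if mem[i] != 0x00:
            bad.add(i)
        mem[i] = 0xFF
    # M4: descending, read 1 then write 0
    for i in reversed(range(n)):
        if mem[i] != 0xFF:
            bad.add(i)
        mem[i] = 0x00
    # M5: final read-0 pass
    for i in range(n):
        if mem[i] != 0x00:
            bad.add(i)
    return sorted(bad)
```

A stuck-at cell fails the read step of whichever pass expects the opposite value, so the test reports its index; the descending passes are what catch certain coupling faults that ascending-only tests miss.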

    • BSoD was telling you what was going on, but they made it difficult to understand what to do.

      The BSoD only ever gave you enough information to tell you what driver crashed. Or a simple error code. It still does. That hasn't changed.

      Error logs and crash reports could tell you a lot if you knew how to get to them. But since MS didnt make it easy or help the end user, it turned into its an MS problem and MS sucks.

      Be careful what you wish for. Error logs and the tools are great and all, but if a user is unable to go read the MSDN docs on how to debug something, they will not have a hope in hell of understanding the debug output either. Kernel panics are no better in this regard. The average user (heck, the average power user) has no hope in hell of understanding what went wron

      • Sometimes you'd get a BSOD that was a fairly clear call to action; when the error called out something recognizable as the name of part of a driver; but that is mostly just a special case of the "did you change any hardware or update any drivers recently?" troubleshooting steps that people have been doing more or less blind since forever; admittedly slightly more helpful in cases where as far as you know the answer to those questions is 'no'; but windows update did slip you a driver update; or a change in O
    • Generally speaking, once you've gotten the BSoD, you're not reading logs.

      And. "often it is hardware behind Microsoft's blue screen of death."

      It is not well appreciated that Windows is reliant on independent vendors, manufacturers, etc. for drivers in particular. This is at once the greatest advantage (Microsoft can 'welcome' any hardware manufacturer that will bother to write drivers) and greatest vulnerability (Microsoft has to either do a LOT of work to insulate Windows from bad drivers, or suffer the con

    • Error logs and crash reports could tell you a lot if you knew how to get to them. But since MS didnt make it easy or help the end user, it turned into its an MS problem and MS sucks.

      You're talking about a kernel panic, which can be useless without a kernel memory dump. And you've already got the Windows syslogs in the form of the Event Viewer, which is already going to capture anything relevant. Few normal people have any idea what they'd even do with system logs. Virtually none of them have any idea what to do with a kernel memory dump beyond going to bleepingcomputer or reddit and posting "Every time I do this I get a bsod, help!" with advice often being to typically start pulling out ha

    • by tlhIngan ( 30335 )

      Windows NT used to give you a whole bunch of details when it hit a BSoD - NT4 blue screens were wildly informative, but to the average user, completely useless. It was just a bunch of numbers that had no meaning to them and provided no pointer to what the problem was. It didn't help that many drivers adopted the 8.3 naming convention, making it even more obscure.

      Also completely useless because the screenful of information was there but you couldn't do anything with it - you couldn't print it or anyt

  • It's been a while since I managed on-prem windows server instances - like 15 years ago - but I certainly remember seeing some BSODs on machines that had ECC memory!
  • by W2k ( 540424 ) on Saturday December 06, 2025 @09:08AM (#65839321) Journal
    That the infamous Windows BSoDs, at least since the WinNT era started, are almost always caused by dodgy hardware is common knowledge to anybody who has spent the least amount of time as a support tech on Windows machines. It's true that they could be better at communicating this.

    I've never used ECC in my personal machines - I'm sure it's great - but since the early 00's or so, BSoDs are just not a thing that regular users experience unless they have bottom-tier or broken hardware, and people that buy low quality stuff are not likely to want to spring the extra cash for ECC anyway.
    • Most people who have (not so) fond memories of the BSoD predate that era and experienced it on a daily basis. The problem was drastically reduced going from Windows 95 to 98 to 2000/XP, to the extent that it's impossible for hardware to be the primary culprit. Windows dominated the landscape, but they weren't the only OS around and nothing else was that unstable despite using the hardware of that era. Before NT, Windows was an absolute mess. I think the only reason most people put up with it was that they d
      • Win9x and Win2k (and the other NT descendants) are fundamentally different operating systems. In general, NT had a much more robust kernel, so system panics were and remain mainly hardware issues, or, particularly in the old days, dodgy drivers (which is just another form of hardware issue). I've seen plenty of panics on *nix systems and Windows systems, and I'd say probably 90-95% were all hardware failures, mainly RAM, but on a few occasions something wrong with the CPU itself or with other critical hardw

        • NT also virtualized the DOS environment (NTVDM) where 9x didn't, and that is a very important distinction. In the 9x days, games and even drivers were executed in the DOS environment. As soon as you opened command.com, it was a ticking time bomb for when a blue screen would happen. If you could avoid DOS and 16-bit memory-shimmed binaries, 9x wasn't nearly as fragile as everyone generally dealt with

      • by jabuzz ( 182671 ) on Saturday December 06, 2025 @10:29AM (#65839387) Homepage

        Adding to that, I was primarily a Linux user even back then, but would occasionally dual-boot into Windows. So, same hardware: BSODs galore, or at least frequent enough to be very annoying, but Linux as stable as could be and not a kernel panic in sight.
        So clearly it was not all down to unreliable hardware.

      • Before NT, Windows was an absolute mess. I think the only reason most people put up with it was that they didn't know anything better was possible and since Windows was so widespread it was a misery everyone shared.

        I think that many of those people were also recent DOS users. Given that DOS systems would often simply freeze up several times per day and require a reboot (easy to do since any bug in the user's application could do this), once they added a protected mode pseudo-kernel to Windows (maybe starting with Windows/386 2.1), it was actually a slight improvement over what they were used to since DOS crashes could sometimes be isolated to one virtual terminal.

    • by yanyan ( 302849 )

      With win10 in recent months i noticed that the Nvidia video driver hiccups and throws a BSOD practically after each time the OS gets an update. The fix is to completely remove the driver and related files using a tool like DDU and then to reinstall the driver. The BSODs stop... until the next OS update. Which is why i keep the PC disconnected from the network unless completely, unavoidably necessary.

  • by Profiterole ( 7455138 ) on Saturday December 06, 2025 @09:09AM (#65839323)
    Back in 1995, on a brand new PC configured dual boot, no overclocking, I would get PANIC messages from the Linux kernel but Windows worked fine, no BSOD. Out of despair I reached out to Linus Torwalds and he very kindly helped me out, suggesting I had defective memory chips. He was correct. There were 2 defective memory chips. I guess Windows was just too slow to reveal the defective chips.
    • I meant Torvalds. Apologies.
    • by Zarhan ( 415465 ) on Saturday December 06, 2025 @09:31AM (#65839337)

      Linux has utilized - pretty much forever - all the available memory as cache/buffer, so you were bound to run into the problem much sooner.

      Win95/98/ME could run for a long time without ever accessing particular physical memory chips.

      Windows NT didn't have this problem, but on the other hand WinNT and its successors also had better isolation, so if a driver crashed due to a memory issue, it recovered better. (This really applies to WinNT 3.5 and perhaps 4, back when it was still going with Dave Cutler's VMS-derived approach; WinNT 3.5 is almost a microkernel.)

      • The Win95/98/ME could run for long time without ever accessing particular physical memory chips.

        This is surprising to me. Back in those days I had a pretty expensive custom PC with 32MB RAM, which was a lot for that time, and that RAM was constantly full. And when the 1.6GB hard disk started to fill, the OS stability really went to hell. Also, back in those days it wasn't just BSODs that you had to worry about - the OS would frequently lock up as well.

        Windows NT didn't have this problem, but on the othe

    • It could also be that Windows didn't allocate memory in the same pattern as Linux, and by chance didn't happen to need to use the defective chips. I've certainly seen Windows BSODs due to bad memory, so I know it's capable of detecting such problems.

    • It has nothing to do with speed. Windows and Linux have no ability to read or write RAM faster or slower; the speed is set at boot. It's about accessing a broken area with data critical enough to cause a system error.

    • How did you confirm that this was the cause of the problem? By replacing the memory chips with known good ones?

    • I heard many stories in the mid-90s where people had loads of BSOD problems, but Linux would run more stably. At some point, I came across the explanation that Windows and Linux use memory differently, meaning that physically bad memory would cause Windows to crash early on but Linux much later, and vice versa. But people who were running Linux in the mid-90s would rarely check whether Windows would run better, so the vice versa part wasn't found out as often.

      Windows up to about 7 would crash more due to softw

  • Bad cooling was a huge problem for PCs as well. There were not only the cheap fans that quickly wore out, but also people who kept the computer in a drawer, or on the floor where it acted as a stationary Roomba, vacuuming up all the dust within reach.

  • Why is it that the vast majority of PC builders don't care about ECC memory? Why didn't Microsoft push for this instead of TPM 2?
    • Cheap (Score:4, Informative)

      by JBMcB ( 73720 ) on Saturday December 06, 2025 @10:49AM (#65839425)
      Because ECC adds price and, usually, is slower than regular memory. What has mainly driven PC hardware is gaming, and gamers care about speed, not long-term stability.

      RAM speed doesn't matter as much as it used to for framerates, though, unless you are overclocking a ton, in which case you don't care about stability anyways.
      • I don't know if I agree that it's mainly gaming. There have been a lot of uses for computers everywhere, and rarely do you find ECC in business computers, or in computers at medical and research centers, or even at software companies.
      • It adds price because Intel forced an artificial market segmentation down everyone's throats.

    • Because TPM takes the user's computer and makes it Microsoft's. It is a built-in, pre-installed rootkit. It was never about making the computer safer for the user, but safer for Microsoft.

        Because TPM takes the user's computer and makes it Microsoft's. It is a built-in, pre-installed rootkit. It was never about making the computer safer for the user, but safer for Microsoft.

        It has a side benefit of showing the world who the ignorant people are, like you. TPM 2.0 doesn't restrict anything. It's nothing more than a secure storage area.

      • /facepalm

        TPM can't rootkit anything. It doesn't have interrupts, and it can't write to system memory. It's not much different from any device you'd connect over a serial port, meaning all your computer can do is send and receive data from it. Components attached to your PCI-e bus will have a much greater ability to rootkit you, like your GPU, your ethernet adapter, or even your USB devices. Yet, despite your yammering, you don't think twice about any of those.

        Its only purpose is to serve as a witness, nothi

    • Why didn't Microsoft push for this instead of TPM 2?

      That's like asking why the government doesn't focus on creating regulations for meat rather than funding a cure for ear infections. The two have nothing at all in common with each other, in use, practice, or application.

      Add to that, ECC has downsides. It's more expensive and it's slower. It has its place, but that place just simply isn't on most desktops. If you're running a computer crunching critical financial transactions, yeah ECC is a good idea. If you spend your time teabagging other players in CoD

  • I've described the boot process of a modern PC to people in the past, to make this point.

    From power on to firmware and POST. Then the lights are turned on for all the areas of hardware responsibility. Some happen right away, others further down the line. Boot managers, OS, drivers... logins, more drivers, then - finally - the system subsides into an orderly management of resources using an incredible juggling act of interrupt management and carefully segregated multitasking where everybody must be orderly.

    A

    • by Kokuyo ( 549451 )

      The same can be said for human reproduction and yet nobody's cheering for crack-babies.

      • If you deliberately introduce malware into your species' boot process, it's not that evolution got it wrong.

  • Not that surprising from a software person.

    With ECC, memory will go "bad" too, just a bit later. If you buy good quality memory, it will be fine for far longer than the time you are going to use it. Bad memory will overwhelm ECC way before that time.

    • One major difference, assuming you've got full platform support (which should be the case on any server or workstation that isn't an utter joke, but can be a problem with some desktop boards that "support" ECC in the sense that AMD didn't laser it off the way Intel does, but don't really care), is that ECC RAM can (and should) report even correctable errors, so you get considerably more warning than you do with non-ECC RAM.

      If you pay no attention to error reports ECC or non-ECC are both rolling the dice; though
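On Linux, those correctable-error reports surface through the kernel's EDAC subsystem under sysfs. A small sketch that reads the per-memory-controller counters (the paths follow the kernel's EDAC sysfs layout; whether they exist at all depends on chipset and driver support, so missing files are treated as "no data"):

```python
# Sketch: read correctable (ce_count) and uncorrectable (ue_count) error
# counters from the Linux EDAC sysfs tree. Availability is hardware- and
# driver-dependent; on machines without EDAC support this returns {}.
from pathlib import Path

def edac_counts(root: str = "/sys/devices/system/edac/mc") -> dict:
    """Map each memory controller (mc0, mc1, ...) to its error counters."""
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # no EDAC support exposed on this machine
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            f = mc / name
            if f.is_file():
                entry[name] = int(f.read_text().strip())
        if entry:
            counts[mc.name] = entry
    return counts
```

A periodic job that alerts when ce_count grows is usually enough warning to swap a DIMM before uncorrectable errors start showing up.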
      • by isj ( 453011 )

        From 2004 to 2007 I had a system with ECC memory (dual Opteron). I checked the MCE logs regularly, and it seemed that every day or so there were errors that the ECC RAM fixed. If it had been plain non-ECC RAM, the problem would likely have gone unnoticed and potentially propagated errors.

        That made me realize that even with good components shit happens and a bug-free program can still crash due to memory errors.

        • by gweihir ( 88907 )

          That is why I run memtest86+ for several days on new RAM. I once had an Infineon module crap out after about 2 days.

      • by gweihir ( 88907 )

        ECC on memory is single-error correcting and double-error detecting (SECDED). With three errors it will just "correct" to the wrong value. I agree that the warning is nice (if your OS gives you one and you pay attention to it), but relying on the correction is pure foolishness.
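The SECDED behavior described above can be illustrated with a toy Hamming(12,8)-plus-overall-parity code on a single byte. The bit layout and helper names below are illustrative, not any particular DIMM's encoding:

```python
# Toy SECDED (single-error-correct, double-error-detect) Hamming code on
# one byte: Hamming(12,8) plus an overall parity bit in position 0.

DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]   # non-power-of-two positions
PARITY_POS = (1, 2, 4, 8)

def encode(byte: int) -> int:
    cw = 0
    for i, pos in enumerate(DATA_POS):
        if (byte >> i) & 1:
            cw |= 1 << pos
    for p in PARITY_POS:  # even parity over every position sharing bit p
        par = 0
        for pos in range(1, 13):
            if pos & p and (cw >> pos) & 1:
                par ^= 1
        cw |= par << p
    cw |= bin(cw).count("1") & 1          # overall parity in bit 0
    return cw

def decode(cw: int):
    """Return (byte, status); byte is None when a double error is seen."""
    syndrome = 0
    for p in PARITY_POS:
        par = 0
        for pos in range(1, 13):
            if pos & p and (cw >> pos) & 1:
                par ^= 1
        if par:
            syndrome |= p
    overall_odd = bin(cw).count("1") & 1
    if syndrome and overall_odd:          # single error: syndrome names the bit
        cw ^= 1 << syndrome
        status = "corrected"
    elif syndrome:                        # syndrome set but parity even: 2 errors
        return None, "double error detected"
    elif overall_odd:                     # only the overall parity bit flipped
        status = "corrected"
    else:
        status = "ok"
    byte = 0
    for i, pos in enumerate(DATA_POS):
        if (cw >> pos) & 1:
            byte |= 1 << i
    return byte, status
```

One flipped bit is corrected, two are flagged but not fixed; with three flips the syndrome can alias a single-bit error and silently miscorrect, which is exactly the failure mode gweihir is warning about.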

  • The best part is when Torvalds unknowingly calls Musk "too stupid to run a tech company."

  • Linus switches to Windows.

  • One thing I've learned from the past 10 years of supporting Windows 10/11 is it's almost always software. I had a problem with a keyboard doing runaway repeats on the Windows Hello PIN entry screen, and I swapped keyboards till I was blue in the face until I realized that it was the supplemental support software (SetPoint) which was causing the fault. Yes. Software is even screwing basic I/O devices now. It's a solved problem and software developers still manage to out-clever themselves into system instabil

  • On enterprise grade hardware with minimal third party drivers Windows has been dead stable for a very long time. Start adding shitty drivers for junk you bought at Walmart and suddenly it's unstable. Now add the heartbreak of trying to run expensive old audio hardware with half written drivers that were never fully updated for the post-Vista era (At home. EMU I'm looking at you.) and you'll really see some fireworks.
  • You run Windows and it blue screens. You run Linux or FreeBSD on that same hardware and it doesn't panic or log anything that looks like hardware issues, it runs for years without issue.

    Yes ECC RAM solves random bit flips, but they happen so infrequently they can't be the cause of every Windows BSOD.
