Linus Torvalds Defends Windows' Blue Screen of Death (itsfoss.com) 70
Linus Torvalds recently defended Windows' infamous Blue Screen of Death during a video with Linus Sebastian of Linus Tech Tips, where the two built a PC together. It's FOSS reports: In that video, Sebastian discussed Torvalds' fondness for ECC (Error Correction Code). I am using their last name because Linus will be confused with Linus. This is where Torvalds says this: "I am convinced that all the jokes about how unstable Windows is and blue screening, I guess it's not a blue screen anymore, a big percentage of those were not actually software bugs. A big percentage of those are hardware being not reliable."
Torvalds further mentioned that gamers who overclock get extra unreliability. Essentially, Torvalds believes that having ECC on the machine makes them more reliable, makes you trust your machine. Without ECC, the memory will go bad, sooner or later. He thinks that more than software bugs, often it is hardware behind Microsoft's blue screen of death. You can watch the video on YouTube (the BSOD comments occur at ~9:37).
Re: (Score:2)
Not sure if Linux is dead, but Linus just fired elona from the service formerly known as twitter.
https://www.instagram.com/reel... [instagram.com]
For incompetence.
What does a guy have to do to get a funny mod? (Score:3)
Not sure if Linux is dead, but Linus just fired elona from the cesspool formerly known as twitter.
FTFY
Re: (Score:2)
Fix humbly accepted.
Re: (Score:2)
It works on my machine better than Windows.
But mostly because i use an AMD video chip, AMD should just give up and port the linux drivers to Windows already.
BSoD was an indicator (Score:5, Insightful)
Error logs and crash reports could tell you a lot if you knew how to get to them. But since MS didn't make it easy or help the end user, it turned into "it's an MS problem and MS sucks."
Every time I got a BSoD and debugged it, it was always a driver or hardware.
This has nothing to do with today's BS Windows crapware on their systems, just the general history of BSoD.
With that I agree, BSoD was blamed by default, because it was telling you the problem in a terrible way.
Re:BSoD was an indicator (Score:4, Interesting)
I did the following things, which reduced BSODs massively.
1/ UPS with brownout protection.
2/ Put the swap file on its own partition.
3/ Move all applications and data to their own partition leaving the C: drive with only Windows itself on it.
My experience was that most BSODs fell into 3 categories: power fluctuations, programs (including Windows itself) interfering with the swap file, and programs interfering with Windows on the hard drive. Also, after that, defrags were much less needed, as the only partition that fragmented badly was the C: drive, with Windows fragmenting itself.
Re: (Score:3)
None of that makes any sense unless you have a drive that is woefully unreliable and starts corrupting shit in flight.
There's no "interfering". You either write to the windows directory or you don't. You either write to the swap file or you don't (actually no you just don't full stop, software has no basis for nor ability to interfere with the swapfile without operating under elevated privileges). It doesn't matter where windows files are, they are always in the same place: %windir%, and software doesn't ma
Re: (Score:2)
Yeah, most BSODs I've seen were from shitty drivers of cheap hardware. Sometimes shitty drivers of expensive hardware. Sometimes shitty software with too many privileges.
Hard to think of something else, but then I've been away from windoze for over 2 decades now.
Re: (Score:2)
Ever dive deep into your network, even at home? Seen Layer 1 errors? Why? It's just wire, wtf?
But it happens. Even if you're not twisting the cables to see what happens.
Wi-Fi of course is expected to have Layer 1 imperfections.
Re: (Score:2)
Of course that stuff happens, the point is that simply using a different partition doesn't have any impact on the result of those errors.
Re: BSoD was an indicator (Score:2)
My point was that in an environment where random errors occur you may minimize the errors with reliable processes. But the unspoken caveat... If you have everything on one device, that becomes your point of failure. Partitioning a drive doesn't give you multiple points of physical failure. Logical points, perhaps...
Re: (Score:2)
I haven't seen a blue screen in 10 years, except deliberately when trying new boundaries for overclocking.
Re:BSoD was an indicator (Score:4, Insightful)
I don't think it's fair to blame Microsoft for cryptic BSOD logs. When a memory chip goes bad, the OS doesn't have any way to know whether the chip is bad, or whether some driver bug caused the memory checksums to fail. If a component overheats and starts spitting out garbage, how is the BSOD supposed to diagnose that? If hardware is installed that isn't quite compatible, is the BSOD supposed to be able to display a nice, human-friendly message telling you that the model number of your component is a mismatch?
A lot of errors that cause BSODs *are* technical and require a knowledgeable professional to diagnose. This is not unlike a doctor who has to take obscure human symptoms and piece together what is going wrong in a human body.
Re: (Score:2)
If there were better APIs they could do that. An API for sensors, with a maximum safe limit supplied for each one. When a BSOD happens, it could do a little memory test on the affected area.
Re: (Score:2)
And how exactly would the OS know what the "affected area" was, or that the issue was bad memory in the first place? It could just as well be a bad controller, or an overheated chip, or a poorly-seated RAM chip, or any of a million other things.
Re: (Score:2)
To crash the OS it would need to be in the kernel, so do a quick test of that RAM using something like March C.
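For anyone who hasn't seen a march test, here is a minimal sketch of March C- (the common shortened variant of March C) run against a simulated bit-addressable memory. The `SimulatedRam` class and its stuck-at fault model are made up for illustration; a real crash handler would have to run this on physical pages from a context that can still be trusted, which is exactly the objection raised above.

```python
class SimulatedRam:
    """Bit-addressable memory model (hypothetical, for illustration only)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        # {address: forced_value} models cells stuck at 0 or 1.
        self.stuck_at = stuck_at or {}

    def __len__(self):
        return len(self.cells)

    def write(self, addr, value):
        # A stuck cell ignores the written value.
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def march_c_minus(ram):
    """Run March C- and return the sorted addresses whose reads mismatched."""
    n = len(ram)
    up = range(n)
    down = range(n - 1, -1, -1)
    failures = set()

    def element(order, expect, write):
        # One march element: optionally read and check, then optionally write.
        for addr in order:
            if expect is not None and ram.read(addr) != expect:
                failures.add(addr)
            if write is not None:
                ram.write(addr, write)

    element(up, None, 0)    # M0: write 0 everywhere (order does not matter)
    element(up, 0, 1)       # M1: ascending, read 0 then write 1
    element(up, 1, 0)       # M2: ascending, read 1 then write 0
    element(down, 0, 1)     # M3: descending, read 0 then write 1
    element(down, 1, 0)     # M4: descending, read 1 then write 0
    element(up, 0, None)    # M5: read 0 everywhere (order does not matter)
    return sorted(failures)


if __name__ == "__main__":
    print(march_c_minus(SimulatedRam(64)))
    print(march_c_minus(SimulatedRam(64, stuck_at={17: 1})))
```

The test is linear in the size of the region, which is why it is plausible as a "quick test" on a small range. It says nothing about which range to test.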
Re: (Score:2)
BSoD was telling you what was going on, but they made it difficult to understand what to do.
The BSoD only ever gave you enough information to tell you what driver crashed. Or a simple error code. It still does. That hasn't changed.
Error logs and crash reports could tell you a lot if you knew how to get to them. But since MS didn't make it easy or help the end user, it turned into "it's an MS problem and MS sucks."
Be careful what you wish for. Error logs and the tools are great and all, but if a user is unable to go read on the MSDN Docs how to debug something they will not have a hope in hell of understanding the debug output either. Kernel panics are no better in this regard either. The average user (heck the average poweruser) has no hope in hell of understanding what went wron
Re: (Score:2)
Re: (Score:2)
Generally speaking, once you've gotten the BSoD, you're not reading logs.
And. "often it is hardware behind Microsoft's blue screen of death."
It is not well appreciated that Windows is reliant on independent vendors, manufacturers, etc. for drivers in particular. This is at once the greatest advantage (Microsoft can 'welcome' any hardware manufacturer that will bother to write drivers) and greatest vulnerability (Microsoft has to either do a LOT of work to insulate Windows from bad drivers, or suffer the con
Re: (Score:2)
Error logs and crash reports could tell you a lot if you knew how to get to them. But since MS didn't make it easy or help the end user, it turned into "it's an MS problem and MS sucks."
You're talking about a kernel panic, which can be useless without a kernel memory dump. And you've already got the Windows syslogs in the form of the Event Viewer, which is already going to capture anything relevant. Few normal people have any idea what they'd even do with system logs. Virtually none of them have any idea what to do with a kernel memory dump beyond going to bleepingcomputer or reddit and posting "Every time I do this I get a bsod, help!" with advice often being to typically start pulling out ha
Re: (Score:2)
Windows NT used to give you a whole bunch of details when it hit a BSoD - NT4 bluescreens were wildly informative, but to the average user, completely useless. It was just a bunch of numbers that had no meaning to them and gave them no pointer to what the problem was. It didn't help that many drivers adopted the 8.3 naming convention, making it even more obscure.
Also completely useless because the screenful of information was there but you couldn't do anything with it - you couldn't print it or anyt
Bless his heart...but... (Score:1)
Re: (Score:2)
Linus is right, but this is really not news (Score:4, Interesting)
I've never used ECC in my personal machines - I'm sure it's great - but since the early 00's or so, BSoDs are just not a thing that regular users experience unless they have bottom-tier or broken hardware, and people that buy low quality stuff are not likely to want to spring the extra cash for ECC anyway.
Re: (Score:3)
Re: (Score:2)
Win9x and Win2k (and the other NT descendants) are fundamentally different operating systems. In general, NT had a much more robust kernel, so system panics were and remain mainly hardware issues, or, particularly in the old days, dodgy drivers (which is just another form of hardware issue). I've seen plenty of panics on *nix systems and Windows systems, and I'd say probably 90-95% were all hardware failures, mainly RAM, but on a few occasions something wrong with the CPU itself or with other critical hardw
Re: (Score:2)
NT also virtualized the DOS environment (ntvdm) where 9x didn't, and that is a very important distinction. In the 9x days, games and even drivers were executed in the DOS environment. As soon as you would open command.com, it was a ticking timebomb for when a blue screen would happen. If you could avoid DOS and 16-bit memory shimmed binaries, 9x wasn't nearly as fragile as everyone generally dealt with
Re: Linus is right, but this is really not news (Score:5, Insightful)
Adding to that I was primarily a Linux user even back then, but would occasionally dual boot into Windows. So same hardware, BSOD galore or at least frequent enough to be very annoying, but Linux as stable as could be and not a kernel panic in sight.
So clearly not all down to unreliable hardware.
Re: (Score:2)
Before NT, Windows was an absolute mess. I think the only reason most people put up with it was that they didn't know anything better was possible and since Windows was so widespread it was a misery everyone shared.
I think that many of those people were also recent DOS users. Given that DOS systems would often simply freeze up several times per day and require a reboot (easy to do since any bug in the user's application could do this), once they added a protected mode pseudo-kernel to Windows (maybe starting with Windows/386 2.1), it was actually a slight improvement over what they were used to since DOS crashes could sometimes be isolated to one virtual terminal.
Re: (Score:2)
With Win10 in recent months I noticed that the Nvidia video driver hiccups and throws a BSOD practically every time the OS gets an update. The fix is to completely remove the driver and related files using a tool like DDU and then to reinstall the driver. The BSODs stop... until the next OS update. Which is why I keep the PC disconnected from the network unless completely, unavoidably necessary.
No BSOD but Linux PANIC (Score:5, Interesting)
Re: (Score:1)
Re: (Score:2)
And this is exactly the reason why he doesn't bother with computers without ECC memory.
Re:No BSOD but Linux PANIC (Score:5, Informative)
Linux has utilized - pretty much forever - all the available memory as cache/buffer, so you were bound to run into the problem much sooner.
The Win95/98/ME could run for long time without ever accessing particular physical memory chips.
Windows NT didn't have this problem, but on the other hand WinNT and successors also had better isolation so if a driver crashed due to memory issue, it recovered better (This applies really to WinNT 3.5 and perhaps 4, back when it was still going with the Dave Cutler's VMS-derived approach - WinNT 3.5 is almost a microkernel).
Re: (Score:2)
This is surprising to me. Back in those days I had a pretty expensive custom PC with 32MB RAM, which was a lot for that time, and that RAM was constantly full. And when the 1.6GB hard disk started to fill, the OS stability really went to hell. Also, back in those days it wasn't just BSODs that you had to worry about - the OS would frequently lock up as well.
Re: (Score:2)
It could also be that Windows didn't allocate memory in the same pattern as Linux, and by chance didn't happen to need to use the defective chips. I've certainly seen Windows BSODs due to bad memory, so I know it's capable of detecting such problems.
Re: (Score:2)
It's nothing to do with speed. Windows / Linux has no ability to read or write faster or slower to RAM, the speed is set at boot. It's about accessing a broken area with data critical enough to cause a system error.
Re: (Score:2)
How did you confirm that this was the cause of the problem? By replacing the memory chips with known good ones?
Re: (Score:2)
Windows up to about 7 would crash more due to softw
That, and cooling (Score:2)
Bad cooling was a huge problem for PCs as well. There were not only the cheap fans that quickly wore out, but also people who kept the computer in a drawer, or on the floor where it acted as a stationary roomba vacuuming up all the dust within reach.
did anyone answer that question? (Score:2)
Cheap (Score:4, Informative)
RAM speed doesn't matter as much as it used to for framerates, though, unless you are overclocking a ton, in which case you don't care about stability anyways.
Re: Cheap (Score:3)
Re: (Score:2)
It adds price because Intel forced an artificial market segmentation down everyone's throats.
Re: (Score:1)
Because TPM takes the user's computer and makes it Microsoft's. It is a built-in, pre-installed rootkit. It was never about making the computer safer for the user, but safer for Microsoft.
Re: (Score:2)
Because TPM takes the users computer and makes it Microsoft's. It is a builtin, pre-installed rootkit. It was never about making the computer safer for the user, but safer for Microsoft.
It has a side benefit of showing the world who the ignorant people are, like you. TPM 2.0 doesn't restrict anything. It's nothing more than a secure storage area.
Re: (Score:3)
/facepalm
TPM can't rootkit anything. It doesn't have interrupts, and it can't write to system memory. It's not much different from any device you'd connect over a serial port, meaning all your computer can do is send and receive data from it. Components attached to your PCI-e bus will have a much greater ability to rootkit you, like your GPU, your ethernet adapter, or even your USB devices. Yet, despite your yammering, you don't think twice about any of those.
Its only purpose is to serve as a witness, nothi
Re: (Score:2)
Why didn't Microsoft push for this instead of TPM 2?
That's like saying why doesn't the government focus on creating regulations for meat rather than funding a cure to ear infections. The two have nothing at all in common with each other. Not in use, practice, or application.
Add to that, ECC has downsides. It's more expensive and it's slower. It has its place, but that place just simply isn't on most desktops. If you're running a computer crunching critical financial transactions, yeah ECC is a good idea. If you spend your time teabagging other players in CoD
What's impressive to ... (Score:2)
I've described the boot process of a modern PC to people in the past, to make this point.
From power on to firmware and POST. Then the lights are turned on for all the areas of hardware responsibility. Some happen right away, others further down the line. Boot managers, OS, drivers... logins, more drivers, then - finally - the system subsides into an orderly management of resources using an incredible juggling act of interrupt management and carefully segregated multitasking where everybody must be orderly.
A
Re: (Score:2)
The same can be said for human reproduction and yet nobody's cheering for crack-babies.
Re: (Score:2)
If you deliberately introduce malware into your species boot process, it's not that evolution got it wrong.
Just shows he does not really understand hardware (Score:2)
Not that surprising with a software person.
With ECC, memory will go "bad" too, just a bit later. If you buy good quality memory, it will be fine for far longer than the time you are going to use it. Bad memory will overwhelm ECC way before that time.
Re: (Score:2)
If you pay no attention to error reports ECC or non-ECC are both rolling the dice; though
Re: (Score:2)
From 2004 to 2007 I had a system with ECC memory (dual Opteron). I checked the MCE logs regularly and it seemed that every day or so there were errors that the ECC RAM fixed. If it had been plain parity RAM, the problem would likely have gone unnoticed and potentially propagated errors.
That made me realize that even with good components shit happens and a bug-free program can still crash due to memory errors.
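For anyone wanting to do the same kind of check on a current Linux box: corrected and uncorrected error counts are normally exposed by the kernel's EDAC subsystem under sysfs. The sketch below is an assumption about a typical setup, not what the poster above used (they mention MCE logs). It only returns data if an EDAC driver for your memory controller is loaded. The function name `read_edac_counts` is made up for this example.

```python
from pathlib import Path

# Standard sysfs location for EDAC memory controllers on Linux.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"ce": corrected, "ue": uncorrected}} per controller.

    Returns an empty dict when the directory is missing, for example on
    non-Linux systems or when no EDAC driver is loaded.
    """
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are unreadable or malformed.
            continue
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts


if __name__ == "__main__":
    for name, c in read_edac_counts().items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

A corrected count that keeps climbing is the warning sign: the ECC is doing its job, but the module is a candidate for replacement.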
Re: (Score:2)
That is why I run memtest86+ for several days on new RAM. I once had an Infineon module crap out after about 2 days.
Re: (Score:2)
ECC on memory is 1-error correction 2-error detecting. On 3 errors it will just correct to the wrong value. I agree that the warning is nice (if your OS gives you one and you pay attention to it), but relying on the correction is pure foolishness.
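To make the 1-correct / 2-detect / 3-miscorrect behaviour concrete, here is a toy SECDED code: an extended Hamming(8,4) code protecting 4 data bits. Real ECC DIMMs typically use a much wider code (commonly 64 data bits plus 8 check bits), and in wider codes some triple errors are flagged rather than miscorrected, so treat this as a sketch of the principle only. All names here are made up for illustration.

```python
DATA_POSITIONS = (3, 5, 6, 7)    # positions 1, 2, 4 hold Hamming parity bits
PARITY_POSITIONS = (1, 2, 4)     # position 0 holds the overall parity bit


def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (bit list)."""
    code = [0] * 8
    for bit, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> bit) & 1
    for p in PARITY_POSITIONS:
        # Each parity bit covers the positions whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity makes the whole word even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i
    if sum(code) % 2 == 1:
        # Odd overall parity: assume a single flipped bit and repair it.
        # Syndrome 0 means the overall parity bit itself was hit.
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        # Even overall parity but nonzero syndrome: two bits flipped.
        status = "uncorrectable"
    else:
        status = "ok"
    nibble = sum(code[pos] << bit for bit, pos in enumerate(DATA_POSITIONS))
    return nibble, status


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(flip(word, 5)))        # one flipped bit
    print(decode(flip(word, 3, 6)))     # two flipped bits
    print(decode(flip(word, 1, 3, 6)))  # three flipped bits
```

With three flipped bits the decoder sees odd overall parity, assumes a single error, "repairs" a fourth bit, and hands back a valid codeword for the wrong data while reporting success.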
It's a great video (Score:2)
The best part is when Torvalds unknowingly calls Musk "too stupid to run a tech company."
This could be the year.. (Score:2)
Linus switches to Windows.
Nope. It's software. (Score:1)
One thing I've learned from the past 10 years of supporting Windows 10/11 is it's almost always software. I had a problem with a keyboard doing runaway repeats on the Windows Hello PIN entry screen, and I swapped keyboards till I was blue in the face until I realized that it was the supplemental support software (SetPoint) which was causing the fault. Yes. Software is even screwing basic I/O devices now. It's a solved problem and software developers still manage to out-clever themselves into system instabil
3rd party drivers (Score:1)
Linus is just wrong (Score:2)
You run Windows and it blue screens. You run Linux or FreeBSD on that same hardware and it doesn't panic or log anything that looks like hardware issues, it runs for years without issue.
Yes ECC RAM solves random bit flips, but they happen so infrequently they can't be the cause of every Windows BSOD.