
That Time The Windows Kernel Fought Gamma Rays Corrupting Its Processor Cache (microsoft.com) 166

Long-time Microsoft programmer Raymond Chen recently shared a memory about an unusual single-line instruction that was once added into the Windows kernel code -- accompanied by an "incredulous" comment from the Microsoft programmer who added it:

;
; Invalidate the processor cache so that any stray gamma
; rays (I'm serious) that may have flipped cache bits
; while in S1 will be ignored.
;
; Honestly. The processor manufacturer asked for this.
; I'm serious.
invd


"Less than three weeks later, the INVD instruction was commented out," writes Chen. "But the comment block remains.

"In case we decide to resume trying to deal with gamma rays corrupting the processor cache, I guess."
  • by johnjones ( 14274 ) on Sunday November 25, 2018 @10:44AM (#57696724) Homepage Journal

    Preparing your software for failures in hardware due to common problems such as radiation might be a good idea...

    This is why some firms/states would not trust Microsoft with critical functions...

    • Sure they did (Score:5, Insightful)

      by rsilvergun ( 571051 ) on Sunday November 25, 2018 @11:02AM (#57696818)
      that's what their embedded OSes were for. AFAIK this was in their consumer code base.

      If I had to guess this was because of a real processor bug Intel didn't want to admit to. I remember when Win XP hit, the shop I was at was flooded with dead computers from upgrades. Manufacturers had been selling bad ram in computers for years. By default Win98 would only make use of the first 64 MB of ram in most cases (there was a registry hack I've long since forgotten that forced it to use your entire ram before going to the cache).

      Anyway, XP's installer would copy the CD into ram to make the (very slow) install run faster. So you got to find out your OEM stuck bad ram in your box the hard way when the installer blew up. The best part was the upgrade couldn't roll itself back gracefully. I don't remember all the steps to fix it but it was a pain. We just did software where I was at too so it was fun having to send them somewhere else to get new ram and have them yell at me that the ram was fine. Good times.
      • Re:Sure they did (Score:5, Interesting)

        by msauve ( 701917 ) on Sunday November 25, 2018 @02:31PM (#57697714)
        "If I had to guess this was because of a real processor bug Intel didn't want to admit to."

        Alpha particles affecting memory is a known, but uncommon, issue. This code invalidated the cache when coming out of S1 (sleep) state. The deeper (S2+) sleep states already invalidate the cache. The longer the processor is in a static state (sleep), the more chance that an alpha particle hit will flip a bit. Invalidating the cache when coming out of a sleep state has no meaningful impact on performance. The time to re-fetch is nothing compared to the amount of time spent sleeping. Of course, there are many more bits in RAM which could be affected, so a problem is more likely to occur there, which this doesn't address.

        But it hurts nothing, avoids an (admittedly rare) issue, and is but a single instruction. I wonder why they removed it?
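
        For a sense of scale (rough numbers, not from the original comment: a 512 KB cache and roughly 10 GB/s of memory bandwidth are assumed purely for illustration), refilling an invalidated cache costs on the order of tens of microseconds, which is negligible next to seconds spent asleep:

        # Back-of-the-envelope: cost of refilling an invalidated cache versus
        # the time spent asleep. Cache size and bandwidth are assumed figures.
        CACHE_BYTES = 512 * 1024     # assumed cache size: 512 KB
        MEM_BANDWIDTH = 10e9         # assumed memory bandwidth: 10 GB/s
        SLEEP_SECONDS = 1.0          # assume at least one second spent in S1

        refill_seconds = CACHE_BYTES / MEM_BANDWIDTH
        print(f"refill time: {refill_seconds * 1e6:.0f} microseconds")
        print(f"fraction of a {SLEEP_SECONDS:.0f}-second sleep: {refill_seconds / SLEEP_SECONDS:.8f}")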
        • S1 is supposed to keep the cache fully powered up. How's it going to make any difference if an alpha particle hits the cache memory cells while the core clock has stopped?

          • by msauve ( 701917 )
            "How's it going to make any difference if an alpha particle hits the cache memory cells while the core clock has stopped?"

            It's not clear what you're asking. If a bit in the cache gets changed, it corrupts the instruction or data. That the cache is powered up makes no difference.
            • I'm saying the risk of cache corruption from gamma rays should be no different between S0 and S1.

              • by msauve ( 701917 )
                "the risk of cache corruption from gamma rays should be no different between S0 and S1."

                But invalidating the cache when returning from S1 removes any (even remote) risk. And there's no downside. Better is better.
                • There's a performance and power consumption impact. Otherwise they wouldn't have any cache at all.

                  • by msauve ( 701917 )
                    I understand how, not knowing how sleep states work, you would think that.
                    • So you don't understand that coming out of a sleep state and not having any data in the cache at all results in more stalls and main memory access and how that translates to a performance hit and more power consumption?

                    • by msauve ( 701917 )
                      No, you simply don't understand that any delay needed to fetch from RAM when coming out of a sleep measured in seconds is absolutely meaningless in the real world.
            • by epine ( 68316 )

              It's not clear what you're asking. If a bit in the cache gets changed, it corrupts the instruction or data. That the cache is powered up makes no difference.

              Wrong.

              Next contestant, please.

              You're assuming the cache has no parity or ECC mechanism, where active use would eliminate single-bit errors before they accumulate into undetectable errors, whereas pickling the cache in quiescent warm brine would not.

              • by msauve ( 701917 )
                But see, that's just it. I'm not assuming anything, it's you who are making the ASSumption. The odds of a double hit (which might pass the parity check) are multiplied billions (GHz) of times when it's sitting there in static sleep for a second or two.
          • by mlyle ( 148697 )

            It's just that the stuff will have sat for an indeterminate, long time while the clock is stopped-- providing an unusually long window for a bit to flip-- and resuming even from S1 is a relatively costly operation.

            I think overall it is silly, but if you have ECC RAM and non-ECC cache, and spend most of the time in S1, it's not completely crazy.

          • by Agripa ( 139780 )

            S1 is supposed to keep the cache fully powered up. How's it going to make any difference if an alpha particle hits the cache memory cells while the core clock has stopped?

            The cache is protected against data corruption by ECC or parity; however, if multiple bit errors accumulate within one word, this protection fails. During normal operation, the cache is continuously scrubbed of errors, so this is not a problem.

        • >Alpha particles affecting memory is a known, but uncommon, issue.

          A known issue for plastic packaging. The alpha emitters are in the plastic.

        • by Agripa ( 139780 )

          Of course, there are many more bits in RAM which could be affected, so a problem is more likely to occur there, which this doesn't address.

          High performance integrated SRAM is orders of magnitude more susceptible to radiation induced soft errors than DRAM which is why SRAM caches have included ECC or parity protection almost since they were first used.

          Oddly enough, DRAM has actually become more resistant to radiation induced soft errors over the last couple of generations but this is more than cancelled by the increasing amount of DRAM used.

      • If I had to guess this was because of a real processor bug Intel didn't want to admit to.

        I was wondering that too. The article suggests it is true.

    • Re: (Score:3, Informative)

      by Anonymous Coward

      Reading the full story, it's rather strongly implied that it was actually a workaround for a bug in the processor which the manufacturer hadn't found yet, and was blaming on cosmic rays.

      • by mikael ( 484 )

        I know stray radar microwaves can take out a PC. There was a weather radar station close to where I lived. Whenever my smartphone app received a heavy rain warning, my gaming PC would crash seconds before.

        • Was your gaming PC's case solid metal, or did it have large windows / oversized vents?
    • by mikael ( 484 ) on Sunday November 25, 2018 @11:49AM (#57697034)

      One component that many defence contracts required was a Nuclear Event Detector. This little component would set a pin when it detected the precursor of a nuclear detonation. What the system did next was up to the vendor, but usually it would involve a shutdown and disconnect of ports and power lines.

      • by Anonymous Coward
        Maxwell HSN-1000 [ddc-web.com]. You can't buy them new from Maxwell but you can get them used from recycled military gear for around $150.
  • by NotSoHeavyD3 ( 1400425 ) on Sunday November 25, 2018 @10:45AM (#57696730) Journal
    Since it explains the reasoning why that code is there. (Since another developer could come by and wonder why that code is there.) I've seen way too many people put in a comment like ;invalidate cache and call it a day.
    • by Anonymous Coward

      Would have been somewhat better if they left in which processor, and which manufacturer, they were talking about.

      • You're very right there. Also I'm guessing there's probably some issue tracking so putting that in there would be nice as well. I'm just so surprised the original developer didn't put in some pointless comment.
    • It needs a reference to the errata from the vendor. Future revisions may need to tweak code flow and understand exactly what this is trying to achieve.

    • by Anonymous Coward

      +1

      Note from professional programmer: I can read the code to see WHAT is happening, and HOW it is happening. I need the comments to explain WHY it is happening, and WHY I should care. During code review, this comment would get an "awesome comment" comment.

    • by shabble ( 90296 )

      Since it explains the reasoning why that code is there. (Since another developer could come by and wonder why that code is there.)

      But... the code isn't there. The code itself was commented out shortly after.

      What's more concerning is why the commented stuff was actually left in there, since I'm presuming they had source control even back then.

      And "in case someone put it back in later" isn't really covered since the same sort of code could conceivably be put elsewhere in the code without the programmer seeing this bit of code.

    • The comment could have included "Use this instruction with care. Data cached internally and not written back to main memory will be lost", INVD man [felixcloutier.com].
  • by vadim_t ( 324782 ) on Sunday November 25, 2018 @10:48AM (#57696746) Homepage

    The need for error checking has been around for a very long time. Yes, cosmic particles are indeed a thing, and result in increased memory errors at high altitude, in airplanes, or especially in space.

    I remember parity RAM being around in the 90s, and I'm pretty sure it's older than that. Pretty much any server these days uses ECC for this reason.

    I run ECC and see the occasional bit flip in my logs once in a while. These can be found under /sys/devices/system/edac/mc/mc0/ (a quick polling sketch is below).

    What's odd is that ECC is not routinely used in all hardware. Depending on the conditions it can be of great help, as the rare bit flip can cause strange problems that can take ages to track down. And it works well for figuring out when you have a bad memory module -- the computer will figure it out on its own.
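
    A quick polling sketch (not part of the original comment, and assuming the standard Linux EDAC sysfs layout, where each memory-controller directory exposes ce_count for corrected errors and ue_count for uncorrected ones):

    #!/usr/bin/env python3
    # Sketch: report corrected/uncorrected memory error counts from the
    # Linux EDAC sysfs interface (/sys/devices/system/edac/mc/mc*/).
    from pathlib import Path

    def read_count(path: Path) -> int:
        try:
            return int(path.read_text().strip())
        except (OSError, ValueError):
            return 0  # missing or unreadable file: treat as zero

    for mc in sorted(Path("/sys/devices/system/edac/mc").glob("mc*")):
        ce = read_count(mc / "ce_count")  # errors ECC corrected
        ue = read_count(mc / "ue_count")  # errors ECC could not correct
        print(f"{mc.name}: corrected={ce} uncorrected={ue}")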

    • RAM is cheap enough that ECC or similar tech should be routine. I'll pay 10-15% more per GB for this.

      • by arth1 ( 260657 )

        The problem is that you need a CPU and north bridge that can handle it, which adds to the initial costs. For Intel, for example, a Xeon CPU costs (artificially) a good deal more than a comparable speed i3/5/7/9, which is an upfront cost that consumers aren't willing to eat, and they tend to choose either a cheaper CPU or a faster CPU for the same kind of money.

        • by vadim_t ( 324782 )

          Or you could buy AMD, which seems to have excellent support for it.

        • by mlyle ( 148697 )

          The most real problem is that this is a way for motherboard and CPU vendors to segment the market, and prevent commodity PC hardware from being used for critical things. Home users "don't need" ECC, so it can be left off the cheap stuff.

        • by Agripa ( 139780 )

          The problem is that you need a CPU and north bridge that can handle it, which adds to the initial costs. For Intel, for example, a Xeon CPU costs (artificially) a good deal more than a comparable speed i3/5/7/9, which is an upfront cost that consumers aren't willing to eat, and they tend to choose either a cheaper CPU or a faster CPU for the same kind of money.

          In most cases Intel's Xeon and consumer CPUs are the same hardware, so the only difference in production might be testing time. Intel's artificial market segmentation of ECC is more about price discrimination [wikipedia.org] than costs, which can be seen by their tying ECC to use of the proper south bridge, which has nothing to do with it.

    • by dargaud ( 518470 )
      I have a friend who had written his own accounting software in the 80s on a 6502 PC. Once there was a discrepancy of a few $ at the end of the month. He spent an entire month backtracking the error through software logic, then software debug, then finally assembly until he found the exact place where a single bit had flipped in memory. Took him a month.
      • Comment removed based on user account deletion
        • Comment removed based on user account deletion
        • by Megol ( 3135005 )

          Apple II, BBC Micro, Commodore 64, Commodore PET, Atari 8bit series etc. There were many alternatives.

        • by Anonymous Coward

          Technically the Commodore 64 used a 6510 rather than a 6502, although in practise the only difference was that the 6510 had an extra 8-bit I/O port used for bank switching memory and talking to the tape drive.

          Commodore owned MOS Technology who made the 6502 so they made quite a few custom variants like this for various computers and devices.

      • Comment removed based on user account deletion
    • What's odd is that ECC is not routinely used in all hardware.

      Nothing odd about it. It costs more, it performs worse, and the vast majority of the incredibly rare errors that are caused end up being entirely non-critical due to the way people generally use computers.

      If you have a database server handling critical information all day then it makes sense. But hell for the vast majority of workloads your computer is more likely to get "Aw. Snap! Something went wrong" Along with a frowny face displayed in your browser. Any time a consumer is doing anything remotely import

      • by arth1 ( 260657 )

        Nothing odd about it. It costs more, it performs worse

        Not always. Modern ECC does the fetch and verification in parallel, negating most of the slowdown. And some registered ECC (which used to be slower) is now faster, as it does pre-fetch before the actual request.

        • Always. Without fail.

          The process of check and verification itself was only a small part of the performance of memory. ECC memory is almost impossible to find at common desktop speeds, with almost all of it being sub-3000 MHz except for the truly ultra-expensive modules.

          Where someone wants to pay for equal speeds and chooses something like a 2166 module, the ECC memory invariably has far worse latency figures.

          ECC memory has a lower upper speed limit, lower than the actual standard speed capability of a m

    • What's odd is that ECC is not routinely used in all hardware.

      For a lot of systems and uses, the rate of error occurrence doesn't justify the area cost of ECC. For all fabrication processes in the last decade, error rates per SRAM bit have been decreasing faster than the increase in number of SRAM bits, meaning that the total error rates for most chip families have been decreasing. Furthermore, the vast majority of errors in SRAM never propagate to user-discernible outcomes. For these systems, the user is more interested in a lower initial price or better performan

      • by Agripa ( 139780 )

        I think you are confusing SRAM and DRAM.

        DRAM soft error rates leveled off a couple generations ago. SRAM soft error rates are a couple orders of magnitude higher and have remained so for integrated SRAM caches. A discussion of the difference and why it exists would be interesting.

        Other than some odd exceptions, integrated SRAM caches have been protected by ECC or parity almost since they were first used.

        • I think you are confusing SRAM and DRAM.

          DRAM soft error rates leveled off a couple generations ago. SRAM soft error rates are a couple orders of magnitude higher and have remained so for integrated SRAM caches. A discussion of the difference and why it exists would be interesting.

          DRAM per Mbit error rates have not dropped as precipitously as SRAM error rates. Over the last decade, SRAM error rates have dropped by a few orders of magnitude, faster than the increase in the total number of SRAM bits on a chip due to scaling and chip area increase. Ten years ago, the SRAM error rate was quite a bit higher than the DRAM error rate, by about an order of magnitude. Especially with the introduction of FinFET/tri-gate, SRAM error rates have plummeted and are now somewhat lower than that f

    • What's odd is that ECC is not routinely used in all hardware. Depending on the conditions it can be of great help, as the rare bit flip can cause strange problems that can take ages to track down. And it works well for figuring out when you have a bad memory module -- the computer will figure it out on its own.

      Others have already covered the higher cost and performance hit of ECC RAM.

      The most visible symptom of a random bit flip is that your program crashes. The RAM a program occupies far exceeds the R

    • There are so many machines out there with domain names in memory that squatting domains that are a single bit-flip away can be quite interesting [dinaburg.org]. (A rough sketch of generating such variants follows.)
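
      Not part of the original comment, but to illustrate what "a single bit-flip away" means, the sketch below enumerates the hostnames that differ from a hypothetical label ("example") by exactly one flipped bit while still using valid hostname characters:

      # Sketch: enumerate "bitsquat" candidates -- names one flipped bit away
      # from the original that still consist of valid hostname characters.
      import string

      VALID = set(string.ascii_lowercase + string.digits + "-")

      def bitflip_variants(label: str):
          for i, ch in enumerate(label):
              for bit in range(8):
                  flipped = chr(ord(ch) ^ (1 << bit)).lower()
                  if flipped != ch and flipped in VALID:
                      yield label[:i] + flipped + label[i + 1:]

      # e.g. one-bit-flip variants of the label "example" (TLD left alone)
      for v in sorted(set(bitflip_variants("example"))):
          print(v + ".com")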
    • > What's odd is that ECC is not routinely used in all hardware

      I know why. It's a pain to implement on arbitrary logic - as opposed to memory.

      TMR is more appropriate, however the tool support for TMR is still abysmal. Synopsys should have a tmr command you can apply to a module and have it just happen. Instead you waste weeks fighting the optimizer to prevent it removing the TMR you put in manually.

      • For software, you still need to put in the countermeasures by hand, because you've also got things like control flow integrity and other aspects to deal with. Also you don't need the overhead of TMR for all values, just critical system variables and the like, so having a tool try and do it automatically doesn't work.
    • by Agripa ( 139780 )

      What's odd is that ECC is not routinely used in all hardware. Depending on the conditions it can be of great help, as the rare bit flip can cause strange problems that can take ages to track down.

      Whether ECC is used or not depends on the likelihood of an error and how serious the consequences will be. The number of errors depends on how much memory is used (not installed), how long it is used, and oddly enough some factor related to the access rate. Since servers tend to have much more memory and operate for longer times than desktops, ECC makes more sense for them.

      Who cares about errors while playing a game or media or doing consumer type tasks which do not tax the computer? But if my workstatio

  • by Brett Buck ( 811747 ) on Sunday November 25, 2018 @10:59AM (#57696800)

    It seems to make good sense to put in some protections against register or other bit flips; they do happen from time to time. He probably meant cosmic rays instead of gamma rays, but that definitely can happen, and I have spent many, many hours of my life putting things in software that detect these and recover properly. I have one processor type that has something like this about once a month, very consistently, over several decades.

    • If it's being done rarely, and before exceptionally critical operations, then maybe it makes sense. Although, if someone bothered to take it out, then it was probably happening too often and thus affecting performance...

  • by starless ( 60879 ) on Sunday November 25, 2018 @11:03AM (#57696830)

    A real gamma ray wouldn't do much, and would just pass through, unless it pair converted to electron and positron.
    But cosmic rays (charged particles) would be more likely to interact.

    • by Anonymous Coward

      Gamma rays lose energy while passing through materials by knocking electrons around. This can involve many collisions and many displaced electrons depending on the energy of the gamma ray. Higher energy photons will go a ways without interacting much, but as they lose energy collisions can become more frequent and at some point they can quickly dump the rest of their energy in a smaller volume. Charged particles stop by practically the same process, just interact more strongly and so are more likely to d

    • by mnmn ( 145599 )
      I agree. I came here to comment on 'why is this strange' but looks like many slashdotters (at least ones with physics backgrounds) feel similarly.

      It seems ridiculous when you take a CPU RMA to Intel for an RCA on some OS crash, but their response is that the CPU is fine, it was a cosmic particle. But it's true, and statistically this can happen to any bit in any register. Especially with the lithography processes producing ever smaller gates, with few atoms manning the gate/bit.
  • by Crashmarik ( 635988 ) on Sunday November 25, 2018 @11:12AM (#57696884)

    Your CPU has been asleep for an a priori unknown amount of time; when you are powering back up, you'd absolutely want to clear the cache to purge any potential bit flips. It's a relatively cheap way of ensuring data integrity.

  • by Laxator2 ( 973549 ) on Sunday November 25, 2018 @11:17AM (#57696904)

    I think they use laptops on the International Space Station and there you are not protected from cosmic rays by the blanket of the Earth's atmosphere. Just read up on the phosphenes experienced by the astronauts as they try to go to sleep.

    Not sure if "gamma rays" is the correct term here, as high-energy protons are most likely to create a local change in electric charge density. With modern processors being built on the 14 nanometre process this becomes a serious problem. All the processors that are used in spacecraft and control vital functions are radiation-hardened. That usually means older fabrication processes (wider paths reduce the probability of cross-talk) and amorphous silicon (a monocrystal can sustain permanent damage from a particle of high enough energy).

    Overall, it does make sense if it is meant to be used in space.

    • by Agripa ( 139780 )

      With modern processors being built on the 14 nanometre process this becomes a serious problem.

      Susceptibility is more complicated than just the minimum feature size. It was a serious problem generations ago.

      Denser processes use gate insulators with a higher dielectric constant to store more charge and also provide more drive for a given area. These things make a process more resistant to radiation induced soft errors. The same things caused the susceptibility of DRAM processes to level off or even decrease slightly starting a couple generations ago.

  • Yeah, I'll bet this was before they discovered that their processor packaging material was radioactive and that was randomly flipping bits. Seriously, radioactive RAM was one culprit which ran Sun Microsystems, Inc. out of business. It took them years to find it. They even started adding ECC to their motherboard data paths, looking to see if their data centers were near nuclear research facilities. By the time they found it, it was too late. ...that and they should have ditched Solaris for Linux, but...
  • Sounds like a smoke screen for something else.

    If the cache is susceptible to random gamma rays, or, more likely, cosmic rays, and has no ECC, it is NEVER trustworthy, and should be permanently disabled.

    It's like the Intel floating point bugs (yes, plural). Since the end user has no idea WHICH of the operations will produce an erroneous result, NONE of the operations' results are usable, ever.

    Could be worse. Intel once had a "genius" purchasing agent that got a "good deal" on clay for the ceramic package o

    • by Agripa ( 139780 )

      If the cache is susceptible to random gamma rays, or, more likely, cosmic rays, and has no ECC, it is NEVER trustworthy, and should be permanently disabled.

      While it is shut down, the cache is not being continuously scrubbed by ECC or parity allowing bit errors to accumulate and defeat the ECC or parity after it is powered up. Invalidating and reloading the contents of the cache makes perfect sense in this situation.

  • by QuietLagoon ( 813062 ) on Sunday November 25, 2018 @11:51AM (#57697044)
    Nowadays, it probably is far, far more likely that Microsoft's horrendous Windows QA will result in bad data than stray gamma rays flipping bits in a sleeping cache.
  • Comment removed (Score:5, Insightful)

    by account_deleted ( 4530225 ) on Sunday November 25, 2018 @12:08PM (#57697122)
    Comment removed based on user account deletion
    • On occasion, I've had to keep the commented-out code, with a comment explaining why that code must not be used. Otherwise, people keep coming in trying to fix code that's not broken.
      • On occasion, I've had to keep the commented-out code, with a comment explaining why that code must not be used. Otherwise, people keep coming in trying to fix code that's not broken.

        This.

        I've left the wrong code in, commented with a detailed explanation as to why it's wrong, so someone doesn't come and 'fix' it again.

    • Or maybe someone made a mistake. The specification seems to imply you need to flush the cache *BEFORE* entering the S1 state and the hardware is responsible for the rest:

      "15.1.1 S1 Sleeping State
      The S1 state is defined as a low wake-latency sleeping state. In this state, all system context is preserved with the exception of CPU caches. Before setting the SLP_EN bit, OSPM will flush the system caches. If the platform supports the WBINVD instruction (as indicated by the WBINVD and WBINVD_FLUSH flags in the

  • Microsoft code does not contain comments.

    To thwart lawyers finding out the true intentions of the strategies, Bill Gates decreed that the code should not have comments. Famously he said, "I am paying you to write code, not comment."

  • by nbvb ( 32836 ) on Sunday November 25, 2018 @02:04PM (#57697598) Journal

    Anyone surprised by this must not have been around during the UltraSPARC days ....

    I must’ve replaced 1000+ of those damn chips when the “Sombra” modules came out. Mirrored SRAM to protect against the ecache bit-flips. Kernel panics due to “ecache parity errors” were so common ....

    Cache scrubbers in the Solaris kernel. Replacement CPUs. All of it helped.

    This stuff is real and painful if you had a data center full of gear susceptible to it.

  • by toxygen01 ( 901511 ) on Sunday November 25, 2018 @03:15PM (#57697954) Journal
    A friend of mine, developer of spreadsheet SW back in the days of DOS and Norton Commander, had one customer who would keep complaining about the SW crashing from time to time. These kinds of crashes would only happen to this customer and no other.

    He installed a debug build on the customer's site and waited... and fair enough, the SW would crash, and crash again and again... at completely random places in the code. In some cases there was literally no way those lines of code could make the program crash under any circumstances.

    Well, he spent days trying to debug it and came up empty handed. Until it struck him to look at the time when the SW was crashing. And fair enough, it was crashing on one particular day of the week, usually within a span of a few hours on that day. Now comes the interesting part -- the customer's site was actually a railway station on the Slovakia-Ukraine border (in a town called Uzghorod). So he called the customer to ask if there was a train in the station regularly on that day and hour every week, and voila, there was one train coming from Ukraine to Slovakia with some goods. So he asked the customer to take a Geiger counter and see if there was anything going on in the air.

    They found out one of the train cars was radiating like hell. It was used for transferring spent nuclear fuel before. And Ukrainians thought they would save some money by using it for regular cargo after EOL. I wouldn't like to be a person living near those railway tracks...

    tl;dr
    Spreadsheet SW was crashing on the computers in the train station and thanks to customer complaints they found out the crashes were caused by radioactive train coming regularly to the station.
  • This is actually pretty common and has gone on for a long time, especially on systems that were striving to be low-to-zero downtime.

    Some of the idle processing on AS/400s would periodically re-write the microcode from disk. When I asked a core developer why, they cited gamma rays flipping a bit. I then asked if a lead umbrella wouldn't do the job better, and they said yes, but the umbrella would have to be about six feet thick.

    • by Agripa ( 139780 )

      Some of the idle processing on AS/400s would periodically re-write the microcode from disk. When I asked a core developer why, they cited gamma rays flipping a bit. I then asked if a lead umbrella wouldn't do the job better, and they said yes, but the umbrella would have to be about six feet thick.

      Cache and memory scrubbing is a standard feature even on x86 consumer desktop processors whether the user has access to it or not. Motherboards which support ECC memory may make the settings which control scrubbing available in the BIOS. Scrubbing applies to every level of cache which is ECC or parity protected and to main memory if ECC protected.

  • Cosmic rays causing RAM errors is a thing. Scientists estimate it will happen to PCs, at ground level, about once a year. Surprisingly, which year does not matter much, because as the tech gets smaller, the capacity gets larger, so the die size stays about the same.

    Once a year might not sound like much, but that is not "at the end of the year", it can happen right away. Chance is strange that way. 8-)

    MS should probably -not- have commented it out...
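
    A rough sketch of the "it can happen right away" point (not from the original comment; the one-error-per-year rate is just the illustrative figure used above): if soft errors arrive as a Poisson process, the chance of seeing at least one within a window of length t is 1 - exp(-rate * t).

    # Sketch: probability of at least one bit flip within a window, assuming
    # a Poisson process with an (illustrative) average rate of 1 per year.
    import math

    RATE_PER_YEAR = 1.0  # assumed rate, for illustration only

    for label, days in [("1 day", 1), ("1 week", 7), ("1 month", 30), ("1 year", 365)]:
        t_years = days / 365.0
        p = 1.0 - math.exp(-RATE_PER_YEAR * t_years)
        print(f"P(at least one flip within {label}) = {p:.1%}")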

    • by Agripa ( 139780 )

      Cosmic rays causing RAM errors is a thing. Scientists estimate it will happen to PCs, at ground level, about once a year. Surprisingly, which year does not matter much, because as the tech gets smaller, the capacity gets larger, so the die size stays about the same.

      Moore's law is about economics and includes cost reduction per transistor from increasing die size so I wonder if the total die area of memory has actually increased at the high end of consumer hardware.

      Once a year might not sound like much, but that is not "at the end of the year", it can happen right away. Chance is strange that way. 8-)

      MS should probably -not- have commented it out...

      A couple of DRAM generations ago it was something like 1 bit per year per gigabyte but later DRAM generations actually improved slightly. My workstations went from 2GB to 8GB in my current one and my next will likely be 64GB but they all use 4 x dual sided DIMMs so the same number of chips but the silicon c

      • Magnetic media is not so prone to this. But this makes me wonder if the SSD drives we are all using now are having this problem??

        Maybe SSDs have better data check and correction functions, but maybe we should keep a hard drive in our computers to reload the SSD, if necessary.

        • by Agripa ( 139780 )

          Magnetic media is not so prone to this. But this makes me wonder if the SSD drives we are all using now are having this problem??

          Maybe SSDs have better data check and correction functions, but maybe we should keep a hard drive in our computers to reload the SSD, if necessary.

          Both hard disk drives and solid state drives use block based error correction. Several bad bits can be corrected in each sector and sectors may even be considered good with several bad bits below a specified threshold.

          Where SSDs compare poorly to HDDs is endurance and retention time but as long as they are not used for unpowered offline storage like a hard drive might be, retention time is not a problem and few users are going to reach endurance limits. There is a new standard for SSD retention time but I

  • A quick read through the ACPI specification implies that the caches should be flushed *before* entering the S1 state and letting the hardware deal with the rest.

    I'm not sure what to make of the comment. Part of the comment makes it appear as though this instruction comes after waking (making it pointless since the cache is already invalid). If this comment is about before going into the sleep state then it wasn't a manufacturer who asked for this, it was the ACPI specification itself, and not flushing the ca
