



NVIDIA Warns Its High-End GPUs May Be Vulnerable to Rowhammer Attacks (nerds.xyz)
Slashdot reader BrianFagioli shared this report from Nerds.xyz:
NVIDIA just put out a new security notice, and if you're running one of its powerful GPUs, you might want to pay attention. Researchers from the University of Toronto have shown that Rowhammer attacks, which are already known to affect regular DRAM, can now target GDDR6 memory on NVIDIA's high-end GPUs when ECC [error correction code] is not enabled.
They pulled this off using an A6000 card, and it worked because system-level ECC was turned off. Once it was switched on, the attack no longer worked. That tells you everything you need to know. ECC matters.
Rowhammer has been around for years. It's one of those weird memory bugs where repeatedly accessing one row in RAM can cause bits to flip in another row. Until now, this was mostly a CPU memory problem. But this research shows it can also be a GPU problem, and that should make data center admins and workstation users pause for a second.
NVIDIA is not sounding an alarm so much as reminding everyone that protections are already in place, but only if you're using the hardware properly. The company recommends enabling ECC if your GPU supports it. That includes cards in the Blackwell, Hopper, Ada, and Ampere lines, along with others used in DGX, HGX, and Jetson systems. It also includes popular workstation cards like the RTX A6000.
There's also built-in On-Die ECC in certain newer memory types like GDDR7 and HBM3. If you're lucky enough to be using a card that has it, you're automatically protected to some extent, because OD-ECC can't be turned off. It's always working in the background. But let's be real. A lot of people skip ECC because it can impact performance or because they're running a setup that doesn't make it obvious whether ECC is on or off. If you're not sure where you stand, it's time to check. NVIDIA suggests using tools like nvidia-smi or, if you're in a managed enterprise setup, working with your system's BMC or Redfish APIs to verify settings.
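For readers who want to act on the "time to check" advice, here is a minimal sketch of an ECC audit built on nvidia-smi. It assumes the `ecc.mode.current` and `ecc.mode.pending` fields listed by `nvidia-smi --help-query-gpu`, and that the driver reports them as `Enabled`, `Disabled`, or `[N/A]`. The function names are hypothetical and not part of any NVIDIA tool.

```python
import csv
import io
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
QUERY = "index,name,ecc.mode.current,ecc.mode.pending"


def parse_ecc_report(csv_text):
    """Parse `--format=csv,noheader` output into one dict per GPU."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        index, name, current, pending = (f.strip() for f in fields)
        rows.append(
            {"index": index, "name": name, "current": current, "pending": pending}
        )
    return rows


def gpus_without_ecc(rows):
    """Return GPUs whose current ECC mode is anything other than Enabled.

    Assumption: a card with no ECC support reports "[N/A]", so it is
    listed here as well and needs a manual look.
    """
    return [r for r in rows if r["current"] != "Enabled"]


def query_ecc():
    """Run nvidia-smi and return the parsed rows (needs the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ecc_report(result.stdout)
```

On a machine with the driver installed, `gpus_without_ecc(query_ecc())` lists the cards that still need attention. A pending mode that differs from the current one usually means the change only takes effect after a reboot or GPU reset.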
Re: (Score:2)
CSPs in Germany could theoretically demand a refund
Not even. These customers use cards that have ECC and are not affected. ECC is a feature they paid for in the cards they acquired, and it exists to prevent a range of memory errors. If they deactivated ECC, they should not be able to complain about the consequences.
It's a DRAM limitation (Score:4, Informative)
Not anything to do with the CPU or GPU. As the fabrication process node gets smaller the effect is easier to induce.
And plain DDR5 has built-in ECC too.
Is your AI dealer using ECC? (Score:2)
I'm sure you can ask.
ECC memory is not a cure-all here (Score:3)
Re:ECC memory is not a cure-all here (Score:4, Informative)
Most ECC schemes are single-bit-correcting, double-bit-detecting (SECDED). So a 1-bit flip will be detected and corrected. Two flipped bits will be detected but not corrected. (And if you wonder, 3 flipped bits look like a single-bit error, so the wrong bit gets "corrected".)
The problem with DDR5 RAM is that the chips have ECC built in internally. However, that only protects the chip. It doesn't make the whole RAM module ECC. There are true ECC DDR5 modules as well, but they are basically impossible to find: regular DDR5 listings always mention "on-chip ECC", so non-ECC DDR5 modules pollute your search for ECC DDR5 RAM.
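The correct-one, detect-two behavior described in this comment can be illustrated with a toy extended Hamming (8,4) code. This is only a sketch of the general SECDED idea: real GPU and DRAM ECC uses much wider codewords and the exact codes are vendor-specific, so the layout below and the helper names are assumptions chosen for readability.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of 0/1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d      # data bits
    code[1] = code[3] ^ code[5] ^ code[7]       # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]       # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]       # parity over positions with bit 2 set
    code[0] = sum(code[1:]) % 2                 # overall parity bit
    return code


def decode(code):
    """Return (status, nibble) where status is ok, corrected, or uncorrectable."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                     # XOR of the positions of set bits
    odd_parity = sum(code) % 2 == 1
    fixed = list(code)
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Treated as a single-bit error at position `syndrome`
        # (0 means the overall parity bit itself flipped).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        status = "uncorrectable"
    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return status, nibble


def flip(code, positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for pos in positions:
        out[pos] ^= 1
    return out
```

With `word = encode(0b1011)`, `decode(flip(word, [5]))` recovers the original value, `decode(flip(word, [2, 6]))` reports an uncorrectable error, and `decode(flip(word, [1, 2, 4]))` claims a successful correction while returning the wrong data. That third case is why multi-bit Rowhammer flips remain a concern even with ECC enabled.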
Anybody managed to reproduce Rowhammer? (Score:2)
I noticed way back that all the papers used laptops, and many laptops skimp on refresh cycles to conserve energy, which makes them much more sensitive to this type of attack. I tried the test code on 3 different desktop machines: absolutely no effect. Has anybody by now reproduced the Rowhammer effect on a regular computer?
Fixed the Headline (Score:1)
NVIDIA Warns Its High-End GPUs May Be Vulnerable to Rowhammer Attacks, Raises Prices
Sucks for all you gamer card jerkoffs (Score:2)
Ok, great, see them next year. (Score:2)
So even theoretically (Score:2)
What exactly is the risk assessment of this? GPUs aren't really used as CPUs in any meaningful sense outside maybe some niche uses.
I don't remember any actually functioning exploits, and the theoretical exploits were about privilege escalation in utterly unrealistic scenarios. On top of that, there's usually some kind of memory scrambling done at the system level that makes memory locations unpredictable, and for a Rowhammer attack to work you need a predictable and very specific memory arrangement.
And with GPU memory, t