Graphics

Nvidia RTX 40-Series GPUs Hampered By Low-Quality Thermal Paste (pcgamer.com) 50

"Anyone who is into gaming knows your graphics card is under strain trying to display modern graphics," writes longtime Slashdot reader smooth wombat. "This results in increased power usage, which is then turned into heat. Keeping your card cool is a must to get the best performance possible."

"However, hardware tester Igor's Lab found that vendors for Nvidia RTX 40-series cards are using cheap, poorly applied thermal paste, which is leading to high temperatures and consequently, performance degradation over time. This penny-pinching has been confirmed by Nick Evanson at PC Gamer." From the report: I have four RTX 40-series cards in my office (RTX 4080 Super, 4070 Ti, and two 4070s) and all of them have quite high hotspots -- the highest temperature recorded by an individual thermal sensor in the die. In the case of the 4080 Super, it's around 11 C higher than the average temperature of the chip. I took it apart to apply some decent quality thermal paste and discovered a similar situation to that found by Igor's Lab. In the space of a few months, the factory-applied paste had separated and spread out, leaving just an oily film behind, and a few patches of the thermal compound itself. I checked the other cards and found that they were all in a similar state.

Igor's Lab examined the thermal paste used on a brand-new RTX 4080 and found it to be quite thin in nature, due to large quantities of cheap silicone oil being used, along with zinc oxide filler. There was lots of ground aluminium oxide (the material that provides the actual thermal transfer) but it was quite coarse, leading to the paste separating quite easily. Removing the factory-installed paste from another RTX 4080 graphics card, Igor's Lab applied a more appropriate amount of a high-quality paste and discovered that it lowered the hotspot temperature by nearly 30 C.

This discussion has been archived. No new comments can be posted.

  • by newcastlejon ( 1483695 ) on Monday July 22, 2024 @07:36PM (#64647876)
    Without saying who actually made the mistake this article is worthless.

    "This results in increased power usage, which is then turned into heat. Keeping your card cool is a must to get the best performance possible."

    Yeah, nah. Performance might suffer because it's throttling, but that would mean it uses less power.

    • by gweihir ( 88907 )

      Indeed. Time to name and shame.

      • If anyone should be shamed it's "Igor's Lab".
      • by Rei ( 128717 )

        Surely it's NVidia, not 3rd parties, because it's long been common across NVidia cards but not others. The 3xxx series was also famous for this thermal paste problem.

        • by flink ( 18449 )

          Surely it's NVidia, not 3rd parties, because it's long been common across NVidia cards but not others. The 3xxx series was also famous for this thermal paste problem.

          Unless you are using a reference card from NVidia, the card in your PC was built by a third party manufacturer like MSI or EVGA. They buy the reference card specs and chips from NVidia and then design and assemble cards themselves. The 3rd party OEMs would have mounted the components including GPU and heat sink, and they would have chosen what thermal paste to use.

          If anything is NVidia's "fault" I guess it would be that they design GPUs that run close to their thermal limit, causing them to be susceptible

          • Unless you are using a reference card from NVidia, the card in your PC was built by a third party manufacturer like MSI or EVGA. They buy the reference card specs and chips from NVidia and then design and assemble cards themselves. The 3rd party OEMs would have mounted the components including GPU and heat sink, and they would have chosen what thermal paste to use.

            While OEMs can select and design components of the cards, I do not know for sure if NVidia requires specific components like particular thermal paste to be used or merely suggests certain ones. It would not surprise me if NVidia insisted that "ExtremezCooling Paste BVC-14 must be used for the following GPUs . . Exceptions must be granted in writing . . ." We will have to see.

    • by war4peace ( 1628283 ) on Tuesday July 23, 2024 @02:56AM (#64648418)

      Reading comprehension failure.
      "[...]your graphics card is under strain"[...](which) "results in increased power usage, which is then turned into heat."

      That is absolutely correct.
      More usage results in higher power consumption.

      The throttling bit comes after that statement and is not a cause, but an effect.
      Also, power usage increases as the GPU gets hotter, right up until it throttles, which is a corner case.

      Under the same load, a GPU at, say, 80 degrees Celsius uses up more power for the same result than the same GPU at, say, 60 degrees Celsius.

      https://on-demand.gputechconf.... [gputechconf.com]
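The claim in the comment above (same load, more power at 80 C than at 60 C) is usually explained by leakage current rising with die temperature. A minimal sketch of that feedback, assuming a fixed dynamic power plus a leakage term that doubles every 40 C, and a single thermal resistance standing in for the paste and cooler. Every constant and function name here is an illustrative assumption, not a measurement of any real card:

```python
def leakage_w(temp_c, leak_ref_w=30.0, ref_c=60.0, doubling_c=40.0):
    """Leakage power at temp_c; assumed to double every doubling_c degrees."""
    return leak_ref_w * 2.0 ** ((temp_c - ref_c) / doubling_c)


def total_power_w(temp_c, dynamic_w=250.0):
    """Power for a fixed workload: constant dynamic part plus leakage."""
    return dynamic_w + leakage_w(temp_c)


def steady_state(r_th_c_per_w, ambient_c=25.0, dynamic_w=250.0,
                 throttle_c=95.0, max_steps=200):
    """Iterate T = ambient + R_th * P(T) until it settles or hits the limit.

    Returns (temperature_c, power_w, throttled).
    """
    temp_c = ambient_c
    for _ in range(max_steps):
        power_w = total_power_w(temp_c, dynamic_w)
        next_c = ambient_c + r_th_c_per_w * power_w
        if next_c >= throttle_c:
            # The card would start throttling here instead of heating further.
            return throttle_c, power_w, True
        if abs(next_c - temp_c) < 0.01:
            return next_c, power_w, False
        temp_c = next_c
    return temp_c, total_power_w(temp_c, dynamic_w), False


if __name__ == "__main__":
    # Hypothetical thermal resistances: fresh paste, degraded paste, badly degraded.
    for r_th in (0.12, 0.20, 0.30):
        temp_c, power_w, throttled = steady_state(r_th)
        print(f"R_th={r_th:.2f} C/W -> {temp_c:.1f} C, {power_w:.1f} W, throttled={throttled}")
```

In this toy model a higher thermal resistance settles at both a higher temperature and a higher power draw for the same work, until the throttle limit is reached. What happens to power draw after throttling is the separate point argued further down the thread.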

      • by Rei ( 128717 )

        I wasn't aware of this - thanks for this info. :) I do compute-heavy workloads where throughput per watt hour matters, so I power cap the cards, but the ideal wattage point is always a bit fuzzy because of how hot one is willing to let their system get.

        • Glad to have helped a bit.
          I have noticed this with my first watercooled GPUs, back when GTX 970 was a powerful card.
          When on air, the top frequencies were always lower than when the card was liquid cooled, while power usage was higher.
          This kept being valid for subsequent generations and is still valid now.

          Liquid cooling is expensive, indeed, but you could squeeze extra performance that way. However, YMMV - this heavily depends on workload type, etc.

          I know someone who does 3D rendering, and their RTX 3090 wer

      • More usage results in higher power consumption.

        I'm not tracking this logic. Why would there be more usage? There'd be less.

        In your haste to accuse the OP of a reading comprehension fail, you seem to have missed that the quote you cited is talking about playing games, not rendering a 3D animation, so the work is time-bound. You don't play a game for twice as long when the graphical fidelity drops by half: you play the game for the same amount of time, just with worse fidelity.

        If you're failing to transport the heat away efficiently, it has to reduce the

        • I'm not tracking this logic. Why would there be more usage? There'd be less.

          I apologize, my definition of "usage" must be different from yours.
          You seem to attribute a time measure to "usage", while I am thinking more of "percentage between 0% (no usage) and 100% (full usage)".

          In your haste to accuse the OP of a reading comprehension fail, you seem to have missed that the quote you cited is talking about playing games, not rendering a 3D animation, so the work is time-bound.

          No, it is not.
          When the GPU is put to work, it does not care which type of work it does (well, OK, there are cases and cases, but bear with me). It will perform the work and heat up in the process. When the GPU heats up, it will use more power to perform the same amount of work as it heats up, because its effic

          • I think you may have lost sight of the fact that the article is simply about graphics cards not going vroom vroom because they’re being throttled. That’s it. It never once mentions “efficiency”, “watts”, or anything else resembling what you’re talking about, but it does have this gem a bit later (emphasis mine):

            Modern graphics cards use lots of power and all of it is turned into heat

            The quote the OP pulled that you’re trying to defend was not the nuanced argument about changes in performance per watt as temperatures vary that you

            • I think you may have lost sight of the fact that the article is simply about graphics cards not going vroom vroom because they’re being throttled. That’s it.

              Maybe you missed the fact that the beginning of the Slashdot entry is not from the article, but written by the reader who submitted it. It is a general description of what happens when a GPU is fully utilized. It's an introduction, if you will. Please read it again, below. I took the liberty to emphasize the relevant part.

              "Anyone who is into gaming knows your graphics card is under strain trying to display modern graphics," writes longtime Slashdot reader smooth wombat. "This results in increased power usage, which is then turned into heat. Keeping your card cool is a must to get the best performance possible."

              • Yeah, in my editing, I accidentally removed an indication of my awareness of that. It doesn’t change that the sentence still isn’t about what was being claimed, nor would it change the OP’s read on it. It’s a distinction without difference.

            • I think you may have lost sight of the fact that the article is simply about graphics cards not going vroom vroom because they’re being throttled. That’s it.

              Leaving aside who wrote the comment (it's not from an article) you can't separate the concepts. The inefficiency drives faster into throttling than it would otherwise. It's a case of silicon thermal runaway that is well understood in electrical engineering lines. If your cooling degrades it causes positive feedback pushing you towards a throttling situation. The difference is also not insignificant in the region being discussed.

              • The inefficiency drives faster into throttling than it would otherwise. It's a case of silicon thermal runaway that is well understood in electrical engineering lines.

                I completely agree and was already aware of everything you said, which is exactly why it does not result in increased power usage when gaming, which is the topic of contention. In the case of gaming, which is what that quote was talking about, the inefficiency leads to excess heat, which leads to throttling, which leads to a reduced power draw for the duration of the gaming session, which necessarily means less usage, not more.

    • Without saying who actually made the mistake this article is worthless.

      Errr all of them. Igor's Lab specifically points out that most integrators use the same source and type of thermal paste. This is a widespread issue and has been reported in MSI, Gigabyte, ASUS and PNY cards. The internet is full of these stories right now. Personally I had the problem with an ASUS RTX 4080S TUF. Worked great for about 4 months before it needed a teardown and re-paste. And for the 4080S TUF it was especially obvious since that thing had a BEAST of a heatsink fan combination and thermals

    • by eepok ( 545733 )

      Despite what other commenters say, I can't find anything in the article from PC Gamer or the source article from Igor's Lab naming a particular manufacturer EXCEPT for the Manli GeForce RTX 4080 16GB Gallardo (https://www.manli.com/en/product-detail-Manli_GeForce_RTX%E2%84%A2_4080_Gallardo_(M3535+N688)-315.html).

      I've never even HEARD of Manli before this article, so I'm guessing it's a non-US manufacturer. I'll reserve judgement until other MFGs are inspected.

      • I have first hand experience with an ASUS TUF 4080S doing this. Jump on reddit and you'll find reports from a mix of manufacturers about this.

        I also suspect there's a second issue at play here. The 40xx series has an astonishingly high contact force specified for heatsinks, likely leading to thinner paste applications than ever before (virtually all of the cards have a rear spring retention mechanism for this reason). That would make the situation worse if the root cause is accurate.

  • I've got the poverty-grade 4060 that the internet peanut gallery hates. Seems to run everything that I'm interested in playing just fine at 1080p and I've got no complaints. Gamers who are like that old joke about audiophiles [reddit.com] are just impossible to please.

    • by Zuriel ( 1760072 )
      It's not that the 4060 doesn't work, it's that Nvidia decided to cripple it by saving $15 on RAM chips even though it's a fairly expensive part. Which is more or less the same problem people have with the bad thermal paste, come to think of it.
      • It's crippled by the bus, especially in my old-ass PC with PCI-E 3. The RAM is not even the biggest problem. But it does at least come with a reasonable power budget. I should take some readings from it now so I can compare them later and determine whether it's degraded, so I know if I have to do something about thermal paste.
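One way to do the "readings now, compare later" check from the comment above is to log temperature and power under a repeatable load and compare the temperature rise per watt between sessions. A minimal sketch, assuming samples are stored as `temperature_c, power_w` lines such as those produced by `nvidia-smi --query-gpu=temperature.gpu,power.draw --format=csv,noheader,nounits`. As far as I know that query reports the core temperature rather than the hotspot sensor, which generally needs a tool such as HWiNFO or GPU-Z. The function names and the 15% tolerance are hypothetical choices:

```python
import csv
import io
from statistics import mean


def parse_samples(csv_text):
    """Parse 'temperature_c, power_w' lines into a list of (float, float)."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [(float(row[0]), float(row[1])) for row in rows if row]


def temp_rise_per_watt(samples, ambient_c):
    """Mean of (T - ambient) / P, a rough stand-in for cooler thermal resistance."""
    return mean((temp_c - ambient_c) / power_w
                for temp_c, power_w in samples if power_w > 0)


def has_degraded(baseline, later, ambient_base_c, ambient_later_c, tolerance=0.15):
    """True if the later session runs hotter per watt than baseline by > tolerance."""
    base = temp_rise_per_watt(baseline, ambient_base_c)
    now = temp_rise_per_watt(later, ambient_later_c)
    return now > base * (1.0 + tolerance)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    baseline = parse_samples("60, 100\n62, 110\n")
    later = parse_samples("70, 100\n73, 110\n")
    print(has_degraded(baseline, later, ambient_base_c=25.0, ambient_later_c=25.0))
```

Normalising by power and subtracting room temperature keeps a hot day or a heavier game from being mistaken for paste degradation, though the same load and fan curve in both sessions still make the comparison more trustworthy.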

        • by Zuriel ( 1760072 )
          They kind of multiply each other - the slower bus is less of a problem if you have enough VRAM because then you can just copy assets onto the card once instead of sending them over the PCIe bus over and over again.
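The interaction described above can be put into rough numbers. A back-of-the-envelope sketch, assuming about 0.985 GB/s of usable bandwidth per PCIe 3.0 lane and about 1.969 GB/s per PCIe 4.0 lane (the nominal rates after 128b/130b encoding), and assuming an x8 link for the card. The working-set sizes and helper names are made up for illustration:

```python
# Approximate usable bandwidth per lane in GB/s, by PCIe generation.
PCIE_GB_PER_S_PER_LANE = {3: 0.985, 4: 1.969}


def link_bandwidth_gb_per_s(gen, lanes):
    """Nominal one-direction bandwidth of a PCIe link."""
    return PCIE_GB_PER_S_PER_LANE[gen] * lanes


def transfer_ms(size_gb, gen, lanes):
    """Time to move size_gb across the link, ignoring protocol overhead."""
    return size_gb / link_bandwidth_gb_per_s(gen, lanes) * 1000.0


def spill_gb(working_set_gb, vram_gb):
    """Assets that do not fit in VRAM and must be re-sent over the bus."""
    return max(0.0, working_set_gb - vram_gb)


if __name__ == "__main__":
    lanes = 8  # assumed link width
    for vram_gb in (8, 16):
        spilled = spill_gb(10.0, vram_gb)  # hypothetical 10 GB working set
        for gen in (3, 4):
            print(f"{vram_gb} GB VRAM, PCIe {gen}: "
                  f"{spilled:.1f} GB spilled, {transfer_ms(spilled, gen, lanes):.0f} ms to re-send")
```

With enough VRAM the spill is zero and the link speed stops mattering for steady-state play. With too little, every re-send costs roughly twice as long on PCIe 3 as on PCIe 4.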
          • Yes, I got the 16GB version. It was a significant enough price increase to the 4070 that I feel it made sense for me. At some point I'll score a deal on an AM5 mb+cpu+ram combo (just as I got this AM4 combo cheap on eBay) and then I'll get up to PCIE 4 and my bus bandwidth will double. Even with my 1600AF this card will do 1080p60 ultra in pretty much everything, and it will do 4k60 in older titles at near ultra. I had to turn down some of the filtering to get 4k60 in FO4, but I got to keep the view distanc

        • Another day, another set of 3 downmods from my harasser.

          What a pathetic coward.

      • by Rei ( 128717 )

        It also makes it a terrible choice for AI. VRAM is king with AI. And bus speeds are the second most important thing if your model doesn't fit entirely in VRAM.

        Even if your model fits comfortably inside your VRAM with lots of room to spare, excess VRAM lets you run batching, which dramatically improves your performance.

      • It's not that the 4060 doesn't work, it's that Nvidia decided to cripple it by saving $15 on RAM chips even though it's a fairly expensive part. Which is more or less the same problem people have with the bad thermal paste, come to think of it.

        Nvidia wasn't trying to save money. Their goal was to prevent the 4060 from being "too" good of a value and cannibalizing 4070 and 4080 sales. It's not just Nvidia that does this. Every company carefully plans out their lineup for value and price to manage sales and margins.

    • I use one on my Blue Iris server, with CodeProject.AI, along with a single thread of video transcoding (TDarr node).
      Excellent for the job, GPU hotspot stays at 56 degrees Celsius with GPU fans turned off (they only start spinning at 60 C). When not transcoding and only doing surveillance camera AI detection, it stays at 48 C.
      The reason I picked the 4060 is it being the cheapest nVidia card with AV1 hardware encoding, and I encode all my timelapse surveillance videos in AV1.

    • by flink ( 18449 )

      They skimped on VRAM in the 4060. If you only play at 1080, then it won't really matter much. But it kinda sucks because it's otherwise a very affordable card. If they had let it cost an extra $20, it would have gone from a good card to a great one. It doesn't mean it's bad. If it suits your needs, then it is a great card for you, just not for the dedicated gamer who wants something that can comfortably do 4k.

    • It's 2024. We had 1080p gaming 12 years ago. Bragging about 1080p shouldn't be a thing....
      • Who's bragging? I'm saying that the reviewers who trashed the GPU didn't take into account that some people are totally okay with gaming in 1080p. Hell, my middle aged eyesight can hardly distinguish the difference between 1080p and 4k anymore anyway.

      • It's 2024. We had 1080p gaming 12 years ago. Bragging about 1080p shouldn't be a thing....

        Your 1080p from 12 years ago looks nothing like the 1080p of today. Your 12 year old cards would struggle to run several modern day titles at 1080p. The thing is, ... 1080p is actually more than sufficient of a resolution for many games.

        It's one of the reasons that AI upscaling is so important. 1080p was a perfectly fine resolution, but people upgraded their monitors which put a lot of extra pressure on GPUs for largely a meaningless fidelity increase. I actively miss the days of 1080p gaming. 4K screens ju

  • The margins on these cards are often razor thin, due to nVidia simultaneously controlling the product (down to approving the design of both the board and the cardboard box) and the pricing of the RAM and GPU and setting the MSRP. It's no surprise that corners get cut. On launch, MSRP can even be below cost, resulting in some board vendors resorting to rebates to try to hit MSRP. I blame nVidia rather than the board vendors.

  • Straight from linked story:

    >In my case, changing the paste on the RTX 4080 Super didn't improve the hotspot delta (so I'm going to try some PTM7950 at some point) but the overall die temperature did drop by a few degrees. I'm regularly monitoring all four cards to see if the temperatures start to creep up but I also feel that I really shouldn't have to do this.

    So no problems mentioned in the topic. And yet it's followed by...

    >Mind you, if they all started using PTM7950, then none of this would be an i

  • by Fly Swatter ( 30498 ) on Tuesday July 23, 2024 @01:13AM (#64648316) Homepage
    If it's thin, that is fine, but if it gets thin enough when hot from operation, it might not be fine if the GPU is positioned vertically, since it can then slowly run down within the gap between heatsink and chip. Vertical also puts strain on this gap, since even with hefty mounting supports, gravity acting on the heavy heatsink and fan will eventually win, more so if the computer is moved or jostled a lot.

    If the chips are horizontal in your install, in theory the paste can't run down anywhere as easily, and surface tension is on your side even more. Plus the heatsink isn't being pried away from the top edge by gravity relative to the lower edge (causing an inconsistent gap that favors paste running down from the wider top edge gap when vertical).
    • You should have mentioned "horizontal with chip at the bottom of the heatsink".
      In the vast majority of PCs, the heatsink hangs below the chip when the GPU is mounted horizontally. This arguably makes the situation even worse.

      • The important part is the even distribution of force and no paste run from gravity. Horizontal will be better for this regardless of whether the heat sink is sitting on or being pulled up against the GPU.

        Well unless the heat sink mount design is just really poor, but that's on the engineer.
        • Horizontal will be better for this regardless of whether the heat sink is sitting on or being pulled up against the GPU.

          I'm keen for you to back up this statement with actual test results. Again, if there were any deviation in gaps, it would show up as a difference in die-to-hotspot temperature delta between orientations. Simply putting your PC case on its side for a second or two and showing that the delta spikes would be enough to prove your point. Can you?

    • Not really. Gravity is largely irrelevant given the incredible mounting pressures of the GPU heatsinks these days. It's like that old wives' tale that you should top exhaust your computer because hot air rises. Yes, technically correct, but that effect gets absolutely dominated by any fan to the point of irrelevance. Also, virtually all cards are mounted in a way that the GPU is not holding the heatsink but rather is sandwiched between it. I.e. the heatsink is screwed to the back bracket and to all corners of

  • Krikey, use semi-sticky thermal pads, not paste.
    • You don't use pads for the GPU die itself. Pads do not have as good thermal conductivity as paste and wouldn't be good enough for that use, plus you want there to be as small a gap as possible between the heatsink and the die -- a pad is obviously a lot thicker than paste.
      • Yes and no. Yes there are actually thermal pads that are on par with the best of pastes. They are usually thin carbon/graphite pads or even some newer graphene pads (which are super expensive at like $40/pad).

        No, it's not common, especially the graphene ones. That said there have been several GPUs on the market - the only ones I've seen were AMD - which have thermal pads instead of paste on the GPU die. The ASUS R9 Fury Strix springs to mind as one of them that used a carbon pad instead of paste.

  • Then the same must be for AMD based GPU's by the same manufacturers.
    • No. Different GPUs have different requirements, and possibly different mandated manufacturer specs as well. NVIDIA is well known for having demanding integrator specs. Take ASUS, for example: it has used carbon thermal pads on several of its AMD cards, but never on NVIDIA ones. That said, the AMD cards they used them on were known to have uneven dies due to the chiplet design, so maybe there were specific requirements there which made thermal pads better.

      Also the problem may be more pronounced on di

  • Isn't it time to do away with this 1980s kludge? Shouldn't the heat sinks now be manufactured integral to the device?

    • What a horrible idea. Are you implying that the vendors know what the best cooling solution is to the point of making it non-removable? How about the GPUs with coolers on them so large they don't fit in cases? What about the people who prefer blowers (despite NVIDIA's ban on integrators making blower coolers specifically so that business customers don't buy consumer GPUs?). What about watercooling?

      There is no 1980s kludge here. Thermal paste is a specific material designed for a specific engineering purpose
