New BIOS Updates Attempt To Keep Ryzen 7000X3D Processors From Frying Themselves (arstechnica.com) 59

An anonymous reader shares a report: Over the weekend, users on Reddit and YouTube began posting about problems with AMD's newest Ryzen 7000X3D processors. In some cases, the systems simply stopped booting. But in at least one instance, a Ryzen 7800X3D became physically deformed, bulging out underneath and bending the pins on the motherboard's processor socket. In a separate post, motherboard maker MSI indicated that the damage "may have been caused by abnormal voltage issues." Ryzen 7000X3D processors already impose limits on overclocking and power settings, but new BIOS updates from MSI specifically disallow any kind of "overvolting" features that could give the CPUs more power than they were built to handle.

You can still undervolt your CPU to attempt to reduce temperatures and energy usage by giving the CPU a bit less power than it was designed for. The Ryzen 7000X3D processors are set to a lower voltage than regular Ryzen 7000 CPUs by default because the extra L3 cache layered on top of the processor die can raise temperatures and make the CPU more difficult to cool. This has also made the chips much more power-efficient than the standard Ryzen chips, but that efficiency comes at the cost of overclocking settings and other features that some enthusiasts use to squeeze more performance out of their PCs.

  • That's a throwback (Score:2, Informative)

    by Powercntrl ( 458442 )

    They had a similar problem with death by overheating 16 years ago. [youtu.be]

    • by Z00L00K ( 682162 ) on Monday April 24, 2023 @02:31PM (#63473488) Homepage Journal

      It's a balance between size and performance.

      A larger chip casing would be more expensive and use more space on the motherboard but it would make it easier to get rid of excess heat.

      The issue with heat is also why superslim laptops are worse when it comes to performance relative to the less slim ones with the same theoretical performance and desktops and towers have even better performance since there's even more room to get rid of the heat.

      • It's a balance between size and performance.

        A larger chip casing would be more expensive and use more space on the motherboard but it would make it easier to get rid of excess heat.

        Processor cooling is a multi-stage process.

        1) Get the heat out of the die
        2) Transfer it efficiently from the chip package to a heatsink
        3) Release it to a cooling fluid, either water or air.

        Increasing the package size will help with the second stage of this process by increasing the area in contact with the heatsink. But if the additional layers of cache memory prevent heat from being removed from the die you're going to have problems during periods of high workload when lots of cores are running at high clo
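The three stages listed above can be sketched as thermal resistances in series, where the die temperature rises above ambient by power times total resistance. This is a minimal sketch, not a model of any real Ryzen part: every resistance value below is a hypothetical number chosen only for illustration, and the extra term for a stacked cache layer is an assumption.

```python
def die_temperature(power_w, ambient_c, resistances_c_per_w):
    """Steady-state die temperature for thermal resistances in series."""
    return ambient_c + power_w * sum(resistances_c_per_w)

# Hypothetical thermal resistances in degrees C per watt (illustrative only).
DIE_TO_PACKAGE = 0.15       # stage 1: get the heat out of the die
PACKAGE_TO_HEATSINK = 0.10  # stage 2: package to heatsink
HEATSINK_TO_AIR = 0.20      # stage 3: heatsink to cooling fluid
EXTRA_CACHE_LAYER = 0.10    # assumed penalty for a layer stacked above the die

plain = die_temperature(
    120, 25, [DIE_TO_PACKAGE, PACKAGE_TO_HEATSINK, HEATSINK_TO_AIR]
)
stacked = die_temperature(
    120, 25,
    [DIE_TO_PACKAGE + EXTRA_CACHE_LAYER, PACKAGE_TO_HEATSINK, HEATSINK_TO_AIR],
)
print(f"plain: {plain:.1f} C, stacked: {stacked:.1f} C")
```

In this model a larger package only lowers the second term, so a higher first-stage resistance still raises the die temperature at the same power.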

    • by DJRikki ( 646184 )
      Looks like 22 years going by the (C) at the end - cool Project X Team 17 MOD soundtrack too
  • by willy_me ( 212994 ) on Monday April 24, 2023 @02:18PM (#63473470)

    There has to be a reason why AMD put the cache on top of the CPU die. My initial thought is to simply put the cache between the CPU die and the interposer. This would allow the CPU die to be as close to the cooler as possible. It should allow for the same sort of power draw / heat generation as with their non-3D parts.

    But the engineers at AMD are not idiots and they would have certainly already thought of this. But it does beg the question as to why. Anyone have a good guess as to why?

    • by Z00L00K ( 682162 )

      I expect to see many more variants in the future, but basically we will probably end up in a situation with larger CPU casings and sockets just to keep things cool.

      One thought that has crossed my mind is if we will see changes to motherboards and computer cases where the CPUs are on one side and the expansion slots are on the other side. Or CPUs on both sides of the motherboard.

    • by edwdig ( 47888 )

      Probably because the CPU interacts with the motherboard, but the cache only interacts with the memory controller on the CPU. All those pins connecting the CPU to the motherboard would have to run through the cache if the cache was on the bottom. The connections will be a lot easier to make this way.

      • I was also thinking of that, but they could just run vias through the cache. You might lose 5-10% of the cache but you can always just make the die a bit larger. The additional trace length would be negligible so there wouldn't be any signal integrity issues. So there must be something else...
        • by jwdb ( 526327 ) on Monday April 24, 2023 @04:23PM (#63473720)

          The X3D variants are just X chips with an additional layer of cache added on top, right? In that case, they probably want to keep the manufacturing process for the two lines as similar as possible: i.e. the X3D runs through the same line as the X but just with an extra step to add a layer at the end.

          • Thanks, this sounds reasonable. But would they have really made this compromise if they knew it was going to impose a power limit? Adding requirements to a design for the first time will often cause issues. But this is supposed to be a second gen design - the requirements are not new. They should have observed these issues with their first gen parts so it is not like it was unexpected. I thought their second gen parts would solve this issue - but apparently not.

            AMD did apparently increase the voltag

            • by edwdig ( 47888 )

              The root issue here is the socket design specifies a maximum amount of heat the processor is allowed to produce, and all the surrounding components are designed around that specification.

              The Ryzen 7950X is designed to run at the max clock speed possible without going over the thermal limits. Sure, an individual chip might be able to be overclocked a little and stay in spec, but in general, there's very little extra headroom. It's the current flagship processor that offers maximum performance.

              The 7950X 3D is

              • If you want that 3D cache, you have to give up some clock speed to meet the thermal limits.

                But that is the issue - the 3D cache version has lower thermal limits. Not just the CPU - but the cache and CPU combined have lower thermal limits than a non-3D part. This works out great for efficiency, but not clock speeds. Turns out SRAM has very low power draw.

                I can only assume that the additional cache between the CPU die and the IHS adds a non-trivial amount of thermal resistance. And with a reduced ability to shed heat, the CPU die will end up running hotter than a normal die given IHS at the s

                • by jwdb ( 526327 )

                  You make a fair point that maybe there is a better way to deal with the extra thermal resistance that the extra cache introduces. However, practicalities might be interfering: if this is a niche part bought only by serious gamers, there might not be enough profit there to justify the extra R&D needed to improve the cooling. There could be other reasons as well, and maybe a later generation of the 3D chips will include better thermal designs, but only if the market demand can justify it.

  • by OverlordQ ( 264228 ) on Monday April 24, 2023 @02:19PM (#63473472) Journal

    It's like Monster cables for audio. How much is actual demonstrable useful improvement and not just 'Number go up'

    • by CAIMLAS ( 41445 ) on Monday April 24, 2023 @02:46PM (#63473514)

      To my mind, it hasn't made much of a meaningful impact since the early days of multicore CPUs being common, and became less meaningful as, e.g., gaming workloads moved from the CPU to the GPU.

      So that puts the end of relevance somewhere between oh, 2007 when Crysis came out, and maybe 2012 or so. Hell, I was playing games with a 2012-ish AMD machine for years with an upgraded GPU and games didn't suffer any for having an older, slower CPU.

      Back in the 90s, when you could get 2x the performance out of a water-cooled overclock, sometimes even more (like with the first generation Celerons), it definitely made sense: games were still pretty dependent on single core clockspeed.

      I'm not aware of anything which would actually benefit from an overclock of the CPU on today's modern systems - even low-end laptop systems have 2GHz+ multicore CPUs.

      If you're bricking CPUs specifically designed for low voltage (like this one is) due to overclocking, that's entirely on you. There's a reason why certain CPUs aren't considered overclockable...

      Low power and overclocking don't go together.

      • That's because AMD CPUs are already factory-overclocked way over all reasonable limits -- because single-thread performance does matter.

        Only some problems are parallelizable. Algorithms where that's easy to do have already moved to the GPU. For the rest, it's either impossible or hard. You can of course deal with multiple problems in parallel, but there's only so much that can be done without much human effort. See compiles for example: on my 64-way early threadripper, the only time I see all cores at 100%

      • I'm not aware of anything which would actually benefit from an overclock of the CPU on today's modern systems - even low-end laptop systems have 2GHz+ multicore CPUs.

        the nvidia 4090 is CPU bottlenecked in many games.

        if you play at low resolutions (e.g., 1080p) and want high framerates (>120), regardless of GPU, CPU performance plays a very large role.

        at 4k, CPU performance doesn't affect much (assuming you have a modern ~12+ thread CPU), but games are getting much more demanding, plus recent ports to PC have been rather lazy, relying on PC brute force performance to overcome poor coding... so CPUs are becoming important again.
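The resolution argument above can be illustrated with a deliberately simplified model in which CPU and GPU work overlap perfectly and the slower of the two sets the frame rate. The per-frame timings are hypothetical values picked for illustration, not measurements of any game or of the 4090.

```python
def frames_per_second(cpu_ms, gpu_ms):
    """Frame rate when the slower of CPU and GPU work limits each frame."""
    return 1000.0 / max(cpu_ms, gpu_ms)

# Hypothetical per-frame costs in milliseconds (illustrative only).
CPU_MS = 8.0        # assumed not to change with resolution
GPU_MS_1080P = 4.0  # light GPU load at low resolution
GPU_MS_4K = 16.0    # heavy GPU load at high resolution

print(f"1080p: {frames_per_second(CPU_MS, GPU_MS_1080P):.1f} fps (CPU-bound)")
print(f"4K:    {frames_per_second(CPU_MS, GPU_MS_4K):.1f} fps (GPU-bound)")
```

Under these assumptions a faster CPU raises the 1080p figure but leaves the 4K figure unchanged, which is the pattern the comment describes.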

    • On my own personal desktop, going from 3.5GHz to 4.2GHz ("all-core turbo") makes a very noticeable performance impact on my seriously outdated system (i7-3930k). That being said, it requires a substantial cooling solution. The usefulness of it, in my case, comes from eking a bit more useful life out of the machine before I eventually replace it. If the chip fries it's around $30 to replace it, but that hasn't happened in the half-decade it has been running like this (and it is definitely showing its age ver
    • by edwdig ( 47888 )

      It's like Monster cables for audio. How much is actual demonstrable useful improvement and not just 'Number go up'

      It depends on what you're doing and which CPU you're picking.

      The high end CPUs are clocked out of the box extremely close to the max the chips can handle. You won't get much benefit from overclocking them. But sometimes you can buy a chip lower down the product range and overclock it to get performance similar to the top end chips. If you need performance but you're on a tight budget, it can be a cost effective way to go.

      As to how much it matters, well, it probably doesn't matter if you're gaming or browsin

      • by Rhipf ( 525263 )

        The problem is that those cheaper CPUs are cheaper because they didn't pass the quality assurance check at the factory to be sold as high-end units. If you are then going to overclock them to get to the specs of the costlier CPUs you have to also accept the risk that bad things might happen to the chip as a consequence.

        • by dryeo ( 100693 )

          Sometimes almost all the CPUs pass the quality assurance check, but the manufacturer still needs lots of cheaper CPUs, so some get binned into the cheap bin. Those are worth overclocking; it's still a gamble, but less so once a generation has been manufactured for a while.

    • by genixia ( 220387 )

      I disagree with your analogy. Monster cables were never much more than a marketing play designed to separate people from their money. You could get identical results with any half-decent cable costing much less. At least overclockers get a measurable improvement in performance for some benchmark definition or other that represents a real-world task.

      That said, a lot of overclocking just doesn't make sense, especially now where desktop CPU power just isn't really an issue for many people. GPU power has tak

  • Expected (Score:5, Insightful)

    by Tailhook ( 98486 ) on Monday April 24, 2023 @02:22PM (#63473480)

    These CPUs are on the ragged edge. A quick look at the specs for a 7000X3D: Max power draw is 276W, and core voltage is limited to 1.4V. That's 197A (!) of current. With the voltage figure less than 1% of the current figure, precise voltage regulation is crucial: any small error in regulation, due to either bad regulators or bad firmware, will rapidly smoke the device.
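The 197A figure follows from I = P / V. A minimal sketch of that arithmetic, taking the comment's 276W and 1.4V at face value (the function name is just for illustration):

```python
def current_amps(power_w, voltage_v):
    """Total current drawn at a given power and supply voltage (I = P / V)."""
    return power_w / voltage_v

total = current_amps(276, 1.4)
print(f"{total:.0f} A")  # prints "197 A"
```

This is the total across the whole package, so it says nothing by itself about how the current is split among individual power pins.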

    If you're buying this stuff you're a beta tester. The board manufacturers and BIOS developers are shipping before all the corner cases are worked out. That's fine if they honor their warranties, etc. But just have realistic expectations.

    • A quick look at the specs for a 7000X3D: Max power draw is 276W, and core voltage is limited to 1.4V. That's 197A (!) of current.

      Nice one, but that's not really how it works.
      Do you honestly believe all that power goes in through a single wire?

      • Number of wires has nothing to do with amps.

        • It has everything to do with amps per wire, which matters a lot.

          • Not in this case. Power consumption determines heat production, end of story. Get out of here with your nonsensical rambling.

            • You're swinging from one aspect to another quite quickly, aren't you?
              Joule's law and equation clearly determine the thermal (heating) effect of current. In case of CPUs, things are much more complicated than just Joule's equation, because we are talking about current flowing through numerous wires.
              You're severely oversimplifying how things work. That value of 197A makes no sense, because that's valid only if all current flows through a single wire (which it does not, in a CPU).

              Just as a random example, look

    • by Anonymous Coward
      The leading edge cuts you; the bleeding edge already did.
    • Re:Expected (Score:4, Informative)

      by Phydeaux314 ( 866996 ) on Monday April 24, 2023 @03:34PM (#63473600)

      I'm not sure where you're getting 276W. AMD defines maximum electrical draw (package power tracking, or PPT) as 1.35x the rated TDP.

      Non-X3D parts on AM5 are limited to 170W TDP at stock, which has a maximum power draw of 230W.
      X3D parts on AM5 are limited to 120W TDP, which is 162W electrical. However, the default behavior limits them to 65W TDP, or 88W electrical.

      https://www.anandtech.com/show... [anandtech.com]

      AMD uses a really weird definition of TDP that isn't super consistent between versions, but even with all the inconsistencies it's generally a pretty good approximation for power draw and thus heat output.
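The 1.35x relationship quoted in the comment above can be checked directly. This is a sketch of the arithmetic only, using the multiplier and TDP values as the comment states them; the constant and function names are illustrative.

```python
PPT_MULTIPLIER = 1.35  # ratio of PPT to TDP, as stated in the comment

def ppt_watts(tdp_w):
    """Package power tracking (PPT) limit for a given rated TDP."""
    return tdp_w * PPT_MULTIPLIER

for tdp in (170, 120, 65):
    print(f"{tdp} W TDP -> {ppt_watts(tdp):.2f} W PPT")
```

With this multiplier, 170W works out to 229.5W (about 230W), 120W to 162W, and 65W to 87.75W (about 88W).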

  • by CAIMLAS ( 41445 ) on Monday April 24, 2023 @02:39PM (#63473498)

    This is not a processor problem, it's clearly a motherboard - and specifically, an MSI - problem.

    The CPU itself has no control over the power it's fed. That determination is made by the motherboard and the voltage regulators.

    If a user wants to mess with that, it's not the CPU frying itself.

    • by UMichEE ( 9815976 ) on Monday April 24, 2023 @03:26PM (#63473590)

      But shouldn't the CPU have temperature sensors and shut itself off when it reaches a dangerous temperature? I've worked on chips that sold for $1-2 that had that feature. Surely a $200-400 CPU should do that.

      • by CAIMLAS ( 41445 )

        I mean, probably. I'm not knowledgeable enough about what's possible in that domain to know what's common or what can be done.

        To my mind, it's basically a design decision. I think it's probably reasonable to have some degree of thermal control on the CPU, but I acknowledge it probably comes at a cost.

        Unless there's a specific environmental requirement for thermal protection (eg. space station?) I can see it being overlooked.

        I've had all sorts of things die thermal death, so to hear that $1 devices have ther

        • by Khyber ( 864651 )

          "I've had all sorts of things die thermal death, so to hear that $1 devices have thermal controls is a bit surprising to me."

          Even shitty $5 LED flashlights come with thermal regulation in the form of thermistors on the power driver board. The off-road lighting I manufacture also has that, plus a temp sensor in the MOS-10 power IC that simply shuts the unit off entirely if it gets too warm.

          Almost everything has some sort of temp sensing/regulating component on it nowadays.

      • Modern CPUs have multiple thermal sensors on the die in crucial areas. The trick is getting the power supply chain to respond fast enough when VTEC kicks in.

        It's kind of bonkers, really. The density of transistors is so high these days that you can't switch all the transistors in an area at once, or they create a hot spot and instantly vaporize. We've reached the point where even if we make transistors smaller, we may not be able to use all of them.

    • by Anonymous Coward
      MSI? Well there's your problem.
      • by CAIMLAS ( 41445 )

        I wasn't going to say it. But yes.

        They've been the "pretty LED light gamer board at a discount" for a decade at this point.

        • by Shinobi ( 19308 )

          For more than two decades. I've had them in my Book of Grudges ever since 2001 or so, when I had 3 MSI boards in a row fail on me.

          Another, later, incident was when a cheap test cluster was being built and the buyer disregarded my advice and bought MSI motherboards. 75% had some sort of serious flaw, ranging from dying at first boot, through serious hangs, to weird performance drops, to flaky connectors. Poor Barton core Athlon XPs, probably the most stable AMD CPUs ever, let down by MSI and VIA back then

        • I've been building PCs for over 20 years, and my take on MSI is that they like to squeeze out that last little bit of performance by pushing things just a bit too hard. It lets them win the benchmarks on those review sites that scale the plots so a 1% difference between the "top" and "bottom" performing boards looks absolutely massive. But that also comes at a cost, and that they have this problem doesn't surprise me.

          The other brand that seems to be heavily affected is Asus, which also isn't surprising a

    • > an MSI - problem.

      No surprise there!

      The idea that they make a low power part and lock it to 120W and then market it to gamers is like - do you even know any gamers, AMD?

      The 7950X seems better for power anyway. Looks like it's unlocked and can be brought down to 105W TDP.

      AMD often has new experimental chips, which is great, but I learned not to touch that stove years ago.

    • Au contraire, the microprocessor has control over its voltage and frequency.

    • No. Flat out wrong.

      The motherboard doesn't "push" power into the CPU. The CPU controls its power draw. It does this based on what work it is doing. It is not the motherboard's responsibility (or even something possible for it to do) to control the amount of power going to the CPU. It does control voltage (kind of- really the CPU requests a voltage, and the VRM supplies it), but that's not the problem here. The problem is that the CPU is drawing more than it can dissipate.
  • The BIOS accidentally included some debug code that had the HCF (Halt and Catch Fire) opcode.

  • by chefren ( 17219 ) on Wednesday April 26, 2023 @02:50AM (#63477218)

    Please read this much better article on the topic:
    https://www.anandtech.com/show... [anandtech.com]

    The problems arise when the SoC voltage increases over 1.25V (according to AMD) due to RAM overclocking, even when this overclocking is just done by purchasing a RAM kit officially supported by the motherboard manufacturer and selecting the appropriate EXPO profile in the BIOS.

    This might set the SoC voltage to 1.35V or higher, and that is the environment where CPUs have fried.

    ASUS, for example, is rolling out beta BIOSes right now to limit the RAM voltage to 1.3V maximum regardless of what the EXPO profile states. This is likely to mean some RAM won't be able to run at its stated clocks/timings, meaning people will have paid a premium for officially supported high-end RAM and now won't be able to get the performance they paid that premium for.

    I think the best quick fix for everyone is to disable any RAM overclocking, including EXPO, which will set the SoC voltage to 1.15V. Then wait for this to settle and for updated BIOSes to become available, and only then enable any RAM overclocking. Possibly also ask for refunds because you paid extra for what essentially amounted to false advertising (since some RAM was marketed as officially supported by the motherboard when run using EXPO - just check various motherboard RAM compatibility lists)
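The voltage figures in the comment above can be laid out as a simple threshold check. This is only a sketch of the comparison: the 1.25V limit, the 1.35V EXPO case and the 1.15V non-EXPO case are the values as the comment reports them, and the function is hypothetical; it does not read anything from real hardware or firmware.

```python
SOC_LIMIT_V = 1.25    # threshold the comment attributes to AMD
SOC_EXPO_V = 1.35     # SoC voltage the comment says EXPO might set
SOC_NO_EXPO_V = 1.15  # SoC voltage the comment reports with EXPO disabled

def soc_voltage_status(voltage_v):
    """Classify a SoC voltage against the limit quoted in the comment."""
    if voltage_v > SOC_LIMIT_V:
        return "above limit"
    return "within limit"

for label, volts in (("EXPO enabled", SOC_EXPO_V), ("EXPO disabled", SOC_NO_EXPO_V)):
    print(f"{label}: {volts:.2f} V is {soc_voltage_status(volts)}")
```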
