Data Storage IT

Huge Capacity HDDs Shine In Latest Storage Reliability Report But There's A Caveat (hothardware.com)

Hot Hardware reports: When it comes to mechanical hard disk drives (HDDs), you'd be very hard pressed to find any data on failure rates reported by any of the major players, such as Western Digital, Seagate, and the rest. Fortunately for us stat nerds and anyone else who is curious, the folks at cloud backup firm Backblaze frequently issue reliability reports that give insight into how often various models and capacities give up the ghost. At a glance, Backblaze's latest report highlights that bigger capacity drives -- 12TB, 14TB, and 16TB -- fail less often than smaller capacity models. A closer examination, however, reveals that it's not so cut and dried.

[...] In a nutshell, Backblaze noted an overall rise in the annual failure rates (AFRs) for 2022. The cumulative AFR of all drives deployed rose to 1.37 percent, up from 1.01 percent in 2021. By the end of 2022, Backblaze had 235,608 HDDs in service, including 231,309 data drives and 4,299 boot drives. Its latest report focuses on the data drives. [...] Bigger drives are more reliable than smaller drives, case closed, right? Not so fast. There's an important caveat to this data -- while the smaller drives failed more often last year, they are also older, as can be seen in the graph above. "The aging of our fleet of hard drives does appear to be the most logical reason for the increased AFR in 2022. We could dig in further, but that is probably moot at this point. You see, we spent 2022 building out our presence in two new data centers, the Nautilus facility in Stockton, California and the CoreSite facility in Reston, Virginia. In 2023, our focus is expected to be on replacing our older drives with 16TB and larger hard drives," Backblaze says.
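For readers who want the arithmetic behind the headline figure: Backblaze annualizes its failure rates from drive-days rather than a simple head count. A minimal sketch of that calculation in Python, with illustrative numbers rather than figures from the report:

    # Sketch of an annualized failure rate (AFR) calculation from drive-days.
    # The fleet size and failure count below are illustrative only.

    def annualized_failure_rate(failures: int, drive_days: float) -> float:
        """AFR (%) = failures / (drive_days / 365) * 100."""
        drive_years = drive_days / 365.0
        return failures / drive_years * 100.0

    # Hypothetical fleet: 10,000 drives in service all year, 137 failures.
    print(f"AFR: {annualized_failure_rate(137, 10_000 * 365):.2f}%")  # -> 1.37%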

  • New drives haven't failed so much...yet. Maybe there is one simple trick that can prevent them from failing later? Maybe you'd be just as informed by clicking on the picture of the woman with the inflated chest at the bottom of the page.

    • by shanen ( 462549 ) on Tuesday January 31, 2023 @02:51PM (#63254595) Homepage Journal

      Doesn't seem like a constructive criticism nor like much of an attempted joke.

      But as regards the failure rate, it obviously depends on how you use them, and there is no way to cheat time in the direction required to get better long-term test data faster. We can slow things down, but we can't speed time up. If you are concerned about failures, then obviously you want to mirror your data.

      But if you just want to criticize the editor, then maybe you should wear the shoes for a while. My theory is that the Internet needs a whole lot more editing than it's getting these years. (And research is NOT the same thing as a quick search for "evidence" of whatever it is you want to believe.)

      • There is nothing constructive possible here. Much of the internet runs on this kind of thing -- a bunch of bored people, desperate for some "news", and a bunch of hungry "journalists" willing to say anything to get those people to view some ads that might help pay for their lunch. It's not healthy. So what can we do about it? People can't stop clicking. Nobody in the "free world" can agree on how to regulate it.

        Personally, I completely agree that being an editor is a terrible job these days (if it ever w

        • by Anonymous Coward
          Backblaze regularly publishes a report which provides insight into HDD reliability across different manufacturers and models which is only possible with access to many drives in real-world usage. One is an anecdote, many is data. You thinking this is a fluff piece says more about you than about editors, readers or anyone else.
          • I'm familiar with Backblaze and their report and I have no problem with it. I do question some interpretations that have been made of it in the past, but even that's not my point here at all. Clickbait is about how and why you get to the story. Once you are there, the story is often of little value, period, or in this case just of no value to you at the moment, because you wanted to find out if and why the large HDDs were more reliable, but there's actually no real info on that.

        • by shanen ( 462549 )

          Now I think you're wandering into the area of dopamine manipulation. It's the key to addictive behaviors that are being harvested for profit. And I don't think you can accuse Slashdot of being profitable, so it doesn't apply here.

          My pick for a great breakthrough technology? A dopamine monitor that could help people know when they are being manipulated.

      • by necro81 ( 917438 )

        But as regards the failure rate, it obviously depends on how you use them, and there is no way to cheat time in the direction required to get better long-term test data faster. We can slow things down, but we can't speed time up.

        Accelerated Life Testing is a well-established discipline within engineering, grounded in rigorous statistics, and used in just about every industry.

        One of the best ways to speed up testing is to raise the operating temperature, sometimes to well outside of its rating. This me
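        For the curious, the usual way an elevated temperature gets turned into a time-acceleration factor is the Arrhenius model. A rough sketch follows; the activation energy is a placeholder, not a figure from Backblaze or any drive vendor:

          # Arrhenius acceleration factor used in accelerated life testing:
          # how much faster thermally driven failure mechanisms proceed at an
          # elevated stress temperature than at the normal use temperature.
          import math

          BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

          def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
              t_use = t_use_c + 273.15        # use temperature, K
              t_stress = t_stress_c + 273.15  # stress temperature, K
              return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))

          # Placeholder activation energy of 0.7 eV; 40 C in service vs 70 C under test.
          print(f"Acceleration factor: {arrhenius_af(0.7, 40, 70):.1f}x")  # roughly 10x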

        • by shanen ( 462549 )

          And I'm not disagreeing with any of that, but as you note, it isn't the same thing as the passage of time in the real world. They have to make all sorts of assumptions, though I think you left out the most important ones about future usage patterns of the DUT.

          I'm still having trouble dismissing it as a clickbait non-story, which was the premise of the thread. The spirit was willing, but the bait was weak? (Though is my personal motivation lacking or weak? I'm writing this on an old machine with a hard disk tha

    • by Entrope ( 68843 )

      Maybe there is one simple trick that can prevent them from failing later? Maybe you'd be just as informed by clicking on the picture of the woman with the inflated chest at the bottom of the page.

      The one simple trick: Replace your scantily clad hard drive with inflated capacity with a newer, more inflated one before the first one gets too old. Plan ahead and make an iron-clad recovery agreement to get your data out cleanly before you get too entangled with the old hard drive.

      Wait, I may have gotten those topics a little bit mixed up.

    • by gweihir ( 88907 )

      You need to look at a real failure curve some time. HDDs have the "bathtub" characteristic there.

      • Thanks to you and the others who pointed this out. That is not uncommon, and I should have considered it. I will upgrade my opinion to "Still Clickbait" (what isn't?), but not a "Non-story."

      • by pz ( 113803 )

        Except that Backblaze has good data that shows it isn't a bathtub, but more like a hockey stick. Infant mortality on new disk drives is largely a thing of the past.

        https://www.backblaze.com/blog... [backblaze.com]

        https://www.backblaze.com/blog... [backblaze.com]
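        The bathtub-versus-hockey-stick distinction can be made concrete with Weibull hazard rates. A small illustrative sketch; the parameters are arbitrary, chosen only to show the two shapes, and are not fitted to Backblaze's data:

          # Weibull hazard rate h(t) = (k/s) * (t/s)^(k-1): shape k < 1 gives the
          # falling "infant mortality" edge of a bathtub curve, k > 1 the rising
          # wear-out "hockey stick". Parameters here are purely illustrative.
          def weibull_hazard(t_years: float, shape: float, scale_years: float) -> float:
              return (shape / scale_years) * (t_years / scale_years) ** (shape - 1)

          for years in (0.5, 2.0, 5.0, 8.0):
              infant = weibull_hazard(years, shape=0.7, scale_years=10)   # falling hazard
              wearout = weibull_hazard(years, shape=3.0, scale_years=10)  # rising hazard
              print(f"{years:>4} yr: infant-mortality {infant:.3f}, wear-out {wearout:.3f}")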

        • by gweihir ( 88907 )

          Hmm. This would indicate most failures are not typical mechanical failures though. May also be due to the environment at Backblaze.

          But anyways, that is an interesting data-point. And yes, HDDs have gotten massively more reliable over the past two decades.

        • by Bert64 ( 520050 )

          Poor handling is likely a significant cause of early drive death...

          Backblaze will buy in bulk direct from the manufacturer, and they know how to handle and install drives because they do this every day.

          A random retail store may store the drives poorly, they may get dropped or have heavy items placed on top etc. Then a random guy buys it, takes it home in the back of his car and installs it without proper anti static protection etc.

          • by nasch ( 598556 )

            That reminds me of a story on TheDailyWTF about a drive that kept failing, and it turned out the construction guy was using the server as one end of a sawhorse and cutting boards on it with a power saw.

  • Older drives fail more often? Who knew?
    • Older drives fail more often? Who knew?

      Ah, to their credit, at least we the audience aren't being lied to about the reason, or simply given no data at all, which is basically what every manufacturer of never-fail hard drives wants you to believe.

      I'll take the honesty and reality of "old shit fails faster" any day.

    • by AmiMoJo ( 196126 )

      There is a more important issue with newer drives that TFA fails to mention. Read/write speeds have not increased with capacity.

      When a drive dies and the RAID has to be rebuilt, it's going to take a hell of a long time with 20TB drives. With ZFS it's not uncommon for rebuilds to take several days. With SMR drives that can extend to weeks.

      If you actually need the array to rebuild in a reasonable amount of time, smaller drives are better.
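      A back-of-envelope lower bound makes the point: even a single full-speed sequential pass over a 20TB replacement drive takes more than a day, and real resilvers are slower because the pool stays in service and reads are scattered. A rough sketch, assuming a sustained 200 MB/s for a CMR drive and the ~25 MB/s sometimes seen on SMR drives under sustained writes:

        # Lower bound on rebuild time: one full sequential pass over the
        # replacement drive at a sustained rate. Real ZFS resilvers are usually
        # slower because the pool keeps serving requests during the rebuild.
        def min_rebuild_hours(capacity_tb: float, throughput_mb_s: float) -> float:
            capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB
            return capacity_mb / throughput_mb_s / 3600.0

        print(f"20 TB at 200 MB/s: {min_rebuild_hours(20, 200):.0f} h best case")  # ~28 h
        print(f"20 TB at  25 MB/s: {min_rebuild_hours(20, 25):.0f} h (~9 days)")   # SMR worst case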

  • So... HDD failure rate has gone up, but that's because they have a lot of old drives that are more likely to fail.

    Is this actually telling us anything at all? Are there any meaningful conclusions that can be drawn from this data set?

    • Personally, I think it's kind of remarkable they have so many drives that have lasted as long as they have (7+ years). My understanding has been--and confirmed by my own experience--that once a modern drive hits ~5 years old, its days are numbered. I wonder if a lot of their drives spend most of their time quiescent? If no one is reading or updating data on them for days at a time, there's no point to having them run. That's got to help with longevity.
      • Personally, I think it's kind of remarkable they have so many drives that have lasted as long as they have (7+ years). My understanding has been--and confirmed by my own experience--that once a modern drive hits ~5 years old, its days are numbered.

        I wonder if a lot of their drives spend most of their time quiescent? If no one is reading or updating data on them for days at a time, there's no point to having them run. That's got to help with longevity.

        Backblaze mostly avoids the worst thing you can do to a spinning-rust drive: Power-cycling.

        HDDs happily run for a long, long time if you keep them spinning. Just like a car engine suffers most of its wear during startup (before the oil pressure is up), HDDs suffer most wear during the power-down and power-up sequence.

        IMHO, that's the main reason why Backblaze gets to have so many geriatric drives. That, and the fact that they sit in stationary servers all the time.

  • Whodathunkit? Surely the older drives came in various sizes, and *that* would provide useful data for comparison and a reasonable projection for new drives in various sizes, but alas, no. Unless the summary is botched, this sounds like a pointless story as others are saying.

    • by suutar ( 1860506 )

      The actual BackBlaze reports do in fact go into that level of detail, as I recall. The story in HotHardware kind of glosses over it, though.

  • by ewhac ( 5844 ) on Tuesday January 31, 2023 @03:36PM (#63254733) Homepage Journal
    Is there a reliability breakdown on CMR versus SMR (conventional versus shingled) drives? The shingled recording technique always struck me as a flimsy hack -- clever, but probably more prone to failure.

    Also: Has anyone besides me noticed that manufacturers' claimed unrecoverable read error rates have basically not improved in the last several years, and are stuck at one error per 10**14 bits read (one per 10**15 on "enterprise" drives)?
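    To put that spec in perspective, a worked example, taking the quoted rate at face value as a per-bit error probability (a big assumption; real drives usually do better than the spec sheet):

      # Expected unrecoverable read errors (UREs) in one full end-to-end read
      # of a drive, taking the quoted spec at face value.
      def expected_ures(capacity_tb: float, bits_per_error: float = 1e14) -> float:
          bits = capacity_tb * 1e12 * 8  # decimal TB -> bits
          return bits / bits_per_error

      for tb in (4, 12, 20):
          print(f"{tb:>2} TB full read: ~{expected_ures(tb):.2f} expected UREs")
      # At 1 per 1e14 bits, a single full read of a 20TB drive statistically
      # expects about 1.6 unrecoverable errors.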

    • by slaker ( 53818 ) on Tuesday January 31, 2023 @04:00PM (#63254833)

      Glancing at the list of drives they're using, I don't see any SMR models.

      Anecdotally, I can say that I've had good luck with crappy de-shelled SMR drives, but I've also found that how well they behave in storage arrays depends a lot on the OS (they work better in storage spaces than zpools in my experience) and the brand (Seagate SMR drives tend to work better than WD). I only have 86 of them in service, having purchased 90 4-6TB drives four years ago, and that's not a huge sample.

      My datacenter guys have been selling me lightly used 16TB Enterprise drives and I'm happy to have them. SMR drives often top out at 25MB/sec transfer rates for extended data transfers and boy did that get old.

    • by suutar ( 1860506 )

      Backblaze doesn't use SMR drives [blocksandfiles.com] as it happens.

  • Remember after/during the hard drive shortage when there were entire models of 2TB drives that had failure rates over 30% in the first year and BackBlaze absolutely went off on them? I remember. Because it was on Slashdot. Also I absolutely roasted them in a live product presentation chat and they scrambled to lie/cover for it. Weirdly enough, I didn't win the giveaway.
  • And to top it off, high-capacity HDDs in the 14~22 TB range are bought primarily by enterprises, with prosumers a distant second.

    No wonder high capacity HDDs (and HDDs in general) are getting more reliable over time.

  • by lsllll ( 830002 ) on Tuesday January 31, 2023 @04:23PM (#63254925)

    I used to manage and service a Beowulf cluster with IBM GPFS for its data storage. During the 6 years after I installed 288 x 2TB WD Enterprise SATA drives, 2 failed for good and I replaced 4 because they were starting to get bad sectors. I don't know what their strategy for changing their drives is (failed vs. preventive), but if I go with "failed", then my rate was 0.1% per year.

    I just checked my own workstation because I remember I set up 3 x 1TB WD Enterprise SATA in RAID5. smartctl says they've been up for 99,792 hours (11.4 years) with 0 reallocated sectors for all 3. So I'm batting pretty well!

    After the customer stopped my service on their Coraid devices, they threw consumer-grade drives in there because they were cheaper. Worse yet, they weren't really monitoring them either, and sure enough disaster followed: I was called in and had to manually reassemble the containers and get their data off the ones that were failing because of flaky drives. My basic advice is to not use non-enterprise drives in a setting where they're going to be on 24x7.

    • Backblaze's argument is that they have enough drives and redundancy that the cost savings offset any reliability issues. Some designs (e.g. GPFS) have different requirements and thus need different reliability characteristics.

  • I always enjoy seeing these Backblaze reports, and it's especially nice to know larger drives can be trusted, as it's once again time to shift a whole lot of my old data onto larger drives so I can consolidate... will probably just get a bunch of 14TB drives.

  • by Anonymous Coward
    Their numbers on some of their tables are kind of screwy. They have drives from HGST on the tables with average lifetimes of 21 months, 38 months and 3.9 months, when HGST-branded drives haven't been manufactured since 2018. Also, there's a line for Hitachi drives with a total drive count (working and failed) of 95, but with 4.4 million drive days-- an average of 127 years life per drive. I think someone screwed up an Excel table somewhere.
    • by slaker ( 53818 )

      As I understand it, HGST is still a separate business unit with distinct manufacturing processes from WD itself. The models and standard designations still exist, only they are technically WD Ultrastar models now rather than HGST Ultrastar.

      If Backblaze wants to keep calling them HGST, I don't see why that's an issue.

"I've seen the forgeries I've sent out." -- John F. Haugh II (jfh@rpp386.Dallas.TX.US), about forging net news articles

Working...