Taking a Hard Look At SSD Write Endurance
New submitter jyujin writes "Ever wonder how long your SSD will last? It's funny how bad people are at estimating just how long '100,000 writes' are going to take when spread over a device that spans several thousand of those blocks over several gigabytes of memory. It obviously gets far worse with newer flash memory that is able to withstand a whopping million writes per cell. So yeah, let's crunch some numbers and fix that misconception. Spoiler: even at the maximum SATA 3.0 link speeds, you'd still find yourself waiting several months or even years for that SSD to start dying on you."
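The summary's arithmetic is easy to sketch. The numbers below are illustrative assumptions, not figures from the article: a 120 GB drive, a 100,000 P/E-cycle rating, perfect wear leveling, and sustained writes at the SATA 3.0 ceiling of 600 MB/s.

```python
# Back-of-envelope SSD endurance math (illustrative numbers, not from the
# article): a 120 GB drive rated for 100,000 P/E cycles, written flat out
# at the SATA 3.0 ceiling with perfect wear leveling.

capacity_bytes = 120 * 10**9        # assumed drive size: 120 GB
pe_cycles = 100_000                 # rated program/erase cycles per cell
write_speed = 600 * 10**6           # SATA 3.0 max, bytes per second

total_writable = capacity_bytes * pe_cycles        # bytes before wear-out
seconds = total_writable / write_speed
years = seconds / (365 * 24 * 3600)
print(f"{years:.1f} years of continuous max-speed writes")  # → 0.6 years
```

About 0.6 years, i.e. "several months" even under this worst case, which is the summary's point; real workloads write orders of magnitude slower than the SATA ceiling.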
Our first age-related failure was a 2008 drive. (Score:5, Interesting)
The Intel X25s in my PC, from 2009, are still humming along nicely, and my last benchmark produced the same results in 2012 as it did in 2010. But I've gone so far as to set environment variables to put user temp files on a mechanical drive, internet temp files on a RAM drive, and system temp files on a RAM drive, taking load off the wear leveling.
Re:Holy idiocy batman (Score:2, Interesting)
A quick glance at wikipedia tells me that you're being rather pessimistic...
"Most commercially available flash products are guaranteed to withstand around 100,000 P/E cycles before the wear begins to deteriorate the integrity of the storage. Micron Technology and Sun Microsystems announced an SLC NAND flash memory chip rated for 1,000,000 P/E cycles on 17 December 2008."
http://en.wikipedia.org/wiki/Flash_memory#Memory_wear [wikipedia.org]
Re:If SSd is nearly full? (Score:5, Interesting)
But if your SSD is nearly full with data that you never change, wouldn't all the writing happen in the small area that is left? This would significantly reduce lifetime.
I believe all the major brands actually move your data around periodically (static wear leveling), which costs write cycles but is worth it to keep wear balanced.
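The idea can be sketched as a toy policy (this is a hypothetical model, not any real firmware's algorithm): when the gap between the most- and least-erased blocks gets too wide, relocate the cold data so its barely-worn block rejoins the pool.

```python
# Toy static wear leveling (hypothetical, not real firmware): when erase
# counts drift too far apart, swap the cold block's data onto the hot block
# so the low-wear block can absorb future writes.

def level(erase_counts, threshold=100):
    """Return a (hot, cold) index pair to swap when wear is too uneven."""
    hot = max(range(len(erase_counts)), key=erase_counts.__getitem__)
    cold = min(range(len(erase_counts)), key=erase_counts.__getitem__)
    if erase_counts[hot] - erase_counts[cold] > threshold:
        return (hot, cold)   # move the static (cold) data onto the hot block
    return None              # wear is even enough; do nothing
```

So blocks pinned under never-changing data still get cycled into service, at the price of a few extra write cycles spent on the relocation itself.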
Life is tricky for flash (Score:5, Interesting)
Meaningful life specs are tough to come by for flash. Yes, as noted above, SLC NAND has a rated life of 100k erases/page on the datasheet, but that's really a guaranteed spec under all rated conditions, so in reality it lasts quite a bit longer. If you were to write the same page once a second, you'd use it up in a bit more than a day.
However, in real life, the "failure" criterion is when a page written with a test pattern doesn't read back as "erased" in a single readback. Simple enough, except that flash has transient read errors: you can read a page, get an error, read the exact same page again, and not get the error. Eventually it does return the same thing every time, but that's longer than the "first error".
There's also a very strong non-linear temperature dependence on life, both in terms of cycles and just in terms of remembering the contents. Get the package above 85C and it tends to lose its contents. (I realize the typical SSD won't be hot enough that the package gets to 85C; then again, consider the SSD in a ToughBook in Iraq at 45C air temp...)
In actual life, with actual flash devices on a breadboard in the lab at "room temperature", I've cycled SLC NAND for well over a million cycles (hit it 10-20 times a second for days) without failure. This sort of behavior makes it difficult to design meaningful wear leveling (for all I know, different pages age differently) and life specs, without going to a conservative 100k/page uniform standard, which, in practice, grossly understates the actual life.
What you really need to do is buy a couple drives and beat the heck out of them with *realistic* usage patterns.
Number crunching != empirical evidence (Score:0, Interesting)
In fact file systems need superblocks and they can't just evenly distribute everything across the platter. The superblock is obviously the first to go, so you'd need to cope with that by having various possible locations for it. Where do you store the location? In a superduperblock? How long does that last? Where do you store data on how many writes have hit each block? How many times do you overwrite that?
After all this basic housekeeping, maybe, you can spread everything else across the platter.
This calculation is best-case start to finish. Drives are not written with perfect evenness - that would be very very hard if not impossible to achieve. So you need actual figures for how well this can be done in practice. Any conclusions you make without that empirical data are likely to be overstated.
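The hot-spot worry above is easy to illustrate with a crude simulation (every number here is invented for illustration): if 10% of all writes land on one fixed "superblock" location and nothing levels the wear, that one block exhausts a 100k-cycle rating while the average block has barely been touched.

```python
import random

# Crude hot-spot simulation (all numbers invented): 1000 blocks rated for
# 100k cycles each, no wear leveling. 10% of writes hit block 0 (the
# "superblock"); the rest are spread evenly over the remaining blocks.

random.seed(1)
ENDURANCE, NBLOCKS = 100_000, 1000
wear = [0] * NBLOCKS
writes = 0
while max(wear) < ENDURANCE:          # stop at the first worn-out block
    blk = 0 if random.random() < 0.10 else random.randrange(1, NBLOCKS)
    wear[blk] += 1
    writes += 1
print(writes, sum(wear) / NBLOCKS)    # first failure vs. average wear
```

The drive "fails" after roughly a million writes, when perfectly even wear would have allowed a hundred million; average wear at that point is about 1% of the rating. That's the gap between the best-case calculation and what you'd measure empirically.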
Re:If SSd is nearly full? (Score:5, Interesting)
An SSD should be used at a maximum of 75% of its capacity... 50% or less is recommended.
Some controllers try to move blocks around to rotate the writes and keep a lot of spare area, so they can remap/use other sectors on write... but that's the problem: working a full SSD will shorten its life.
Re:100,000? (Score:5, Interesting)
Luckily, while he's about 30 times out for the write endurance on the bad side, he's about 100-1000 times out on the speed at which you're likely to ever write to the things, on the good side, so in reality SSDs will last about 3-30 times longer than he's indicating in the article. The fact that he's discussing continuous writes at max SATA 3 speed suggests that he's really concerned with big-ass databases that are writing continuously and use SLC NAND. The consumer case is in fact much better than that, even despite MLC/TLC.
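The parent's two corrections combine multiplicatively; here's the arithmetic spelled out (the 30x and 100-1000x factors are the poster's rough estimates, not measurements):

```python
# Reconciling the parent's two error estimates (the poster's rough factors,
# not measurements): the article is ~30x too optimistic on endurance, but
# ~100-1000x too pessimistic on realistic sustained write rates.

endurance_factor = 30                       # article overstates endurance 30x
speed_factor_lo, speed_factor_hi = 100, 1000  # real write rates are this much lower

net_lo = speed_factor_lo / endurance_factor   # 100/30  ≈ 3.3x longer
net_hi = speed_factor_hi / endurance_factor   # 1000/30 ≈ 33x longer
print(f"real life ≈ {net_lo:.0f}x to {net_hi:.0f}x the article's estimate")
```

Hence "about 3-30 times longer" than the article's worst-case figure for a typical consumer workload.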
Re:Number crunching != empirical evidence (Score:4, Interesting)
Which is why most SSD drives implement some kind of wear leveling. They will move the often written sectors around the physical storage space in an effort to keep the wear even.
Rotating media drives do similar things and can physically move "bad" sectors too, but this usually means you lose data. Many drives actually come from the factory with remapped sectors. You don't notice because these sectors are already remapped onto the extra space the manufacturers build into the drive but don't let you see.
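The factory remapping amounts to a tiny indirection table; here's a hypothetical sketch (the class and counts are invented for illustration, not any drive's actual firmware): every access goes through the table, so a remapped sector is invisible to the host.

```python
# Hypothetical sketch of factory bad-sector remapping (invented for
# illustration): accesses go through an indirection table, so a sector
# remapped onto hidden spare space is invisible to the host.

class Remapper:
    def __init__(self, visible_sectors, spare_sectors):
        self.table = {}                       # logical -> spare physical
        self.spares = list(range(visible_sectors,
                                 visible_sectors + spare_sectors))

    def mark_bad(self, logical):
        self.table[logical] = self.spares.pop(0)   # consume one spare

    def physical(self, logical):
        return self.table.get(logical, logical)    # unmapped = identity
```

A healthy sector maps to itself; a factory-remapped one quietly lands in the spare area the user never sees.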
Reminds me of when I interviewed with Maxtor, years ago. They were telling me that the only difference between their current top-of-the-line storage (which was like 250G at the time) and their 40 Gig OEM drive was the controller firmware configuration and the stickers. Both drives came off the same assembly line, and only the final power-up configuration and test step was different, and then only in the values configured in the controller and what stickers got put on the drive. If you had the correct software, you could easily convert the OEM drive to the bigger capacity by writing the correct contents to the right physical location on the drive. The reason they did this was that it was cheaper than having to stop and retool the production line every time an OEM wanted 10,000 cheap drives.
I'm sure drive builders still do that sort of thing today. Set up a 3TB drive line, then just downsize the drives that are to be sold as 1TB drives.
Re:Tried It - Disappointed (Score:4, Interesting)
Actually, better SSD controllers sense that a page has reached its rewrite limit. The end effect of this is that the size of the overprovisioned space gets reduced by one page. (The controller stops ever writing to the used-up page.) The write performance of the SSD degrades until it goes below a certain amount of overprovisioned space, at which point it refuses to write any more. The disk is still entirely readable, so it's a binary failure mechanism, but a pretty safe one.
Gradual failure over time means either you have a crap controller or that your electronics are failing in ways other than running out of write cycles.
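The retire-until-read-only behavior described above can be modeled as a toy (the page counts and the floor are invented numbers, not any controller's real policy): each worn-out page permanently shrinks the overprovisioned pool, and once the pool drops below a floor, the drive stops accepting writes but stays readable.

```python
# Toy model of the failure mode described above (invented numbers, not a
# real controller's policy): retired pages shrink the overprovisioned pool;
# below a floor, the drive goes read-only rather than losing data.

class Drive:
    def __init__(self, op_pages=100, op_floor=10):
        self.op_pages = op_pages    # spare pages still in the pool
        self.op_floor = op_floor    # minimum pool size for write service

    def retire_page(self):
        self.op_pages -= 1          # a used-up page leaves the pool forever

    @property
    def writable(self):
        return self.op_pages >= self.op_floor   # below floor: read-only
```

Write performance sags as the pool shrinks, then the drive flips to read-only in one step: a binary failure, but a safe one.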
Re:Holy idiocy batman (Score:5, Interesting)
RAM disks are cool and all, but except on live CDs they're usually unnecessary. The kernel's buffer cache and directory-name-lookup cache (in RAM) can often outperform RAM disks on second reads and writes.
(Claimer: I worked on file systems for HP-UX, and we measured this when we considered adding our internal experimental RAM FS to the production OS.)
Re:Holy idiocy batman (Score:5, Interesting)
Actually, NAND flash doesn't "die" when you try to do the N+1th erase-write cycle. (And it's cycles, not writes: a cycle consists of flipping bits from 1 to 0 (the write), then back from 0 to 1 (the erase).) In practically all controllers, you do partial writes. With SLC NAND, it's fairly easy - you can write a page at a time, or even half pages. MLC lets you do a page at a time as well - given typical MLC "big block" NAND of 32 4k pages, a block can be written 32 times before it's erased (once per page - you cannot do less than a page at a time).
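The parent's geometry arithmetic, spelled out (using the "big block" figures quoted above: 32 pages of 4k per block):

```python
# The parent's MLC "big block" geometry, spelled out: 32 pages of 4 KB
# per erase block, one program per page per erase cycle.

pages_per_block = 32
page_bytes = 4 * 1024

block_bytes = pages_per_block * page_bytes   # 128 KB per erase block
# One erase permits one program per page, so a block absorbs up to 32
# page writes between erases, not just one write per cycle.
writes_per_cycle = pages_per_block
print(block_bytes, writes_per_cycle)   # → 131072 32
```

So the quoted cycle count is consumed per block-erase, not per page-write, which is a big part of why naive "writes ÷ cycle rating" estimates come out too pessimistic.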
And... other dirty little secret - the quoted cycle life is guaranteed. It means your part will be able to be written and erased 3000 times. Most typically, they're an order of magnitude more conservative - so a 3000 cycle flash can really get you 30,000 with proper care and tolerance.
Of course, a really big problem with cheap SSDs is lame firmware, because what you need is a good flash translation layer (FTL) which does wear levelling, sector translations, etc. These things are VERY proprietary and HEAVILY patented. A dirt-cheap crappy controller you might find on low-end thumbdrives and memory cards may not even DO translation or wear levelling. The other problem is that the flash translation table must be stored somewhere so the device can find your data (because of wear levelling, where your data is actually stored differs from where your PC thinks it is - again, the FTL handles this). For some things, it's possible to just scan the entire array and generate the table live, but generally that's impractical at large scale because it takes time to perform the scan. So usually the table is stored in flash as well, which of course is not protected by the FTL. Depending on how things go, this part could easily corrupt itself, leading to an unmountable device or, basically, a dead SSD.
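To make the indirection concrete, here is a bare-bones FTL sketch (hypothetical; as noted above, real FTLs are proprietary and far more involved): logical sectors map to physical pages, each write goes to a fresh page, and the map itself is exactly the structure that has to be persisted somewhere.

```python
# Bare-bones flash translation layer sketch (hypothetical; real FTLs are
# proprietary and far more complex): writes always go to a fresh physical
# page, and a map records where each logical sector actually lives.

class TinyFTL:
    def __init__(self, physical_pages):
        self.map = {}                    # logical sector -> physical page
        self.free = list(range(physical_pages))
        self.flash = {}                  # physical page -> data

    def write(self, sector, data):
        page = self.free.pop(0)          # never overwrite in place
        self.flash[page] = data
        old = self.map.get(sector)
        if old is not None:
            self.free.append(old)        # stale page can be erased and reused
        self.map[sector] = page          # this map is what must be persisted

    def read(self, sector):
        return self.flash[self.map[sector]]
```

Lose `self.map` (or store it in flash the map itself doesn't protect) and the data is all still there physically, but unreachable - which is the "dead SSD" failure mode described above.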
For some REAL analysis, some brave souls have been stressing cheap SSDs to their limits until failure - http://www.xtremesystems.org/forums/showthread.php?271063-SSD-Write-Endurance-25nm-Vs-34nm [xtremesystems.org]
Some of those SSDs are actually still going strong.
The best bet is to buy from people who know what they're doing - the likes of Samsung (VERY popular with the OEM crowd - Dell, Lenovo, Apple, etc.), Toshiba, and Intel - who all make NAND memory and thus actually do have experience on how to best balance speed and reliability. Everyone else is just using the datasheet and just assembling them together like they would any other PC part.