Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

MIT's New File System Won't Lose Data During Crashes 168

Posted by Soulskill on Monday August 24, 2015 @08:54AM from the breaking-in-a-way-that-doesn't-break-other-things dept.

jan_jes sends news that MIT researchers will soon present a file system they say is mathematically guaranteed not to lose data during a crash. While building it, they wrote and rewrote the file system over and over, finding that the majority of their development time was spent defining the system components and the relationships between them. "With all these logics and proofs, there are so many ways to write them down, and each one of them has subtle implications down the line that we didn’t really understand." The file system is slow compared to other modern examples, but the researchers say their formal verification can also work with faster designs. Associate professor Nickolai Zeldovich said, "Making sure that the file system can recover from a crash at any point is tricky because there are so many different places that you could crash. You literally have to consider every instruction or every disk operation and think, ‘Well, what if I crash now? What now? What now?’ And so empirically, people have found lots of bugs in file systems that have to do with crash recovery, and they keep finding them, even in very well tested file systems, because it’s just so hard to do.”

This discussion has been archived. No new comments can be posted.

MIT's New File System Won't Lose Data During Crashes

Load All Comments

Search 168 Comments Log In/Create an Account

Comments Filter:

But is it useful? (Score:5, Interesting)

by Junta ( 36770 ) writes: on Monday August 24, 2015 @08:58AM (#50379191)

slow compared to other modern examples, but the researchers say their formal verification can also work with faster designs
If we can accept 'slow', it's not that difficult to build an always consistent filesystem. While they can formally verify a faster design should one exist, there remains the question of whether it's possible to be formally proven to be resilient to data loss while taking some of the utterly required performance shortcuts for acceptable performance. I suspect the answer is that some essential performance 'shortcuts' will fail that verification.

Share
twitter facebook
- Re: (Score:2)
  
  by FranTaylor ( 164577 ) writes:
  
  If we can accept 'slow', it's not that difficult to build an always consistent filesystem.
  
  citation required
  - Re:But is it useful? (Score:5, Funny)
    
    by Anonymous Coward writes: on Monday August 24, 2015 @09:25AM (#50379391)
    
    A sufficiently slow writing filesystem is indistinguishable from a read-only filesystem. Read-only filesystems are consistent. Therefore, a slow enough writing filesystem is consistent.
    For more serious analysis of crash-resistant write sequences, look elsewhere in this discussion.
    
    Parent Share
    twitter facebook
    - Re: (Score:2)
      
      by FranTaylor ( 164577 ) writes:
      
      what does this phrase even mean? why does speed or slowness matter? you can play the video of a disk disaster at any speed you want
      - Real-time (Score:2)
        
        by tepples ( 727027 ) writes:
        
        what does this phrase even mean? why does speed or slowness matter?
        Speed means the ability to finish reading and writing all data associated with a job before the job's soft real-time deadline has expired.
        
        Re: (Score:1)
        
        by FranTaylor ( 164577 ) writes:
        
        before the job's soft real-time deadline has expired.
        what does this have to do with anything?
        
        Re: (Score:2)
        
        by tepples ( 727027 ) writes:
        
        If reading or writing files in a particular file system is slow enough that it makes applications painful to use, the file system won't pass into widespread use.
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
  - Re: (Score:2)
    
    by arglebargle_xiv ( 2212710 ) writes:
    
    If we can accept 'slow', it's not that difficult to build an always consistent filesystem.
    citation required
    Sheesh, do we have to do the Googling for you [wikipedia.org]?
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
Well it won't lose data... (Score:2)

by frank_adrian314159 ( 469671 ) writes:

... until they find a logical flaw in their proofs or the bugs in mechanical verifier(s) that helped them prove the driver correct.
- Re: (Score:1, Interesting)
  
  by Anonymous Coward writes:
  
  Or – more likely – until they find a flaw in their assumptions, like a lower-level software stack that swears "yes, this data has been committed to storage" slightly before that's actually true.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  That is pretty unlikely, but the whole thing is a worthless stunt anyways. The problem is that they have to use some hardware model and that will have errors. Hence the assurances they claim are purely theoretical, and in practice their thing may well be less reliable than a well-tested file system with data-journal, like ext3.
- - - - Proving Coq itself (Score:1)
        
        by tepples ( 727027 ) writes:
        
        formal proof using Coq
        How do you prove Coq is correct, that it doesn't have a Ken Thompson bug (see "Reflections on Trusting Trust") causing a false positive on proof of Coq's own correctness? Is there an independently written implementation of Coq's proof language suitable for David A. Wheeler's "diverse double-compiling" method? And how do you prove that the hardware on which Coq is run doesn't have a flaw that affects Coq's correctness?
        
        Re:Proving Coq itself (Score:5, Interesting)
        
        by TheRaven64 ( 641858 ) writes: on Monday August 24, 2015 @10:37AM (#50379933) Journal
        
        If you actually want an answer, I suggest that you go and read the Coq papers. You might learn something.
        
        Parent Share
        twitter facebook
        
        Re: (Score:2)
        
        by arglebargle_xiv ( 2212710 ) writes:
        
        How do you prove Coq is correct
        You use the formal proof assistant Pusi to prove it correct. So what you do is feed Coq into Pusi, and then nine months later the proof drops out.
Formally verified system fails (Score:3, Funny)

by Anonymous Coward writes: on Monday August 24, 2015 @09:00AM (#50379207)

MIT's new "crash-proof file system" crashed today amid accusations of bugs in the formal proof verification software used to formally verify it.
MIT are now working on a formal verification of the formal verification system, in order to avoid similar formal-verification-related problems in the future.

Share
twitter facebook
- Re: (Score:2)
  
  by arglebargle_xiv ( 2212710 ) writes:
  
  MIT's new "crash-proof file system" crashed today amid accusations of bugs in the formal proof verification software used to formally verify it.
  So the whole thing was a bit of a Coq-up?
Linux File Systems (Score:5, Interesting)

by Anonymous Coward writes: on Monday August 24, 2015 @09:04AM (#50379235)

I find some of the current file systems to be adequately reliable. Even their performance is acceptable. But, the Linux systems are lacking.
Is there a reliable Linux file system such as EXT4 that has an easy to use copy on write(CoW) feature to allow instant recovery of any file changed at any time?
rm ./test
restore --last test ./
dd if=/dev/random of=./test bs=1M count=10
restore --datetime test ./
Novell Netware FS did all this and more in 1995 FFS! Fifteen years later and Linux doesn't seem to be able to do this. NTFS doesn't seem to be able to do this either. Yet Novell is dead?

Share
twitter facebook
- Re: (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  Well, ex-Googler Kent Overstreet recently announced a COW filesystem on lkml:
  https://lkml.org/lkml/2015/8/21/22
  Not ready for production I would say, but looks interesting.
- Re:Linux File Systems (Score:5, Interesting)
  
  by swb ( 14022 ) writes: on Monday August 24, 2015 @09:22AM (#50379377)
  
  I still think Netware's filesystem permission model was better than anything out there now, at least for filesharing.
  The feature I miss the most is allowing traversal through a directory hierarchy a user has no explicit permissions for to get to a folder they do have permissions for. I find the workarounds for this in other filesystems to be extremely ugly.
  I think NDS was better in a lot of ways than AD, although it would have been nice if there had been something better than bindery mode for eliminating the need for users to know their fully qualified NDS name.
  I also kind of wish TCP/IP had used the network:mac numbering scheme that IPX used. The rest of IPX/SPX I don't need, but there'd be no talk of address exhaustion of IPv4 if that scheme had been adopted, little need for DHCP address assignment and the addressing scheme would scale to the larger broadcast domains enabled by modern switching (avoiding the need to renumber legacy segments completely when they exhausted a /24 space and expansion via mask reduction wasn't possible due to linear numbering on adjacent segments).
  
  Parent Share
  twitter facebook
  - chmod 751 some_directory (Score:5, Informative)
    
    by tepples ( 727027 ) writes: <tepples.gmail@com> on Monday August 24, 2015 @09:57AM (#50379631) Homepage Journal
    
    The feature I miss the most is allowing traversal through a directory hierarchy a user has no explicit permissions for to get to a folder they do have permissions for. I find the workarounds for this in other filesystems to be extremely ugly.
    In POSIX, that's represented in a directory's mode bits with octal digit 1 (4: list files, 2: create or delete files, 1: traverse). What do you find ugly about mode 751 (owner create or delete, owner and group list, world traverse)?
    I also kind of wish TCP/IP had used the network:mac numbering scheme that IPX used.
    It does now. An IPv6 address is divided into a 48- or 56-bit network, a 16- or 8-bit subnet, and a 64-bit machine identifier commonly derived from the MAC.
    
    Parent Share
    twitter facebook
    - - Re: (Score:2)
        
        by Actually, I do RTFA ( 1058596 ) writes:
        
        Well, for example, I can think of a situation for: create, NOT delete, NOT modifiy, NOT read. If there is a shared area where people are putting resumes, or other submissions. You don't want htem to affect or compete with one already there. Nor read the others.
        I can see a log file you can append-only to.
        There's a lot of interesting cases if the permissions are cut fine enough.
  - Re:Linux File Systems (Score:4, Interesting)
    
    by nbvb ( 32836 ) writes: on Monday August 24, 2015 @10:47AM (#50380011) Journal
    
    Bah, forget NetWare, VINES and StreetTalk did everything you ask for and then some way before NDS was even a thought.
    VINES' ACL's were beautifully granular ...
    
    Parent Share
    twitter facebook
    - Re: (Score:2)
      
      by arglebargle_xiv ( 2212710 ) writes:
      
      Bah, forget this newfangled crap, Multics was doing this half a century ago [multicians.org] (which includes use of RAID tape drives, checkpointing, ACLs, and other recent innovations).
  - - Re: (Score:3)
      
      by swb ( 14022 ) writes:
      
      It's a kludge, though, in NTFS, in Netware it just worked.
      IPX/SPX as a layer 3 protocol isn't what I wanted, I wanted TCPv4 with a network prefix and the MAC as the node address. Clients could derive their address automatically from network traffic by picking up the network address from the wire and they already knew their MAC address.
      Although to be honest, IPX/SPX even as a secondary protocol wasn't that bad to support in a mixed environment. We had no issues with TCP, IPX *and* CrappleTalk on 512k frame
      - Re: (Score:2)
        
        by LordLimecat ( 1103839 ) writes:
        
        How is it a kludge? You have the option in NTFS of having that everyone:traverse permission propogating through the directory heirarchy, but it isnt required. It sounds like Netware just assumes that traversal rights are implicit, where there may be scenarios where it is not desired.
      - Re: (Score:2)
        
        by psmears ( 629712 ) writes:
        
        I wanted TCPv4 with a network prefix and the MAC as the node address. Clients could derive their address automatically from network traffic by picking up the network address from the wire and they already knew their MAC address.
        You mean more or less exactly what IPv6 does (in many cases)?
  - - Re: (Score:3)
      
      by LDAPMAN ( 930041 ) writes:
      
      It still is: https://dl.netiq.com/Download?... [netiq.com]
      I build multi-million user IAM systems on it for a living.
- Re:Linux File Systems (Score:4, Interesting)
  
  by Christian Smith ( 3497 ) writes: on Monday August 24, 2015 @09:32AM (#50379453) Homepage
  
  I find some of the current file systems to be adequately reliable. Even their performance is acceptable. But, the Linux systems are lacking.
  Is there a reliable Linux file system such as EXT4 that has an easy to use copy on write(CoW) feature to allow instant recovery of any file changed at any time?
  NILFS2 [kernel.org] provides continuous point in time snapshots, which can be selectively mounted and data recovered. Not quite as instant recovery as your use case examples, but it's only a few commands/wrapper scripts away.
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by allo ( 1728082 ) writes:
  
  nilfs2 does.
- Re: (Score:2)
  
  by vovin ( 12759 ) writes:
  
  You are looking for NILFS2.
  Of course you use more disk because you nothing gets deleted...
- Re: (Score:2)
  
  by mattack2 ( 1165421 ) writes:
  
  Novell Netware FS did all this and more in 1995 FFS! Fifteen years later ...
  I think your Novell system needs a new clock battery or something...
- - Re: (Score:3)
    
    by Bengie ( 1121981 ) writes:
    
    A true versioning file system would crumble under an IO workload with many small updates. The problem with btrfs is you can't snapshot two subvolumes in sync with eachother. Snapshots are per subvolume. In ZFS, snapshots are at the pool level, allowing volumes to be in perfect sync.
    - - Re: (Score:2)
        
        by Bengie ( 1121981 ) writes:
        
        I think you mean "volume" instead of "filesystem". All snapshots are relative to the pool. You can create a snapshot of several volumes at once in perfect sync because the snapshot is actually at the pool level, but only attached to the relative volume that you're looking at. When you have mounted volumes inside of volumes, not only will the data of the current volume be part of the snapshot, but all of the data in the mounted volumes.
        
        BTRFS stores snapshots in the volume, ZFS stores them in the pool. If y
        
        Re: (Score:2)
        
        by Bengie ( 1121981 ) writes:
        
        I already said that. The snapshot is "attached" to the dataset, but the snapshot is scoped to the pool.
        
        Re: Linux File Systems (Score:2)
        
        by bill_mcgonigle ( 4333 ) * writes:
        
        what is the command to snapshot two zvols atomically?
        
        Re: (Score:2)
        
        by Bengie ( 1121981 ) writes:
        
        If you use a ZFS mount to mount a volume under a volume, and you snapshot the parent volume, you get the child volume snapshotted atomically. All datasets in ZFS share the same uberblock, so they can all be in perfect sync.
It's a matter of distance (Score:3)

by aglider ( 2435074 ) writes: on Monday August 24, 2015 @09:06AM (#50379247) Homepage

Not a matter of proof. The distance between perfect design and buggy implementation. IMHO.

Share
twitter facebook
Journaled File System? (Score:2)

by marciot ( 598356 ) writes:

i thought Journaled file systems already possessed this feature.
- Re: (Score:2)
  
  by FranTaylor ( 164577 ) writes:
  
  i thought Journaled file systems already possessed this feature.
  just like air bags and seat belts have eliminated all deaths on the road
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
Is it really THAT hard? (Score:5, Interesting)

by ledow ( 319597 ) writes: on Monday August 24, 2015 @09:13AM (#50379299) Homepage

Write zero to a flag.
Write data to temporary area.
Calculate checksum and keep with temporary area.
When write is complete, signal application.
Copy data from temporary area when convenient.
Check checksum from temporary to permanent is the same.
Mark flag when finished.
If you crash before you write the zero, you don't have anything to write anyway.
If you crash mid-write, you've not signalled the application that you've done anything anyway. And you can checksum to see if you crashed JUST BEFORE the end, or half-way through.
If you crash mid-copy, your next restart should spot the temporary area being full with a zero-flag (meaning you haven't properly written it yet). Resume from the copy stage. Checksum will double-check this for you.
If you crash post-copy, pre-flagging, you end up doing the copy twice, big deal.
If you crash post-flagging, your filesystem is consistent.
I'm sure that things like error-handling are much more complex (what if you have space for the initial copy but not the full copy? What if the device goes read-only mid-way through?) but in terms of consistency is it really all that hard?
The problem is that somewhere, somehow, applications are waiting for you to confirm the write, and you can either delay (which slows everything down), or lie (which breaks consistency). Past that, it doesn't really matter. And if you get cut-off before you can confirm the write, data will be lost EVEN ON A PERFECT FILESYSTEM. You might be filesystem-consistent, but it won't reflect everything that was written.
Journalling doesn't need to be mathematically-proven, just logically thought through. But fast journalling filesystems are damn hard, as these guys have found out.

Share
twitter facebook
- Re:Is it really THAT hard? (Score:5, Insightful)
  
  by Guspaz ( 556486 ) writes: on Monday August 24, 2015 @10:37AM (#50379929)
  
  > When write is complete, signal application.
  How do you know the write was complete? Most storage hardware lies about completing the write. The ZFS folks found this out the hard way: their filesystem was supposed to survive arbitrary power failures, and on a limited set of hardware that was true. In reality, most drives/controllers say they've committed the write to disk when it's still in their cache.
  Any filesystem that claims to survive crashes needs to take into account that any write confirmation could be a lie, and that any data it has written in the past may still be in a volatile cache.
  
  Parent Share
  twitter facebook
  - Re: (Score:3)
    
    by fnj ( 64210 ) writes:
    
    You can't expect software to paper over BUSTED HARDWARE. If a disk drive flat out lies about status, expose the goddam manufacturer and sue him out of existence. If you think anyone can paper over the scenario you just outlined, then what about this - what about a disk drive that NEVER WRITES ANYTHING but lies and says everything is going hunky dory? Pretty damn sure there's nothing you can do in that scenario.
    I've heard that story about the "drives that lie about about write-to-physical-media-complete" man
  - - Re: (Score:2)
      
      by Guspaz ( 556486 ) writes:
      
      Then the MIT paper ignores reality, and is therefore useless.
      You can prove your system is technically correct all you want, but if it doesn't work in the real world (where most hardware lies about write commits), then it's going to work in theory and fail in practice.
- If it was easy wouldn't it already be done? (Score:3)
  
  by sjbe ( 173966 ) writes:
  
  Why am I hearing Jeremy Clarkson asking "how hard can it be?" just before utterly screwing something up?
  Perhaps there is just a tad more difficulty to it than you are considering?
  - - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - He is smart but plays dumb (Score:2)
        
        by sjbe ( 173966 ) writes:
        
        I can never tell if Clarkson is a smart person pretending to be stupid, or a stupid person pretending to be smart.
        I've heard from numerous movie directors that people who play dumb characters actually have to be quite smart. Clarkson isn't stupid (generally) and remember that Top Gear is/was a scripted show and people watch it primarily because it is funny. I think I remember Conan OBrian saying that a lot of comedy is throwing out your dignity and hoping you'll get it back.
- Re: (Score:2)
  
  by ShooterNeo ( 555040 ) writes:
  
  I think you're basically right. I read about an even simpler system. They wanted to prove if a robotic surgical arm, controlled by a multi-axis joystick, would always no matter what move only when the surgeon commanded it.
  So basically, you read the joystick position sensor for an axis. You multiply by a coefficient, usually less than 1. You send that number to the arm controller. Arm controller tries to move to that position.
  Smooth, linear, no discontinuities...you technically only need to check about
- - Re: (Score:2)
    
    by ledow ( 319597 ) writes:
    
    What on earth makes you think that any algorithm, proof or technique can account for hardware failure of any kind? That's what RAID, etc. are for and are still far from a guarantee.
    Plus, kind of the point of a checksum is to ensure the integrity (to a certain probability) of data. If either the checksum or data change, they will no longer match up - short of a billions-to-one random chance that you can't do anything about anyway. Incorporate the flag into the data that you checksum and that's covered.
    You
Not exactly new or news... (Score:3)

by bradgoodman ( 964302 ) writes: on Monday August 24, 2015 @09:15AM (#50379321) Homepage

No specifics. This has been done a million times with journaling filesystem (and block layers) - no idea why this is better or different. But what about disk failure? But what about data loss? But what about (undetected) data corruption (at the disk)? What about unexpected power hits that could drop a disk or tear a write? Not even going to get into snapshotting, disaster recovery, etc. There's a lot more to this than "surviving a crash".

Share
twitter facebook
- Re: (Score:2)
  
  by FranTaylor ( 164577 ) writes:
  
  But what about disk failure?
  
  this is like expecting your seat belt to keep you safe during the apocalypse
  - Re: (Score:2)
    
    by bradgoodman ( 964302 ) writes:
    
    No. It's not. You think [your favorite bank] puts all their financial data on a plain 'old off-the-shelf [Insert brand here] and assumes that it'll all be good? They use multi-million dollar systems which do mirroring, integrity checking, verification, etc. "High-end" storage and filesystems systems do things like verification and checking at multiple levels (end-to-end, drive, block, filesystem, array, etc) so a $100 disk drive doesn't corrupt data and take down a $100 billion dollar bank. As for the apoca
- Re: (Score:2)
  
  by Bengie ( 1121981 ) writes:
  
  You need to read in-between the lines.
  
  "MIT's New File System Won't Lose Data During Crashes" can be read as "MIT's New File System Won't be at fault for lost data once committed during any interruption of writes"
  
  ZFS does the same thing, minus the proofs. If you do a sync write and ZFS says it completed, then that data is not going to be lost due to any fault of ZFS. But what if someone threw all of your harddrives into lava? Again, not the fault of ZFS. Same idea.
  
  Rule of thumb, if your FS needs FSCK,
- - Re: (Score:2)
    
    by FranTaylor ( 164577 ) writes:
    
    I'm wondering if their proof requires the disk to actually write the data in the same order as the filesystem sends write commands to the disk. This is not necessarily true for disks that support command queuing, and certainly not true for SSDs.
    RTFA:
    "You literally have to consider every instruction or every disk operation"
- - Re: (Score:2)
    
    by bradgoodman ( 964302 ) writes:
    
    That's not the point. Having a machine verifiable proof of a correct filesystem which would protect itself in a "crash", is still subject to problems beneath it, or problems other than "crashes". There are other filesystems that can do this (without the "machine verifiable proof" - and they are vulnerable in real-world scenarios- not because they "lack the proof" - but because there are a zillion other weak links in the chain).
Tough environments (Score:2)

by unencode200x ( 914144 ) writes:

I've seen RAID groups fail sort of violently (granted in some tough environments) where one disk crashed and so did the others next two it. Three out of five disks in a RAID 5 gone. Only option was backup. How would any filesystem survive that?
- Re: (Score:2)
  
  by FranTaylor ( 164577 ) writes:
  
  the stars will implode and the universe will come to an end, how would any filesystem survive that?
  - Re: (Score:2)
    
    by unencode200x ( 914144 ) writes:
    
    Exactly! Seriously, I've seen this. Being in IT for almost 20 years you see edge cases, dumb things, and just plain bad luck.
- Re: (Score:2)
  
  by TheDarkMaster ( 1292526 ) writes:
  
  I've seen RAID groups fail sort of violently (granted in some tough environments)
  
  Missile hit? :-D
  - Re: (Score:2)
    
    by Guspaz ( 556486 ) writes:
    
    You joke, but I've seen IOPS drop in a RAID array because somebody was talking loudly next to the server. It was kind of fun to shout at the server and watch the disk activity drop. For testing purposes, of course.
    - - Re: (Score:2)
        
        by Guspaz ( 556486 ) writes:
        
        Pretty much that. Ours were RAID arrays on Linux, and we induced it without being quite as close as he was, but pretty much the same deal. This was more than ten years ago, so maybe modern disks would handle it better, but it looks like a few years later that guy in your video was still seeing the same behaviour.
  - Re: (Score:2)
    
    by unencode200x ( 914144 ) writes:
    
    Nothing quite that interesting. One was in a school. Their server room was near the gym. The other was in a factory. The server room was in the middle of the shop.
    - Re: (Score:2)
      
      by david_thornley ( 598059 ) writes:
      
      My wife once worked on a system that was in the next room from the skate sharpener. She'd never seen flames coming out of a hard disk before.
      - Re: (Score:2)
        
        by unencode200x ( 914144 ) writes:
        
        Wow, that would be scary.
        
        I remember one time about 10 years ago we got a handful of new HP servers in and were going through the burn-in process. Quite literally, apparently, as one of them had a RAID controller who's capacitors exploded quite violently setting off fire alarms and making us run for fire extinguishers when when we fired it up. (pun intended..).
- Re: (Score:2)
  
  by fuzzyfuzzyfungus ( 1223518 ) writes:
  
  It sounds like a quip; but it's the truth: more redundancy. Nothing is going to allow a system(whether it is all handled by the filesystem or the work is divided between a RAID controller and a filesystem) to recover a chunk of data that the universe violently removes except another copy of that data or something from which it can be deterministically inferred(even if you accept the 'random number generator and a lot of patience' mode of data recovery, you still need to know when you've recovered the correc
- Re:Tough environments (Score:4, Interesting)
  
  by mlts ( 1038732 ) writes: on Monday August 24, 2015 @11:35AM (#50380397)
  
  I personally encountered a drive array driver cause an entire array to get overwritten by garbage. I was quite glad that I had tape backups of the computers and the shared array, so a recovery was fairly easy (especially with IBM sysback.)
  Filesystems are one piece of a puzzle, but an important one. If that array decided to just write some garbage in a few sectors, almost no filesystem would notice that, allowing propagating of corrupted data to backups. The only two that might notice it would be a background ZFS task doing a scrub and noticing a 64 bit checksum is off, or ReFS doing something similar. Without RAID-Z2, the damage can't be repaired... but it can be found and relevant people notified.
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by Bengie ( 1121981 ) writes:
  
  Their term of "Crash" is different than yours. You assume "crash" means the Universe self-destructed. They just assume the writes were interrupted, like power failure or your kernel locked up, not your harddrives dying.
- Re: (Score:2)
  
  by Zak3056 ( 69287 ) writes:
  
  I've seen RAID groups fail sort of violently (granted in some tough environments) where one disk crashed and so did the others next two it. Three out of five disks in a RAID 5 gone. Only option was backup. How would any filesystem survive that?
  It is not the responsibility of the file system to maintain data integrity in the face of catastrophic failure of the underlying storage hardware.
  - Re: (Score:2)
    
    by unencode200x ( 914144 ) writes:
    
    Agreed. The article does talk about catastrophic hardware failures too. Just thought it seemed a bit misleading and scant on details is all.
Two General's Problem (Score:1)

by Anonymous Coward writes:

I think any file system could be imagined as a simple case of a database system. You "commit" a file change and you must be sure that the change is written to disk before proceeding.
So, any database system has a well know logical limitation named "Two General's Problem"
https://en.wikipedia.org/wiki/Two_Generals%27_Problem
The implication of this is that in a database system you cannot guarantee a fully automated recovery; always there is a remaining possibility that some changes should be roll
- Re: (Score:2)
  
  by Guspaz ( 556486 ) writes:
  
  You can never be sure that a write is committed to disk, because most hardware lies about that.
  - Re: (Score:2)
    
    by fnj ( 64210 ) writes:
    
    You can never be sure that a write is committed to disk, because most hardware lies about that.
    I suppose you can find authoritative references for that claim, complete with manufacturer names and drive model numbers?
Code not available, will it ever be? (Score:3)

by badger.foo ( 447981 ) writes: <peter@bsdly.net> on Monday August 24, 2015 @09:31AM (#50379437) Homepage

It's now August, the conference where they'll be presenting their work is in October, and the article is a tad short on specifics. They've done a formally verified formal verification of a filesystem. if it works, that's excellent news of course, but I'd wait until we have seen the thing work and with actual code to examine before making any comments or bets on how useful this is going to be. And this being an open source-oriented site, we should be asking whether the code will indeed be available under any kind of usable open source license.

Share
twitter facebook
MIT researchers live in ivory towers (Score:5, Insightful)

by alexhs ( 877055 ) writes: on Monday August 24, 2015 @09:42AM (#50379527) Homepage Journal

Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
If the hard drive firmware is not proven, this FS won't be any better than ZFS and others.
Writing safe file systems is the easy part (even trivial using synchronous writes, when you consider their design is "slow").
The impossible part is dealing with firmwares that are known to lie (including pretending than a write is synchronous): how could you not lose data if the drive didn't ever write the data to the platters in the first place ?

Share
twitter facebook
- Re: (Score:2)
  
  by KingMotley ( 944240 ) writes:
  
  Not impossible. Tell it to write to disk, wait for it to say it has. Then cut power to the drive, wait 30 seconds, reestablish power, then ask for the data back. If it isn't the same, repeat until it is. It'll be slow, and likely kill your drives, but you can be reasonably sure the drive did indeed write the data.
This is standard for mission-critical S/W (Score:2)

by sirwired ( 27582 ) writes:

The idea of evaluating each step of the program for "what happens if this fails" is standard software engineering technique for mission-critical software. (That't not to say it's always actually done, just that it is the standard.) This method is hardly revolutionary (or even evolutionary.)
- Re: (Score:2)
  
  by FranTaylor ( 164577 ) writes:
  
  slashdot has truly degenerated into a cesspool when you consider that this drivel is one of the more insightful posts
- - Re: (Score:2)
    
    by sirwired ( 27582 ) writes:
    
    The standard doesn't have a name; it's not some useless crap in an ISO catalog, it's just a way truly mission-critical software is built.
    I know that NASA has used it for the flight-control computers for the manned space-flight program, and IBM uses it for some of the more vital parts of their mainframe operating systems.
    I didn't say it was applicable to everything, just that as a software-engineering technique, it's not at all new.
What's their point? (Score:2)

by allo ( 1728082 ) writes:

Any journalling FS should provide a consistent state, checksumming FS like BTRFS or ZFS even provable.
So they won't lose data, which is written and in a consistent state, and unwritten data cannot be saved by definition.
Yeah, right. (Score:5, Interesting)

by dargaud ( 518470 ) writes: <slashdot2 AT gdargaud DOT net> on Monday August 24, 2015 @10:58AM (#50380083) Homepage

In the words of Knuth the Great: "Beware of bugs in the above code; I have only proved it correct, not tried it."
It reminds me of a story from the late 80s (?) at a tech conference. The makers of a real-time OS with real-time snapshots would periodically pull the plug on their systems, plug it back in and it would resume exactly what it was doing, to the delight and amazement of all the techies in the assistance. In the much larger and much more expensive booth in front of them was a richer vendor. The techies started coaxing them to do the same. After much hand wringing, they did, and after a very long rebuild time the system came back as a mess. Conclusion: the 1st vendor went out of business, the 2nd one is still very big.

Share
twitter facebook
- Key Logic's KeyKOS (Score:2)
  
  by Sits ( 117492 ) writes:
  
  I've just spent the past few minutes trying to (re)find this story. Is it the KeyKOS recovery speed [eros-os.org] story from 1990?
  - Re: (Score:2)
    
    by dargaud ( 518470 ) writes:
    
    Yes, that's the story, thank you.
- - Re: (Score:2)
    
    by LWATCDR ( 28044 ) writes:
    
    That is often true.
    x86 vs the 68k, MS-DOS vs Amiga, MS-DOS vs GEMTOS. and so on.
    marketing often beats better tech.
Formal proofs of software are useless (Score:2)

by jgotts ( 2785 ) writes:

Hi, MIT guys, formal proofs of filesystems are useless because you cannot incorporate physical systems into formal proofs. Real filesystems exist on real hardware.
I guarantee that your file system will fail if I start ripping cables out. A suitably strong EMP will take it out. In fact, I bet I could nuke your filesystem if I used my ham radio transceiver too close to the device. Other things that would destroy your filesystem include floods, earthquakes, and a lightning strike.
I began writing this by statin
Hardware Reliability (Score:2)

by herbierobinson ( 183222 ) writes:

As others have pointed out, the formal verification does make the software provably reliable, but does nothing to protect against hardware issues. Just as a datapoint, the Stratus VOS operating system has been checksumming at the block driver level since the OS was written in 1980. It has detected failures on every generation of hardware it has been used with since. Some of the failures we have seen: Undetected transient media errors (the error correction/checking isn't perfect); Flaky I/O busses; bugs
- Re: (Score:2)
  
  by wonkey_monkey ( 2592601 ) writes:
  
  I don't believe this one bit.
  That's why we have checksums.
  - Re: (Score:1)
    
    by FranTaylor ( 164577 ) writes:
    
    That's why we have checksums.
    checksums will only detect a single bit error, you've got to do better than that
    - Re: (Score:2)
      
      by wonkey_monkey ( 2592601 ) writes:
      
      checksums will only detect a single bit error
      Nope.
    - Re: Yeah rigth (Score:2)
      
      by BronsCon ( 927697 ) writes:
      
      Some checkums can only be guaranteed to catch a single bit error but, in practice, it is relatively rare that multiple-bit errors "line up" just right to create a correct checksum. There are better checksums available, as well, which will catch multiple-bit errors, as well as identify which segment of the data contains the error. Someone more familiar with the topic will have to name these, as I haven't studied them in years.
  - Re: (Score:2)
    
    by WaffleMonster ( 969671 ) writes:
    
    That's why we have checksums.
    There is no guarantee checksums will detect failure there is only a probability.
    - Re: (Score:2)
      
      by wonkey_monkey ( 2592601 ) writes:
      
      There is no guarantee that a Slashdot reader will detect a joke; there is only a probability.
      Did anyone get it?
- Re: (Score:2)
  
  by Nerrd ( 1094283 ) writes:
  
  rtfa - this is clearly a discussion about software and/or power failure, not hardware.
  - Re:Define "crash". (Score:4, Insightful)
    
    by ChumpusRex2003 ( 726306 ) writes: on Monday August 24, 2015 @03:47PM (#50382881)
    
    However, considering how many firmware bugs there are in modern SSDs and HDs which cause out-of-order writes, shorn writes, and various other random corruptions and bad behaviour during a power failure or system reset, what would be more interesting is how the file system could cope with one of those.
    
    Parent Share
    twitter facebook
    - Re: (Score:3)
      
      by gweihir ( 88907 ) writes:
      
      As the Linux FS developers found some time ago, many modern disks also blatantly lie about having flushed data to disk. With software, you can then only do heuristic things to reduce the impact, not more. This "magic" MIT FS cannot do any better, and hence is entirely superfluous.
      - Comment removed (Score:4, Insightful)
        
        by account_deleted ( 4530225 ) writes: on Monday August 24, 2015 @10:57PM (#50385351)
        
        Comment removed based on user account deletion
        
        Parent Share
        twitter facebook
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        I completely agree.
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- Re: (Score:2)
  
  by arglebargle_xiv ( 2212710 ) writes:
  
  Beware of bugs in the above code; I have only proved it correct, not tried it.
  
  -- Donald Knuth.
  This new filesystem sounds like another case of this...

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

But is it useful? (Score:5, Interesting)

Re: (Score:2)

Re:But is it useful? (Score:5, Funny)

Re: (Score:2)

Real-time (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Well it won't lose data... (Score:2)

Re: (Score:1, Interesting)

Re: (Score:2)

Proving Coq itself (Score:1)

Re:Proving Coq itself (Score:5, Interesting)

Re: (Score:2)

Formally verified system fails (Score:3, Funny)

Re: (Score:2)

Linux File Systems (Score:5, Interesting)

Re: (Score:2, Informative)

Re:Linux File Systems (Score:5, Interesting)

chmod 751 some_directory (Score:5, Informative)

Re: (Score:2)

Re:Linux File Systems (Score:4, Interesting)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re:Linux File Systems (Score:4, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: Linux File Systems (Score:2)

Re: (Score:2)

It's a matter of distance (Score:3)

Journaled File System? (Score:2)

Re: (Score:2)

Re: (Score:2)

Is it really THAT hard? (Score:5, Interesting)

Re:Is it really THAT hard? (Score:5, Insightful)

Re: (Score:3)

Re: (Score:2)

If it was easy wouldn't it already be done? (Score:3)

Re: (Score:2)

He is smart but plays dumb (Score:2)

Re: (Score:2)

Re: (Score:2)

Not exactly new or news... (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Tough environments (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:Tough environments (Score:4, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Two General's Problem (Score:1)

Re: (Score:2)

Re: (Score:2)

Code not available, will it ever be? (Score:3)

MIT researchers live in ivory towers (Score:5, Insightful)

Re: (Score:2)

This is standard for mission-critical S/W (Score:2)

Re: (Score:2)

Re: (Score:2)