Large File Problems in Modern Unices

david-currie writes "Freshmeat is running an article about the problems with large-file support under some operating systems, and possible ways of dealing with them. It's an interesting look at some of the less obvious problems that distro builders have to face."
  • by CoolVibe ( 11466 ) on Sunday January 26, 2003 @10:59AM (#5161560) Journal
    The problem is nonexistent in the BSDs, which use the large-file (64-bit) versions anyway. And you have to use a certain -D flag if your OS (like Linux) doesn't use the 64-bit versions. Whoopdiedoo. Not so hard. Recompile and be happy.
  • by cheekyboy ( 598084 ) on Sunday January 26, 2003 @11:05AM (#5161596) Homepage Journal
    I said this to some Unix 'so-called experts' in '95, and they said, oh, why oh why do you need >2 gig?

    I can just laugh at them now...

  • by cyber_rigger ( 527103 ) on Sunday January 26, 2003 @11:10AM (#5161622) Homepage Journal
    --Bill Gates
  • by wowbagger ( 69688 ) on Sunday January 26, 2003 @11:11AM (#5161629) Homepage Journal
    We are seeing problems with off_t growing from 32 to 64 bits. We are also going to see this when we start moving to a 64-bit time_t (albeit not as badly - off_t is probably used more than time_t is).

    However, the pain is coming - remember, we have only about 35 years before a 64-bit time_t is a MUST.

    I'd like to see the major distro vendors just "suck it up" and say "off_t and time_t are 64 bits. Get over it."

    Sure, it will cause a great deal of disruption. So did the move from a.out to ELF, the move from libc to glibc, etc.

    Let's just get it over with.
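
The 2 GB figure that recurs throughout this discussion falls directly out of a signed 32-bit off_t, and the 64-bit version buys an enormous margin. A quick back-of-the-envelope check in Python (purely illustrative arithmetic, not any OS's actual types):

```python
# Largest offset representable in a signed 32-bit off_t:
max_off_32 = 2**31 - 1
print(max_off_32)              # 2147483647 bytes, i.e. just under 2 GiB

# A signed 64-bit off_t moves the ceiling out to just under 8 EiB:
max_off_64 = 2**63 - 1
print(max_off_64 // 2**60)     # 7 full exbibytes, plus change
```
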
    • And that big Y2K problem that was supposed to bring down mankind? How many years did it take to fix that? I very much doubt we started in 1965 ;)

      Prediction: the first distro to "suck it up" will be around 2035 or so. Personally, I think this is about as far down the priority list as you can get. Besides, with open source, is it really that problematic to grep the source for "time_t" and fix it? I don't think so.

      • Re:Only 35 years... (Score:3, Informative)

        by Dan Ost ( 415913 )
        For most programs, it would require little more than changing the typedef that defines __time_t in bits/types.h.

        For stupidly written programs that assume the size of __time_t, or that use __time_t in unions, each will need to be addressed individually to make sure things still work correctly.
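
A small Python sketch of the kind of breakage meant above: code that bakes a 32-bit width into a serialized record fails as soon as the timestamp no longer fits, while a 64-bit field is unaffected (the record format here is made up purely for illustration):

```python
import struct

t_after_2038 = 2**31 + 1  # a timestamp just past the signed 32-bit rollover

# A record format that hard-codes a 4-byte signed timestamp field breaks:
try:
    struct.pack("<i", t_after_2038)
    packed_ok = True
except struct.error:
    packed_ok = False
print(packed_ok)  # False: the value no longer fits in 32 bits

# The same value packs fine into a 64-bit field:
rec = struct.pack("<q", t_after_2038)
print(len(rec))   # 8
```
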
      • The FreeBSD folks have already done a considerable amount of work on this, even to the point of making time_t 64 bits for both kernel and userland and testing for issues. Enough is known that the main worry now is how to handle the change in ports, some of which need a fair amount of work to move away from 32-bit time_t. But at the rate things are going, I'd expect that they will make the transition to 64-bit time_t for FreeBSD 6.0. I've no idea how they will handle the legacy issues (ports and pre-6.0 binaries) though.

    • First of all, it's a Y2038 problem rather than a Y2106 problem because time_t is signed in many places. Simply switching to an unsigned time_t (who uses time_t to represent pre-1970 values?) will buy us an extra 68 years with minimal application grief, but the underlying problem will still be there.

      It boggles my mind that Sun, for example, went to the trouble of building a whole host of interfaces and a porting process for 64-bit file offsets (see the lf64 and lfcompile64 manpages on Solaris) and yet they didn't bother to increase the size of time_t at the same time. If everyone is going to be recompiling their apps anyway, why not fix it all in one go?

      On the application side, it should be noted that this isn't a problem for code written in Java, whose equivalent of time_t is already 64-bit (in milliseconds, granted, but that only eats about 10 of the extra 32 bits.) Obviously the Java VM won't be able to make up for the underlying OS not supporting large time values, but at least the applications won't have to change.

      First one to start whining about Java's year-584544016 problem gets whacked with a wet noodle.
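
The two rollover dates at issue can be verified with plain arithmetic; a short Python sketch (no OS time_t involved, just seconds since the epoch):

```python
from datetime import datetime, timedelta, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Signed 32-bit time_t overflows at 2**31 seconds: the Y2038 problem.
signed_rollover = epoch + timedelta(seconds=2**31)
print(signed_rollover)    # 2038-01-19 03:14:08+00:00

# Unsigned 32-bit time_t wraps at 2**32 seconds: the Y2106 problem.
unsigned_rollover = epoch + timedelta(seconds=2**32)
print(unsigned_rollover)  # 2106-02-07 06:28:16+00:00

# The difference is the ~68 extra years mentioned above.
print((unsigned_rollover - signed_rollover).days // 365)  # 68
```
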

  • by pariahdecss ( 534450 ) on Sunday January 26, 2003 @11:12AM (#5161633)
    So my wife says to me, "Honey, do I look fat in this filesystem ?"
    I replied, "Sweetie, I married you for your trust fund not your cluster size."
  • AIX... (Score:4, Informative)

    by cshuttle ( 613776 ) on Sunday January 26, 2003 @11:18AM (#5161665)
    We don't have this problem: 4-petabyte maximum file size, 1 terabyte tested at present.
    • by n3m6 ( 101260 )
      Whenever something like this comes up, somebody just has to say "we don't have a problem, we use X."

      That's just so lame. We have XFS and JFS. You can keep your AIX and your expensive hardware.

  • by alen ( 225700 ) on Sunday January 26, 2003 @11:19AM (#5161672)
    On the Windows side many people like to save every message they send or receive to cover their ass just in case. This is very popular among US Government employees. Some people who get a lot of email can have their personal folders file grow to 2GB in a year or less. At this level MS recommends breaking it up since corruption can occur.
  • by Anonymous Coward on Sunday January 26, 2003 @11:22AM (#5161690)
    It has a nice small 1 GB filesystem limit. I have partitioned my hard disk into 64 little chunks and it runs very slowly, and unstably, but it's completely open source and I'm happy.
  • I just wonder why we don't learn from past limits and remove these limits "forever". E.g. a month ago I received a question about the possibility of building a 10 TB Linux cluster (physicists are crazy ;-)).

    There surely MUST be some way to do this - I just imagine some file (e.g. defined in the LSB) which would define these limits for the COMPLETE system (from the kernel, filesystems, and utils to network daemons). I know there are efforts toward things like this, but if we said (for example) that a distribution in 2004 won't be marked "LSB compatible" if ANY of its programs uses other limits, I think it would create enough pressure on Linux vendors.

    Just a crazy idea ;-)
    • there is no spoon and there is always a limit.

      the problem is where it's sticking. ;)
    • There is something innate in the education, learning, and daily working of a programmer that makes them not want to use 'too big' of a number for a certain task.

      it either

      A) Wastes Memory Space
      B) Wastes Code Space
      C) Wastes Pointer Space
      D) Or violates some other tenet the programmer believes

      So, when they go out and create a file structure, or something similar, they don't feel like exceeding some 'built-in' restriction in their way of thinking.

      And usually, at the time, it's such a big number that the programmer can't think of an application to exceed it.

      Then, one comes along and blows right through it.

      I've been amused by all the people jumping on the 'it don't need to be that big' bandwagon. I can think of many applications for which ext3 or whatever would need to support big files. They include:

      A) Database Servers
      B) Video Streaming Servers
      C) Video Editing Workstations
      D) Photo Editing Workstations
      E) Next Big Thing (tm) that hasn't come out yet.

      • There is something innate in the education, learning, and daily working of a programmer that makes them not want to use 'too big' of a number for a certain task.

        We have code for infinite precision integers. The problem is, if it were used for filesystem code, you still couldn't do real-time video or DVD burning, because the computer would be spending too long handling infinite precision integers.

        As long as you're careful with it, setting a "really huge" limit and fixing things when you actually reach it is usually good enough.
  • 1) Splitting up a big file turns an elegant solution into an inelegant nightmare.

    2) Instead of 10 different applications writing code to support splitting up an otherwise sound model, why not have the one operating system provide for dealing with large files?

    3) You are going to need the bigger files with all those 32-bit wchar_ts and 64-bit time_ts you've got!

  • I remember reading in the BeOS Bible that the BeOS filesystem could contain files as large as 18 petabytes. Makes you wonder two things: what's the biggest filesystem you could use with a BeOS machine, and why don't other OSes have filesystems like this? Especially with those awesome extended attributes. I weep for the loss of the BeOS filesystem...
  • To enable LFS (Large File Support) in glibc (which not all filesystems support), you need to recompile your application with the large-file feature-test macros, e.g. -D_FILE_OFFSET_BITS=64 (and/or -D_LARGEFILE64_SOURCE).

    The former forces all file access calls to their 64-bit variants; with the latter you'll explicitly need to use types like off64_t instead of off_t where needed. And I believe most large file support is really available only past glibc 2.2.

    Additionally you need to use O_LARGEFILE with open() etc. So legacy applications that use glibc fs calls have to be recompiled to take advantage of this, and may need source-level changes. Won't work on older kernels either.
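
The practical effect of large-file support is visible from any language whose runtime was built with it. A Python sketch that writes one byte just past the 2 GiB mark (the file is sparse, so almost no disk is actually used; this assumes a filesystem and libc with LFS enabled):

```python
import os
import tempfile

TWO_GIB = 2**31

fd, path = tempfile.mkstemp()
try:
    # Seek past the old signed-32-bit limit and write a single byte.
    # With only 32-bit file offsets, this lseek would fail (EOVERFLOW).
    os.lseek(fd, TWO_GIB, os.SEEK_SET)
    os.write(fd, b"\x00")
    size = os.fstat(fd).st_size
    print(size)  # 2147483649, one byte past 2 GiB
finally:
    os.close(fd)
    os.remove(path)
```
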

  • Error Prevention (Score:3, Interesting)

    by Veteran ( 203989 ) on Sunday January 26, 2003 @12:13PM (#5161941)
    One of the ways to keep errors from creeping into programs is to put limits on things so high that you can never reach them in the practical world.

    The 31-bit limit on time_t overflows in this century; 63 bits outlasts the probable life of the Universe, so it is unlikely to run into trouble.

    That is the best argument I know for a 64 bit file size; in the long run it is one less thing to worry about.
    • The next significant problem with time will come in the year 9999, when the four digit field that lazy programmers have used for thousands of years overflows. Didn't they learn their lessons the first time around?

      Digital took a bug report on this for Vax/VMS and promised a fix, some time in a future release.
    • Re:Error Prevention (Score:3, Interesting)

      by Thing 1 ( 178996 )
      One of the ways to keep errors from creeping into programs is to put limits on things so high that you can never reach them in the practical world.
      Anyone ever thought of a variable-bit filesystem?

      Start with 64-bit, but make it 63-bit. If the 64th bit is on, then there's another 64-bit value following which is prepended to the value (making it a 126-bit address -- again, reserve one bit for another 64-bit descriptor).

      Chances are it won't ever need the additional descriptors, since 64 bits is a lot, but it would solve the problem once and for all.
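
The scheme described above - 63 payload bits per word plus a continuation bit - is essentially a varint with 64-bit words. A hypothetical Python sketch (the word order and bit assignment here are my assumptions, not any real filesystem's format):

```python
PAYLOAD_BITS = 63
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1
CONT_BIT = 1 << PAYLOAD_BITS  # 64th bit: "another word follows"

def encode(n):
    """Split n into 64-bit words, least-significant 63 bits first."""
    words = []
    while True:
        w = n & PAYLOAD_MASK
        n >>= PAYLOAD_BITS
        if n:
            words.append(w | CONT_BIT)
        else:
            words.append(w)
            return words

def decode(words):
    """Reassemble the payload bits, ignoring the continuation flags."""
    n = 0
    for shift, w in enumerate(words):
        n |= (w & PAYLOAD_MASK) << (PAYLOAD_BITS * shift)
    return n

print(len(encode(2**40)))                # 1 word: fits in 63 bits
print(len(encode(2**100)))               # 2 words: descriptor extension used
print(decode(encode(2**100)) == 2**100)  # True
```
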

  • by haggar ( 72771 ) on Sunday January 26, 2003 @12:23PM (#5162003) Homepage Journal
    I had a problem with HP-UX apparently not wanting to transfer files larger than 2 GB via NFS (when the NFS server is on HP-UX 11.0). I had to back up a Solaris computer's hard disk using dd across NFS. This usually worked when the NFS server was Solaris. However, last Friday it failed, when the server was set up on HP-UX. I had to resort to my little Blade 100 as the NFS server, and I had no problems with it.

    I noticed that on the SAME DAY some folks asked questions about the 2 GB filesize limit in HP-UX on comp.sys.hp.hpux!! Apparently, HP-UX's default tar and cpio don't support files over 2 GB, either. Not even in HP-UX 11i. I never thought HP-UX stank this bad...

    How does Linux on x86 stack up? I decided not to use it for this backup, since I had my Blade 100, but would it have worked? Oh, btw, is there finally a command implemented on Linux like "share" (exists in Solaris) to share directories via NFS, or do I still need to edit /etc/exports and then restart the NFS daemon (or send SIGHUP)?
    • The command that is equivalent to 'share' is 'exportfs'; it can usually be found in /usr/sbin/.

      It allows you to push NFS exports to the kernel and nfsd without having to edit /etc/exports. Thus, they do not persist across reboots. However, you cannot use exportfs until nfsd is running, and nfsd will auto kill itself if /etc/exports is completely empty. So you must share at least 1 directory tree in /etc/exports before you can use exportfs.

      I believe Solaris has this same problem with share, though. I don't remember these days; it's been a while since my SCSA cert. (Heh, I guess that's what man pages are for :)
      • Thanks.
        And no, Solaris doesn't have this kind of problem. In Solaris, you have (a more general) /etc/dfs/* for sharing filesystems. Even if there is no fs shared in /etc/dfs/dfstab, nfsd and mountd will happily run. This autokill thing is really stupid.
      • Oh yeah, so how does Linux cope with > 2 GB files transferred via NFS TO a Linux server? So far, only Solaris seems to support our solution. I have not tried Linux because the test takes considerable time, and if large files can't be transferred via NFS, I'd better not even try.
  • ... 64-bit addressing before thinking this through. I couldn't see the significant advantage for more than a very tiny fraction of apps in being able to address more than a few gigabytes.

    Now I can't wait for OS X to have 64-bit support for the IBM 970 processors (I do realize that it will take several releases before default 64-bit operation is practical).

    When compared to clustered 32-bit filesystems, I would think that a "pure" 64-bit filesystem would have a number of very practical advantages.

    I could easily see the journalled filesystem becoming one of the first 64-bit subsystems in OS X, right after VM.
  • by mauriceh ( 3721 ) <`maurice' `at' `'> on Sunday January 26, 2003 @01:45PM (#5162431) Homepage
    A much bigger problem is that Linux filesystems have a capacity limit of 2TB.
    Many servers now have a physical capacity of over 2TB on a single filesystem storage device.
    Unfortunately this is still a very significant limitation.
    This problem is much more commonly encountered than file size limitations.
  • 18-exabyte file sizes, real journaling, live queries...
  • The "l" in lseek() (Score:4, Informative)

    by edhall ( 10025 ) <> on Sunday January 26, 2003 @03:26PM (#5163004) Homepage

    Once upon a time (prior to 1978) there was no lseek() call in Unix. The value for the offset was 16 bits. Larger seeks were handled by using a different value for "whence" (the third argument to seek()), which caused seeks to occur in 512-byte increments. This resulted in a maximum seek of 16,777,216 bytes, with an arbitrary seek() often requiring two calls: one to get to the right 512-byte block and a second to get to the right byte within the block. (Thank goodness they haven't done any such silliness to break the 2GB barrier.)

    When Seventh Edition (V7) Unix came out, it introduced lseek() with a 32-bit offset. 2,147,483,648 bytes should be enough for anyone, hmmm? :-).
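
The two-call dance described above is just arithmetic; a Python illustration (the 512-byte block size and 16 MB ceiling come from the comment, the function name is invented):

```python
BLOCK = 512
MAX_BLOCKS = 2**15  # 16-bit block-offset argument: up to 32768 blocks

def old_seek_args(offset):
    """Decompose a byte offset into (block_seek, byte_seek), as pre-V7
    seek() required: one call in 512-byte units, one in bytes."""
    block, byte = divmod(offset, BLOCK)
    return block, byte

block, byte = old_seek_args(10_000_000)
print(block, byte)           # 19531 128
print(block * BLOCK + byte)  # 10000000: the offset is recovered
print(MAX_BLOCKS * BLOCK)    # 16777216: the ceiling cited above
```
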