Unix Operating Systems Software Technology

Large File Problems in Modern Unices 290

david-currie writes "Freshmeat is running an article that talks about the problems with the support for large files under some operating systems, and possible ways of dealing with these problems. It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."
This discussion has been archived. No new comments can be posted.

  • by xintegerx ( 557455 ) on Sunday January 26, 2003 @10:59AM (#5161563) Homepage
    Question answered, move along, nothing to see here :)
  • Re:Why large files (Score:3, Informative)

    by voodoopriestess ( 569912 ) on Sunday January 26, 2003 @11:01AM (#5161570) Homepage
    Databases, Movie files, Backup files (think dumps to tapes). Animations, 3D modelling.... Lots of things need a > 2GB file size. Iain
  • Re:Why large files (Score:4, Informative)

    by hbackert ( 45117 ) on Sunday January 26, 2003 @11:02AM (#5161586) Homepage

    vmware uses files as virtual disks. 2GB would be a really, really small disk. UML does the same, using the loop device feature of Linux. Again, a filesystem in a file. Again, 2GB is not much. Simulating 20GB would need 10 files.

    Feels like 64kbyte segments somehow...and I really don't want to have those back.

  • by wowbagger ( 69688 ) on Sunday January 26, 2003 @11:11AM (#5161629) Homepage Journal
    We are seeing problems with off_t growing from 32 to 64 bits. We are also going to see this when we start going to a 64 bit time_t, as well (albeit not as badly - off_t is probably used more than time_t is.)

    However, the pain is coming - remember we have only about 35 years before a 64 bit time_t is a MUST.

    I'd like to see the major distro vendors just "suck it up" and say "off_t and time_t are 64 bits. Get over it."

    Sure, it will cause a great deal of disruption. So did the move from aout to elf, the move from libc to glibc, etc.

    Let's just get it over with.
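The "about 35 years" figure can be checked directly. What follows is a minimal sketch, assuming a platform whose gmtime() accepts this value; the function names are hypothetical and chosen only for illustration. It converts the largest value a signed 32-bit time_t can hold into a calendar year, and tests whether a given second count still fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: year (UTC) of the last second that a signed
   32-bit time_t can represent. */
int last_year_of_32bit_time_t(void)
{
    time_t last = (time_t)INT32_MAX;   /* 2^31 - 1 seconds after 1970-01-01 */
    struct tm *utc = gmtime(&last);
    assert(utc != NULL);
    return utc->tm_year + 1900;        /* tm_year counts from 1900 */
}

/* Hypothetical helper: nonzero if a second count fits in 32 signed bits. */
int fits_in_32_bits(int64_t seconds)
{
    return seconds >= INT32_MIN && seconds <= INT32_MAX;
}
```

One second past that last value, fits_in_32_bits() returns 0, which is the point at which a 32-bit time_t wraps.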
  • Re:Unices? (Score:3, Informative)

    by moonbender ( 547943 ) <moonbender AT gmail DOT com> on Sunday January 26, 2003 @11:13AM (#5161640)
    Yes. Just like "matrices" is the plural of "matrix". Not that the words have a similar etymology - according to dictionary.com [reference.com] it's, in the authors' words, "A weak pun on Multics".
  • Funny...in AIX... (Score:4, Informative)

    by cshuttle ( 613776 ) on Sunday January 26, 2003 @11:18AM (#5161665)
    We don't have this problem: 4 petabyte maximum file size, 1 terabyte tested at present. http://www-1.ibm.com/servers/aix/os/51spec.html
  • Re:huh? (Score:2, Informative)

    by JanneM ( 7445 ) on Sunday January 26, 2003 @11:21AM (#5161685) Homepage
    Because the sentences mean different things.

    "It is an interesting problem that some distro-compilers have to face."

    talks about the problem facing distro compilers, whereas

    "It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."

    talks about the article addressing these problems. /Janne
  • by CrudPuppy ( 33870 ) on Sunday January 26, 2003 @11:42AM (#5161776) Homepage
    the datafile size averages 8GB in the warehouse.
  • by sqrlbait5 ( 67782 ) on Sunday January 26, 2003 @11:46AM (#5161810) Homepage
    Yeah, but if you're using NTFS, where there doesn't appear to be a max file size, you still get the 2GB limit on Outlook files. Every damn version of Outlook has had this 2GB limit, but OutlookXP doesn't actually fix the problem, just warns the user at 1.87GB. We have people hitting their limit all the time at work, but that's because they like to send artwork and whatnot and not clear out their folders.
  • by topologist ( 644470 ) on Sunday January 26, 2003 @11:58AM (#5161871)
    To enable LFS (Large File Support) in glibc (which not all filesystems support), you need to recompile your application with
    -D_FILE_OFFSET_BITS=64 and -D_LARGEFILE_SOURCE

    This maps all file access calls to their 64-bit variants, so off_t itself becomes 64 bits. Explicit types like off64_t are only needed if you use the transitional interface (-D_LARGEFILE64_SOURCE) instead. And I believe most large file support is really available only past glibc 2.2

    Likewise, O_LARGEFILE with open etc. is only needed with the transitional interface; with -D_FILE_OFFSET_BITS=64 it is set for you. So legacy applications that use glibc fs calls have to be recompiled to take advantage of this, and may need source level changes. Won't work on older kernels either.
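A minimal sketch of the -D_FILE_OFFSET_BITS=64 route, assuming glibc on Linux and a filesystem that supports sparse files. Defining the macro in the source before any include has the same effect as passing it on the compiler command line. The helper name and the temporary file path are hypothetical.

```c
#define _FILE_OFFSET_BITS 64      /* must precede every #include */
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() under strict -std modes */

#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a sparse temporary file with one byte
   written at `offset`, and return the size fstat() reports (-1 on error). */
off_t sparse_file_size(off_t offset)
{
    assert(sizeof(off_t) >= 8);   /* plain off_t is now 64 bits wide */

    char path[] = "/tmp/lfs_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* file disappears once fd is closed */

    off_t size = -1;
    struct stat st;
    if (lseek(fd, offset, SEEK_SET) == offset &&
        write(fd, "x", 1) == 1 &&
        fstat(fd, &st) == 0)
        size = st.st_size;

    close(fd);
    return size;
}
```

Calling sparse_file_size((off_t)3 << 30) seeks to 3 GiB, past the old 2 GB barrier, without any 64-suffixed calls appearing in the source.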

  • Yep... (Score:3, Informative)

    by Kjella ( 173770 ) on Sunday January 26, 2003 @12:06PM (#5161905) Homepage
    Some numbers for *uncompressed* video:

    NTSC/YUV2/stereo: ~111GB for a cinema movie (1hr 45min)
    PAL/YUV2/stereo: ~125GB for same

    HDTV/surround: ~908GB for same

    With huffyuv (very low CPU usage, lossless) you should be able to cut that by a factor of 2-3. But it's still *huge*

    Kjella
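The arithmetic behind figures like these can be sketched as follows. This is an illustration only: it assumes 2 bytes per pixel for the 16-bit format and the 720x576 at 25 fps PAL frame from the poster's later "PAL & NTSC" comment, and it ignores audio, so it will not reproduce the quoted totals exactly. The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: bytes of raw video for a given frame geometry,
   frame rate and running time (video only, no audio, no container). */
double uncompressed_video_bytes(int width, int height, int bytes_per_pixel,
                                double fps, double seconds)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return (double)width * height * bytes_per_pixel * fps * seconds;
}
```

For PAL at 1 hr 45 min (6300 seconds), uncompressed_video_bytes(720, 576, 2, 25.0, 6300.0) gives 130,636,800,000 bytes, roughly 122 GiB, which is in the same ballpark as the figure quoted above and well beyond a 2 GB file limit either way.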
  • Re:BeOS Filesystem (Score:5, Informative)

    by Yokaze ( 70883 ) on Sunday January 26, 2003 @12:06PM (#5161912)
    Mine is bigger than yours :)

    Linux XFS [sgi.com]: 9 exabytes

    Also supports extended attributes [bestbits.at].

  • by Anonymous Coward on Sunday January 26, 2003 @12:17PM (#5161964)
    Other filesystems don't either :

    http://www.sgi.com/software/xfs/techinfo.html

    "Max. File Size
    Designed to scale to 9 million TB with current hardware supporting scalability to 8000 TB on IRIX. Linux-64, 2 TB Max File Size. Solaris and Windows NT undergoing scalability testing"

    "Max. File System Size
    Designed to scale to 18 million TB with current hardware supporting scalability to 8000 TB on IRIX. Linux-64, 500 file systems of 2 TB each. Solaris and Windows NT undergoing scalability testing."

    Unfortunately, it's not just a problem with the filesystem, but also and most often a problem with the applications. So, AIX does have this problem just as much as any other. Unless you've tested all the applications available for AIX.
  • PAL & NTSC (Score:3, Informative)

    by Kjella ( 173770 ) on Sunday January 26, 2003 @12:30PM (#5162059) Homepage
    PAL: Max 720x576x25fps interlaced (50 Hz)
    NTSC: Max 640x480x29.97fps interlaced (60 Hz)

    No, they don't have the same frequency, nor the same number of scanlines. Some European TVs will take PAL-60, which is like PAL only at 60Hz, though. Also I don't think the color space works in the same way, but not sure about that one. That was why I used YUV2 (16bit) for both.

    Kjella
  • Re:Only 35 years... (Score:3, Informative)

    by Dan Ost ( 415913 ) on Sunday January 26, 2003 @01:26PM (#5162337)
    For most programs, it would require little more
    than to change the typedef that defines __time_t
    in bits/types.h.

    For stupidly written programs that assume the
    size of __time_t or that use __time_t in unions,
    each will need to be addressed individually to
    make sure things still work correctly.
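One common way programs end up assuming the size of time_t is by printing or storing it through a fixed-width type. A minimal sketch of the width-independent alternative, assuming a C99 library; the function name is hypothetical:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: format a time_t without assuming its width by
   widening to intmax_t and printing with %jd. Returns what snprintf
   returns (the number of characters the full result needs). */
int format_time_t(char *buf, size_t len, time_t t)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%jd", (intmax_t)t);
}
```

Code written this way keeps working if the underlying typedef changes from 32 to 64 bits, whereas a "%d" or a cast to int would silently truncate.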
  • The "l" in lseek() (Score:4, Informative)

    by edhall ( 10025 ) <slashdot@weirdnoise.com> on Sunday January 26, 2003 @03:26PM (#5163004) Homepage

    Once upon a time (prior to 1978) there was no lseek() call in Unix. The value for the offset was 16 bits. Larger seeks were handled by using a different value for "whence" (the third argument to seek()), which caused seeks to occur in 512-byte increments. This resulted in a maximum seek of 16,777,216 bytes, with an arbitrary seek() often requiring two calls, one to get to the right 512-byte block and a second to get to the right byte within the block. (Thank goodness they haven't done any such silliness to break the 2GB barrier.)

    When Research Edition 7 Unix came out, it introduced lseek() with a 32-bit offset. 2,147,483,648 bytes should be enough for anyone, hmmm? :-).

    -Ed
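The two-step seek described above comes down to splitting a byte position into a block count and a remainder. The sketch below shows only that arithmetic, not the historical seek() interface itself; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_SIZE = 512 };

/* Hypothetical pair of 16-bit arguments: one block-granular seek,
   then one byte-granular seek within that block. */
struct seek_pair {
    int16_t blocks;
    int16_t bytes;
};

struct seek_pair split_offset(int32_t pos)
{
    /* 2^15 blocks of 512 bytes gives the 16,777,216-byte ceiling. */
    assert(pos >= 0 && pos < (int32_t)32768 * BLOCK_SIZE);

    struct seek_pair p;
    p.blocks = (int16_t)(pos / BLOCK_SIZE);
    p.bytes  = (int16_t)(pos % BLOCK_SIZE);
    return p;
}
```

For example, byte position 1,000,000 splits into 1953 blocks plus 64 bytes.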
  • by Anonymous Coward on Sunday January 26, 2003 @05:05PM (#5163448)
    What happens when you need more than 2^64 bytes storage? Cheat with granularity? The same problem still exists and isn't solved. Your train of thought is the same which allowed 32-bits to be used in the first place. Recursive expansion would be the only real solution.
  • by Ben Hutchings ( 4651 ) on Sunday January 26, 2003 @09:29PM (#5164597) Homepage
    "No, the type of time_t - time_t must be signed. That doesn't imply that time_t must be signed. For example, (unsigned int) - (unsigned int) is int, not unsigned int."

    Wrong. The C99 standard says in section 6.3.1.8 paragraph 1:

    Many operators that expect operands of arithmetic type cause conversions and yield result types in a similar way. The purpose is to determine a common real type for the operands and result. For the specified operands, each operand is converted, without change of type domain, to a type whose corresponding real type is the common real type. Unless explicitly stated otherwise, the common real type is also the corresponding real type of the result, whose type domain is the type domain of the operands if they are the same, and complex otherwise.

    Here, the common real type is unsigned int, and the description of the addition and subtraction operators (section 6.5.6) does not specify a different type for the result when both operands have arithmetic type.

    If you disagree, please cite relevant parts of the standard to support your case.
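The claim can also be checked with a compiler. This is a small sketch, assuming a C11 compiler, since _Generic is newer than the C99 text quoted above; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if (unsigned int) - (unsigned int) has type unsigned int,
   0 if it has type int, -1 for anything else. */
int difference_is_unsigned(void)
{
    unsigned int a = 1, b = 2;
    return _Generic(a - b, unsigned int: 1, int: 0, default: -1);
}

/* 1u - 2u cannot go negative; it wraps modulo UINT_MAX + 1. */
unsigned int wrapped_difference(void)
{
    unsigned int a = 1, b = 2;
    unsigned int d = a - b;
    assert(d > 0);
    return d;
}
```

Here difference_is_unsigned() returns 1 and wrapped_difference() returns UINT_MAX, which matches the reading of 6.3.1.8 given above.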

  • by mauriceh ( 3721 ) <mhilarius@@@gmail...com> on Monday January 27, 2003 @08:38AM (#5166665)
    Roughly 50% of the servers we build at present have over 1TB of storage.
    Roughly 30% have over 2TB.

    With a 3Ware 7500-12 IDE RAID card and 11x200GB disks we hit 2.1TB.

    This costs about $6,000 in a server, so is a fairly popular option.

    Next month Maxtor ships their 300GB drives (MAYBE, Maxtor have been lying about their release schedules lately). Once that happens, it will be a very common problem.
