
Large File Problems in Modern Unices

david-currie writes "Freshmeat is running an article about the problems some operating systems have with support for large files, and possible ways of dealing with them. It's an interesting look at some of the less obvious problems that distro-compilers have to face."
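The "support for large files" at issue is mostly a compile-time affair. A minimal sketch, assuming a glibc-style libc on a 32-bit platform (these are the standard LFS feature macros, though the details vary by system):

    /* Minimal large-file-support (LFS) sketch. Defining the feature macros
     * before any include makes off_t 64 bits wide on a 32-bit build, so
     * offsets past 2^31 - 1 bytes become representable. */
    #define _FILE_OFFSET_BITS 64   /* widen off_t to 64 bits */
    #define _LARGEFILE_SOURCE      /* expose fseeko()/ftello() */

    #include <stdio.h>
    #include <sys/types.h>

    int main(void)
    {
        printf("sizeof(off_t) = %d bytes\n", (int)sizeof(off_t));

        FILE *f = fopen("big.dat", "wb");
        if (!f)
            return 1;

        /* Seek 3 GiB into the file, past the old signed 32-bit limit. */
        off_t three_gib = (off_t)3 * 1024 * 1024 * 1024;
        if (fseeko(f, three_gib, SEEK_SET) != 0) {
            perror("fseeko");
            fclose(f);
            return 1;
        }
        fputc('x', f);   /* creates a sparse ~3 GiB file on most filesystems */
        fclose(f);
        return 0;
    }

Without these macros (or on a libc without LFS), off_t stays 32-bit and programs typically fail with EFBIG or EOVERFLOW, or silently misbehave, once a file crosses 2 GB; chasing that through every package is the class of problem the article describes for distribution builders.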
  • Re:Why large files (Score:5, Insightful)

    by Big Mark ( 575945 ) on Sunday January 26, 2003 @11:01AM (#5161571)
    Video. Raw, uncompressed, high-quality video with a sound channel is fucking HUGE. Look how big DivX files are, and they're compressed many, many times over.

    And compressing video on the fly isn't feasible if you're going to be tweaking it, so that's why people use raw video.

    -Mark
  • Re:Why large files (Score:2, Insightful)

    by Ogion ( 541941 ) on Sunday January 26, 2003 @11:01AM (#5161572)
    Ever heard of something like movie-editing? You can get huge files really fast.
  • by CrudPuppy ( 33870 ) on Sunday January 26, 2003 @11:04AM (#5161591) Homepage
    my data warehouse at work is 600GB and grows at a rate of 4GB per day.

    the production database that drives the sites is like 100GB

    welcome to last week. 2GB is tiny.
  • by cheekyboy ( 598084 ) on Sunday January 26, 2003 @11:05AM (#5161596) Homepage Journal
    I said this to some Unix 'so-called experts' back in '95, and they said, "oh, why would you ever need more than 2 gig?"

    I can just laugh at them now...

  • by hector13 ( 628823 ) on Sunday January 26, 2003 @11:07AM (#5161608)
    my data warehouse at work is 600GB and grows at a rate of 4GB per day. the production database that drives the sites is like 100GB welcome to last week. 2GB is tiny.
    And you store this "production database" as one file? Didn't think so (or at least I hope you don't).

    I am not agreeing (or disagreeing) with the original post, but having a database > 2 GB has nothing to do with having a single file over 2 GB. A db != a file system (except for MySQL perhaps).

  • Re:Why large files (Score:3, Insightful)

    by Idaho ( 12907 ) on Sunday January 26, 2003 @11:08AM (#5161612)
    Can anyone give a good reason for needing files larger than 2gb?

    I can think of some:

    • A/V streaming/timeshifting
    • Backups of large filesystems (there are 320 GB hard disks now; I shouldn't have to create 160 separate .tgz files just to back one up, should I?)
    • Large databases. E.g. the Slashdot posts table is easily >2 GB, or so I'd guess. Should the DB cut it into two (or more) files just because the OS doesn't understand files >2 GB? I don't think so... (one way a program can check what the OS and filesystem actually allow is sketched below)

    And that's just without thinking twice...there are probably many more reasons why people would want files >2 GB.
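    On a POSIX system that implements the _PC_FILESIZEBITS query (glibc and most current Unices do), a program can ask the filesystem how many bits it may use for a file size, which is one way an application or database engine could decide whether it is forced to split its data into sub-2 GB chunks. A small sketch:

        /* Ask the filesystem how large a file it can represent.
         * Assumes pathconf() supports _PC_FILESIZEBITS on this system. */
        #include <stdio.h>
        #include <unistd.h>
        #include <errno.h>

        int main(int argc, char **argv)
        {
            const char *path = (argc > 1) ? argv[1] : ".";

            errno = 0;
            long bits = pathconf(path, _PC_FILESIZEBITS);
            if (bits == -1 && errno != 0) {
                perror("pathconf");
                return 1;
            }
            if (bits == -1)
                printf("%s: no limit reported\n", path);
            else
                printf("%s: file sizes up to %ld bits (>31 needed for files over 2 GB)\n",
                       path, bits);
            return 0;
        }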

  • by KDan ( 90353 ) on Sunday January 26, 2003 @11:17AM (#5161660) Homepage
    Two words:

    Video Editing

    Daniel
  • by alen ( 225700 ) on Sunday January 26, 2003 @11:19AM (#5161672)
    On the Windows side many people like to save every message they send or receive to cover their ass just in case. This is very popular among US Government employees. Some people who get a lot of email can have their personal folders file grow to 2GB in a year or less. At this level MS recommends breaking it up since corruption can occur.
  • by Anonymous Coward on Sunday January 26, 2003 @11:29AM (#5161713)
    Many large-scale computing projects easily generate hundreds of gigabytes and even terabytes of data. They are writing to RAID systems and even parallel file systems to improve their IO.

    Think beyond the little toy that you use. These projects are using Unix (Solaris, Linux, BSD and even MacOSX) on clusters of hundreds or thousands of nodes.
  • by Anonymous Coward on Sunday January 26, 2003 @11:30AM (#5161720)
    the use of large files tempts users to store all kinds of redundant, reducible, linear and irrelevant data wasting storage space and I/O time

    As opposed to a million 4k files that are each 1k of header?
  • by cvande ( 150483 ) <craig.vandeputteNO@SPAMgmail.com> on Sunday January 26, 2003 @11:30AM (#5161722)
    In a perfect world, everything is small and manageable. Unfortunately, some databases need tables BIGGER than 2 GB. Even splitting that table into multiple files still leaves you with files larger than 2 GB. Try adding more tables? OK. Now they've grown to over 2 GB, and the more tables there are, the more complicated everything gets. I still need to back these suckers up, and a backup vendor that I won't name can't help me because their software wasn't large-file ready for Linux. So let's get in the game and make large file support the default, so we don't need to worry about these problems in the future. Linux IS an enterprise solution..... (my $.02)
  • Re:Why large files (Score:2, Insightful)

    by benevold ( 589793 ) on Sunday January 26, 2003 @11:31AM (#5161723) Homepage Journal
    We use a Unidata database here for an ERP system. Each database is more than 2 GB apiece (more like 20 GB) of relatively small files, and when the directories are tarred for backup the archives are usually over 2 GB, which means that gzip won't compress them. Unless I'm missing something, I don't see an alternative to files larger than 2 GB in this case. Sure, at the personal-computing level the closest thing you probably see is ripping DVDs, but there are other things out there, and I realize this is tiny in comparison to some places.
  • Re:Why large files (Score:4, Insightful)

    by kasperd ( 592156 ) on Sunday January 26, 2003 @11:35AM (#5161742) Homepage Journal
    The seek times alone within these files must be huge

    Who modded that as Insightful? Sure, if you are using a filesystem designed for floppy disks, it might not work well with 2 GB files. In the old days, when the metadata had to fit in 5 KB, a linked list of disk blocks could be acceptable. But any modern filesystem uses tree structures, which make a seek faster than opening another file would be. Such a tree isn't complicated; even the Minix filesystem has one.

    If you are still using FAT... bad luck for you. AFAIK Microsoft was stupid enough to keep using linked lists in FAT32, which certainly did not improve the seek time.
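    To make the linked-list-versus-tree point concrete, here is a toy comparison (invented block numbers and sizes, not real filesystem code): a FAT-style chain has to be walked block by block to reach an offset, while an indexed or tree-mapped file resolves the same offset in a handful of steps.

        /* Toy model: cost of finding the disk block for a given file offset. */
        #include <stdint.h>
        #include <stdio.h>

        #define BLOCK_SIZE 4096u
        #define NBLOCKS    (1u << 20)          /* pretend 4 GiB worth of blocks */

        static uint32_t fat_next[NBLOCKS];     /* FAT: each block links to the next   */
        static uint32_t inode_map[NBLOCKS];    /* indexed/extent style: direct lookup */

        /* FAT-style lookup: walk the chain -- O(offset / block size) steps. */
        static uint32_t fat_lookup(uint32_t first_block, uint64_t offset)
        {
            uint32_t blk = first_block;
            for (uint64_t i = 0; i < offset / BLOCK_SIZE; i++)
                blk = fat_next[blk];
            return blk;
        }

        /* Indexed lookup: a real filesystem walks a shallow tree of indirect
         * blocks or extents, so the cost depends on tree depth, not offset. */
        static uint32_t indexed_lookup(uint64_t offset)
        {
            return inode_map[offset / BLOCK_SIZE];
        }

        int main(void)
        {
            for (uint32_t i = 0; i < NBLOCKS; i++) {
                fat_next[i]  = (i + 1) % NBLOCKS;
                inode_map[i] = i;
            }
            uint64_t off = (uint64_t)2 * 1024 * 1024 * 1024;   /* 2 GiB in */
            printf("FAT walk:   block %u\n", (unsigned)fat_lookup(0, off));
            printf("index walk: block %u\n", (unsigned)indexed_lookup(off));
            return 0;
        }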
  • by costas ( 38724 ) on Sunday January 26, 2003 @11:42AM (#5161774) Homepage
    Maybe in your problem domain that's true. I work with retailer data mines, and we hit the 2GB file limit, oh, 4-5 years ago. We've been forced to partition databases, causing maintenance issues, scalability issues, and the like, just because of the size of a B-tree index.

    True, it looks like the optimal solution is lower-level partitioning rather than expanding the index to 64 bits (tests showed the latter is slower), but that still means the practical limit of 1.5-1.7 GB per file (because you have to leave some safety margin) is far too constraining. I know installations that could have 200GB files tomorrow if the tech were there (which it isn't, even with large file support).

    I am also guessing that numerical simulations and bioinformatics apps can probably produce output files (which would then need to be crunched down to something more meaningful to mere humans) in the TB range.

    Computing power will never be enough: there will always be problems that will be just feasible with today's tech that will only improve with better, faster technology.
  • Re:MOD UP (Score:2, Insightful)

    by DAldredge ( 2353 ) <SlashdotEmail@GMail.Com> on Sunday January 26, 2003 @11:54AM (#5161854) Journal
    That's not why he wanted you to call him. If he had answered your questions via email, there would have been a record of what he said.
  • by kasperd ( 592156 ) on Sunday January 26, 2003 @12:06PM (#5161910) Homepage Journal
    2GB in a year or less.

    They probably don't write emails but instead write Word documents and attach them to empty emails.
  • by n3m6 ( 101260 ) <abdulla.faraz@NOspam.gmail.com> on Sunday January 26, 2003 @12:11PM (#5161936) Homepage Journal
    Whenever something like this comes up, somebody just has to say "we don't have a problem, we use X."

    That's just so lame. We have XFS and JFS. You can keep your AIX and your expensive hardware.

    thanks.
  • by OS24Ever ( 245667 ) <trekkie@nomorestars.com> on Sunday January 26, 2003 @12:18PM (#5161970) Homepage Journal
    There is something innate in the education, training, and daily work of a programmer that makes them not want to use 'too big' a number for a given task.

    it either

    A) Wastes Memory Space
    B) Wastes Code Space
    C) Wastes Pointer Space
    D) Or violates some other tenet the programmer believes in

    So, when they go out and create a file structure or something similar, they don't feel like exceeding the 'built-in' restriction in their way of thinking (a toy example of such a baked-in limit is sketched after the list below).

    And usually, at the time, it's such a big number that the programmer can't think of an application to exceed it.

    Then, one comes along and blows right through it.

    I've been amused by all the people jumping on the 'it don't need to be that big' bandwagon. I can think of many applications for which ext3 or whatever would need to handle big files. They include:

    A) Database Servers
    B) Video Streaming Servers
    C) Video Editing Workstations
    D) Photo Editing Workstations
    E) Next Big Thing (tm) that hasn't come out yet.
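    The baked-in limit mentioned above usually looks like nothing more than a field width chosen in an on-disk structure. A toy illustration (the header format here is invented for the example, not any real one):

        /* A 32-bit length field quietly caps every file the format can ever
         * describe at 4 GiB (2 GiB if the field is treated as signed). */
        #include <stdint.h>
        #include <stdio.h>

        struct file_header_v1 {
            char     magic[4];     /* format identifier                */
            uint32_t payload_len;  /* "nobody will ever need more..."  */
        };

        struct file_header_v2 {
            char     magic[4];
            uint64_t payload_len;  /* the fix: a "really huge" field   */
        };

        int main(void)
        {
            printf("32-bit length field tops out at %llu bytes\n",
                   (unsigned long long)UINT32_MAX);
            printf("64-bit length field tops out at %llu bytes\n",
                   (unsigned long long)UINT64_MAX);
            return 0;
        }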

  • by Anonymous Coward on Sunday January 26, 2003 @04:02PM (#5163187)
    > limits on things so high that you can never reach them in the practical world.

    The 2 GByte limit came from a time when 14-inch disks held 30 MByte, and disk space and RAM were too precious to waste an extra 32 bits that would always be all zero for the foreseeable future.

    The concept of a hard drive that was as large as 2 GByte was just silly - it would fill the whole computer room, and in any case this is a limit on each file, not on the file system.
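    The 2 GB figure falls straight out of that choice: a signed 32-bit off_t tops out at 2^31 - 1 bytes, while the 64-bit off_t used by large file support moves the ceiling to 2^63 - 1. A quick check:

        /* Just the arithmetic behind the limits; nothing system-specific. */
        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
            printf("signed 32-bit off_t limit: %lld bytes (~2 GB)\n",
                   (long long)INT32_MAX);
            printf("signed 64-bit off_t limit: %lld bytes (~8 EiB)\n",
                   (long long)INT64_MAX);
            return 0;
        }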

  • by dvdeug ( 5033 ) <dvdeug&email,ro> on Sunday January 26, 2003 @10:10PM (#5164744)
    There is something innate in the education, learning, and daily working of a programmer that makes them not want to use 'too big' of a number for a certain task.

    We have code for infinite precision integers. The problem is, if it were used for filesystem code, you still couldn't do real-time video or DVD burning, because the computer would be spending too long handling infinite precision integers.

    As long as you're careful with it, setting a "really huge" limit and fixing things if you ever reach it is usually good enough.
