Large File Problems in Modern Unices
david-currie writes "Freshmeat is running an article that talks about the problems with the support for large files under some operating systems, and possible ways of dealing with these problems. It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."
Not really that groundbreaking... (Score:5, Interesting)
Re:Not really that groundbreaking... (Score:3, Funny)
17,179,869,184 gigabytes ought to be enough for ANYBODY!
--jeff++
Its funny how some lamers dont listen... (Score:3, Insightful)
I can just laugh at them now...
Re:Its funny how some lamers dont listen... (Score:2)
640 K ought to be enough for anybody (Score:3, Funny)
It will happen with time_t, too (Score:5, Informative)
However, the pain is coming - remember we have only about 35 years before a 64-bit time_t is a MUST.
I'd like to see the major distro vendors just "suck it up" and say "off_t and time_t are 64 bits. Get over it."
Sure, it will cause a great deal of disruption. So did the move from aout to elf, the move from libc to glibc, etc.
Let's just get it over with.
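For anyone curious exactly where that boundary falls, here's a minimal C sketch (assuming a POSIX-style time_t counted in seconds since 1970) that prints the moment a signed 32-bit time_t wraps:

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    int main(void)
    {
        /* The largest value a signed 32-bit time_t can hold. */
        time_t t = (time_t)INT32_MAX;

        printf("sizeof(time_t) here: %zu bytes\n", sizeof(time_t));
        printf("32-bit time_t wraps after: %s", asctime(gmtime(&t)));
        return 0;
    }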
Only 35 years... (Score:2)
Prediction: The first distro to "suck it up" will be around 2035 or so. Personally, I think this is about as far down the priority list as you can get. Besides, with open source, is it really that problematic to grep the source for "time_t" and fix it? I don't think so.
Kjella
Re:Only 35 years... (Score:3, Informative)
For well-behaved programs, there is nothing more to do than to change the typedef that defines __time_t in bits/types.h.
For stupidly written programs that assume the size of __time_t or that use __time_t in unions, each will need to be addressed individually to make sure things still work correctly.
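A rough illustration of the two cases (the struct and field names here are made up): code that quietly buries the assumption that time_t is 32 bits breaks when the typedef widens, while code that converts explicitly at the edges keeps working:

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    /* Fragile: quietly assumes time_t fits in an int. */
    struct old_record {
        char name[12];
        int  stamp;                 /* (int)time(NULL) truncates a 64-bit time_t */
    };

    /* Safer: use an explicitly wide field and convert at the boundaries. */
    struct new_record {
        char      name[12];
        long long stamp;            /* wide enough for a 64-bit time_t */
    };

    int main(void)
    {
        struct new_record r;
        memset(&r, 0, sizeof r);
        r.stamp = (long long)time(NULL);   /* explicit widening, no truncation */
        printf("stamp = %lld\n", r.stamp);
        return 0;
    }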
Re:Only 35 years... (Score:2)
The FreeBSD folks have already done a considerable amount of work on this, even to the point of making time_t 64 bits for both kernel and userland and testing for issues. Enough is known that the main worry now is how to handle the change in ports, some of which need a fair amount of work to move away from 32-bit time_t. But at the rate things are going, I'd expect that they will make the transition to 64-bit time_t for FreeBSD 6.0. I've no idea how they will handle the legacy issues (ports and pre-6.0 binaries) though.
Re:It will happen with time_t, too (Score:2)
It boggles my mind that Sun, for example, went to the trouble of building a whole host of interfaces and a porting process for 64-bit file offsets (see the lf64 and lfcompile64 manpages on Solaris) and yet they didn't bother to increase the size of time_t at the same time. If everyone is going to be recompiling their apps anyway, why not fix it all in one go?
On the application side, it should be noted that this isn't a problem for code written in Java, whose equivalent of time_t is already 64-bit (in milliseconds, granted, but that only eats about 10 of the extra 32 bits.) Obviously the Java VM won't be able to make up for the underlying OS not supporting large time values, but at least the applications won't have to change.
First one to start whining about Java's year-584544016 problem gets whacked with a wet noodle.
Re:Needs to be signed... (Score:3, Informative)
Wrong. The C99 standard says in section 6.3.1.8 paragraph 1:
Here, the common real type is unsigned int, and the description of the addition and subtraction operators (section 6.5.6) does not specify a different type for the result when both operands have arithmetic type.
If you disagree, please cite relevant parts of the standard to support your case.
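A tiny example of the rule being cited: when an int and an unsigned int meet in an expression, the int operand is converted to unsigned int, so the result is unsigned as well (the output below assumes a 32-bit int, as on typical desktop compilers):

    #include <stdio.h>

    int main(void)
    {
        unsigned int u = 1;
        int          i = -2;

        /* C99 6.3.1.8: i is converted to unsigned int, so u + i is unsigned. */
        if (u + i > 0)
            printf("u + i compares as a huge unsigned value: %u\n", u + i);
        else
            printf("u + i is negative\n");
        return 0;
    }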
A woman's perspective . . . (Score:5, Funny)
I replied, "Sweetie, I married you for your trust fund not your cluster size."
Funny...in AIX... (Score:4, Informative)
Re:Funny...in AIX... (Score:3, Insightful)
that's just so lame. we have XFS and JFS. you can keep your AIX and your expensive hardware with you.
thanks.
Have you ever seen some people's email? (Score:5, Insightful)
Re:Have you ever seen some people's email? (Score:5, Funny)
Re:Have you ever seen some people's email? (Score:2, Informative)
Re:Have you ever seen some people's email? (Score:2)
Re:Have you ever seen some people's email? (Score:3, Insightful)
They probably don't write emails but instead write Word documents and attach them to empty emails.
Re:Have you ever seen some people's email? (Score:2)
Just saying, is all.
Re:MOD UP (Score:2, Insightful)
Re:RTFPP (Score:2)
Switch to gnu/hurd (Score:3, Funny)
Re:Switch to gnu/hurd (Score:2)
Why not to learn from past? (Score:2)
There surely MUST be some way to do this - I just imagine some file (e.g. defined in the LSB) which would define these limits for the COMPLETE system (from the kernel, filesystems, and utils to network daemons). I know there are efforts toward things like this, but if we said (for example) that a distribution in 2004 won't be marked "LSB compatible" if ANY of its programs use other limits, I think it would create enough pressure on Linux vendors.
Just a crazy idea
Re:Why not to learn from past? (Score:2)
the problem is where it's sticking.
It's all about efficiency. (Score:3, Insightful)
it either
A) Wastes Memory Space
B) Wastes Code Space
C) Wastes Pointer Space
D) Or violates some other tenet the programmer believes in
So, When they go out and create a file structure, or something similar, they don't feel like exceeding some 'built-in' restriction to their way of thinking.
And usually, at the time, it's such a big number that the programmer can't think of an application to exceed it.
Then, one comes along and blows right through it.
I've been amused by all the people jumping on the 'it don't need to be that big' bandwagon. I can think of many applications that would need ext3 or whatever to handle big files. They include:
A) Database Servers
B) Video Streaming Servers
C) Video Editing Workstations
D) Photo Editing Workstations
E) Next Big Thing (tm) that hasn't come out yet.
Re:It's all about efficiency. (Score:3, Insightful)
We have code for infinite precision integers. The problem is, if it were used for filesystem code, you still couldn't do real-time video or DVD burning, because the computer would be spending too long handling infinite precision integers.
As long as you're careful with it, setting a "really huge" number, and fixing it when you reach that limit is usually good enough.
The O/S should do it and do it well. (Score:3, Interesting)
2) Instead of 10 different applications writing code to support splitting up an otherwise sound model, why not have 1 operating system have provisions for dealing with large files.
3) You are going to need the bigger files with all those 32-bit wchar_ts and 64-bit time_ts you've got!
BeOS Filesystem (Score:2)
Re:BeOS Filesystem (Score:5, Informative)
Linux XFS [sgi.com]: 9 exabytes
Also supports extended attributes [bestbits.at].
Somewhat cumbersome, even on Linux (Score:2, Informative)
-D_FILE_OFFSET_BITS=64 and -D_LARGEFILE_SOURCE
This forces all file access calls to their 64-bit variants and makes off_t itself 64 bits; with the transitional -D_LARGEFILE64_SOURCE interface you instead use types like off64_t and the *64 calls explicitly where needed. And I believe most large file support is really only available in glibc 2.2 and later.
Additionally you may need to pass O_LARGEFILE to open() and friends. So legacy applications that use glibc filesystem calls have to be recompiled to take advantage of this, and may need source-level changes. It won't work on older kernels either.
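A minimal sketch of that in practice, assuming glibc 2.2 or later and a filesystem that allows large files; with -D_FILE_OFFSET_BITS=64 the ordinary open()/lseek() calls are transparently mapped to their 64-bit variants:

    /* Build with:  cc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -o bigseek bigseek.c */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Seek past the old 2 GB barrier; only works if off_t is 64 bits. */
        off_t pos = lseek(fd, (off_t)3 * 1024 * 1024 * 1024, SEEK_SET);
        if (pos == (off_t)-1)
            perror("lseek");
        else
            printf("seeked to %lld, sizeof(off_t) = %zu\n",
                   (long long)pos, sizeof(off_t));

        close(fd);
        return 0;
    }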
Error Prevention (Score:3, Interesting)
The 31-bit limit on time_t overflows in this century; 63 bits outlasts the probable life of the Universe, so it is unlikely to run into trouble.
That is the best argument I know for a 64-bit file size; in the long run it is one less thing to worry about.
Re:Error Prevention (Score:2)
Digital took a bug report on this for Vax/VMS and promised a fix, some time in a future release.
Re:Error Prevention (Score:3, Interesting)
Start with 64-bit, but make it 63-bit. If the 64th bit is on, then there's another 64-bit value following which is prepended to the value (making it a 126-bit address -- again, reserve one bit for another 64-bit descriptor).
Chances are it won't ever need the additional descriptors since 64-bits is a lot, but it would solve the problem once-and-for-all.
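A back-of-the-envelope sketch of that scheme, purely for illustration (the names and on-disk layout are invented, not taken from any real filesystem): each word carries 63 bits of the size, and the top bit says "another word follows".

    #include <stdint.h>
    #include <stdio.h>

    #define EXTEND_BIT  ((uint64_t)1 << 63)

    struct big_size {
        uint64_t lo;   /* low 63 bits */
        uint64_t hi;   /* next 63 bits, zero if no extension word was present */
    };

    /* Returns the number of on-disk words consumed (1 or 2 in this sketch). */
    static size_t decode_size(const uint64_t *words, struct big_size *out)
    {
        out->lo = words[0] & ~EXTEND_BIT;
        out->hi = 0;
        if (!(words[0] & EXTEND_BIT))
            return 1;
        out->hi = words[1] & ~EXTEND_BIT;   /* a third word could extend further */
        return 2;
    }

    int main(void)
    {
        uint64_t disk[2] = { EXTEND_BIT | 123, 1 };   /* represents 2^63 + 123 */
        struct big_size s;
        size_t n = decode_size(disk, &s);
        printf("consumed %zu words: hi=%llu lo=%llu\n",
               n, (unsigned long long)s.hi, (unsigned long long)s.lo);
        return 0;
    }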
I can't believe this...superSynchronicity??? (Score:3, Interesting)
I have noticed that on the SAME DAY some folks have asked questions about the 2 GB filesize limit in HP-UX on comp.sys.hp.hpux!! Apparently, HP-UX's default tar and cpio don't support files over 2 GB, either. Not even in HP-UX 11i. I never thought HP-UX stank this bad...
How does Linux on x86 stack up? I decided not to use it for this backup, since I had my Blade 100, but would it have worked? Oh, btw, has Linux finally implemented a command like "share" (exists in Solaris) to share directories via NFS, or do I still need to edit
Re:I can't believe this...superSynchronicity??? (Score:2)
It allows you to push NFS exports to the kernel and nfsd without having to edit
I believe Solaris has this same problem with share though. I don't remember these days, it's been a while since my SCSA cert. (Heh, I guess that's what man pages are for
Re:I can't believe this...superSynchronicity??? (Score:2)
And no, Solaris doesn't have this kind of problem. In Solaris, you have (a more general)
Re:I can't believe this...superSynchronicity??? (Score:2)
Admittedly, I had problems with the need for... (Score:2)
Now I can't wait for OS X to have 64-bit support for the IBM 970 processors (I do realize that it will take several releases before default 64-bit operation is practical).
When compared to clustered 32-bit filesystems, I would think that a "pure" 64-bit filesystem would have a number of very practical advantages.
I could easily see the journalled filesystem becoming one of the first 64-bit subsystems in OS X, right after VM.
Large filesystem lack more of a problem (Score:3, Interesting)
Many servers now have a physical capacity of over 2 TB on a filesystem storage device.
Unfortunately this is still a very significant limitation.
This problem is much more commonly encountered than file size limitations.
I miss BeFS... (Score:2)
*SOB*
J.
The "l" in lseek() (Score:4, Informative)
Once upon a time (prior to 1978) there was no lseek() call in Unix. The value for the offset was 16 bits. Larger seeks were handled by using a different value for "whence" (the third argument to seek()), which caused seeks to occur in 512-byte increments. This resulted in a maximum seek of 16,777,216 bytes, with an arbitrary seek() often requiring two calls, one to get to the right 512-byte block and a second to get to the right byte within the block. (Thank goodness they haven't done any such silliness to break the 2GB barrier.)
When Research Edition 7 Unix came out, it introduced lseek() with a 32-bit offset. 2,147,483,648 bytes should be enough for anyone, hmmm? :-).
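For the curious, the old two-call dance described above would have looked roughly like the comment below (the block-based "whence" values are from memory, so treat them as an assumption); with lseek() and a 64-bit off_t it collapses to a single call:

    /*  Pre-V7 sketch, 16-bit offsets:
     *      seek(fd, target / 512, 3);    -- jump to the right 512-byte block
     *      seek(fd, target % 512, 1);    -- then move within that block
     */
    #include <sys/types.h>
    #include <unistd.h>

    off_t jump_to(int fd, off_t target)
    {
        return lseek(fd, target, SEEK_SET);   /* one call, any offset off_t can hold */
    }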
Re:Why large files (Score:3, Funny)
Re:Why large files (Score:3, Funny)
640K, on the other hand, should be enough for anyone...
-Mark
Re:Why large files (Score:2)
data warehouse, and any database for that matter (Score:5, Insightful)
the production database that drives the sites is like 100GB
welcome to last week. 2GB is tiny.
Re:data warehouse, and any database for that matte (Score:2, Insightful)
I am not agreeing (or disagreeing) with the original post, but having a database > 2 GB has nothing to do with having a single file over 2 GB. A db != a file system (except for MySQL perhaps).
Re:data warehouse, and any database for that matte (Score:2, Informative)
row partitions (Score:2)
However, I would recommend staying away from >2 GB files in a database environment. Even if your FS supports large files, you still lose performance to the "double driver": first your kernel provides a partition, then it provides a filesystem over it. But if you need files that big, why do you need a filesystem at all? Just use raw partitions!
Of course you still need large files for video, but massive concurrent performance overhead is not a typical problem in that case.
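For what it's worth, "just use a raw partition" looks roughly like the sketch below from the database's side: open the block device itself and do aligned I/O on it, bypassing the filesystem. The device name is only an example, and O_DIRECT is a Linux-specific flag:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/sdb1", O_RDWR | O_DIRECT);   /* hypothetical device */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        void *buf;
        if (posix_memalign(&buf, 512, 4096) != 0)        /* O_DIRECT wants aligned buffers */
            return 1;

        /* The database lays out its own pages inside this flat address space. */
        ssize_t n = pread(fd, buf, 4096, 0);
        printf("read %zd bytes from the raw partition\n", n);

        free(buf);
        close(fd);
        return 0;
    }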
Re:row partitions (Score:2)
Now it's possible that somehow you have a very good knowledge of your application-specific disk usage pattern and can get a speedup that outweighs the user-mode overhead, the system swapping your buffers in and out of memory, and so on. In this case, you'd better use a dedicated disk rather than just a partition. Otherwise, your I/O scheduling code will have interesting interactions with the system's swapfile and other normal filesystem activity.
Even then you run a risk that OS code will one day improve and outperform your homegrown changes. Most programmers are better off just tuning their code to work well with the OS's native filesystem, virtual memory, and so on.
video, mp3's, even dvds are beyond 2gb (Score:2, Informative)
Re:Why large files (Score:3, Informative)
Re:Why large files (Score:5, Insightful)
And compressing video on-the-fly isn't feasible if you're going to be tweaking with it, so that's why people use raw video.
-Mark
Yep... (Score:3, Informative)
NTSC/YUV2/stereo: ~111 GB for a cinema movie (1 hr 45 min)
PAL/YUV2/stereo: ~125 GB for same
HDTV/surround: ~908 GB for same
With huffyuv (very low CPU usage, lossless) you should be able to cut that by a factor of 2-3. But it's still *huge*.
Kjella
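As a sanity check on the PAL number, here's a quick back-of-the-envelope calculation (the frame size, frame rate, and audio format below are my assumptions, not anything official); it lands within a few GB of the ~125 GB figure above:

    #include <stdio.h>

    int main(void)
    {
        /* Assumptions: 720x576 PAL frames, 2 bytes/pixel (YUV2), 25 fps,
         * a 105-minute film, and 16-bit 48 kHz stereo audio. */
        double video = 720.0 * 576 * 2 * 25;     /* bytes per second of video */
        double audio = 48000.0 * 2 * 2;          /* bytes per second of audio */
        double secs  = 105.0 * 60;

        printf("uncompressed PAL estimate: ~%.0f GB\n",
               (video + audio) * secs / (1024.0 * 1024 * 1024));
        return 0;
    }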
PAL & NTSC (Score:3, Informative)
NTSC: Max 640x480x29.97fps interlaced (60 Hz)
No, they don't have the same frequency, nor the same number of scanlines. Some European TVs will take PAL-60, which is like PAL only at 60 Hz, though. Also I don't think the color space works the same way, but I'm not sure about that one. That was why I used YUV2 (16-bit) for both.
Kjella
Re:Why large files (Score:2, Insightful)
Re:Why large files (Score:5, Interesting)
Re:Why large files (Score:3, Interesting)
Re:Why large files (Score:4, Informative)
vmware uses files as virtual disks. 2GB would be a really, really small disk. UML does the same, using the loop device feature of Linux. Again, a filesystem in a file. Again, 2GB is not much. Simulating 20GB would need 10 files.
Feels like 64kbyte segments somehow...and I really don't want to have those back.
64KB memory segments (Score:2)
Oh the fond memories
Daniel
Re:Why large files (Score:2)
Re:Why large files (Score:3, Insightful)
I can think of some:
And that's just without thinking twice...there are probably many more reasons why people would want files >2 GB.
Re: (Score:2)
Q: Why large files? A: Disk images too (Score:2, Interesting)
I have most all of my older system images available to inspect. The loopback devices under Linux are tailor made for this type of thing.
I am puzzled as to why you mention the seek times. Surely you would agree that the seek time should be only inversely geometrically related to size, the particular factors depending on the filesystem. Any deviation from the theoretical ideal is the fault of a particular OS's implementation. My experience is that this is not significant.
(user dmanny on wife's machine, ergo posting as AC)
Re:Why large files (Score:3, Interesting)
Can anyone give a good reason for needing files larger than 2gb?
Forensic analysis of disk images. And yes, from experience I can tell you that half the file tools on RedHat (like, say, Perl) aren't compiled to support >2GB files.
Re:Why large files (Score:2, Insightful)
Re:Why large files (Score:2)
Re:Why large files (Score:4, Insightful)
Who modded that as Insightful? Sure, if you are using a filesystem designed for floppy disks, it might not work well with 2 GB files. In the old days, when the metadata could fit in 5 KB, a linked list of disk blocks could be acceptable. But any modern filesystem uses tree structures, which makes a seek faster than it would be to open another file. Such a tree isn't complicated; even the Minix filesystem has it.
If you are still using FAT... bad luck for you. AFAIK Microsoft was stupid enough to keep using linked lists in FAT32, which certainly did not improve the seek time.
Re:Why large files (Score:2)
Anyway, those using an M$ OS which does not support NTFS are fooling themselves. If you are using some form of Windows prior to Windows 2000, then you are getting a terrible experience which is nothing like the real OS -- NT. NTFS is a pretty good filesystem with journaling, ACLs, and implicit support for encryption and compression. FAT32 is shite.
Re:Why large files (Score:2)
Re:Why large files (Score:2)
Yes. Sometimes you need to store a lot of data. Even DVDs hold 4.3 GB of data these days. But that's not even much compared to the amount of data we handle in seismic research. I would imagine astronomers, particle physicists, and lots of other people also routinely handle ridiculous amounts of data.
By the way, in producing the DVD, you would naturally work with uncompressed data. How would you handle that?
The seek times alone within these files must be huge, and it smacks a bit of inefficiency
And because it is inefficient, we should not support it? As a matter of fact, any file larger than one disk-block is inefficient. Maybe we should stop supporting that as well?
sure it's just as bad to have an app use hundreds of, say, 4 KB files or so, but two GIGABYTES???
As I've said, it's not really that much, depending on the application.
Re:Why large files (Score:3, Interesting)
The real issue we ran up against was compression... we wanted to have the original and interim data files available on-disk for a while in case of reprocessing. The processing would generally take up 10x as much space as the original data file, so you compressed everything. Except that gzip can't handle files >2GB (at the time an alpha could, but we didn't want to touch it). Nor can zip. So we had to use compress. Yay. (bzip could handle it, but was decided against by the powers that be).
Compression of large files is still an issue, unless you want to split them up. Unless you download a beta version, gzip still can't handle it. As I understand it, zip won't ever be able to do it. There are some fringe compressors that can handle large files, but, well, they're fringe.
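If splitting is the route you take, the idea is simple enough; here's a rough sketch (the 1 GB piece size and the naming scheme are arbitrary choices, and in practice split(1) already does this):

    /* Build with -D_FILE_OFFSET_BITS=64 so this splitter itself can read past 2 GB. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <bigfile>\n", argv[0]);
            return 1;
        }

        FILE *in = fopen(argv[1], "rb");
        if (!in) { perror("fopen"); return 1; }

        const long long piece = 1024LL * 1024 * 1024;   /* 1 GB per piece */
        char buf[1 << 16];
        long long written = piece;                      /* force opening piece 0 */
        FILE *out = NULL;
        int n = 0;
        size_t got;

        while ((got = fread(buf, 1, sizeof buf, in)) > 0) {
            if (written >= piece) {                     /* start a new piece */
                if (out) fclose(out);
                char name[256];
                snprintf(name, sizeof name, "%s.%03d", argv[1], n++);
                out = fopen(name, "wb");
                if (!out) { perror("fopen"); return 1; }
                written = 0;
            }
            fwrite(buf, 1, got, out);
            written += (long long)got;
        }
        if (out) fclose(out);
        fclose(in);
        return 0;
    }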
Re:gzip handles large files fine (Score:2)
Re:Why large files (Score:2)
Re:Why large files (Score:2)
The database files themselves, in the system.
Re:Why large files (Score:2)
A better question is, Who doesn't need largefile support?
As for the seek time... not everything is accessed like a random access file. I imagine that the backup data will be read in sequentially. The video file would mostly be handled sequentially, other than when jumping to a chapter, fast forwarding, or reversing.
Re:Why large files (Score:2)
Video/movie files, for one thing. Even compressed (e.g. DV or MPEG), those things are huge. A 2 GB file at professional DV compression (50 Mb/sec) is about 4 minutes' worth. (DV is similar to MJPEG, so it's still lossy.) Uncompressed or losslessly compressed video (critical for machine vision or image analysis apps) chews even more space.
I know I've wanted to be able to just dump a mini-DV tape (about 13 GB) directly to a single disk file for later editing.
Other fields also use huge data sets - seismic data analysis for example. Filesystems designed for supercomputer clusters (eg PVFS) have unlimited size on the total filesystem (tens of terabytes is not unusual) although the individual file size may still be limited by the underlying OS or hardware word size.
Then there's creating a
Re:Why large files (Score:2)
Depends on how your inodes are laid out, how big you have to get for triple indirect blocks, etc.
Shouldn't be any worse (and maybe better) than trying to seek through an equivalent collection of smaller files -- you've got to do all those directory searches, etc. (Exact comparisons will depend greatly on the filesystem and parameters chosen when the FS was created.)
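To put rough numbers on the indirect-block point, here's a small sketch using the classic ext2-style layout (12 direct pointers, then single, double, and triple indirect blocks with 4-byte block pointers; the 4 KB block size is just an example, and real filesystems have additional limits):

    #include <stdio.h>

    int main(void)
    {
        unsigned long long bs   = 4096;                /* block size in bytes */
        unsigned long long ptrs = bs / 4;              /* pointers per indirect block */
        unsigned long long blocks =
            12 + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;

        printf("max file size at %llu-byte blocks: about %llu GB\n",
               bs, blocks * bs / (1024ULL * 1024 * 1024));
        return 0;
    }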
Re:Why large files (Score:2)
making a movie will be larger than that.
I guess a lot of the editing would probably be done scene by scene, and then you could merge and compress them on the fly so that at no point you use more than 2 GB, but it seems that if you make a 2-hour DVD it would be nice to keep the 4 GB image file on your hard drive if you planned to reburn it -- not a scattering of scenes from which it would recreate the image on the fly.
It is kind of a dumb question, when we have computers being marketed as home DVD makers, to ask why we would need files that big.
Re:Why large files (Score:5, Interesting)
Oh, you're still not convinced? Well, see it this way: won't you, at some point in the future, need to burn a DVD?
Well? A typical single-sided DVD-R holds around 4 GB of data (somewhat more); if you use both sides, you can get more than 8 GB of data on it. That's way bigger than 2 GB, no? Now, how big must your image be before you burn it on there? Well?
Right...
Wrong. (Score:2)
Re:Unices? (Score:3, Informative)
Re:Unices? (Score:2)
Getting back on topic, maybe the plural for Unix should be Unixen, like the plural for Vax is Vaxen?
Re:Wrong point of view. (Score:5, Insightful)
Video Editing
Daniel
Cripes! (Score:2)
I didn't realize Daniel was so big, though.
Has he considered going lossy?
three words (Score:2)
Re:Wrong point of view. (Score:5, Funny)
Re:Wrong point of view. (Score:5, Insightful)
Re:Wrong point of view. (Score:5, Insightful)
True, it looks like the optimal solution is lower-level partitioning, rather than expanding the index to 64 bits (tests showed that the latter is slower), but that still means that the practical limit of 1.5-1.7 GB per file (because you have to have some safety margin) is far too constraining. I know of installations that could have 200 GB files tomorrow if the tech were there (which it isn't, even with large file support).
I am also guessing that numerical simulations and bioinformatics apps can probably produce output files (which would then need to be crunched down to something more meaningful to mere humans) in the TB range.
Computing power will never be enough: there will always be problems that will be just feasible with today's tech that will only improve with better, faster technology.
Please mod parent up. (Score:2)
Re:Wrong point of view. (Score:5, Interesting)
> fragmentation: large files increase the fragmentation of most file systems
What kind of fragmentation?
Small files lead to more internal fragmentation.
Large files are more likely to consist of more fragments, but when splitting this data into small files, those files are fragments of the same data.
>entropy pollution
What kind of entropy? Are you speaking of compression algorithms?
Compression ratios are actually better with large files than with small files, because similarities across file boundaries can be found. That is why gzip (or bzip2) is used to compress a single large tar file. (Simple test: try zip on many files, and then try zip without compression followed by compressing the resulting archive.)
>data pollution
How should limiting file size improve that situation? Then people just tend to store data in a lot of small files. What a success. People will waste space whether there is a file size limit or not.
>These limits are there for very good reasons and in my opinion they are even much too big.
Actually, they are there for historical reasons.
And should a DB spread all its tables over thousands of files instead of having only one table in one file and mmapping this single file into memory? Should a raw video stream be fragmented into several files to circumvent a file limit?
>[...] original K&R Unix [...] was much faster than modern systems
Faster? In what respect?
Re:Wrong point of view. (Score:3, Interesting)
Sure, splitting data into a lot of smaller files is going to reduce the fragmentation slightly, but it is not going to improve your performance, because the price of accessing different files is going to be higher than the price of the fragmentation.
In the next two arguments you managed to make two opposite statements, both incorrect. That is actually quite impressive.
First you say large files increase the entropy of the data stored on the disk. That is wrong as long as you compare against the same data stored in different files. Of course, if the number of files on the disk is constant, smaller files will lead to less entropy, but most people actually want to store some data on their disks.
Then you say large files are highly redundant, which is the opposite of having high entropy, as claimed in your previous argument. And in reality the redundancy does not tend to increase with file size, though it might of course depend on the format of the file.
All in all, you are saying that people shouldn't store much data on their disks, and that the little data they do store should be as compact as possible, while still allowing it to be compressed even further when doing backups. You might as well have said people shouldn't use their disks at all.
Finally, claiming older Unix versions were faster is ridiculous; first of all, they ran on different hardware, and surely on that hardware they were slower than today's systems. And even if you managed to port an ancient Unix version to modern hardware, I'm sure it wouldn't beat modern systems at today's tasks. Which DVD player would you suggest for K&R Unix?
Re:Wrong point of view. (Score:2)
You are a troll. It is not up to administrators to decide how big a file needs to be. I do scientific research and deal regularly with datasets larger than 300GB. Single files often in the range of 2GB-10GB. For me to split up my data would create an enormous headache, and would be very slow.
-Sean
Re:Wrong point of view. (Score:2)
Is it just me, or is Slashdot getting much less informed as the user count continues to increase ?
Re:Wrong point of view. (Score:2)
It's not just you.
Re:Wrong point of view. (Score:2)
We have conquered this problem before, by redesigning filesystems to allow files bigger than segments, and we can conquer it again by allowing files bigger than the addressable range of a 32-bit processor's full word.
Re:huh? (Score:2, Informative)
"It is an interesting problem that some distro-compilers have to face."
talks about the problem facing distro compilers, whereas
"It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."
talks about the article addressing these problems.
Re:huh? (Score:2, Interesting)
"of the kinds" really adds nothing to the meaning here, nor does "have to"
Thus we have:
The same sentence, but much cleaner!
Thanks! I'll be here all week.
Re:How large are we talking? (Score:2)
Hmm, the programmers seemed to store the information in an int, so by allocating a large chunk of memory (through Matlab; zeros(10000,10000) is quite a chunk), I could finally convince the application that I did not have negative memory, but actually enough to display the movie.
But then the video was lame.