Exhaustive Data Compressor Comparison

crazyeyes writes "This is easily the best article I've seen comparing data compression software. The author tests 11 compressors: 7-zip, ARJ32, bzip2, gzip, SBC Archiver, Squeez, StuffIt, WinAce, WinRAR, WinRK, and WinZip. All are tested using 8 filesets: audio (WAV and MP3), documents, e-books, movies (DivX and MPEG), and pictures (PSD and JPEG). He tests them at different settings and includes the aggregated results. Spoilers: WinRK gives the best compression but operates slowest; ARJ32 is fastest but compresses least."
  • by xxxJonBoyxxx ( 565205 ) on Sunday April 22, 2007 @10:17PM (#18836243)
    Screw speed and size reduction. All I want is compatibility with other OSs (i.e., the fewest things that have to be installed on a base OS to use it). For that, I'd have to say ZIP and/or gzip wins.
  • Re:duh (Score:3, Insightful)

    by timeOday ( 582209 ) on Sunday April 22, 2007 @10:17PM (#18836259)
    So you already knew WinRK gave the best compression? I didn't; I'd never even heard of it. My money would have been on bzip2.
  • Not really (Score:4, Insightful)

    by Toe, The ( 545098 ) on Sunday April 22, 2007 @10:19PM (#18836265)
    Not every piece of software achieves maximum efficiency. It is perfectly imaginable that a compressor could be both slow and bad. It is nice to see that these compressors did not suffer that fate.
  • by Anonymous Coward on Sunday April 22, 2007 @10:23PM (#18836291)
    These two formats are still widely used out there. And why are we compressing MP3s?
  • by Nogami_Saeko ( 466595 ) on Sunday April 22, 2007 @10:24PM (#18836305)
    Nice comparison, but there's really only two that matter (at least on PCs):

    ZIP for cross-platform compatibility (and for simplicity for less technically-minded users).

    RAR for everything else (at 3rd in their "efficiency" list, it's easy to see why it's so popular, not to mention ease of use for splitting archives, etc).
  • Poor article. (Score:5, Insightful)

    by FellowConspirator ( 882908 ) on Sunday April 22, 2007 @10:24PM (#18836307)
    This is a poor article on several points. First, the entropy of the data in the files isn't quantified. Second, the strategy used for compression isn't described at all. If WinRK compresses so well on very high-entropy data, there must be some filetype-specific strategies used.

    Versions of the programs aren't given, nor the compile-time options (for the open source ones).

    Finally, Windows Vista isn't a suitable platform for conducting the tests. Most of these tools target WinXP in their current versions and changes to Vista introduced systematic differences in very basic things like memory usage, file I/O properties, etc.

    The idea of the article is fine, it's just that the analysis is half-baked.
  • Re:duh (Score:2, Insightful)

    by kabeer ( 695592 ) on Sunday April 22, 2007 @10:26PM (#18836313)
    Compressing the article into that statement would technically be classed as lossy compression, e.g. JPEG.
  • by mochan_s ( 536939 ) on Sunday April 22, 2007 @10:26PM (#18836315)

    What's the point of compressing JPEG, MP3, DivX, etc. when they already do the compression? The streams are close to random (at maximum information density), and all you could really compress would be the headers between blocks in movies or the ID3 tag in an MP3.
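
    If you want to see this for yourself, here's a quick sketch (the filename is just a placeholder; any MP3 or JPEG will do):

        # Rough check of how little an already-compressed file shrinks.
        import gzip

        path = "sample.mp3"  # placeholder; point it at any MP3/JPEG/DivX file
        with open(path, "rb") as f:
            raw = f.read()

        packed = gzip.compress(raw, compresslevel=9)
        print(f"original: {len(raw)} bytes, gzipped: {len(packed)} bytes "
              f"({len(packed) / len(raw):.3f}x)")  # typically barely below 1.0x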

  • 7zip (Score:5, Insightful)

    by Lehk228 ( 705449 ) on Sunday April 22, 2007 @10:33PM (#18836357) Journal
    7-zip cribsheet:

    weak on things you shouldn't be zipping anyway: WAV files (use FLAC), MP3s, JPEGs and DivX movies.

    7-zip does quite well on documents (2nd), ebooks (2nd), MPEG video (3rd) and PSD (2nd).

    Also, I expect 7-zip will improve further at higher-end compression settings; when possible I give it hundreds of megs of dictionary, and unlike the commercial apps 7-zip can be configured well into the "insane" range.
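
    As a rough illustration of the idea (this uses Python's lzma module as a stand-in for 7-zip's own LZMA settings, and the 256 MB dictionary is just an example value, not anything from the article):

        # Illustrative only: LZMA2 with an oversized dictionary, roughly what
        # cranked-up 7-zip settings do. dict_size here is an arbitrary example.
        import lzma

        filters = [{"id": lzma.FILTER_LZMA2,
                    "preset": 9 | lzma.PRESET_EXTREME,
                    "dict_size": 256 * 1024 * 1024}]  # 256 MB dictionary

        with open("ebooks.tar", "rb") as src:  # placeholder input
            data = src.read()
        packed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
        with open("ebooks.tar.xz", "wb") as dst:
            dst.write(packed)
        print(f"{len(data)} -> {len(packed)} bytes")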
  • by Repton ( 60818 ) on Sunday April 22, 2007 @11:09PM (#18836609) Homepage

    See also: the Archive Comparison Test [compression.ca]. Covers 162 different archivers over a bunch of different file types.

    It hasn't been updated in a while (5 years), but have the algorithms in popular use changed much? I remember caring about compression algorithms when I was downloading stuff from BBSs at 2400 baud, or trading software with friends on 3.5" floppies. But in these days of broadband, cheap writable CDs, and USB storage, does anyone care about squeezing the last few bytes out of an archive? zip/gzip/bzip2 are good enough for most people for most uses.

  • Re:Poor article. (Score:5, Insightful)

    by RedWizzard ( 192002 ) on Sunday April 22, 2007 @11:10PM (#18836615)
    I've got some more issues with the article. They didn't test filesystem compression. This would have been interesting to me because often the choice I make is not between different archivers, but between using an archiver or just compressing the directory with NTFS' native compression.

    They also focused on compression rate when I believe they should have focused on decompression rate. I'll probably only archive something once, but I may read from the archive dozens of times. What matters to me is the trade-off between space saved and extra time taken to read the data, not the one-off cost of compressing it.
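
    A sketch of how you might weigh that trade-off (bzip2 is just an example codec here; the input path and the assumed read count are placeholders):

        # Measure compress vs. decompress time once, then weight the read side
        # by how often the archive will actually be extracted.
        import bz2
        import time

        with open("documents.tar", "rb") as f:  # placeholder input
            data = f.read()

        t0 = time.perf_counter()
        packed = bz2.compress(data, compresslevel=9)
        t_comp = time.perf_counter() - t0

        t0 = time.perf_counter()
        bz2.decompress(packed)
        t_decomp = time.perf_counter() - t0

        reads = 30  # assumed number of extractions over the archive's lifetime
        print(f"one-off compress: {t_comp:.2f}s, per-read decompress: {t_decomp:.2f}s, "
              f"lifetime read cost: {reads * t_decomp:.2f}s")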

  • by athakur999 ( 44340 ) on Sunday April 22, 2007 @11:53PM (#18836831) Journal
    Even if the amount of additional compression is insignificant, ZIP, RAR, etc. are still very useful as container formats for MP3, JPG, etc., since it's easier to distribute one or two .ZIP files than 1000 individual .JPG files. And if you're going to package a bunch of files into a single file for distribution anyway, why not use the opportunity to save a few kilobytes here and there if it doesn't take much more time?
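
    A minimal sketch of that container use case (directory and archive names are placeholders):

        # Bundle a directory of JPEGs into one ZIP purely as a container.
        # ZIP_DEFLATED still picks up the "few kilobytes here and there";
        # switch to ZIP_STORED if you only want the single-file convenience.
        import zipfile
        from pathlib import Path

        with zipfile.ZipFile("photos.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for jpg in sorted(Path("photos").glob("*.jpg")):
                zf.write(jpg, arcname=jpg.name)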

  • Re:duh (Score:2, Insightful)

    by cbreaker ( 561297 ) on Monday April 23, 2007 @12:09AM (#18836937) Journal
    Hell yeah. Although ARJ had slightly better compression, it allowed for *gasp* two files in the archive to be named the same!

    Nowadays it's all RAR for Usenet and torrents and such. RAR is really great, but it's piss-slow at compressing anything. Still, it's just so easy to make multipart archives with it.

    I really wish StuffIt would go away...
       
  • by slim ( 1652 ) <john.hartnup@net> on Monday April 23, 2007 @12:22AM (#18836997) Homepage

    "the program can "(partially) decode the image back to the DCT coefficients and recompress them with a much better algorithm then default Huffman coding."
    Whew, that makes me feel a bit dirty: detecting a file format an applying special rules. It's a bit like firewalls stepping out of their network-layer remit to mess about with application-layer protocols (e.g. to make FTP work over NAT).

    Still, in both cases it works; who can argue with that?
  • by maxume ( 22995 ) on Monday April 23, 2007 @12:38AM (#18837085)
    7zip, and thus any format it supports, is as reliable as sourceforge. It's not a guarantee, but it isn't exactly 'you never know' territory either.
  • UHA (Score:3, Insightful)

    by dj245 ( 732906 ) on Monday April 23, 2007 @12:39AM (#18837101) Homepage
    There's also another, rather uncommon format that wasn't tested but is somewhat important: UHARC, file extension UHA. It is dog-slow, but offers better compression than probably any of the others. It is still used by software pirates in their custom install scripts, and I have seen it in official software install routines as well.

    You can keep RAR and ZIP and toss out the others, but a .UHA file (or one with a dummy extension) will probably turn up on your computer at some point.
  • by Jeff DeMaagd ( 2015 ) on Monday April 23, 2007 @12:44AM (#18837125) Homepage Journal
    RAR irritates me, though. It's rare enough that I usually have to dig up a decompressor for it and install it specially for just one file, and then I never use it again. I just don't like having to deal with files that require me to install new software just so I can use that one file. In that vein, I really don't think the article is relevant. I certainly won't use a novelty file format unless it looks like it has "legs". It's not like I want to make a file that becomes useless when the maintainer of the decompression utility loses interest and it goes away.
  • Re:duh (Score:3, Insightful)

    by AncientPC ( 951874 ) on Monday April 23, 2007 @01:04AM (#18837185)
    They do test different compression levels. [techarp.com] The problem is that they haven't posted any of those results yet, which makes this article incomplete and useless.
  • by Spikeles ( 972972 ) on Monday April 23, 2007 @01:19AM (#18837281)

    "maximumcompression.com is an excellent site but it just compares compression ratio, not speed. Hence for some people, it's of limited use."
    See this page? http://www.maximumcompression.com/data/summary_mf.php [maximumcompression.com]
    What are the headers along the top? Let's see...

    Pos, Program, Switches used, TAR, Compressed, Compression, Comp time, Decomp time, Efficiency


    OMG... is that a "time", as in speed, column I see there?
  • by sofar ( 317980 ) on Monday April 23, 2007 @02:18AM (#18837555) Homepage

    The article conveniently forgets to mention whether the compression tools are cross-platform (OS X, Linux, BSD) and/or open source or not.

    That makes a lot of them utterly useless for lots of people. Yet another Windows-focused review, bah.
  • by rdebath ( 884132 ) on Monday April 23, 2007 @03:38AM (#18837889)
    Because NTFS filesystem compression is horrible.
    It has poor compression and slows down the filesystem viciously, mostly due to fragmentation; I've seen 200,000 fragments in a single file!
    I think the compression algorithm it uses is LZW; you're lucky to get 1.5:1 in the best cases.

    There are other issues, like a 20GB compressed file giving fake disk errors (on a drive with 40GB of free space), but generally the poor compression and performance are enough to ensure that you don't want to use it.

  • SMP hardware? (Score:3, Insightful)

    by MrNemesis ( 587188 ) on Monday April 23, 2007 @06:16AM (#18838415) Homepage Journal
    I only skimmed the article, but with all the hullabaloo about dual/quad-core chips, why didn't they use "exhaustive" as an excuse to check out the parallelisability (if that's a word) of each compression algorithm? IIRC they didn't list the hardware or any of the switches they used, which is a glaring omission in my book.

    Of all the main compression utils I use, 7-zip, RAR and bzip2 (in the form of pbzip2) all have modes that will utilise multiple chips, often giving a pretty huge speedup in compression times. I'm not aware of any SMP branches for gzip/zlib but seeing as it appears to be the most efficient compressor by miles it might not even need it ;)

    It's mainly academic for me now anyway, since almost all of the compression I use is inline, either through rsync or SSH (or both). Not sure if any inline compressors are using LZMA yet, but the only time I find myself making an archive is for emailing someone with file size limits on their mail server. All of the stuff I have at home is stored uncompressed because a) 90% of it is already highly compressed and b) I'd rather buy slightly bigger hard drives than attempt to recover a corrupted archive a year or so down the line. Mostly I'm just concerned about decompression time these days.
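
    For what it's worth, the pbzip2 trick is simple enough to sketch: split the input into chunks, compress each chunk on its own core, and concatenate the resulting bz2 streams (both bunzip2 and Python's bz2 module accept multi-stream files). The chunk size and file names below are arbitrary placeholders:

        # Sketch of pbzip2-style parallel compression: independent chunks,
        # one worker per chunk, concatenated single-member bz2 streams.
        import bz2
        from concurrent.futures import ProcessPoolExecutor

        CHUNK = 8 * 1024 * 1024  # 8 MB per worker, an arbitrary block size

        def read_chunks(path):
            with open(path, "rb") as f:
                while chunk := f.read(CHUNK):
                    yield chunk

        def parallel_bzip2(src, dst):
            with ProcessPoolExecutor() as pool, open(dst, "wb") as out:
                # map() preserves input order, so the streams land back in sequence
                for stream in pool.map(bz2.compress, read_chunks(src)):
                    out.write(stream)

        if __name__ == "__main__":
            parallel_bzip2("backup.tar", "backup.tar.bz2")  # placeholder names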
  • by ror ( 1068652 ) on Monday April 23, 2007 @07:07AM (#18838631)
    In its efficiency graphs they order the negative-scoring ratios wrong! After all, they consider something that adds 1MB in 2 seconds to be worse than something that adds 1MB in 2 minutes, so doing the same thing *slower* actually ranks it ABOVE the other one. Plus, what matters, even for large files, is NOT the time for compression. What you REALLY want to compare is the ratio and the time for EXTRACTION at those settings. Any file will be compressed once and decompressed thousands of times. A minute longer to produce means little; a minute longer to extract, for everyone extracting it, matters a lot.
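
    To make the ordering complaint concrete, here's a toy ranking with made-up numbers: two tools that both add 1MB should be separated by extraction cost, not rewarded for spending longer failing to compress:

        # Hypothetical numbers only, to illustrate the ranking argument above.
        results = [
            # (name, size_change_bytes, compress_s, decompress_s)
            ("tool A", +1_000_000,   2, 1.0),
            ("tool B", +1_000_000, 120, 1.5),
            ("tool C", -5_000_000,  60, 0.5),
        ]

        def score(entry):
            _name, delta, _comp_s, decomp_s = entry
            # Smaller is better: bytes added (negative means saved), then the
            # per-extraction cost; the one-off compression time is ignored.
            return (delta, decomp_s)

        for name, *_ in sorted(results, key=score):
            print(name)  # -> tool C, tool A, tool B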
