Exhaustive Data Compressor Comparison 305
crazyeyes writes "This is easily the best article I've seen comparing data compression software. The author tests 11 compressors: 7-zip, ARJ32, bzip2, gzip, SBC Archiver, Squeez, StuffIt, WinAce, WinRAR, WinRK, and WinZip. All are tested using 8 filesets: audio (WAV and MP3), documents, e-books, movies (DivX and MPEG), and pictures (PSD and JPEG). He tests them at different settings and includes the aggregated results. Spoilers: WinRK gives the best compression but operates slowest; ARJ32 is fastest but compresses least."
Re:Screw speed, size reduction: gimme compatibilit (Score:2, Interesting)
I have to admit I switched over/back to ZIP about a year ago for everything for exactly this reason. yeah, it meant a lot of my old archives increased in size (sometimes by quite a bit), but knowing that anything anywhere can read the archive makes up for it. ZIP creation and decoding is supported natively by Mac and Windows and most Linux distros right from the GUI, so it makes it brain-dead simple to deal with.
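The "anything anywhere can read it" point extends to scripting too: ZIP support ships in every mainstream language's standard library. As a hypothetical illustration (my own sketch, not from the comment), Python's zipfile module can round-trip an archive with nothing installed beyond the interpreter:

```python
import zipfile

# Create a ZIP archive with standard DEFLATE compression, which every
# mainstream unzip tool (Windows Explorer, macOS Finder, Linux file
# managers) can read natively. Filenames here are made up for the demo.
with zipfile.ZipFile("notes.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("readme.txt", "Anything, anywhere, can read this.")

# Reading it back needs nothing beyond the standard library either.
with zipfile.ZipFile("notes.zip") as zf:
    text = zf.read("readme.txt").decode()
print(text)  # → Anything, anywhere, can read this.
```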
Pizzachish: setting a new standard in languages (Score:2, Interesting)
I have been thinking about creating a new language with about 60 or so words. The idea is that you don't need a lot of words when you can figure out the meaning by context. Strong points are that the language would be very easy to pick up, and you would get that invigorating feeling of talking like a primitive cave man.
As an example of the concept, we have the words walk and run. They are a bit too similar to be worth wasting one of our precious few 60 words. Effectively, one could be dropped, with the other taking on a broader meaning, without any real repercussions. The words sit and shit are also fairly similar. When you have a guest over, you can say something like, "Please, shit down." Because of context, it would be all okay. Just remember, there is a difference between shitting on the toilet and shitting in the toilet.
Re:What's the point of compressing JPEG,MP3,DivX e (Score:5, Interesting)
Re:Skip the blogspam (Score:3, Interesting)
Agreed completely. (Score:5, Interesting)
Getting stuff out of some of those formats now is a real irritation. I haven't run into a case yet that's been totally impossible, but sometimes it's taken a while, or turned out to be a total waste of time once I've gotten the archive open.
Now, I try to always put a copy of the decompressor for whatever format I use (generally just tar + gzip) onto the archive media, in source form. The entire source for gzip is under 1MB, trivial by today's standards, and if you really wanted to cut size and only put the source for deflate on there, it's only 32KB.
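The tar + gzip approach described above can be sketched with Python's standard library alone (filenames and contents here are hypothetical, not from the comment): the decompressor source rides along inside the same media as the payload.

```python
import tarfile

# Hypothetical sketch: bundle a payload and a copy of the decompressor
# source into one gzip-compressed tar, so the archive media carries the
# means of its own extraction.
with open("payload.txt", "w") as f:
    f.write("data worth keeping for decades\n")
with open("gzip-source.c", "w") as f:
    f.write("/* imagine the ~32KB deflate source here */\n")

with tarfile.open("archive.tar.gz", "w:gz") as tar:
    tar.add("payload.txt")
    tar.add("gzip-source.c")

with tarfile.open("archive.tar.gz") as tar:
    names = sorted(tar.getnames())
print(names)  # → ['gzip-source.c', 'payload.txt']
```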
It may sound tinfoil-hat, but you can't guarantee what the computer field is going to look like in a few decades. I had self-expanding archives, made using Compact Pro on a 68k Mac, thinking they'd make the files easy to recover later, which hasn't helped me at all now -- a modern (Intel) Mac won't touch them (although to be fair a PPC Mac will run OS 9, which will, and allegedly there's a Linux utility that will unpack Compact Pro archives, although maybe not self-expanding ones).
Given the rate at which bandwidth and storage space are expanding, I think the market for closed-source, proprietary data compression schemes should be very limited; there's really no good reason to use them for anything that you're storing for an unknown amount of time. You don't have to be a believer in the "infocalypse" to realize that operating systems and entire computing-machine architectures change over time, and what's ubiquitous today may be unheard of in a decade or more.
Re:duh (Score:5, Interesting)
Missing part 3 of 10? No problem!
Of course, I'm a holder of a license for Rar from way back when. I like it.
Re:Pizzachish: setting a new standard in languages (Score:3, Interesting)
you might be interested in this:
http://www.tokipona.org/ [tokipona.org]
Didn't have Tridge's rzip... (Score:3, Interesting)
http://samba.org/junkcode/ [samba.org]
Tridge is one of the smart guys behind samba. And rzip is pretty clever for certain things. Just ask google.
Re:What's the point of compressing JPEG,MP3,DivX e (Score:5, Interesting)
Both methods do the same thing: they statistically analyse all the data, then re-encode it so the most common values are encoded in a smaller way than the least common values.
Huffman's main limitation is that each value compressed needs to consume at least one whole bit. Arithmetic coding can effectively spend a fraction of a bit per value, packing several values into a single bit. That's why arithmetic coding can always match or beat Huffman: it isn't bound by Huffman's one-bit-per-symbol floor.
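To see that one-bit floor concretely, here is a small illustration (my own sketch, not from the comment): a Huffman code built over a heavily skewed distribution still assigns the overwhelmingly common symbol a full bit, so the average can never drop below 1 bit/symbol, while the Shannon entropy, which arithmetic coding can approach, is well under a bit.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Build Huffman code lengths for a {symbol: probability} dict."""
    # Heap items: (probability, tiebreak, {symbol: code_length_so_far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: length + 1 for s, length in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.95, "b": 0.03, "c": 0.02}
lengths = huffman_code_lengths(probs)
avg_bits = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
# Every Huffman code word is at least 1 bit long, so avg_bits >= 1.0,
# even though the entropy of this source is only about a third of a bit.
print(lengths, round(avg_bits, 2), round(entropy, 2))
```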
However, Huffman is NOT patented, while most forms of arithmetic coding, including the one used in the JPEG standard, ARE patented. The authors of Stuffit did nothing special - they just paid the patent fee. Now they just unpack the Huffman-encoded JPEG data and re-encode it with arithmetic coding. If you take some JPEGs that are already compressed with arithmetic coding, Stuffit can do nothing to make them better. But 99.9% of JPEGs are Huffman coded, because it would be extortionately expensive for, say, a digital camera manufacturer, to get a JPEG arithmetic coding patent license.
So Stuffit doesn't have remarkable code, they just paid money to get better compression that 99.9% of people specifically avoid because they don't think it's worth the money.
What are they actually measuring? (Score:3, Interesting)
Having said that, do I really care in practice whether algorithm A is 5% faster than algorithm B? I personally do not; I care whether the person receiving the files can open them. So the second problem with the article is that it is one computer user on his own, whereas in the real world you would just distribute
LZMA is used in 7-zip (Score:1, Interesting)
However, it is the algorithm used in 7-Zip. It is represented in this test.
Speaking as a person with interest in 64K intros, LZMA is an awesome, awesome algorithm if you need fast decompression and *small decompression code*. A carefully hand-tuned implementation of an LZMA decompressor would be less than 2K of assembly code, and could perhaps be crammed into 1K by a sufficiently clever hacker. This is an order of magnitude smaller than most algorithms that can give comparable compression performance.
The high compression of LZMA comes from combining two basic, well-proven compression ideas: sliding dictionaries (LZ77-style matching) and Markov models (i.e. the thing used by every compression algorithm that uses an arithmetic encoder or similar order-0 entropy coder as its last stage). LZMA is awesome because the contexts used in its model are segregated according to what the bits are used for. Folding that knowledge right into the model results in a simple but very effective compression scheme.
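As a rough illustration of what the sliding dictionary buys (a sketch using Python's stdlib lzma bindings, not the hand-tuned assembly decompressor the parent describes), LZMA crushes data with long-range repetition while the decompression call stays trivial:

```python
import lzma

# Data with lots of long-range repetition: a sliding-dictionary coder's
# favourite food.
data = b"the quick brown fox jumps over the lazy dog. " * 2000

compressed = lzma.compress(data, preset=9)
restored = lzma.decompress(compressed)

assert restored == data
# The repeated phrase collapses to back-references, so the compressed
# stream is a tiny fraction of the input size.
print(len(data), len(compressed))
```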