
A Look at Data Compression (252 comments)

Posted by ScuttleMonkey
from the it's-backup-time dept.
With the new year fast approaching, many of us look to the unenviable task of backing up last year's data to make room for more of the same. That being said, rojakpot has taken a look at some of the data compression programs available and has a few insights that may help when looking for the best fit. From the article: "The best compressor of the aggregated fileset was, unsurprisingly, WinRK. It saved over 54MB more than its nearest competitor - Squeez. But both Squeez and SBC Archiver did very well, compared to the other compressors. The worst compressors were gzip and WinZip. Both compressors failed to save even 200MB of space in the aggregated results."


  • Speed (Score:3, Insightful)

    by mysqlrocks (783488) on Monday December 26, 2005 @02:37PM (#14340361) Homepage Journal
    No talk of the speed of compression/decompression?
    • Re:Speed (Score:5, Informative)

      by sedmonds (94908) on Monday December 26, 2005 @02:42PM (#14340397) Homepage
      There seems to be a compression speed section on page 12 (Aggregated Results), ranging from gzip (really fast) to WinRK (really slow).
      • Re:Speed (Score:5, Insightful)

        by Luuvitonen (896774) on Monday December 26, 2005 @03:52PM (#14340772)
        3 hours 47 minutes with WinRK versus gzipping in 3 minutes 16 seconds. Is it really worth watching the progress bar for 200 megs smaller file?
        • Re:Speed (Score:3, Interesting)

          by Karma Farmer (595141)
          3 hours 47 minutes with WinRK versus gzipping in 3 minutes 16 seconds. Is it really worth watching the progress bar for 200 megs smaller file?

          If your file starts out as 250 MB, it might be worth it. However, if you start with a 2.5 GB file, then it's almost certainly not, especially once you take the closed-source and undocumented nature of the compression algorithm into account.

          /not surprisingly, the article is about 2.5 GB files
          • Re:Speed (Score:5, Interesting)

            by moro_666 (414422) <.kulminaator. .at. .gmail.com.> on Monday December 26, 2005 @05:39PM (#14341269) Homepage
            if you download a file over gprs and each megabyte costs you $3, then saving 200 megabytes means saving $600, which is the price of a low-end pc or almost a laptop.

            another case is if you only have 100 megabytes you can use and only a zzzxxxyyy archiver can compress it into the 100mb while gzip -9 leaves you with 102mb.

            so it really depends if you need it or not. sometimes you need it, mostly you don't.

            but bashing on the issue "like nobody ever needs it" is certainly wrong.
            • "if you download a file over gprs and each megabyte costs you 3$, then saving 200 megabytes means saving 600$, which is a price for a low-end pc or almost a laptop."

              When you're talking about data files on the order of 2.5 GB, someone is going to find ANY solution other than GPRS. When you're talking about GPRS, even transatlantic sneakernet would be faster (and cheaper).

              Plus many providers offer unlimited plans at higher monthly costs. (I know every US-based provider has unlimited data plans for under $10
              • Re:Speed (Score:3, Funny)

                by Karma Farmer (595141)
                But, if you were using mobile phones to transfer a 2.5 GB file between two separate Windows-only PCs, and you were willing to initiate a $10,000, 10-day file transfer using a proprietary Windows-only compression scheme without any type of error correction or partial restart, then I agree that WinRK would be the best choice.
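The dollar and day figures in this sub-thread are easy to sanity-check. A back-of-the-envelope sketch under stated assumptions: the $3/MB price is the one quoted above, while the 40 kbit/s sustained GPRS rate is a hypothetical round number, and the function names are illustrative only.

```python
# Back-of-the-envelope cost and duration of a metered mobile-data transfer.
# ASSUMPTIONS: $3 per (decimal) megabyte, as quoted in the thread, and a
# hypothetical sustained GPRS throughput of 40 kbit/s.

def transfer_cost_usd(size_mb: float, usd_per_mb: float) -> float:
    """Cost of moving size_mb megabytes at a flat per-megabyte price."""
    return size_mb * usd_per_mb

def transfer_days(size_mb: float, kbit_per_s: float) -> float:
    """Wall-clock days to move size_mb (decimal) megabytes at kbit_per_s."""
    bits = size_mb * 1_000_000 * 8
    seconds = bits / (kbit_per_s * 1_000)
    return seconds / 86_400

if __name__ == "__main__":
    # Savings from an archive that is 200 MB smaller.
    print(f"200 MB saved: ${transfer_cost_usd(200, 3):,.0f}")
    # Moving the whole 2.5 GB (2,500 MB) data set.
    print(f"2.5 GB cost:  ${transfer_cost_usd(2_500, 3):,.0f}")
    print(f"2.5 GB time:  {transfer_days(2_500, 40):.1f} days")
```

Under those assumptions the whole 2.5 GB set costs about $7,500 and takes just under six days, the same order of magnitude as the round figures in the comment; the 200 MB difference alone accounts for the $600 mentioned earlier.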
              • Re:Speed (Score:3, Funny)

                by Nutria (679911)
                When you're talking about GPRS, even transatlantic sneakernet would be faster (and cheaper).

                "Never underestimate the bandwidth of a stationwagon full of tapes."

                or the updated "Never underestimate the bandwidth of a 747 filled with DVDs".

                Or the even more updated "Never underestimate the bandwidth of a 747 filled with 500GB HDDs".
        • Re:Speed (Score:3, Informative)

          by Wolfrider (856)
          Yah, when I'm running backups and it has to Get Done in a reasonable amount of time with decent space savings, I use gzip -9. (My fastest computer is a 900MHz AMD Duron.)

          For quick backups, gzip; or gzip -6.

          For REALLY quick stuff, gzip -1.

          When I want the most space saved, I (rarely) use bzip2 because rar, while useful for splitting files and retaining recovery metadata, is far too slow for my taste 99% of the time.

          Really, disk space is so cheap these days that Getting the Backup Done is more important than sav
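The gzip-level trade-off described above can be tried from a script. A minimal sketch, assuming Python's standard `tarfile` module as a stand-in for a `tar | gzip -N` pipeline; its `compresslevel` argument uses the same 1 to 9 scale as `gzip -1` through `gzip -9`, and `make_tar_gz`/`read_member` are hypothetical helper names.

```python
import io
import tarfile

def make_tar_gz(files: dict[str, bytes], level: int) -> bytes:
    """Pack {name: contents} into an in-memory .tar.gz at gzip level 1-9."""
    buf = io.BytesIO()
    # "w:gz" writes a gzip-compressed tar; compresslevel mirrors gzip -1 ... -9.
    with tarfile.open(fileobj=buf, mode="w:gz", compresslevel=level) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def read_member(blob: bytes, name: str) -> bytes:
    """Read one file back out of a .tar.gz held in memory."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.extractfile(name).read()

if __name__ == "__main__":
    # Synthetic, log-like sample data (an assumption for illustration).
    log = "\n".join(f"2005-12-26 backup job {i} ok" for i in range(20_000)).encode()
    files = {"logs/app.log": log}
    for level in (1, 6, 9):
        print(f"gzip -{level}: {len(make_tar_gz(files, level)):,} bytes")
    # Always check that the backup actually restores.
    assert read_member(make_tar_gz(files, 9), "logs/app.log") == log
```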
          • If you need backup, go with bzip2. It also supports the -1 to -9 flags. AND it copes better with damage: it compresses in independent blocks, so bzip2recover can salvage the undamaged ones, while gzip has no such tool. One byte gone wrong and your gzip backup is toast.
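The difference in failure modes can be sketched in Python. This is an illustration, not bzip2's own mechanism: bzip2 compresses in independent blocks (100k to 900k, chosen by -1 to -9), and bzip2recover extracts the intact ones from a damaged file. The sketch imitates that idea with independent gzip members; the helper names and the 64 KiB chunk size are assumptions.

```python
import gzip
import zlib
from typing import Optional

def try_gunzip(blob: bytes) -> Optional[bytes]:
    """Decompress a gzip blob, returning None if it is damaged."""
    try:
        return gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):  # BadGzipFile is an OSError
        return None

def flip_byte(blob: bytes, pos: int) -> bytes:
    """Simulate media damage by inverting every bit of one byte."""
    damaged = bytearray(blob)
    damaged[pos] ^= 0xFF
    return bytes(damaged)

def gzip_in_chunks(data: bytes, chunk_size: int) -> list:
    """Compress data as independent gzip members, one per chunk."""
    return [gzip.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

if __name__ == "__main__":
    data = b"".join(b"line %d of the nightly backup\n" % i for i in range(20_000))

    # One stream: a single bad byte fails the CRC or the deflate decoder.
    whole = gzip.compress(data)
    damaged_whole = flip_byte(whole, len(whole) // 2)
    print("single stream:", "lost" if try_gunzip(damaged_whole) is None else "ok")

    # Independent members: the damage is confined to one chunk.
    chunks = gzip_in_chunks(data, 64 * 1024)
    chunks[1] = flip_byte(chunks[1], len(chunks[1]) // 2)
    recovered = [try_gunzip(c) for c in chunks]
    lost = sum(r is None for r in recovered)
    print(f"{len(chunks)} members: {lost} lost, {len(chunks) - lost} recovered")
```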
    • Re:Speed (Score:3, Insightful)

      by Anonymous Coward
      No talk of the speed of compression/decompression?

      Exactly! We compress -terabytes- here at wr0k, and we use gzip for -nearly- everything (some of the older scripts use "compress", .Z, etc.)

      Why? 'cause it's fast. 20% of space just isn't worth the time needed to compress/uncompress the data. I tried to be modern (and cool) by using bzip2; yes, it's great, saves lots of space, etc., but the time required to compress/uncompress is just not worth it. For example, if you need to compress/decompress 15-20 gigs per day, bzip
    • Re:Speed (Score:3, Insightful)

      by Arainach (906420)
      The Article Summary quoted is completely misleading. The most important graph is the final one on page 12, Compression Efficiency, where gzip is once again the obvious king. Sure, WinRK may be able to compress decently, but it takes an eternity to do so and is impractical for everyday use, which is where routines like gzip and ARJ32 come in: incredible compression for the speed at which they can operate. Besides, who really needs that last 54MB in these days of 4.7GB DVDs and 160GB hard drives?
      • Well, if you're right at the limit of a DVD's capacity, 54MB may matter.

        That said, chances are that in such situations you're better off figuring out a way to span multiple DVDs, especially given that while increasing compression might be enough for you today, you're likely to exceed the capacity of that single DVD soon no matter what compression technique you use.
  • by bigtallmofo (695287) on Monday December 26, 2005 @02:39PM (#14340369)
    For the most part, the summary of the article seems to be the more time that a compressing application takes to compress your files, the smaller your files will be after compressing.

    The one surprising thing I found in the article was that two virtually unknown contenders, WinRK and Squeez, did so well. One obvious follow-up question the article leaves unanswered is how better-known applications such as WinZip or WinRAR (which have more mass appeal) stack up against them with their configurable higher-compression options.

    • Speaking of unknown compression programs, does anyone remember OWS [faqs.org]?

      I had a good laugh at that one when I figured out how it worked, way back in the BBS days.
    • For the most part, the summary of the article seems to be the more time that a compressing application takes to compress your files, the smaller your files will be after compressing.

      Not only time, but also how much memory the algorithm uses, though the author did not mention how much memory each program needs. gzip, for instance, does not use much, but others, like rzip ( http://rzip.samba.org/ [samba.org]), use a lot. rzip may use up to 900MB during compression.

      I did a test with compressing a 4GB tar archive with

    • by Rich0 (548339) on Monday December 26, 2005 @03:22PM (#14340630) Homepage
      If you look at the methodology - all the results were obtained using the software set to the fastest mode - not the best compression mode.

      So, I would consider gzip the best performer by these criteria. After all, if I cared most about space savings I'd have picked the best mode, not the fast mode. All this article suggests is that a few archivers are REALLY lousy at FAST compression.

      If my requirements were realtime compression (maybe for streaming multimedia) then I wouldn't be bothered with some mega-compression algorithm that takes 2 minutes per MB to pack the data.

      Might I suggest a better test? If interested in best compression, then run each program in a mode which optimizes purely for compression ratio. On the other hand, if interested in realtime compression, then take each algorithm and tweak the parameters so that they all run in the same time (a relatively fast time), and then compare compression ratios.

      With the huge compression of multimedia files I'd also want the reviewers to state explicitly that the compression was verified to be lossless. I've never heard of some of these proprietary apps, but if they're getting significant ratios out of .wav and .mp3 files I'd want to do a binary compare of the restored files to ensure they weren't just run through a lossy codec...
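The equal-time-budget test proposed above could look roughly like this, with the three codecs in Python's standard library (zlib, bz2, lzma) standing in for the archivers in the article. A sketch only: `measure` and `best_within_budget` are hypothetical names, timings depend on the machine, and lzma presets 7 to 9 are left out because they need far more memory.

```python
import bz2
import lzma
import time
import zlib

# name -> (compress(data, level), levels to try). lzma presets 7-9 are
# skipped: their larger dictionaries need hundreds of megabytes of RAM.
CODECS = {
    "zlib": (lambda d, lvl: zlib.compress(d, lvl), range(1, 10)),
    "bz2": (lambda d, lvl: bz2.compress(d, lvl), range(1, 10)),
    "lzma": (lambda d, lvl: lzma.compress(d, preset=lvl), range(0, 7)),
}

def measure(data: bytes) -> list:
    """Return (codec, level, seconds, ratio) for every codec/level pair."""
    rows = []
    for name, (compress, levels) in CODECS.items():
        for level in levels:
            start = time.perf_counter()
            size = len(compress(data, level))
            rows.append((name, level, time.perf_counter() - start, size / len(data)))
    return rows

def best_within_budget(rows, budget_s: float) -> dict:
    """For each codec, the (level, ratio) with the lowest ratio within budget_s."""
    best = {}
    for name, level, secs, ratio in rows:
        if secs <= budget_s and (name not in best or ratio < best[name][1]):
            best[name] = (level, ratio)
    return best

if __name__ == "__main__":
    sample = b"".join(b"entry %d: status ok\n" % (i % 500) for i in range(20_000))
    rows = measure(sample)
    print("best mode:       ", best_within_budget(rows, float("inf")))
    print("within 5 ms each:", best_within_budget(rows, 0.005))
```

An infinite budget gives the "best compression mode" comparison; a small budget approximates the "same time, compare ratios" test.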
  • I always wanted to know how Compressia ( http://www.compressia.com/ [compressia.com] ) works. It uses some form of distance coding, but information about it is quite rare.
  • WinRK is excellent (Score:5, Interesting)

    by drsmack1 (698392) * on Monday December 26, 2005 @02:40PM (#14340376)
    Just downloaded it and I find that it compresses significantly better than winrar when both are set to maximum. Decompress is quite slow. I use it to compress a small collection of utilities.
  • Nice Comparison... (Score:5, Insightful)

    by Goo.cc (687626) * on Monday December 26, 2005 @02:43PM (#14340398)
    but I was surprised to see that the reviewer was using XP Professional Service Pack 1. I actually had to double check the review date to make sure that I wasn't reading an old article.

    I personally use 7-Zip. It doesn't perform the best but it is free software and it includes a command line component that is nice for shell scripts.
    • On UNIX systems at least the LZMA codec is excellent - it regularly achieves better ratios than bzip2, and is very fast to decompress. For many applications, decompression speed is more important than compression speed and the LZMA dictionary appears to fit inside the CPU cache, as it beats out bzip2 handily even though it's doing more work.

      There are better compressors out there, in particular PPM codecs can achieve spectacular ratios, but as they're very slow to both compress and decompress they're usefu

      • by Anonymous Coward
        I have ported ppmd to a nice pzip style utility and a pzlib style library. Find it at http://pzip.sf.net/ [sf.net]

        Speed is better than bzip2 and compression is top class, beaten only by 7zip and LZMA compressors (which require much more time and memory). The problem is that decompression runs at the same speed as compression, unlike bzip2/gzip/zip, where decompression is much faster.

        The review quoted above is totally useless because 7zip, for example, uses a 32KB dictionary. Given a 200MB dictionary it really start
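How much the dictionary (window) size matters depends on how far apart the repeats in the data are; it is also why the stronger settings cost memory, as noted for rzip above. A sketch using Python's `lzma` module, assuming LZMA2 preset 6 with only `dict_size` overridden. The test data is synthetic: a 256 KiB block repeated twice, so the repeat lies outside a 64 KiB window but inside a 1 MiB one. It does not reproduce the article's settings.

```python
import lzma
import random

def xz_size(data: bytes, dict_size: int) -> int:
    """Compressed size using LZMA2 preset 6 with an explicit dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))

if __name__ == "__main__":
    # Synthetic data: 256 KiB of seeded pseudo-random (incompressible) bytes, twice.
    block = random.Random(0).randbytes(256 << 10)
    data = block + block
    # The second copy starts 256 KiB after the first: out of reach of a
    # 64 KiB window, comfortably inside a 1 MiB one.
    print(f"64 KiB dictionary: {xz_size(data, 64 << 10):,} bytes")
    print(f"1 MiB dictionary:  {xz_size(data, 1 << 20):,} bytes")
```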
  • Windows only (Score:3, Interesting)

    by Jay Maynard (54798) on Monday December 26, 2005 @02:45PM (#14340407) Homepage
    It's a real shame that 1) the guy only did Windows archivers, and 2) SBC Archiver is no longer in active development, closed source, and Windows-only.
  • Actually (Score:5, Interesting)

    by Sterling Christensen (694675) on Monday December 26, 2005 @02:45PM (#14340413)
    WinRK may have won only because he used the fast compression setting on all the compressors he tested. Results for default setting and best compression settings are TBA.
  • by derek_farn (689539) <derek AT knosof DOT co DOT uk> on Monday December 26, 2005 @02:46PM (#14340420) Homepage
    There are some amazing compression programs out there, trouble is they tend to take a while and consume lots of memory. PAQ [fit.edu] gives some impressive results, but the latest benchmark figures [maximumcompression.com] are regularly improving. Let's not forget that compression is not good unless it is integrated into a usable tool. 7-zip [7-zip.org] seems to be the new archiver on the block at the moment. A closely related, but different, set of tools are the archivers [freebsd.org], of which there are lots with many older formats still not supported by open source tools
    • I've tried PAQ before and it can achieve good results, especially for text, but given the extremely slow nature of the algorithm I judged it not a good enough improvement over LZMA for the autopackage installers.

      Still, worth remembering, especially as these algorithms are being improved all the time.

  • by ahziem (661857) on Monday December 26, 2005 @02:48PM (#14340432) Homepage
    A key benefit of the PKZIP and tarball formats is that they will be accessible for decades or hundreds of years. These formats are open (non-proprietary), widely implemented, and free (as in freedom) software.

    The same can't be said for WinRK. Therefore, if you plan to want access to your data for a long period of time, you should carefully consider whether the format will be accessible.
  • Unix compressors (Score:5, Interesting)

    by brejc8 (223089) * on Monday December 26, 2005 @02:48PM (#14340434) Homepage Journal
    I did a short review and benchmarking of unix compressors [brej.org] people might be interested in.
    • Thanks for this; you helped me take the plunge and update my remote backup scripts ... They now take about 1/10th the time to transfer and space to store, all by changing gzip to 7z in 4 or 5 places!
    • What is up with this? "Also do note the lack of ace results as there are no Unix ace compressors." on your Compression times page, yet somehow you were able to do it on your Size page. Kind of funny isn't it?
      • I used winace to do the compression. The idea is to determine which format to distribute the files in and ace is still possible due to the linux decompressor.
      • No, not funny at all. Perfectly sensible.

        Size doesn't depend on the OS it was compressed on (generally - perhaps a small bit, at most). So he compressed it for size on Windows (or an OS with an ACE compressor).

        Speed, however, does depend on the OS it was compressed on. Much more than size, at any rate. So the results would have been skewed in one direction or the other, due to the OS.
        • So, it is perfectly sensible to include a non-UNIX compression utility in a UNIX compression utilities review? WinAce does have a decompressor for UNIX, but no compressor, therefore shouldn't it have been dropped completely for this review because of that? Because it is irrelevant to this review if there is no UNIX compressor for it and this is a UNIX compression review.

          Sorry for nit picking, but come on how can you go use WinACE on Windows to do the size compressions and then use all the other compressors
    • Why don't you add to the list the compressor (bzz) that ships with the djvu tools.
  • by mattkime (8466) on Monday December 26, 2005 @02:50PM (#14340446)
    Why mess around with compressing individual files? DiskDoubler is definitely the way to go. Hell, I even have it set up to automagically compress files I haven't used in a week.

    It's running perfectly fine on my Mac IIci.
  • Input type? (Score:3, Interesting)

    by reset_button (903303) on Monday December 26, 2005 @02:53PM (#14340458)
    Looks like the site got slashdotted while I was in the middle of reading it. What file types were used as input? Clearly compression algorithms differ on the file types that they work best on. Also, a better metric would probably have been space/time, rather than just using time. Also, I know that zlib, for example, allows you to choose the compression level - was this explored at all?

    Also, do any of you know any lossless algorithms for media (movies, images, music, etc)? Most algorithms perform poorly in this area, but I thought that perhaps there were some specifically designed for this.
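Lossless audio codecs such as FLAC work roughly this way: predict each sample from earlier ones, then entropy-code the small prediction error. The sketch below uses the crudest predictor (the previous sample) with zlib standing in for the entropy coder; it is not FLAC's actual format, and the sine-wave test signal is hypothetical.

```python
import math
import zlib
from array import array
from itertools import accumulate

def delta_encode(samples: list) -> list:
    """Predict each sample as the previous one and keep only the error."""
    return [cur - prev for prev, cur in zip([0] + samples[:-1], samples)]

def delta_decode(residuals: list) -> list:
    """Invert delta_encode exactly with a running sum."""
    return list(accumulate(residuals))

def pack16(values: list) -> bytes:
    """Pack values as signed 16-bit integers in native byte order."""
    return array("h", values).tobytes()

if __name__ == "__main__":
    # Hypothetical test signal: a slow 16-bit sine wave, 100,000 samples.
    samples = [round(10_000 * math.sin(i * 0.05)) for i in range(100_000)]
    residuals = delta_encode(samples)
    assert delta_decode(residuals) == samples  # lossless round trip
    print(f"raw samples + zlib:     {len(zlib.compress(pack16(samples), 9)):,} bytes")
    print(f"delta residuals + zlib: {len(zlib.compress(pack16(residuals), 9)):,} bytes")
```

The residuals of a smooth signal are small numbers whose high bytes take only a handful of values, which is what the generic back end can exploit.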
    • Re:Input type? (Score:3, Interesting)

      by bigbigbison (104532)
      According to Maximum Compression [maximumcompression.com], which is basically the best site for compression testing, Stuffit's [stuffit.com] new version is the best for lossless jpeg compression. I've got it and I can confirm that it does a much better job on jpegs than anything else I've tried. Unfortunately, it is only effective on jpegs not gifs, pngs, or even pdfs which seem to use jpeg compression. And, outside of the mac world, it is kind of rare.
  • by canuck57 (662392) on Monday December 26, 2005 @02:54PM (#14340470)

    I generally prefer gzip/7-Zip.

    The reasoning is simple, I can use the results cross platform without special costly software. A few extra bytes of space is secondary.

    For many files, I also find buying a larger disk a cheaper option than spending hours compressing/uncompressing files. So I generally only compress files I don't think I will need that are very compressible.

    • haha, yeah, 7-zip isn't 'weird' at all. I like how you try to make it sound like it's just as pervasive as something like gzip, even though 7-zip's a pretty much unknown format.
        • Yeah, when compressing files, I'm basically limited to .zip for most people, because WinXP will handle that. For the savvy, I might get to use .rar for a little better compression.

        Has anyone heard of WinUHA yet? That is supposed to be pretty good, and I'd not mind testing out other archivers, as long as the time savings on transferring smaller files aren't overtaken by the compression/decompression time. Though, again, all these things are useless if no one can uncompress them.
      • 7-zip is the 16th most popular download on SourceForge (8544268 downloads so far), and it gets downloaded about 18000 times per day, so it must be going somewhere in terms of popularity.
  • small mistake (Score:5, Interesting)

    by ltwally (313043) on Monday December 26, 2005 @04:01PM (#14340827) Homepage Journal
    There is a small mistake on page 3 [rojakpot.com] of the article, in the first table: WinZip no longer offers free upgrades. If you have a serial for an older version (1-9), that serial will only work on the older versions. You need a new serial for v10.0, and that serial will not work when v11.0 comes out.

    Since WinZip does not handle .7z, .ace or .rar files, it has lost much of its appeal for me. With my old serial no longer working, I now have absolutely no reason to use it. Now when I need a compressor for Windows I choose WinAce & 7-Zip. Between those two programs, I can de-/compress just about any format you're likely to encounter online.

  • by Anonymous Coward
    I always compress my compressed files over and over until I achieve absolute 0Kb.
    I carry all data of my entire serverfarm like that on a 128Mb USB-stick.
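The joke works because of the pigeonhole principle: no lossless compressor can shrink every input, and data that has already been compressed looks close to random. A quick sketch with zlib; the text built from randomly chosen words is an assumption for illustration.

```python
import random
import zlib

def recompress_sizes(data: bytes, passes: int) -> list:
    """Sizes after 0, 1, ..., passes rounds of feeding zlib its own output."""
    sizes = [len(data)]
    for _ in range(passes):
        data = zlib.compress(data, 9)
        sizes.append(len(data))
    return sizes

if __name__ == "__main__":
    rng = random.Random(1)
    words = ["backup", "archive", "tarball", "gzip", "bzip2", "winrk", "squeez", "zip"]
    text = " ".join(rng.choice(words) for _ in range(50_000)).encode()
    print(recompress_sizes(text, 5))
```

The first pass removes most of the redundancy; later passes stay roughly flat or grow slightly, the same reason generic compressors gain little on JPEG or MP3 files.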
  • Nothing to see here (Score:5, Informative)

    by Anonymous Coward on Monday December 26, 2005 @04:15PM (#14340892)

    I can't believe TFA made /. The only thing more defective than the benchmark data set (Hint: who cares how much a generic compressor can save on JPEGs?) is the absolutely hilarious part where the author just took "fastest" for each compressor and then tried to compare the compression. Indeed, StuffIt did what I consider the only sensible thing for "fastest" in an archiver, which is to just not even try to compress content that is unlikely to get significant savings. Oddly, the list for fastest compression is almost exactly the reverse of the list for best compression on every test. The "efficiency" is a metric that illuminates nothing. An ROC plot of rate vs compression for each test would have been a good idea; better would be to build ROC curves for each compressor, but I don't see that happening anytime soon.

    I wouldn't try to draw any conclusions from this "study". Given the methodology, I wouldn't wait with bated breath for parts two and three of the study, where the author actually promises to try to set up the compressors for reasonable compression, either.

    Ouch.
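Instead of a single "efficiency" number, each compressor and setting can be treated as a (time, size) point. The useful settings are the ones no other setting beats on both axes, i.e. the Pareto frontier of that trade-off curve. A minimal sketch; `Result` and `pareto_frontier` are hypothetical names, and the demo only sweeps zlib levels on synthetic data.

```python
import time
import zlib
from typing import NamedTuple

class Result(NamedTuple):
    name: str       # label chosen by the caller, e.g. "zlib -6"
    seconds: float  # time spent compressing
    size: int       # compressed size in bytes

def pareto_frontier(results: list) -> list:
    """Keep results not beaten on both time and size by any other result.

    Walk the results from fastest to slowest; a result survives only if it
    is strictly smaller than everything at least as fast as it.
    """
    frontier = []
    best_size = float("inf")
    for r in sorted(results, key=lambda r: (r.seconds, r.size)):
        if r.size < best_size:
            frontier.append(r)
            best_size = r.size
    return frontier

if __name__ == "__main__":
    data = b"".join(b"GET /index.html 200 %d\n" % (i % 997) for i in range(50_000))
    runs = []
    for level in range(1, 10):
        start = time.perf_counter()
        size = len(zlib.compress(data, level))
        runs.append(Result(f"zlib -{level}", time.perf_counter() - start, size))
    for r in pareto_frontier(runs):
        print(f"{r.name}: {r.seconds * 1000:.1f} ms, {r.size:,} bytes")
```

Plotting the frontier for each program, with measurements from every compressor in the test, gives the per-compressor curves the comment asks for.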

  • by bigbigbison (104532) on Monday December 26, 2005 @04:18PM (#14340900) Homepage
    Since the original site seems to be really slow and split into a billion pages, those who aren't aware of it might want to look at MaximumCompression [maximumcompression.com] since it has tests for several file formats and also has a multiple file compression test that is sorted by efficiency [maximumcompression.com]. A program called SBC [netfirms.com] does the best, but the much more common WinRAR [rarlab.com] comes in a respectable third.
  • by Karma Farmer (595141) on Monday December 26, 2005 @04:21PM (#14340918)
    The "related links" box for this story is horribly broken. Instead of being links related to the story, it's a bunch of advertising. I'm sure this was a mistake or a bug in slashcode itself.

    I've searched the FAQ, but I can't figure out how to contact slashdot admins. Does anyone know an email address or telephone number I can use to contact them about this serious problem? I'm sure they'll want to fix it as quickly as possible.
  • by Mr.Ned (79679) on Monday December 26, 2005 @04:24PM (#14340936)
    http://rzip.samba.org/ [samba.org] is a phenomenal compressor. It does much better than bzip2 or rar on large files and is open source.
  • Decompression Speed (Score:4, Interesting)

    by Hamfist (311248) on Monday December 26, 2005 @04:42PM (#14341034)
    Interesting that the article talks about compression ratio and compression speed. When considering compression, decompression time is extremely relevant. I don't mind waiting longer to compress the fileset, as long as decompression is fast. I normally compress once, and then decompress many times (media files and games, for example).
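Measuring the two directions separately is straightforward. A sketch using Python's built-in zlib, bz2 and lzma modules as stand-ins for the article's archivers; the chosen levels, the synthetic sample and the helper names are assumptions, and absolute numbers will vary by machine.

```python
import bz2
import lzma
import time
import zlib

# Compressor/decompressor pairs at each codec's strong setting.
CODECS = {
    "zlib -9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2 -9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma -6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}

def timed(fn, arg):
    """Call fn(arg) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start

def compress_vs_decompress(data: bytes) -> dict:
    """Map codec name -> (compress seconds, decompress seconds)."""
    report = {}
    for name, (compress, decompress) in CODECS.items():
        packed, t_comp = timed(compress, data)
        restored, t_decomp = timed(decompress, packed)
        if restored != data:
            raise ValueError(f"{name} did not round-trip")
        report[name] = (t_comp, t_decomp)
    return report

if __name__ == "__main__":
    sample = b"".join(b"frame %d texture tile %d\n" % (i, i % 64) for i in range(100_000))
    for name, (c, d) in compress_vs_decompress(sample).items():
        print(f"{name}: compress {c:.3f}s, decompress {d:.3f}s")
```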
  • Unicode support? (Score:3, Informative)

    by icydog (923695) on Monday December 26, 2005 @05:01PM (#14341105) Homepage
    Is there any mention made about unicode support? I know that WinZip is out of the question for me because I can't compress anything with Chinese filenames with it. They'll either not work at all, or become compressed but the filenames will turn into garbage. Even though the data stays intact, it doesn't help much if it's a binary and has no intelligible filename.

    I've been using 7-Zip for this reason, and also because it compresses well while also working on Windows and Linux.
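The ZIP format has a flag for this: general-purpose bit 11 marks an entry's file name as UTF-8. Python's `zipfile` sets it automatically for names that are not plain ASCII, as this sketch shows; whether a particular Windows archiver honours the flag is a separate question, and the helper names are hypothetical.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11: "file name is UTF-8"

def zip_with_names(names: list) -> bytes:
    """Build an in-memory ZIP containing one small text file per name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.writestr(name, f"contents of {name}\n")
    return buf.getvalue()

def list_names(blob: bytes) -> list:
    """Return (name, utf8_flag_set) for each entry in the archive."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return [(i.filename, bool(i.flag_bits & UTF8_NAME_FLAG))
                for i in zf.infolist()]

if __name__ == "__main__":
    for name, utf8 in list_names(zip_with_names(["readme.txt", "中文文件名.txt"])):
        print(f"{name}: UTF-8 flag {'set' if utf8 else 'not set'}")
```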
  • by Grimwiz (28623)
    A suitable level of paranoia would suggest that it would be good to decompress the compressed files and verify that they produce the identical dataset. I did not see this step in the overview.
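A sketch of that verification step, assuming SHA-256 digests are an acceptable stand-in for a byte-by-byte compare: record a manifest of the source tree, extract the archive somewhere else, and diff the two manifests. `manifest` and `differences` are hypothetical helpers; the demo uses `shutil.make_archive` with the xz-compressed tar format, but any archiver's output can be checked the same way after extraction.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files need little RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def manifest(root: Path) -> dict:
    """Map each regular file's path (relative to root) to its digest."""
    return {p.relative_to(root).as_posix(): file_digest(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def differences(before: dict, after: dict) -> list:
    """Names that are missing, extra, or changed between two manifests."""
    return sorted(n for n in before.keys() | after.keys()
                  if before.get(n) != after.get(n))

if __name__ == "__main__":
    import shutil
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src")
        src.mkdir()
        (src / "notes.txt").write_text("hello\n" * 1000)
        (src / "data.bin").write_bytes(bytes(range(256)) * 100)
        # Any archiver works here; an xz-compressed tar is just one choice.
        archive = shutil.make_archive(str(Path(tmp, "backup")), "xztar", root_dir=src)
        restored = Path(tmp, "restored")
        shutil.unpack_archive(archive, str(restored))
        print("mismatches:", differences(manifest(src), manifest(restored)))
```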
  • by cbreaker (561297) on Monday December 26, 2005 @05:10PM (#14341141) Journal
    All I see is ads. I think I found a paragraph that looked like it may have been the article, but every other word was underlined with an ad-link, so I didn't think that was it.
  • JPG compression (Score:5, Interesting)

    by The Famous Druid (89404) on Monday December 26, 2005 @05:15PM (#14341160)
    It's interesting to note that Stuffit produces worthwhile compression of JPG images, something long thought to be impossible.
    I'd heard the makers of Stuffit were claiming this, but I was sceptical; it's good to see independent confirmation.
  • by EdMcMan (70171) <moo.slashdot2.z.edmcman@xoxy.net> on Monday December 26, 2005 @05:36PM (#14341252) Homepage Journal
    It's a crime that the submitter didn't mention this was with the fastest compression settings.
  • by Master of Transhuman (597628) on Monday December 26, 2005 @05:39PM (#14341263) Homepage

    Proprietary, costs money...

    I use ZipGenius - handles 20 compression formats including RAR, ACE, JAR, TAR, GZ, BZ, ARJ, CAB, LHA, LZH, RPM, 7-Zip, OpenOffice/StarOffice Zip files, UPX, etc.

    You can encrypt files with one of four algorithms (CZIP, Blowfish, Twofish, Rijndael AES).

    If you set an antivirus path in ZipGenius options, the program will prompt you to perform an AV scan before running the selected file.

    It has an FTP client, TWAIN device image importing, file splitting, convert RAR into SFX, converts any Zip archive into an ISO image file, etc.

    And it's totally free.
  • by Dwedit (232252) on Monday December 26, 2005 @06:04PM (#14341355) Homepage
    They are testing 7-zip at the FAST setting, which does a poor job compared to the BEST setting.
    • Same for ANY other program involved.

      Not to mention that some programs differ a LOT between fastest and slowest and some don't...

      It's just bullshit.
      Same for his example data: nearly EVERYTHING there was already compressed inside the file container... who the fuck wants to save space by compressing video or jpgs?

      A real field would be stuff where compression actually saves something, like log files. A look at maximumcompression tells me that there are programs that can compress apache logs to less than the half
  • by BigFoot48 (726201) on Monday December 26, 2005 @06:29PM (#14341476)
    While we're discussing compression and PKZip, I thought a little reminder of who started it all, and who died before his time, may be in order.

    Phillip W. Katz, better known as Phil Katz (November 3, 1962-April 14, 2000), was a computer programmer best-known as the author of PKZIP, a program for compressing files which ran under the PC operating system DOS.

    http://en.wikipedia.org/wiki/Phil_Katz [wikipedia.org]

  • by dr_skipper (581180) on Monday December 26, 2005 @08:38PM (#14342016)
    This is sad. Over and over slashdot is posting stories with nothing more than some lame tech review and dozens of ads. I really believe people are generating sites with crap technical content, packing them with ads, and submitting to slashdot hoping to win the impression/click lottery.

    Please, editors, check the sites out first. If it's 90% ads and impossible to navigate without clicking ads accidentally, it's just some loser's cash-grab site.
  • by hazem (472289) on Monday December 26, 2005 @10:10PM (#14342331) Journal
    I know not many of you actually RTFA, but that article was so damned annoying. There's a table in there; think it's to compare compression schemes? Nope, it's for processors. There are red links... article related? Nope, ad links. Blue underlined links? Yup, more ads.

    What a steaming pile of shit. Happy new year.

"In matters of principle, stand like a rock; in matters of taste, swim with the current." -- Thomas Jefferson
