A Look at Data Compression 252

With the new year fast approaching, many of us face the unenviable task of backing up last year's data to make room for more of the same. With that in mind, rojakpot has taken a look at some of the data compression programs available and has a few insights that may help when looking for the best fit. From the article: "The best compressor of the aggregated fileset was, unsurprisingly, WinRK. It saved over 54MB more than its nearest competitor - Squeez. But both Squeez and SBC Archiver did very well, compared to the other compressors. The worst compressors were gzip and WinZip. Both compressors failed to save even 200MB of space in the aggregated results."
This discussion has been archived. No new comments can be posted.

  • WinRK is excellent (Score:5, Interesting)

    by drsmack1 ( 698392 ) * on Monday December 26, 2005 @02:40PM (#14340376)
    Just downloaded it, and I find that it compresses significantly better than WinRAR when both are set to maximum. Decompression is quite slow. I use it to compress a small collection of utilities.
  • Windows only (Score:3, Interesting)

    by Jay Maynard ( 54798 ) on Monday December 26, 2005 @02:45PM (#14340407) Homepage
    It's a real shame that 1) the guy only did Windows archivers, and 2) SBC Archiver is no longer in active development, closed source, and Windows-only.
  • Actually (Score:5, Interesting)

    by Sterling Christensen ( 694675 ) on Monday December 26, 2005 @02:45PM (#14340413)
    WinRK may have won only because he used the fast compression setting on all the compressors he tested. Results for default setting and best compression settings are TBA.
  • Unix compressors (Score:5, Interesting)

    by brejc8 ( 223089 ) * on Monday December 26, 2005 @02:48PM (#14340434) Homepage Journal
    I did a short review and benchmarking of unix compressors [brej.org] people might be interested in.
  • by mosel-saar-ruwer ( 732341 ) on Monday December 26, 2005 @02:52PM (#14340453)
    No talk of the speed of compression/decompression?

    Speed aside [and speed would be a huge concern if you insisted on compression], I just don't understand the desire for compression in the first place.

    As the administrator, your fundamental obligation is data integrity. If you compress, and if the compressed file store is damaged [especially if the header information on a compressed file - or files - is damaged], then you will tend to lose ALL of your data.

    On the other hand, if your file store is ASCII/ANSI text, then even if file headers are damaged, you can still read the raw disk sectors and recover most of your data [might take a while, but at least it's theoretically do-able].

    In this day and age, when magnetic storage is like $0.50 to $0.75 per GIGABYTE, I just can't fathom why a responsible admin would risk the possible data corruption that could come with compression.

  • Input type? (Score:3, Interesting)

    by reset_button ( 903303 ) on Monday December 26, 2005 @02:53PM (#14340458)
    Looks like the site got slashdotted while I was in the middle of reading it. What file types were used as input? Clearly compression algorithms differ in which file types they work best on. Also, a better metric would probably have been a combination of space and time, rather than time alone. Also, I know that zlib, for example, allows you to choose the compression level - was this explored at all?

    Also, do any of you know any lossless algorithms for media (movies, images, music, etc)? Most algorithms perform poorly in this area, but I thought that perhaps there were some specifically designed for this.
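
    As a quick illustration of the zlib compression-level question above, here is a minimal Python sketch (the input file name is a placeholder) showing how the level parameter trades speed against size:

    ```python
    import time
    import zlib

    # Hypothetical input: any reasonably large, compressible file will do.
    with open("sample.tar", "rb") as f:
        data = f.read()

    # zlib accepts levels 0 (store) through 9 (best compression); 6 is the default.
    for level in (1, 6, 9):
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        print(f"level {level}: {len(packed):,} bytes "
              f"({len(packed) / len(data):.1%} of original) in {elapsed:.2f}s")
    ```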
  • by Ironsides ( 739422 ) on Monday December 26, 2005 @03:03PM (#14340525) Homepage Journal
    In this day and age, when magnetic storage is like $0.50 to $0.75 per GIGABYTE, I just can't fathom why a responsible admin would risk the possible data corruption that could come with compression.

    Because when you are storing Petabytes of information it makes a difference in cost.

    Besides, all the problems you mention with data corruption can be solved by backing up the information more than once. Any place that puts a high value on its info is going to have multiple backups in multiple places anyway. The most useful application of compression is in archiving old customer records. Being mostly text, those can easily reach compression ratios above 50%. Also, they are going to be backed up to tape (not disk). Being able to reduce the volume of tapes being stored by 50% can save a lot of money for a large organization.
  • by undeadly ( 941339 ) on Monday December 26, 2005 @03:17PM (#14340595)
    For the most part, the summary of the article seems to be the more time that a compressing application takes to compress your files, the smaller your files will be after compressing.

    Not only time, but also how much memory the algorithm uses, though the author did not mention how much memory each one needs. gzip, for instance, does not use much, but others, like rzip (http://rzip.samba.org/ [samba.org]), use a lot - rzip may use up to 900MB during compression.

    I did a test compressing a 4GB tar archive with rzip, which resulted in a compressed file of 2.1GB. gzip at max compression gave about 2.7GB.

    So one should choose an algorithm based on need and, of course, on the availability of source code. Using a proprietary, closed-source compression algorithm with no open-source alternative implementation is begging for trouble down the road.
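
    For reference, a minimal Python sketch of the gzip side of that comparison, streaming the archive in chunks (as `gzip -9` would) so a multi-gigabyte tar never has to fit in memory; the file names below are placeholders:

    ```python
    import gzip
    import shutil

    SRC = "backup.tar"      # placeholder for a large (e.g. 4GB) tar archive
    DST = "backup.tar.gz"

    # Equivalent of `gzip -9`: stream in fixed-size chunks via copyfileobj, so memory
    # use stays small no matter how big the archive is -- unlike compressors such as
    # rzip, which hold very large match windows in RAM.
    with open(SRC, "rb") as fin, gzip.open(DST, "wb", compresslevel=9) as fout:
        shutil.copyfileobj(fin, fout)
    ```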

  • by ArbitraryConstant ( 763964 ) on Monday December 26, 2005 @03:21PM (#14340619) Homepage
    "My concern with all the 'new' compression programs is that they, unlike Zip, haven't survived the test of time. I've recovered damaged zip archives in the past and they have come through mostly intact. I've used archive/compression like ARJ with options to be able to recover data even if there are multiple bad sectors on a harddrive or floppy disk. How many of the new compression programs have the tools available to adequately recover every possible byte of data?"

    The solution to this issue is popular on usenet, since it's common for large files to be damaged. There's a utility called par2 that allows recovery information to be sent, and it's extremely effective. It's format-neutral, but most large binaries are sent as multi-part RAR archives. par2 can handle just about any damage that occurs, up to and including missing files.

    Most of the time, however, when it's simply someone downloading something, it is only necessary to detect damage so they can download it again. All the formats I have experience with can detect damage, and it's common for MD5 and SHA1 sums to be sent separately anyway for security reasons.
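
    A minimal sketch of that damage-detection step, assuming a checksum published alongside the download; the file name and digest below are placeholders:

    ```python
    import hashlib

    def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
        """Hash a file in chunks so even a huge download never sits in memory."""
        h = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # Placeholders: compare against the checksum published with the download.
    EXPECTED = "da39a3ee5e6b4b0d3255bfef95601890afd80709"
    if file_digest("big-download.part01.rar") != EXPECTED:
        print("Checksum mismatch -- the file is damaged; download it again.")
    ```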
  • by Rich0 ( 548339 ) on Monday December 26, 2005 @03:22PM (#14340630) Homepage
    If you look at the methodology, all the results were obtained with the software set to its fastest mode - not its best compression mode.

    So by that criterion I would consider gzip the best performer. After all, if I cared most about space savings I'd have picked the best mode, not the fast mode. All this article suggests is that a few archivers are REALLY lousy at doing FAST compression.

    If my requirements were realtime compression (maybe for streaming multimedia), then I wouldn't bother with some mega-compression algorithm that takes 2 minutes per MB to pack the data.

    Might I suggest a better test? If interested in best compression, run each program in the mode that optimizes purely for compression ratio. On the other hand, if interested in realtime compression, tweak each algorithm's parameters so that they all run in the same (relatively fast) time, and then compare compression ratios.

    With the huge compression of multimedia files, I'd also want the reviewers to state explicitly that the compression was verified to be lossless. I've never heard of some of these proprietary apps, and if they're getting significant ratios out of .wav and .mp3 files, I'd want to do a binary compare of the restored files to ensure they weren't just run through a lossy codec...
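
    A rough sketch of that lossless-verification step, using Python's gzip module as a stand-in for whatever compressor is under test; the file names are placeholders, and the point is the byte-for-byte comparison of the restored output:

    ```python
    import filecmp
    import gzip
    import shutil

    ORIGINAL = "concert.wav"              # placeholder multimedia test file
    RESTORED = "concert.restored.wav"

    # Round-trip the file through the compressor under test (gzip as a stand-in),
    # then compare the restored copy against the original byte for byte.
    with open(ORIGINAL, "rb") as fin, gzip.open("concert.wav.gz", "wb") as fout:
        shutil.copyfileobj(fin, fout)
    with gzip.open("concert.wav.gz", "rb") as fin, open(RESTORED, "wb") as fout:
        shutil.copyfileobj(fin, fout)

    # shallow=False forces a full byte-for-byte comparison instead of a stat() check.
    if filecmp.cmp(ORIGINAL, RESTORED, shallow=False):
        print("Restored file is bit-identical: the compression was lossless.")
    else:
        print("Mismatch: the data was altered (e.g. by a lossy codec).")
    ```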
  • by LWATCDR ( 28044 ) on Monday December 26, 2005 @04:00PM (#14340817) Homepage Journal
    "As the administrator, your fundamental obligation is data integrity. If you compress, and if the compressed file store is damaged [especially if the header information on a compressed file - or files - is damaged], then you will tend to lose ALL of your data."
    Not all data is stored as ASCII and/or ANSI text. Compressing the data can make it more secure, not less:
    1. It takes up fewer sectors of a drive, so it is less likely to get corrupted.
    2. The archive can contain extra data to recover from bad bits.
    3. It allows you to make redundant copies without using any more storage space.
    Let's say you have some ASCII files you want to store. Using just about any compression method, you can probably store 3 copies of the files in the same amount of disk space.
    You are far more likely to recover a full data set from three copies of a compressed file than from one copy of an uncompressed file.

    Also, we do not have unlimited bandwidth and unlimited storage EVERYWHERE. Lossless video, image, and audio files take up a lot of space, and for some applications MP3, Ogg, MPG, and JPEG just don't cut it.
    So yes, compression is still important.
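
    A rough sketch of the arithmetic behind point 3, assuming (as the comment does) that typical text compresses to well under a third of its original size; the file name is a placeholder:

    ```python
    import zlib

    # Placeholder: a plain-text customer-records file.
    with open("records.txt", "rb") as f:
        original = f.read()

    packed = zlib.compress(original, 9)
    ratio = len(packed) / len(original)

    # If text compresses to under ~33% of its size, three independent compressed
    # copies fit in the footprint of a single uncompressed copy.
    copies = len(original) // len(packed)
    print(f"compressed to {ratio:.0%} of original; "
          f"{copies} compressed copies fit in the uncompressed footprint")
    ```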
  • small mistake (Score:5, Interesting)

    by ltwally ( 313043 ) on Monday December 26, 2005 @04:01PM (#14340827) Homepage Journal
    There is a small mistake on page 3 [rojakpot.com] of the article, in the first table: WinZip no longer offers free upgrades. If you have a serial for an older version (1-9), that serial will only work on the older versions. You need a new serial for v10.0, and that serial will not work when v11.0 comes out.

    Since WinZip does not handle .7z, .ace or .rar files, it has lost much of its appeal for me. With my old serial no longer working, I now have absolutely no reason to use it. Now when I need a compressor for Windows I choose WinAce and 7-Zip. Between those two programs, I can compress and decompress just about any format you're likely to encounter online.

  • Re:Speed (Score:3, Interesting)

    by Karma Farmer ( 595141 ) on Monday December 26, 2005 @04:01PM (#14340828)
    3 hours 47 minutes with WinRK versus gzipping in 3 minutes 16 seconds. Is it really worth watching the progress bar for 200 megs smaller file?

    If your file starts out at 250MB, it might be worth it. However, if you start with a 2.5GB file, then it's almost certainly not -- especially once you take the closed-source and undocumented nature of the compression algorithm into account.

    /not surprisingly, the article is about 2.5GB files
  • by Anonymous Coward on Monday December 26, 2005 @04:02PM (#14340833)
    Bah, speak for yourself. In this day when everyone has a 100 Mbit connection (at least pretty much everyone in this country; hint: not the USA), compressed content is honestly more of a hassle than a help. For instance, when I'm downloading your latest movie on DVD-R, it's usually packed in RARs, saving a few hundred MB at best. But who cares? When my download speed is pretty much limited by my hard drive, I'd rather spend the extra 10 seconds to get everything uncompressed instead of having to wait 10 minutes to unpack the damn thing.
  • Re:Input type? (Score:3, Interesting)

    by bigbigbison ( 104532 ) on Monday December 26, 2005 @04:04PM (#14340844) Homepage
    According to Maximum Compression [maximumcompression.com], which is basically the best site for compression testing, Stuffit's [stuffit.com] new version is the best for lossless JPEG compression. I've got it and I can confirm that it does a much better job on JPEGs than anything else I've tried. Unfortunately, it is only effective on JPEGs - not GIFs, PNGs, or even PDFs, which seem to use JPEG compression. And, outside of the Mac world, it is kind of rare.
  • by Mr.Ned ( 79679 ) on Monday December 26, 2005 @04:24PM (#14340936)
    http://rzip.samba.org/ [samba.org] is a phenomenal compressor. It does much better than bzip2 or rar on large files and is open source.
  • Decompression Speed (Score:4, Interesting)

    by Hamfist ( 311248 ) on Monday December 26, 2005 @04:42PM (#14341034)
    Interesting that the article talks about compression ratio and compression speed. When considering compression, decompression time is extremely relevant. I don't mind waiting longer to compress the fileset, as long as decompression is fast. I normally compress once and then decompress several times (media files and games, for example).
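
    A minimal sketch of that compress-once, decompress-many pattern using zlib (the asset-bundle file name is a placeholder), showing why decompression time is the cost that gets paid repeatedly:

    ```python
    import time
    import zlib

    with open("game-assets.pak", "rb") as f:    # placeholder compress-once asset bundle
        data = f.read()

    # Pay the compression cost once, up front.
    start = time.perf_counter()
    packed = zlib.compress(data, 9)
    compress_time = time.perf_counter() - start

    # ...then decompress it many times; this is the cost users keep paying.
    start = time.perf_counter()
    for _ in range(10):
        zlib.decompress(packed)
    decompress_time = (time.perf_counter() - start) / 10

    print(f"compress once: {compress_time:.2f}s; decompress: {decompress_time:.2f}s per run")
    ```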
  • JPG compression (Score:5, Interesting)

    by The Famous Druid ( 89404 ) on Monday December 26, 2005 @05:15PM (#14341160)
    It's interesting to note that Stuffit produces worthwhile compression of JPG images, something long thought to be impossible.
    I'd heard the makers of Stuffit were claiming this, but I was sceptical; it's good to see independent confirmation.
  • by Master of Transhuman ( 597628 ) on Monday December 26, 2005 @05:39PM (#14341263) Homepage

    Proprietary, costs money...

    I use ZipGenius - it handles 20 compression formats including RAR, ACE, JAR, TAR, GZ, BZ, ARJ, CAB, LHA, LZH, RPM, 7-Zip, OpenOffice/StarOffice Zip files, UPX, etc.

    You can encrypt files with one of four algorithms (CZIP, Blowfish, Twofish, Rijndael AES).

    If you set an antivirus path in ZipGenius options, the program will prompt you to perform an AV scan before running the selected file.

    It has an FTP client, TWAIN device image importing, file splitting, RAR-to-SFX conversion, conversion of any Zip archive into an ISO image file, etc.

    And it's totally free.
  • Re:Speed (Score:5, Interesting)

    by moro_666 ( 414422 ) <kulminaator@gmai...> on Monday December 26, 2005 @05:39PM (#14341269) Homepage
    If you download a file over GPRS and each megabyte costs you $3, then saving 200 megabytes means saving $600, which is the price of a low-end PC or almost a laptop.

    Another case is when you only have 100 megabytes to work with and only a zzzxxxyyy archiver can squeeze the data into that 100MB, while gzip -9 leaves you with 102MB.

    So it really depends on whether you need it or not. Sometimes you need it; mostly you don't.

    But bashing the issue as if "nobody ever needs it" is certainly wrong.
  • by BigFoot48 ( 726201 ) on Monday December 26, 2005 @06:29PM (#14341476)
    While we're discussing compression and PKZip, I thought a little reminder of who started it all, and who died before his time, may be in order.

    Phillip W. Katz, better known as Phil Katz (November 3, 1962-April 14, 2000), was a computer programmer best-known as the author of PKZIP, a program for compressing files which ran under the PC operating system DOS.

    http://en.wikipedia.org/wiki/Phil_Katz [wikipedia.org]

  • by Anonymous Coward on Monday December 26, 2005 @06:47PM (#14341549)
    I have ported ppmd to a nice pzip style utility and a pzlib style library. Find it at http://pzip.sf.net/ [sf.net]

    Speed is better than bzip2 and compression is top class, beaten only by 7zip and the LZMA compressors (which require much more time and memory). The problem is that decompression runs at the same speed as compression, unlike bzip2/gzip/zip, where decompression is much faster.

    The review quoted above is totally useless because 7zip, for example, uses only a 32Kb dictionary there. Given a 200Mb dictionary it really starts to perform quite well! I would not be surprised if 7zip came out the winner there, given a better compression parameter.

  • by hobuddy ( 253368 ) on Monday December 26, 2005 @09:14PM (#14342135)
    7-Zip is the 16th most popular download on SourceForge (8,544,268 downloads so far), and it gets downloaded about 18,000 times per day, so it is clearly going places in terms of popularity.
  • by chronicon ( 625367 ) on Monday December 26, 2005 @11:47PM (#14342728) Homepage
    Speaking of Comparisons (Score:-1, Redundant)

    I knew I had seen this story before, but it wasn't here. This article was up on Digg three days ago [digg.com] - with only three Diggs to its name (at the time of this writing) - but it's front-page news here? Interesting, to say the least...

    I predict that this Digg [digg.com] will become front-page Slashdot news shortly. It was quite popular (914 Diggs so far) and it has hit the three-day mark...

    I know this is all rather OT, but it's no worse than whining about duplicate postings here...

    Oh, the irony here is just too much to take without laughing! My comment gets hammered with the REDUNDANT moderation when I point out that /. is being REDUNDANT in posting old Diggs? Man, it just doesn't get any better than this for making a point.

    Moderators: did you catch the not-so-subtle play I made here by quoting ALL of my original message? In case you didn't, I'm being REDUNDANTLY sarcastic...

    Enjoy!

"More software projects have gone awry for lack of calendar time than for all other causes combined." -- Fred Brooks, Jr., _The Mythical Man Month_

Working...