Google Graphics Open Source Software Technology

Google Publishes Zopfli, an Open-Source Compression Library 124

Posted by Soulskill
from the use-it-on-the-interweb dept.
alphadogg writes "Google is open-sourcing a new general purpose data compression library called Zopfli that can be used to speed up Web downloads. The Zopfli Compression Algorithm, which got its name from a Swiss bread recipe, is an implementation of the Deflate compression algorithm that creates a smaller output size (PDF) compared to previous techniques, wrote Lode Vandevenne, a software engineer with Google's Compression Team, on the Google Open Source Blog on Thursday. 'The smaller compressed size allows for better space utilization, faster data transmission, and lower Web page load latencies. Furthermore, the smaller compressed size has additional benefits in mobile use, such as lower data transfer fees and reduced battery use,' Vandevenne wrote. The more exhaustive compression techniques achieve higher data density, but also make the compression a lot slower. This does not affect the decompression speed though, Vandevenne wrote."


  • Overhyped (Score:4, Informative)

    by Anonymous Coward on Friday March 01, 2013 @05:00PM (#43049073)

    This team is clearly just trying to make a name for themselves. It improves over gzip by a mere 3% or so, but takes an order of magnitude longer to compress.

    Their underlying implementation might be cool research. But its practical merit is virtually nil.

    Now, cue the people who are going to do some basic arithmetic to "prove" me wrong, yet who probably don't even bother using gzip content-encoding on their website right now, anyhow.

    • by Anonymous Coward

      Actually, they state that the 3-8% better maximum compression than zlib is 2-3 orders of magnitude longer to compress.

      I can't imagine what kind of content you're hosting that'd justify 3 orders of magnitude compression time to gain 3% compression.

      • Re:Overhyped (Score:5, Insightful)

        by Baloroth (2370816) on Friday March 01, 2013 @05:12PM (#43049201)

        Actually, they state that the 3-8% better maximum compression than zlib is 2-3 orders of magnitude longer to compress.

        I can't imagine what kind of content you're hosting that'd justify 3 orders of magnitude compression time to gain 3% compression.

        Static content that only has to be compressed once, yet is downloaded hundreds of thousands or millions of times. 3-8% is a pretty significant savings in that case.

        • Re:Overhyped (Score:5, Informative)

          by Trepidity (597) <delirium-slashdot AT hackish DOT org> on Friday March 01, 2013 @05:25PM (#43049327)

          One example that comes to mind: Android APKs use the zip format.

        • by Nyder (754090)

          Actually, they state that the 3-8% better maximum compression than zlib is 2-3 orders of magnitude longer to compress.

          I can't imagine what kind of content you're hosting that'd justify 3 orders of magnitude compression time to gain 3% compression.

          Static content that only has to be compressed once, yet is downloaded hundreds of thousands or millions of times. 3-8% is a pretty significant savings in that case.

          Word, when I'm downloading the latest pirated release of a 1080p movie, or some big ass game, that 3% will save the host a lot of bandwidth, which is good.

          Of course, with limited download amounts, that will even be better.

          Chances of this being used? Probably only for game "rips"; for everything else, nil.

          • Re:Overhyped (Score:4, Interesting)

            by SuricouRaven (1897204) on Friday March 01, 2013 @06:42PM (#43050087)

            Wrong field. For general-purpose compression formats, rar is already far more capable than this, and 7z is better still. But neither of these is suitable for web browsers to decompress transparently - there, gzip and DEFLATE still reign supreme. Zopfli is backwards-compatible: browsers that support gzip/DEFLATE will work with it, no updates required.

            Personally I think Google should have worked on increasing the number of decompressors browsers support - bzip would be nice, at least. The Accept-Encoding negotiation is already there, very easy to extend. But this will have to do.
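
            The Accept-Encoding negotiation mentioned above can be sketched roughly as follows. This is a simplified illustration, not a full HTTP implementation: the function names are made up for the example, "bzip2" is used as a hypothetical coding token, and only the basic `q=` weighting is handled.

```python
def parse_accept_encoding(header):
    """Return {coding: q} parsed from an Accept-Encoding header value."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a coding listed without a weight counts as fully acceptable
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unparsable weight as "not acceptable"
        prefs[name.strip().lower()] = q
    return prefs


def choose_encoding(header, server_supported):
    """Pick the first server-preferred coding the client accepts, else 'identity'."""
    prefs = parse_accept_encoding(header)
    for coding in server_supported:
        q = prefs.get(coding, prefs.get("*", 0.0))
        if q > 0:
            return coding
    return "identity"


# A client that only knows gzip/deflate keeps working with a server that prefers bzip2.
print(choose_encoding("gzip, deflate", ["bzip2", "gzip"]))      # gzip
print(choose_encoding("bzip2;q=0.8, gzip", ["bzip2", "gzip"]))  # bzip2
```

            The point of the sketch is that a server can offer a newer coding without breaking older clients, which simply never advertise it.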

          • Re:Overhyped (Score:5, Insightful)

            by citizenr (871508) on Friday March 01, 2013 @06:58PM (#43050283) Homepage

            Word, when I'm downloading the latest pirated release of a 1080p movie

            "word", and intend to download zipped h.264 files leads me to believe you are retarded.

        • Re:Overhyped (Score:4, Interesting)

          by Pausanias (681077) <pausaniasx@gmail ... inus threevowels> on Friday March 01, 2013 @06:38PM (#43050033)

          The numbers cited are for gzip. The improvement over 7-zip is much less than 3%; it's more like 1%, at the cost of a factor of four slowdown with respect to 7-zip. Note that this is for 7-zip when restricted to deflate-compatible formats only.

          Here's the paper:
          https://code.google.com/p/zopfli/downloads/list [google.com]

      • by DragonWriter (970822) on Friday March 01, 2013 @05:22PM (#43049289)

        I can't imagine what kind of content you're hosting that'd justify 3 orders of magnitude compression time to gain 3% compression.

        In anything that is static enough that it will be downloaded many times in its lifetime, and not time sensitive enough that it needs to be instantly available when generated, very small gains in compression efficiency are worth paying very large prices in compression.

        If you, for just one of many Google-relevant examples, host a fair number of popular JavaScript libraries [google.com] (used on both your own sites -- among the most popular in the world -- and vast numbers of third party sites that use your hosted versions) and commit, once you have accepted a particular stable version of a library, to hosting it indefinitely, you've got a bunch of assets that are going to be static for a very long time, and accessed very large numbers of times. One time cost to compress is going to be dwarfed by even a minuscule savings in transfer costs for those.

        • by zachie (2491880)
          Popular Youtube videos. Or dropbox -- their main costs are probably in bandwidth and storage, and they can push the (de)compression tasks to clients so it's free lunch.
      • by Goaway (82658)

        It's too bad they didn't publish any kind of article that would explain to you what kind of content would benefit.

      • All the PNG icons and graphics on your website perhaps?

    • Re:Overhyped (Score:5, Interesting)

      by TeknoHog (164938) on Friday March 01, 2013 @05:11PM (#43049183) Homepage Journal
      If I understand this correctly, the point is to be compatible with zlib decompression. Obviously, you can get much better compression with xz/lzma, for example, but that would be out of range for most browsers.
      • Re:Overhyped (Score:4, Interesting)

        by n7ytd (230708) on Friday March 01, 2013 @06:16PM (#43049795)

        If I understand this correctly, the point is to be compatible with zlib decompression. Obviously, you can get much better compression with xz/lzma, for example, but that would be out of range for most browsers.

        Odd that Google doesn't just push to extend the supported compression formats to include more of these more modern compression libraries if this is a serious concern for them. This sounds like two guys using their 20% time to figure out a way to optimize the deflate algorithm. Kudos to them, but this is not comparable to releasing a royalty-free video codec or other large Googly-type project.

        According to the article, "Zopfli is 81 times slower than the fastest measured algorithm gzip -9" Almost two orders of magnitude of time taken, in return for a compression gain of 3%-8%. It would have been informative to know how much working memory was used vs. what gzip requires. This is a small gain of network bandwidth; trivial, even. But, if you're Google and already have millions of CPUs and petabytes of RAM running at less than 100% capacity, this is the type of small gain you might implement.

        • It's a matter of where. The extra resources are required on the server - even if the content is dynamic, it's quite possible that power and processor time will be cheap there. The corresponding savings are achieved on the clients, which includes smartphones - where connection quality ranges from 'none' to 'crap,' and the user will begrudge every last joule you need to display the page. It's worth throwing a lot of resources away on the server if it can save even a much smaller amount on the more-constrained

          • Exactly my thoughts too - it is rare for any new compression algorithm to be less cpu intensive on the decompression side than what we already have. So, while adding new algorithms to the list that clients can negotiate with servers won't hurt, chances are the most band-width constrained clients won't support them anyway.

          • You can save a lot more joules by avoiding animated Flash ads and excessive JavaScript.

            Has anyone ever calculated the advertising contribution to global warming?

    • Re:Overhyped (Score:5, Informative)

      by sideslash (1865434) on Friday March 01, 2013 @05:18PM (#43049245)

      It improves over gzip by a mere 3% or so, but takes an order of magnitude longer to compress [...] its practical merit is virtually nil.

      Maybe it's useless to you as a developer(?), and to most people. However, you benefit from this kind of technology all the time. Compare this to video encoding, where powerful machines spend a heck of a lot of time and CPU power to gain extra 3%'s of compression to save bandwidth and give you a smooth viewing experience.

      This tool could have many useful applications for any kind of static content that is frequently served, including web services, as well as embedded content in mobile games and other apps. Every little bit of space savings helps (as long as it isn't proportionally slower to expand, which the article says it stays comparable).

    • by zachie (2491880)
      I disagree. The practical utility is a function of the number of downloads you get per compression operation, the cost of CPU time, and the amount of money that can be saved in bandwidth reduction. I can see how this can be an improvement to serve static content. For example, assuming browsers incorporate the capability to decompress it, lowering the bandwidth of Youtube by ~3% is an achievement.
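
      As a rough sketch of that trade-off: the break-even point is the number of downloads at which the bandwidth saved repays the extra CPU time spent compressing. The function name and every figure below (file size, prices, timings) are hypothetical placeholders chosen only to show the arithmetic, not measurements.

```python
import math


def break_even_downloads(extra_cpu_seconds, cpu_cost_per_second,
                         bytes_saved_per_download, bandwidth_cost_per_byte):
    """Downloads needed before bandwidth savings repay the extra compression cost."""
    extra_cost = extra_cpu_seconds * cpu_cost_per_second
    saving_per_download = bytes_saved_per_download * bandwidth_cost_per_byte
    if saving_per_download <= 0:
        return float("inf")  # no saving per download means it never pays off
    return extra_cost / saving_per_download


# Hypothetical inputs: 8 extra CPU-seconds per file, CPU at $0.05 per hour,
# 5,000 bytes saved per download (5% of a 100 kB file), bandwidth at $0.10 per GB.
n = break_even_downloads(
    extra_cpu_seconds=8.0,
    cpu_cost_per_second=0.05 / 3600,
    bytes_saved_per_download=5_000,
    bandwidth_cost_per_byte=0.10 / 1e9,
)
print(math.ceil(n))  # 223
```

      Under these made-up numbers the slower compressor pays for itself after a couple of hundred downloads; for content regenerated on every request the count never gets that high.
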
      • Re:Overhyped (Score:5, Insightful)

        by nabsltd (1313397) on Friday March 01, 2013 @06:07PM (#43049719)

        For example, assuming browsers incorporate the capability to decompress it, lowering the bandwidth of Youtube by ~3% is an achievement.

        I don't know why people keep mentioning Youtube, since all videos are already compressed in such a way that pretty much no external compression is going to gain anything.

        Although when compressing a video Zopfli might result in a smaller file compared to gzip, that doesn't mean either will be smaller than the original. All H.264 files should be using CABAC [wikipedia.org] after the motion, macroblock, psychovisual, DCT, etc. stages, and that pretty much means that the resulting files have as much entropy per bit as possible. At that point, nothing can compress them further.
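
        A quick way to see this effect with a Deflate implementation is to feed it bytes that already look random. This sketch uses Python's standard `zlib` module and seeded pseudo-random bytes as a stand-in for entropy-coded video data (an assumption for illustration; real H.264 output is not literally random).

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
# Stand-in for already entropy-coded data: bytes with no exploitable redundancy.
noise = bytes(rng.getrandbits(8) for _ in range(100_000))
# Highly redundant input for contrast.
text = b"the quick brown fox jumps over the lazy dog\n" * 2_000

for label, data in (("random bytes", noise), ("repetitive text", text)):
    packed = zlib.compress(data, 9)
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```

        The repetitive text shrinks dramatically, while the random bytes come out essentially the same size as they went in, since the container overhead is all that changes.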

        • Re:Overhyped (Score:4, Informative)

          by SuricouRaven (1897204) on Friday March 01, 2013 @06:54PM (#43050231)

          There are tricks to that h264 encoding to squeeze a bit more. You can improve the motion estimation by just throwing power at it, though the gains are asymptotic. Or increase the frame reference limit - that does great things on animation, if you don't mind losing profile compliance. Things like that. Changing the source is also often of great benefit - if it's a noisy image, a bit of noise-removal filtering before compression can not just improve subjective quality but also allow for much more efficient compression. Interlaced footage can be converted to progressive, bad frame rate conversions undone - progressive video just compresses better. It's something of a hobby of mine.

          I wrote a guide on the subject: http://birds-are-nice.me/publications/Optimising%20x264%20encodes.htm [birds-are-nice.me]

          You're right about Zopfli though. Regarding h264, it changes nothing.

          • progressive video just compresses better.

            Well, now, hold on just a second - interlacing can be annoying to deal with, but the fact of the matter is that it allows you to throw away half of your raw data for only a 30% (or less, with modern deinterlacers) perceptual loss of quality - that is excellent compression. Now, if someone came up with it today, they'd be rightly heckled, but because it was around for decades even before the digital era, every modern TV has a good, or in some cases excellent, deinterlacer built in, and - if your target is to

            • Interlacing is good if you need to use analog electronics. But that 'annoying' goes beyond just annoying: It over-complicates everything. The compression benefits are more than offset by the reduced efficiency of the more modern encoding, plus almost every stage in the process - every filter, as well as the encoder and decoder - need to be interlacing-aware. It's an awkward, obsolete technology and I eagerly await the day it is no longer to be found outside of historical video.

              The link looks very interestin

              • by nabsltd (1313397)

                I've done a few restorations before, but you can't see any of them other than http://birds-are-nice.me/video/restorations.shtml [birds-are-nice.me] - all the rest are of various copyrighted videos.

                Did you try Fizick's DeSpot for the Chevrolet commercial? It worked wonders for me on a HDTV broadcast of a crappy print of Private Benjamin, but I had to try dozens of different combinations of parameters to get the maximum clean with no false positives. It could be tweaked even better by using ConditionalReader and/or masking to not process the frames or areas of frames that had glaring false positives, but that's for some other day.

              • It's an awkward, obsolete technology

                ...that's still being broadcast daily across thousands of channels all around the world...

                • Not for much longer. It's on the decline.

                  • [citation needed]

                    You really think those SD channels are going anywhere any time soon? Even for HD, all UK satellite channels (including Sky) are 1080i, as are most others in Europe.

                    • Actually, yes. Not any time soon, but gradually, over the next decade or so. Almost all new programs are made in HD, and even many of those SD channels are just duplicates of an HD channel maintained for compatibility.

                    • Not for much longer.

                      Not any time soon

                      Have you considered a career in politics? :)

            • by nabsltd (1313397)

              bad frame rate conversions undone

              Here [horman.net] is my adventure with the same - thought it might be of some interest.

              First, your link is broken. I fixed it in the quoting I did.

              Second, why did you bother, when all the Dr. Who from The Next Doctor onward were available in HD free-to-air in the correct frame rate? I have them sourced from the original broadcasts. There's a logo, true, but no commercials or cuts due to time constraints.

              • First, your link is broken. I fixed it in the quoting I did.

                Oops... thanks.

                Second, why did you bother, when all the Dr. Who from The Next Doctor onward were available in HD free-to-air in the correct frame rate?

                Planet of the Dead onwards, actually - The Next Doctor has only been broadcast at SD, and is upscaled on the blu-ray. Being broadcast does not make something "available" ;)

                There's a logo, true

                There's one reason - although honestly I don't remember the original broadcasts having a logo. I've also never seen them available above 720p. Thirdly, the bitrate is a lot higher on blu-ray than from a broadcast.

                • by nabsltd (1313397)

                  There's a logo, true

                  There's one reason - although honestly I don't remember the original broadcasts having a logo. I've also never seen them available above 720p. Thirdly, the bitrate is a lot higher on blu-ray than from a broadcast.

                  I could have sworn that my recordings had logos, but you're right...they don't. I guess I was thinking of the Sarah Jane Adventures, which do have logos on the HD broadcasts. I also have The Next Doctor as an HD broadcast, but I think you're right that it was a lower res source, now that I look at it.

                  I resize to 1280x720 for most things when I store them in my library, so I don't really care about the higher res. As for the bitrate, most of it is likely wasted anyway. Run a Blu-Ray .m2ts through Bitrate

          • by nabsltd (1313397)

            It's something of a hobby of mine.

            I wrote a guide on the subject: http://birds-are-nice.me/publications/Optimising%20x264%20encodes.htm [birds-are-nice.me]

            x264 is easy at this point (CRF and --preset slower FTW).

            Right now, I'm trying to figure out how to isolate individual hues using AviSynth so that my color correction will only target very specific problem areas. I've done a decent job getting the worst of the teal and orange [blogspot.com] look out of some of the worst examples (The Terminator remaster, where there was nothing white in the entire movie..it was all blue tinted), as well as getting the green out of Fellowship of the Ring, but those are global changes to get know

          • by nospam007 (722110) *

            "There are tricks to that h264 encoding to squeeze a bit more"

            Even Zöpfli was compressed a bit more, by omitting the umlaut it has in the original word.

    • Re:Overhyped (Score:5, Interesting)

      by K. S. Kyosuke (729550) on Friday March 01, 2013 @05:28PM (#43049353)

      But its practical merit is virtually nil.

      ...unless you're a large web-based company serving terabytes of identical textual files to end users using deflated HTTP streams.

    • by Goaway (82658)

      In addition to all the other explanations of how you missed the point, Deflate is also used in PNG. This will allow you to make smaller PNG files, too, which can be quite a significant part of your bandwidth.
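
      To make the PNG connection concrete: the pixel data in a PNG's IDAT chunk is a zlib/Deflate stream, so a stronger Deflate encoder can shrink the file without changing what decoders see. The sketch below writes a minimal 8-bit grayscale PNG by hand with Python's standard `zlib` at two effort levels. The helper names are made up for the example, and `zlib` stands in for a Zopfli-style encoder.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind, payload):
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def write_gray_png(rows, level):
    """Encode 8-bit grayscale rows (equal-length bytes objects) as PNG bytes."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), then three zero fields.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    raw = b"".join(b"\x00" + row for row in rows)  # filter type 0 on every scanline
    idat = zlib.compress(raw, level)               # the IDAT payload is a zlib stream
    return PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")


def read_idat(png):
    """Concatenate and inflate all IDAT chunks, returning the filtered scanlines."""
    pos, data = len(PNG_SIGNATURE), b""
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        if kind == b"IDAT":
            data += png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
    return zlib.decompress(data)


rows = [bytes((x * y) % 256 for x in range(64)) for y in range(64)]
fast = write_gray_png(rows, 1)
small = write_gray_png(rows, 9)
assert read_idat(fast) == read_idat(small)  # same pixels regardless of encoder effort
print(len(fast), len(small))
```

      Both files decode to identical scanlines; only the size of the IDAT payload depends on how hard the encoder worked.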

    • by gweihir (88907)

      One order of magnitude slower??? Then this is just stupid. There are compressors in this speed class that do far better than zlib. The stated compression gains do not even justify the effort of actually looking at this thing.

      • Re: (Score:3, Informative)

        by Anonymous Coward

        But the decompressors for those algorithms are not available in most web browsers, making them totally unusable for the stated use case.

        But hey, why read the article when you can whine about it blindly on /.?

        • by pjt33 (739471)

          PNGcrush and kzip are the counterexamples which spring to mind. I think some of Charles Bloom's compressors are zlib-compatible too. I wonder what Zopfli is doing: some kind of optimal parse? If so, it's hardly novel.

          • by pjt33 (739471)

            Have now RTFA. Still don't know what it's doing, but I was amused by the statement

            Zopfli is written in C for portability

            There are an awful lot of variables which are typed as just int or unsigned and yet whose width appears to matter.

            • Yes, I also wonder what they did. They don't say in their article, and I didn't want to spend time just now wading through the source code to find out for sure. But I suspect it's just throwing more CPU cycles at the compression problem so it can look further ahead.

              In this pointer compression, greedy often isn't best. Here's an example text to illustrate: "resident prevent president". The greedy approach is to always make a maximum length match. It would compress the example text as follows: "reside

              • by pjt33 (739471)

                That would be the optimal parse approach. Of course, the well-known problem with optimal parsing is that sometimes a sub-optimal parse turns out to be better [blogspot.com] once you take into account the Huffman step. It could be that they're focussing on the feedback between those two steps.

      • One order of magnitude slower??? Then this is just stupid. There are compressors in this speed class that do far better than zlib.

        And can still be decompressed by anything that can decompress gzip, with no modification to the decompressor?

        (Note that this is important for many use cases, because browsers typically can handle decompressing gzipped HTTP content, so if you have compatible-but-better compression, you can deploy for your server content and browsers handle it with no changes on the client side.)
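
        The compatibility argument can be illustrated with any Deflate encoder: however much effort goes into compression, the same unmodified decoder reads the result. This sketch uses Python's standard `zlib` at its lowest and highest levels as a stand-in for "fast encoder" versus "slow, thorough encoder"; it does not use Zopfli itself.

```python
import zlib

payload = b"<li class='item'>Zopfli</li>\n" * 500

fast = zlib.compress(payload, 1)  # minimal effort
best = zlib.compress(payload, 9)  # maximal effort this library offers

# One decoder, no flags, no knowledge of which encoder settings were used.
assert zlib.decompress(fast) == payload
assert zlib.decompress(best) == payload
print(len(payload), len(fast), len(best))
```

        The decoder never learns, or needs to learn, how the stream was produced, which is why a better encoder can be deployed on the server alone.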

  • by Antony T Curtis (89990) on Friday March 01, 2013 @05:01PM (#43049079) Homepage Journal

    Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.

    • by Anonymous Coward on Friday March 01, 2013 @05:17PM (#43049225)

      Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.

      Pfft. Another blatant corporate shill for gzip early in the Slashdot comments. You can't trust anybody on the internet these...

      Oh, wait, the data actually does say that. Huh. That's... a really weird feeling, having someone on the internet legitimately say something's good and have data to back it up.

      • by theCoder (23772)

        Your corporate shill quip made me remember a passing comment I once overheard that went something like:

        ... so what this GNU company has done is license the zip technology to make the gzip format ...

        I almost cried.

    • Yes, and gzip isn't so slow that it can only be used on static content. Even if you always generate into a cached version, do you really want to spend 81x the CPU time to gain a few percent in compression, and delay the content load on the client each time that happens?

      • by DragonWriter (970822) on Friday March 01, 2013 @05:37PM (#43049431)

        Yes, and gzip isn't so slow that it can only be used on static content. Even if you always generate into a cached version, do you really want to spend 81x the CPU time to gain a few percent in compression, and delay the content load on the client each time that happens?

        Why would you recompress static content every time it is accessed? For frequently-accessed, static content (like, for one example, the widely-used JavaScript libraries that Google hosts permanently), you compress it once, and then gain the benefit on every transfer.

        For dynamic content, you probably don't want to do this, but if you're Google, you can afford to spend money getting people to research the best tool for very specific jobs.

    • by n7ytd (230708) on Friday March 01, 2013 @06:01PM (#43049655)

      Looking at the data presented in the pdf, it seems to me that gzip does a fantastic job for the amount of time it takes to do it.

      So the obvious conclusion is that what we need is a gzip -11 option.

    • re Looking at the data presented in the pdf,...
      .
      One obvious truth that is apparent from looking at the data presented in the pdf is that those in the googleborg don't know how to format data or text in their documents. (they've scrubbed all doc-generation info from the document before pdf'ing it, but considering that the fonts are all Arial family [Arial-BoldMT, Arial-ItalicMT, Arial-MT, fully embedded truetype fonts] it's possible to guess what word processor they used)
      :>p
      The other thing that is obvio
      • I presume a different country of origin for the research ...

        cf. http://en.wikipedia.org/wiki/Decimal_mark#Examples_of_use [wikipedia.org]

        In Brazil, Germany, Netherlands, Denmark, Italy, Portugal, Romania, Sweden, Slovenia, Greece and much of Europe: 1 234 567,89 or 1.234.567,89. In handwriting, 1234567,89 is also seen, but never in Denmark, the Netherlands, Portugal, Sweden or Slovenia. In Italy a straight apostrophe is also used in handwriting: 1'234'567,89.

        In Switzerland: There are two cases. 1'234'567.89 is used for cu

        • Ah, thank you very much for the extra info. My sincere apologies for not knowing about that particular formatting option. I've played with internationalization settings before, but I had never seen that one.
          .
          You might have to concede, however, that the bizarre use of flush left justification or left-alignment of integer values does not make much sense. Numbers are easier to parse and perceive the relative log-magnitude of when they are presented as decimal aligned for integers or floating point values
          • Oh totally, in fact, right-aligned is also incorrect. Using a decimal tab-stop is the correct option.

  • I implement server software and a very important factor to me is how fast the library performs. Is this new one faster than zlib?
    • Re: (Score:1, Troll)

      by gewalker (57809)

      So important that you could not even be bothered to even look at the article that tells you ~100x slower, 5% better compression compared to zlib.

    • It's far slower, but it could be worth the extra CPU cost for rarely-changing data served to all users, such as big blobs of CSS or JavaScript or public Atom feeds. Compress when it changes, cache the resulting .gz file, serve that.
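
      A minimal sketch of that compress-once, serve-many pattern follows. It uses Python's standard `gzip` module as a stand-in encoder (a Zopfli-based one could be swapped in at the marked line, since the output format is the same); the function name and the `.gz`-next-to-source layout are assumptions for illustration.

```python
import gzip
import os
import tempfile
from pathlib import Path


def ensure_precompressed(source, level=9):
    """Return the path of an up-to-date `<source>.gz`, recompressing only if stale."""
    source = Path(source)
    target = source.with_name(source.name + ".gz")
    if target.exists() and target.stat().st_mtime_ns >= source.stat().st_mtime_ns:
        return target  # cached copy is still current: no CPU spent
    # Slow path: runs once per change, so an expensive encoder is affordable here.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_bytes(gzip.compress(source.read_bytes(), compresslevel=level))
    os.replace(tmp, target)  # swap into place so readers never see a partial file
    return target


with tempfile.TemporaryDirectory() as workdir:
    asset = Path(workdir) / "site.css"
    asset.write_bytes(b"body { margin: 0; }\n" * 200)
    cached = ensure_precompressed(asset)
    assert ensure_precompressed(asset) == cached  # second call reuses the file
    assert gzip.decompress(cached.read_bytes()) == asset.read_bytes()
```

      The web server then only has to send the stored `.gz` bytes with the appropriate Content-Encoding, so the compression cost is paid per change rather than per request.
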
    • Might want to upgrade your reading skills; even the summary says it's slower.

  • by mrjb (547783) on Friday March 01, 2013 @05:09PM (#43049159)
    "Zopfli is a compression-only library, meaning that existing software can decompress the data." (source: http://littlegreenfootballs.com/page/294495_Google_Compression_Algorithm_Z [littlegreenfootballs.com]). As long as the compression can be done on cached pages, hey- that's another 3-8% more people served with the same amount of bandwidth, without any additional requirements on the client side.
    • by Trepidity (597)

      Considering how slow it is (~100x slower than zlib), I doubt anyone will be using it for on-the-fly compression of web content. It'd only really make sense for one-time compression, e.g. Google might use this to slim Android APKs down a little bit.

      • Or they may not re-compress their home page image every time they send it out. The system we use to scale images only scales and optimizes them once, then stores the scaled copies for repeated use. It takes up more space, but is much faster than re-scaling every time or sending the full image.

        I'm guessing whatever google uses does this even better. (and will soon use this new compression technique)
      • by drinkypoo (153816)

        Websites get cached in chunks now so that parts of them can be dynamic and other parts not, and so that you can still gain the benefits of caching while you have content which is only partially dynamic. So when the chunks get cached, you compress them if they will be alive long enough to provide sufficient benefit.

    • As long as the compression can be done on cached pages, hey- that's another 3-8% more people served with the same amount of bandwidth,

      Can anyone address their methodology of testing? They talk about test corpora - but it isn't clear to me if they feed all of the data in each corpus into a single compression run or they individually compressed each file (necessitating a restart of the dictionary search for each one). If it is the former, that may be skewing their results as well since the typical web server isn't handing out 3MB+ files that often.
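
      The difference between the two methodologies is easy to demonstrate with any Deflate encoder. The sketch below builds a hypothetical miniature "corpus" of small pages that share boilerplate, then compares compressing each file separately against compressing them as one stream, using Python's standard `zlib`. It says nothing about which method the Zopfli authors actually used.

```python
import zlib

# Hypothetical miniature corpus: small pages sharing the same boilerplate.
boilerplate = b"<html><head><title>Example</title></head><body><div class='content'>"
files = [boilerplate + b"Article number %d" % i + b"</div></body></html>"
         for i in range(50)]

# Each file compressed on its own: the match history starts empty every time.
individual = sum(len(zlib.compress(f, 9)) for f in files)
# All files in one run: later files can reference text from earlier ones.
combined = len(zlib.compress(b"".join(files), 9))

print("compressed one file at a time:", individual)
print("compressed as a single stream:", combined)
```

      With many small, similar files the single-stream total comes out far smaller, so results measured that way would overstate what a server sending files individually can expect.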

  • I wonder how it compares to kzip (http://advsys.net/ken/utils.htm), which is trying to do the same thing, just better and faster. Also, Google is trying to save 3% on gzipped content, but they don't use optipng/pngout on their images... up to 10% gains... And JPEGs: never heard of jpegtran, Google? It saves 20% on my digicam pictures (leaving EXIF and all metadata intact).
  • There is zero need for this. There are a number of free compressors available that already cover the spectrum well, for example lzop, gzip, bzip2, xz (in order of better compression and more resource consumption). The stated 3-8% better compression in relation to zlib does not even make this worth considering. Also, anything new will have bugs and unexpected problems.

    This is over-hyped and basically a complete non-event.

    • There are a number of free compressors available that already cover the spectrum well, for example lzop, gzip, bzip2, xz (in order of better compression and more resource consumption).

      Those (except, naturally, gzip) are not compatible with gzip decompressors (of the type found in virtually every browser), so they are useless for the main use case for this, which is server-side compression for web content that is completely invisible, compared to gzip, to web clients (requiring no changes and having

  • Just add another compression level and merge the code.
    Everything and everyone reaps the benefits automatically as soon as they update.
