Google Launches Brotli, a New Open Source Compression Algorithm For the Web

Mark Wilson writes: As websites and online services become ever more demanding, the need for compression increases exponentially. Fans of Silicon Valley will be aware of the Pied Piper compression algorithm, and now Google has a more efficient one of its own. Brotli is open source and is an entirely new data format that offers 20-26 percent greater compression than Zopfli, another compression algorithm from Google. Just like Zopfli, Brotli has been designed with the internet in mind, with the simple aim of making web pages load faster. It is a "lossless compressed data format that compresses data using a combination of the LZ77 algorithm and Huffman coding, with efficiency comparable to the best currently available general-purpose compression methods". Compression is better than LZMA and bzip2, and Google says that Brotli is "roughly as fast" as zlib's Deflate implementation.
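
For a concrete sense of what the "lossless" claim means in practice, here is a minimal sketch that round-trips data through Google's Python bindings for Brotli (the pip package name "brotli" and its keyword arguments are assumptions based on the published bindings; the reference command-line tool behaves the same way):

    import brotli  # Google's Python bindings for the Brotli format

    original = b"<html><body>" + b"Slashdot comments, compressed. " * 1000 + b"</body></html>"

    # quality runs from 0 to 11; 11 is the densest (and slowest) setting
    compressed = brotli.compress(original, quality=11)
    restored = brotli.decompress(compressed)

    assert restored == original  # lossless: the round trip is bit-exact
    print(len(original), "->", len(compressed), "bytes")
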
This discussion has been archived. No new comments can be posted.

  • by monkeyzoo ( 3985097 ) on Tuesday September 22, 2015 @02:12PM (#50576949)

    What's the Weissman score?

    • by Anonymous Coward

      Lossless ha. Like the "c" missing from "Compresses" from the summary?

      Auto correction is losing us more in meaning than any lossy jpeg compression ever did.

  • by dmbasso ( 1052166 ) on Tuesday September 22, 2015 @02:16PM (#50576983)

    It is a "lossless compressed data format that ompresses data

    ... by discarding random bits and pieces of redundant occurrences of words.

    • by wickerprints ( 1094741 ) on Tuesday September 22, 2015 @02:17PM (#50577001)

      No no no... you don't understand. It's just THAT good of a ompression gorithm.

      • Re: (Score:3, Funny)

        by Anonymous Coward

        No no no... you don't understand. It's just THAT good of a ompression gorithm.

        Well, if it ncreases xponentially, it must be good.

      • by Anonymous Coward

        No no no... you don't understand. It's just THAT good of a ompression gorithm.

        Ompression is meta Compression, the type of compression performed by Omniscient beings. It's that good.

      • No no no... you don't understand. It's just THAT good of a ompression gorithm.

        although i work with computers, I'm not up on the newest whiz bang stuff these days. i thought ompression might be some fancy word like impression but meaning something else.

        so anyways, fuck the little punks that come up with all these stupid names that have to sound all web.20 like the 40 or whatever AWS names you have to learn to use Amazon.

    • Re: (Score:3, Funny)

      by Anonymous Coward

      That's lossy ompression.

      • by Anonymous Coward

        It's not lossy. If it were, you wouldn't know how to reconstruct "ompression"; but I think you do. It's as if that sentence were run through an algorithm that detected some of the unnecessary letters in Enlish and omitted thm.

        • by njnnja ( 2833511 )

          Why would it need to comit them?

        • I only figured out you omitted them after reading it a second time. Obviously, we don't really need those letters if I didn't even notice they were gone... But what a nightmare for people learning English as a foreign language!
    • The C is there in square brackets. Not sure why...

    • It's the cloud man. You have to expect bits to go missing every once in a while.

  • A better idea (Score:5, Insightful)

    by Anonymous Coward on Tuesday September 22, 2015 @02:16PM (#50576985)

    If they want to make webpages load quicker, remove ads.

  • Better (Score:3, Funny)

    by Anonymous Coward on Tuesday September 22, 2015 @02:17PM (#50576995)

    Hey, Google! I have a compression algorithm that can compress any size to a single byte. I just need a little help with the decompress and we can really speed things up.

  • by Anonymous Coward

    for a better user experience. The real reason is so that ad blockers will no longer work. It will no longer be web "pages", they will be "apps". The walled garden will move to the web and have the doorway sealed shut.

    • The page has to be decompressed so that the browser knows what else to fetch. This won't affect either ad blockers or host file based blocking, nor will it affect proxies that decode the original page and strip out references to anything that sucks bandwidth, such as images and tracking scripts so no attempt is made to download them. More aggressively, write a browser in java that does page rewriting. Or an app (such as Simply Slashdot) that only downloads the headlines and comments.
    • by sims 2 ( 994794 )

      I don't see what your problem with compression is.
      Chrome already supports gzip, deflate and sdch

      It's part of the "Accept-Encoding" header that gets sent to the server by your browser.

      You can check yours here:
      http://www.xhaus.com/headers [xhaus.com].

      The server won't send a compressed version if your browser doesn't say it can decode it first.
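
      As a minimal sketch of that negotiation using only Python's standard library (the URL here is just an example; any brotli-capable server behaves the same way), you can advertise "br" yourself and see which encoding the server picks:

          import urllib.request

          # Advertise Brotli ("br") alongside gzip; the server picks an encoding
          # it supports and reports its choice in the Content-Encoding header.
          req = urllib.request.Request(
              "https://www.google.com/",  # example URL
              headers={"Accept-Encoding": "br, gzip"},
          )
          with urllib.request.urlopen(req) as resp:
              print("Content-Encoding:", resp.headers.get("Content-Encoding"))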

    • by Eythian ( 552130 )

      This isn't insightful, it's just wrong. We already have binary page delivery if the browser and server agree to support gzip or whatever. It's just one more thing like that.

    • if only we had some unified app framework so that you didn't have to install a new app for every site. you could give it a little text bar at the top where you would type in the name of the app's data you want, and instead of having to have the app installed, it would just load the data FROM the source the app loads from, and then render it. you might have to add in some markup language tags here and there to help with the formatting, but it should be doable. this would save a lot on internal storage on the

  • by Anonymous Coward on Tuesday September 22, 2015 @02:36PM (#50577133)

    Stop making my browser run 500 trips to DNS in order to run 500 trips to every ad server in the world.

    Also, for the everloving sake of Christ, you don't need megabytes of scripts, or CSS, or any other shit loaded from 50 more random domains in order to serve up an article consisting of TEXT. One kilobyte of script will be permitted to set up a picture slideshow, if desired. /Get off my e-lawn

  • Brotli, Zopfli... is Google now run by Austrians, judging by the Austrian nicknames?

    And lossless too? I'd prefer if they lost the ads, then the compression wouldn't be needed.

  • by NotInHere ( 3654617 ) on Tuesday September 22, 2015 @02:49PM (#50577211)

    From the paper [gstatic.com]:

    Unlike other algorithms compared here, brotli includes a static dictionary. It contains 13’504
    words or syllables of English, Spanish, Chinese, Hindi, Russian and Arabic, as well as common
    phrases used in machine readable languages, particularly HTML and JavaScript.

    This means that brotli isn't a general-purpose algorithm; it's built specifically for the web, nothing more. I guess that future versions of the algorithm will include customized support for other, smaller languages, whose compression databases are only downloaded if you open a web page in that language.
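
    As a rough illustration of what that static dictionary buys you (a sketch assuming the pip "brotli" package; exact numbers will vary), compare brotli against plain zlib on a tiny, tag-heavy snippet, which is exactly the kind of input where LZ77 has little internal redundancy to work with:

        import zlib
        import brotli  # assumed pip package

        snippet = (b'<!DOCTYPE html><html><head><meta charset="utf-8"></head>'
                   b'<body><script type="text/javascript"></script></body></html>')

        # A pre-built dictionary of common HTML/JS fragments can help small
        # payloads like this; print both sizes and compare for yourself.
        print("zlib  :", len(zlib.compress(snippet, 9)), "bytes")
        print("brotli:", len(brotli.compress(snippet, quality=11)), "bytes")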

    • by Anonymous Coward

      That's not really true. See https://quixdb.github.io/squash-benchmark/ [github.io]. It compresses non-text data quite well.

    • I'd rather not go back to the days when you had to be on the right code page [wikipedia.org] (or compression database) to not see a wall of Mojibake [wikipedia.org]. Switching to SHIFT-JIS or EUC-JP [wikipedia.org] to see old websites is no fun.
    • by Warma ( 1220342 )

      So this means that it is an algorithm for compressing text. What I'm wondering about is how much actual performance we get, when most of the time spent loading pages is spent loading already heavily compressed content such as images.

  • 1997 called (Score:3, Insightful)

    by Intrepid imaginaut ( 1970940 ) on Tuesday September 22, 2015 @02:52PM (#50577231)

    It wants its bottleneck back. From what I can see it's plugins, scripts and adverts loading from fifty different sites that clog web pages, not large file sizes or what have you. Yes I get that compression is vital for an outfit like Google and they want to showcase what they've been doing but most websites don't have their traffic volume.

    • Re:1997 called (Score:4, Insightful)

      by Ravaldy ( 2621787 ) on Tuesday September 22, 2015 @03:49PM (#50577623)

      Websites maybe, but web applications are a different story. The problem is that businesses make heavy use of web applications, and data lists can grow out of control quickly if you include the formatting required to make it look right. One could say that these web applications have bad data presentation strategies, and I would argue that you'd probably be right in many cases. Unfortunately not all problems can be solved with forced filters and paging.

      Take /. for example. The HTML alone of this page (with very few comments at this point) is 200KB. If you add up the CSS and JS you are well above 1MB. The data alone would probably take only 100KB, but data that's hard to decipher is not fun data, hence the overhead.

      Compression is a no brainer if used properly. Bandwidth is limited and so is processing power. Just a matter of deciding which one is more important at any given time.

      • by iONiUM ( 530420 )

        I work on an enterprise Business Intelligence web application, and it is 5.0/5.4MB debug (compressed / uncompressed), and 1.6/3.4MB minified + release (compressed / uncompressed) payload. It's over 700 JavaScript files, 200+ images, lots of CSS etc. (in debug; obviously much of this is combined into one file when built into release). While it's a massive first payload, it successfully loads on tablets and phones etc. as long as they have a decent connection (HSPA+, LTE or WiFi).

        So I suppose I would agree wit

        • by iONiUM ( 530420 )

          I should also mention that loading the debug version onto a tablet is actually impossible. All of the current top-end tablets and phones (iPad Air, Note 4, etc.) all crash when the uncompressed, unminified and uncombined files are transmitted to it. None of them can take it, except the Surface, if you consider that a tablet (I don't).

          As a result, we've had to use emulators on machine with huge amounts of RAM and pray to God the error shows up in the emulator as well. Hopefully we can start taking advantage

    • by AmiMoJo ( 196126 )

      Chrome practically killed off plug-ins when it ditched the Netscape plug-in API earlier this year. Future versions will not run Flash adverts by default either. Google do appear to be taking some fairly aggressive steps to clean up the web and advertising.

      I'm surprised Mozilla hasn't jumped on this bandwagon too.

  • AdBlock developed a webpage compression algorithm that works much better than that. Although it simultaneously functions as a virus scanner, removing many malicious scripts and even most social engineering attack vectors, it not only doesn't slow down your computer, but rather makes pages load faster!

    • Be careful, you might summon the flying spittle of APK.

  • Brotli has been designed with the internet in mind, with the simple aim of making web pages load faster.

    Install AdBlock, and it'll be faster.

    No need for compression when the ad servers are the ones who slow page loading to a crawl.

  • As websites and online services become ever more demanding, the need for compression increases exponentially.

    I hope not, because we're not going to get exponential increases in compression over what we have now.

    • I've found that misuse of the word "exponentially" by people who don't understand what it means is growing exponentially more irritating. ;)

  • I was like "Haven't I seen this before?", and thinking that I had, because I work on Chrome, but then looked at:
          https://en.wikipedia.org/wiki/... [wikipedia.org]
    which says "This page was last modified on 27 February 2015, at 18:32."

    Maybe it's just coming out of beta.

    • I was like "Haven't I seen this before?", and thinking that I had, because I work on Chrome, but then looked at: https://en.wikipedia.org/wiki/... [wikipedia.org] which says "This page was last modified on 27 February 2015, at 18:32."

      Maybe it's just coming out of beta.

      This is Google. Nothing ever comes out of beta.

  • As opposed to... (Score:4, Insightful)

    by JustAnotherOldGuy ( 4145623 ) on Tuesday September 22, 2015 @03:26PM (#50577431) Journal

    It is a "lossless compressed data format..."

    As opposed to what, a lossy compression formula for data?

    Well hell, if you don't need the original data back I've got a compressor algorithm that'll reduce a 50GB file to 1 byte every time. Sometimes even less than a byte, like maybe 0.25 bytes. In fact it reduces the file to such a small size that just finding it again turns out to be a real challenge...

    • by Stavr0 ( 35032 )

      I could see a lossy algorithm for HTML / JS that eliminates lengthy tags and inefficient structures, perhaps even performs code optimization on heavily JS-infested pages, while rendering identically to the original.
      The result, of course, would look nothing like the source and couldn't easily be reconstructed.
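
      Something in that spirit, as a crude sketch (plain regex, not a real parser, so it would mangle <pre> blocks and inline scripts in practice): drop comments and collapse runs of whitespace, which typically renders the same in normal flow while making the original formatting unrecoverable.

          import re

          html = "<div>\n    <p>  Hello,   world!  </p>\n    <!-- layout note -->\n</div>"

          # Lossy pass: strip comments, then collapse whitespace runs.
          stripped = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
          stripped = re.sub(r"\s+", " ", stripped)

          print(stripped)  # same rendering for typical markup; source formatting is gone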

    • by suutar ( 1860506 )

      yes, exactly. Not everything requires the original data be recreated exactly, after all, just "similar enough". (images and audio, mostly; text and things represented as text usually fare poorly when parts are missing.)

    • It's easy to joke about, but in many cases lossy encoding/compression algorithms can be incredibly useful. Most tech people are already familiar with the fact that JPEG and MP3 are lossy encodings that produce "good enough" results for the vast majority of use cases while potentially reducing the data footprint by orders of magnitude, so I won't dwell on those

      But what many of us forget to consider (even though we're aware of the fact that it's done) is that code can also be encoded in lossy ways to great be

      • It's easy to joke about, but in many cases lossy encoding/compression algorithms can be incredibly useful.

        Yes yes, I know. But for something like a compressed database dump....no. Or medical imaging....no. Or forensic work.....no.

        Pictures of cats....yes.

        • Quite right. And there will likely never be a general purpose lossy compression scheme, since the "loss" needs to be tailored to whatever the content is in order for the scheme to be useful.

          When I was in grad school, our research group was working on a lossless compression algorithm tailored for web content (for internal use, since we had done the largest web crawl in academia at the time and needed a better way to store the data), much like this new one Google has. By relying on assumptions that hold true

          • or potentially even in database dumps (e.g. eliminating needlessly duplicated rows,

            I don't see how an algorithm could ever really determine with any certainty which/what rows were "needlessly duplicated", as the reason(s) for duplication could vary so widely as to make it a guessing game. Maybe a row was there to test "needless" duplication.

            Unless it was able to read the designer's mind or somehow positively determine for itself what the definition of "needless" was, this would be a disaster waiting to happen. Maybe a duplicated row was there for a good reason, just not one that's apparen

            • Quoting myself, emphasis added:

              (e.g. eliminating needlessly duplicated rows, performing cascading deletes on data that wasn't cleaned up properly before, etc., though again, it would need to be tailored to the case at hand

              I fully agree with you that there will never be a lossy compression scheme that works in the general case. And I agree with you as well that we can't come up with a lossy database compression scheme that can work in that generalized case. I was merely pointing out that there may be specific cases within that general case where lossy compression is possible.

              Sorry if that comes across as me being obtuse. That wasn't my intent. I was merely trying to point out that domain knowled

  • by Anonymous Coward

    I grabbed the source, compiled it, and ran tests on the latest Linux kernel tarball. All compressors at 'best', so -9 for gzip, bzip2 and xz. Brotli at 11 (I think that's the highest). Brotli took twice as long as xz -9, but took 1/3 of the memory of xz -9.

      82M Sep 22 14:12 linux-4.2.1.tar.xz
      87M Sep 22 14:46 linux-4.2.1.tar.bro
      98M Sep 22 14:13 linux-4.2.1.tar.bz2
    125M Sep 22 14:13 linux-4.2.1.tar.gz
    602M Sep 22 14:14 linux-4.2.1.tar
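
    For anyone who wants to reproduce a rough version of this comparison in Python (a sketch; gzip/bz2/lzma are standard library, the brotli bindings come from pip, and the tarball path is just a placeholder):

        import bz2, gzip, lzma, time
        import brotli  # pip package; the other three are standard library

        with open("linux-4.2.1.tar", "rb") as f:  # placeholder path
            data = f.read()

        compressors = {
            "gzip -9":     lambda d: gzip.compress(d, compresslevel=9),
            "bzip2 -9":    lambda d: bz2.compress(d, compresslevel=9),
            "xz -9":       lambda d: lzma.compress(d, preset=9),
            "brotli -q11": lambda d: brotli.compress(d, quality=11),
        }

        for name, fn in compressors.items():
            start = time.time()
            out = fn(data)
            print(f"{name:11s} {len(out) / 2**20:6.1f} MiB  {time.time() - start:6.1f} s")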

    • by AvitarX ( 172628 )

      Twice as long to compress, to compress then decompress, or to decompress?
      They seem to be only claiming fast expansion.

  • "lossless compressed data format that ompresses data

    You've lost a "c" already.

  • Fans of Silicon Valley will be aware of the Pied Piper compression algorithm, and now Google has a more efficient one of its own.

    IMHO, at first read, it sounds like it's saying it's more efficient than Pied Piper's (fictional) algorithm... Which of course is impossible, since Pied Piper's will compress everything.

  • It goes up to eleven
