Double-keyed Browser Caching Is Hitting Web Performance

A Google engineer has warned that a major shift in web browser caching is upending long-standing performance optimization practices. Browsers have overhauled their caching systems in a way that forces websites to maintain separate copies of shared resources instead of reusing them across domains.

The new "double-keyed caching" system, implemented to enhance privacy, is ending the era of shared public content delivery networks, writes Google engineer Addy Osmani. According to Chrome's data, the change has led to a 3.6% increase in cache misses and a 4% rise in network bandwidth usage.
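The mechanics can be shown with a toy model (this is a sketch of the keying idea only, not Chrome's actual cache implementation): single-keying looks resources up by URL alone, while double-keying partitions the cache by the top-level site, so a second site embedding the same CDN resource misses.

```python
# Toy model of HTTP cache keying -- not Chrome's actual implementation.
# Single-keyed: key = resource URL. Double-keyed: key = (top-level site, URL).

class Cache:
    def __init__(self, double_keyed):
        self.double_keyed = double_keyed
        self.store = set()
        self.fetches = 0

    def request(self, top_level_site, url):
        key = (top_level_site, url) if self.double_keyed else url
        if key not in self.store:
            self.fetches += 1          # cache miss: go out to the network
            self.store.add(key)

for double_keyed in (False, True):
    cache = Cache(double_keyed)
    # Two unrelated sites embed the same CDN-hosted library.
    cache.request("a.example", "https://cdn.example/jquery.min.js")
    cache.request("b.example", "https://cdn.example/jquery.min.js")
    print(double_keyed, cache.fetches)
# Single-keyed: 1 network fetch (the second site reuses the entry).
# Double-keyed: 2 network fetches (each site keeps its own copy).
```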
  • by TechyImmigrant ( 175943 ) on Tuesday January 14, 2025 @03:14PM (#65088661) Homepage Journal

    If it's a local cache and the two things contain the same data, store it once and reference count it. You have to retrieve the second copy once to know it's the same so unless I'm visualizing the problem wrong, the information leakage is not happening.

    • This would not help reduce the cache misses and save network bandwidth, though.
      • This would not help reduce the cache misses and save network bandwidth, though.

        If the excess bandwidth is the downloading on first visit of a file you previously downloaded, then that's the cost of security.
        You could mitigate by first sharing hashes of the code you are linking so the browser can make a choice about downloading or not. A filename is not a secure hash.
        If it's say a common jquery library, then there's very little information leaked by not downloading it.
        If it's something more specific to singular websites, then go ahead and download it.
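The scheme sketched in this subthread (store one deduplicated copy per content hash, reference-count it per site, but always fetch over the network the first time a given site references it) could look roughly like this; the `fetch_from_network` callable and class name are hypothetical stand-ins, not any browser's API:

```python
import hashlib

class DedupCache:
    """Toy sketch of the proposal above: one stored copy per content hash,
    referenced per (site, url), but ALWAYS fetched over the network the
    first time a given site uses it, so the network side channel stays quiet."""

    def __init__(self, fetch_from_network):
        self.fetch = fetch_from_network      # callable(url) -> bytes (stand-in)
        self.blobs = {}                      # content hash -> bytes
        self.refs = {}                       # (site, url) -> content hash

    def request(self, site, url):
        key = (site, url)
        if key in self.refs:                 # repeat visit: serve from cache
            return self.blobs[self.refs[key]]
        body = self.fetch(url)               # first visit: fetch regardless
        digest = hashlib.sha256(body).hexdigest()
        self.blobs.setdefault(digest, body)  # store each distinct body once
        self.refs[key] = digest
        return body
```

Storage is deduplicated, but to an observer the network traffic on a site's first visit is indistinguishable from a non-caching client, which is the point of the mitigation.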

    • by Dan East ( 318230 ) on Tuesday January 14, 2025 @04:21PM (#65088889) Journal

      store it once and reference count it.

      Um, that's exactly how it used to work, and it opens the door to privacy issues. I have a domain and resource "a.com/foo.txt". Each time it is visited, it returns a unique random number. A page on totally unrelated websites b.com and c.com each loads that file. Whichever hits it first results in a new number being generated; however, the second one will be cached if it is not double-keyed. Then both b.com and c.com will have the same file with the same data - so now I can track you across both websites.

      There are other tricks that can be used. Say you visit my website for the first time, and I want to know if you have visited my competitor's website. I can have my page load a resource from that other site, and based on how fast it loads I can know if it was already cached or not and thus know if you visited their site lately.
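The first attack above can be modeled in a few lines: under a single-keyed cache, a resource that returns a fresh random token on every network hit becomes a shared cross-site identifier (names like `a.com/foo.txt` follow the example in the comment; the cache is a toy dict, not a real browser cache):

```python
import secrets

network_hits = 0

def tracker_endpoint():
    """a.com/foo.txt: returns a fresh random token on every *network* hit."""
    global network_hits
    network_hits += 1
    return secrets.token_hex(8)

shared_cache = {}   # single-keyed: url -> body, shared across all sites

def load(site, url, endpoint):
    # Single-keyed lookup ignores which site is embedding the resource.
    if url not in shared_cache:
        shared_cache[url] = endpoint()
    return shared_cache[url]

token_seen_by_b = load("b.com", "a.com/foo.txt", tracker_endpoint)
token_seen_by_c = load("c.com", "a.com/foo.txt", tracker_endpoint)
# Both unrelated sites observe the same random value: a cross-site ID.
assert token_seen_by_b == token_seen_by_c
assert network_hits == 1
```

Double-keying breaks this because each top-level site would trigger its own network hit and receive a different token.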

      • by DarkOx ( 621550 )

        The correct fix for most of these problems is the integrity attribute. It makes most mischief impossible.

        What browsers really ought to do is - allow single cache key if integrity is present.

        Encourage developers to use it.
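For reference, the value of the `integrity` attribute is an algorithm prefix plus a base64-encoded digest of the resource body; a sketch of computing one (the helper name and sample script content are made up for illustration):

```python
import base64
import hashlib

def sri_value(body: bytes, algorithm: str = "sha384") -> str:
    """Compute a Subresource Integrity value, usable as e.g.
    <script src="..." integrity="sha384-..." crossorigin="anonymous">."""
    digest = hashlib.new(algorithm, body).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode()}"

print(sri_value(b"console.log('hi');"))
```

A browser that verified the digest could in principle share a cache entry across sites for identical, integrity-pinned bytes, which is the relaxation proposed above (though, as the replies note, it does not by itself close the timing side channel).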

        • The correct fix for most of these problems is the integrity attribute. It makes most mischief impossible.

          What browsers really ought to do is - allow single cache key if integrity is present.

          Encourage developers to use it.

          I don't see how the integrity attribute helps to prevent the timing attack. Can you elaborate?

        • by allo ( 1728082 )

          If it is in the cache, it will not be loaded on the other site. So slashdot can load a https://trackingcompany/slashd... [trackingcompany] and a https://trackingcompany/pornsi... [trackingcompany] and the trackingcompany can measure if one of them or both are accessed from your ip. The content of the files is irrelevant for that.

          • If it is in the cache, it will not be loaded on the other site. So slashdot can load a https://trackingcompany/slashd... [trackingcompany] and a https://trackingcompany/pornsi... [trackingcompany] and the trackingcompany can measure if one of them or both are accessed from your ip. The content of the files is irrelevant for that.

            The content is not irrelevant if it's executable and not signed.
            The mitigation of side channels is best done with quantization and trace-synchronization prevention. But those might not be in the skill set of the average web programmer.

            The blasted wasteland approach is to not cache. The googles will whine about the wasted bandwidth, but then they shouldn't have spent decades putting insecure constructs into web technologies.

            • by allo ( 1728082 )

              If there is content it can do all kinds of things. But the simple attack does not need content, it only needs to monitor requests and place crafted URLs (e.g. to images, css or other things that are loaded even without JS active) on the websites. Not much interaction of the client needed except for loading, caching and not loading cached contents.

              • Read my comment further up. Count the references from each site and always download from each site the first time you visit it regardless of whether or not you have it. So the side channel is muted.

  • by fph il quozientatore ( 971015 ) on Tuesday January 14, 2025 @03:19PM (#65088685)
    Blame the assholes who used web caching across multiple domains to track users. Very ironic that it's a Google engineer raising the issue.
    • If they can use it to track us they will. Blame the people who made the systems they're exploiting.
      • by Shaitan ( 22585 ) on Tuesday January 14, 2025 @03:44PM (#65088765)

        "If they can use it to track us they will"

        Blame the people who refuse to imprison 'they' for the felony behavior of doing so and refuse to admit that user metrics and meta data are property of users not those spying on them.

        • Playing punishment whack-a-mole may give you special jollies but it doesn't fix the actual problem. And it's additionally really expensive. Good fences make good neighbors.
          • by flink ( 18449 )

            Good fences make good neighbors.

            Good fences don't make good neighbors. The poem was being ironic. It was lamenting the building of fences keeping people apart so they didn't get to know one another.

            • Re: (Score:3, Funny)

              Good fences don't make good neighbors. The poem was being ironic. It was lamenting the building of fences keeping people apart so they didn't get to know one another.

              I'm aware that Robert Frost questioned the need for fences. The common usage, though, is that pragmatic containment keeps things civil.

              You're entirely welcome for the chance to misunderstand and then incorrect me, proving your intellectual prowess, though. Happy to oblige.

          • by Shaitan ( 22585 )

            The actual problem here isn't technical, or rather the technical problem always existed and always will; science depends upon it. Making the retention and trafficking of user data a felony [which is actively prosecuted] and recognizing that ownership remains with the user even when data rests on or is generated on a third-party system will move the issue from main street to the same dark alleys where carders and Nigerian princes dwell.

            That ends it as SOP in the modern world and reduces it by 99.99%. I'd call that

        • "If they can use it to track us they will"

          Blame the people who refuse to imprison 'they' for the felony behavior of doing so and refuse to admit that user metrics and meta data are property of users not those spying on them.

          You think you own your data on an open network? How very quaint of you. The second your data is accessible publicly it's owned by whoever is willing to throw money at it. The real problem is that we don't get the money generated from our data. Somehow, the middle-men don't even put up a pretense of trying to pay the creators of the product they are selling. The digital age is a wondrous thing for grifters and grubbers.

          • by Merk42 ( 1906718 )
            "Information wants to be free!"
            "No! Not like that!"
            • "Information wants to be free!" "No! Not like that!"

              Information wants to be tied up, spanked, and have the electrodes applied to its nipples. It's not even fussy about who does it.

          • by Xenx ( 2211586 )

            The real problem is that we don't get the money generated from our data.

            I don't want to sound like I'm supporting the poor behavior around the practices, but we do in fact get paid for our data. We get paid in use of those services. It costs money to run, and their operating costs come from the sale of data. The real problem is that people are bad at recognizing costs outside of immediate price. This allows companies to take advantage of the situation.

            I understand that to some, the only right answer would be zero data collection/sales. However, people need to understand that

            • The real problem is that we don't get the money generated from our data.

              I don't want to sound like I'm supporting the poor behavior around the practices, but we do in fact get paid for our data. We get paid in use of those services. It costs money to run, and their operating costs come from the sale of data. The real problem is that people are bad at recognizing costs outside of immediate price. This allows companies to take advantage of the situation. I understand that to some, the only right answer would be zero data collection/sales. However, people need to understand that means services would need to start charging for their use in some capacity. I don't have a good solution at hand, but there should at least be more done to make it apparent the real price of a service. Maybe something like either mandating an ad/data-sale free option, or having to show a realistic estimate of earnings based on user data at sign up and update on say an annual basis.

              I'd be curious how the data aggregators that take our data even when we aren't using their service can justify it. Granted, they don't have to, but places like Facebook and their shadow profiles, developed from the fact that somebody's address book had my contact info in it, and then they collect data from any site where I've volunteered my data for a service I was using. I don't even mind so much when we're sharing data purposefully for a free service. When I pay for a service, or buy something from a sell

              • There are two groups that are facilitating this:
                1. The c-suite types who are driven by profit and just see the numbers not the scope of the data. While your data is worth something, it's not a lot.
                2. The engineers doing it who abstract it as a challenge/problem to be solved rather than something impacting people. And frankly don't always realize what they are doing. I've had more than a few cases where well-intentioned engineers brought me a solution that was revealing far more than they intended.
                3. Oka
                • by Shaitan ( 22585 )

                  "I swear Amazon knew my wife was pregnant before we did"

                  No dude, your wife knew before you did and Amazon knew 5 seconds later when she began searches related to it.

                • by bn-7bc ( 909819 )
                  "The engineers doing it who abstract it as a challenge/problem to be solved rather than something impacting people. And frankly don't always realize what they are doing." Calling https://en.wikipedia.org/wiki/... [wikipedia.org] Thomas Midgley Jr. How did CFCs and leaded fuel work out?
              • by Xenx ( 2211586 )

                When I pay for a service, or buy something from a selling site, then see that data clearly get shoveled to advertisers, how is that "fair use" on their part. If I'm already paying, fuck off on the data collection.

                I don't disagree, but that data money is still part of the price of the goods/service you bought. I understand that sometimes the market wouldn't support adjustments, but most companies aren't going to give up that extra money for nothing.

                • by Shaitan ( 22585 )

                  "I don't disagree, but that data money is still part of the price of the goods/service you bought."

                  Exactly... the advertiser bought the data, the vendor paid the advertiser, the vendor factors that into the price of the product that I'm buying. At the end of the day it is ME paying for my data. Actually I'm not just paying for my data but also paying for the ad views that didn't result in a conversion as well.

                  • by Xenx ( 2211586 )

                    Exactly... the advertiser bought the data, the vendor paid the advertiser, the vendor factors that into the price of the product that I'm buying.

                    I don't deny there is a bit of recursion in the process. However, it likely isn't as straightforward or equal as you describe it.

                    At the end of the day it is ME paying for my data.

                    Technically, sure, but that is true of literally every business cost. There will be variances in spending, but at the end of the day the companies are still going to spend that money on advertisements. That means you're paying the same amount either way, but in the case of selling your data they still make more overall.

                    • by Shaitan ( 22585 )

                      Right, and since the data doesn't belong to them it should be a felony to retain, sell, or traffick in user data/meta data.

                    • by Xenx ( 2211586 )
                      Did you even read what I said? Because, this reply completely ignores it.
                    • by Shaitan ( 22585 )

                      Sure did, and it hasn't really changed from this "most companies aren't going to give up that extra money for nothing", you just supported your assertion that it is extra money for them.

                      I didn't ignore it, I gave an alternative to asking them to give up extra money for nothing. They can give up extra money in exchange for c suite not having to serve as bitches in prison.

            • by Shaitan ( 22585 )

              "I understand that to some, the only right answer would be zero data collection/sales. However, people need to understand that means services would need to start charging for their use in some capacity. I don't have a good solution at hand"

              What do you mean? You just gave a perfectly good solution. But that doesn't automatically mean services would have to charge us. TV and radio were free and ad supported without data collection. Just because people WANT to laser target their ads doesn't mean they would

              • by Xenx ( 2211586 )

                You just gave a perfectly good solution.

                A good solution for some. It is not my place, or yours, to be the sole arbiter of what is good for everyone. Maybe the people and/or law will eventually go that way. I think the average person doesn't care enough, and would rather save the money.

                The problem comes from transiting, sharing, and storing the data. Legally the site I'm interacting with [knowingly, not secretly via invisible pixels and cookie trails] should get to see the data from my interaction for their own use and that could even include serving ads but it remains my property and shouldn't be retained or shared with anyone else.

                Legally, you are incorrect. The data is about you, but it does not necessarily belong to you. There is a distinction. Obviously, legal jurisdiction and all that. I'm not arguing against a desire for it to work that way. I'm only correcting the statement as presented

                • by Shaitan ( 22585 )

                  "Legally, you are incorrect."

                  No, I'm not. By right my data belongs to me even when in the hands of a third party with which I conduct business such as a bank. As with anything which is innately true, by right, this is true regardless of any status or interpretation. If a system of law fails to recognize an innate right it is self-invalidating and pending remediation, someone who points this out is not incorrect.

                  "I'm not arguing against a desire for it to work that way. I'm only correcting the statement as p

                  • by Xenx ( 2211586 )

                    No, I'm not.

                    Yes, you are. Now, kindly shove off.

                    • by Shaitan ( 22585 )

                      I'm beginning to suspect you are one of the evil bastards who has been trafficking in our data and doesn't want to face justice and/or wants to keep up the exploitation. Like a southern plantation owner defending slavery back in the day.

                      I get it. It's going to be disruptive but in practice I suspect we'll only need to toss a few dozen execs in prison to make the point before the masses who are only willing to engage in LEGAL [if unethical] business practices flee en masse. The odds of you being one of the

                    • by Xenx ( 2211586 )
                      I believe in honesty first when arguing against something. You appear unable to distinguish your opinion from actual fact. I don't wish to continue wasting my time on this with you.
                    • by Shaitan ( 22585 )

                      "You appear unable to distinguish your opinion from actual fact."

                      In order to distinguish one's opinion from actual fact one would have to hold an opinion they believe is wrong. As Dr. House says, it isn't so much that I think I'm always right as that I find it is difficult to operate from the other assumption.

          • by Shaitan ( 22585 )

            "You think you own your data on an open network? How very quaint of you. The second your data is accessible publicly it's owned by whoever is willing to throw money at it."

            Open network? I'm not on an open network. I'm using a private system, connected to a private network for which I pay a private party for access, through agreements that private party has with other private parties it transits my requests to systems controlled by another private party which I wish to access (slashdot for example). None of

      • Google has dictated the direction of web technologies for the last twenty years. The blame still lies with them.

      • by allo ( 1728082 )

        Why blame them? They invented the double-keyed cache, which prevents it. The solution is fine, it only lowers the performance a bit ... blame the advertisers for making it necessary.

    • by AvitarX ( 172628 )

      The more sophisticated you need to be to track, the bigger the advantage larger companies have.

    • I mean they are 'fixing' it. Meaning they're making the changes, and they might as well get the kudos for doing the work too. Advertise it a bit.

  • by Midnight Thunder ( 17205 ) on Tuesday January 14, 2025 @03:21PM (#65088699) Homepage Journal

    Could websites not simply provide a strong checksum for resources, as a way of identifying whether a piece of content is the same? Then the caching is simply done by file checksum. I'm sure someone has thought about this, so I'm not sure why it is not being used?

    • I think the problem there is timing attacks: so long as resources get cached across domains it's possible (how practical and how precise would vary according to fiddly details of exactly how common or distinctive various cached assets are) for a site you visit to draw inferences on where you have been previously by referring to a potentially cached resource and seeing whether it's available essentially instantly or whether there's a delay that suggests your browser needs to grab a
      • by dgatwood ( 11270 )

        I think the problem there is timing attacks: so long as resources get cached across domains it's possible (how practical and how precise would vary according to fiddly details of exactly how common or distinctive various cached assets are) for a site you visit to draw inferences on where you have been previously by referring to a potentially cached resource and seeing whether it's available essentially instantly or whether there's a delay that suggests your browser needs to grab a copy.

        It tells them that you have been to a site that uses a particular resource, but not what site. You could just as easily have previously visited a site that uses the above approach to guess what sites you have been to.

        So at best, it violates your privacy exactly once, and because the user might have hit such a browser profiling site previously in incognito mode, you can't be certain even once unless your list of resources changes rapidly.

        Given a choice, unless I'm missing something, I think I'd rather hav

        • by Burdell ( 228580 )

          Sites often load dozens of potentially-shared resources like JS, icons, fonts, JSON data used by the JS, etc. All a CDN has to do is make sites use different combinations - for example site 1 uses JS-A, JS-B, icon-X, and icon-Y. Site 2 uses JS-A, JS-B, icon-X, and icon-Z (where icon-Y and icon-Z are really the same file). Now the CDN can tell when you visit both sites, because your browser would load everything for a visit to site 1, and only icon-Z for a visit to site 2.

          • by dgatwood ( 11270 )

            Sites often load dozens of potentially-shared resources like JS, icons, fonts, JSON data used by the JS, etc. All a CDN has to do is make sites use different combinations - for example site 1 uses JS-A, JS-B, icon-X, and icon-Y. Site 2 uses JS-A, JS-B, icon-X, and icon-Z (where icon-Y and icon-Z are really the same file). Now the CDN can tell when you visit both sites, because your browser would load everything for a visit to site 1, and only icon-Z for a visit to site 2.

            The CDN already likely knows when I visit both sites, because they're serving the main page of both sites. This is about as scary to me as "Your computer is sending out an IP address" ads.
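The combination scheme described in this subthread is easy to simulate (site names and resource labels are hypothetical): the CDN assigns each site a distinct subset of interchangeable URLs, and under a shared single-keyed cache the pattern of requests it observes reveals cross-site visits.

```python
# The CDN serves the same bytes under icon-y and icon-z, but assigns
# a different label to each site, turning cache reuse into a signal.
SITE_RESOURCES = {
    "site1.example": ["js-a", "js-b", "icon-x", "icon-y"],
    "site2.example": ["js-a", "js-b", "icon-x", "icon-z"],  # icon-z == icon-y
}

def simulate_visits(sites, shared_cache=True):
    """Return the URLs the CDN sees requested, in order, for a visit sequence."""
    cache = set()
    seen = []
    for site in sites:
        for url in SITE_RESOURCES[site]:
            key = url if shared_cache else (site, url)
            if key not in cache:
                seen.append(url)
                cache.add(key)
    return seen

# Shared cache: the site2 visit after site1 requests ONLY icon-z --
# a signature telling the CDN this browser already visited site1.
print(simulate_visits(["site1.example", "site2.example"]))
# Double-keyed: both visits request their full resource lists, so the
# request pattern no longer distinguishes returning cross-site visitors.
print(simulate_visits(["site1.example", "site2.example"], shared_cache=False))
```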

  • by ahziem ( 661857 ) on Tuesday January 14, 2025 @03:32PM (#65088725) Homepage
    This is why we can't have nice things
    • Because we insist on all the things being "free?"
      • No, because advertisers are not content with just advertising (like in the old days of print ads) and insist on invading our privacy.

        • Advertising (like protesting) has always been disruptive and annoying. That's the point. The most obnoxious thing about any reasonably urban area is all of the visual pollution that is advertising.
        • The Better Ads Standards [betterads.org], published by the Coalition for Better Ads, forbid use on the web of a lot of techniques routinely used to boost visibility of advertisements in old media. Compared to radio and television, the Standards ban interstitials that force the viewer of a text article to wait several seconds until the sponsor has finished delivering a message. You may recall some publishers doing a "pivot to video" in 2017 or so, publishing fewer text articles and more videos. This was an attempt to recapt

      • by ahziem ( 661857 )
        I'm not sure what you meant by free. The article mentioned increased costs, but I didn't see anything about a new charge for something that was previously free. I meant the consequence of bad actors abusing the cache information is that web visitors have longer page load times, bandwidth for users (like mobile data) and providers increases, the benefits of shared assets on CDNs are reduced, and web developers need new strategies to improve page load times.
  • The storage argument "forces websites to maintain separate copies of shared resources" per domain is a red herring (BS). The websites already have separate directory trees (in CDN). Any shared resources can continue to be shared: just use symbolic links in website tree(s) to the shared resource(s) in CDN. Don't they do this already?
    • by KlomDark ( 6370 )
      No, the cache is now tied to the domain name of the parent site. (The site you went to in the first place, not the CDN) They aren't shared anymore.
      • by srg33 ( 1095679 )
        I know that (tied to parent site / domain name). That means copies in the client browser cache. But, the websites only need one file that they can point to . . .
        • by tepples ( 727027 )

          To protect the viewer from fingerprinting, the browser downloads a separate copy of the file for each website that points to it.

          • by srg33 ( 1095679 )
            IF you're helping to clarify THEN thanks.
            ELSE
            That's what I said. I started by pointing out that the article was wrong in claiming that Double-keyed Browser Caching "forces websites to maintain separate copies of shared resources". Websites only need one file that they can point to (public URL or private filesystem) . . . even if the browser (client) downloads more than once from the website (server).
          • It can be downloaded and then thrown away. There is no need for the browser to keep two copies.
            • It can be downloaded and then thrown away. There is no need for the browser to keep two copies.

              Deduplicating the browser's disk-backed cache is subject to timing attacks on disk access speed. If the browser deduplicates cached responses with matching URLs, sizes, and hashes, a website's script can time cache retrieval to distinguish a file that is in RAM because it's been visited through another website from a file that remains on disk because it has not recently been visited through another website. This is especially true on machines with a conventional hard disk drive or a slow eMMC, as opposed to

  • Because FluffyBunny.jpg is obviously the same image on MisterRogersNeigborhood.com and PornHub.com.

    • Re: (Score:3, Interesting)

      by Anonymous Coward

      Actually had something like this happen on the server side with our CDN provider. A miscellaneous image lookup returned a cached image of a nude model because of some key collision with another tenant at the CDN. I asked them to fix the problem and to their credit they did fix the keying technique within minutes. And credited our account.

  • by Fly Swatter ( 30498 ) on Tuesday January 14, 2025 @04:08PM (#65088845) Homepage
    No trust, no shared cache. It's really quite simple. Third party content can't be trusted.
    • Remember when software installed on local computers relied on many shared components that were separately installed? You know, like image processing libraries and such? It was a mess. Install an update to one component, and who knows how many of your software titles broke. Nowadays, languages like .NET package everything together in one set of folders. No sharing, even if you are using the same component as other software. It keeps software from interfering with other software.

      Now the internet is finding o

      • by tepples ( 727027 )

        With the movement toward self-contained applications, we end up having each application's install folder becoming the size of an operating system.

        • Oh I don't know. I suppose it depends on the stack. I just did some checking on some .NET web apps, the total size of the deploy folder was about 70 MB. It's true, once upon a time OSes were that size, but not anymore.

  • Hey website, tell me the hash value of the document at this URL.
    If I find the hash in my local cache, and it hasn't expired:
        Use its static content (but it gets its own private data/cookie store).
    Otherwise:
        Hey website, give me the document at this URL.

    This strategy will reduce requests of identical documents and scripts across websites, so long as the hash function is cryptographically secure against collisions.
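The handshake proposed above could be sketched like this, with `head_for_hash` and `fetch_body` as hypothetical stand-ins for the extra round trip (and, as the replies below note, this does not close the timing side channel):

```python
def conditional_fetch(url, head_for_hash, fetch_body, cache):
    """Sketch of the hash-first protocol above:
    1. ask the server for the document's hash (cheap round trip),
    2. serve from the local content-addressed cache on a hit,
    3. otherwise download the body and cache it under its hash."""
    digest = head_for_hash(url)      # server announces the content hash
    if digest in cache:
        return cache[digest]         # identical content already cached
    body = fetch_body(url)           # full download only on a miss
    cache[digest] = body
    return body
```

Two different URLs serving identical bytes would share one cached copy, which is the bandwidth saving the comment describes; the security hinges on the hash being collision-resistant, per the point about filenames not being secure hashes.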

    • by ffkom ( 3519199 )
      You miss that the second (third, fourth...) access is attempted by malicious web sites that only want to probe whether they get the response fast (from the cache) or slow (from the net), in order to figure out what other web-pages you visited earlier. Given the vast amount of foreign content many web-sites use, it is easy to identify them purely from what they populated your cache with.
    • by Mal-2 ( 675116 )

      There is no hash function that is secure against collisions unless the hash is longer than the data itself. It's the pigeonhole principle: [wikipedia.org] if n items are put into m containers, with n > m, then at least one container must contain more than one item.

      • There is no hash function that is secure against collisions unless the hash is longer than the data itself.

        Information security is less concerned with perfect security against collisions than with probabilistic security. If the number of possible hashes exceeds 10^82, the number of atoms in the observable universe, collisions become vanishingly unlikely. The number of distinct SHA-256 hash values is close to that: on the order of 10^77. If you can find a collision, you will probably become rich and famous.

        Timing attacks are the problem.
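The scale being argued about checks out: 2^256 is about 1.2 x 10^77, and the birthday bound says accidental collisions stay negligible even for absurdly large caches. A quick sanity check (the document count is an arbitrary illustration):

```python
import math

hash_space = 2 ** 256
print(f"distinct SHA-256 values: about 10^{int(math.log10(hash_space))}")

# Birthday bound: with n random documents, P(any collision) ~ n^2 / (2 * space).
n = 2 ** 40                      # about a trillion cached documents (arbitrary)
p = n * n / (2 * hash_space)
print(f"collision probability for 2^40 documents: about {p:.1e}")
```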

        • by Mal-2 ( 675116 )

          So this is a case where the probability of any particular hash collision is essentially zero even though the probability of having more than one possible original file per hash is essentially 100%? Like you can throw a dart at a dartboard and (if you don't suck) have essentially 100% chance of hitting the board, but the probability of hitting any particular bristle within the dartboard is essentially zero.

          I don't like this one bit, it sounds like an opportunity to force open the tiniest of cracks with a qua

  • OK (Score:5, Insightful)

    by bill_mcgonigle ( 4333 ) * on Tuesday January 14, 2025 @05:45PM (#65089109) Homepage Journal

    4% for significantly enhanced privacy?

    Sold.

    • Pff... I get much, MUCH better gains by using the usual ad blockers and script blockers.

      Even after all these years, I'm still shocked at how much web traffic is saved. Holy cow.

  • by ffkom ( 3519199 ) on Tuesday January 14, 2025 @05:50PM (#65089127)
    ... is a very small price to pay for better privacy. Even bigger privacy gains are possible, of course, by not using any Google services at all.
    • I'm not sure this change will guarantee privacy, or that privacy was impossible unless this was done.

      I mean would Google implement anything that truly impacted privacy? Meaning that it wouldn't surprise me if they already know a way around this new set of assumptions. Or they think it'll increase the value of other solutions they sell. Force us onto the standard pathways that they control and own.

  • They already have a way around it to continue tracking you. If the product you are using is free, you aren't the customer.
  • If you are optimizing in the 4% range, you are doing it massively wrong. This is greed-fueled stupidity, nothing else. All it does is ossify the web and its tech. But maybe Google wants that.

  • Sure, cache state might leak a little information you wouldn't (in a perfect world) want leaked. And???

    What does it matter if I have jquery already installed/loaded in the version you looked for?

    Can something be implemented poorly and leak something important? I'm guessing so, but is this really a common problem that we should completely change all browser caching and assumptions about shared web code?

  • It would leave users vulnerable to a supply chain attack and grave privacy violations.
