Double-keyed Browser Caching Is Hitting Web Performance

A Google engineer has warned that a major shift in web browser caching is upending long-standing performance optimization practices. Browsers have overhauled their caching systems in a way that forces websites to maintain separate copies of shared resources instead of reusing them across domains.

The new "double-keyed caching" system, implemented to enhance privacy, is ending the era of shared public content delivery networks, writes Google engineer Addy Osmani. According to Chrome's data, the change has led to a 3.6% increase in cache misses and a 4% rise in network bandwidth usage.

  • If it's a local cache and the two things contain the same data, store it once and reference count it. You have to retrieve the second copy once to know it's the same so unless I'm visualizing the problem wrong, the information leakage is not happening.

    • This would not help reduce the cache misses and save network bandwidth, though.
    • by Dan East ( 318230 ) on Tuesday January 14, 2025 @04:21PM (#65088889) Journal

      store it once and reference count it.

      Um, that's exactly how it used to work, and opens the door for privacy issues. I have a domain and resource "a.com/foo.txt". Each time that is visited it returns a unique random number. A page on totally unrelated websites b.com and c.com each load that file. Whichever hits it first results in a new number being generated, however the second one will be cached if it is not double-keyed. Then both b.com and c.com will have the same file with the same data - so now I can track you across both websites.

      There are other tricks that can be used. Say you visit my website for the first time, and I want to know if you have visited my competitor's website. I can have my page load a resource from that other site, and based on how fast it loads I can know if it was already cached or not and thus know if you visited their site lately.
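The a.com/foo.txt scenario above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not how any browser actually implements its HTTP cache: `fetch_from_origin`, `ToyCache`, and the key layout are hypothetical names invented for the example.

```python
import itertools

# Hypothetical origin: a.com/foo.txt returns a fresh value on every network fetch.
_counter = itertools.count(1)


def fetch_from_origin(url):
    return f"{url}#id-{next(_counter)}"


class ToyCache:
    """Toy HTTP cache; double_keyed=True adds the top-level site to the cache key."""

    def __init__(self, double_keyed):
        self.double_keyed = double_keyed
        self.store = {}
        self.network_fetches = 0

    def load(self, top_level_site, url):
        key = (top_level_site, url) if self.double_keyed else url
        if key not in self.store:
            # Cache miss: go to the network and remember the response.
            self.network_fetches += 1
            self.store[key] = fetch_from_origin(url)
        return self.store[key]


shared = ToyCache(double_keyed=False)
partitioned = ToyCache(double_keyed=True)

# Single-keyed: b.com and c.com receive the same identifier, so visits can be linked.
same_id = shared.load("b.com", "a.com/foo.txt") == shared.load("c.com", "a.com/foo.txt")

# Double-keyed: each site gets its own copy, at the cost of a second download.
split_id = partitioned.load("b.com", "a.com/foo.txt") != partitioned.load("c.com", "a.com/foo.txt")

print(same_id, split_id, shared.network_fetches, partitioned.network_fetches)
```

In this model the timing trick amounts to observing whether `load` had to touch the network. With the double-keyed variant, a probe issued from one site never finds an entry that another site populated.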

      • by DarkOx ( 621550 )

        The correct fix for most of these problems is the integrity attribute. It makes most mischief impossible.

        What browsers really ought to do is allow a single cache key if integrity is present.

        Encourage developers to use it.
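For readers unfamiliar with the integrity attribute: its value is a hash algorithm name, a hyphen, and the base64-encoded digest of the resource. The sketch below computes such a value. The helper name `sri_value`, the script body, and the `cdn.example` URL are placeholders for illustration. Note that this only pins the content; relaxing the cache key when integrity is present is the commenter's proposal, not current browser behavior.

```python
import base64
import hashlib


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity string of the form '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


script_body = b"console.log('hello');"  # placeholder content for the example
integrity = sri_value(script_body)

# Placeholder URL; a real page would reference its actual CDN path.
print(
    f'<script src="https://cdn.example/lib.js" '
    f'integrity="{integrity}" crossorigin="anonymous"></script>'
)
```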

        • The correct fix for most of these problems is the integrity attribute. It makes most mischief impossible.

          What browsers really ought to do is allow a single cache key if integrity is present.

          Encourage developers to use it.

          I don't see how the integrity attribute helps to prevent the timing attack. Can you elaborate?

  • by fph il quozientatore ( 971015 ) on Tuesday January 14, 2025 @03:19PM (#65088685)
    Blame the assholes who used web caching across multiple domains to track users. Very ironic that it's a Google engineer raising the issue.
    • If they can use it to track us they will. Blame the people who made the systems they're exploiting.
      • by Shaitan ( 22585 )

        "If they can use it to track us they will"

        Blame the people who refuse to imprison 'they' for the felony behavior of doing so and refuse to admit that user metrics and meta data are property of users not those spying on them.

        • Playing punishment whack-a-mole may give you special jollies but it doesn't fix the actual problem. And it's additionally really expensive. Good fences make good neighbors.
          • by flink ( 18449 )

            Good fences make good neighbors.

            Good fences don't make good neighbors. The poem was being ironic. It was lamenting the building of fences keeping people apart so they didn't get to know one another.

            • Re: (Score:2, Funny)

              Good fences don't make good neighbors. The poem was being ironic. It was lamenting the building of fences keeping people apart so they didn't get to know one another.

              I'm aware that Robert Frost questioned the need for fences. The common usage, though, is that a pragmatic containment keeps things civil.

              You're entirely welcome for the chance to misunderstand and then incorrect me, proving your intellectual prowess, though. Happy to oblige.

        • "If they can use it to track us they will"

          Blame the people who refuse to imprison 'they' for the felony behavior of doing so and refuse to admit that user metrics and meta data are property of users not those spying on them.

          You think you own your data on an open network? How very quaint of you. The second your data is accessible publicly it's owned by whoever is willing to throw money at it. The real problem is that we don't get the money generated from our data. Somehow, the middle-men don't even put up a pretense of trying to pay the creators of the product they are selling. The digital age is a wondrous thing for grifters and grubbers.

          • by Merk42 ( 1906718 )
            "Information wants to be free!"
            "No! Not like that!"
            • "Information wants to be free!" "No! Not like that!"

              Information wants to be tied up, spanked, and have the electrodes applied to its nipples. It's not even fussy about who does it.

          • by Xenx ( 2211586 )

            The real problem is that we don't get the money generated from our data.

            I don't want to sound like I'm supporting the poor behavior around the practices, but we do in fact get paid for our data. We get paid in use of those services. It costs money to run, and their operating costs come from the sale of data. The real problem is that people are bad at recognizing costs outside of immediate price. This allows companies to take advantage of the situation.

            I understand that to some, the only right answer would be zero data collection/sales. However, people need to understand that

            • The real problem is that we don't get the money generated from our data.

              I don't want to sound like I'm supporting the poor behavior around the practices, but we do in fact get paid for our data. We get paid in use of those services. It costs money to run, and their operating costs come from the sale of data. The real problem is that people are bad at recognizing costs outside of immediate price. This allows companies to take advantage of the situation. I understand that to some, the only right answer would be zero data collection/sales. However, people need to understand that means services would need to start charging for their use in some capacity. I don't have a good solution at hand, but there should at least be more done to make it apparent the real price of a service. Maybe something like either mandating an ad/data-sale free option, or having to show a realistic estimate of earnings based on user data at sign up and update on say an annual basis.

              I'd be curious how the data aggregators that take our data even when we aren't using their service can justify it. Granted, they don't have to, but places like Facebook and their shadow profiles, developed from the fact that somebody's address book had my contact info in it, and then they collect data from any site where I've volunteered my data for a service I was using. I don't even mind so much when we're sharing data purposefully for a free service. When I pay for a service, or buy something from a sell

              • There are two groups that are facilitating this:
                1. The c-suite types who are driven by profit and just see the numbers not the scope of the data. While your data is worth something, it's not a lot.
                2. The engineers doing it who abstract it as a challenge/problem to be solved rather than something impacting people. And frankly don't always realize what they are doing. I've had more than a few cases where well-intentioned engineers brought me a solution that was revealing far more than they intended.
                3. Oka
                • by Shaitan ( 22585 )

                  "I swear Amazon knew my wife was pregnant before we did"

                  No dude, your wife knew before you did and Amazon knew 5 seconds later when she began searches related to it.

              • by Xenx ( 2211586 )

                When I pay for a service, or buy something from a selling site, then see that data clearly get shoveled to advertisers, how is that "fair use" on their part. If I'm already paying, fuck off on the data collection.

                I don't disagree, but that data money is still part of the price of the goods/service you bought. I understand that sometimes the market wouldn't support adjustments, but most companies aren't going to give up that extra money for nothing.

                • by Shaitan ( 22585 )

                  "I don't disagree, but that data money is still part of the price of the goods/service you bought."

                  Exactly... the advertiser bought the data, the vendor paid the advertiser, the vendor factors that into the price of the product that I'm buying. At the end of the day it is ME paying for my data. Actually I'm not just paying for my data but also paying for the ad views that didn't result in a conversion as well.

                  • by Xenx ( 2211586 )

                    Exactly... the advertiser bought the data, the vendor paid the advertiser, the vendor factors that into the price of the product that I'm buying.

                    I don't deny there is a bit of recursion in the process. However, it likely isn't as straightforward or equal as you describe it.

                    At the end of the day it is ME paying for my data.

                    Technically, sure, but that is true of literally every business cost. There will be variances in spending, but at the end of the day the companies are still going to spend that money on advertisements. That means you're paying the same amount either way, but in the case of selling your data they still make more overall.

                    • by Shaitan ( 22585 )

                      Right, and since the data doesn't belong to them it should be a felony to retain, sell, or traffic in user data/meta data.

            • by Shaitan ( 22585 )

              "I understand that to some, the only right answer would be zero data collection/sales. However, people need to understand that means services would need to start charging for their use in some capacity. I don't have a good solution at hand"

              What do you mean? You just gave a perfectly good solution. But that doesn't automatically mean services would have to charge for use. TV and radio were free and ad supported without data collection. Just because people WANT to laser target their ads doesn't mean they would

              • by Xenx ( 2211586 )

                You just gave a perfectly good solution.

                A good solution for some. It is not my place, or yours, to be the sole arbiter of what is good for everyone. Maybe the people and/or law will eventually go that way. I think the average person doesn't care enough, and would rather save the money.

                The problem comes from transiting, sharing, and storing the data. Legally the site I'm interacting with [knowingly, not secretly via invisible pixels and cookie trails] should get to see the data from my interaction for their own use and that could even include serving ads but it remains my property and shouldn't be retained or shared with anyone else.

                Legally, you are incorrect. The data is about you, but it does not necessarily belong to you. There is a distinction. Obviously, legal jurisdiction and all that. I'm not arguing against a desire for it to work that way. I'm only correcting the statement as presented

                • by Shaitan ( 22585 )

                  "Legally, you are incorrect."

                  No, I'm not. By right my data belongs to me even when in the hands of a third party with which I conduct business such as a bank. As with anything which is innately true, by right, this is true regardless of any status or interpretation. If a system of law fails to recognize an innate right it is self-invalidating and pending remediation, someone who points this out is not incorrect.

                  "I'm not arguing against a desire for it to work that way. I'm only correcting the statement as p

          • by Shaitan ( 22585 )

            "You think you own your data on an open network? How very quaint of you. The second your data is accessible publicly it's owned by whoever is willing to throw money at it."

            Open network? I'm not on an open network. I'm using a private system, connected to a private network for which I pay a private party for access, through agreements that private party has with other private parties it transits my requests to systems controlled by another private party which I wish to access (slashdot for example). None of

      • Google has dictated the direction of web technologies for the last twenty years. The blame still lies with them.

    • by AvitarX ( 172628 )

      The more sophisticated you need to be to track, the bigger the advantage larger companies have.

  • Could websites not simply provide strong checksum to resources, as a way of identifying if a piece of content is the same? Then the caching is simply done with file checksum? I'm sure someone has thought about this, so not sure why this is not being used?

    • I think the problem there is timing attacks: so long as resources get cached across domains it's possible (how practical and how precise would vary according to fiddly details of exactly how common or distinctive various cached assets are) for a site you visit to draw inferences on where you have been previously by referring to a potentially cached resource and seeing whether it's available essentially instantly or whether there's a delay that suggests your browser needs to grab a
      • by dgatwood ( 11270 )

        I think the problem there is timing attacks: so long as resources get cached across domains it's possible (how practical and how precise would vary according to fiddly details of exactly how common or distinctive various cached assets are) for a site you visit to draw inferences on where you have been previously by referring to a potentially cached resource and seeing whether it's available essentially instantly or whether there's a delay that suggests your browser needs to grab a copy.

        It tells them that you have been to a site that uses a particular resource, but not what site. You could just as easily have previously visited a site that uses the above approach to guess what sites you have been to.

        So at best, it violates your privacy exactly once, and because the user might have hit such a browser profiling site previously in incognito mode, you can't be certain even once unless your list of resources changes rapidly.

        Given a choice, unless I'm missing something, I think I'd rather hav

        • by Burdell ( 228580 )

          Sites often load dozens of potentially-shared resources like JS, icons, fonts, JSON data used by the JS, etc. All a CDN has to do is make sites use different combinations - for example site 1 uses JS-A, JS-B, icon-X, and icon-Y. Site 2 uses JS-A, JS-B, icon-X, and icon-Z (where icon-Y and icon-Z are really the same file). Now the CDN can tell when you visit both sites, because your browser would load everything for a visit to site 1, and only icon-Z for a visit to site 2.
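The site 1 / site 2 example above can be modeled in a few lines. This is a sketch under simplifying assumptions: the manifests are copied from the comment, while `requests_seen_by_cdn` and the cache representation are hypothetical and ignore expiry, revalidation, and everything else a real cache does.

```python
# Manifests from the example; icon-Y and icon-Z are the same bytes under two names.
SITE_RESOURCES = {
    "site1": {"JS-A", "JS-B", "icon-X", "icon-Y"},
    "site2": {"JS-A", "JS-B", "icon-X", "icon-Z"},
}


def requests_seen_by_cdn(visits, shared_cache):
    """Return, for each visit in order, the set of resources fetched from the CDN."""
    cache = set()
    seen = []
    for site in visits:
        fetched = set()
        for resource in SITE_RESOURCES[site]:
            # A shared cache keys on the resource alone; a partitioned one adds the site.
            key = resource if shared_cache else (site, resource)
            if key not in cache:
                cache.add(key)
                fetched.add(resource)
        seen.append(fetched)
    return seen


shared = requests_seen_by_cdn(["site1", "site2"], shared_cache=True)
partitioned = requests_seen_by_cdn(["site1", "site2"], shared_cache=False)

print(sorted(shared[1]))       # requests the CDN sees for site 2 with a shared cache
print(sorted(partitioned[1]))  # requests the CDN sees for site 2 with a partitioned cache
```

With the shared cache, a lone request for icon-Z is the signal that the same browser has already loaded site 1. With the partitioned cache, the site 2 visit looks the same whether or not site 1 was visited.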

          • by dgatwood ( 11270 )

            Sites often load dozens of potentially-shared resources like JS, icons, fonts, JSON data used by the JS, etc. All a CDN has to do is make sites use different combinations - for example site 1 uses JS-A, JS-B, icon-X, and icon-Y. Site 2 uses JS-A, JS-B, icon-X, and icon-Z (where icon-Y and icon-Z are really the same file). Now the CDN can tell when you visit both sites, because your browser would load everything for a visit to site 1, and only icon-Z for a visit to site 2.

            The CDN already likely knows when I visit both sites, because they're serving the main page of both sites. This is about as scary to me as "Your computer is sending out an IP address" ads.

  • by ahziem ( 661857 ) on Tuesday January 14, 2025 @03:32PM (#65088725) Homepage
    This is why we can't have nice things
    • Because we insist on all the things being "free?"
      • No, because advertisers are not content with just advertising (like in the old days of print ads) and insist on invading our privacy.

        • Advertising (like protesting) has always been disruptive and annoying. That's the point. The most obnoxious thing about any reasonably urban area is all of the visual pollution that is advertising.
        • The Better Ads Standards [betterads.org], published by the Coalition for Better Ads, forbid use on the web of a lot of techniques routinely used to boost visibility of advertisements in old media. Compared to radio and television, the Standards ban interstitials that force the viewer of a text article to wait several seconds until the sponsor has finished delivering a message. You may recall some publishers doing a "pivot to video" in 2017 or so, publishing fewer text articles and more videos. This was an attempt to recapt

      • by ahziem ( 661857 )
        I'm not sure what you meant by free. The article mentioned increased costs, but I didn't see anything about a new charge for something that was previously free. I meant that the consequence of bad actors abusing the cache information is that web visitors have longer page load times, bandwidth use for users (like mobile data) and providers increases, the benefits of shared assets on CDNs are reduced, and web developers need new strategies to improve page load times.
  • The storage argument "forces websites to maintain separate copies of shared resources" per domain is a red herring (BS). The websites already have separate directory trees (in CDN). Any shared resources can continue to be shared: just use symbolic links in website tree(s) to the shared resource(s) in CDN. Don't they do this already?
    • by KlomDark ( 6370 )
      No, the cache is now tied to the domain name of the parent site. (The site you went to in the first place, not the CDN) They aren't shared anymore.
      • by srg33 ( 1095679 )
        I know that (tied to parent site / domain name). That means copies in the client browser cache. But, the websites only need one file that they can point to . . .
        • by tepples ( 727027 )

          To protect the viewer from fingerprinting, the browser downloads a separate copy of the file for each website that points to it.

  • Because FluffyBunny.jpg is obviously the same image on MisterRogersNeigborhood.com and PornHub.com.

    • Re: (Score:2, Interesting)

      by Anonymous Coward

      Actually had something like this happen on the server side with our CDN provider. A miscellaneous image lookup returned a cached image of a nude model because of some key collision with another tenant at the CDN. I asked them to fix the problem and to their credit they did fix the keying technique within minutes. And credited our account.

  • by Fly Swatter ( 30498 ) on Tuesday January 14, 2025 @04:08PM (#65088845) Homepage
    No trust, no shared cache. It's really quite simple. Third party content can't be trusted.
    • Remember when software installed on local computers relied on many shared components that were separately installed? You know, like image processing libraries and such? It was a mess. Install an update to one component, and who knows how many of your software titles broke. Nowadays, platforms like .NET package everything together in one set of folders. No sharing, even if you are using the same component as other software. It keeps software from interfering with other software.

      Now the internet is finding o

      • by tepples ( 727027 )

        With the movement toward self-contained applications, we end up having each application's install folder becoming the size of an operating system.

  • Hey website, tell me the hash value of the document at this URL.
    If I find the hash in my local cache, and it hasn't expired.
    ....Use its static content (but it gets its own private data/cookie store).
    Otherwise
    ....Hey website, give me the document at this URL.

    This strategy will reduce requests of identical documents and scripts across websites, so long as the hash function is cryptographically secure against collisions.
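The hash-first lookup described above might look roughly like this. It is a sketch of the commenter's hypothetical protocol, not something browsers or HTTP provide: `ORIGINS`, `HashFirstCache`, `ask_hash`, and the example URLs are all invented for illustration, and expiry and the per-site private data store are left out.

```python
import hashlib

# Hypothetical servers: two unrelated URLs that happen to serve identical bytes.
ORIGINS = {
    "https://b.example/lib.js": b"function f() {}",
    "https://c.example/vendor/lib.js": b"function f() {}",
}


class HashFirstCache:
    """Content-addressed cache: ask the site for a hash before downloading the body."""

    def __init__(self):
        self.by_hash = {}
        self.body_downloads = 0

    def ask_hash(self, url):
        # Stand-in for "Hey website, tell me the hash value of the document at this URL".
        return hashlib.sha256(ORIGINS[url]).hexdigest()

    def get(self, url):
        digest = self.ask_hash(url)
        if digest not in self.by_hash:
            # Stand-in for "Hey website, give me the document at this URL".
            self.body_downloads += 1
            self.by_hash[digest] = ORIGINS[url]
        return self.by_hash[digest]


cache = HashFirstCache()
first = cache.get("https://b.example/lib.js")
second = cache.get("https://c.example/vendor/lib.js")
print(first == second, cache.body_downloads)
```

Every `get` still costs one round trip for the hash, and the difference between a hit and a miss is still observable by whoever triggers the lookup.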

    • by ffkom ( 3519199 )
      You miss that the second (third, fourth...) access is attempted by malicious web sites that only want to probe whether they get the response fast (from the cache) or slow (from the net), in order to figure out what other web-pages you visited earlier. Given the vast amount of foreign content many web-sites use, it is easy to identify them purely from what they populated your cache with.
    • by Mal-2 ( 675116 )

      There is no hash function that is secure against collisions unless the hash is longer than the data itself. It's the pigeonhole principle: [wikipedia.org] if n items are put into m containers, with n > m, then at least one container must contain more than one item.

      • There is no hash function that is secure against collisions unless the hash is longer than the data itself.

        Information security is less concerned with perfect security against collisions than with probabilistic security. If the number of possible hashes exceeds 10^82, a common upper estimate of the number of atoms in the observable universe, collisions become vanishingly unlikely. The number of distinct SHA-256 hash values is close to that: on the order of 10^77. If you can find a collision, you will probably become rich and famous.

        Timing attacks are the problem.
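The orders of magnitude above are easy to check. The sketch below uses the standard birthday-bound approximation for an ideal 256-bit hash; the figure of 10^18 cached documents is an arbitrary assumption chosen only to show the scale, and `collision_probability` is a name made up for the example.

```python
HASH_BITS = 256
hash_space = 2 ** HASH_BITS  # number of distinct SHA-256 outputs


def collision_probability(n_items, space=hash_space):
    """Birthday-bound approximation n^2 / (2 * space); only meaningful while the result is small."""
    return (n_items ** 2) / (2 * space)


print(f"{hash_space:.3e}")              # size of the hash space in scientific notation
print(collision_probability(10 ** 18))  # assumed 10^18 documents, far more than any cache holds
```

The pigeonhole principle guarantees that collisions exist. The approximation shows why stumbling onto one by accident is not a practical concern at this hash length.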

  • OK (Score:4, Insightful)

    by bill_mcgonigle ( 4333 ) * on Tuesday January 14, 2025 @05:45PM (#65089109) Homepage Journal

    4% for significantly enhanced privacy?

    Sold.

    • Pff... I get much, MUCH better gains by using the usual ad blockers and script blockers.

      Even after all these years, I'm still shocked at how much web traffic is saved. Holy cow.

  • by ffkom ( 3519199 ) on Tuesday January 14, 2025 @05:50PM (#65089127)
    ... is a very small price to pay for better privacy. Even bigger privacy gains are possible, of course, by not using any Google services at all.
  • They already have a way around it to continue tracking you. If the product you are using is free, you aren't the customer.
