
Google: 60% Of The Internet Is Duplicate (seroundtable.com) 79

Google says "60% of the internet is duplicate." Gary Illyes from Google posted this slide at the Google Search Central Live in Singapore the other day.
This discussion has been archived. No new comments can be posted.

  • So what does that leave?

    • by Anonymous Coward
      A slide which says "60% of the internet is duplicate" and nothing more, is completely meaningless and useless.

      Which, I guess, is why it is a story on /.
      • Slashdot is responsible for a big % of that. More, when we include slashdot duping itself.

      • There are a ton of sites that are literally duplicates in content; only the wrappers around it are different. E.g., ask a question that is perhaps suited to one site, say stackoverflow, and you will get the original site in the top 10 hits, but the other 9 out of 10 hits have exactly the same question and the same answer, word for word. You have to get past the top ten before you find an alternative answer.

        • by Z00L00K ( 682162 )

          I don't worry too much about that and if you are smart you'd realize that a lot of information is duplicated and you'll have to dig deeper.

          The hard thing for users on the net is actually when the information you look for exists on only one single site.

    • by Z00L00K ( 682162 )

      Slashdot is now on second spot for me when I google 60% Of The Internet Is Duplicate [letmegooglethat.com].
      (Excluding paid links)

      Recursion win?

  • by Required Snark ( 1702878 ) on Friday November 25, 2022 @05:18PM (#63079708)
    You can never have too many cats.
    • Re: (Score:1, Troll)

      by ayesnymous ( 3665205 )
      Cats are an invasive species. They kill billions of birds and other small animals every year.
      • Whereas humans are beneficial to all wildlife wherever they go, sans cats.

        • Humans might kill a small number of animals, but unlike cats, humans don't have a natural instinct to kill EVERY SINGLE small animal they see.
          • Indoor cats don't ravage wildlife, and barn cats are useful on farms for snake and mouse control. The problem is when there are outdoor neighborhood cats that perpetually breed unchecked. A well kept domestic cat will live a long healthy life without destroying local wildlife.

        • BTW your response is an example of whataboutism.
      • Not just that, but they make up 50% of the images on the internet. The other 60% is porn.
      • Good. F your birds and other small animals. The problem is they aren't killing enough humans, so the solution to both is obvious: we need to breed and release much larger, and more aggressive felines. Those will prey upon larger game, reducing humans and by extension global warming and allowing an explosion of the smaller critter populations.
      • I love how I triggered all the cat lovers. Both of the sentences that I posted are factual. And I get modded as "Troll".
  • Only 60%? (Score:5, Funny)

    by KnobbyMcKnobface ( 10233038 ) on Friday November 25, 2022 @05:23PM (#63079718)
    Really looking forward to reading this again when it's reposted tomorrow.
  • by fahrbot-bot ( 874524 ) on Friday November 25, 2022 @05:24PM (#63079722)

    ... in one, one-sentence post.

    TFT: "60% Of The Internet Is Duplicate"
    TFS: "60% Of The Internet Is Duplicate"
    TFA: "60% Of The Internet Is Duplicate" (Tweet *and* photo)

    • Just wait until tomorrow, then your mind will truly be blown.

      • by aitikin ( 909209 )

        Just wait until tomorrow, then your mind will truly be blown.

        When 60% of the front page will be duplicate?

        • Just wait until tomorrow, then your mind will truly be blown.

          When 60% of the front page will be duplicate?

          We already have sites where 60% of the articles some days are about the latest serial fuck-ups of Elon Musk.

          Musk + Crypto Bros == George Carlin was right - 90% of all stuff is shit. Some days I think he was a bloody optimist.

          • by quonset ( 4839537 ) on Friday November 25, 2022 @06:55PM (#63079906)

            We already have sites where 60% of the articles some days are about the latest serial fuck-ups of Elon Musk.

            Except for here. Not a single story in the past 2 - 3 weeks on the shitshow which is Twitter after Musk blew $44 billion on it. For the record, it appears half of the biggest advertisers [cbsnews.com] have stopped advertising on the site since that pedo guy took over.

            • Yeah I'm a little surprised there hasn't been more coverage. Especially of the hard numbers on advertiser losses. That shit's whacked. Pretty soon Twitter's gonna be penis pills and BrainForce 24/7.
            • > half of the biggest advertisers [cbsnews.com] have stopped

              The linked [mediamatters.org] original article had a headline that says half, but a list of 49 advertisers, of which 7 have "issued a statement or was publicly reported as stopping its ads". Your post is a lie based on a lie based on a lie.
            • by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Saturday November 26, 2022 @08:39AM (#63080966) Homepage Journal

              Except for here. Not a single story in the past 2 - 3 weeks on the shitshow which is Twitter after Musk blew $44 billion on it.

              The current owners of Slashdot are cryptocurrency shills, and Musk is a crypto hero (with all the sarcasm that entails.) Just based on what does and doesn't get posted here, and what sources they use, not to mention what words have been put into the word filter, I suspect they are also right wingers. They literally have blocked both "Nazi" and "reich" in the past, and the name of their company is still in the filter.

          • > George Carlin was right - 90% of all stuff is shit

            That's why we need personal AI running locally. The model can be trained to be precise and civilised unlike the web. It will filter the web and give you only the good parts without the bad parts.
            • > George Carlin was right - 90% of all stuff is shit That's why we need personal AI running locally. The model can be trained to be precise and civilised unlike the web. It will filter the web and give you only the good parts without the bad parts.

              I would hate that. You know the old saying - keep your friends close, and your enemies closer. I prefer to be aware of what the nutters are doing so when I encounter them in real life I can deal with them - usually with my variant of that saying - "keep your friends close, tell your enemies to fuck off."

        • When BeauHD doesn't realise msmash already posted this.

    • Not to mention that /.’s entire oeuvre is summaries of articles posted elsewhere. Oops; I just mentioned that.
  • by Anonymous Coward

    As fluff pieces go, this is one of the fluffier ones.

    But why post this here? Whitewashing the endless dupes because not only will these editors not edit, but they won't keep an eye on each other's postings either?

    Also, failure to discern "internet" and "world-wide web". Google likes to pretend they're identical, and facebook likes to pretend they, one website, comprise the entire useful internet (witness "internet.org free basics"), but that doesn't make it so.

    It just makes those parties all the more ar

  • Does posting this make you feel better about the constant duplicate posts here?

    • Does posting this make you feel better about the constant duplicate posts here?

      Actually, yes. Not even half of slashdot articles are dupes, and if it’s not low information third party scrapes of legitimate news, two days late, and duped, I demand my money back.

    • Slashdot is responsible for 71% of the duplication :/

    • A bit unrelated but recently both image and code generation models have been accused of duplicating copyrighted works. Apparently they learned that skill from humans. /s
  • SEO spam (Score:5, Insightful)

    by self-inflicted ( 6168820 ) on Friday November 25, 2022 @06:04PM (#63079802)
    Google has allowed a ton of blogspam built entirely with scraped content to rise to the top of search results, I'm sure that's a decent chunk of it.

    --
    We will soon have the option to harvest our farts, so we can post & comment on stats about them.
  • by Big Hairy Gorilla ( 9839972 ) on Friday November 25, 2022 @06:19PM (#63079826)
    Just look up phone install instructions. There are exact dupe pages where only the address and graphics are different... who copied who? And these guys reached the Holy Grail of SEO: fooling the algorithm into filling the top 10 results with their dupes and slurping up those delicious advertising dollars.

    And what about getting AI generated pages that match search criteria but are pages of endless gibberish... nonsensical but syntactically correct?
    • Just look up phone install instructions. There are exact dupe pages where only the address and graphics are different... who copied who? And these guys reached the Holy Grail of SEO: fooling the algorithm into filling the top 10 results with their dupes and slurping up those delicious advertising dollars. And what about getting AI generated pages that match search criteria but are pages of endless gibberish... nonsensical but syntactically correct?

      If you don't buy, then all you're doing is causing them to waste money. And after a while, it's obvious what results are spam without even clicking on them. Unfortunately, it's obvious that plenty of people are still stupid enough to click on them, or Google would de-prioritize them and run stuff that DOES get action.

    • Gibberish is so 2019. Nowadays SEO can have well written pieces. Tests show people can predict when an AI wrote the text about 55% of the time, this used to be 75% before 2019. So it's a coin toss.
      Figure 3.13 : https://arxiv.org/pdf/2005.141... [arxiv.org]
  • by bjoast ( 1310293 ) on Friday November 25, 2022 @06:29PM (#63079852)
    And 90% of it is caused by Slashdot.
  • by ArchieBunker ( 132337 ) on Friday November 25, 2022 @06:41PM (#63079870)

    60% of google’s search results are spam or link farms. It really pisses me off when they return a link that’s just my terms but fed into Amazon or some other retailer.

    • 60% of google’s search results are spam or link farms. It really pisses me off when they return a link that’s just my terms but fed into Amazon or some other retailer.

      Remember the ebay search results for "white slaves"? [marketwatch.com] I warned the boss back then NOT to buy results for any and all terms really cheap, and showed him what happened with our ad feed when people typed in "buy white slaves." People ended up seeing "Buy white slaves now!" "White slaves for sale!" "Best deals on white slaves."

      But nope, didn't put in filters, cost him the $800/day ebay account.

      Google and ebay still do stupid shit - try searching google for "buy used tampon". ebay has 5,300 results.

  • Or, as the old saying goes: Every repost is a repost of a repost.

  • Why would this be seen as a problem?

    If website X disappears and the content is still available on website Y that's a good thing and the next guy looking for that content will be happy to find it there.

  • Fix it (Score:4, Funny)

    by jargonburn ( 1950578 ) on Friday November 25, 2022 @06:53PM (#63079896)
    That's very wasteful. We need a better way to keep track of all the content on the internet.
    I know, let's put the internet on the Blockchain!
  • by Flexagon ( 740643 ) on Friday November 25, 2022 @07:22PM (#63079974)

    So, given archive.org's charter, they should at some point account for ~50% of the internet just from the Wayback machine, and growing well beyond that as it archives old versions of no longer active pages (considering that many of its snapshots aren't that much different than earlier ones). Add to that its quasi-library function that most likely includes large overlaps with things like Youtube videos, and this hardly seems surprising. I'm actually surprised that the duplication isn't a lot more than 60%.

    And I certainly wouldn't call this wasteful.

  • Really, the idea that 60% of the internet is redundant is just rubbish.
  • by Big Hairy Gorilla ( 9839972 ) on Friday November 25, 2022 @09:07PM (#63080168)
    I just looked it up again and it's quite interesting. Most of the editorializing on the Dead Internet Theory makes it sound like a conspiracy theory ... and I can see how it could be hijacked by a lot of interests... but I definitely think there is more to it than quackery. The people writing about it have a vested interest in you believing in the veracity of their opinions. However:

    I think there's a new thing going on. Look up something absurd like "eating rocks during christmas". This was the first hit on my list:
    https://www.kidsacookin.org/the-dangers-of-eating-rocks/

    There are some great lines in there. Like
      Eating rocks can have some negative consequences
    and
      We seem to have a tendency in the United States to place blame on others rather than ourselves. Due to this, there has never been an ice cream man in My Town for quite some time. I don't think it's a good idea to give a kid a baby oil bottle. If we want to keep babies safe, we might want to tattoo warning labels on them as soon as they leave the hospital.

    Seriously?

    Is the whole website prebuilt using GPT-3? Either way, it's not really a website. It's an advertising trap, built with pure nonsense put together by an AI/program but targeted to capture search edge cases. Absurd searches have been monetized. Clever to fool Google and Bing.
    • The only solution to an AI onslaught of spam is an AI filter running locally and being under user control 100%. You ask a question, it uses the web search engine to get info, then read it and give a direct answer. You don't need to see the ads on Google or even the original website most of the time. Your AI agent will be polite and helpful, and most importantly private and loyal.
  • AMP and regular HTML pages for the same name.

  • The vast majority of the English language consists of variations of the same 26 letters.
  • 10% must be a triplicate.

  • No seriously, just Google it.

    https://www.google.com/search?... [google.com]

  • They are obviously detecting duplicates because they are also scanning the Internet Archive
  • What's driving me mad recently are the amount of blogs that seem to be nothing more than a straight copy and paste of the official docs. I get that these people get paid for eyeballs but trying to pad out sites like this just wastes everyone's time.
  • WP is basically a huge cut and paste from other websites, generally without permission. No wonder Google likes it so much.

  • I'm pretty sure that 100% of all alphabetical characters posted on Web pages are duplicates of the same character appearing elsewhere.
  • ... but isn't that the point? Isn't the internet designed with failures and some amounts of instability in mind? And, for a system to be fault tolerant, shouldn't there be a good amount of duplication? Like, let's assume 39% of the internet is original. A perfect duplication of that would be the other 39% of the internet. That's 78%. And since having just 2 copies of something can still be fairly risky, the remaining 22% is a third copy of some fraction of the original. That's not too bad. I mean, can you i
  • Even this article is a duplicate

  • A million sites duplicating wikipedia content is trash designed to draw a few clicks. Basically half the internet is trash. Maybe stop interacting with trash and people will stop making so much of it.

  • Yes, some stuff is stolen. There are a ton of very good, perfectly legal reasons to copy information, not just backup for data retention. How much of it is people de-sliding stuff, removing junk and keeping the good stuff? Or news articles that are licensed/sold to multiple 'news services' that may or may not be merely re-skinned local versions?

    When Crypto drops like a lead stone do not be surprised if every single stock related website shows a YTD graph with them down 40%.

  • Are 60% of Google results duplicates?

  • I can't imagine an internet where anything was only available on a single site.
    retweet, quote, excerpts, curated content.
    my gut tells me 60% is a low figure.
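
Several comments above describe pages whose text is the same word for word while only the wrapper (address, graphics, layout) differs. The slide does not say how Google measures duplication, so the following is only a minimal sketch of one common textbook approach, word shingling with Jaccard similarity, and not a description of Google's method. The function names, the shingle size `k=3`, and the `0.8` threshold are all assumptions made for illustration, and the sketch assumes the page text has already been extracted from its HTML wrapper.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)

def is_near_duplicate(text_a, text_b, threshold=0.8, k=3):
    """Hypothetical helper: True if the shingle sets overlap at least `threshold`."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold

# Illustrative inputs only; none of these strings come from a real site.
original = "How do I reverse a list in Python? Use the reversed() builtin or slicing."
scraped = "how do i reverse a list in python?  use the reversed() builtin, or slicing"
unrelated = "Indoor cats do not ravage wildlife and barn cats are useful on farms."

print(is_near_duplicate(original, scraped))
print(is_near_duplicate(original, unrelated))
```

Comparing every pair of pages this way does not scale to a web-sized corpus; it is meant only to make the "same content, different wrapper" idea concrete.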
