Google Businesses The Internet

Google De-indexes Talk.Origins, Won't Say Why UPDATED 575

Posted by kdawson
from the honest-webmasters-go-fish dept.
J. J. Ramsey writes "Talk.Origins is an archive with thousands of pages exposing creationist pseudoscience. Rather mysteriously, Google pulled the plug on its search engine, giving only the vague reason: 'No pages from your site are currently included in Google's index due to violations of the webmaster guidelines.' This was apparently triggered by a recent cracking of the site that added 'hidden links to non-topical sites,' but Google won't say just what the violations were. Talk.Origins webmaster Wesley R. Elsberry believes that this Google policy harms honest webmasters." From the article: "My mission, whether I liked it or not, was to find and fix whatever problem the [Talk.Origins Archive] might have, with no guidance as to what the problem was and nothing at all about where to start looking... I was extremely lucky. The damage to my site was limited and in the first place that I happened to look. Other honest webmasters might not be so lucky. They may have to undertake an arduous process of vetting pages, essentially having to second-guess the mind of the cracker in trying to locate a problem that Google knows the exact location of." Thanks to an alert reader who sent in Matt's blog posting about how Google handles hacked sites.

  • huh? (Score:3, Funny)

    by Average_Joe_Sixpack (534373) on Monday December 04, 2006 @12:25AM (#17095420)
    What's this? [google.com]
    • by anagama (611277)
      Well, that's a specific search of usenet. Search the web for "talkorigins.org" -- you'd expect it to be the first link. Rather, there are some references to it in other sites but no actual link to the site itself.
    • Re:huh? (Score:5, Informative)

      by Daniel Dvorkin (106857) * on Monday December 04, 2006 @12:37AM (#17095512) Homepage Journal
      That's the Google Groups archive of the talk.origins newsgroup, which is a different animal (an ancestral form, one might say) from the Talk.Origins Archive web site. It was the site that was delisted. [talkorigins.org]

      And indeed, as of right now (10:35 PM CST) a Google search for "talk.origins" doesn't show any links at all to the Talk.Origins Archive. In fact, the first link that comes up is to a young-Earth creationist site which claims to offer "intellectually honest responses to the claims of evolutionism's proponents, including--but not limited to--the 'Talk.Origins' newsgroup and the 'Talk.Origins Archive' website."

      Conclusions about species competing in crowded niches are left as an exercise to the reader.
    • Sort of. I believe he is talking about the Google search engine, not google groups which was Dejanews.
  • Hmm (Score:3, Insightful)

    by Herkum01 (592704) on Monday December 04, 2006 @12:26AM (#17095424)

    While I have some sympathy for the guy, just because you think you're an honest webmaster does not mean that Google should have to vet you and your content. They have a business to run too. At some point a webmaster has to put themselves in a position to recognize and address these sorts of problems BEFORE Google gets involved.

    • Re: (Score:3, Interesting)

      by arun_s (877518)
      Well, whatever it is, I hope things get fixed soon. In my fairly frequent science/evolution debates in my company's intranet forum, talkorigins is invariably what I link to after the JREF [randi.org]. The site is mind-bogglingly comprehensive, and I enjoy reading the post of the month section (even though a lot of the more detailed debates go well over my head).
      It's sad to see a great resource like that hacked and delisted; I wish them a speedy recovery.
  • by BorgCopyeditor (590345) on Monday December 04, 2006 @12:35AM (#17095502)
    The writeup sucks. It implies that Google is censoring Usenet.
  • Backups? (Score:3, Insightful)

    by TubeSteak (669689) on Monday December 04, 2006 @12:38AM (#17095518) Journal
    You'd think they'd keep regular "Last Known Good" backups and just be able to do a simple diff between the current page & their backup.

    Or even just MD5 sums of all their pages, once a day, with known updates marked as such.

    There should be no reason anyone has to even contemplate manually digging through thousands of pages if they've prepared sufficiently beforehand.

    Maybe they'll take some very simple & no-cost precautions now that they've been burned.
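    A minimal sketch of the checksum idea above, in Python. It assumes the site is a directory of static files. The function names build_manifest and diff_manifests are made up for illustration, and SHA-256 is swapped in for the MD5 sums mentioned above. The manifest from the last known good state would be saved somewhere the web server cannot write to, then compared against a fresh one each day.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to the SHA-256 hex digest of its bytes."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(known_good, current):
    """Return (added, removed, changed) as sorted lists of relative paths."""
    old_paths, new_paths = set(known_good), set(current)
    added = sorted(new_paths - old_paths)
    removed = sorted(old_paths - new_paths)
    changed = sorted(
        p for p in old_paths & new_paths if known_good[p] != current[p]
    )
    return added, removed, changed
```

    Any path that shows up as added or changed without a matching known update is a page to inspect first, which narrows thousands of pages down to a short list.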
  • by MDMurphy (208495) on Monday December 04, 2006 @12:48AM (#17095590)
    So many people refer to Google as if it were a human looking at web sites and giving them the big thumbs up or down. As part of indexing, if the spider finds "violations" such as a site presenting a different page to spiders than to humans, the site risks being dropped from the index. To expect a human response explaining why each site triggered the de-indexing is not reasonable.

    In the webmaster's whining about Google, he complains about the request to be re-indexed containing:

        * I believe this site has violated Google's quality guidelines in the past.

        * This site no longer violates Google's quality guidelines.

    He thinks these are "an admission of guilt", but they don't say "I violated", they say "the site violated". So, if the site were hacked and did violate their indexing policy, fix it, say you've fixed it and move on. How many hits has he had over the years that came directly from Google? And did they come from Google due to all those people choosing Google to search for his site or its topics? But now he whines about being delisted for the time it takes him to fix a site he should have kept unhacked in the first place.

    • by identity0 (77976) on Monday December 04, 2006 @03:27AM (#17096430) Journal
      Heresy! Google sees all, Google knows all! Google is a man, a spider, and the holy page in one!

      Brin 3:14 "And Google so loved the internet, that he sent his only-born son Larry Page to it so that any who believe in him shall not perish but have ever-lasting life in the Googleplex."

      So you see, there *is* a person, Larry Page, who is also the spider that indexes everything and is also the page that serves up results. Only through this holy trinity could results as good as Google's result, thus proving Google's divinity. If the almighty Google has delisted this sinner's page, then we should not be looking at it in the first place, yes? To go against the wishes of Google brings hellfire!
  • Synopsis (Score:5, Insightful)

    by operagost (62405) on Monday December 04, 2006 @12:59AM (#17095670) Homepage Journal

    "Talk.Origins is an archive with thousands of pages exposing creationist pseudoscience"
    This article is a submission containing a biased summary which has little to do with the actual topic, which is the enigmatic status of Google's search algorithms.
  • by Nevyn (5505) * on Monday December 04, 2006 @01:05AM (#17095712) Homepage Journal
    They may have to undertake an arduous process of vetting pages, essentially having to second-guess the mind of the cracker in trying to locate a problem that Google knows the exact location of.

    Bzzt. The website admin needs to locate one or more problems (== however many the cracker planted), and Google knows the exact location of at least one. "one or more" >= "at least one". If google tells people where their problems are, google will be playing whack a mole for eternity. There are contractors/services that should be able to help them/anyone, google is not one of them.
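    For a webmaster hunting for planted links on their own, a first pass can be automated. The sketch below uses Python's standard html.parser and rests on an assumption: that the links were hidden with inline CSS (display:none or visibility:hidden). The thread does not say how the cracker actually hid them, and the names HiddenLinkFinder and find_hidden_links are hypothetical.

```python
import re
from html.parser import HTMLParser

# Inline styles that hide an element from human visitors (illustrative, not exhaustive).
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE)
# Elements that never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenLinkFinder(HTMLParser):
    """Collect hrefs of links that are hidden, or nested inside a hidden element."""

    def __init__(self):
        super().__init__()
        self.stack = []         # (tag, is_hidden) for each currently open element
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        hidden = bool(HIDDEN_STYLE.search(attrs.get("style") or ""))
        inherited = any(is_hidden for _, is_hidden in self.stack)
        if tag == "a" and attrs.get("href") and (hidden or inherited):
            self.hidden_links.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def find_hidden_links(html):
    finder = HiddenLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_links
```

    This would miss links hidden through external stylesheets, scripts, or tiny fonts, so an empty result is not proof that a page is clean.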

  • by derubergeek (594673) on Monday December 04, 2006 @01:23AM (#17095822) Homepage Journal
    This was quite obviously the work of the Flying Spaghetti Monster.
  • by martin-boundary (547041) on Monday December 04, 2006 @01:27AM (#17095844)
    While it's natural to sympathise with the victimized website, it doesn't follow that Google is doing something Evil(TM) in this instance, rather it's most likely that their current algorithms are badly tuned.

    With the index sizes that are being collected by search engines these days (on the order of 10 billion entries), it's completely naive to think that some humans are sitting at a terminal choosing to delist websites for some policy reason or other. It's also completely naive to think that a human email monkey can do any sort of digging to find out the exact reason that Google's automated algorithm has censored this particular site.

    Instead, Google's engineers have automated algorithms which do all the censorship, and the policy is just there as a thin cover for whatever the algorithm happens to be doing today. It's worse of course, because 1) algorithms change every few months and 2) there's simply no comprehensive way to test the quality of the implementation.

    Anyone who's programmed a nontrivial algorithm knows that obscure edge cases are a bitch, and with 10 billion websites, any algorithm will have plenty of obscure edge cases which nobody has ever tested, nor ever will. The most likely explanation is that the website in TFA is a false positive of some subsystem, but fixing it will require changes to the algorithms, and Google don't want to risk that, would you? The problem will probably go away in a few months when the algorithms are scheduled to be updated.

  • by RockoW (883785) on Monday December 04, 2006 @02:32AM (#17096168) Homepage
    Google has a set of tools for webmasters [google.com] (http://www.google.com/webmasters/tools/). Essentially, it gives out every diagnostic needed to fix your site for Google. Additionally, you get statistics for searches and how GoogleBot sees your site. So, you shouldn't blame until you've googled for the answer! Searching for "Google index tool" shows up "Google Webmaster Central"...
    • by DrXym (126579) on Monday December 04, 2006 @09:03AM (#17097958)
      Google certainly has some useful tools, but when they don't work you are screwed. I have a site which I won't name which is not indexed by Google and I have absolutely no idea why. I've submitted the URL, built a sitemap using their own tools, validated it and even submitted the site for relisting. It still isn't there. What have I done wrong? The tools say everything is fine except it isn't. I could go to the web forum but other postings suggest the employees will likely just tell me to wait for indexing. Except it's not indexing me.

      The sick thing is that I have Google Adwords on that site so each day that Google don't list me, THEY are losing money. I estimate I get 10x the click through business from MSN search than I do from Google. I'd probably make 3x the profit (as would Google) if they'd index.

  • by Mouth of Sauron (196971) on Monday December 04, 2006 @03:07AM (#17096334)
    The site www.talkorigins.org is not the only site to have been de-indexed by Google.


    This is a Google cache of talkorigins.org [72.14.203.104] showing the porn spam links.


    However, I checked on deepx.com [deepx.com] and it is *not* a porn site.


    From DeepX.com's about page:


    XML provides an open and flexible language for the creation, management and exchange of electronic content. Founded in 2000, deepX has an experienced team of consultants and developers, who specialise in the design and development of solutions using XML and the emerging technologies related to XML.


    Also, another link shows www.theoi.com [theoi.com] and it is *not* a porn site, either:


    Here's how THEOI used to look via the Wayback machine. [archive.org]


    Theoi.com has been banned by Google (no reason given) and forced to close down as a result. There are no plans to re-establish this site in the future.


    wu.edu.gh is Valley View University, a Seventh-day Adventist college in Ghana.


    Both deepx.com and wu.edu.gh redirect to porn sites.


    Unsurprisingly, wu.edu.gh, theoi.com and deepx.com have been de-indexed by google.


    I speculate that all these sites that have been de-indexed were tagged by automated processes.

  • No Free Consulting (Score:4, Insightful)

    by rossz (67331) <`ogre' `at' `geekbiker.net'> on Monday December 04, 2006 @03:20AM (#17096390) Homepage Journal
    Basically, this "so called" webmaster wanted free consulting from Google. I don't think so. My personal response would have been, "I'll be happy to supply you with the information you request. It will, however, cost you my standard consulting rate of $xx/hour, two hour minimum."

    Only friends and family get free computer help from me, but I'm rethinking that policy since I spent half a day cleaning the malware off my brother's computer during the last family holiday. He probably won't ask me to do it again, though. When he asked how his system got so infected, I answered (in front of the entire family), "You got infected from all those lesbian porn sites you've been visiting."

  • by Evets (629327) on Monday December 04, 2006 @05:44AM (#17097060) Homepage Journal
    Have a peek over at the forums at WebmasterWorld, DigitalPoint, SearchEngineWatch, or any number of other webmaster related sites. This happens all the time. It is an issue that webmasters have had to deal with for some time now. Google at least provides some input for you if you can be bothered to register a sitemap with them.

    Google has several billion pages in their index, and a significant portion of them are spam. Their business model relies on them having internal methods of dealing with web spam and it is not feasible or desirable for them to produce a list of violations to each and every person who runs afoul of their algorithms.

    This is far from the most popular or important site this has happened to. Wordpress was delisted, as was BMW, Syndic8, and many others. This guy is using the controversial nature of his subject matter in an attempt to draw more attention. Get in line buddy, there is a long list of people whining all over the web about the same thing. Are you more important because the word Christianity is loosely affiliated with your site? Nope.

    Do a little googling yourself and you can pretty easily figure out how to resolve the problem. It takes some time, and there are ways to accelerate the process. If you are that reliant on Google, it is time to start participating in some webmaster communities and figure out how to play ball with the Search Engines. Just like everybody else.
  • by GoogleGuy (754053) * on Monday December 04, 2006 @06:44AM (#17097318) Homepage
    If you dig deeper, it turns out that Google emailed talkorigins.org to alert the site that it had been hacked and was stuffed with rape and animal porn spam. Google's head of webspam has posted a full write-up [mattcutts.com].
  • evolution (Score:3, Funny)

    by dwater (72834) on Monday December 04, 2006 @08:26AM (#17097732)
    So, if evolutionary theory is correct, it seems to have favoured a line of cry babies. There's evidence against, if ever there was any.

    I suppose he could be a mutant....and his predecessors are all non-cry babies.
