
Google Warns About Search-Spammer Site Hacking

Posted by CmdrTaco
from the secure-your-borders dept.
Al writes "The head of Google's Web-spam-fighting team, Matt Cutts, warned last week that spammers are hacking more and more poorly secured websites in order to 'game' search-engine results. At a conference on information retrieval, held in Boston, Cutts also discussed how Google deals with the growing problem of search spam. 'I've talked to some spammers who have large databases of websites with security holes,' Cutts said. 'You definitely see more Web pages getting linked from hacked sites these days. The trend has been going on for at least a year or so, and I do believe we'll see more of this [...] As operating systems become more secure and users become savvier in protecting their home machines, I would expect the hacking to shift to poorly secured Web servers.' Garth Bruen, creator of the Knujon software that keeps track of reported search spam, added that some campaigns involve creating up to 10,000 unique domain names."

  • by vintagepc (1388833) on Thursday July 30, 2009 @11:06AM (#28882501) Journal
    I don't know about you, but something else that REALLY annoys me is pages that contain lists of words just so they come up on many searches... with no actual content. Or sites like "Buy *search term* at low prices" and they don't even sell what you're looking for. What's being done about those?
    • by Krneki (1192201)
      There's an "X" next to each search result; if you don't like one, report the stupid web site.
      • Re: (Score:3, Interesting)

        by Shakrai (717556)

        Does that actually "report" it or does it merely remove it from your search results?

        • by D-Cypell (446534) on Thursday July 30, 2009 @12:16PM (#28883433)

          While I don't know for absolute certain, I *strongly* suspect that that data is collected and acted on. Most of the big sites are about so-called 'collective intelligence', or collecting information about person A so that you have a better idea of what to provide to person B. This covers which links are clicked, at what times of day, how long people spend on a site or page, etc. To have a function as explicit as 'This is crap, don't show me it again' and *not* use it to refine future page generation would be deeply stupid, and stupid is one thing the guys at Google ain't.

      • Re: (Score:3, Informative)

        by Yvan256 (722131)

        I don't see any "X" (or any other icons) with my search results.

      • That's not a "report" button, it's a "customize my results for the future button" and it is really stupid. The elephant in the room is that Google is exploitable just like every other search engine. People are noticing the quality of their searches declining and there doesn't seem to be much Google can do or is willing to do. Most of the shitty sites that have no value are loaded with AdSense. Pretty much Google needs to start filtering results or they need to replace PageRank which is fundamentally the prob

        • by Jurily (900488)

          That's not a "report" button, it's a "customize my results for the future button" and it is really stupid.

          Agreed. At least, I never found a use for it anyway. I just don't bother to filter my search results manually, and it's not my job anyway: if it gets too much, I'll give Bing a chance.

          People are noticing the quality of their searches declining and there doesn't seem to be much Google can do or is willing to do.

          That's because they index everything ("Results 1 - 10 of about 15,280,000,000 for a. (0.07 seconds)") and then try to rank the crap lower. A much better option would be to create a new search space on top of this one containing only sites recommended by humans, and rank those up automatically, like they did with Wikipedia.

      • Re: (Score:3, Informative)

        by sabernet (751826)

        If that really worked, I wouldn't still see so many damn "experts-exchange" results since I'm sure I've 'x'ed at least 5 dozen of them.

    • Re: (Score:1, Interesting)

      by Anonymous Coward

      What's being done about those?

      Google is making money off of them. [google.com]

      I'm sorry, but you simply cannot offer a "service" like this and at the same time claim relevant search results are your top priority. These two things are inherently at odds with each other.

    • Re: (Score:3, Informative)

      CustomizeGoogle is a Firefox plugin (which hasn't been updated for 3.5 yet) that lets you ignore domains.

      I had a ton on there.

      http://www.fixya.com/ [fixya.com] seems to have risen up now that I'm searching on how to fix some lawn equipment I inherited.
      "Yard Machines fix belt" and it comes back with http://www.fixya.com/tags/yard_machines_deck_diagram_belt [fixya.com]

      Of course this is 100% useless.

      Those sites are fun to mess with friends. "Dude, did you know that there's an entire webpage on fixing your impotency?"

      • Re: (Score:2, Insightful)

        by ex0a (1199351)

        CustomizeGoogle is a Firefox plugin (which hasn't been updated for 3.5 yet) that lets you ignore domains.

        From the CustomizeGoogle page [mozilla.org], the listed compatible versions go up to 3.6a1pre, for anyone who skipped the add-on because of the parent post. This add-on is really handy.
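The domain blocking that CustomizeGoogle offers amounts to dropping any result whose host is on a personal blocklist. Below is a minimal Python sketch of that idea, not the add-on's actual code; it assumes results are available as plain URLs, and the function names and blocklist contents are hypothetical.

```python
from urllib.parse import urlsplit

def is_blocked(url: str, blocked_domains: set[str]) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)

def filter_results(urls: list[str], blocked_domains: set[str]) -> list[str]:
    """Keep only results whose host is not on the blocklist."""
    return [u for u in urls if not is_blocked(u, blocked_domains)]

if __name__ == "__main__":
    blocklist = {"fixya.com"}  # hypothetical personal blocklist
    results = [
        "http://www.fixya.com/tags/yard_machines_deck_diagram_belt",
        "http://example.org/yard-machines-belt-manual",
    ]
    print(filter_results(results, blocklist))
```

Matching on the host suffix (with a leading dot) blocks `www.fixya.com` while leaving an unrelated domain such as `notfixya.com` alone.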

    • by skeeto (1138903)
      If it's a Google search, you can report the site here [google.com], though I don't think they look at these reports very often.
  • by ParticleGirl (197721) <SlashdotParticleGirl AT gmail DOT com> on Thursday July 30, 2009 @11:16AM (#28882621) Journal

    I found this pretty interesting: "Authentication [across the Web] would be really nice," says Tunkelang. "The anonymity of the Internet, as valuable as it is, is also the source of many of these ills." Having to register an e-mail before you can comment on a blog is a step in this direction, he says, as is Twitter's recent addition of a "verified" label next to profiles it has authenticated.

    The idea of universal authentication [gnucitizen.org] has been tossed around for a while. I feel like the biggest drawback is privacy (we'd have to trust some universal authentication system to hold onto some identifier even if posting anonymously) and the biggest obstacle is the need for universal participation. It's kind of too late to make an opt-in system. But I've liked the idea ever since early sci-fi interwebs (read: Ender's Game) had SOME kind of authentication.

    • Re: (Score:3, Insightful)

      by truthsearch (249536)

      Authentication would, of course, help with properly secured web sites. But many sites have content injected nefariously. One common method is to break into shared hosting servers via FTP or SSH and place JavaScript or HTML at the bottom of every HTML file.
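Injection "at the bottom of every html file" often leaves markup after the closing `</html>` tag, which is easy to check for. A rough heuristic sketch in Python follows; the function names are hypothetical, and it will miss payloads inserted before the closing tag or inside server-side templates.

```python
import re
from pathlib import Path

CLOSING_HTML = re.compile(r"</html\s*>", re.IGNORECASE)

def trailing_content(html: str) -> str:
    """Return any non-whitespace text after the last </html> tag ('' if none)."""
    matches = list(CLOSING_HTML.finditer(html))
    if not matches:
        return ""  # no closing tag: this heuristic has nothing to compare against
    return html[matches[-1].end():].strip()

def scan_tree(root: str) -> dict[str, str]:
    """Map each .html/.htm file under root to whatever follows its </html>."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}:
            tail = trailing_content(path.read_text(errors="replace"))
            if tail:
                findings[str(path)] = tail
    return findings
```

Calling `scan_tree("/var/www")` (path hypothetical) returns only the files that have something after their closing tag, which is a short list to inspect by hand.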

      • by Shakrai (717556)

        One common method is to break into shared hosting servers via ftp or ssh

        Slightly off topic, but I've noticed that in the last year or two brute-force SSH attempts seem to have become so common that they should be considered part of the regular Internet background noise. My servers were regularly being probed from multiple IP addresses (most of them in China), sometimes reaching 5-10 SSH attempts per second. They'd go through whole dictionaries of possible usernames and keep trying to hit the root account as well.

        I wouldn't run ssh these days without disabling password l

        • I use DenyHosts because I have the same problem. DenyHosts watches for repeat failed attempts from the same IP and then blocks them. It's fully configurable (e.g. block after 5 failed attempts within one day, unblock an IP after 30 days, send email reports, etc.).

          • Over a year ago: using DenyHosts, I black-holed IP addresses after two attempts and the entire subnet after 10 total attempts in 12 hours. I eventually gave up and took the host off port forwarding because I didn't need to access it remotely any more.

            I was quickly heading to the point where all of Russia, China and the Koreas were going to be completely black-holed. Interestingly, some areas of the US, especially the Midwest, and central Canada were getting fairly dark too.

            If I ever need to put that host ba
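The DenyHosts-style policy described above ("block after 5 failed attempts within one day") is a per-IP sliding-window counter. The Python sketch below illustrates that policy only; it is not DenyHosts's implementation, the class and parameter names are hypothetical, and automatic unblocking is left out.

```python
from collections import defaultdict, deque

class FailedLoginTracker:
    """Block an IP once it has max_failures failures within window_seconds."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 86400):
        self.max_failures = max_failures
        self.window = window_seconds
        self.failures: dict[str, deque] = defaultdict(deque)
        self.blocked: set[str] = set()

    def record_failure(self, ip: str, timestamp: float) -> bool:
        """Record one failed attempt; return True if the IP is (now) blocked."""
        if ip in self.blocked:
            return True
        times = self.failures[ip]
        times.append(timestamp)
        # Forget failures that have aged out of the sliding window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.max_failures:
            self.blocked.add(ip)
        return ip in self.blocked
```

Five failures a minute apart block the address on the fifth one, while the same number of failures spread over more than a day never trip the threshold.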

        • by Tony Hoyle (11698)

          Isn't hashlimit designed to limit bandwidth? I'd rather just drop the initial connection.

          -A public -p tcp -m tcp --dport 22 -m state --state NEW -m recent --set --name SSH
          -A public -p tcp -m tcp --dport 22 -m state --state NEW -m recent --update --seconds 300 --hitcount 5 --name SSH -j DROP
          -A public -p tcp -m tcp --dport 22 -j ACCEPT

          You should also be protecting DNS and ICMP in the same way, of course.
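The three rules above record every NEW connection to port 22 per source address, then drop it if that source already has 5 recorded attempts in the last 300 seconds. The Python toy model below reproduces that decision logic in user space to show when a client gets cut off; the class name is hypothetical, and kernel-side details such as table size limits are ignored.

```python
from collections import defaultdict

class RecentMatch:
    """Toy model of '--set' on each NEW connection followed by
    '--update --seconds S --hitcount H -j DROP', then ACCEPT."""

    def __init__(self, seconds: float = 300, hitcount: int = 5):
        self.seconds = seconds
        self.hitcount = hitcount
        self.seen: dict[str, list[float]] = defaultdict(list)

    def new_connection(self, src: str, now: float) -> str:
        stamps = self.seen[src]
        stamps.append(now)  # rule 1: --set records this attempt
        recent = [t for t in stamps if now - t < self.seconds]
        self.seen[src] = recent  # old stamps can never match again
        if len(recent) >= self.hitcount:  # rule 2: enough recent hits -> DROP
            return "DROP"
        return "ACCEPT"  # rule 3: otherwise accept
```

Because every NEW packet is recorded before the check, even dropped attempts keep the counter full, so a client that keeps hammering stays blocked until it backs off for the whole window.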

          • by Shakrai (717556)

            Isn't hashlimit designed to limit bandwidth? I'd rather just drop the initial connection.

            Umm, no? It's designed to limit the number of times it will match, based on the number of packets seen in a defined interval. AFAIK it doesn't have anything to do with bandwidth or data rate. In fact, I've never seen iptables directly used to limit bandwidth, although I have seen it used to classify packets that then get shaped by the Linux traffic shaper.

            I do like the rules that you use though.
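The iptables documentation describes the `limit` match as a token bucket filter, and `hashlimit` applies the same kind of rate limit separately per group of connections (for example, per source address). The Python sketch below models a per-source token bucket counted in packets, in line with the description above of a match limit rather than a bandwidth cap. It is an illustration, not the kernel code, and the class and parameter names are hypothetical.

```python
class PerSourceTokenBucket:
    """Each key may match about `rate` packets per second on average,
    with bursts of up to `burst` packets."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.buckets: dict[str, tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def allow(self, key: str, now: float) -> bool:
        tokens, last = self.buckets.get(key, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True  # under the limit: the rule matches
        self.buckets[key] = (tokens, now)
        return False  # over the limit: the rule does not match
```

With `rate=1.0` and `burst=3`, a source gets three back-to-back matches, then roughly one more per second, while other sources keep their own full buckets.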

        • by Lumpy (12016)

          I ran a script that watched the log files and added drop rules for ANYTHING that tried to ssh in using root.

          This was back in 2004-2005 and I was getting about 30-40 IPs banned a day from China and the former USSR. It's simply ramped up since, with a crapload of zombie machines out there doing it.

          My solution was to screw it all and use only VPN.
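A log-watching script like the one described above mainly needs to pull offending addresses out of the SSH log. The Python sketch below assumes OpenSSH's usual syslog wording (`Failed password for root from <ip> port <n> ssh2`), so check the pattern against your own logs. The function names are hypothetical. It only builds the `iptables` commands and does not run them.

```python
import re

# Assumed OpenSSH log wording; adjust the pattern to match your own logs.
ROOT_FAILURE = re.compile(r"Failed password for root from (\d{1,3}(?:\.\d{1,3}){3}) port \d+")

def root_offenders(log_lines) -> list[str]:
    """Unique IPv4 addresses that failed a root password login, in first-seen order."""
    seen: dict[str, None] = {}
    for line in log_lines:
        m = ROOT_FAILURE.search(line)
        if m:
            seen.setdefault(m.group(1), None)
    return list(seen)

def drop_rules(ips: list[str]) -> list[str]:
    """Build (but do not execute) one iptables DROP command per offender."""
    return [f"iptables -A INPUT -s {ip} -j DROP" for ip in ips]
```

Failures for other usernames are ignored here; a "block on first root attempt" policy only needs the root-specific lines.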

        • by skeeto (1138903)

          This is an area of interest to me.

          My home computer gets pinged with ssh password guessing attempts all day. Not quite as hard as you, but a guess every few seconds. Key-only logins are a bit too inconvenient for me right now, so I take other measures. I have root logins disabled so they have to guess a password and a username, and they've never even guessed a correct username so far. I also used DenyHosts to mitigate attacks by instantly blocking anyone trying root logins, and block anyone else after 3 wron

    • See OpenID: http://openid.net/ [openid.net]

      Decentralized universal authentication.

    • by skeeto (1138903)

      I hate it when I read an article or a blog, want to leave a comment, but it's locked behind some registration mechanism. Then I just don't bother. I'm not going to go through a tedious registration process just to leave one comment. Sometimes it's not even obvious how to register (I'm looking at you, WordPress). I imagine this costs these websites a lot of traffic. See The $300 Million Button [uie.com].

      No, anonymous commenting is too important. Throw up a captcha or something that anonymous commenters have to fill out,

  • Confirmation (Score:5, Interesting)

    by Drakkenmensch (1255800) on Thursday July 30, 2009 @11:19AM (#28882673)

    Anyone who frequently uses google knows this already. Plug in any kind of search and you're bound to get a slew of crap results along the lines of:

    Download [term] full version

    Torrent [term] keygen

    Torrent [term] latest version

    Torrent [term] hacked no-cd

    You'll get those even when searching for books.

    • Re:Confirmation (Score:5, Informative)

      by IBBoard (1128019) on Thursday July 30, 2009 @11:25AM (#28882751) Homepage

      Except that that's not what the summary mentions. The summary is talking about people hacking websites to get more "good" links to their site, rather than having to rely on standard link farms that are then blacklisted. It's like comment spam, only with hacking of servers instead.

      • Re: (Score:1, Interesting)

        by Anonymous Coward

        I've had my webpages up for years, but hadn't actually added anything new for a while so hadn't felt the need to stop by my site and do maintenance. This spring, Google sent me an email warning me that they were taking my site off their search engine for spamming. (Though they did suggest it had probably been hacked.)

        It was horrible. My pages had indeed been hacked and had "invisible" links written all over them. Some of them actually had all their real content deleted in favor of what looked like nothingne
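One way to check your own pages for "invisible" injected links is to look for anchors hidden by an inline style, either on the link itself or on an enclosing element. The sketch below uses Python's standard `html.parser`. The class and function names are hypothetical, and it only catches inline `display:none` / `visibility:hidden`, so links hidden through CSS classes, off-screen positioning or tiny fonts will slip past it.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}  # tags with no end tag

def _is_hidden(attrs) -> bool:
    style = (dict(attrs).get("style") or "").replace(" ", "").lower()
    return "display:none" in style or "visibility:hidden" in style

class HiddenLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags hidden by an inline style on the link or an ancestor."""

    def __init__(self):
        super().__init__()
        self.stack: list[tuple[str, bool]] = []  # (tag, hidden?) of open elements
        self.hidden_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        hidden = _is_hidden(attrs)
        inside_hidden = any(h for _, h in self.stack)
        if tag == "a" and (hidden or inside_hidden):
            href = dict(attrs).get("href")
            if href:
                self.hidden_links.append(href)
        if tag not in VOID:
            self.stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

def find_hidden_links(html: str) -> list[str]:
    parser = HiddenLinkFinder()
    parser.feed(html)
    parser.close()
    return parser.hidden_links
```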

    • by Ihmhi (1206036)

      I was always a bit puzzled by "Coheed & Cambria Latest CD no-cd"...

  • Only a year now? (Score:1, Interesting)

    by Nick (109)
    Or perhaps he meant it's only been popular in the last year or so. I've seen this going on for at least the last three years.
  • by spyrochaete (707033) <spyrochaete@@@hyppy...zapto...org> on Thursday July 30, 2009 @11:40AM (#28882947) Homepage Journal

    If your website's front page has a PageRank score of 3/10 or higher, it is a prime candidate for hijacking. Google gives extra clout to hyperlinks from sites with a high PageRank (aka "link juice"), so it's easier for a malicious party to hijack a small number of high-ranking sites than a large number of low-ranking ones. The higher your PageRank, the greater your risk.
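The "link juice" idea follows from how the original PageRank formula works: each page splits its rank among its outgoing links, so one link from a well-linked page passes on more than a link from an obscure one. The Python sketch below is a simplified power-iteration version for illustration, not Google's current ranking. The 0.85 damping factor is the value from the original PageRank paper, and the example graph and page names are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    links maps each page to the pages it links to. Pages with no outlinks
    spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

if __name__ == "__main__":
    # "hub" is well linked; "obscure" has no inbound links. Each links to one spam page.
    graph = {
        "fan1": ["hub"], "fan2": ["hub"], "fan3": ["hub"],
        "hub": ["fan1", "spamA"],
        "obscure": ["spamB"],
    }
    ranks = pagerank(graph)
    print(ranks["spamA"] > ranks["spamB"])
```

Even though the hub divides its rank between two links, `spamA` still ends up well ahead of `spamB`. That is why a spammer gets more out of one compromised well-ranked site than out of several obscure ones.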

    • The funniest part of this is that Google itself seems to fund them and has the ability to stop these MFA (made-for-AdSense) sites and link-fraud sites. It's a connected issue, but for some (very obvious) reason Google keeps quiet about it.
    • by Dullstar (1581331)

      My website is probably a PageRank 1 or something. Just to get it to appear in the results you have to put the name in quotes. However, I think that's just a problem with the name considering the results you do get, so I'm going to redesign the whole site and give it a new name.

      What you said about PageRank reminds me of the April Fool's joke they did once (PigeonRank).

  • I am assuming you can produce a list of candidate sites that may be benefiting from this by tracking sudden rapid growth in links. From there you should be able to come up with an algorithm that looks at what the beneficiary site is about and what the linking sites are about. I would assume the hacked sites will have a random distribution of topics and sources, or a highly clustered distribution if a certain type of site is most often hacked. Regardless, the distribution should be markedly different fro
    • Re: (Score:3, Insightful)

      That doesn't work, because you can't possibly determine whether they're legitimate links or not (if the linking is done properly). For example, how do you differentiate between something that starts as a result of an independently reported news event (or a Slashdotting...) and something that starts as the result of hacking? If you want to waste the cycles, you can start mapping the event to find its potential point of origin to see if it's a news site or something, but it's still going to hurt the little g
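The first step of the link-growth heuristic proposed above, flagging sudden jumps in new inbound links, can be sketched as comparing each day's count against a trailing average. In this Python sketch, the function name, the 7-day window, the 5x factor and the 50-link floor are all hypothetical choices. As the reply points out, a spike alone cannot tell a Slashdotting from a hack, so the output is at most a list of candidates for closer inspection.

```python
def link_spikes(daily_new_links: list[int], window: int = 7,
                factor: float = 5.0, min_links: int = 50) -> list[int]:
    """Indices of days whose new inbound-link count is at least `factor`
    times the mean of the preceding `window` days and at least `min_links`."""
    spikes = []
    for day in range(window, len(daily_new_links)):
        history = daily_new_links[day - window:day]
        baseline = max(sum(history) / window, 1.0)  # avoid a zero baseline
        count = daily_new_links[day]
        if count >= min_links and count >= factor * baseline:
            spikes.append(day)
    return spikes
```

The `min_links` floor keeps tiny sites from being flagged when they go from one new link a day to five.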
  • If you look at the discussion for almost any stock, it's all stock-scam spam. Since Google catches most of my email spam and the newsgroups are pretty clean, this is a bit surprising.
  • "and users become savvier in protecting their home machines"

    And when pigs fly...

  • by Animats (122034) on Thursday July 30, 2009 @12:38PM (#28883777) Homepage

    Google can't solve this problem because their business model requires web spam.

    Google is in the advertising business, not the search business. Search is a traffic builder for the ads. Google's customers are their advertisers, not their search users. They have to maximize ad revenue. The problem is that more than a third of Google's advertisers are web spammers, broadly defined. [sitetruth.net] All those "landing pages", typosquatters, spam blogs, and similar junk full of Google ads are revenue generators for Google. Every time someone clicks on an AdWords ad, Google makes money, no matter what slimeball is running the ad. Google can't crack down too hard, or their revenue will drop substantially. Google does have some standards, but they're low.

    Google went over to the dark side around 2006. In 2004 and 2005, Google sponsored the Web Spam Summit [technorati.com], devoted to killing off web spammers. From 2006, Google sponsored the Search Engine Strategies [searchengi...tegies.com] conference, where the "search engine optimization" people meet. That was a big switch in direction, and a sad one.

    As we demonstrate with SiteTruth [sitetruth.com], it's not that hard to get rid of most web spam if you're willing to be a hardass about requiring a legit business behind each commercial web site. Google can't afford to do that. It would hurt their bottom line.

    However, cleaning up web search results with browser plug-ins is a viable option. Stay tuned.

  • by maccallr (240314) on Thursday July 30, 2009 @01:06PM (#28884177) Homepage Journal

    I saw this in the wild a few weeks ago. I had a Google email alert running for my bank, which pointed me to a page that looked like a blog, but on closer inspection it was completely auto-generated gibberish. They had built the whole thing based on a list of banks and insurance companies. As it was under envsci.rutgers.edu, I guessed they had been compromised.

    I reported it to the webmaster and I see that it is gone (both from Google's index and the server). Not a word of thanks though. How long does that take...

    Maybe someone here will give me a medal instead?

  • by Anonymous Coward
    This is particularly bad at the .edu domains. It is shocking and inexplicable that the IT departments at these universities don't know what's going on with their own servers and in their own zone files. There are literally thousands of hijacked subdomains under valid .edu domains. How can the network administrators not know what's going on? Don't they check their logs? Don't they see the google referrers for this spammy content? Could they be responsible for it themselves, or maybe getting a payoff for look
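Checking for Google referrers to spammy content, as suggested above, comes down to parsing the referer field of the access log and pulling out the search query. The Python sketch below assumes the common Apache/nginx "combined" log format. The function names are hypothetical, and the spam-word list is whatever the administrator supplies.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed "combined" format: host ident user [time] "METHOD path PROTO" status size "referer" "agent"
COMBINED = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"')

def google_search_hits(log_lines) -> list[tuple[str, str]]:
    """(requested path, search query) pairs for hits referred by a Google search."""
    hits = []
    for line in log_lines:
        m = COMBINED.search(line)
        if not m:
            continue
        ref = urlsplit(m.group("referer"))
        labels = (ref.hostname or "").lower().split(".")
        if "google" not in labels:  # crude check for google.com, google.co.uk, ...
            continue
        query = parse_qs(ref.query).get("q")
        if query:
            hits.append((m.group("path"), query[0]))
    return hits

def suspicious_hits(hits, spam_words) -> list[tuple[str, str]]:
    """Keep hits whose search query mentions any of the given spam words."""
    return [(path, q) for path, q in hits if any(w in q.lower() for w in spam_words)]
```

A path such as `/~lab/cheap-pills.html` showing up for the query "cheap pills" on a departmental server is exactly the sort of entry that should prompt a look at the files and the zone.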
  • IMHO, it's mostly script kiddies doing this "hacking". Over the years I have had a few sites developed and largely left to sit for posterity. Unfortunately, the ones that were running off-the-shelf packages such as phpNuke (CMS) or WordPress (blog) or phpBB (forum) have been hacked, or overrun by spammers, at least once. All of those packages had security flaws over the years... some worse than others.

    Yes, I should have kept them up to date, but no, I didn't, and lots of people don't.

    I want to keep the bl
