How Google Trends & News Pollute the Web
Danny Sullivan's hard-hitting piece at Search Engine Land calls on Google to quit being evil in one particular way: collaborating with sleazy websites that jump on Google Trends to grab advertising revenue, as Google itself rakes it in. "Google's CEO Eric Schmidt has quite famously been on record many times talking about how the Web is full of garbage. It's a cesspool out there, he's said. Today, a short fast look at how his own company pollutes the Web. ... That [example of an off-topic, trend-following] page isn't adding any value to the web. If it didn't exist, we wouldn't be the less savvy... But thanks to Google Trends, we've got a big red flag up in front of publishers that wish to pollute Google's results with this type of garbage. ... On the one hand, I love Google Trends. It's fun seeing what the top terms are that are sparking interest... On the other hand, it's clear how much [garbage] Google has caused to be generated, simply by publishing the trends. But that garbage wouldn't happen, if it didn't know it was going to be rewarded. It is, both with traffic from Google and from revenue from Google for those carrying its ads."
hard hitting? (Score:5, Insightful)
What the hell is this guy's point? Bing could release a "trends" feature just like Google's, yet everyone is acting like Google is god.
If anything, a blog post hating on Google, on a site called Search Engine Land that is all about SEO, sounds like a competitor disliking its own competitor.
Except... (Score:1, Insightful)
And the solution is? (Score:5, Insightful)
So should Google shut down Google Trends? Block it from their ad customers? Somehow force them to ignore it? How the hell does he expect/want/think this would work in a perfect world?
There's no point to this article. It's claiming an evil conspiracy just because Google Trends exists.
Tools (Score:5, Insightful)
Re:hard hitting? (Score:5, Insightful)
I don't think he even understands how the ads work.
All you have to understand is that
The author of the article is complaining that Google encourages poor behavior and then turns a dime on it through whatever ads end up being hosted at the websites that don't produce any actual content. You can claim they don't know this is happening, or they don't care, or they are laughing all the way to the bank. Either way, the author appears to be correct in his analysis, although you cannot be certain that Trends is where the crap websites find which terms are hot. Other sites could possibly measure this, but doing so would require a lot of indexing and resources. So it's most likely Google Trends.
Re:Chocomize! (Score:5, Insightful)
I'm sorry, but let's take a step back here ...
This sounds more like a glitch in the search algorithm than anything else. Publishing trends is interesting, and can allow us to learn more about what we (as a species) do with the internet. This information is clearly abused by a few (who then go out and write fake pages which use the popular keywords to attract attention), but this is an abuse of the Trends information that Google provides, not something inherently evil.
Google (or any search engine) could just tweak their results to reduce the importance of sites which are written *after* a topic became trendy. At least to give the existing articles a head start. Or I can imagine a million other ways in which they could tweak the algorithm.
But I don't think what the article is implying (that Google should stop publishing Trends) should be taken seriously.
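To make the "head start" tweak from this comment concrete, here is a minimal sketch, not anything Google is known to do. The `Page` type, the `relevance` score, and the `penalty` factor are all hypothetical names invented for illustration; the only idea taken from the comment is that pages published after a topic started trending get their score reduced.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Page:
    url: str
    relevance: float      # hypothetical base relevance score from the ranker
    published: datetime   # when the page was first seen


def adjusted_score(page: Page, trend_start: datetime, penalty: float = 0.5) -> float:
    """Scale down pages that appeared only after the topic became trendy."""
    if page.published > trend_start:
        return page.relevance * penalty
    return page.relevance


def rank(pages: list[Page], trend_start: datetime, penalty: float = 0.5) -> list[Page]:
    """Order pages by adjusted score, highest first."""
    return sorted(
        pages,
        key=lambda p: adjusted_score(p, trend_start, penalty),
        reverse=True,
    )
```

A flat penalty like this would also push down fresh, legitimate reporting, so a real ranker would need some notion of when recency is itself a sign of relevance.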
So Google is bad for being transparent? (Score:5, Insightful)
Re:Chocomize! (Score:5, Insightful)
Is it Chocomise in the UK, just out of interest?
Re:Chocomize! (Score:2, Insightful)
Google (or any search engine) could just tweak their results to reduce the importance of sites which are written *after* a topic became trendy.
Yeah, that might not work so well for developing news stories. Sure, a CNN puff piece on Chocomize really only needs the one article that started it, but for a trend like a political election, the latest news is significantly more relevant than the first.
Re:So Google is bad for being transparent? (Score:2, Insightful)
The part I find most irritating is that Google also profits from the actions of the abusers (because the abusers are using Google advertising).
Re:Who cares? (Score:2, Insightful)
Yes, but that will most likely be because almost all "popular music or terms" are complete unadulterated feces.
Idea and media makes profit (Score:2, Insightful)
These guys got lucky and hope to keep going with their chocolate idea. The only thing is that they need to keep the momentum going. By being near the top of Google's search results, they will make money until interest wavers. The CNN news story was the groundbreaking moment; now they would need to advertise on television and maybe make an appearance on a show for a few minutes to make enough profit for the company to survive on.
Not just trends (Score:5, Insightful)
Why would the spammers only copy trending topics? Why not just screen scrape everything from cnn.com and add ads? They do.
It just looks like they are only targeting trends because Google picks up on that stuff and aggregates it when it is a hot topic, so you see more of it.
Spammers don't need the trends; they are screen-scraping everything, or just the headlines. This has been going on forever, long before "trends" existed. There are just more of them now, and they are getting better at building their spam farms and increasing their PageRank, such that their screen-scraped content is actually beating the site they copied from in the results.
Sadly it's only going to get worse, as it's too easy for even a single person to create many terabytes of auto-generated spam. Multiply that by the thousands of spammers doing it every minute.
Advertisers have turned the net into slime* (Score:2, Insightful)
What else is new? Try to find drivers and service manuals... Virtually all the results are spam sites. I got better returns 20 years ago when CompuServe was king.
*Kinda reminds me of a nerdy news site that treats binspam as actual news on its front page. Eh... all part of the dumbing down process.
Can you actually replicate this article's issue? (Score:2, Insightful)
When I google for "Chocomize", my top three results are the source chocolate-making company - not spam. The fourth, the only thing remotely resembling pollution, is this searchengineland article itself.
Also, if this is an issue, I really don't think the right solution is to hide the information.
Re:Will only hurt google in the end (Score:3, Insightful)
Advertisers aren't stupid. Google ads are only worthwhile if they're actually generating revenue for the advertiser. Eventually, if they keep allowing this sort of practice, it's only going to drive down their own ad revenue (as advertisers realize they're not getting as much revenue from their ads as they once were).
If someone clicks on an advertisement then buys, does it really matter which spam site they arrived through? There's nothing that suggests they're getting less revenue; in fact, they may be getting more since the ads themselves will be relevant to what is searched for.
Re:Chocomize! (Score:4, Insightful)
Or modify their ranking algorithm to smack down these spammers. For example, just pick a few very unrelated trend keywords/phrases. Then find sites which are turning up for this set of unrelated keywords. After some sanity checks, rank the sites down.
And remember that word xkcd coined ( http://news.slashdot.org/article.pl?sid=10/05/13/183221 )? You can use stuff like that to find a whole bunch of sites to exclude.
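The unrelated-keywords idea in this comment could be sketched roughly as follows. This is an illustration under assumptions: `results_by_keyword` is a hypothetical mapping from each trending keyword to the sites that rank for it, and `min_keywords` is an arbitrary cutoff. The "sanity checks" step is left out, and it matters, since large legitimate news sites also rank for many unrelated topics.

```python
from collections import defaultdict


def flag_trend_chasers(results_by_keyword: dict[str, list[str]],
                       min_keywords: int = 3) -> set[str]:
    """Return sites that rank for at least `min_keywords` unrelated trend terms."""
    keywords_per_site: dict[str, set[str]] = defaultdict(set)
    for keyword, sites in results_by_keyword.items():
        for site in sites:
            keywords_per_site[site].add(keyword)
    return {
        site
        for site, keywords in keywords_per_site.items()
        if len(keywords) >= min_keywords
    }
```

Anything flagged here is only a candidate for review, not proof of spam.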
The question is why can't Google fight it (Score:3, Insightful)
Let's say you're right. Now Google has an index for cnn.com, and an index for spamdomain.com. Presumably the timestamps on the cnn.com pages are a bit earlier since it takes time for spamdomain.com to scrape and republish the content, and then for Google to index the new content on spamdomain.com.
I'm no computer scientist but it seems that this is the sort of data mirroring that should be pretty easy to spot algorithmically. If two domains share >80% of the exact same content, de-emphasize the one with later timestamps.
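As a rough illustration of that ">80% shared content, de-emphasize the later one" rule, here is a minimal sketch. It swaps in word shingles as the overlap measure, which is an assumption rather than anything the comment specifies, and the `(url, text, first_seen)` tuples are a hypothetical input format. It also assumes the timestamps are trustworthy, which crawl times often are not.

```python
from datetime import datetime


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Split text into overlapping runs of k consecutive words."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(text_a: str, text_b: str, k: int = 5) -> float:
    """Fraction of the smaller document's shingles found in the other one."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def page_to_demote(doc_a, doc_b, threshold: float = 0.8):
    """Each doc is (url, text, first_seen). Return the later copy's URL, or None."""
    if overlap(doc_a[1], doc_b[1]) <= threshold:
        return None
    # Above the threshold: treat the page seen later as the copy.
    return max(doc_a, doc_b, key=lambda doc: doc[2])[0]
```

Dividing by the smaller shingle set means a scraper cannot escape the check just by padding the copied article with extra text.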
The provocative theory is that Google doesn't care which site ranks first, as long as its ads are being served on both. Or worse, that Google allows the crap to float to the top if it is carrying Google ads, and cnn.com is not.
Is the theory right? Who knows besides Google? Perhaps it is not so easy for the algorithm to distinguish what to our minds is obvious spamming. And one of the things that Google is up-front about is that if they can't do it algorithmically, they're not interested in it.