Google Outlines the Role of Its Human Evaluators 62
An anonymous reader writes "For many years, Google, on its Explanation of Our Search Results page, claimed that 'a site's ranking in Google's search results is automatically determined by computer algorithms using thousands of factors to calculate a page's relevance to a given query.' Then in May of 2007, that statement changed: 'A site's ranking in Google's search results relies heavily on computer algorithms using thousands of factors to calculate a page's relevance to a given query.' What happened? Google's core search team explain."
Summary, missing from TFS (Score:5, Informative)
Because the summary wasn't kind enough to give you the answer to the question, here it is.
Human evaluators (mostly college students) are trained in the art of validating search engine results. They examine the results of their searches and determine which are the most relevant. For example, searching for the Olympics should yield information about the 2008 Olympics (or whichever is current) rather than the 1996 Olympics. Multiple reviewers frequently work on the same query results, so Google can see how consistently the reviewers rate websites.
The big upshot of this is that it helps weed out websites that are gaming the system to become the #1 Google hit so they can show you ads. So a large part of what the raters are doing is tracking spam websites, not legitimate ones.
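The consistency check described above can be illustrated with a toy inter-rater agreement measure: compare the scores given by raters who judged the same query/result pair. This is only a sketch; the rating scale, data layout, and metric here are assumptions, not Google's actual (undisclosed) pipeline:

```python
from itertools import combinations

def pairwise_agreement(ratings):
    """ratings: dict mapping (query, url) -> {rater_id: score}.
    Returns the fraction of rater pairs that gave the same score
    to the same result -- a crude consistency measure."""
    agree = total = 0
    for scores in ratings.values():
        # Compare every pair of raters who judged this result.
        for a, b in combinations(scores.values(), 2):
            total += 1
            if a == b:
                agree += 1
    return agree / total if total else 0.0

# Hypothetical ratings on a 1-3 relevance scale.
ratings = {
    ("olympics", "beijing2008.example"): {"r1": 3, "r2": 3, "r3": 3},
    ("olympics", "atlanta1996.example"): {"r1": 1, "r2": 1, "r3": 2},
}
print(pairwise_agreement(ratings))  # 4 of 6 pairs agree -> ~0.667
```

A low agreement score for a particular rater, relative to the pool, would be one way to spot reviewers who are rating inconsistently.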
Re:Google is PEOPLE (Score:4, Informative)
In reality, this is why search engines like Wolfram Alpha, which lack Google's broad research and industry knowledge, don't stand much of a chance unless Google drops the ball.
Yeah - but before Google was people, Yahoo was people. Google gets an advantage based on what they're doing. But it doesn't make them invulnerable. Look at the tech industry for the past several decades to see this theme played out again and again.
some personal observations on the program (Score:3, Informative)
- As with anything modern in IT, people sign Non-Disclosure Agreements (NDAs), so not a lot can be said from within the circle without breaking their terms. Having read the interview, I see the chief has pretty much kept it this way too, sticking to terms that are already publicly disclosed.
- Google operates through third-party outsourcers, and pretty much all non-essential communication goes through them rather than Google directly; that's why the guy can't tell you exact numbers about his posse. The big numbers are probably correct, but I'm not sure about now. There was a very big wave of cut-offs and discontinued access for raters about a year ago; a lot of people got the boot and I'm not sure why. My bet is just a sweep of the axe: some were gone for good reason, others quite randomly.
- The raters have a few spaces and forums to discuss their work, open to the public and with minimal chance of an NDA break.
- The raters have mods, too, but I haven't seen activity on that front for a while.
- The specifics of most cases have led me to conclude that for each surveyed example, at least 6 or 7 people work on it and give opinions before a final decision is drawn, so there is your internal balance and weeding out of bad judgement. Let me say it again: you cannot single-handedly change Google's opinion about a particular site and particular search term.
- About natural language processing: this is the scary part. You cannot imagine how good these guys are, especially their algorithms. From time to time they let us sneak a peek, and we had a look at some betas (or alphas) of correct grammar processing and translation MONTHS ahead of their official announcement to the world. You could tell it was machine-made translation, but it was good, scary good. And I'm NOT talking English only, no, no.
- The pay: it gets delayed about six weeks after month's end but is regular, though usually not enough to live on, mainly due to the lack of work. The first year it was good, very good, but in 2008 it started getting less and less, which is a shame, since it's a nice way to browse the net and actually get paid for it!
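The claim above, that 6 or 7 raters weigh in before a final decision, suggests some form of vote aggregation. A minimal sketch of one plausible scheme (majority vote with ties escalated); the labels and escalation rule are assumptions for illustration, not anything Google has described:

```python
from collections import Counter

def final_decision(opinions):
    """Majority vote over individual rater opinions.
    Ties return None, i.e. the case needs escalation to
    more raters. Labels ('relevant', 'spam') are illustrative."""
    counts = Counter(opinions)
    top = counts.most_common(2)
    if len(top) > 1 and top[0][1] == top[1][1]:
        return None  # no clear majority
    return top[0][0]

print(final_decision(["relevant"] * 5 + ["spam"] * 2))  # relevant
print(final_decision(["relevant", "spam"]))             # None
```

With 6-7 independent opinions per example, a single rater's bad (or malicious) judgement gets outvoted, which is the "internal balance" the parent describes.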
Re:Summary, missing from TFS (Score:2, Informative)
The big upshot of this is that it helps weed out websites that are gaming the system to become the #1 Google hit so they can show you ads. So a large part of what the raters are doing is tracking spam websites, not legitimate ones.
Actually, this calls for further explanation, because manual tweaking of results raises bias and legal concerns. As the guy from Google said,
We don't use any of the data we gather in that way. I mean, it is conceivable you could. But the evaluation site ratings that we gather never directly affect the search results that we return. We never go back and say, 'Oh, we learned from a rater that this result isn't as good as that one, so let's put them in a different order.' Doing something like that would skew the whole evaluation by-and-large. So we never touch it.
Mankind's knowledge stands on the shoulders of Google, so they can't just hire, say, a thousand students and use this evaluation as a significant weighting factor. It's rather an evaluation of the algorithms for the sake of further improvement, which is done fully by algorithms.
Search results that don't include search terms (Score:2, Informative)
Re:Fuzzy logic is killing Google (Score:4, Informative)
Plus signs should still be treated as true literals. Quotation marks don't indicate literality -- they indicate that you really, really care about things like word order and so on within the quotes. It used to be true that quotation marks implied a plus on everything inside them, but that wasn't an intentional feature. The advanced search check box was, AFAIK, just equivalent to sticking everything in quotes.
If you're still seeing fuzzification with a plus sign, something may be a bit screwy, and you should file a bug with a specific broken query. (Of course, if you run the query +wombats and see the word "wombat" highlighted in the snippet, that isn't the same thing -- +wombats was treated literally, so this document really truly matched the word "wombats," it might just also have matched the word "wombat" and the snippet highlighter decided that it made sense, for this particular query, to highlight the term. A bug would be if you found a truly irrelevant document coming up.)
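The distinction the parent draws, that a `+`-prefixed term must match literally while a bare term may also match stemmed variants, can be sketched like this. The stemming table is a toy stand-in; Google's actual query processing is not public:

```python
def matches(query_term, doc_words, variants):
    """variants: toy map from a term to its stemmed forms.
    A '+'-prefixed term must appear literally in the document;
    a bare term may also match any listed variant."""
    if query_term.startswith("+"):
        return query_term[1:] in doc_words  # literal only
    return query_term in doc_words or any(
        v in doc_words for v in variants.get(query_term, ())
    )

variants = {"wombats": ["wombat"]}  # hypothetical stemming table
doc = {"the", "wombat", "sleeps"}
print(matches("wombats", doc, variants))   # True  (variant match)
print(matches("+wombats", doc, variants))  # False (literal required)
```

Under this model, a document matching `+wombats` really does contain "wombats"; the snippet highlighter marking "wombat" as well is a separate display decision, exactly as the parent notes.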
Re:Google is PEOPLE (Score:3, Informative)
Of course they're both search engines, but most people (in my experience, and I work in a library) know the difference and have no trouble differentiating the two.
Google = search for websites. Wolfram = search for data.