The Man Behind Google's Ranking Algorithm
nbauman writes "New York Times interview with Amit Singhal, who is in charge of Google's ranking algorithm. Google uses some 200 "signals" and "classifiers," of which PageRank is only one. "Freshness" determines how many recently changed pages appear in a result. The algorithm used to assume older pages were better, but when Google Finance first launched, the algorithm couldn't find it because the site was too new. Some topics are "hot." "When there is a blackout in New York, the first articles appear in 15 minutes; we get queries in two seconds," said Singhal. Classifiers infer information about the type of search: whether it is for a product to buy, a place, a company, or a person. One classifier identifies people who aren't famous; another identifies brand names. A final check encourages "diversity" in the results: for example, a manufacturer's page, a blog review, and a comparison shopping site."
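To make that architecture concrete, here is a toy sketch of a multi-signal ranker with a final diversity pass. The signal names, weights, and page kinds below are invented for illustration; the article only says there are some 200 proprietary signals, and this is not Google's actual method.

from dataclasses import dataclass

@dataclass
class Page:
    url: str
    pagerank: float   # one signal among roughly 200
    freshness: float  # 0.0 (stale) .. 1.0 (just updated)
    kind: str         # e.g. "manufacturer", "blog", "shopping"

def score(page: Page, hot_topic: bool) -> float:
    # Hypothetical weights: a "hot" query boosts the freshness signal.
    w_fresh = 0.6 if hot_topic else 0.1
    return page.pagerank + w_fresh * page.freshness

def rank(pages: list[Page], hot_topic: bool = False) -> list[Page]:
    ranked = sorted(pages, key=lambda p: score(p, hot_topic), reverse=True)
    # Diversity pass: surface one result of each kind before any repeats.
    seen: set[str] = set()
    diverse: list[Page] = []
    repeats: list[Page] = []
    for p in ranked:
        (repeats if p.kind in seen else diverse).append(p)
        seen.add(p.kind)
    return diverse + repeats

Run on a handful of results, the diversity pass would put a manufacturer's page, a blog review, and a comparison shopping site ahead of a second manufacturer's page, even if that second page scored higher.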
One search feature (Score:5, Interesting)
This could allow for better search results when using, for example, "APPLE NEAR MACINTOSH" or "APPLE NEAR BEATLES" (see the sketch below).
Ho hum... Times change, and not always for the better...
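Here's a minimal sketch of what such a NEAR operator might do, assuming it simply means "both terms occur within a small window of words." The function name and window size are my own invention, not any real engine's API.

def near(doc: str, a: str, b: str, window: int = 10) -> bool:
    """True if terms a and b occur within `window` words of each other."""
    words = doc.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

# "APPLE NEAR MACINTOSH" should favor computer pages;
# "APPLE NEAR BEATLES" should favor the record label.
print(near("the apple macintosh shipped in 1984", "apple", "macintosh"))  # True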
"Millions Of Black Boxes"? (Score:4, Interesting)
"Google rarely allows outsiders to visit the unit, and it has been cautious about allowing Mr. Singhal to speak with the news media about the magical, mathematical brew inside the millions of black boxes that power its search engine."
I could see tens of thousands, maybe hundreds of thousands, but millions?
I'm familiar with all this stuff (Score:3, Interesting)
Re:I'm familiar with all this stuff (Score:4, Interesting)
When you say that your system is limited by human involvement, I presume you mean that implementing new features can have a serious impact on the overall design (and therefore on testing procedures)? Feel free not to answer if you can't.
One thing I found interesting in the article is that Google's system sounds like it scales well. It reminded me of antispam architectures like Brightmail's (if memory serves), which use large numbers of simple heuristics chosen by an evolutionary algorithm. The point is that new heuristics can be added trivially without changing the architecture. I think their system used about 10,000 of them when they described it a few years ago at an MIT spam conference. Adjustments were made nightly by monitoring spam honeypots.
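Roughly, I'd imagine such a system has the shape below: a pile of cheap predicates, a weighted sum, and a nightly mutate-and-keep-if-better loop scored against honeypot mail (weight tuning stands in here for their actual heuristic selection). All heuristics, corpora, and numbers are made up for illustration.

import random

# A few toy heuristics. Real systems have thousands; adding one is just
# appending to this list, so the architecture never changes.
HEURISTICS = [
    lambda msg: "viagra" in msg.lower(),
    lambda msg: msg.isupper(),
    lambda msg: msg.count("!") > 3,
]

def spam_score(msg: str, weights: list[float]) -> float:
    return sum(w for h, w in zip(HEURISTICS, weights) if h(msg))

def accuracy(weights: list[float], spam: list[str], ham: list[str],
             threshold: float = 0.5) -> float:
    hits = sum(spam_score(m, weights) > threshold for m in spam)
    hits += sum(spam_score(m, weights) <= threshold for m in ham)
    return hits / (len(spam) + len(ham))

def nightly_update(weights: list[float], spam: list[str], ham: list[str],
                   generations: int = 100) -> list[float]:
    """Trivial (1+1) evolutionary step: mutate weights, keep if no worse."""
    best = accuracy(weights, spam, ham)
    for _ in range(generations):
        trial = [max(0.0, w + random.gauss(0, 0.1)) for w in weights]
        if (a := accuracy(trial, spam, ham)) >= best:
            weights, best = trial, a
    return weights

# Tiny stand-in for the honeypot corpus.
spam = ["BUY VIAGRA NOW!!!!", "CHEAP VIAGRA!!!!"]
ham = ["lunch tomorrow?", "meeting notes attached"]
weights = nightly_update([0.5] * len(HEURISTICS), spam, ham)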
I'd love to see better competition in the search engine space. I hope you succeed at improving your tech.