The Man Behind Google's Ranking Algorithm
nbauman writes "New York Times interview with Amit Singhal, who is in charge of Google's ranking algorithm. Google uses some 200 "signals" and "classifiers," of which PageRank is only one. "Freshness" determines how many recently changed pages appear in a result. The algorithm used to assume older pages were better, but when Google Finance first launched, the algorithm couldn't find it because the site was too new. Some topics are "hot." "When there is a blackout in New York, the first articles appear in 15 minutes; we get queries in two seconds," said Singhal. Classifiers infer information about the type of search: whether it is for a product to buy, a place, a company, or a person. One classifier identifies people who aren't famous; another identifies brand names. A final check encourages "diversity" in the results: for example, a manufacturer's page, a blog review, and a comparison shopping site."
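To make that architecture concrete, here is a toy sketch of a multi-signal ranker with a final diversity pass. The signal names, weights, and page kinds below are invented for illustration; the article only says there are some 200 proprietary signals, and this is not Google's actual method.

from dataclasses import dataclass

@dataclass
class Page:
    url: str
    pagerank: float   # one signal among roughly 200
    freshness: float  # 0.0 (stale) .. 1.0 (just updated)
    kind: str         # e.g. "manufacturer", "blog", "shopping"

def score(page: Page, hot_topic: bool) -> float:
    # Hypothetical weights: a "hot" query boosts the freshness signal.
    w_fresh = 0.6 if hot_topic else 0.1
    return page.pagerank + w_fresh * page.freshness

def rank(pages: list[Page], hot_topic: bool = False) -> list[Page]:
    ranked = sorted(pages, key=lambda p: score(p, hot_topic), reverse=True)
    # Diversity pass: surface one result of each kind before any repeats.
    seen: set[str] = set()
    diverse: list[Page] = []
    repeats: list[Page] = []
    for p in ranked:
        (repeats if p.kind in seen else diverse).append(p)
        seen.add(p.kind)
    return diverse + repeats

Run on a handful of results, the diversity pass would put a manufacturer's page, a blog review, and a comparison shopping site ahead of a second manufacturer's page, even if that second page scored higher.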
One search feature (Score:5, Interesting)
This could allow for better search results when using, for example, "APPLE NEAR MACINTOSH" or "APPLE NEAR BEATLES" (see the sketch below).
Ho hum... Times change, and not always for the better...
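Here's a minimal sketch of what such a NEAR operator might do, assuming it simply means "both terms occur within a small window of words." The function name and window size are my own invention, not any real engine's API.

def near(doc: str, a: str, b: str, window: int = 10) -> bool:
    """True if terms a and b occur within `window` words of each other."""
    words = doc.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

# "APPLE NEAR MACINTOSH" should favor computer pages;
# "APPLE NEAR BEATLES" should favor the record label.
print(near("the apple macintosh shipped in 1984", "apple", "macintosh"))  # True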
"Millions Of Black Boxes"? (Score:4, Interesting)
"Google rarely allows outsiders to visit the unit, and it has been cautious about allowing Mr. Singhal to speak with the news media about the magical, mathematical brew inside the millions of black boxes that power its search engine."
I could see tens of thousands, maybe hundreds of thousands, but millions?
I'm familiar with all this stuff (Score:3, Interesting)
Re:I'm familiar with all this stuff (Score:4, Interesting)
When you say that your system is limited by human involvement, I presume you mean that implementing new features can have a serious impact on the overall design (and therefore on testing procedures)? Feel free not to answer if you can't.
One thing I found interesting in the article is that Google's system sounds like it scales well. It reminded me of antispam architectures like Brightmail's (if memory serves), which use large numbers of simple heuristics chosen by an evolutionary algorithm. The point is that new heuristics can be added trivially without changing the architecture. I think their system used about 10,000 of them when they described it a few years ago at an MIT spam conference. Adjustments were made nightly by monitoring spam honeypots.
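Roughly, I'd imagine such a system has the shape below: a pile of cheap predicates, a weighted sum, and a nightly mutate-and-keep-if-better loop scored against honeypot mail (weight tuning stands in here for their actual heuristic selection). All heuristics, corpora, and numbers are made up for illustration.

import random

# A few toy heuristics. Real systems have thousands; adding one is just
# appending to this list, so the architecture never changes.
HEURISTICS = [
    lambda msg: "viagra" in msg.lower(),
    lambda msg: msg.isupper(),
    lambda msg: msg.count("!") > 3,
]

def spam_score(msg: str, weights: list[float]) -> float:
    return sum(w for h, w in zip(HEURISTICS, weights) if h(msg))

def accuracy(weights: list[float], spam: list[str], ham: list[str],
             threshold: float = 0.5) -> float:
    hits = sum(spam_score(m, weights) > threshold for m in spam)
    hits += sum(spam_score(m, weights) <= threshold for m in ham)
    return hits / (len(spam) + len(ham))

def nightly_update(weights: list[float], spam: list[str], ham: list[str],
                   generations: int = 100) -> list[float]:
    """Trivial (1+1) evolutionary step: mutate weights, keep if no worse."""
    best = accuracy(weights, spam, ham)
    for _ in range(generations):
        trial = [max(0.0, w + random.gauss(0, 0.1)) for w in weights]
        if (a := accuracy(trial, spam, ham)) >= best:
            weights, best = trial, a
    return weights

# Tiny stand-in for the honeypot corpus.
spam = ["BUY VIAGRA NOW!!!!", "CHEAP VIAGRA!!!!"]
ham = ["lunch tomorrow?", "meeting notes attached"]
weights = nightly_update([0.5] * len(HEURISTICS), spam, ham)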
I'd love to see better competition in the search engine space. I hope you succeed at improving your tech.