The Man Behind Google's Ranking Algorithm

nbauman writes "New York Times interview with Amit Singhal, who is in charge of Google's ranking algorithm. They use 200 "signals" and "classifiers," of which PageRank is only one. "Freshness" defines how many recently changed pages appear in a result. They assumed old pages were better, but when they first introduced Google Finance, the algorithm couldn't find it because it was too new. Some topics are "hot". "When there is a blackout in New York, the first articles appear in 15 minutes; we get queries in two seconds," said Singhal. Classifiers infer information about the type of search, whether it is a product to buy, a place, company or person. One classifier identifies people who aren't famous. Another identifies brand names. A final check encourages "diversity" in the results, for example, a manufacturer's page, a blog review, and a comparison shopping site."
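The summary describes the ranker as a blend of many signals, with PageRank only one input. As a purely illustrative sketch (the signal names, weights, and pages below are invented, not Google's actual system), combining a few signals into a single score might look like:

```python
# Hypothetical illustration: blending several ranking "signals" into one score.
# Signal names, weights, and pages are invented; Google's real system combines
# ~200 proprietary signals and classifiers.

def rank_score(page, weights):
    """Weighted sum of per-page signal values."""
    return sum(weights[name] * page.get(name, 0.0) for name in weights)

weights = {"pagerank": 0.5, "freshness": 0.3, "query_match": 0.2}

pages = [
    {"id": "old_archive", "pagerank": 0.9, "freshness": 0.1, "query_match": 0.6},
    {"id": "news_story",  "pagerank": 0.4, "freshness": 0.9, "query_match": 0.8},
]

ranked = sorted(pages, key=lambda p: rank_score(p, weights), reverse=True)
print([p["id"] for p in ranked])  # → ['news_story', 'old_archive']
```

Under these made-up weights, a fresher, better-matching page outranks one with higher PageRank, which is the "freshness" effect the summary describes.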
  • by Xoq jay ( 1110555 ) on Sunday June 03, 2007 @11:14AM (#19371387)
    PageRank is the source of all wisdom in Google... but there is so much more... like string searching & matching algos, file searching... you name it. Just the other day I was searching for books about Google's algorithms... I found zero interesting stuff. They keep their algorithms secret and out of the public domain (like they should). We praise PageRank, but if we knew what other stuff is in there, we would all be members of the Church of Google (http://www.thechurchofgoogle.org/) :P
  • by Anonymous Coward on Sunday June 03, 2007 @11:24AM (#19371463)
    One of the most annoying things about Google for me is how it interprets queries with strange characters common to almost all programming languages. A Google search for "ruby <<" returns no results related to the Ruby append operator. A simple search for "<<" by itself returns ZERO results.
  • One search feature (Score:5, Interesting)

    by Z00L00K ( 682162 ) on Sunday June 03, 2007 @11:27AM (#19371493) Homepage Journal
    that has been lost is the "NEAR" keyword that AltaVista supported. I found it rather useful.

    It allowed for better search results, for example "APPLE NEAR MACINTOSH" versus "APPLE NEAR BEATLES".

    Ho hum... times change, and not always for the better...
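AltaVista's NEAR matched documents where two terms appeared within a small window of each other. A minimal sketch of that idea (the window size and whitespace tokenizer are my own assumptions, not AltaVista's actual rules):

```python
# Minimal sketch of a NEAR-style proximity match: two terms count as "near"
# if they occur within `window` words of each other. The window size and the
# whitespace tokenizer are assumptions for illustration.

def near(text, term_a, term_b, window=10):
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

doc = "the apple macintosh computer changed personal computing"
print(near(doc, "APPLE", "MACINTOSH"))  # → True
print(near(doc, "APPLE", "BEATLES"))    # → False
```

A real engine would run this over positional postings in an inverted index rather than raw text, but the window test is the same idea.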

  • by Glacial Wanderer ( 962045 ) on Sunday June 03, 2007 @12:23PM (#19371883) Homepage
    I would agree that's likely the reason Google won't release their algorithm, but my question was why many people outside of Google insist that Google should keep it secret. If Google, in a moment of financial insanity, released its search algorithms to the competition, it wouldn't decrease the quality of my search results; in fact, my results might improve if someone took Google's algorithm and improved on it.
  • by Spy Hunter ( 317220 ) on Sunday June 03, 2007 @02:54PM (#19373059) Journal
    This is an interesting question that I've often wondered about. It's possible that Google programmers simply went in and special-cased C++ and C#, but I personally think that Google has an automated process which notices that "C++" and "C#" are commonly occurring both in web pages and queries, and then automatically adds them to the list of "strange" tokens to index.
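That kind of automated special-casing could be as simple as counting how often punctuation-bearing tokens show up in query logs and promoting the frequent ones to the index vocabulary. A purely speculative sketch (the threshold, log format, and tokenizer are invented):

```python
# Speculative sketch: promote punctuation-bearing tokens like "C++" or "C#"
# to the index vocabulary once they appear often enough in query logs.
# Threshold, log format, and tokenizer are invented for illustration.
from collections import Counter

def find_special_tokens(query_log, min_count=3):
    counts = Counter(
        tok for query in query_log
        for tok in query.split()
        if any(not ch.isalnum() for ch in tok)  # token contains punctuation
    )
    return {tok for tok, n in counts.items() if n >= min_count}

log = ["c++ tutorial", "c++ vs java", "learn c++", "c# generics", "ruby <<"]
print(find_special_tokens(log))  # → {'c++'}
```

In this toy log, "c++" clears the threshold while "c#" and "<<" do not; a system like this would explain why popular punctuation tokens get indexed and rare ones like a bare "<<" are dropped.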
  • by aldheorte ( 162967 ) on Sunday June 03, 2007 @03:34PM (#19373403)
    Not sure about this:

    "Google rarely allows outsiders to visit the unit, and it has been cautious about allowing Mr. Singhal to speak with the news media about the magical, mathematical brew inside the millions of black boxes that power its search engine."

    I could see tens of thousands, maybe hundreds of thousands, but millions?
  • Re:...only one? (Score:3, Interesting)

    by rtb61 ( 674572 ) on Sunday June 03, 2007 @09:44PM (#19376295) Homepage
    From the results I've been getting lately, they seem to be dropping PageRank in preference to how many times the words 'google adwords' appear on the page, or more precisely the code for generating them. Totally worthless pages, but obviously not worthless for Google's bottom line. This story obviously reflects one thing and one thing only: the growing perception in the public's eye of the deteriorating quality of Google's results; hence yet another marketing fluff piece to try to convince them it just ain't so.
  • by melted ( 227442 ) on Sunday June 03, 2007 @11:22PM (#19376989) Homepage
    And the thing that I want to know is how they evaluate the results. I actually do research in this space right now, and by far the most painful thing is evaluation of results. We have a system that automates most of the work, but there's still a lot of human involvement, and this limits the input dataset size and speed with which we can iterate the improvements.
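One standard way to evaluate ranked results against human relevance judgments is NDCG (normalized discounted cumulative gain), which rewards putting highly relevant documents near the top. A minimal sketch, assuming graded relevance labels (0 = irrelevant, higher = better) collected from human raters:

```python
# Minimal NDCG sketch for scoring a ranked result list against human
# relevance judgments. Labels here are invented example data.
import math

def dcg(relevances):
    # Gain discounted by log2 of (1-based rank + 1)
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# Judged relevance of a system's results, in the order it returned them:
print(round(ndcg([3, 2, 3, 0, 1]), 3))  # → 0.972
```

The human-labeling bottleneck the parent describes is exactly the cost of producing those relevance judgments; the metric itself is cheap once the labels exist.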
  • by martin-boundary ( 547041 ) on Monday June 04, 2007 @12:25AM (#19377367)
    Good question. I agree with you that the article doesn't say anything valuable in this respect :(

    When you say that your system is limited by human involvement, I presume you mean that implementing new features can have serious impact on the overall design (and therefore on testing procedures)? Feel free to not answer if you can't.

    One thing I found interesting in the article is that Google's system sounds like it scales well. It reminded me of antispam architectures like Brightmail's (if memory serves), which have large numbers of simple heuristics which are chosen by an evolutionary algorithm. The point is that new heuristics can be added trivially without changing the architecture. I think their system used 10,000 when they described it a few years ago at an MIT spam conference. Adjustments were done nightly by monitoring spam honeypots.

    I'd love to see better competition in the search engine space. I hope you succeed at improving your tech.
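The Brightmail-style scheme described above — many simple heuristics whose weights are tuned automatically against labeled data — can be sketched roughly. The heuristics, data, and tuning loop below are invented, and a greedy hill-climb stands in for the evolutionary search to keep the sketch deterministic:

```python
# Rough sketch: many simple spam heuristics, with weights tuned automatically
# against labeled (honeypot-style) examples. Heuristics and data are invented;
# a greedy hill-climb stands in for the evolutionary search.

heuristics = [
    lambda m: "free money" in m.lower(),
    lambda m: m.isupper(),
    lambda m: "meeting" in m.lower(),  # benign phrase; should stay low-weighted
]

labeled = [  # (message, is_spam)
    ("FREE MONEY NOW", True),
    ("free money inside", True),
    ("meeting at noon", False),
    ("lunch tomorrow?", False),
]

def accuracy(weights):
    hits = 0
    for msg, is_spam in labeled:
        score = sum(w for h, w in zip(heuristics, weights) if h(msg))
        hits += (score > 0.5) == is_spam
    return hits / len(labeled)

def tune(weights, rounds=20, step=0.6):
    # Repeatedly try nudging each weight up or down; keep strict improvements.
    for _ in range(rounds):
        for i in range(len(weights)):
            for d in (step, -step):
                trial = list(weights)
                trial[i] += d
                if accuracy(trial) > accuracy(weights):
                    weights = trial
    return weights

weights = tune([0.0, 0.0, 0.0])
print(accuracy(weights))  # → 1.0 on this toy data
```

The architectural point the parent makes holds here too: adding a new heuristic is just appending a lambda and a weight slot; the tuning loop and scoring function never change.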
