
The Math Behind PageRank

anaesthetica writes "The American Mathematical Society is featuring an article with an in-depth explanation of the mathematical operations that power PageRank. Because about 95% of the text on the 25 billion pages indexed by Google consists of the same 10,000 words, determining relevance requires an extremely sophisticated set of methods. And because the links that make up the web are constantly changing, the relevance of pages needs to be recalculated continuously."
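The summary doesn't spell out the operations. In the standard published formulation of PageRank (Google's production variant is not public, so treat this as the textbook version), the ranks form the stationary vector I of the "Google matrix" G, found by repeated multiplication (power iteration). Here n is the number of pages, ℓ_i the number of outgoing links on page i, α the damping factor (0.85 is the commonly cited value), and S the link matrix with S_ji = 1/ℓ_i when page i links to page j, and S_ji = 1/n for every j when page i has no outgoing links.

```latex
G = \alpha S + \frac{1-\alpha}{n}\,\mathbf{1}\mathbf{1}^{T},
\qquad G\,I = I,
\qquad \sum_{j=1}^{n} I_j = 1,
\qquad I^{(k+1)} = G\,I^{(k)}
```

Every change to the link graph changes S, which is why the stationary vector has to be recomputed as the web changes.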
  • Nouns maybe? (Score:4, Insightful)

    by Bryansix ( 761547 ) on Wednesday December 06, 2006 @08:07PM (#17139344) Homepage
    It seems like it would be the nouns, pronouns, etc. that Google should be paying attention to. Who cares about all the verbs, adjectives, etc. that just muddy the indexing waters?
  • Re:10,000 words (Score:0, Insightful)

    by Anonymous Coward on Wednesday December 06, 2006 @08:33PM (#17139700)

    Dear The Zon,

    You are not funny. Quit your lame endless attempts at humor.

    The Other 1.05 million readers

  • by Trieuvan ( 789695 ) on Wednesday December 06, 2006 @08:35PM (#17139726) Homepage
    The PageRank reported by the toolbar is really old. Google never wants to let you know the real number, or it would be easy to spam...
  • Re:Nouns maybe? (Score:2, Insightful)

    by abshnasko ( 981657 ) on Wednesday December 06, 2006 @08:45PM (#17139824)
    Searching for "pill" and for "the pill" should yield very different results. Yes, nouns are more important, but articles and other words cannot be disregarded.
  • Re:Bad summary (Score:5, Insightful)

    by martin-boundary ( 547041 ) on Wednesday December 06, 2006 @10:19PM (#17140646)
    It's nowhere near that. The web matrix is very sparse, so if you did a true 25B × 25B matrix power iteration, you'd be multiplying zero by zero a gazillion times. Optimization is about not doing things you don't need to do, and optimizing PageRank is about finding clever ways to avoid the full multiplication. Moreover, PageRank is calculated in parallel across a computer farm. Overall, you can expect a single iteration to take on the order of an hour, and around 50-80 iterations before Google gives up and declares convergence. You can also reuse the previous "converged" PageRank vector as a starting point to cut down on those 50-80 iterations after you've crawled new pages.

    If Google used a single computer to do all the work and truly did 80 × (25B)² operations, they'd be morons.

  • by l0cust ( 992700 ) on Thursday December 07, 2006 @12:33AM (#17141684) Journal
    Thanks for the informative post. I have one question, though. How does it help find relevant information unless that information just happens to be on a popular page too? What I mean is that the idea behind grading/filtering systems like PageRank is to provide the most relevant information about whatever you are searching for on the net. Now suppose Mr. A is looking for some obscure Indian text written in Sanskrit, and Mr. B has (recently or not) put up a website with that text as part of its content, but it's not a popular blog site, nor a mainstream ebook source site, etc. And there are a gazillion hugely popular sites out there that mention that particular text while talking about a totally different book or text.

    So the only way Mr. A will come across Mr. B's website is if he keeps looking through hundreds of result pages or happens upon it via something he read earlier. Doesn't that defeat the purpose of making searches more relevant, especially since lots of webmasters actively work the PageRank system to get a better ranking in the search index? Where does that leave people with worthwhile content but not much popular backing?

    (I would like to make clear that I am not trying to knock this system; I am just curious about the implications for owners of small websites with good content.)
  • by Anonymous Coward on Thursday December 07, 2006 @04:30AM (#17143112)
    Blah. Ugly red clothes... Go Bears!
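abshnasko's objection to the "Nouns maybe?" suggestion can be made concrete with a deliberately naive filter: once articles are thrown away, "pill" and "the pill" become the same query. The STOPWORDS set and the naive_terms function are hypothetical illustrations, not how Google actually tokenizes queries.

```python
from typing import List

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to"}


def naive_terms(query: str) -> List[str]:
    """Lowercase the query, split on whitespace, and drop stopwords."""
    return [word for word in query.lower().split() if word not in STOPWORDS]


print(naive_terms("pill"))                              # ['pill']
print(naive_terms("the pill"))                          # ['pill']
print(naive_terms("pill") == naive_terms("the pill"))   # True
```

After this filter the two queries are literally the same input to any later ranking step, so any difference in results has to come from not discarding such words.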
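martin-boundary's description (touch only the nonzero entries, iterate until the vector stops changing, and warm-start from the previous vector after a crawl) can be sketched on a toy graph. This is a minimal single-machine sketch, not Google's implementation: the adjacency-list format, spreading the rank of pages without outlinks evenly over all pages, the damping factor 0.85 and the L1 stopping tolerance are all assumptions chosen for illustration, and nothing here is parallelized.

```python
from typing import Dict, List, Optional, Tuple


def pagerank(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 200,
    start: Optional[Dict[str, float]] = None,
) -> Tuple[Dict[str, float], int]:
    """Power iteration over an adjacency list; each pass costs O(pages + links)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Warm start: reuse a previous vector where available, give new pages 1/n,
    # then renormalize so the ranks sum to 1.
    if start:
        rank = {p: start.get(p, 1.0 / n) for p in pages}
        total = sum(rank.values())
        rank = {p: r / total for p, r in rank.items()}
    else:
        rank = {p: 1.0 / n for p in pages}

    for iteration in range(1, max_iter + 1):
        # Rank sitting on pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Only actual links are visited; the zero entries of the matrix never are.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # The L1 change between successive iterates decides convergence.
        diff = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if diff < tol:
            return rank, iteration
    return rank, max_iter


# Tiny hypothetical web: page -> list of pages it links to.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
old_rank, old_iters = pagerank(web)

# A "recrawl" finds a new page e that links to a.
web["e"] = ["a"]
cold_rank, cold_iters = pagerank(web)                   # start from uniform
warm_rank, warm_iters = pagerank(web, start=old_rank)   # reuse the old vector
print(old_iters, cold_iters, warm_iters)
```

Because each pass touches every link once, the cost per iteration grows with the number of links rather than with the square of the number of pages. How many iterations the warm start saves depends on how much the graph changed; the printed counts allow a comparison on this toy example.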
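The arithmetic behind martin-boundary's closing remark, with one labeled assumption: for the sparse estimate, suppose (hypothetically) an average of 10 outgoing links per page, so the link matrix has about 10 × 25B nonzero entries.

```latex
\underbrace{80 \times (2.5\times10^{10})^{2}}_{\text{dense}}
  = 80 \times 6.25\times10^{20} = 5\times10^{22}
\qquad
\underbrace{80 \times 10 \times 2.5\times10^{10}}_{\text{sparse, 10 links/page (assumed)}}
  = 2\times10^{13}
```

Under that assumption the dense approach does about 2.5 × 10^9 times more work than visiting only the links.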
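Part of l0cust's question can be answered from the plain PageRank formula alone (leaving aside the text matching a search engine combines it with, which these comments don't describe): every page gets the random-jump share (1 - damping)/n even if nobody links to it, and a single link from a well-linked page lifts it above that floor. The sketch assumes a damping factor of 0.85, a fixed 100 iterations, and a toy graph in which every page has at least one outlink; the page names and the pagerank_no_dangling function are hypothetical.

```python
from typing import Dict, List


def pagerank_no_dangling(links: Dict[str, List[str]], damping: float = 0.85,
                         iterations: int = 100) -> Dict[str, float]:
    """Plain power iteration; assumes every page has at least one outlink."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    missing = [p for p in pages if not links.get(p)]
    if missing:
        raise ValueError(f"pages without outlinks: {missing}")
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same random-jump share, (1 - damping) / n ...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # ... plus a damped share of the rank of each page that links to it.
        for src, targets in links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


web = {
    "portal": ["blog", "ebooks"],
    "blog": ["portal", "ebooks"],
    "ebooks": ["portal"],
    "mr_b_site": ["portal"],   # links out, but nothing links to it
}
before = pagerank_no_dangling(web)
print(round(before["mr_b_site"], 4))   # 0.0375, i.e. (1 - 0.85) / 4

web["portal"].append("mr_b_site")      # one inbound link from a well-linked page
after = pagerank_no_dangling(web)
print(after["mr_b_site"] > before["mr_b_site"])   # True
```

In this formulation an unlinked page is never ranked at zero, but its link-based score stays at the floor until someone links to it.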
