The Math Behind PageRank

anaesthetica writes "The American Mathematical Society is featuring an article with an in-depth explanation of the type of mathematical operations that power PageRank. Because about 95% of the text on the 25 billion pages indexed by Google consists of the same 10,000 words, determining relevance requires an extremely sophisticated set of methods. And because the links constituting the web are constantly changing, the relevance of pages needs to be recalculated on a continuous basis."
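
For reference, one standard statement of the PageRank recurrence (TFA develops the same idea as an eigenvector problem; d is the damping factor, usually quoted as about 0.85):

    PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}

where N is the total number of pages, M(p_i) is the set of pages linking to p_i, and L(p_j) is the number of outbound links on page p_j.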
  • by dada21 ( 163177 ) * <adam.dada@gmail.com> on Wednesday December 06, 2006 @07:50PM (#17139112) Homepage Journal
    I have sites with a PR of 6, and I can tell you that they got that way because of inbound links from other sites. In fact, when other sites dropped those links, my PR dropped (to 5, and even to 4). Getting more inbound links brought the PR back.

    Think about those links, too. How often do you use common words in an HREF? I don't think there's a lot of weeding out of common words since the link to a site is usually either its name, or a description containing some important keywords.

    I love seeing these technoscientists who think they understand PageRank, but just like TimeCube, they're way, way off.
  • by markov_chain ( 202465 ) on Wednesday December 06, 2006 @08:04PM (#17139276)
    There has been a PageRank paper out there since 1998 or so, so it's not exactly a secret how it works. Basically an initial set of relevant pages is pulled from the database and ranked by doing some computation on a connectivity matrix. The trick is to come up with a good initial set; unless they managed to implement an all-knowing oracle, they probably build it with a keyword search. Here's where the article summary makes sense: if most pages have the same keywords, a keyword search is going to come up with an awfully large initial set.

    The article might have details, maybe someone who has actually read it can fill in :)
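
    A toy sketch in Python of the kind of keyword pre-filter the parent is guessing at; the page store, tokeniser, and query below are all invented for illustration, not anything Google actually does:

        # Toy sketch: pick an initial candidate set by keyword match,
        # then hand it off to the link-based ranking step.
        pages = {                       # hypothetical crawled pages: url -> text
            "a.example": "linear algebra and the web",
            "b.example": "cooking with cast iron",
            "c.example": "the algebra of web links",
        }

        def candidate_set(query, pages):
            terms = set(query.lower().split())
            # keep any page whose text shares at least one term with the query
            return [url for url, text in pages.items()
                    if terms & set(text.lower().split())]

        print(candidate_set("web algebra", pages))  # ['a.example', 'c.example']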

  • by Anonymous Brave Guy ( 457657 ) on Wednesday December 06, 2006 @08:52PM (#17139894)

    The underlying idea behind PageRank is pretty well known at this point, and is described in TFA. Essentially, it's a big set of simultaneous equations: each incoming link to your page gets a score that is roughly the rank of the source page divided by the number of outgoing links on that page, and the rank of your page is roughly the sum of the scores of all incoming links.
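
    A minimal Python sketch of that iteration; the link graph is made up, the damping factor d is the commonly quoted 0.85, and it ignores the fudge factors below and assumes every page has at least one outgoing link:

        # Minimal PageRank power iteration over a tiny, invented link graph.
        # links[p] = pages that p links out to.
        links = {
            "A": ["B", "C"],
            "B": ["C"],
            "C": ["A"],
            "D": ["C"],
        }

        d = 0.85                      # damping factor
        pages = list(links)
        N = len(pages)
        rank = {p: 1.0 / N for p in pages}

        for _ in range(50):           # iterate until the ranks settle
            new = {}
            for p in pages:
                # sum of (source rank / source's outgoing link count) over in-links
                incoming = sum(rank[q] / len(links[q])
                               for q in pages if p in links[q])
                new[p] = (1 - d) / N + d * incoming
            rank = new

        for p, r in sorted(rank.items(), key=lambda kv: -kv[1]):
            print(p, round(r, 3))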

    Various fudge factors are introduced along the way. For example, if you break Google's rules about displaying the same content to bots as to humans, you can get slapped right down. More subtly, newly registered domains take a modest hit for a while. More nobody-knows-ly, Google's handling of redirects is unclear: information about exactly what adjustments are made is pretty scarce, and there's a lot of conjecture around. One thing that's pretty certain is that they penalise for duplicate content, which is why some webmasters do apparently unnecessary things like redirecting http://www.theircompany.com/ [theircompany.com] to http://theircompany.com/ [theircompany.com] or vice versa.
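
    If you want to see which way a given site canonicalises, a quick stdlib-only Python check (theircompany.com is just a stand-in for whatever host you're testing; urllib follows the redirects and geturl() reports where you ended up):

        # Fetch the www and bare-host variants and see where each redirect lands.
        from urllib.request import urlopen

        for url in ("http://www.theircompany.com/", "http://theircompany.com/"):
            try:
                final = urlopen(url, timeout=10).geturl()   # URL after redirects
                print(url, "->", final)
            except OSError as e:
                print(url, "failed:", e)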

    So, if you want to get a page with a high rank yourself, then ideally you would get many established, highly-ranked pages to link to your page and no others. In your example, all those Geocities sites wouldn't help a lot, because (a) they'd have negligible rank themselves, and (b) they'd be penalised for being new and lose some of that negligible rank before they even started. Many times negligible is still negligible, and so would be your target page's rank. OTOH, get a few links from university sites, big news organisations and the like, and your rank will suddenly be way up there. Alternatively, get a grass-roots movement going where a gazillion individuals with small personal sites link to you, and the cumulative effect will kick in.
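
    To put rough numbers on that, a back-of-the-envelope Python comparison using the score-per-link rule above; every rank and link count here is invented:

        # Contribution of an in-link ~= source rank / source's outgoing link count.
        def contribution(source_rank, outgoing_links):
            return source_rank / outgoing_links

        # One link from an established, well-linked page...
        university_link = contribution(source_rank=8.0, outgoing_links=40)

        # ...versus a thousand links from tiny, barely-ranked personal pages.
        grass_roots = 1000 * contribution(source_rank=0.001, outgoing_links=5)

        print(round(university_link, 3))  # 0.2
        print(round(grass_roots, 3))      # 0.2 -- many small contributions can add up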

  • OK, but... (Score:1, Informative)

    by indigest ( 974861 ) on Wednesday December 06, 2006 @10:38PM (#17140788)
    The algorithms behind PageRank are no secret. Why not just read about them from the source [stanford.edu]?
  • by MadMagician ( 103678 ) on Wednesday December 06, 2006 @11:38PM (#17141260)

    For a different, somewhat more technical, but more succinct discussion, Cleve Moler [of Matlab fame] wrote another view [mathworks.com] of this topic, about 5 years ago.

    The math is the same, of course, but two points of view may provide a greater sense of perspective. So to speak. And Cleve is always worth listening to.

  • by XorNand ( 517466 ) * on Thursday December 07, 2006 @01:28AM (#17142094)
    As pointed out, the Times site isn't fooled, but there are a good many out there that are fooled. Sometimes when you do a Google search, one of the results will contain a keyword or two, but when you click on the link, you'll find yourself redirected to a subscription page. User-agent spoofing can frequently show you the same page that Google indexed.

    If you're a FF user, grab the Useragent Switcher extension [mozilla.org] and add in a UA of "Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)". You'll then be two clicks away from seeing what was previously registration-only.
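
    If you'd rather script it than install the extension, a minimal Python sketch of the same trick (the URL is a placeholder to swap for the article that bounced you to a registration page; sites that also key off cookies or IP ranges won't be fooled):

        # Fetch a page while presenting Googlebot's User-Agent string.
        from urllib.request import Request, urlopen

        UA = "Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)"

        req = Request("http://example.com/", headers={"User-Agent": UA})
        with urlopen(req, timeout=10) as resp:
            print(resp.read(500).decode("utf-8", errors="replace"))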
