Google Opens Up (Some) Search Algorithms

Posted by CmdrTaco
from the nobody-else-has-a-million-servers-anyway dept.
overmars writes "After years of closely guarding the formula for its search algorithms, Google is opening up a little. The search engine company has kept its search formula a closely guarded secret for two reasons: competition and to prevent abuse, said Udi Manber, Google's vice president of engineering, search quality, in a post on the corporate blog. Manber said the blog post is the first part of a renewed effort at the company 'to open up a bit more than we have in the past.' Manber said the most famous part of Google's ranking algorithm is PageRank, an algorithm developed by Google cofounders Larry Page and Sergey Brin. While PageRank is still in use, it is a 'part of a much larger system,' he said. 'Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it's not just the language, it's how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing),' he said."

  • Don't do it, Google! (Score:1, Interesting)

    by FudRucker (866063)
    As long as Microsoft wants to dominate the search engine market at the expense of Google, Yahoo, and anyone else who gets in the way (knowing Microsoft's track record of abusive and underhanded methods), I would keep that a secret to protect the intertubes from the likes of Microsoft.
    • Re: (Score:3, Insightful)

      by spidr_mnky (1236668)
      As long as we're twisting the lion's tail, you might instead say that the more that people and companies who desire common progress share their work, the less the people and companies who isolate themselves and their work for the sake of competition will be able to keep up. Therefore, the more Google publishes, the harder it will be able to fight (our antagonist) MS.

      In reality, I'm sure Google's leadership has done some heavy analysis on exactly how much openness benefits them.
      • Re: (Score:2, Interesting)

        by Daengbo (523424)
        In reality, I'm sure Google's leadership has done some heavy analysis on exactly how much openness benefits them.
        and
        The search engine company has kept its search formula a closely guarded secret for two reasons: competition and to prevent abuse

        Security through obscurity isn't a good plan, and Google knows that.
    • by risk one (1013529) on Saturday May 24, 2008 @08:47AM (#23527262)

      I took one course in Information Retrieval, and I could come up with most of these things with an evening or two of brainstorming, at least on a general level like this. Ideas like PageRank gave Google the edge in the early days, but now their advantage lies in other areas. They have a stunning amount of capital tied up in hardware, giving them amazing speed and amazing amounts of data. They have code optimized to handle those amounts of data in reasonable time. They have the experience to take simple probability models like the ones described in the article and make them work with those amounts of data.

      This is why it's impossible to beat Google at search and other data-based markets. It's not one simple patented idea anymore. If it were just that, Google would've disappeared years ago. The only way to beat the points described above is to have the capital to buy the hardware and the knowledge to match Google. Microsoft can do that, but Google has one other thing that Microsoft doesn't: they understand their developers. They understand that if you give these kinds of scientists/developers an interesting problem, a fantastic dataset, and the freedom to attack it in their own way, you barely even have to pay them anymore. The interest will take over and completely fuel the project. They will work overtime and come in on the weekends without being asked.

      That will bring an energy to a project and a company that you can never get through any tactic that Microsoft is likely to employ. I admit I don't precisely know what Microsoft is like on the inside, but I simply cannot conceive of them as a company that understands the joy of programming, or the joy of science (which is a huge part of information retrieval). In any case, one blog post with some sketchy details isn't going to tell Microsoft anything they don't know already.

      • Re: (Score:2, Interesting)

        by Anonymous Coward

        I took one course in Information Retrieval, and I could come up with most of these things with an evening or two of brainstorming, at least on a general level like this.

        Coming up with things is easy; implementing them is hard. Any average Joe Sixpack can come up with the idea of a flying car in five seconds, but to actually build one is another matter entirely - and doing so in a commercially viable way is yet another matter.

        Remember what Edison said about inspiration and perspiration?

        • Coming up with something is a little less abstract than you'd like it to be. Coming up with these ideas includes some rough thought of how to implement them; only having an idea about something is definitely not "coming up with something".
      • by kestasjk (933987) on Saturday May 24, 2008 @12:24PM (#23529344) Homepage

        I took one course in Information Retrieval, and I could come up with most of these things with an evening or two of brainstorming, at least on a general level like this.

        Of course you could. ;-) You took a course in Information Retrieval, after all.
  • by k33l0r (808028) on Saturday May 24, 2008 @08:29AM (#23527166) Homepage Journal

    What, exactly, has Google opened up? As far as I can see from TFA, everything is explained on a very general level, with no detail whatsoever. I can't see Google's competition gaining any significant benefit from this.


    • Re: (Score:2, Interesting)

      by Anonymous Coward
      Right. And the competitors already know pieces of what Google has, as a result of the inevitable stream of engineers leaving to take new jobs. Particularly at SV startups founded by ex-Googlers.

      While Rob Enderle puts the matter trollishly, I agree with the thrust of what he says. Google has been given a free pass on this. Their main product/service is definitely not open source, or free software, and in fact is less open than most of Microsoft's products (for example). At least with Windows and .Net, we
    • I've never seen a VP who knows anything about what he's overseeing. So he caught some general phrases from his engineers and put them on the blog. Scientists' posts would be much more interesting.
      • Re: (Score:3, Informative)

        by Temporal (96070)
        The engineering VPs at Google are all engineers themselves. Udi himself was hired for his extensive background in web search, at Yahoo and Amazon. He knows a great deal about what he oversees.
    • From TFA...

      Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it's not just the language, it's how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing).

      Obviously, this usage of the word "open" is not related to open source software. It's more like he is willing to talk about it at all.

  • Pagerank: I am a prototype for a much larger system.
    User: What else do you know about me?
    Pagerank: Everything that can be known.
    User: How about a report on yourself?
    Pagerank: I was a prototype for Echelon IV. My instructions are to amuse visitors with information about their websites.
    User: I don't see anything amusing about spying on people.
    Pagerank: Human beings feel pleasure when they are watched. I have recorded their smiles as I tell them who they are.
    User: Some people just don't understand the dangers of
  • License? (Score:2, Interesting)

    by Bootarn (970788)
    Under which license are the algorithms being released? If it's a BSD-like license, MS will probably be all over it, but if it's a GPL license, it may be harder for them to claim the algorithms as their own, since they'll have to open up their own code.
    At least that's what I think.
    • Re: (Score:2, Informative)

      by Anonymous Coward
      I don't think algorithms are typically licensed. Source code is licensed; algorithms are patented.
    • Re: (Score:3, Informative)

      by vertigoCiel (1070374)
      There's no indication in the article that any code or algorithms will be released. They're just talking about it on a very broad, conceptual level. The headline and summary are quite misleading.
  • by nweis (1095487) on Saturday May 24, 2008 @08:56AM (#23527300)
    // Disclosed code snippet from
    // Google search algorithm

    // Walk the result list (valid indices are 0 .. numResults - 1)
    for (int i = 0; i < numResults; i++)
    {
        if (results[i].good)
        {
            show(results[i]);
        }
    }

    // ...
  • Accordingly, we must still consider PageRank important, because it is the only part of the algorithm that we know and that we know how to raise. This is for all those who thought PageRank no longer mattered for positioning in search engines.
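For readers who want the "part we know" spelled out: the publicly described idea behind PageRank can be sketched as a power iteration over a link graph. This is only a toy illustration, not whatever Google actually runs. The function name `pagerank`, the `toy_web` graph, the damping value of 0.85, and the fixed iteration count are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outgoing links."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked from three pages, "d" from none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph the page with the most incoming links ends up on top and the page nobody links to ends up last, which is the "how to raise it" part: get linked from pages that themselves have rank.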
  • by gary_7vn (1193821) on Saturday May 24, 2008 @09:29AM (#23527534) Homepage
    I have a terrible admission to make. I, among other things, design websites. Yet when I search for myself on the google, I don't come up. I use relevant terms that are all over my site and in the metadata (although I understand that doesn't really matter anymore), yet my own personal site does not come up, even though the URL has been up and running for 8 years. The final straw was when I did a search for web design, Ottawa, and a newly opened competitor (just around the corner, actually) came up on the second page. I spent the last couple of days researching this (again), and I seem to be meeting all of Google's requirements. I have never used a sleazy SEO company, and my content is consistent and legal. What's up with that?
    • by Nirvelli (851945)
      Do you host your own site?
      If not, maybe your host has their robots.txt set to block searching?
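A quick way to test the robots.txt theory is to feed the file's text to Python's standard `urllib.robotparser` and ask whether a crawler may fetch a page. This is a rough sketch: the helper name `is_blocked`, the two sample rule sets, and the example.com URL are made up for illustration, and a real check would use the file actually served by the host.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, url):
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that shuts out every crawler.
blocking_rules = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that only hides one directory.
permissive_rules = """\
User-agent: *
Disallow: /private/
"""

print(is_blocked(blocking_rules, "Googlebot", "http://example.com/index.html"))    # True
print(is_blocked(permissive_rules, "Googlebot", "http://example.com/index.html"))  # False
```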
      • by gary_7vn (1193821)
        No, my site is hosted by blacksun.ca. Other sites that I have done come up on a keyword search just fine, which is part of the reason why I am mystified. Or googled or something like that. Thanks for the suggestion!
    • The first two results for +ottawa +web +design are:
      - Atomic Motion - Ottawa Web Design and Development
      - Envision Online - Ottawa Web Designers Ottawa Web Site Designs
      Your site appears to be eyestir:
      - EyeStir Visual Communications, Digital Signage, PowerPoint
      Searching for +Visual +Communications +Ottawa, your site doesn't show up until the 12th page, but searching for +digital +signage +canada, it's the third result. I'd guess Google is ranking these results according to the page title. Try adding web
      • by gary_7vn (1193821)
        Thanks, I think that is the key: I have to load up on the correct search terms. I just "localized" on Google Web last night, so adding Canada really helps. Prior to that, the results were even worse. It's funny, but I have a picture of myself and my cat Fritz with a UFO in the background; Fritz sees it because I am looking at the camera. The text on the picture reads "I want to believe". The JPEG file has that name too, and because of the new X-Files movie, I am getting lots of hits from that stri
    • by popra (879835)
      1. Make sure that Google didn't label you as a "spam" site at some point; http://www.google.com/webmasters/ [google.com] is a good starting point for learning Google's view of your site's health

      2. Make sure that navigation in your site makes sense from the perspective of Google's bot. Map categories/subcategories in your site to folders in the URL of your site. URLs of your site should be pretty, contain relevant words, and be relatively short, i.e. http://example.com/webdesign/logo/price-quote-for-logo-design.html [example.com] rather than
      • by gary_7vn (1193821)
        Thank you. Excellent advice. I will look at this for sure. I am already doing some of the things you suggest, will try the rest.
  • I have often noticed that I search for a word and get pages that only contain synonyms (or variations on the word). Likewise for the handling of accents: search for resumé and you'll find pages with resume.
    • by hey! (33014)
      Well, they're probably using some kind of hash-based document fingerprinting anyway. Ignoring low-entropy characteristics of a word when calculating the fingerprint makes sense, because you can always go back and take them into account once you've eliminated 99.999999999% of the documents on the Internet.

      Nice nick, by the way.
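The resumé/resume behaviour described above can be imitated with a two-step lookup: match on an accent-folded key first, then bring the exact spelling back in when ordering the survivors. This is only a guess at the general shape, not Google's method. The names `fold_accents` and `search` and the tiny word list are assumptions for the sketch; the folding itself uses the standard `unicodedata` module.

```python
import unicodedata


def fold_accents(text):
    """Drop combining marks and case, so 'resumé' and 'resume' share one key."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def search(words, query):
    """Coarse match on the folded key, then list exact spellings first."""
    key = fold_accents(query)
    candidates = [word for word in words if fold_accents(word) == key]
    # sorted() is stable, so accent variants keep their original relative order.
    return sorted(candidates, key=lambda word: word != query)


# Hypothetical mini "index" of words seen on pages.
index = ["resume", "résumé", "resumé", "result"]
print(search(index, "resumé"))  # ['resumé', 'resume', 'résumé']
```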
  • "In many ways, Google is much more proprietary than Microsoft is, and they actually used open source software to get there. So unlike Microsoft, which started off proprietary and has gradually been opening its stuff up, Google starts off getting other people's open stuff, turns it proprietary and then makes money off it. It kind of redefines 'pirate.' I think Google is feeling a little bit of the heat because people are starting to focus on that a bit."

    While I'm pretty sure Google wouldn't be so ignorant as to violate open source licenses for the code they utilize, is there any claim to his "pirate" label, or is he just trying to be inflammatory?

    • by flooey (695860)

      While I'm pretty sure Google wouldn't be so ignorant as to violate open source licenses for the code they utilize, is there any claim to his "pirate" label, or is he just trying to be inflammatory?

      I think what he's saying is that he thinks Google violates the spirit of licenses (particularly the GPL), even though they follow all the requirements of them. Some people get upset that the Internet makes it so that you can separate the using of software from the running of it (whereas in non-networked environments, those are equivalent), and all the obligations in the licenses are stated in terms of people who run the software, so companies like Google can modify software to their heart's content and ne

  • Handling diacritics can sometimes be involved. As an example, consider the o-umlaut (ö). In German, this is the usual letter "o" with a diacritical mark. In Swedish, the same glyph is a separate letter of the alphabet—and comes after the letter "z" in the standard ordering.
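To make the ordering difference concrete, here is a small sketch with two sort keys. The Swedish key places å, ä, ö after z, as described above. The German key follows one common convention of sorting umlauted vowels like their base letters (other German conventions exist). The names `swedish_key` and `german_key` and the three-word list are assumptions for the example, and real applications would normally use a proper collation library instead.

```python
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_ORDER = {letter: index for index, letter in enumerate(SWEDISH_ALPHABET)}
GERMAN_FOLD = str.maketrans("äöü", "aou")


def swedish_key(word):
    """Sort key that places å, ä, ö after z; unknown characters sort last."""
    return [SWEDISH_ORDER.get(ch, len(SWEDISH_ALPHABET)) for ch in word.lower()]


def german_key(word):
    """Sort key that treats umlauted vowels like their base letters."""
    return word.lower().translate(GERMAN_FOLD)


words = ["zebra", "öl", "ost"]
print(sorted(words, key=swedish_key))  # ['ost', 'zebra', 'öl']
print(sorted(words, key=german_key))   # ['öl', 'ost', 'zebra']
```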

    English writers often omit the diacritical mark (they also sometimes transliterate "ö" as "oe", at least for German). Playing around with Google (via google.com, rather than google.de or google.se), it seems that
  • Ironically, in its attempt to open up a little, the Google blog is blocked by the GFW of China...
  • "...PageRank, an algorithm developed by Larry Page and Sergey Brin..."

    Not true.

    PageRank was invented by Page (note the name), according to the patent. If the patent is incorrect on that, then the patent is invalid.

