Google Opens Up (Some) Search Algorithms
overmars writes "After years of closely guarding the formula for its search algorithms, Google is opening up a little.
The search engine company has kept its search formula a closely guarded secret for two reasons: competition and to prevent abuse, said Udi Manber, Google's vice president of engineering, search quality, in a post on the corporate blog. Manber said the blog post is the first part of a renewed effort at the company 'to open up a bit more than we have in the past.'
Manber said the most famous part of Google's ranking algorithm is PageRank, an algorithm developed by Google cofounders Larry Page and Sergey Brin. While PageRank is still in use, it is a 'part of a much larger system,' he said.
'Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it's not just the language, it's how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing),' he said."
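For reference, the textbook form of PageRank is a power iteration: each page repeatedly passes a damped share of its score along its out-links. The sketch below is only that classic formulation, not Google's current ranking code, which Manber describes as part of a much larger system. The damping factor of 0.85, the iteration count, and the `pageRank` name are conventional assumptions, not figures from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook PageRank by power iteration: a sketch, not Google's ranking code.
// links[i] holds the indices of the pages that page i links to.
std::vector<double> pageRank(const std::vector<std::vector<int>>& links,
                             double damping = 0.85, int iterations = 100) {
    assert(damping >= 0.0 && damping <= 1.0);
    const std::size_t n = links.size();
    if (n == 0) return {};
    std::vector<double> rank(n, 1.0 / n);  // start from a uniform distribution
    for (int it = 0; it < iterations; ++it) {
        // Every page gets the "random surfer jumps anywhere" share.
        std::vector<double> next(n, (1.0 - damping) / n);
        for (std::size_t i = 0; i < n; ++i) {
            if (links[i].empty()) {
                // Dangling page with no out-links: spread its rank over all pages.
                for (std::size_t j = 0; j < n; ++j) next[j] += damping * rank[i] / n;
            } else {
                // Otherwise split its rank evenly across its out-links.
                for (int target : links[i])
                    next[target] += damping * rank[i] / links[i].size();
            }
        }
        rank = next;
    }
    return rank;
}
```

For example, with the hypothetical graph `{{1, 2}, {2}, {0}, {2}}`, page 2 (linked from three pages) ends up ranked highest, while page 3 (linked from nowhere) keeps only the baseline (1 - 0.85) / 4 = 0.0375.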
Don't do it, Google! (Score:1, Interesting)
Re: (Score:3, Insightful)
In reality, I'm sure Google's leadership has done some heavy analysis on exactly how much openness benefits them.
Re: (Score:2, Interesting)
and
The search engine company has kept its search formula a closely guarded secret for two reasons: competition and to prevent abuse
Security through obscurity isn't a good plan, and Google knows that.
Re:Don't do it, Google! (Score:5, Insightful)
I took one course in Information Retrieval, and I could come up with most of these things with an evening or two of brainstorming, at least on a general level like this. Ideas like PageRank gave Google the edge in the early days, but now their advantage lies in other areas. They have a stunning amount of capital tied up in hardware, giving them amazing speed and enormous amounts of data. They have code optimized to handle that much data in reasonable time. And they have the experience to take simple probability models like the ones described in the article and make them work at that scale.
This is why it's impossible to beat Google at search and other data-based markets. It's not one simple patented idea anymore. If it were just that, Google would've disappeared years ago. The only way to beat the points described above is to have the capital to buy the hardware and the knowledge to match Google. Microsoft can do that, but Google has one other thing that Microsoft doesn't: they understand their developers. They understand that if you give these kinds of scientist/developers an interesting problem, a fantastic dataset, and the freedom to attack it in their own way, you barely even have to pay them anymore. Their interest will take over and completely fuel the project. They will work overtime and come in on weekends without being asked.
That brings an energy to a project, and to a company, that you can never get through any tactic Microsoft is likely to employ. I admit I don't know precisely what Microsoft is like on the inside, but I simply cannot conceive of them as a company that understands the joy of programming, or the joy of science (which is a huge part of information retrieval). In any case, one blog post with some sketchy details isn't going to tell Microsoft anything they don't already know.
Re: (Score:1)
Oh, you mean the ones that have been fired over the contents of their blogs? [infoworld.com]
Not him, but I can see the article you linked as supporting evidence for my point. ;)
/certainly not angry,
Anyway, here's his response. [michaelhanscom.com]
Re: (Score:2)
I'll be joining Microsoft full-time this summer, if that says anything.
Re: (Score:2, Interesting)
I took one course in Information Retrieval, and I could come up with most of these things with an evening or two of brainstorming, at least on a general level like this.
Coming up with things is easy; implementing them is hard. Any Joe Sixpack can come up with the idea of a flying car in five seconds, but actually building one is another matter entirely, and doing so in a commercially viable way is yet another.
Remember what Edison said about inspiration and perspiration?
Re:Don't do it, Google! (Score:4, Funny)
I took one course in Information Retrieval, and I could come up with most of these things with an evening or two of brainstorming, at least on a general level like this.
What exactly is open? (Score:5, Insightful)
What, exactly, has Google opened up? As far as I can see from TFA, everything is explained only at a very general level, with no detail whatsoever. I can't see Google's competition gaining any significant benefit from this.
Re: (Score:2, Interesting)
While Rob Enderle puts the matter trollishly, I agree with the thrust of what he says. Google has been given a free pass on this. Their main product/service is definitely not open source or free software, and in fact is less open than most of Microsoft's products (for example). At least with Windows and
To be fair, he's a VP (Score:2)
No, they _used to be_ engineers (Score:2)
Re: (Score:2)
From TFA...
Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it's not just the language, it's how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing).
Obviously, this usage of the word "open" is not related to open source software. It's more like he is willing to talk about it at all.
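Manber lists spelling mistakes among the things the language models handle but gives no detail on how. One classic building block, offered purely as an illustration and not as anything Google has described, is edit distance: suggest the known word that needs the fewest single-character insertions, deletions, or substitutions. The `editDistance` and `suggest` names and the tiny vocabulary are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: minimum number of single-character insertions,
// deletions, and substitutions turning a into b (two-row dynamic programming).
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;  // "" -> first j chars of b
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;  // first i chars of a -> ""
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);  // prev now holds row i
    }
    return prev[b.size()];
}

// Hypothetical "did you mean": the vocabulary word closest to the query.
std::string suggest(const std::string& query, const std::vector<std::string>& vocabulary) {
    assert(!vocabulary.empty());
    std::string best = vocabulary.front();
    std::size_t bestDist = editDistance(query, best);
    for (const std::string& word : vocabulary) {
        const std::size_t d = editDistance(query, word);
        if (d < bestDist) {  // strict <, so ties keep the earlier word
            bestDist = d;
            best = word;
        }
    }
    return best;
}
```

With the vocabulary {"search", "seared", "starch"}, `suggest("serach", ...)` returns "search", which is two substitutions away. A real system would also weigh how common each candidate is, which this sketch ignores.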
Deus Ex, anyone? (Score:1)
User: What else do you know about me?
Pagerank: Everything that can be known.
User: How about a report on yourself?
Pagerank: I was a prototype for Echelon IV. My instructions are to amuse visitors with information about their websites.
User: I don't see anything amusing about spying on people.
Pagerank: Human beings feel pleasure when they are watched. I have recorded their smiles as I tell them who they are.
User: Some people just don't understand the dangers of
License? (Score:2, Interesting)
At least that's what I think.
The secret ingredient... (Score:5, Funny)
// Google search algorithm
for (int i=0; i <= numResults; i++)
{
    if (results[i].good)
    {
        show(results[i]);
    }
}
//
Re:Your indexing is wrong (Score:1)
for (int i=0; i <= numResults; i++)
should be
for (int i=0; i < numResults; i++)
-Buffer Overflow Nazi
Re: (Score:1)
for (int i=0; i <= numResults; i++)
should be
for (int i=0; i < numResults; i++)
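The correction is the classic off-by-one: an array with numResults elements has valid indices 0 through numResults - 1, so with `<=` the last pass reads `results[numResults]`, one past the end. Below is a compilable version of the joke with the bound fixed, assuming a hypothetical `Result` struct with a `good` flag and collecting results into a vector in place of the undefined `show()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the joke only assumes a "good" flag.
struct Result {
    std::string url;
    bool good;
};

// The joke "algorithm" with the bound fixed: valid indices are
// 0 .. results.size() - 1, so the loop test is <, not <=.
std::vector<Result> goodResults(const std::vector<Result>& results) {
    std::vector<Result> shown;
    for (std::size_t i = 0; i < results.size(); ++i) {
        // Holds on every pass; with the original <= it would fire on the last one.
        assert(i < results.size());
        if (results[i].good) {
            shown.push_back(results[i]);  // stands in for show(results[i])
        }
    }
    return shown;
}
```

Taking the bound from `results.size()` rather than a separate `numResults` also removes the chance of the two drifting apart; a range-based `for (const Result& r : results)` avoids the index entirely.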
consider the Pagerank important (Score:2, Interesting)
Mystified by 'the google' (Score:4, Interesting)
Re: (Score:1)
If not, maybe your host has their robots.txt set to block searching?
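Whether robots.txt blocks a crawler comes down to matching the requested path against the `Disallow` rules in the group that applies to that crawler. Here is a deliberately simplified sketch, assuming only the `User-agent: *` group matters and ignoring `Allow` lines, wildcards, and the longest-match precedence real crawlers apply; `blockedForAllAgents` is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Strip leading and trailing spaces, tabs, and carriage returns.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Simplified check: does a "User-agent: *" group disallow this path?
// Ignores Allow lines, wildcards, case-insensitive field names, trailing
// # comments, and agent-specific groups; rules are plain path prefixes.
bool blockedForAllAgents(const std::string& robotsTxt, const std::string& path) {
    assert(!path.empty() && path[0] == '/');  // paths are expected to start with '/'
    std::istringstream in(robotsTxt);
    std::string line;
    bool inStarGroup = false;   // inside a group that applies to "*"?
    bool lastWasAgent = false;  // consecutive User-agent lines share one group
    while (std::getline(in, line)) {
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;  // blank or malformed line
        const std::string field = trim(line.substr(0, colon));
        const std::string value = trim(line.substr(colon + 1));
        if (field == "User-agent") {
            inStarGroup = (lastWasAgent && inStarGroup) || value == "*";
            lastWasAgent = true;
        } else {
            lastWasAgent = false;
            // An empty Disallow value means "nothing is disallowed".
            if (inStarGroup && field == "Disallow" && !value.empty() &&
                path.compare(0, value.size(), value) == 0) {
                return true;
            }
        }
    }
    return false;
}
```

For instance, with `User-agent: *` followed by `Disallow: /private/`, a request for `/private/page.html` is blocked while `/index.html` is not, and an empty `Disallow:` line blocks nothing.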
Re: (Score:1)
But FOCUSED and INFORMATIVE.
All my websites [alliancetec.com] are borderline boring, but nevertheless focused and informative.
Re: (Score:1)
For example, I am currently trying to optimize this website: Farmhouses in Tuscany [lucertola.info]
When I first saw the original, I had to completely rewrite it from scratch. It took me over 3 months of research to come up with better text and structure, and the site is still not 100% finished.
It is not going to be easy.
Lots of energy goes into SEO.
The most important thing to remember is play by the rules - but play very hard.
Re: (Score:1)
FRONTPAGE.
It kills any chance of your site doing well.
Hand-code your site.
Re: (Score:1)
2. Make sure that navigation in your site makes sense from the perspective of Google's bot. Map categories/subcategories in your site to folders in your site's URLs. URLs should be pretty, contain relevant words, and be relatively short, e.g. http://example.com/webdesign/logo/price-quote-for-logo-design.html [example.com] rather than
Much can be determined by using google (Score:2)
Re: (Score:2)
Nice nick, by the way.
Making Open Source proprietary? (Score:1)
In many ways, Google is much more proprietary than Microsoft is, and they actually used open source software to get there. So unlike Microsoft, which started off proprietary and has gradually been opening its stuff up, Google starts off getting other people's open stuff, turns it proprietary and then makes money off it. It kind of redefines 'pirate.' I think Google is feeling a little bit of the heat because people are starting to focus on that a bit."
While I'm pretty sure Google wouldn't be so ignorant as to violate the open source licenses of the code they use, is there any basis for his "pirate" label, or is he just trying to be inflammatory?
Re: (Score:2)
While I'm pretty sure Google wouldn't be so ignorant as to violate the open source licenses of the code they use, is there any basis for his "pirate" label, or is he just trying to be inflammatory?
I think what he's saying is that he thinks Google violates the spirit of licenses (particularly the GPL), even though they follow all of their requirements. Some people get upset that the Internet makes it possible to separate using software from running it (whereas in non-networked environments, the two are equivalent), and all the obligations in the licenses are stated in terms of people who run the software, so companies like Google can modify software to their heart's content and ne
Diacritics and language (Score:1)
English writers often omit diacritical marks (they also sometimes transliterate "ö" as "oe", at least for German). Playing around with Google (via google.com, rather than google.de or google.se), it seems that
oh the irony... (Score:1)
Not true... (Score:1)
Not true.
PageRank was invented by Page (note the name), according to the patent. If the patent is incorrect on that, then the patent is invalid.