Cracking the Google Code... Under the GoogleScope
jglazer75 writes "From the analysis of the code behind Google's patents: "Google's sweeping changes confirm the search giant has launched an all-out assault against artificial link inflation & declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out ... guess again. ... In addition to evaluating and scoring web page content, the ranking of web pages is admittedly still influenced by the frequency of page or site updates. What's new and interesting is what Google takes into account in determining the freshness of a web page.""
in case of slashdotting, article text (Score:5, Informative)
Google's US Patent confirms information retrieval is based on historical data.
Publication Date: 5/8/2005 9:51:18 PM
Author Name: Lawrence Deon
An Introduction:
Google's sweeping changes confirm the search giant has launched an all-out assault against artificial link inflation & declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out... guess again.
Google has raised the bar against search engine spam and artificial link inflation to unrivaled heights with the filing of a United States Patent Application 20050071741 on March 31, 2005.
The filing unquestionably provides SEOs with valuable insight into Google's tightly guarded search intelligence and confirms that Google's information retrieval is based on historical data.
What exactly do these changes mean to you?
Your credibility and reputation online are going under the Googlescope! Google defines its patent abstract as follows:
"A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data."
Google's patent specification reveals a significant amount of information both old and new about the possible ways Google can (and likely does) use your web page updates to determine the ranking of your site in the SERPs.
Unfortunately, the patent filing does not prioritize or conclusively confirm any specific method one way or the other.
Here's how Google scores your web pages.
In addition to evaluating and scoring web page content, the ranking of web pages is admittedly still influenced by the frequency of page or site updates.
What's new and interesting is what Google takes into account in determining the freshness of a web page.
For example, if a stale page continues to procure incoming links, it will still be considered fresh, even if its Last-Modified header (which records when the file was most recently changed) hasn't changed and the content itself has not been updated.
According to the patent filing, Google records and scores the following web page changes to determine freshness:
The frequency of all web page changes
The actual amount of the change itself, i.e., whether it is a substantial change or merely redundant or superfluous
Changes in keyword distribution or density
The actual number of new web pages that link to a web page
The change or update of anchor text (the text that is used to link to a web page)
The number of new links to low-trust web sites (for example, a domain may be considered low trust for having too many affiliate links on one web page).
Although no specific number of links is indicated in the patent, it may be advisable to limit affiliate links on new web pages. Caution should also be used in linking to pages with multiple affiliate links.
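As a rough illustration only, here is how such a combined freshness score might be sketched. Every signal name and weight below is invented for the example; the patent discloses no actual formula, and Google's real scoring is unknown.

```python
# Hypothetical sketch of combining the change signals listed above into a
# single freshness score. All weights and signal names are made up; the
# patent does not disclose any formula.

def freshness_score(signals: dict) -> float:
    """Combine hypothetical history signals (each scaled 0.0-1.0) into one score."""
    weights = {
        "update_frequency": 0.25,     # how often the page changes
        "change_magnitude": 0.20,     # substantial vs. superfluous edits
        "keyword_drift": 0.10,        # shifts in keyword distribution/density
        "new_inbound_links": 0.25,    # new pages linking in
        "anchor_text_changes": 0.10,  # updated anchor text pointing here
        "low_trust_links": -0.10,     # penalty: links to low-trust sites
    }
    return sum(weights[k] * signals.get(k, 0.0) for k in weights)

# A page that updates often and keeps attracting links, but links out to
# some low-trust pages:
page = {
    "update_frequency": 0.8,
    "new_inbound_links": 0.6,
    "low_trust_links": 0.5,
}
score = freshness_score(page)
```

The negative weight on low-trust links mirrors the caveat above: the same mechanism that rewards fresh inbound links can penalize outbound links to low-trust domains.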
Developing your web pages for page freshness.
Now I'm not suggesting that it's always beneficial or advisable to change the content of your web pages regularly, but it is very important to keep your pages fresh, and that may not necessarily mean a content change.
Google states that decayed or stale results might be desirable for information that doesn't necessarily need updating, while fresh content is good for results that require it.
How do you unravel that statement and differentiate between the two types of content?
An excellent example of this methodology is the roller coaster ride seasonal results might experience in Google's SERPs based on the actual season of the year.
A page related to winter clothing, for example, may rank differently in winter than in summer.
Six weeks to fix? (Score:2, Informative)
If this claim is true, I guess we'll have to wait the typical "four to six weeks for delivery."
effect on search engine optimizers (Score:5, Informative)
Article text and Google cache link (Score:3, Informative)
Google United - Google Patent Examined
Publication Date: 4/7/2005 7:41:24 AM
By Jim Hedger, StepForth News Editor, StepForth Placement Inc.
Thoughts on Google's patent... "Information retrieval based on historical data."
Google's newest patent application is lengthy. It is interesting in some places and enigmatic in others. Less colourful than most end user license agreements, the patent covers an enormous range of ranking analysis techniques Google wants to ensure are kept under its control. Some of the ideas and concepts covered in the document are almost certainly worked into the current algorithm running Google. Some are being worked in as this article is being written. Some may never see the blue light of electrons but are pretty good ideas, so it might have been considered wise to patent them. Google's not saying which is which.

While not exactly War and Peace, it's a pretty complex document that gives readers a glimpse inside the minds of Google engineers. What it doesn't give is a 100% clear overview of how Google operates now or of how the various ideas covered in the patent application will be integrated into Google's algorithms. One interesting section seems to confirm what SEOs have been saying for almost a year: Google does have a "sandbox" where it stores new links or sites for about a month before evaluation.
Google is in the midst of sweeping changes to the way it operates as a search engine. As a matter of fact, it isn't really a search engine in the fine sense of the word anymore. It isn't really a portal either. It is more of an institution, the ultimate private-public partnership. Calling itself a media-company, Google is now a multi-faceted information and multi-media delivery system that is accessed primarily through its well-known interface found at www.google.com.
Google is known for its from-the-hip style of innovation. While the face is familiar, the brains behind it are growing and changing rapidly. Four major factors (technology, revenue, user demand and competition) influence and drive these changes. Where Microsoft dithers and .dll's over its software for years before introduction, Google encourages its staff to spend up to 20% of their time tripping their way up the stairs of invention. Sometimes they produce ideas that didn't work out as they expected, as was the case with Orkut, and sometimes they produce spectacular results as with Google News. The sum total of what works and what doesn't work has served to inform Google what its users want in a search engine. After all, where the users go, the advertising dollars must follow. Such is the way of the Internet.
In its recent SEC filing, the first it has produced since going public in August 2004, Google said it was going to spend a lot of money to continue outpacing its rivals. This year they figure they will spend about $500 million to develop or enhance newer technologies. In 2004 and 2003, Google spent $319 million and $177 million respectively. The increase in innovation-spending corresponds with a doubling of Google's staff headcount which has jumped from 1628 employees in 2003 to 3021 by the end of 2004.
Over the past five years Google has produced a number of features that have proven popular enough to be included among its public-search offerings. On their front page, these features include Image Search, Google Groups, Google News, Froogle, Google Local, and Google Desktop. There are dozens of other features which can be accessed by cli
Old Story (Score:1, Informative)
From the article: GOOGLE has plans that will dramatically improve the results of internet news searches, by ranking them according to quality rather than simply by their date and relevance to search terms. The ambitious system is revealed by patents filed in the US and around the world (WO 2005/029368) by researchers based at the company's headquarters in Mountain View, California.
Re:Unintended side effects of the Google arms race (Score:3, Informative)
Sometimes search engine optimization isn't about making a hack site rank well. Sometimes it is about getting the traffic that a really nifty site deserves.
In fact, I wish all the legit sites did everything they should morally do in terms of SEO. Then the spam sites wouldn't have such an easy time pushing them out of the way.
From a business perspective, money spent on making non-spammy search engine optimizations can be much more effective than money spent on marketing or public relations.
--
Scientific calculator with hex, octal, decimal, and binary [ostermiller.org]
Re:SEO (Score:3, Informative)
Web page "freshness?" A good thing... (Score:3, Informative)
The site is mostly static but is rich with cultural value. It's currently the number one hit on Google. I'm hoping that Google's emphasis on "freshness" won't make his site fall in ranking.
Re:Great (Score:3, Informative)
Re:Unintended side effects of the Google arms race (Score:1, Informative)
I'm a former SEO guy... I've worked with many companies, large and small, to optimize their websites. I've done everything from online pharmaceuticals to Christian mission trips. I've tried every trick in the book over a number of years... and I can tell you that as long as search engines exist, I seriously doubt the SEO companies will disappear.
Why? Simply put, people (and companies) do NOT understand how to present their content in a way that an automated bot can read and rank them.
Towards the end of my SEO consulting days, my advice was over and over again: Content is king. Build a good website, with good content, and make sure to include all the necessary elements to identify it as a good website.
Usually, that meant that I would go through their site, fill in missing pieces, and recommend additional content. No schemes, no crazy link deals.
So, keep in mind not ALL SEO guys are bad...just most of them...but companies will always need SEO guys to come in and fill in their site's holes.
Re:Google's Click History Asset (Score:4, Informative)
You sure about that? Try copying and pasting a Google results link.
For example, let's search Google for "elluusive" [google.com]. The first result was your slashdot "homepage", at http://slashdot.org/~eluusive [slashdot.org], which at first glance seems to be a direct link. But if you right-click on the link, copy it, and paste it somewhere, you'll find something along these lines:
http://www.google.com/url?sa=U&start=1&q=http%3A/
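The wrapper URL quoted above is truncated, but the pattern is easy to demonstrate with a reconstructed (not actual) example: the real destination rides along, URL-encoded, in the `q` parameter of the `google.com/url` wrapper, and the standard library can recover it.

```python
from urllib.parse import urlparse, parse_qs

# Reconstructed example of a google.com/url wrapper link; the actual URL
# in the comment above is cut off, so this one is made up to match its shape.
wrapped = "http://www.google.com/url?sa=U&start=1&q=http%3A%2F%2Fslashdot.org%2F%7Eeluusive"

# parse_qs decodes the %XX escapes, so the destination comes back as a
# plain URL.
params = parse_qs(urlparse(wrapped).query)
destination = params["q"][0]
```

So the "direct" result link actually bounces through Google first, which is what makes the click loggable.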
Re:Google's Click History Asset (Score:5, Informative)
Each link in the search results on Google has an onmousedown event handler attached.
If you have JavaScript enabled and click on a result, your browser will also execute the JavaScript, which sends a GET request to Google. They do log each link you click on.
Check the source of any Google search page.
The function that gets called for each onmousedown is called clk().
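Google's actual clk() handler is JavaScript and isn't reproduced in the comment above. As a sketch of what such a tracking handler effectively does, here is the equivalent URL construction; the endpoint and parameter names are modeled on the /url links seen in search results, not taken from Google's code.

```python
from urllib.parse import urlencode

# Illustrative only: on mousedown, a handler like clk() fires a GET request
# back to the search engine carrying the clicked URL and its result position,
# so the click can be logged before the browser follows the real link.
def build_click_log_url(clicked_url: str, position: int) -> str:
    query = urlencode({"sa": "U", "start": position, "q": clicked_url})
    return "http://www.google.com/url?" + query

log_url = build_click_log_url("http://slashdot.org/~eluusive", 1)
```

Because the request fires on mousedown rather than on navigation, the log entry is recorded even if the user copies the link instead of following it.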
Re:Frequency of changes (Score:3, Informative)
Puh-leeeeze! That trick became ineffective last century. It's very easy for a search engine to check background colors and FONT tags and penalize pages that use text too close to the background color.
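A naive version of that check is easy to sketch. The distance metric, hex color format, and threshold below are arbitrary choices for illustration, not anything a real engine is known to use.

```python
# Naive hidden-text check: flag text whose color is nearly identical to the
# background color. The threshold is an arbitrary illustrative value.

def hex_to_rgb(color: str) -> tuple:
    """Convert a "#rrggbb" hex color to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

def is_hidden_text(text_color: str, background_color: str, threshold: int = 48) -> bool:
    """True when text and background colors are too close to be readable."""
    fg, bg = hex_to_rgb(text_color), hex_to_rgb(background_color)
    distance = sum(abs(a - b) for a, b in zip(fg, bg))  # Manhattan distance in RGB
    return distance < threshold

is_hidden_text("#fefefe", "#ffffff")  # near-white text on a white background
```

A real detector would also have to handle CSS stylesheets, background images, and off-screen positioning, which is why pure color tricks stopped working so long ago.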
Re:SEO (Score:4, Informative)
Doesn't work in slashdot because:
Re:On the minds of all slashdotters, (Score:4, Informative)
Have you already seen DOMAI [domai.com]? (NSFW)