Cracking the Google Code... Under the GoogleScope

jglazer75 writes "From the analysis of the code behind Google's patents: 'Google's sweeping changes confirm the search giant has launched a full-out assault against artificial link inflation and declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out ... guess again. ... In addition to evaluating and scoring web page content, the ranking of web pages is admittedly still influenced by the frequency of page or site updates. What's new and interesting is what Google takes into account in determining the freshness of a web page.'"

  • by Anonymous Coward on Tuesday May 10, 2005 @12:28PM (#12489464)
    Cracking the Google Code... Under the GoogleScope
    Google's US Patent confirms information retrieval is based on historical data.

    Publication Date: 5/8/2005 9:51:18 PM

    Author Name: Lawrence Deon

    An Introduction: ...if you thought you cracked the Google Code and had Google all figured out ... guess again.

    Google's sweeping changes confirm the search giant has launched a full-out assault against artificial link inflation and declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out ... guess again.

    Google has raised the bar against search engine spam and artificial link inflation to unrivaled heights with the filing of a United States Patent Application 20050071741 on March 31, 2005.

    The filing unquestionably provides SEOs with valuable insight into Google's tightly guarded search intelligence and confirms that Google's information retrieval is based on historical data.

    What exactly do these changes mean to you?
    Your credibility and reputation on-line are going under the Googlescope! Google has defined their patent abstract as follows:

    "A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data."

    Google's patent specification reveals a significant amount of information both old and new about the possible ways Google can (and likely does) use your web page updates to determine the ranking of your site in the SERPs.

    Unfortunately, the patent filing does not prioritize or conclusively confirm any specific method one way or the other.

    Here's how Google scores your web pages.

    In addition to evaluating and scoring web page content, the ranking of web pages is admittedly still influenced by the frequency of page or site updates.
    What's new and interesting is what Google takes into account in determining the freshness of a web page.

    For example, if a stale page continues to procure incoming links, it will still be considered fresh, even if the page's Last-Modified header (which tells when the file was most recently modified) hasn't changed and the content has not been updated.

    According to its patent filing, Google records and scores the following web page changes to determine freshness:
    • The frequency of all web page changes
    • The actual amount of the change itself, that is, whether the change is substantial or merely redundant or superfluous
    • Changes in keyword distribution or density
    • The actual number of new web pages that link to a web page
    • The change or update of anchor text (the text that is used to link to a web page)
    • The number of new links to low-trust web sites (for example, a domain may be considered low trust for having too many affiliate links on one web page)
    Although the patent does not indicate a specific number of links, it might be advisable to limit affiliate links on new web pages. Caution should also be used in linking to pages with multiple affiliate links.
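
    The signals listed above can be pictured as inputs to a single score. The following is only a sketch: the patent application does not publish a formula, so the field names, the weights, and the penalty for low-trust links are all assumptions made for illustration. Each input is expected to be normalized to the range 0 to 1.

```javascript
// Hypothetical freshness score. Signal names and weights are illustrative
// assumptions; the patent application gives no formula or priorities.
const WEIGHTS = {
  changeFrequency: 0.15, // how often the page changes
  changeMagnitude: 0.25, // how substantial the changes are
  keywordShift: 0.1,     // change in keyword distribution or density
  newInboundLinks: 0.4,  // rate of new pages linking to this page
  anchorTextChurn: 0.1,  // change in the anchor text of inbound links
};

// Clamp a possibly missing value into the range [0, 1].
function clamp01(value) {
  return Math.min(1, Math.max(0, Number(value) || 0));
}

function freshnessScore(history) {
  let score = 0;
  for (const [signal, weight] of Object.entries(WEIGHTS)) {
    score += weight * clamp01(history[signal]);
  }
  // New links involving low-trust sites count against the page (assumed penalty).
  const penalty = 0.5 * clamp01(history.newLowTrustLinks);
  return Math.max(0, score - penalty);
}

// A page that never changes but keeps attracting new links.
const stalePageWithLinks = { newInboundLinks: 0.9 };
// A page edited constantly, with trivial changes and no new links.
const churningPage = { changeFrequency: 1, changeMagnitude: 0.1 };

console.log(freshnessScore(stalePageWithLinks) > freshnessScore(churningPage));
```

    With these made-up weights, the unchanged page that keeps gaining links outscores the page that is merely edited often, which matches the article's point that freshness need not mean a content change.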

    Developing your web page augments for page freshness.

    Now, I'm not suggesting that it's always beneficial or advisable to change the content of your web pages regularly, but it is very important to keep your pages fresh, and that may not necessarily mean a content change.

    Google states that decayed or stale results might be desirable for information that doesn't necessarily need updating, while fresh content is good for results that require it.

    How do you unravel that statement and differentiate between the two types of content?

    An excellent example of this methodology is the roller coaster ride seasonal results might experience in Google's SERPs based on the actual season of the year.

    A page related to winter clothin
  • Six weeks to fix? (Score:2, Informative)

    by Anonymous Coward on Tuesday May 10, 2005 @12:35PM (#12489555)
    I use Google quite a bit to check on recent spyware/malware (used it this morning) and, with all due respect, the first few links are typically for spyware products that don't work or for domain parking sites (search engines themselves), so it takes some diligence to get to the "real" sites that have information.

    If this claim is true, I guess we'll have to wait the typical "four to six weeks for delivery."
  • by nemexi ( 786227 ) on Tuesday May 10, 2005 @12:36PM (#12489564)
    One of the most interesting (and obvious) effects of Google's changes: The company which once ranked first for the phrase "search engine optimization", SEOinc, is now nowhere to be found -- even a search for the company's name doesn't bring up the company's website. SEOinc's response has been a somewhat ineffective attempt to get those reporting [outer-court.com] on its fall [battellemedia.com] to "cease and desist".
  • by RealProgrammer ( 723725 ) on Tuesday May 10, 2005 @12:37PM (#12489568) Homepage Journal
    I think this is the same article: google:www.coder.com [64.233.167.104]
    Google United - Google Patent Examined

    Publication Date: 4/7/2005 7:41:24 AM

    By Jim Hedger, StepForth News Editor, StepForth Placement Inc.

    Thoughts on Google's patent... "Information retrieval based on historical data."

    Google's newest patent application is lengthy. It is interesting in some places and enigmatic in others. Less colourful than most end user license agreements, the patent covers an enormous range of ranking analysis techniques Google wants to ensure are kept under their control. Some of the ideas and concepts covered in the document are almost certainly worked into the current algorithm running Google. Some are being worked in as this article is being written. Some may never see the blue-light of electrons but are pretty good ideas so it might have been considered wise to patent them. Google's not saying which is which. While not exactly War and Peace, it's a pretty complex document that gives readers a glimpse inside the minds of Google engineers. What it doesn't give is a 100% clear overview of how Google operates now and how the various ideas covered in the patent application will be integrated into Google's algorithms. One interesting section seems to confirm what SEOs have been saying for almost a year: Google does have a "sandbox" where it stores new links or sites for about a month before evaluation.

    Google is in the midst of sweeping changes to the way it operates as a search engine. As a matter of fact, it isn't really a search engine in the fine sense of the word anymore. It isn't really a portal either. It is more of an institution, the ultimate private-public partnership. Calling itself a media-company, Google is now a multi-faceted information and multi-media delivery system that is accessed primarily through its well-known interface found at www.google.com.

    Google is known for its from-the-hip style of innovation. While the face is familiar, the brains behind it are growing and changing rapidly. Four major factors (technology, revenue, user demand and competition) influence and drive these changes. Where Microsoft dithers and .dll's over its software for years before introduction, Google encourages its staff to spend up to 20% of their time tripping their way up the stairs of invention. Sometimes they produce ideas that didn't work out as they expected, as was the case with Orkut, and sometimes they produce spectacular results as with Google News. The sum total of what works and what doesn't work has served to inform Google what its users want in a search engine. After all, where the users go, the advertising dollars must follow. Such is the way of the Internet.

    In its recent SEC filing, the first it has produced since going public in August 2004, Google said it was going to spend a lot of money to continue outpacing its rivals. This year they figure they will spend about $500 million to develop or enhance newer technologies. In 2004 and 2003, Google spent $319 million and $177 million respectively. The increase in innovation-spending corresponds with a doubling of Google's staff headcount which has jumped from 1628 employees in 2003 to 3021 by the end of 2004.

    Over the past five years Google has produced a number of features that have proven popular enough to be included among its public-search offerings. On their front page, these features include Image Search, Google Groups, Google News, Froogle, Google Local, and Google Desktop. There are dozens of other features which can be accessed by cli

  • Old Story (Score:1, Informative)

    by Anonymous Coward on Tuesday May 10, 2005 @12:40PM (#12489606)
    This story showed up in New Scientist [newscientist.com] before (30 April).

    From the article: GOOGLE has plans that will dramatically improve the results of internet news searches, by ranking them according to quality rather than simply by their date and relevance to search terms. The ambitious system is revealed by patents filed in the US and around the world (WO 2005/029368) by researchers based at the company's headquarters in Mountain View, California.

  • by DeadSea ( 69598 ) * on Tuesday May 10, 2005 @01:11PM (#12489941) Homepage Journal
    How about sites that already provide a useful service and want to get as much exposure as possible? I can't count the number of useful sites I've visited that are not ranked as well on Google as I would like (so I can find them more easily) because they do non-Google-friendly things like:
    • Session IDs in urls
    • Doorway pages
    • Content that expires or changes urls
    • Javascript navigation
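
    Session IDs in URLs are the easiest item on this list to illustrate: every visit produces a different address for the same page. A minimal sketch of the fix, assuming the ID travels as a query parameter; the parameter names below are common conventions chosen for illustration, not a list any search engine publishes, and IDs embedded in the path (such as `;jsessionid=...`) are not handled.

```javascript
// Query parameter names assumed to carry session IDs (illustrative list).
const SESSION_PARAMS = ["phpsessid", "jsessionid", "sid", "sessionid"];

// Return the URL with any session-ID query parameters removed, so that
// every visitor (and every crawler) sees the same address for a page.
function stripSessionId(rawUrl) {
  const url = new URL(rawUrl);
  // Collect the keys first, then delete, so iteration is not disturbed.
  const toDelete = [...url.searchParams.keys()].filter((key) =>
    SESSION_PARAMS.includes(key.toLowerCase())
  );
  for (const key of toDelete) {
    url.searchParams.delete(key);
  }
  return url.toString();
}

console.log(stripSessionId("http://example.com/poems?id=42&PHPSESSID=abc123"));
```

    The example address `example.com/poems` is hypothetical.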

    Sometimes search engine optimization isn't about making a hack site rank well. Sometimes it is about getting the traffic that a really nifty site deserves.

    In fact, I wish all the legit sites did everything they should morally do in terms of SEO. Then the spam sites wouldn't have such an easy time pushing them out of the way.

    From a business perspective, money spent on making non-spammy search engine optimizations can be much more effective than money spent on marketing or public relations.

    --
    Scientific calculator with hex, octal, decimal, and binary [ostermiller.org]

  • Re:SEO (Score:3, Informative)

    by Intron ( 870560 ) on Tuesday May 10, 2005 @01:17PM (#12489999)
    There's a whole range. Some will tell you how to rewrite your web page so that search engines will classify it better. That seems legit. Others will try to sell you on "link farms" and other hacks to improve your ratings - not so legit. I've also seen spamming websites that have google-accessible logs with fake referrers, or spamming blogs like /. with links in your sig [place link here].
  • by Eric Damron ( 553630 ) on Tuesday May 10, 2005 @01:23PM (#12490068)
    There seems to be a lot of weight put on web page freshness. I host a friend's site containing the collection of poems by Ella Wheeler Wilcox. She lived in the 1800s so one cannot expect to see any new material from her.

    The site is mostly static but is rich with cultural value. It's currently the number one hit on Google. I'm hoping that Google's emphasis on "freshness" won't make his site fall in ranking.
  • by Anonymous Coward on Tuesday May 10, 2005 @02:16PM (#12490688)
    Posting as AC to avoid the inevitable karma hit so here goes...

    I'm a former SEO guy...I've worked with many companies, large and small, to optimize their websites. I've done everything from online pharmaceuticals to christian mission trips. I've tried every trick in the book over a number of years...and I can tell you that as long as search engines exist, I seriously doubt the SEO companies will disappear.

    Why? Simply put, people (and companies) do NOT understand how to present their content in a way that an automated bot can read and rank them.

    Towards the end of my SEO consulting days, my advice was over and over again: Content is king. Build a good website, with good content, and make sure to include all the necessary elements to identify it as a good website.

    Usually, that meant that I would go through their site, fill in missing pieces and recommend additional content. No schemes, no crazy link deals ... just a good, search engine friendly site.

    So, keep in mind not ALL SEO guys are bad...just most of them...but companies will always need SEO guys to come in and fill in their site's holes.
  • by daeley ( 126313 ) on Tuesday May 10, 2005 @02:39PM (#12490950) Homepage
    "In case you hadn't noticed, Google links are direct."

    You sure about that? Try copying and pasting a Google results link.

    For example, let's search Google for "elluusive" [google.com]. The first result was your slashdot "homepage", at http://slashdot.org/~eluusive [slashdot.org], which at first glance seems to be a direct link. But if you right-click on the link, copy it, and paste it somewhere, you'll find something along these lines:

    http://www.google.com/url?sa=U&start=1&q=http%3A//slashdot.org/~eluusive&ei=A_-AQubaOq2gYNujqccO [google.com]
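
    The real destination is carried in the `q` parameter of that redirect link. A small sketch of how one might recover it, using the standard `URL` class; the function name is made up for this example, and it assumes the destination is always in `q`, which is all the pasted link shows.

```javascript
// Pull the destination out of a redirect-style link of the form
// http://www.google.com/url?...&q=<destination>&...
function targetOfRedirect(redirectUrl) {
  const url = new URL(redirectUrl);
  // searchParams.get() percent-decodes the value, so %3A becomes ":".
  // It returns null when there is no q parameter.
  return url.searchParams.get("q");
}

const copiedLink =
  "http://www.google.com/url?sa=U&start=1&q=http%3A//slashdot.org/~eluusive&ei=A_-AQubaOq2gYNujqccO";

console.log(targetOfRedirect(copiedLink)); // http://slashdot.org/~eluusive
```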
  • by Fëanáro ( 130986 ) on Tuesday May 10, 2005 @02:59PM (#12491211)
    with onmousedown events.

    Each link in the search results on Google has an onmousedown event attached.

    If you have JavaScript enabled and click on a link, your browser will also execute the JavaScript, which sends a GET request to Google. They do log each link you click on.

    Check the source of any Google search page.
    The function that gets called for each onmousedown is called clk():
    function clk(el,ct,cd){if(document.images){(new Image()).src="/url?sa=T&ct="+escape(ct)+"&cd="+escape(cd)+"&url="+escape(el.href).replace(/\+/g,"%2B")+"&ei=gwKBQoX7GJKmQcONmN4B";}return true;}
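
    To make the mechanism easier to follow, here is a readable sketch of the same idea, split so that the URL-building part runs outside a browser. It is a reconstruction for illustration, not Google's code: `beaconUrl` is a made-up name, `encodeURIComponent` stands in for the deprecated `escape()`, and the meaning of `ct` and `cd` is not stated in the thread, so they are simply passed through.

```javascript
// Build the URL that a click beacon like clk() would request.
// Unlike escape(), encodeURIComponent already encodes "+" as %2B,
// so the extra replace() in the original is not needed here.
function beaconUrl(href, ct, cd, ei) {
  return "/url?sa=T" +
    "&ct=" + encodeURIComponent(ct) +
    "&cd=" + encodeURIComponent(cd) +
    "&url=" + encodeURIComponent(href) +
    "&ei=" + encodeURIComponent(ei);
}

// In a browser, assigning the URL to an Image's src fires a GET request
// without navigating away from the page:
//   new Image().src = beaconUrl(link.href, ct, cd, ei);

console.log(beaconUrl("http://slashdot.org/~eluusive", "res", 1, "abc"));
```

    Because the handler returns true in the original, the click still follows the link normally; the image request is a side effect the user never sees.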
  • by Tsu Dho Nimh ( 663417 ) <abacaxi@@@hotmail...com> on Tuesday May 10, 2005 @03:20PM (#12491464)
    "inserting very small text that's the same color as the background"

    Puh-leeeeze! That trick became ineffective last century. It's very easy for the search engine to check background colors and FONT tags and penalize the page that uses text that is too close to the background color.
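
    A check of that kind can be very simple. The sketch below compares a text color with a background color and flags the pair when they are nearly identical. How any real search engine does this is not described in the thread; the Euclidean RGB distance and the threshold of 40 are arbitrary choices for illustration, and only `#rrggbb` values are accepted.

```javascript
// Parse "#rrggbb" into [r, g, b] with each channel in 0..255.
function parseHex(color) {
  const m = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(color);
  if (!m) throw new Error("expected #rrggbb, got " + color);
  return m.slice(1).map((h) => parseInt(h, 16));
}

// Euclidean distance in RGB space; 0 means identical colors.
function colorDistance(a, b) {
  const [r1, g1, b1] = parseHex(a);
  const [r2, g2, b2] = parseHex(b);
  return Math.hypot(r1 - r2, g1 - g2, b1 - b2);
}

// Flag text whose color is too close to the background to be readable.
// The default threshold is an illustrative assumption.
function looksHidden(textColor, backgroundColor, threshold = 40) {
  return colorDistance(textColor, backgroundColor) < threshold;
}

console.log(looksHidden("#fefefe", "#ffffff")); // true
console.log(looksHidden("#000000", "#ffffff")); // false
```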

  • Re:SEO (Score:4, Informative)

    by hankwang ( 413283 ) * on Tuesday May 10, 2005 @05:38PM (#12492883) Homepage
    spamming blogs like /. with links in your sig [place link here].

    Doesn't work in slashdot because:

    • Sigs are only visible for logged-in users (i.e. not for robots)
    • Posts without a karma bonus have the REL=NOFOLLOW attribute in the links, so that they don't count for Google.
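
    The second point can be sketched as a rendering rule: links in comments get `rel="nofollow"` unless the post has a karma bonus. This is not Slashdot's actual code; `renderCommentLink` and the `hasKarmaBonus` flag are hypothetical names standing in for whatever the site really checks.

```javascript
// Minimal HTML escaping for attribute values and link text.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a user-submitted link. Without a karma bonus the link carries
// rel="nofollow", which asks search engines not to count it as an endorsement.
function renderCommentLink(href, text, hasKarmaBonus) {
  const rel = hasKarmaBonus ? "" : ' rel="nofollow"';
  return '<a href="' + escapeHtml(href) + '"' + rel + ">" + escapeHtml(text) + "</a>";
}

console.log(renderCommentLink("http://example.com/", "my site", false));
// <a href="http://example.com/" rel="nofollow">my site</a>
```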
  • by Gamasta ( 557555 ) on Tuesday May 10, 2005 @06:51PM (#12493509)
    Yes, there is a shortage of *quality* porn on the web. When are these people going to learn that pigtails don't necessarily make you look young.
    Have you already seen DOMAI [domai.com]? (NSFW)
