Google Businesses The Internet Patents

Cracking the Google Code... Under the GoogleScope (335 comments)

jglazer75 writes "From the analysis of the code behind Google's patents: "Google's sweeping changes confirm the search giant has launched a full out assault against artificial link inflation & declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out ... guess again. ... In addition to evaluating and scoring web page content, the ranking of web pages are admittedly still influenced by the frequency of page or site updates. What's new and interesting is what Google takes into account in determining the freshness of a web page.""
This discussion has been archived. No new comments can be posted.

  • Great (Score:2, Interesting)

    by future assassin ( 639396 ) on Tuesday May 10, 2005 @12:27PM (#12489454)
    Now I'll see more "Get ranked #1 in search engines" spam.

    http://www.anologger.com/ [anologger.com]
  • It just occurred to me that, as Google changes its algorithms, it'll just create more business for the Search Engine Optimization consultant. When web sites drop in the Google rankings, they'll want to make changes to move back up, and will hire the SEO again to do so.
  • After link analysis (Score:5, Interesting)

    by Ars-Fartsica ( 166957 ) on Tuesday May 10, 2005 @12:30PM (#12489482)
    It's obvious Google and Yahoo are moving on to trust-based (or perceived trust) ranking for sites based on what they see users clicking on through the web accelerator, Yahoo's MyWeb, etc. Hopefully this will help grade down the obvious spam...although you only find out it's spam by going to the page...we'll see.
  • by wcitech ( 798381 ) on Tuesday May 10, 2005 @12:31PM (#12489500)
    ...that google is still a "not evil" company? This proxy "web-accelerator" thing really still has me freaked out. Am I just paranoid or is there legitimate reason for concern?
  • SEO (Score:2, Interesting)

    by Anonymous Coward on Tuesday May 10, 2005 @12:37PM (#12489573)
    What do those guys actually *do* in any case? I mean, legitimately. I guess you can tweak things a bit, but... how much does that actually get you if you simply aren't a popular site?
  • Here's a thought: How about companies try to offer useful services rather than "optimize" their search engine results? I've gotten several top hits on Google by the complete accident of providing useful services or information in the past. Traditional advertising such as adclicks and dmoz listings also helps. Not once have I wasted my time trying to game the system.

    Companies need to start realizing that making money is about providing what customers want. Advertising is a great way of getting your name out, but only a good product or service will actually carry through. So in that frame of thinking, I highly recommend that companies:

    • Stop looking at "cost cutting" by reduction, and start looking at "using existing resources to provide relevant products".
    • Start hiring employees who know what they're doing and listen to them.
    • Stop wasting your money on search engine optimizations.
    • Be good to the customer, and the customer will be good to you. If you don't know why people are upset or unhappy, grab a couple off the street and ask.
  • Re:Yes (Score:2, Interesting)

    by scribblej ( 195445 ) on Tuesday May 10, 2005 @12:49PM (#12489711)
    Why? I wouldn't be the least bit surprised to learn that even now, installations of the OS known as "Tiger" vastly outnumber actual tigers.

    If there are more copies of Tiger than there are Tigers, then I'd say Google's just doing its job.

    Not to be a Google apologist. I think this filing is too obvious to be patented. It's just taking the obvious things you would look at to rank a page, and looking at it one level removed. Instead of asking how many links there are to my page, we're asking how many and when they were created. Big deal.

  • Perhaps, or perhaps if Google changes its rankings enough, the SEOs' credibilities will be destroyed

    That would be great. Now that I've read TFA, it looks like Google's techniques go a long way toward eliminating the fakery done by SEOs currently.

    As an aside, the article looks like it was written by an SEO consultant, as it contains a lot of advice about how to get good rankings under Google's patented approach. Interestingly, the recommended actions are mostly legitimate (offer interesting content, update regularly, don't try to create fake links to your site), but it also includes some less-upfront techniques (make link-exchange deals with other sites and encourage bookmarking, for example).

  • Re:Yes (Score:3, Interesting)

    by 99BottlesOfBeerInMyF ( 813746 ) on Tuesday May 10, 2005 @01:06PM (#12489898)

    Interestingly enough, the top results for "tiger" are a page about tigers, tiger direct, and the Apple page. These seem pretty reasonable to me. The OS is obviously something a lot of people are going to be looking for, but I'd still find it weird if real tigers were not the first link. For "panther" the results are Apple's page, then some pages on real panthers. For "jaguar" you get the car manufacturer, Apple, then real panthers. I wonder what will happen if you do a search on "tiger" a year from now.

  • Thank goodness (Score:1, Interesting)

    by Anonymous Coward on Tuesday May 10, 2005 @01:12PM (#12489956)
    I've been running a fairly popular website for over two years now. The main search term for it puts us at about position 6, while a URL that hasn't even been online for that entire time is ranked 5. Maybe now we can finally get moved up over the non-existent website.
  • by RonBurk ( 543988 ) on Tuesday May 10, 2005 @01:17PM (#12489995) Homepage Journal
    The first big mistake webmasters make when trying to understand how Google ranks search results is failing to grasp the idea of data mining. The Google folks come from a data mining background and they constantly write about data mining algorithms; it would be highly surprising if the bulk of the Google algorithm were not constructed via data mining.

    What does that mean? At the highest level, it means that most of the Google algorithm is constructed by a machine. You give the machine human-constructed examples of how to rank a sample set of pages (notice those want ads where Google is hiring people who can inspect and assess the quality of web pages?) and it then uses essentially brute-force techniques to test every possible combination of your ranking variables to find the simplest formula that ranks pages the same way the human did.

    There is no human at Google "twisting dials" to alter individual parameters of a formula. The machine constructs the algorithm, and it can therefore easily be so complex that no human can understand it. Tweaking the algorithm becomes a process of changing or adding to your "training set" of human-ranked pages, and letting the data mining process come up with a revised algorithm.

    For example, Google could invent a new variable called "category", and identify each page as belonging to category Astronomy, Botulism, Country, [...] and Other. Once that variable is thrown into the mix, then the Google "algorithm" is essentially free to vary wildly from one type of subject matter to the next. For example, you might see someone with a Real Estate site swearing up and down that inbound links are no longer as important, while someone with an Astronomy site might swear that, no, inbound links are more important than ever. You can see exactly this kind of bickering in most of the forums that people who hope to do Search Engine Optimization frequent.

    The other big mistake people make in trying to see how to game the Google algorithm is "delay". In studying how people manage (or fail to manage) complex systems, psychologists learned that people generally would fail if a delay was introduced between their actions and the results of their actions.

    In one very simple test, people were charged with trying to stabilize the temperature in a virtual refrigerator. They had one dial, and there was exactly one piece of feedback: the current temperature in the fridge. However, they were not explicitly told that there was a delay between moving the dial and when the results of that action would stabilize.

    The responses of those test subjects were eerily similar to what we see in Google-gaming webmasters these days. Some people swore up and down that some human behind the scenes was directly tweaking the results to thwart whatever they did. Others became frustrated and decided that nothing they did really mattered, so they would just swing the dial back and forth between its minimum and maximum settings.

    What does this have to do with Google? These days, Google can change their algorithm relatively frequently, and the algorithm can vary by the relative date of various things. The net sum is, there's a delay between when your page is first ranked and when it is likely to arrive at a relatively stable ranking. This can drive webmasters nuts as they think they've done something clever to rank their page high, but then it drops a week later. Although it doesn't occur to them, the important question is: did the change cause the high ranking or did it cause the sudden decline?

    The few people who did master the simple refrigerator system? Well, they sounded more like some of the people who are more successful at gaming Google. Those folks tend to say things like: "just make one change and then leave it alone for a while to see what happens."

    Can you still game the Google algorithm? Undoubtedly in specific cases. But it's getting harder. The Google algorithm was always complex, but what's changing is that the days when a few variables (such as inbound link count) generally swamped the effects of all the others are drawing to a close. We are approaching the day when the best technique to rank highly with Google will be: sit down at your keyboard and make more good content every day.
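
    The fitting process described in this comment (human-ranked examples in, a ranking formula out) can be pictured with a toy sketch. Everything here is an assumption made up for illustration: the three feature names, the page data, the candidate weights, and the exhaustive search itself. Nothing in it is known to reflect how Google actually builds its ranking.

```python
from itertools import product

# Hypothetical feature vectors per page: (inbound_links, freshness, content_quality).
# All names and numbers are invented for this illustration.
pages = {
    "a": (9, 1, 2),
    "b": (2, 8, 9),
    "c": (5, 5, 5),
    "d": (1, 2, 1),
}

# The ordering a human rater assigned, best first (the "training set").
human_order = ["b", "c", "a", "d"]


def rank(weights):
    """Order pages by a weighted sum of their features, best first."""
    def score(name):
        return sum(w * f for w, f in zip(weights, pages[name]))
    return sorted(pages, key=score, reverse=True)


def fit(candidates=(0, 1, 2, 3)):
    """Try every weight combination, smallest weights first, and return
    the first one that reproduces the human ordering (or None)."""
    for weights in product(candidates, repeat=3):
        if rank(weights) == human_order:
            return weights
    return None


weights = fit()
print(weights, rank(weights))
```

    Changing `human_order` and re-running `fit()` is the toy equivalent of "changing the training set": nobody edits the weights by hand, they fall out of the examples.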

  • by eluusive ( 642298 ) on Tuesday May 10, 2005 @01:26PM (#12490119)
    Click history? In case you hadn't noticed, Google links are direct. There's no link to a Google page that redirects. So, then, by what method do they obtain this mystical click information on me?
  • by killtherat ( 177924 ) on Tuesday May 10, 2005 @01:29PM (#12490157)
    What if Google starts to use a filter designed to eliminate the effect of text that is deemed 'unviewable'? Just check to see if the text color is the same as the background; if it is, ignore it.

    I thought of that in less than 30 seconds; what are the odds Google has already thought about it?
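
    A minimal sketch of the check suggested above, assuming some earlier HTML/CSS parsing step has already produced (text, color) pairs and that colors arrive as hex strings. The function names and sample data are hypothetical, and real hidden-text detection would also have to handle named colors, background images, tiny fonts, and so on.

```python
def normalize(color):
    """Lower-case a hex color and expand the short form (#fff -> #ffffff)."""
    c = color.strip().lower()
    if len(c) == 4 and c.startswith("#"):
        c = "#" + "".join(ch * 2 for ch in c[1:])
    return c


def visible_text(spans, background="#ffffff"):
    """Keep only spans whose text color differs from the page background.

    `spans` is a hypothetical list of (text, color) pairs.
    """
    bg = normalize(background)
    return [text for text, color in spans if normalize(color) != bg]


spans = [
    ("Cheap flights to Paris", "#000000"),
    ("flights flights flights cheap cheap", "#FFF"),  # white text on a white page
]
kept = visible_text(spans)
print(kept)
```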
  • by Fëanáro ( 130986 ) on Tuesday May 10, 2005 @05:50PM (#12492991)
    Ok, this is getting weird.

    I look at this page:
    http://www.google.com/search?q=test [google.com]

    The first result link looks like this:
    <a href=http://www.ets.org/toefl/ onmousedown="return clk(this,'res',1)">Welcome to TOEFL: The <b>Test</b> of English as a Foreign Language</a>
    at least in IE.
    In Opera the JavaScript is missing.

    try this:
    wget --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "http://www.google.de/search?q=test"
    compared to this:
    wget --user-agent="." "http://www.google.de/search?q=test"
    the first page contains the javascript but the second does not, at least on my system.

    Can anyone confirm that?

  • by epee1221 ( 873140 ) on Tuesday May 10, 2005 @06:28PM (#12493329)
    I have to wonder about the effectiveness of using click histories. It seems to me that the only way any site is going to get a lot of clicks from google is if they're already near the top to begin with. A site that is good but new will be buried so far down that nobody will actually get to it. Is there any way around this?
  • by jrtom ( 821901 ) on Tuesday May 10, 2005 @09:28PM (#12494706) Homepage
    The parent post is largely composed of misinformation, ignorance and irrelevance. I'd suggest to its author that it might be a good idea to do some basic research before posting on a subject which is, I suspect, outside his area of expertise.

    (1) What you have described as Google's "algorithm" is a distortion of one particular technique used in data mining (actually machine learning, but we'll let the vocabulary slide); furthermore, no one other than a first-year AI/machine learning student would use exhaustive search in parameter space ("brute force") to come up with a solution. In fact, a very brief search on your favorite search engine (for, say, "PageRank algorithm") would reveal that the basic algorithm is actually very simple, and does not in fact involve learning from labeled examples, as you suggest. (More recent versions of the Google ranking mechanism may safely be assumed to be more sophisticated, but I'd bet serious cash that they're nothing like what you describe.)

    (2) PageRank--the basic algorithm, that is--is not, and never has been, based, even in part, on inbound link count. This can also be easily verified by a few minutes' research as above.

    (3) Your refrigerator example doesn't actually support your point. If Google's ranking algorithm is continually changing, as you suggest, then you can never know whether any change you made had any effect on your ranking. (And "algorithm can vary by the relative date of various things"? Say what?)
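
    Point (2) can be illustrated with the commonly described power-iteration form of basic PageRank, in which a page's score depends on the scores of the pages linking to it rather than on how many links it has. This is a sketch only: the damping factor of 0.85 is the commonly cited value, the six-page graph is invented, and none of it says anything about Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Invented graph: "x" has ONE inbound link (from the well-linked "hub"),
# while "y" has TWO inbound links (from the obscure pages "p1" and "p2").
links = {
    "hub": ["x"],
    "x": ["hub"],
    "p1": ["hub", "y"],
    "p2": ["hub", "y"],
    "p3": ["hub"],
    "y": ["hub"],
}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

    In this graph "x" ends up well ahead of "y" despite having half as many inbound links, because its single link comes from a highly ranked page.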
