Google Businesses The Internet

Google Previews New Search Infrastructure 129

Google has announced a "developer preview" of a new search infrastructure, though one wouldn't have to be a developer to try it out. Google is asking for feedback on how the search results in the new regime stack up against the old. Matt Cutts has posted a mini FAQ. Some early testing indicates that the new search may be faster in some cases, and return more relevant results, than the old one. Those who attempt to game Google search for a living will be scrambling henceforth. Has anyone identified the new crawler bot in log files?
This discussion has been archived. No new comments can be posted.

  • New crawler bot... (Score:5, Insightful)

    by Gavin Scott ( 15916 ) on Tuesday August 11, 2009 @02:17AM (#29020411)

    Why would there be a new crawler?? How many more copies of the Interwebs does Google need?

    G.

    • Re: (Score:3, Funny)

      by Thanshin ( 1188877 )

      Why would there be a new crawler?? How many more copies of the Interwebs does Google need?

      The answer to your question is: "Yes. Yes indeed."

      Thank you for beta-testing our new rhetoric responder.

    • Re: (Score:2, Interesting)

      by libcrypto ( 599315 )
      My thoughts exactly. They probably developed a new algorithm for finding the best results. There is no need for a new crawler. Found this link on search engine architecture which is helpful. http://infolab.stanford.edu/~backrub/google.html [stanford.edu]
    • Sure; because we all know the web is static and never changes.

      1. The web is growing at an exponential rate.
      2. The existing part of the web must be rechecked every so often for updates.

      The result: an ever-increasing demand for data processing. Smarter algorithms for deciding what to crawl and how to process the resulting data are definitely a necessity to keep on top of things.

    • Re: (Score:3, Interesting)

      New crawlers are needed because the web is changing.

      1. The automated cross referencing system on some blogs requires new logic to identify which article is the true search target, and which ones are simply referencing that article.
      2. The increasing use of ajax techniques to update portions of a web page requires a new approach to crawling.
      3. Other new ways of delivering content are also forcing changes, but these two are sufficient to make the point. Teh intarwebs is changing, and teh spiders need to be redesign
  • by maxwell demon ( 590494 ) on Tuesday August 11, 2009 @02:24AM (#29020445) Journal

    The more relevant results may be just because the algorithm is new, so the SEOs couldn't yet optimize for it. Whether it really gives more relevant results will be seen only after it has been the main search algorithm for some time.

    Remember, in the beginning the old algorithm used to be very good at finding relevant results.

    • by CarpetShark ( 865376 ) on Tuesday August 11, 2009 @02:58AM (#29020579)

      Remember, in the beginning the old algorithm used to be very good at finding relevant results.

      I'm not convinced that the degradation is entirely due to SEO. Google used to be a much more technical search -- when you used specific terms, you got specific matches. It seemed to be very much like Altavista with AND between each term. Now, you get a mix of things, as if it was OR between each term. Granted, *that* could be just SEO.

      Secondly though, if you search for X, you're asked if you meant Y, and your search results already seem to be for the popular Y result they think you meant.

      Likewise, you used to be able to search for hyphenated-terms (I hyphenated all the time because it's usually one character fewer, and requires less editing after the fact than putting quotes around words), but now it seems to split them into two terms.

      I think Google has dumbed down its search for people who don't know how to use search engines.

      • Re: (Score:3, Interesting)

        by dublindan ( 1558489 )
        I agree. What I hate is that if I search for "foo bar baz", it seems to ignore the quotes around it. If I put quotes, I'm looking for EXACT matches, but Google still seems to treat it as foo OR bar OR baz... :'(
        • by dnwq ( 910646 ) on Tuesday August 11, 2009 @03:52AM (#29020791)
          I don't know about you, but I get exact matches for "foo bar baz" [google.com].
          • Re: (Score:2, Insightful)

            Well - I guess "EXACT" means different things to us then ...

            In my world "foo bar baz" is not the same as:

            "foo, bar, baz"
            "foo, :bar, :baz"
            "foo = bar = baz"
            "foo->bar->baz"

            Oh well ... could just be me ...

            • by Serious Callers Only ( 1022605 ) on Tuesday August 11, 2009 @05:56AM (#29021375)

              Google seems to ignore punctuation, that's why you'd get those results.

              You put in "foo, bar, baz", it searches for "foo bar baz". It does not search for foo OR bar OR baz, as you suggested, it just strips the punctuation, and then searches for that exact phrase. There's a guide to the methodology you can google for [google.com].

              I understand why they omit punctuation, but it'd be nice if you could easily ask it to search including punctuation (not sure if you can), as the current behavior makes searching for code or precise phrases (with punctuation) very difficult.
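
To illustrate the behavior described above, here is a minimal sketch of "strip the punctuation, then match the exact phrase". It is only a toy model of what the comment describes, not Google's actual implementation; the function names `normalize` and `phrase_match` are made up for this example.

```python
import re

def normalize(text):
    """Lowercase the text, turn punctuation into spaces, and split into words."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()

def phrase_match(query, document):
    """Return True if the query's words appear consecutively in the document
    once punctuation has been stripped from both."""
    q = normalize(query)
    d = normalize(document)
    if not q:
        return False
    # Slide a window of len(q) words across the document.
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# All of these match the phrase, because only word order survives normalization.
for doc in ["foo, bar, baz", "foo, :bar, :baz", "foo = bar = baz", "foo->bar->baz"]:
    print(doc, phrase_match("foo bar baz", doc))
```

Under this model a quoted search still respects word order ("foo baz bar" does not match), which is the difference between a punctuation-blind phrase search and an OR search.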

              • by arielCo ( 995647 )

                I understand why they omit punctuation, but it'd be nice if you could easily ask it to search including punctuation

                You can. Try +sig.ma [google.com] as opposed to sig.ma [google.com] and sigma [google.com]. You're basically telling it to be strict-er about that search term - less fancy stemming and all that.

                Still +u.n.c.l.e. [google.com] is basically a fail. Sigh.

              • by RealTime ( 3392 )
                If you are searching for code, use Google Code Search [google.com]. It supports regular expression matching, but it only searches source files (not the entire Web).
        • "foo bar baz" may be a bad example, but Google does selectively ignore terms even when you put them in quotes. It didn't used to and it drives me crazy. But you are correct, even if the example is flawed.

      • by CAIMLAS ( 41445 ) on Tuesday August 11, 2009 @04:34AM (#29020933)

        Too bad this can only be modded to +5. It needs to be made 'sticky' to the top of the thread (and every goddamn Google programmer's forehead, ever).

        Seriously: can we PLEASE have the ability to accurately filter things via syntax include/exclude and grouping again? I know it still 'works' but it doesn't work half a damn. Every once in a while I'll google for an error or some such and I'll have to prune it down to a handful of terms to even get results (and I know there should be more than just a handful for these kinds of things, because they're not uncommon). Google is becoming almost useless for technical searches.

        • by PsychoSlashDot ( 207849 ) on Tuesday August 11, 2009 @07:00AM (#29021683)

          I could live with the current semantics just fine if there were two Google modes: research and purchase. When I search for "Laserjet 4000" in research mode, I'm explicitly saying that I'm searching for pages ABOUT Laserjet 4000 printers, and absolutely not looking for a way to BUY a Laserjet 4000. Contextually isolating these two modes would be hugely helpful. When I want to buy a Widget and I'm simply looking for the best deals, I don't want a bunch of pages where people are reviewing or discussing the product. When I want to fix my Widget, I don't want a bunch of pages trying to sell me a new one. Sometimes a mixture is good, but for me it usually isn't.

          • One word:
            ADS.
          • Re: (Score:1, Informative)

            by Anonymous Coward

            Try searching for -price

          • When I want to buy a Widget and I'm simply looking for the best deals, I don't want a bunch of pages where people are reviewing or discussing the product

            A catalyst for this may be the growing trend toward research on Google, buy on Bing (for the cashback). Bing is being relegated to a "purchase engine", for better or worse.

            If a tipping point is reached where people go to Bing first and avoid G altogether, then it seems logical that G would bring Froogle front and center to meet your needs.

          • Re: (Score:3, Informative)

            Comment removed based on user account deletion
      • by Paaskonijn ( 1220996 ) on Tuesday August 11, 2009 @04:41AM (#29020961)

        Secondly though, if you search for X, you're asked if you meant Y, and your search results already seem to be for the popular Y result they think you meant.

        Try searching for +X.

        • Try searching for +X.

          This is confusing lots of people. I should probably go find a Firefox extension that automatically fixes + and lets.me.use.dots for phrases again.

          • I also find the way Google drops terms annoying. The problem is that you can't simply add + (plusses) to all your search terms, because then Google won't search for near hits, like words with plurals and misspellings. You may not like that feature anyway, but personally I'm OK with that sort of doctoring of my search results.
      • Re: (Score:3, Interesting)

        by value_added ( 719364 )

        Google used to be a much more technical search ...

        I tend to agree, but IIRC, casual searches for technical terms were never that good. In my case, I invariably still get an unfiltered (read "near-endless") list of links to mailing list posts (identical content hosted by different list aggregators), or my favourite, the same frigging README file stored on what seems to be every other server on the internet. At least in the past, some of us could rely on usenet (as archived by Google groups) searches to sep

    • Re: (Score:3, Insightful)

      by Trepidity ( 597 )

      The web itself has changed too, for reasons other than SEO (though it's sometimes hard to tell which is which). PageRank isn't a universal law of nature, with the "best" result to any particular query being related to how many incoming links a particular site has. Rather, it's a heuristic based on something that often happened to be true: the most useful information was located on pages at sites that were frequently linked to. It's possible that correlation is no longer as strong as it used to be.
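
For readers who have not seen the heuristic spelled out, the following is a simplified sketch of the link-counting idea behind PageRank, using plain power iteration. It is an illustration under assumptions, not Google's production ranking: the damping factor of 0.85, the iteration count, and the three-page example graph are all chosen for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical graph: "a" and "b" both link to "c", and "c" links back to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(scores, key=scores.get, reverse=True))
```

The score depends only on who links to whom, never on whether the page is actually useful, which is why the correlation the comment mentions can weaken as linking habits change.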

      • That is a perfect point. And everyone seems to be saying the same thing as you, but they call it Google being gamed by SEO. No one should know how to optimize their site for search engines; the people doing the searching should be the ones doing absolutely all the optimizing, based on whether they liked the site, whether it was helpful, etc.
    • The more relevant results may be just because the algorithm is new, so the SEOs couldn't yet optimize for it. Whether it really gives more relevant results will be seen only after it has been the main search algorithm for some time.

      Remember, in the beginning the old algorithm used to be very good at finding relevant results.

      I don't know if you've been on the Internet long enough to remember, but back in the days Before Google, search engines were uniformly horrible. Because of this, almost every website had a "links page"

  • beautiful

    http://www2.sandbox.google.com/ [google.com] - google without the ads!

  • by Zocalo ( 252965 ) on Tuesday August 11, 2009 @03:24AM (#29020669) Homepage
    Actually, I'm mostly fine with the speed and typical results I'm getting at the moment. What annoys me the most about searching is when the first several pages of results are full of links to places that require you to have an account before you can access the answer or download the file. If I could define a blacklist that automatically excludes some of the worst offenders from my queries, that would be worth far more to me than shaving a few milliseconds off each search.
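
A client-side version of the blacklist idea can be sketched in a few lines. This is only an illustration of filtering a list of result URLs after the fact, not a Google feature; the function names and the `.example` domains are hypothetical.

```python
from urllib.parse import urlparse

def is_blocked(url, blacklist):
    """True if the URL's host is a blacklisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blacklist)

def filter_results(urls, blacklist):
    """Return the URLs whose host is not on the blacklist, preserving order."""
    return [u for u in urls if not is_blocked(u, blacklist)]

# Hypothetical result list and blacklist.
results = [
    "http://www.paywall.example/question/123",
    "http://forum.useful.example/thread/42",
    "http://notpaywall.example/page",
]
print(filter_results(results, {"paywall.example"}))
```

Matching on the parsed hostname rather than on a substring of the URL keeps unrelated domains that merely contain the blacklisted name from being dropped.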
    • Re: (Score:1, Informative)

      by Anonymous Coward

      You mean, like you _can_ do now if you're logged in?
      experts-exchange.com is completely banned from my searches.

      • by Anonymous Coward on Tuesday August 11, 2009 @04:31AM (#29020915)

        You can see the content of an experts-exchange.com "answer" using the "cached" link under the Google result. Then just scroll down past the bogus posts and you'll see the real posts.

        • by astro-g ( 548659 )
          Why bother? It's all awful anyway.
        • Re: (Score:3, Interesting)

          All that matters is that your referrer is Google. It doesn't have to be the cached copy: if what you see on the live page is different from what the Googlebot sees, Google will drop them from the results for SEO violations.
        • Re: (Score:1, Informative)

          by Anonymous Coward

          You can see the content of an experts-exchange.com "answer" using the "cached" link under the Google result. Then just scroll down past the bogus posts and you'll see the real posts.

          You don't even need to use the cached version: the real pages themselves contain the answers at the bottom.

    • by shird ( 566377 )

      You can already do this using domain exclusion, possibly also by creating a 'custom search'. Or you may be looking for a site such as:
      http://www.googeefree.com/ [googeefree.com]

      However, keep in mind experts exchange does actually publicly display the results, you just have to scroll down.

    • Re: (Score:3, Informative)

      by cyclomedia ( 882859 )
      And the fact that if you ever search for the name of a piece of software the first 100 results are brothersoft.com, getyourfreeshithere.com, freesoftwarefix.com, warezfactory.com etc etc etc etc
  • by Anonymous Coward

    I entered "search engine" on the old infrastructure as well as the new. On the old engine, two of the hits on the first page were for bing.com and msn.com. On Google's new infrastructure neither of those sites shows up on the first page.

    Maybe they are taking a page out of Microsoft's book?

  • Turns out I'm much more relevant according to the new search than in the older one.

    I have a long name (first name + 3 names). Previously, I would need to include at least my first name and two other names so I would be the first result. Now, a search for first name + second name already shows me at the top (even though there was a famous soccer player in Brazil, before I was born, with the same name).

    So, it is more relevant *for me*, but it's likely anyone who isn't related to software development, would

  • by CAIMLAS ( 41445 ) on Tuesday August 11, 2009 @04:29AM (#29020905)

    I don't know about anyone else, but I used to get much more search-contextual information on fringe information from Google, even when compared to a highly-tailored search. I don't know if Google does its indexing differently now, or if it's indexing/crawling different subsets of data, but the results are not only different, but often less useful in an academic/info-junkie sense.

    For instance, searching for "hammurabi" now results in Wikipedia being the first link. This is true for most searches where there's a wiki page, and for many where the search phrase is simply mentioned in the wp page (yet there is no individual wp page for the topic). A lot of the sites I've got bookmarked from researching superstitions and myth surrounding his code (giants, atlantis, etc.), which are still online, do not show up in the search results today, but did around 2003.

    Likewise, search for anything which might have current cultural significance ('bush war crimes') and then compare it to something that had cultural significance just a couple years ago ('saddam war crimes'). The results are drastically different and (in the case of the former) cater to lazy people; they also make actually finding a -site- (as opposed to just a 'current event' article) on the topic somewhat more frustrating. (This is just an example, though there are plenty of other similar situations - forgive my 3am brain.)

    Now, it might be that Google has actually gotten a lot better at returning pertinent results: so good that those little things I see and go "ohhh interesting! *click*" don't occur nearly as often, and as an info junkie, I view google as having degraded.

    Who knows. Still head over heels better than Bing or anything else out there, as far as I'm concerned. I'm glad more progress on 'searching better' is being made. I just wish they'd not clog the works making -cultural- assumptions about what I'm after and stick to the semantics of my search phrases.

  • by Paaskonijn ( 1220996 ) on Tuesday August 11, 2009 @04:46AM (#29021005)
    I see that name searches for unimportant people (like myself) don't put the Facebook, Netlog, Myspace, ... results on top anymore.
    Progress!
    • by pamar ( 538061 )

      I see that name searches for unimportant people (like myself) don't put the Facebook, Netlog, Myspace, ... results on top anymore.

      Progress!

      You have pipl.com for that...

  • by Anonymous Coward

    1. From what I have seen, improved results are not coming from a different algorithm, but from an improved indexing. Long tail keyword searches are more likely to be influenced in these cases (where sites that rank might also be on the verge of falling through the cracks of Google's new indexing patterns)

    2. From my experience, there appears to be a marked improvement in speed.

    3. Don't underestimate the power of the Top 10. One thing that Google does very well is it only rarely screws with a simple top 10

  • by Tsu Dho Nimh ( 663417 ) <abacaxi@hotm[ ].com ['ail' in gap]> on Tuesday August 11, 2009 @07:26AM (#29021829)

    This is going to mess up the content spinners and the paragraph swappers who are trying to either attract ads or build a link farm. Those who have well-built, informative, content-rich pages can sit back and watch the fun.

    "Content Spinning" [associatedcontent.com] explained, kinda sorta

  • I wonder if the new search engine will crawl this page appropriately to get the feedback they're after ;)
  • Try out the timeline view. It's pretty cool.

    Then try to input a search query that makes the timeline go back further than 4500BC.

    You can't do it, can you?

    We reason thusly:

    1. Google knows everything.
    2. Google says nothing happened before 4500BC, which is very close to the date calculated for creation in the Bible.
    3. Therefore, the universe must have been created by God about 6000 years ago.

    QED.

    (Did I do better or worse than an ID troll?)
  • I have a site with a keyword that occurs only in relation to my topic. Before, I was #5 in search, with the wikipedia page and the government site above mine, then 2 .com websites that feature a ton of other content besides my topic. The new search dumped the two .com sites, ha ha, now I'm #3! I'm sure they did a ton of SEO to get there, because I used to be #3 about 2 years ago until they came on the scene and bumped me down.

    Stupid wikipedia link is stuck at #1 and has been forever. And it's not becaus

  • My friends, it's all about money through advertising. Google wins and we lose.
  • I've been trying out Bing for the past month and prefer their results. I have to wonder if Google timed this new update because of the focus Bing is getting? Google thrives on media attention and this release puts the webmaster focus back on them.

    ---

    Google results are not as clean and relevant as they once were...some result pages show video, news (plus it's irrelevant news most of the time), and some domains have sub domain search results. What happened to clean page results?

    ---

    Also one of the main reaso

  • comparegoogle.com has been helpful in finding differences in search results between the two algorithms. Just put in some keywords and see what changed. Could be helpful for SEO engineers.

"Today's robots are very primitive, capable of understanding only a few simple instructions such as 'go left', 'go right', and 'build car'." --John Sladek
