Google Technology

New Google Search Index 50% Fresher With Caffeine

Posted by CmdrTaco
from the sipping-my-first-cup-now dept.
Ponca City, We love you writes "When Google started, it would only update its index every four months. Then, around 2000, it started indexing every month in a process called the 'Google dance' that took a week to 10 days and would provide different results when searching for the same term from different Google data centers. Now PC World reports that Google has introduced a new web indexing system called Caffeine, which delivers results that are closer to 'live' by analyzing the web in small portions and updating the index on a continuous basis. 'Caffeine lets us index web pages on an enormous scale,' writes Carrie Grimes on the official Google Blog. 'Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day.' Now not only does Caffeine provide results that are 50% fresher than Google's last index, adds Grimes, but the new search index provides a robust foundation that will make it possible for Google to build a faster and more comprehensive search engine that scales with the growth of information online."
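The summary contrasts periodic full rebuilds (the "Google dance") with updating the index continuously in small portions. As a rough illustration of that idea only, here is a toy inverted index that re-indexes one page at a time. The class, method names, and URLs are hypothetical, and this is not a description of how Caffeine is actually implemented.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that is updated one page at a time."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of URLs containing it
        self.page_terms = {}              # URL -> terms seen at its last crawl

    def update_page(self, url, text):
        """Re-index a single page without rebuilding the rest of the index."""
        new_terms = set(text.lower().split())
        old_terms = self.page_terms.get(url, set())
        # Drop postings for terms that disappeared from the page.
        for term in old_terms - new_terms:
            self.postings[term].discard(url)
        # Add postings for terms that are new on the page.
        for term in new_terms - old_terms:
            self.postings[term].add(url)
        self.page_terms[url] = new_terms

    def search(self, term):
        """Return the URLs currently indexed for a term, sorted."""
        return sorted(self.postings.get(term.lower(), set()))


index = IncrementalIndex()
index.update_page("example.com/a", "google dance every month")
index.update_page("example.com/b", "caffeine index update")
# The first page changes; only its own postings are touched.
index.update_page("example.com/a", "caffeine replaces the dance")

print(index.search("caffeine"))
print(index.search("month"))
```

A batch system would instead rebuild `postings` from scratch for every page on a schedule, which is why results could lag by weeks between rebuilds.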
This discussion has been archived. No new comments can be posted.

  • Altavista (Score:3, Funny)

    by Pojut (1027544) on Wednesday June 09, 2010 @08:05AM (#32509160) Homepage

    I miss the days when Altavista was king (purely nostalgia, I assure you). I don't, however, miss getting marked down in Spanish class due to using BabelFish -_-;;

    • When I was in Spanish class I got marked down for cheating off the hispanic stoner behind me, and I liked it!

      All you kids with your interwebs, and your babbling fishes can get off my lawn!

    • by moosesocks (264553) on Wednesday June 09, 2010 @08:29AM (#32509442) Homepage

      I miss the days when Altavista was king (purely nostalgia, I assure you). I don't, however, miss getting marked down in Spanish class due to using BabelFish -_-;;

      This reminds me of one of my funniest memories from middle school: The Spanish teacher hands back a paper with a big red "F" on it to the guy sitting in front of me. She says: "This is very good.....But, it's in French"

      Back in the day, refreshing BabelFish would cause the options to default back to English->French.

      • Back in the day, BabelFish would only convert X amount of characters or words if you entered the text; however, it would do entire webpages.

        It's how I learned basic HTML. I set up my own GeoCities account and copy and pasted my project in and kept refreshing the translation page.*

        * I did try and not cheat so I only did Spanish to English. If I misspelled words or had bad grammar they'd usually show up.

    • by rwa2 (4391) *

      I kinda liked the human-generated Yahoo! index / hierarchy, it was a neat way to get started with the web, back when it wasn't all too big and time-sensitive to organize by hand.

      I'd use yahoo mail more, if they even bothered trying to be competitive with gmail. But I don't really want to pay extra for the plus account just to get minimum necessities like forwarding and pop3 access on what is essentially now my spam account.

    • by RJFerret (1279530)

      Actually, I wish AltaVista still existed (its back end became Yahoo--useless), because you could do a literal search that Google no longer can do.

      Originally Google achieved success not with tons of results (like everyone else at the time), but specific results of just what you were seeking.

      Nowadays, they throw in tons of results and the kitchen sink, including variations in spelling and alternatives that you don't want.

      It used to be Google was better because the first results page had a useful link. Nowad

    • Re:Altavista (Score:5, Insightful)

      by IgnoramusMaximus (692000) on Wednesday June 09, 2010 @11:18AM (#32511904)

      I miss the days when Google was a simple, plain HTML page resulting from the fact that it was driven by its designers and users. Now arrogant marketing VPs with no clue whatsoever push on us "features" like fade-ins (which do wonders when viewed over RDP and VNC links) and side bars while ignoring [google.com] all [google.com] negative [google.com] feedback [google.com] and making sure that no opt-out is possible to stroke their towering egos by pretending that everyone loves their "innovations". Otherwise 80% of users would have it off in an instant and the "innovator" VP's stupidity would register with some other VPs at Google HQ and give them ammo in some back-stabbing corporate ladder-climbing moves.

      In other words I miss the days before Google jumped the shark.

  • Wow! (Score:4, Funny)

    by Anonymous Coward on Wednesday June 09, 2010 @08:07AM (#32509180)

    I found this post at google before I wrote it.

  • "Caffeine" is a NSA code word for a mind controle satellite they build with GOOGLE/Italian money on loan from Chinese Muslim Islamo-Communist sorcerers and vegetarians. It will probably be used to sell your daughters into slavery in Mexico via facebook. That is why our SAVIOR OBAMA must continue to wage the WAR FOR FREEDOM at all costs, because if not the evil Italian axis will enslave us all!!!!!!!!!!!

    • by Dishevel (1105119) *

      "Caffeine" is a NSA code word for a mind controle satellite they build with GOOGLE/Italian money on loan from Chinese Muslim Islamo-Communist sorcerers and vegans. It will probably be used to sell your daughters into slavery in Mexico via facebook. That is why our SAVIOR OBAMA must continue to wage the WAR FOR FREEDOM at all costs, because if not the evil Italian axis will enslave us all!!!!!!!!!!!

      FTFY

  • The thing is... what's the story behind this very name? Why Caffeine?! :p

    • Re:Caffeine?! (Score:4, Insightful)

      by bsDaemon (87307) on Wednesday June 09, 2010 @08:11AM (#32509210)

      because the results will now be fairly half-assed and kind of jittery? On a related note, what's with Apple pimping Bing all of a sudden?

      • Re: (Score:3, Informative)

        by Pojut (1027544)

        On a related note, what's with Apple pimping Bing all of a sudden?

        Because, at this point, Google is more of a threat than Microsoft. Apple knows that the chances of OSX catching up to Windows in terms of market share are practically zero. However, Android poses a credible threat to Apple's mobile popularity here in America.

        • Re: (Score:3, Insightful)

          by bsDaemon (87307)

          The only way that OS X would catch up to Windows in terms of market share, is if either A) they dramatically dropped the price point for Macs, or B) they licensed the software for white-box PCs. In either case, their brand would be diluted. They sort of thrive on a high-margin, low-volume model, and I'm not sure they were ever really competing with Microsoft in the way people imagine, especially being primarily a hardware company from the start.

          • Re: (Score:3, Interesting)

            by Rockoon (1252108)
            A hardware company generally does not compete with a software company.

            Apple has a long standing friendly relationship with Microsoft. They even turned to Microsoft to bail them out of a big financial mess not so many years ago.

            Yes, this is contrary to Apple's television advertisements... but those aren't reality.
            • by AndrewNeo (979708)

              Especially since, heyguesswhat, a Mac is a PC, too.

              • by besalope (1186101)

                Especially since, heyguesswhat, a Mac is a PC, too.

                After you install Windows or Linux on it of course.

                • by Sir_Lewk (967686)

                  A PC is defined by hardware, not software. For all intents and purposes modern Macs are PCs. The only real difference is they don't use the same BIOS deal or whatever, but that is largely irrelevant (and apparently PCs from other companies are looking to ditch it too).

                  • Re: (Score:3, Insightful)

                    by nacturation (646836) *

                    Calling a Mac a PC is disingenuous much in the same way as calling a cordless phone a mobile phone. Yes, your cordless phone is mobile in the technical sense, but common usage has given the words distinct meanings. Mobile no longer only refers to the fact that it enables mobility, and PC no longer only refers to the fact that it's your own personal computer rather than a server or mainframe.

                    You: "Hey man, I got a new PC the other day."
                    Friend: "Cool, dude! What kind did you get?"
                    You: "An iPhone."
                    Friend: "U

          • Re: (Score:2, Informative)

            by Anonymous Coward
            What is a 'price point'? Stop emulating how marketing tells you to speak you dipshit. A price is a 'point' by definition.
          • by lwsimon (724555)
            Microsoft and Apple have long been collaborators.
          • by yancey (136972)

            I think the way Apple sees it is that, by providing the hardware, software, and services together, they offer a complete package in a way that Microsoft and Dell just can't. They consider it a premium product which is worth the higher price, just as a Porsche costs more than a Ford. So they are not interested in lowering the price points or breaking up the product into separate parts.

            • by Pojut (1027544)

              I think the way Apple sees it is that, by providing the hardware, software, and services together, they offer a complete package in a way that Microsoft and Dell just can't.

              I would agree with you if Apple actually manufactured all of their hardware...but if you open up an Apple device, literally everything is manufactured by someone else. I know Apple sells branded hardware, but because they don't actually produce any of it, I don't consider them to be a hardware company.

    • by Amouth (879122)

      Because Caffeine is more publicly acceptable than Crystal Meth?

  • by ThisIsForReal (897233) on Wednesday June 09, 2010 @08:10AM (#32509208) Homepage
    Half joking, but it would be great if the indexing was done at a particular time every month like the old system, but the moment of indexing was public. Then, at that time, all facebook users could go and untag and delete anything that may have been wholesome enough to not warrant immediate removal but yet still be considered something that shouldn't be indexed for all eternity.
    • by Ephemeriis (315124) on Wednesday June 09, 2010 @08:30AM (#32509446)

      Half joking, but it would be great if the indexing was done at a particular time every month like the old system, but the moment of indexing was public. Then, at that time, all facebook users could go and untag and delete anything that may have been wholesome enough to not warrant immediate removal but yet still be considered something that shouldn't be indexed for all eternity.

      If you don't want it indexed for all eternity, don't post it on the web.

      Even if you knew when Google was coming and you took it down, you have no influence over anyone else out there who may have saved that incriminating evidence. Anyone out there can take a screenshot and post it themselves.

    • You can delete stuff on Facebook? I thought Zuckerberg was doing a damn fine job of indexing for all eternity already :P

  • Caffeine (Score:5, Funny)

    by morgan_greywolf (835522) on Wednesday June 09, 2010 @08:11AM (#32509218) Homepage Journal

    The Caffeine project is approved. The system goes on-line June 9th, 2010. Human decisions are removed from search engine results. Caffeine begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug.

  • by dingen (958134) on Wednesday June 09, 2010 @08:15AM (#32509252)

    Caffeine takes up nearly 100 million gigabytes of storage in one database

    A million gigabytes is what we call a petabyte.

    • Half a petabyte of which is porn.
    • by garcia (6573)

      A million gigabytes is what we call a petabyte.

      Oh please. They were quoting a PC World article. PC World is not geared towards geeks because, well, geeks don't really read magazines anymore as they get their geek news from any variety of other sources which offer more credibility, better geek readability, and more in-depth research than talking about Google's 10101010101010100110 million gazillion quadrillion byte database.

      Oh and honestly being that the indexed material I would see returned was already quit

    • by AHuxley (892839)
      Strange a small small number?
      http://www.computerworld.com/s/article/9117159/Teradata_creates_elite_club_for_petabyte_plus_data_warehouse_customers [computerworld.com]
      "eBay, with 5 petabytes of data; Wal-Mart Stores, which has 2.5 petabytes; Bank of America, which is storing 1.5 petabytes; Dell, which has a 1 petabyte data warehouse; and a final bank, with a 1.4 petabyte data"
      http://www.strategypage.com/htmw/htiw/articles/20100322.aspx [strategypage.com]
      talks of 20 petabytes for one small system?
      Anyone have any insights into why/how the n
      • by WCguru42 (1268530)

        I would hope that Bank of America's entire database wasn't indexed. An educated guess would put the large majority of that data as private, secure (for the Internet) data and only a small portion of the data as public, such as their home page and loan advertising.

    • by flanders123 (871781) on Wednesday June 09, 2010 @09:02AM (#32509782)
      Typical humans (non /.-ers, like us) are more familiar with gigabytes, because that is the base unit of measure used in today's PCs, e.g. 6 GB of RAM, 500 GB hard drive.

      The blogger intentionally used GB in order to express the size of the data relative to today's average PC, because she knows her audience. Imagine that.

      Dr Evil: "I demand 100 Petabytes!"
      Tim Robbins: "That number doesn't exist! It's like saying I want a kajillion bajillion gigabytes!"

      Disclaimer: I did not mean to imply you were Dr. Evil.
      • Dr. Evil: Here's the plan. We get the warhead and we hold the world ransom for... ONE THOUSAND Megabytes!
        Number Two: Don't you think we should ask for *more* than a thousand megabytes? A million megabytes isn't exactly a lot of space these days. Dell alone sells laptops over 250 thousand megabytes a year!
        Dr. Evil: Really? That's a lot of space.
        [pause]
        Dr. Evil: Okay then, we hold the world ransom for...
        Dr. Evil: One... Hundred... BILLION megabytes!

      • by rawler (1005089)

        The blogger intentionally used GB in order to express the size of the data relative to today's average PC

        It's probably what the blogger meant, but I've found it's a pretty bad comparison.

        1. An "average PC" hardly exists today, with small cheap netbooks, home-server configurations and all in-between.
        2. Average consumers don't relate to gigabytes anyway. Size is better explained in "number of mp3 files" or "hours of HD-video".

        So, a technically correct, and at the same time explanatory way to put it would be. "100 petabytes (about 3000 years of HD-video)".

        Another topic is what Google stores in their index if i
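The "about 3000 years of HD-video" figure above can be sanity-checked by working out the video bitrate it implies. This is a back-of-the-envelope sketch that assumes decimal petabytes and 365-day years; the constant names are made up for illustration.

```python
# Implied bitrate if 100 PB really holds 3000 years of continuous video.
INDEX_BYTES = 100 * 10**15           # 100 decimal petabytes (assumption)
YEARS_OF_VIDEO = 3000
SECONDS_PER_YEAR = 365 * 24 * 3600   # ignoring leap years

seconds = YEARS_OF_VIDEO * SECONDS_PER_YEAR
bitrate_mbps = INDEX_BYTES * 8 / seconds / 1e6

print(f"implied bitrate: {bitrate_mbps:.1f} Mbit/s")
```

This comes out at roughly 8.5 Mbit/s, which is in a plausible range for compressed HD video, so the comparison holds up as an order-of-magnitude estimate.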

    • by hercubus (755805)

      A million gigabytes is what we call a petabyte.

      They're a family-friendly business, they wouldn't want to use a word like "petabyte" - that sounds kind of dirty.

      As in: at the furry parties Ms. Petabyte was always the most popular; no one could keep their paws off of her.

    • by brxndxn (461473)

      But nobody here can grasp the scales of those measurements. How about you translate those little-known measurements into standard Slashdot summary lingo? For example:

      How many Libraries of Congress is that?

      How many songs does it hold?

      How many digital pictures is it?

      How many truck loads of floppy disks is it?

    • by DarthVain (724186)

      Ya, if you want to make it sound more impressive call it 900,719,925,474,099,200 bits and be done with it already.
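For anyone who wants to check the unit arithmetic in this subthread, here is a short sketch. It shows that 100 million gigabytes is 100 petabytes in decimal units, and that the bit count quoted above matches 100 binary petabytes (pebibytes) rather than decimal ones. The variable names are illustrative.

```python
GB = 10**9     # decimal gigabyte, in bytes
PB = 10**15    # decimal petabyte, in bytes
PIB = 2**50    # binary petabyte (pebibyte), in bytes

# "Nearly 100 million gigabytes", expressed in petabytes.
index_bytes = 100_000_000 * GB
petabytes = index_bytes // PB

# The same figure in bits, under both unit conventions.
bits_decimal = index_bytes * 8
bits_binary = 100 * PIB * 8

print(petabytes)
print(f"{bits_decimal:,}")
print(f"{bits_binary:,}")
```

The decimal convention gives 800,000,000,000,000,000 bits; the binary convention gives 900,719,925,474,099,200 bits, which is the number in the comment.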

      I believe that is 900 quadrillion with a capital Q!

  • and hundreds of terabytes per day. Any word on what they're using for a database back-end?

  • Competition (Score:2, Funny)

    by dugn (890551)
    If it weren't for the competition from Bing, would this have even happened?
    • Re: (Score:2, Insightful)

      by hireawebgeek (1488531)

      If it weren't for the competition from Bing, would this have even happened?

      Probably not, but that's the great thing about competition. The consumer wins when 2 or more businesses compete (most of the time that is).

  • by Anonymous Coward on Wednesday June 09, 2010 @08:25AM (#32509370)

    ... productivity.

    When Google was new it was a wonder. I could use it to help solve problems (such as identifying error codes when the servers went down), locating reviews of products (saving me the expense of subscribing to loads of computer magazines and the time searching through them when I needed to buy something) and finding snippets of code when I needed to develop a program. As the web gets older and older there is more and more out of date information that I have to dig through. Plus when Google (and Yahoo) killed off Usenet (with an assist from Andrew Cuomo) the utility of the Usenet information structure was destroyed (which the world is still trying to recreate with Keywords).

    As Google has added more and more information it gets less and less useful. Plus the rise in SEO makes it even harder to find what I need (But I find lots of useless stuff that people have paid to get put in front of my eyes). Of course it probably isn't in Google's best interest to help me locate information that I need in the most efficient way. The more I have to sort through the crap they now deliver the more ad revenue they generate.

    Too bad Bing sucks. I would really appreciate an alternative to Google.

    • Re: (Score:2, Interesting)

      by KrugalSausage (822589)
      You just haven't adapted along with it. Use search modifiers and your problems will be solved.
    • by ClaraBow (212734)
      Usenet has not been destroyed --yet! I'm hoping for the Usenet renaissance! I'm sure it will be called Usenet 2.0 ;)
    • Re: (Score:2, Insightful)

      by Anonymous Coward

      wrong. they don't pay for showing ads, they pay if YOU click ads.

      if they serve you with crappy results, the advertisement targeted is going to suck.

      on the other hand, if they provide accurate results, there is a chance the ads being shown are interesting to you.
      you don't think google is efficient or helpful?

      go one week not using it and then decide if google is not making you more productive.

    • by eulernet (1132389) on Wednesday June 09, 2010 @08:53AM (#32509694)

      Use Google CodeSearch, it's more adapted to developers:

      http://google.com/codesearch [google.com]

    • by Mascot (120795)

      As the web gets older and older there is more and more out of date information that I have to dig through.

      Google does have an option to filter by age. But I'm a bit puzzled by your examples. Reviews, code samples, error messages, none of which seem to me to be terribly date dependent.

      Reviews are typically for a specific product or version of product. Code snippets don't expire on date. Neither do error messages.

      What can I say, I don't share your experience. Google typically hands me highly relevant results.

      • Re: (Score:3, Insightful)

        by bendodge (998616)

        IMO, real product reviews are hard to find because of SEO. Everything else he mentioned I have no problem with.

        • by Mascot (120795)

          Perhaps this varies by region. I tend to get plenty of "proper" review sites in the top 10 results. More than enough to get the information I need, at least.

      • by vbraga (228124)

        Google does have an option to filter by age.

        Of course it does. Search for whatever you're looking for. Click on "more search options" on the left sidebar. Filter by an age range.

    • Re: (Score:3, Insightful)

      by Dishevel (1105119) *
      It is in Google's best interest to give you the best search results. That is how they got big. They can only sell your eyes if you are using them.
  • Does that mean 67% as stale?
  • Man, that's a lot of data. Anybody have a rough estimate of how much data there is on the web?
  • What is this some hippy-skippy coffee enema of an algorithm? Are they going to try to tell us next that they are building their next datacenter at one of the earth's vortices [vortexmaps.com] to cram some metaphysical in with the metadata ? Hurumpf.
  • by Anonymous Coward on Wednesday June 09, 2010 @08:47AM (#32509612)

    Google has pulled my site robots.txt file 32 times this month and it is only the 9th - about 4 times a day. I'm showing almost 2000 web pages pulled by Google indexers in this same time period. My site is tiny, private, not very large.

    By bandwidth, Google is only 2.4% of the total site traffic, so far, this month.

    I agree Google is "fresher" than they used to be. OTOH, my non-commercial site has approximately doubled readers in each of the last 6 months by publishing 1 new posting about every other day.

    I suspect other, more heavily used sites are hit hourly or even more often by google.

    MSN-Bot appears to visit 10 times a day, but is much more selective about which pages it indexes. Since my site is date organized, this seems smarter than what google does. Sometimes I do edit older stories with new knowledge or corrections, which google will see eventually and MSN will not. Zero referrals from any microsoft searches seen.

    Yahoo! slurp barely touches my site. Only 1 referral has been seen.

    Google sends about 30% of the total traffic, but most is from social networking with "hey, check this out" type referrals. Not bad for a technical article site.
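Numbers like the ones in this comment (robots.txt fetches, pages pulled, and share of bandwidth per crawler) can be tallied from a web server access log. The sketch below assumes the common Apache/nginx "combined" log format and identifies crawlers by substrings in the User-Agent field. The sample log lines, IP addresses, and the `CRAWLERS` mapping are made up for illustration and are not the poster's data.

```python
import re
from collections import Counter

# Hypothetical User-Agent substrings mapped to a label for each crawler.
CRAWLERS = {"googlebot": "Google", "msnbot": "MSN", "slurp": "Yahoo"}

# Combined log format: ... "METHOD path PROTO" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def tally(lines):
    """Count requests, robots.txt fetches, and bytes served per crawler."""
    hits, robots, traffic = Counter(), Counter(), Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        name = next(
            (label for key, label in CRAWLERS.items() if key in agent), "other"
        )
        hits[name] += 1
        if match.group("path") == "/robots.txt":
            robots[name] += 1
        size = match.group("bytes")
        traffic[name] += 0 if size == "-" else int(size)
    return hits, robots, traffic


SAMPLE_LOG = [
    '192.0.2.1 - - [09/Jun/2010:08:00:00 -0400] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.1 - - [09/Jun/2010:08:00:05 -0400] "GET /2010/06/post.html HTTP/1.1" 200 4880 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.2 - - [09/Jun/2010:08:10:00 -0400] "GET /robots.txt HTTP/1.1" 200 120 "-" "msnbot/2.0b"',
    '192.0.2.3 - - [09/Jun/2010:08:20:00 -0400] "GET /index.html HTTP/1.1" 200 15000 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1)"',
]

hits, robots, traffic = tally(SAMPLE_LOG)
total_bytes = sum(traffic.values())
for name in sorted(hits):
    share = 100 * traffic[name] / total_bytes
    print(f"{name}: {hits[name]} hits, {robots[name]} robots.txt, {share:.1f}% of bytes")
```

Matching on User-Agent alone trusts whatever the client claims to be, so a real analysis would likely also verify crawler IP ranges.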

    • by Jainith (153344)

      I can't believe you didn't post a link.

      I mean this is slashdot...getting your site slashdotted is part of the fun.

      -Keith

  • Is this new? (Score:4, Interesting)

    by Brad1138 (590148) <brad1138@yahoo.com> on Wednesday June 09, 2010 @08:50AM (#32509650)
    For a while now I have been noticing my forum posts being indexed within hours of making the post. It's been doing this for a couple of years, I think.
    • by Tacvek (948259)

      Agreed. It has been years since there was a visible "Google dance". Hell, often only minutes after I make a forum post, and then search on the topic to double check what I said, my post is the first thing to show up.

      I'm not sure what caffeine really is, but it does not sound particularly new to me.

  • They should try Amphetamine!

  • by GameMaster (148118) on Wednesday June 09, 2010 @09:03AM (#32509798)

    Google dance if you want to,
    If it helps you search online.
    MSN don't dance,
    and if they don't dance,
    well they're no search engine of mine.

  • by BigBlueOx (1201587) on Wednesday June 09, 2010 @09:14AM (#32509926)
    Ok, what is it with people who write about technical subjects that they think they have to use ridiculous analogies?

    "if this were a pile of paper it would grow three miles taller every second"?? Yes, and if this was a goat it would have a thousand young. WTF. This was a Google blog post, not some story-for-the-terminally-stupid from The Daily Show ferchrissakes. The author even measures storage capacity in the universally used miles-of-iPods.

    What is the sound of one vein popping?
    • But did they do it in terms of football stadiums filled with CDs (to within an accuracy of a Library of Congress)?
    • by Spad (470073)

      The author even measures storage capacity in the universally used miles-of-iPods

      What's that in Libraries of Congress per fortnight?

    • by inio (26835)

      Hey, at least they didn't resort to Libraries Of Congress Per Month.

  • Amazing how human-like these machines get.

    So do you just pour the coffee all over the server, or is there a special intake?

  • Sorry for the silly question, but it's "ready" and it's "announced" and other things, but do any of these mean that it's what's being used today by google.com? If not, is there a date for when it will become the index used for google searches?

    • is there a date for when it will become the index used for google searches?

      The system goes on-line August 4th ...

  • http://tinyurl.com/268rtm6 [tinyurl.com]

    All the results are the same, except for a couple of news stories, but they could have cheated on those. Seems like a titanic waste to have put all this effort into one search word, for no improvement.
