New Google Search Index 50% Fresher With Caffeine
Ponca City, We love you writes "When Google started, it would only update its index every four months. Then, around 2000, it started indexing every month in a process called the 'Google dance' that took a week to 10 days and would provide different results when searching for the same term from different Google data centers. Now PC World reports that Google has introduced a new web indexing system called Caffeine, which delivers results that are closer to 'live' by analyzing the web in small portions and updating the index on a continuous basis. 'Caffeine lets us index web pages on an enormous scale,' writes Carrie Grimes on the official Google Blog. 'Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day.' Now not only does Caffeine provide results that are 50% fresher than Google's last index, adds Grimes, but the new search index provides a robust foundation that will make it possible for Google to build a faster and more comprehensive search engine that scales with the growth of information online."
Altavista (Score:3, Funny)
I miss the days when Altavista was king (purely nostalgia, I assure you). I don't, however, miss getting marked down in Spanish class due to using BabelFish -_-;;
Goddamn Whippersnapper! (Score:2)
When I was in Spanish class I got marked down for cheating off the hispanic stoner behind me, and I liked it!
All you kids with your interwebs, and your babbling fishes can get off my lawn!
Re:Altavista (Score:5, Funny)
I miss the days when Altavista was king (purely nostalgia, I assure you). I don't, however, miss getting marked down in Spanish class due to using BabelFish -_-;;
This reminds me of one of my funniest memories from middle school: The Spanish teacher hands back a paper with a big red "F" on it to the guy sitting in front of me. She says: "This is very good.....But, it's in French"
Back in the day, refreshing BabelFish would cause the options to default back to English->French.
Re: (Score:2)
Back in the day BabelFish would only convert X amount of characters or words if you entered the text; however, it would do entire webpages.
It's how I learned basic HTML. I set up my own GeoCities account, copied and pasted my project in, and kept refreshing the translation page.*
* I did try and not cheat so I only did Spanish to English. If I misspelled words or had bad grammar they'd usually show up.
Re:Yahoo! (Score:2)
I kinda liked the human-generated Yahoo! index / hierarchy, it was a neat way to get started with the web, back when it wasn't all too big and time-sensitive to organize by hand.
I'd use yahoo mail more, if they even bothered trying to be competitive with gmail. But I don't really want to pay extra for the plus account just to get minimum necessities like forwarding and pop3 access on what is essentially now my spam account.
Re: (Score:2)
Actually, I wish AltaVista still existed (its back end became Yahoo--useless), because you could do a literal search that Google no longer can do.
Originally Google achieved success not with tons of results (like everyone else at the time), but specific results of just what you were seeking.
Nowadays, they throw in tons of results and the kitchen sink, including variations in spelling and alternatives that you don't want.
It used to be Google was better because the first results page had a useful link. Nowad
Re:Altavista (Score:5, Insightful)
I miss the days when Google was a simple, plain HTML page resulting from the fact that it was driven by its designers and users. Now arrogant marketing VPs with no clue whatsoever push on us "features" like fade-ins (which do wonders when viewed over RDP and VNC links) and side bars while ignoring [google.com] all [google.com] negative [google.com] feedback [google.com] and making sure that no opt-out is possible to stroke their towering egos by pretending that everyone loves their "innovations". Otherwise 80% of users would have it off in an instant and the "innovator" VP's stupidity would register with some other VPs at Google HQ and give them ammo in some back-stabbing corporate ladder-climbing moves.
In other words I miss the days before Google jumped the shark.
Wow! (Score:4, Funny)
I found this post at google before I wrote it.
Re: (Score:3, Funny)
http://en.wikipedia.org/wiki/Thiotimoline [wikipedia.org]
It's a trick (Score:2, Funny)
"Caffeine" is a NSA code word for a mind controle satellite they build with GOOGLE/Italian money on loan from Chinese Muslim Islamo-Communist sorcerers and vegetarians. It will probably be used to sell your daughters into slavery in Mexico via facebook. That is why our SAVIOR OBAMA must continue to wage the WAR FOR FREEDOM at all costs, because if not the evil Italian axis will enslave us all!!!!!!!!!!!
Re: (Score:2)
"Caffeine" is a NSA code word for a mind controle satellite they build with GOOGLE/Italian money on loan from Chinese Muslim Islamo-Communist sorcerers and vegans. It will probably be used to sell your daughters into slavery in Mexico via facebook. That is why our SAVIOR OBAMA must continue to wage the WAR FOR FREEDOM at all costs, because if not the evil Italian axis will enslave us all!!!!!!!!!!!
FTFY
Caffeine?! (Score:2)
The thing is... what's the story behind this very name? Why Caffeine?! :p
Re:Caffeine?! (Score:4, Insightful)
because the results will now be fairly half-assed and kind of jittery? On a related note, what's with Apple pimping Bing all of a sudden?
Re: (Score:3, Informative)
On a related note, what's with Apple pimping Bing all of a sudden?
Because, at this point, Google is more of a threat than Microsoft. Apple knows that the chances of OSX catching up to Windows in terms of market share are practically zero. However, Android poses a credible threat to Apple's mobile popularity here in America.
Re: (Score:3, Insightful)
The only way that OS X would catch up to Windows in terms of market share is if either A) they dramatically dropped the price point for Macs, or B) they licensed the software for white-box PCs. In either case, their brand would be diluted. They sort of thrive on a high-margin, low-volume model, and I'm not sure they were ever really competing with Microsoft in the way people imagine, especially being primarily a hardware company from the start.
Re: (Score:3, Interesting)
Apple has a long standing friendly relationship with Microsoft. They even turned to Microsoft to bail them out of a big financial mess not so many years ago.
Yes, this is contrary to Apple's television advertisements... but those aren't reality.
Re: (Score:2)
Especially since, heyguesswhat, a Mac is a PC, too.
Re: (Score:2)
Especially since, heyguesswhat, a Mac is a PC, too.
After you install Windows or Linux on it of course.
Re: (Score:2)
A PC is defined by hardware, not software. For all intents and purposes modern Macs are PCs. The only real difference is they don't use the same BIOS deal or whatever, but that is largely irrelevant (and apparently PCs from other companies are looking to ditch it too).
Re: (Score:3, Insightful)
Calling a Mac a PC is disingenuous much in the same way as calling a cordless phone a mobile phone. Yes, your cordless phone is mobile in the technical sense, but common usage has given the words distinct meanings. Mobile no longer only refers to the fact that it enables mobility, and PC no longer only refers to the fact that it's your own personal computer rather than a server or mainframe.
You: "Hey man, I got a new PC the other day."
Friend: "Cool, dude! What kind did you get?"
You: "An iPhone."
Friend: "U
Re: (Score:2)
Microsoft didn't bail Apple out of anything. I forget the exact reason why, but Microsoft bought $150 million worth of Apple stock at a time when Apple had billions of dollars in cash sitting in the bank.
You forget the exact reason why because you blocked out the stuff you couldn't cope with. Apple was losing the better part of a billion dollars a year by this point, and was in the midst of a restructuring to save itself... but the restructuring wasn't enough. They needed cash. Cash Cash Cash.
Here is a nice video demonstrating the 'Who run barter town?' relationship. [youtube.com]
"We believe that Internet Explorer is a really good browser" - Steve Jobs, 1997.
Re: (Score:2)
I think the way Apple sees it is that, by providing the hardware, software, and services together, they offer a complete package in a way that Microsoft and Dell just can't. They consider it a premium product which is worth the higher price, just as a Porsche costs more than a Ford. So they are not interested in lowering the price points or breaking up the product into separate parts.
Re: (Score:2)
I think the way Apple sees it is that, by providing the hardware, software, and services together, they offer a complete package in a way that Microsoft and Dell just can't.
I would agree with you if Apple actually manufactured all of their hardware...but if you open up an Apple device, literally everything is manufactured by someone else. I know Apple sells branded hardware, but because they don't actually produce any of it, I don't consider them to be a hardware company.
Re: (Score:2)
Because Caffeine is more publicly acceptable than Crystal Meth?
With the onset of social websites like Facebook (Score:4, Funny)
Re:With the onset of social websites like Facebook (Score:4, Insightful)
Half joking, but it would be great if the indexing was done at a particular time every month like the old system, but the moment of indexing was public. Then, at that time, all Facebook users could go and untag and delete anything that may have been wholesome enough to not warrant immediate removal but yet still be considered something that shouldn't be indexed for all eternity.
If you don't want it indexed for all eternity, don't post it on the web.
Even if you knew when Google was coming and you took it down, you have no influence over anyone else out there who may have saved that incriminating evidence. Anyone out there can take a screenshot and post it themselves.
Re: (Score:2)
You can delete stuff on Facebook? I thought Zuckerberg was doing a damn fine job of indexing for all eternity already :P
Caffeine (Score:5, Funny)
The Caffeine project is approved. The system goes on-line June 9th, 2010. Human decisions are removed from search engine results. Caffeine begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug.
Re: (Score:2)
The sequel: Bing strikes back, turning every result into a "Dickroll" and rendering Bing more useful than it was.
Re: (Score:2)
Don't worry, it's just CADIE experimenting:
http://www.google.com/intl/en/landing/cadie/index.html [google.com]
Re: (Score:2, Offtopic)
Nuke the site from orbit - it's the only way to be sure.
(And yes, I know I've jumped to a different film - bonus points for anyone who can name the common actor. I play a game where I can link any movie to the one I just quoted using actors / actresses... the worst I ever did was six-degrees-of-separation).
Re: (Score:2)
http://www.imdb.com/name/nm0001280/ [imdb.com]
yes I cheated.
Re: (Score:2)
But they can't find it.
It's called the metric system. Use it. (Score:5, Informative)
Caffeine takes up nearly 100 million gigabytes of storage in one database
A million gigabytes is what we call a petabyte.
Re: (Score:2)
A million gigabytes is what we call a petabyte.
Oh please. They were quoting a PC World article. PC World is not geared towards geeks because, well, geeks don't really read magazines anymore as they get their geek news from any variety of other sources which offer more credibility, better geek readability, and more in-depth research than talking about Google's 10101010101010100110 million gazillion quadrillion byte database.
Oh and honestly being that the indexed material I would see returned was already quit
Re: (Score:2)
http://www.computerworld.com/s/article/9117159/Teradata_creates_elite_club_for_petabyte_plus_data_warehouse_customers [computerworld.com]
"eBay, with 5 petabytes of data; Wal-Mart Stores, which has 2.5 petabytes; Bank of America, which is storing 1.5 petabytes; Dell, which has a 1 petabyte data warehouse; and a final bank, with a 1.4 petabyte data"
http://www.strategypage.com/htmw/htiw/articles/20100322.aspx [strategypage.com]
talks of 20 petabytes for one small system?
Anyone have any insights into why/how the n
Re: (Score:2)
I would hope that Bank of America's entire database wasn't indexed. An educated guess would put the large majority of that data as private, secure (for the Internet) data and only a small portion of the data as public, such as their home page and loan advertising.
Re:It's called the metric system. Use it. (Score:5, Insightful)
The blogger intentionally used GB in order to express the size of the data relative to today's average PC, because she knows her audience. Imagine that.
Dr Evil: "I demand 100 Petabytes!"
Tim Robbins: "That number doesn't exist! It's like saying I want a kajillion bajillion gigabytes!"
Disclaimer: I did not mean to imply you were Dr. Evil.
Re: (Score:2)
Dr. Evil: Here's the plan. We get the warhead and we hold the world ransom for... ONE THOUSAND Megabytes!
Number Two: Don't you think we should ask for *more* than a thousand megabytes? A million megabytes isn't exactly a lot of space these days. Dell alone sells laptops over 250 thousand megabytes a year!
Dr. Evil: Really? That's a lot of space.
[pause]
Dr. Evil: Okay then, we hold the world ransom for...
Dr. Evil: One... Hundred... BILLION megabytes!
Re: (Score:2)
Still, at today's hard drive prices, that's only about $87.
Re: (Score:2)
100 000 terabytes for $87? I'd buy that.
Re: (Score:2)
The blogger intentionally used GB in order to express the size of the data relative to today's average PC
It's probably what the blogger meant, but I've found it's a pretty bad comparison.
1. An "average PC" hardly exists today, with small cheap netbooks, home-server configurations and all in-between.
2. Average consumers don't relate to gigabytes anyway. Size is better explained in "number of mp3 files" or "hours of HD-video".
So, a technically correct, and at the same time explanatory, way to put it would be: "100 petabytes (about 3000 years of HD-video)".
Another topic is what Google stores in their index if i
Re: (Score:2)
A million gigabytes is what we call a petabyte.
They're a family-friendly business, they wouldn't want to use a word like "petabyte" - that sounds kind of dirty.
As in: at the furry parties Ms. Petabyte was always the most popular; no one could keep their paws off of her.
Re: (Score:2)
But nobody here can grasp the scales of those measurements. How about you translate those little-known measurements into standard Slashdot summary lingo? For example:
How many Libraries of Congress is that?
How many songs does it hold?
How many digital pictures is it?
How many truck loads of floppy disks is it?
Re: (Score:2)
Ya, if you want to make it sound more impressive call it 900,719,925,474,099,200 bits and be done with it already.
I believe that is 900 quadrillion with a capital Q!
Re: (Score:3, Informative)
By saying "A million gigabytes is what we call a petabyte.", the GP obviously implied that the article should have used "100 Petabytes"; after all, he didn't say "100 million gigabytes is what we call a petabyte."
Re: (Score:2)
No, if you're talking about gigabytes rather than gibibytes, I think you can safely assume you should be talking about petabytes.
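For anyone who wants to check the numbers being thrown around in this subthread, here is a minimal Python sketch of the arithmetic. The only input is the blog post's "nearly 100 million gigabytes", taken at face value; the constant and function names are made up for illustration. It suggests that the figure is 100 PB with decimal (SI) prefixes, and that the 900,719,925,474,099,200-bit figure quoted above only comes out if you read the index as exactly 100 PiB (binary prefixes).

```python
# Decimal (SI) and binary (IEC) prefixes, expressed in bytes.
GB, PB = 10**9, 10**15
PiB = 2**50

# "nearly 100 million gigabytes", taken at face value (an assumption).
index_bytes_si = 100_000_000 * GB


def in_units(n_bytes, unit):
    """Express a byte count in the given unit (the unit is itself a byte count)."""
    return n_bytes / unit


# Decimal reading: 100 million GB is 100 PB.
print(in_units(index_bytes_si, PB))             # 100.0

# The same number of bytes is a bit under 89 PiB.
print(round(in_units(index_bytes_si, PiB), 1))  # 88.8

# Binary reading: exactly 100 PiB, counted in bits.
index_bits_binary = 100 * PiB * 8
print(f"{index_bits_binary:,}")                 # 900,719,925,474,099,200
```

So "petabytes" is the right word for the decimal reading, and the two readings differ by roughly 12.6% at this scale.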
That's a hundred petabytes of storage (Score:2)
and hundreds of terabytes per day. Any word on what they're using for a database back-end?
Re:That's a hundred petabytes of storage (Score:5, Informative)
Re: (Score:2)
I actually have a copy of that. It's in a hell of a box.
Re: (Score:2)
and hundreds of terabytes per day. Any word on what they're using for a database back-end?
Something that you don't need to worry about using. And neither does anyone else who is "thinking about switching to whatever it is".
Re: (Score:2)
and hundreds of terabytes per day. Any word on what they're using for a database back-end?
Microsoft SQL Server 2000
Express Edition.
Competition (Score:2, Funny)
Re: (Score:2, Insightful)
If it weren't for the competition from Bing, would this have even happened?
Probably not, but that's the great thing about competition. The consumer wins when 2 or more businesses compete (most of the time that is).
And yet Google adds less and less to my .... (Score:4, Interesting)
... productivity.
When Google was new it was a wonder. I could use it to help solve problems (such as identifying error codes when the servers went down), locate reviews of products (saving me the expense of subscribing to loads of computer magazines and the time searching through them when I needed to buy something) and find snippets of code when I needed to develop a program. As the web gets older and older there is more and more out-of-date information that I have to dig through. Plus, when Google (and Yahoo) killed off Usenet (with an assist from Andrew Cuomo), the utility of the Usenet information structure was destroyed (which the world is still trying to recreate with keywords).
As Google has added more and more information it gets less and less useful. Plus the rise in SEO makes it even harder to find what I need (But I find lots of useless stuff that people have paid to get put in front of my eyes). Of course it probably isn't in Google's best interest to help me locate information that I need in the most efficient way. The more I have to sort through the crap they now deliver the more ad revenue they generate.
Too bad Bing sucks. I would really appreciate an alternative to Google.
Re: (Score:2, Insightful)
Wrong. Advertisers don't pay Google for showing ads; they pay when YOU click them.
If Google serves you crappy results, the ad targeting is going to suck.
On the other hand, if it provides accurate results, there is a chance the ads being shown will interest you.
You don't think Google is efficient or helpful?
Go one week without using it and then decide whether Google is making you more productive.
Re:And yet Google adds less and less to my .... (Score:5, Interesting)
Use Google CodeSearch, it's better suited to developers:
http://google.com/codesearch [google.com]
Re: (Score:2)
Google does have an option to filter by age. But I'm a bit puzzled by your examples. Reviews, code samples, error messages, none of which seem to me to be terribly date dependent.
Reviews are typically for a specific product or version of product. Code snippets don't expire on date. Neither do error messages.
What can I say, I don't share your experience. Google typically hands me highly relevant results.
Re: (Score:3, Insightful)
IMO, real product reviews are hard to find because of SEO. Everything else he mentioned I have no problem with.
Re: (Score:2)
Perhaps this varies by region. I tend to get plenty of "proper" review sites in the top 10 results. More than enough to get the information I need, at least.
Re: (Score:2)
Did you read that "does" as "does not", by any chance?
50% fresher? (Score:2)
Re: (Score:2)
You'd have to ask Google's new VP of Marketing, Jonathan del Monte.
100 Petabytes (Score:2)
50% fresher with caffeine? (Score:2)
32 Google indexer visits this month (Score:5, Interesting)
Google has pulled my site's robots.txt file 32 times this month and it is only the 9th - about 4 times a day. I'm showing almost 2000 web pages pulled by Google indexers in this same time period. My site is tiny and private.
By bandwidth, Google is only 2.4% of the total site traffic, so far, this month.
I agree Google is "fresher" than they used to be. OTOH, my non-commercial site has approximately doubled readers in each of the last 6 months by publishing 1 new posting about every other day.
I suspect other, more heavily used sites are hit hourly or even more often by Google.
MSN-Bot appears to visit 10 times a day, but is much more selective about which pages it indexes. Since my site is date organized, this seems smarter than what Google does. Sometimes I do edit older stories with new knowledge or corrections, which Google will see eventually and MSN will not. Zero referrals from any Microsoft searches seen.
Yahoo! slurp barely touches my site. Only 1 referral has been seen.
Google sends about 30% of the total traffic, but most is from social networking with "hey, check this out" type referrals. Not bad for a technical article site.
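If you want to pull the same kind of numbers out of your own logs, here is a rough Python sketch. It assumes the server writes Apache-style Combined Log Format and that crawlers can be recognised by a substring of their User-Agent; the sample lines, the `CRAWLERS` table and the function name are all made up for illustration, not taken from the poster's site.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# User-Agent substring -> label (illustrative, not exhaustive).
CRAWLERS = {"Googlebot": "Google", "msnbot": "MSN", "Slurp": "Yahoo"}


def robots_fetches(lines):
    """Count /robots.txt requests per known crawler."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or match.group("path") != "/robots.txt":
            continue
        for needle, label in CRAWLERS.items():
            if needle in match.group("agent"):
                counts[label] += 1
    return counts


# Made-up sample lines (192.0.2.x is a documentation-only address range).
SAMPLE = [
    '192.0.2.10 - - [09/Jun/2010:01:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [09/Jun/2010:01:00:05 +0000] "GET /2010/06/post.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.20 - - [09/Jun/2010:02:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.10 - - [09/Jun/2010:07:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

counts = robots_fetches(SAMPLE)
print(counts["Google"], counts["MSN"], counts["Yahoo"])  # 2 1 0

# The figures from the post: 32 fetches by the 9th of the month.
fetches_per_day = 32 / 9
print(round(fetches_per_day, 2))  # 3.56
```

Matching on User-Agent is the lazy option: anyone can claim to be Googlebot, so treat the counts as an upper bound unless you also verify the source addresses.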
Re: (Score:2)
I can't believe you didn't post a link.
I mean this is slashdot...getting your site slashdotted is part of the fun.
-Keith
Is this new? (Score:4, Interesting)
Re: (Score:2)
Agreed. It has been years since there was a visible "Google dance". Hell, often only minutes after I make a forum post, and then search on the topic to double check what I said, my post is the first thing to show up.
I'm not sure what caffeine really is, but it does not sound particularly new to me.
Oh yeah? (Score:2)
They should try Amphetamine!
Google Dance (Score:5, Funny)
Google dance if you want to,
If it helps you search online.
MSN don't dance,
and if they don't dance,
well they're no search engine of mine.
Just A Minor Rant (Score:5, Funny)
"if this were a pile of paper it would grow three miles taller every second"?? Yes, and if this was a goat it would have a thousand young. WTF. This was a Google blog post, not some story-for-the-terminally-stupid from The Daily Show ferchrissakes. The author even measures storage capacity in the universally used miles-of-iPods.
What is the sound of one vein popping?
Re: (Score:2)
The author even measures storage capacity in the universally used miles-of-iPods
What's that in Libraries of Congress per fortnight?
Re: (Score:2)
Hey, at least they didn't resort to Libraries Of Congress Per Month.
The Googlebot and I have something in common (Score:2)
Amazing how human-like these machines get.
So do you just pour the coffee all over the server, or is there a special intake?
When will it go online? Is it operating already? (Score:2)
Sorry for the silly question, but it's "ready" and it's "announced" and other things, but do any of these mean that it's what's being used today by google.com? If not, is there a date for when it will become the index used for google searches?
Re: (Score:2)
The system goes on-line August 4th ...
seems a waste (Score:2)
http://tinyurl.com/268rtm6 [tinyurl.com]
All the results are the same, except for a couple of news stories, but they could have cheated on those. Seems like a titanic waste to have put all this effort into one search word, for no improvement.
Re: (Score:2)
AFAIK java is in heavy use at google
Basically, when you look at the 800 pound gorilla heavy duty platforms, it's either .NET or Java. Guess which one isn't an option at Google?
Re: (Score:2)
And Python.. Google also uses a lot of Python.
Re: (Score:2)
Python is one of many languages with an implementation that runs on the Java platform.
Re: (Score:2)
When I said 800 pound gorilla heavy duty, I meant it. Good luck writing a Google-scale web-app in C/C++.
Re: (Score:2)
C and C++ are languages, not platforms, and GP referred to platforms. Java is both the name of a language, and the name of a platform for which the Java language is the primary (but far from only) language.
Re: (Score:3, Interesting)
AFAIK java is in heavy use at google
Java is in heavy use at Google, but in other places - there is no Java involved in serving a search query. With search, it's C++ all the way down.