Slashdot
Behind the Scenes At Google
Posted by
CmdrTaco
on Sun Apr 03, 2005 09:48 AM
from the they-should-document-the-cafeteria dept.
An anonymous reader writes "University of Washington TV Presents "Behind the Scenes With Google." From the site: 'Search is one of the most important applications used on the internet and poses some of the most interesting challenges in computer science. Providing high-quality search requires understanding across a wide range of computer science disciplines. In this program, Jeff Dean of Google describes some of these challenges, discusses applications Google has developed, and highlights systems they've built, including GFS, a large-scale distributed file system, and MapReduce, a library for automatic parallelization and distribution of large-scale computation. He also shares some interesting observations derived from Google's web data.'"
Related Stories
Developers: MapReduce — a Major Step Backwards? 157 comments
The Database Column has an interesting, if negative, look at MapReduce and what it means for the database community. MapReduce is a software framework developed by Google to handle parallel computations over large data sets on cheap or unreliable clusters of computers. "As both educators and researchers, we are amazed at the hype that the MapReduce proponents have spread about how it represents a paradigm shift in the development of scalable, data-intensive applications. MapReduce may be a good idea for writing certain types of general-purpose computations, but to the database community, it is: a giant step backward in the programming paradigm for large-scale data intensive applications; a sub-optimal implementation, in that it uses brute force instead of indexing; not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago; missing most of the features that are routinely included in current DBMS; incompatible with all of the tools DBMS users have come to depend on."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Google's dirty secret revealed (Score:5, Funny)
Re:Google's dirty secret revealed (Score:2, Funny)
Re:Google's dirty secret revealed (Score:3, Funny)
Network everybody together, eh? (Score:5, Funny)
Re:Google's dirty secret revealed (Score:5, Funny)
What -- I Have To Watch TV Now? (Score:5, Funny)
I can't absorb information I can't copy/paste.
Fsking video format. (Score:2, Insightful)
Re:Fsking video format. (Score:2)
Re:Fsking video format. (Score:4, Insightful)
Re:Fsking video format. (Score:4, Informative)
Download the
UW mirror (Score:4, Informative)
http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=274 [washington.edu]
Jeff Dean
Abstract Search is one of the most important applications used on the internet, but it also poses some of the most interesting challenges in computer science. Providing high-quality search requires understanding across a wide range of computer science disciplines, from lower-level systems issues like computer architecture and distributed systems to applied areas like information retrieval, machine learning, data mining, and user interface design. I'll describe some of the challenges in these areas and discuss some of the applications that Google has developed over the past few years. I'll also highlight some of the systems that we've built at Google, including GFS, a large-scale distributed file system, and MapReduce, a library for automatic parallelization and distribution of large-scale computation. Along the way, I'll share some interesting observations derived from Google's web data.
Jeff Dean joined Google in 1999 and is currently a Distinguished Engineer in Google's Systems Lab. While at Google he has worked on Google's crawling, indexing, query serving, and advertising systems, implemented several search quality improvements, and built various pieces of Google's distributed computing infrastructure. Prior to joining Google, he was at DEC/Compaq's Western Research Laboratory. He received a Ph.D. from the University of Washington in 1996 working with Craig Chambers on compiler optimization techniques for object-oriented languages.
OK then where the hell is (Score:2, Interesting)
i.e.
((gopher OR shrew OR egret) AND -(mole OR newt)) NEAR(range) ((evil OR "satan incarnate") AND (roe AND -chicken))
"In Italy for thirty years under the Borgias they had warfare, terror, murder and bloodshed but they produced Michelangelo, Leonardo da Vinci and the Renaissance. In Switzerland, they had brotherly love; they had five hundred years of democracy and peace and what did they produce? The cuckoo clock." -- Orson Welles (1915-1985)
G4/TechTV (Score:2, Insightful)
Re:G4/TechTV (Score:5, Insightful)
5.6 Mbps? (Score:2, Funny)
I use Google at work (Score:2, Interesting)
Now I have some pretty important lists which I need to keep tight control over. The information really ought not be distributed outside my office. However, because of the nature of my business, I must do frequent searches using various search engines to fill in my lists.
How am I assured that my searches remain anonymous and secure with Google?
Re:I use Google at work (Score:5, Informative)
If you want to keep something private, don't put it on the publicly accessible internet. Including searches. Duh.
How am I assured that my searches remain anonymous and secure with Google?
You aren't. Did you sign a contract to that effect? No.
And frankly, if you can find things with google, it isn't too secret.
Re:I use Google at work (Score:4, Funny)
You are about as anonymous as it gets.
Re:I use Google at work (Score:2, Funny)
Re:I use Google at work (Score:5, Insightful)
b) Use a different anonymizing proxy for _each_ single search, preferably using SSL.
c) Assume your searches AND non-encrypted web requests aren't anonymous and secure.
If I were running the NSA or some other spook agency, I'd tap the pipes leading to Google (and a few other sites too).
Same if I were a dubious org/agency.
Lots of finance institutions/orgs/ppl get the bulk of their info from just a few sources e.g. Bloomberg. So if Bloomberg gets/sends the bulk of their info down just a few pipes...
Re:I use Google at work (Score:3, Funny)
"Hi Receptionist, I'm looking at you" [google.com]
Few women in CS. (Score:3, Interesting)
50% female is the goal (Score:5, Interesting)
One of the technical female Googlers mentioned that this was probably impossible, but that by shooting for the impossible you achieve a lot more than you would have otherwise.
Re:50% female is the goal (Score:3, Informative)
Google & Backup (Score:3, Interesting)
Backups are for pussies. (Score:2, Funny)
Re:Google & Backup (Score:2, Insightful)
Images of clowns (Score:2, Interesting)
Re:Images of clowns (Score:2, Funny)
"Behind the scenes at Google" invokes images of clowns and mimes. Is it just me?
Yup - it's only you.
GFS (Score:2, Insightful)
It's quite nice to see a large corporation make a contribution to Open Source, especially in such a "R&D-esque" field as supercomputing.
Who said that Open Source only rehashes existing technologies and never does anything new?
Re:GFS (Score:4, Interesting)
I'm sorry, did I miss the point at which Google made an open source implementation of GFS? Last I knew, the only docs for GFS were the papers that Google published on the concept. And those papers (unfortunately) seemed to lack a few of the finer details of implementation.
Re:GFS (Score:5, Informative)
Here's Red Hat:
http://www.redhat.com/software/rha/gfs/ [redhat.com]
Here's Google:
http://www.cs.rochester.edu/sosp2003/papers/p125-
http://64.233.161.104/search?q=cache:m0TMQYgIlIoJ
Re:GFS (Score:4, Insightful)
Considering that it's in vogue to name file systems with one letter in front of "FS"? About 1 in 26. The odds are even better if you discount commonly used file systems such as XFS, UFS, FFS, NFS, and JFS.
Re:GFS (Score:2, Informative)
WTFV? (Score:5, Funny)
The average slashdotter has an attention span of 5 secon.. ooh look a birdie!
MiMMS (Score:3, Informative)
mimms mms://media-wm.cac.washington.edu/ifs/uw_cse05_google_1300k.asf
Of course, a torrent would be even bette
here is a transcript of the first 12 minutes (Score:3, Informative)
(speech from this point...)
lots of people use google but i want to give you a flavour of what happens and what we are working on for our new systems and products. i'll focus on the interesting problems that crop up when you organize large amounts of information, like we do, and what you can do with lots of data and computational resources. i'll also talk about our engineering organization.
google has a mission statement that i like - to organize the world's information and make it universally accessible and useful. we've moved from web searching to mail and news and searching books by scanning/ocr'ing them. this mission statement covers everything and means we won't run out of work!
a lot of our issues are to do with scale. we have 4B webpages at an average of 10 kB/page, and lots and lots of searches per second. it's a big problem but you solve it with lots of computers and disks and network them well.
dealing with scale comes up in a number of areas. hardware/network: what do you use. distributed systems: dealing with unreliable things. algorithms/structures: processing efficiently and in interesting ways. machine learning/info retrieval: improving quality of results by analyzing lots of data. user interfaces: we haven't done much on this yet but it would be interesting to provide new and interesting ways to navigate and refine the query by doing better things than just typing in new query words - i'd expect to see more developments in this area.
one thing we've made a decision about is that we tend to build on low cost commodity PCs. example setup: ibm eserver xseries 440, 8 2-ghz xeons, 64GB ram, 8TB disk = $758,000. we use this: 88 machines that total 172 2-ghz xeons, 176GB ram, ~7TB disk = $278,000. this is about 1/3 the price, with more cpu.
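A rough check of the comparison above, as a small sketch. The figures are taken exactly as transcribed (including the 172-CPU count); treating the prices as US dollars and the `Config` and `ratios` names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """One hardware option, with figures copied from the transcript."""
    name: str
    cpus: int
    ram_gb: int
    disk_tb: float
    price: int  # assumed to be US dollars


big_server = Config("ibm eserver xseries 440", cpus=8, ram_gb=64, disk_tb=8, price=758_000)
pc_rack = Config("88 commodity PCs", cpus=172, ram_gb=176, disk_tb=7, price=278_000)


def ratios(cheap: Config, expensive: Config) -> dict:
    """Return how the cheap option compares to the expensive one, per resource."""
    return {
        "price": cheap.price / expensive.price,
        "cpus": cheap.cpus / expensive.cpus,
        "ram": cheap.ram_gb / expensive.ram_gb,
        "disk": cheap.disk_tb / expensive.disk_tb,
    }


r = ratios(pc_rack, big_server)
for resource, factor in r.items():
    print(f"{resource}: {factor:.2f}x")
```

On these numbers the rack costs a bit over a third of the single server while offering roughly 21 times the CPUs and nearly three times the RAM, with slightly less disk.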
google was founded in 97 by two people at stanford working on interesting ways to use the search, but they needed new hardware to do this. they'd go to the loading dock and offer to set up machines for other research projects - but keep them for a while themselves to get work done. over time google was formed in 1999, and we've learned a lot since then - such as how to scale better and have good datacenter practices.
hosting centers were charging by the square foot, which is strange since their costs come from things like cooling and electricity, so we got good at putting a lot of servers in one place. we are now very good at setting up large clusters quickly, such as our gigantic 2001 datacenter move, configured in 3 days.
if you have that many machines you have to worry about failure. a single machine might fail once every thousand days, but thousands of machines mean at least a failure a day. you have to deal with this in software with replication and redundancy. one nice property of dealing with this problem is that having six copies for capacity reasons also means we now have six copies available for distributed application and load balancing. a lot of the applications we deal with are read-only, which helps make handling so many queries easy.
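A back-of-the-envelope sketch of the failure arithmetic above. It assumes machines fail independently and that "one failure every thousand days" can be read as a fixed daily failure probability of 1/1000 per machine; the fleet sizes and function names are illustrative, not figures from the talk.

```python
def expected_failures_per_day(machines: int, days_between_failures: float = 1000.0) -> float:
    """Expected number of machine failures per day across the fleet."""
    return machines / days_between_failures


def prob_at_least_one_failure(machines: int, days_between_failures: float = 1000.0) -> float:
    """Probability that at least one machine fails on a given day,
    assuming independent failures with daily probability 1/days_between_failures."""
    p_fail = 1.0 / days_between_failures
    return 1.0 - (1.0 - p_fail) ** machines


# Hypothetical fleet sizes, chosen only to show the trend.
for n in (1, 1_000, 5_000, 10_000):
    print(
        f"{n:>6} machines: "
        f"{expected_failures_per_day(n):.3f} expected failures/day, "
        f"P(at least one) = {prob_at_least_one_failure(n):.4f}"
    )
```

Under these assumptions a fleet of 1,000 machines already averages one failure per day, and with several thousand machines a failure-free day becomes very unlikely, which is why the handling has to live in software.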
the director... (Score:4, Funny)
Pfffft. (Score:3, Funny)
Behind the scenes? (Score:5, Interesting)
That having been said, as a long time insider I have a pretty good idea about what really happens "behind the scenes" and let me tell you, both the conspiracy-theory crackpots and our Slashdot fanboys are quite amusing, but the boring fact is that we are neither trying to take over the world, nor are we the best thing since the second coming of Jesus.
We used to be a very successful startup, yes, and now we are a fairly successful corporation. Yes, there are a lot of smart people working here, but don't fool yourself, "the most interesting challenges in computer science" are happening in academia, not in corporations. (Besides, anyone who knows Jeff is perfectly aware that he often tends to grossly exaggerate our importance, but to be honest that is a part of his job, which he does really well.)
All in all, I love to work here, I think there are a lot of very smart people here, but if you think that we are the only place on the planet where geniuses cluster lately, you are just not being reasonable. If you want to find real discoveries you have to look in places where people don't have shareholders telling them what to do. The point is that we haven't done anything new per se, only the scale of our implementations is unprecedented.
For example, in my 20% time (Google allows us to spend 20% of paid work time on personal projects) I am working with KeyKOS right now and let me tell you, this is what I call innovation. It was done in the '70s and no mainstream OS has implemented its ideas to this day. I'm sure that when, a decade or two from now, a Big Corporation (be it Google, Microsoft, Apple, or IBM) reimplements KeyKOS, the Slashdot crowd will wet their pants screaming "wow, what an innovation!" completely forgetting that it was an innovation back in the 1970s, when Norm Hardy et al. were working on it quietly with no buzz or fanfare. Please remember that "The Next Big Thing" is always an old idea, but this time backed with $$$ and marketing. Please never forget it, or otherwise the people who are worth their salt will only consider you uneducated.
University Recruiting Talks (Score:4, Interesting)
They aren't really news worth reporting on slashdot, since they all contain the same content.
Equal Time (Score:5, Informative)
Booooooooooring... (Score:3)
Dan East
(finally able to post for the first time in two weeks - wonder if anyone else had a problem)
Google innovates? It's news to me. (Score:5, Interesting)
Some of the other search engines are comparable in quality to Google (Teoma [teoma.com], Vivisimo [vivisimo.com]), and may be better, depending on how many points you take away from Google for spam-infested results, too many blogs, too many Wikipedia clones, too many commercial sites, etc. And some sites are so much further along on the innovation scale (meet BrainBoost [brainboost.com], an artificially intelligent Internet reference desk answering any questions asked in natural English, with amazing quality and accuracy in a very friendly and usable interface) that they put Google to shame.
Re:mediocre or no Linux support! (Score:3, Informative)
Re:mediocre or no Linux support! (Score:3, Insightful)
Re:want real dirt? go to www.fuckedgoogle.com (Score:2, Insightful)
Re:want real dirt? go to www.fuckedgoogle.com (Score:3, Insightful)
Just like Slashdot then? Except this fuckedgoogle site has the opposite viewpoint. How is it OK to be biased in one direction, but not the other? Why is it that some people on this site seem to have a vested interest in quashing any criticism of their favourite giant corporation? What have you got to hide?
Dirt? That's more like modelling clay (Score:4, Insightful)
Think about it, if someone really hated any of the Fortune 500 companies and bothered to dig up some dirt, there'd be tons more dirt.
I suppose Google is a young company. Give it a few more years and more parasites will have found their way into Google. Then you'd have a lot more dirt.
Re:Dirt? That's more like modelling clay (Score:3, Insightful)
So either put up (evidence) or shut up.