NCSA Compares Google and Yahoo Index Numbers



Posted by ScuttleMonkey on Monday August 15, 2005 @02:11PM from the searching-for-truth dept.

chrisd (former Slashdot editor and now Google employee) writes "Recently, Yahoo claimed an increase of index size to "over 20 billion items", compared to Google's 8.16 billion pages. Now, researchers at NCSA have done their own, independent, comparison of the two engines. "

This discussion has been archived. No new comments can be posted.


They might have a larger index file (Score:4, Insightful)

by BlackCobra43 ( 596714 ) writes: on Monday August 15, 2005 @02:14PM (#13323006)

but they can't sift through it nearly as well as Google, so what does it matter? Even if you have a bigger dictionary, if you can't speak English at all it won't do you much good.

Flawed conclusion? (Score:5, Insightful)

by Prong_Thunder ( 572889 ) writes: on Monday August 15, 2005 @02:16PM (#13323025)

Sorry, but if Google consistently returns more results, it could just as easily mean that the filtering isn't as good.

I still prefer Google though.

Re:Accurate results? (Score:1, Insightful)

by Anonymous Coward writes: on Monday August 15, 2005 @02:18PM (#13323036)

Actually, this was the result of a bloggers' linking campaign to do just that.

In response, you can see Michael Moore in the #2 position.

English Language (Score:4, Insightful)

by morcheeba ( 260908 ) * writes: on Monday August 15, 2005 @02:18PM (#13323038) Journal

They only used words from the English Ispell word list. Besides the English-language bias, this is probably limited in other ways. News websites use a limited vocabulary, but a lot of proper names -- so if one engine indexed these better, they wouldn't necessarily get a better rating. News sites are also very dynamic and have a large number of webpages, so they would be influential in the count.

Yahoo returns dupes... (Score:3, Insightful)

by Marnhinn ( 310256 ) writes: on Monday August 15, 2005 @02:18PM (#13323039) Homepage Journal

Yahoo returns a lot of dupes.

They may have more unique information simply further down the result list, but since the search engines terminate the results at not quite 1k (1,000), the researchers have no way of testing that out.

All they can really show is that Google returns more unique results per 1000 (which usually means that more items are indexed, but could be from Google's PageRank also)...

Re:Flawed conclusion? (Score:5, Insightful)

by Ossifer ( 703813 ) writes: on Monday August 15, 2005 @02:24PM (#13323103)

Exactly! I find the conclusions of the research to be quite specious. Yahoo may simply have tighter controls of what is considered a match, which, by the way, is no simple algorithm.

In any case, I am usually not so interested in the numbers of matches, but in the quality of the list returned--hopefully one website will have exactly what I need...

Study has poor assumption (Score:2, Insightful)

by Anonymous Coward writes: on Monday August 15, 2005 @02:25PM (#13323115)

The study noted that although Yahoo says they have ~twice as many pages indexed as Google, when they queried each engine with two arbitrary words from the dictionary, they got fewer responses from Yahoo.
From this they concluded Yahoo's claim of twice as many pages is suspicious.

What's suspicious is that these people consider themselves scientific. What if, for example, Yahoo just returns meaningful results, whereas Google returns anything with those words in? For example, what if you search for "faience" and "urbanity" -- maybe Google has more results, but maybe they are less pertinent - in other words, maybe Yahoo not only has more pages indexed, but also has an algorithm that returns only the most relevant stuff.

Not saying that's the case necessarily, but not mentioning that assumption makes for a worthless study/conclusion. (Also, if Google says they return x results, often when you go to the last page of their results listing you'll notice their total went down, and it's more like x - 10%.)

-Josh

Interesting but... (Score:3, Insightful)

by kf6auf ( 719514 ) writes: on Monday August 15, 2005 @02:25PM (#13323119)

While it is true that more results could mean worse filtering, that is a separate test entirely.
I tend to think that ordering is more important than filtering down to a small number of results, since having lots of results returned doesn't hurt if the search engine can order well so that what you want is most likely to be in the top 10-25. This is especially true when there will be at most a couple of useful results: I'd rather have the search engine attempt the ordering and leave most of the filtering to me, because no search engine is yet as good as a person at really figuring out what people want.

Methodology (Score:5, Insightful)

by enjo13 ( 444114 ) writes: on Monday August 15, 2005 @02:26PM (#13323125) Homepage

The very methodology used in this case seems rather incorrect to me.

The assumption (as stated in the paper): Since Yahoo claims to have indexed twice as much as google, searches should return twice as many entries.

That assumption is flat out incorrect. There are actually multiple problems.

First, the scope of the search (based on index terms) is really up to the search engine itself. Since each search engine does not return the entire database as search results, it is very much up to the individual search algorithm to determine the depth of entries considered to 'match' a set of terms. That's what is really being reflected in these results.. it is not the overall size of the index, but simply how aggressive the search algorithm is in matching terms to entries.

Even if the algorithms were identical (same algorithm being run across both indexes), the nature of search does not scale in that way. If Yahoo has, for instance, become more aggressive in indexing message board and forum content, then only searches that play to those subjects should return more results than Google. Since searches are by definition narrowing on a data set, a methodology needs to be developed that more effectively tests the BREADTH of the results rather than simply testing the depth.

Re:The results (Score:3, Insightful)

by mi ( 197448 ) writes: <slashdot-2017q4@virtual-estates.net> on Monday August 15, 2005 @02:26PM (#13323130) Homepage Journal

"Based on this random sample, we found that on average Yahoo! only returns 37.4% of the results that Google does and, in many cases, returns significantly less."
Informative. But do they also explain why this (Google's results) is a good thing? From my experience, Google's results beyond the second page are never useful, so they may as well not be there at all.
I don't see how NCSA's findings can prove or disprove Yahoo's earlier claims.

Re:Conclusion (Score:5, Insightful)

by nutshell42 ( 557890 ) writes: on Monday August 15, 2005 @02:26PM (#13323132) Journal

And Nutshell42's New Amazing Search Engine gives you even more results. Even though my index size is only 1.something million. I simply return every single Wikipedia article in every language as a result no matter what you search.
Concluding that Yahoo's index has to be smaller because they return fewer results seems a bit overzealous. Only a thorough study comparing results and how useful they were (which is hard to do, expensive and time consuming) has any meaning that goes beyond producing lots of funny numbers and percentages.
96.34% of all percentages are completely useless.
btw. I use google, not yahoo
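
The joke engine above can be written down as a toy sketch. Both "engines", their documents, and the function names are hypothetical; the only point is that the number of hits depends on the matching policy as well as on index size, so a small index can out-return a large one.

```python
# Toy illustration: result counts depend on the matching policy,
# not only on index size. Engines and documents are hypothetical.

def strict_engine(index, query_words):
    """Return only documents that contain every query word."""
    return [doc for doc in index if all(w in doc.split() for w in query_words)]

def return_everything_engine(index, query_words):
    """Ignore the query and return the whole index, as in the joke above."""
    return list(index)

# A large index with exactly one page that matches the query.
big_index = [f"page {i} about topic{i}" for i in range(10_000)]
big_index.append("swans on a lake")

# A small index that is returned wholesale for any query.
small_index = [f"article {i}" for i in range(1_000)]

query = ["swans", "lake"]
strict_hits = strict_engine(big_index, query)
loose_hits = return_everything_engine(small_index, query)

print(len(big_index), len(strict_hits))    # index size vs. hits, strict engine
print(len(small_index), len(loose_hits))   # index size vs. hits, loose engine
```

Here the engine with the index ten times smaller reports a thousand times more hits, which is why hit counts alone say little about index size unless the matching policies are known to be comparable.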

International Listings (Score:5, Insightful)

by Dominatus ( 796241 ) writes: on Monday August 15, 2005 @02:27PM (#13323138)

The study only checked English words. Is it possible that the increase came from Yahoo expanding into more international website markets?

Just a thought

This is what passes for CS research nowadays? (Score:5, Insightful)

by adrizk ( 137574 ) writes: on Monday August 15, 2005 @02:28PM (#13323151)

Seriously. 'We wrote a script and here are the results'? This would take an average Perl programmer what -- 30 minutes of work? Has academic research in computing really sunk to this level?

Maybe it's not even worth pointing out how badly flawed (and lazy) the underlying assumption of 'twice the results = twice the index size' probably is, as I'm sure we're going to see a few dozen posts to that effect (unless PageRank really means nothing), but at least I can complain about the slant they put on this, and how strong a conclusion they seem to derive.

Re:Flawed conclusion? (Score:2, Insightful)

by Lewisham ( 239493 ) writes: on Monday August 15, 2005 @02:28PM (#13323158)

Agreed, whoever conducted this "research" is pretty idiotic. The pages returned != pages available.

This isn't worthy of the NCSA, or indeed any university, to be shown in any public format with any conclusions at *all*. You'd be laughed out of the conference hall if you presented this.

Re:Accurate results? (Score:5, Insightful)

by jrallison ( 857135 ) writes: on Monday August 15, 2005 @02:28PM (#13323160) Homepage

It is odd, however, that the #1 result for "failure" is a webpage without the word "failure" in it.

Re:What would you want them to return? (Score:5, Insightful)

by Intron ( 870560 ) writes: on Monday August 15, 2005 @02:29PM (#13323171)

The top of the page return for Yahoo is

"Failure on eBay Find failure items at low prices. "

which illustrates the most important difference between Yahoo and Google.

Who cares about... (Score:2, Insightful)

by Ignignokt ( 803398 ) writes: on Monday August 15, 2005 @02:30PM (#13323178)

the number of results anyways? Who makes it to page 5000 when doing a search?

More results == better search engine? (Score:3, Insightful)

by RunzWithScissors ( 567704 ) writes: on Monday August 15, 2005 @02:30PM (#13323185)

So in the conclusion, the author writes that since Google displayed more results, based on their random test data, it was the superior search engine? That seems so wrong somehow...

Wouldn't a better search engine return fewer, but more appropriate, results? I mean, how many of us have found the information we were actually looking for on page ten or twelve of a search? And, isn't less more, but better? %insert Linux geek laughs here%

One would think that volume of results would not a better search engine make, although it may indicate a larger engine index size; an explicit statement to that effect seems to be missing from the NCSA report.

-Runz

Re:Conclusion (Score:3, Insightful)

by rossifer ( 581396 ) writes: on Monday August 15, 2005 @02:30PM (#13323189) Journal

"Concluding that Yahoo's index has to be smaller because they return fewer results seems a bit overzealous."

No, it's accurate. They're testing Yahoo's claim of how many pages they've indexed, which just means that all indexed pages that contain the requested words should be returned from the search request. If yahoo returns fewer unique pages, yahoo has indexed fewer pages.

What you're talking about is measuring the effectiveness of page ranking, which is a completely different measure of how good a search engine is. Note: Google wins on that measure too.

Regards,
Ross

Re:Accurate results? (Score:5, Insightful)

by MindStalker ( 22827 ) writes: <mindstalker@[ ]il.com ['gma' in gap]> on Monday August 15, 2005 @02:30PM (#13323192) Journal

Well, Google also indexes based upon referring links and not just the content of the page itself. So if many websites refer to GW as a failure, GW's page itself will turn up as a high hit. Yahoo does this as well, but doesn't necessarily give it the same weight. This could highly affect the number of results returned. If Google returns X pages for a search on term "y", many of these pages may not actually mention "y", thus giving a larger page count for "y". Yahoo's method, by contrast, will mainly return pages that mention "y" themselves, and possibly add some pages that links describe as including "y". This can vastly alter the count.
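
A minimal sketch of that effect, assuming a made-up three-page web: the page names, texts, and links below are hypothetical, and neither function claims to reproduce what Google or Yahoo actually does. It only shows how crediting a page with the anchor text of inbound links enlarges the result set for a term.

```python
# Hypothetical three-page web: page text plus the anchor text of inbound links.
pages = {
    "bio.example": "official biography of a politician",
    "blog-a.example": "this politician is a failure",
    "blog-b.example": "weather report for monday",
}

# (source page, target page, anchor text), all invented for illustration.
links = [
    ("blog-a.example", "bio.example", "failure"),
    ("blog-b.example", "bio.example", "miserable failure"),
]

def match_content_only(term):
    """Pages whose own text contains the term."""
    return {url for url, text in pages.items() if term in text.split()}

def match_with_anchor_text(term):
    """Pages whose own text, or the text of a link pointing at them, contains the term."""
    hits = match_content_only(term)
    hits |= {target for _, target, anchor in links if term in anchor.split()}
    return hits

print(sorted(match_content_only("failure")))
print(sorted(match_with_anchor_text("failure")))
```

The second call also returns the biography page, which never mentions the term itself, so the two matching policies report different counts over the same set of pages.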

Quality Quantity (Score:2, Insightful)

by hagrin ( 896731 ) writes: on Monday August 15, 2005 @02:31PM (#13323195) Homepage Journal

This is just another example in the age old argument of which is better. IMO, the quality of the search results is what matters more than the sheer quantity of information. One relevant find is more valuable than 100 inaccurate results. A test of accuracy might be more valuable and one that would be difficult to engineer. For instance, if I type in a word that has a direct correlating .com domain, that should be the first result (assuming no other words in the title - i.e. "hagrin" brings me my home page as the first result). I am sure a test of accuracy could be further derived from such logic.

The other side of the argument probably relates back to something my fiancee once told me - "Size doesn't matter, but it's the great equalizer when it comes to two guys not knowing what they are doing". Yahoo!, especially since the researchers couldn't perform queries on topics returning more than 1,000 results, may be indexing and crawling deeper into sites or it has a "double dipping" problem.

Either way, I don't see Yahoo! falsely reporting their numbers - I would tend to think that this "study" is highly flawed due to its exclusion of larger result topics, etc.

Re:Yahoo pants down, egg on face, no WMD either. (Score:2, Insightful)

by Sandor at the Zoo ( 98013 ) writes: on Monday August 15, 2005 @02:39PM (#13323274)

Yeah, this "study" seems to be something whipped together over a weekend. Particularly:
"Thus, for the purposes of this study, we were forced to restrict our searches to those queries that returned less than 1,000 results on both Yahoo! and Google."
So, anything popular gets tossed. What if Yahoo! indexes all the pages with popular search terms, but Google only indexes the first 1,000? I doubt very much that it's the case, but this whole approach seems suspect at best.
They threw out what is probably a huge chunk of the results they got, didn't tell us (that I can find) how much they threw out, then drew conclusions based on the small sample left over. Seems like a very odd research method.

Don't even contain the search term (Score:3, Insightful)

by Midnight Thunder ( 17205 ) writes: on Monday August 15, 2005 @02:42PM (#13323300) Homepage Journal

The interesting thing is that the top three results make no reference to the word failure. Of course it is probably based on pages linking to these three, but I wonder if they should even be included for the lack of the search term?

Re:Yahoo pants down, egg on face, no WMD either. (Score:5, Insightful)

by loose_cannon_gamer ( 857933 ) writes: on Monday August 15, 2005 @02:54PM (#13323399)

After reading half the comments on this page, I'm amused at how many alert readers are making the same mistake that they accuse Yahoo of -- misstating results.
Can we conclude from this study that Google has a bigger index than Yahoo? No. Can we conclude that, for pairs of English words that return fewer than 1000 results on both Google and Yahoo, Google consistently returns more results? Yes.
The real question is, what can we infer from the actual indisputable findings of this study? I find no ready method of generalization. If you are inclined to believe google is better, you feel happy inside. If you think yahoo is better, you have many options to dispute the idea that the study result generalizes to search engine index size.
As a google fan, I enjoy the warm fuzzies, but I don't see that much to get excited about either way.

Re:Conclusion (Score:5, Insightful)

by barawn ( 25691 ) writes: on Monday August 15, 2005 @02:55PM (#13323415) Homepage

"No, it's accurate. They're testing Yahoo's claim of how many pages they've indexed, which just means that all indexed pages that contain the requested words should be returned from the search request. If yahoo returns fewer unique pages, yahoo has indexed fewer pages."

Actually, it might not be, thanks to their methodology.

They only used searches with less than 1000 results. They therefore got a lot of searches with small results numbers (because they were searching for bizarre word combinations, like "promotion bedabble"). The total number of results was something like 500,000 or so (order of magnitude) for 10,000 searches. That's an average of 50 results/search, and I'd bet there's a large, large tail, so the most common search is probably something like 10 results.

The problem with this is that in their word list, the same sites are being returned over and over! For instance, sites containing dictionary lists appear in both "promotion bedabble" and "foliolate defecations" because, duh, that's the only place they'll appear. Since they're just searching the same type of site over and over, they get the same result magnified a lot: Google has more "dictionary lists" in its index than Yahoo. Most of the "dictionary list" word searches returned about 10-20 for Google, and few, if any, for Yahoo.

It's a pretty serious flaw in the methodology, as far as I can tell - they're double counting huge numbers of results, and so they're not really getting a good statistical sample of the index.
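
The double counting described above can be sketched as follows. The per-query URL lists are hypothetical ("wordlist-N" stands for a page that is just a copy of a dictionary word list), and this is not the study's script; it only contrasts summing per-query counts with counting each distinct page once.

```python
from collections import Counter

# Hypothetical per-query result URLs. "wordlist-N" stands for a page that is
# just a dictionary word list and therefore matches almost any word pair.
results = {
    "promotion bedabble": ["wordlist-1", "wordlist-2", "wordlist-3"],
    "foliolate defecations": ["wordlist-1", "wordlist-2", "wordlist-3", "botany-page"],
    "battening liberate": ["wordlist-1", "sailing-page", "history-page"],
}

# Summing per-query counts credits a page once for every query it appears in.
summed_total = sum(len(urls) for urls in results.values())

# Counting distinct URLs credits each indexed page exactly once.
unique_pages = {url for urls in results.values() for url in urls}

# How often each page was counted in the summed total.
times_seen = Counter(url for urls in results.values() for url in urls)

print(summed_total)
print(len(unique_pages))
print(times_seen.most_common(3))
```

In this toy data the summed total is dominated by three word-list pages that are counted again for nearly every query, which is the bias the comment attributes to the methodology.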

Re:Queries with 1,000 results (Score:3, Insightful)

by barawn ( 25691 ) writes: on Monday August 15, 2005 @03:04PM (#13323494) Homepage

"That makes sense, but it does stand to reason (or, at least, to my reason) that these queries that garner large numbers of results could have had a significant impact on the bottom line of the survey."

Well, there's a worse bias. They're grabbing words from an Ispell word list.

There are websites which contain the Ispell word list. There appear to be more of those returned in Google as results than in Yahoo. (here [nd.edu] is one returned in Google for "apprizers expense", but which is not returned in Yahoo.)

This basically contributes a pedestal to their result - they'll never get zero results, because they'll always get the Ispell lists back, and because those results always return the same number (about 8 Google to 1 or 2 Yahoo), you'll bias the results of the entire set to that result.

They needed to remove results which are returned in common to multiple searches, as that's essentially double counting.

you are WRONG (Score:2, Insightful)

by alarch ( 830794 ) writes: on Monday August 15, 2005 @03:08PM (#13323552) Homepage

Try it. For example, search for "swans": you get 1 510 000 results, and the first one is the SWANS rock band site. Then search for "swan": 8 550 000 results, the first is some SWAN social network, and the rockers are not on the first page at all.

Re:Yahoo returns dupes... (Score:3, Insightful)

by NickFortune ( 613926 ) writes: on Monday August 15, 2005 @03:10PM (#13323565) Homepage Journal

"Yahoo returns a lot of dupes."
Interestingly, however, for the search results analysed, Google performed noticeably better whether dupes were included or discarded.
"They may have more unique information simply further down the result list, but since the search engines terminate the results at not quite 1k (1,000), the researchers have no way of testing that out."
That isn't actually what they did. They only analysed queries that scored fewer than 1000 results on both Google and Yahoo. If either engine scored over that, the results were discarded.
So, for every search analysed, the full results from each engine were always considered.
"All they can really show is that Google returns more unique results per 1000"
Errm, nope. You could make a case for the study only showing that Google performs better where information is scarce - but that's exactly when you want a good search engine, so I'm not too worried. There's a limit to how many Britney Spears links I can find a use for.
"(which usually means that more items are indexed, but could be from Google's PageRank also)"
Well, the researchers provide links to the Perl script and the dictionary used and also a log of the search results. If you think they're skewing the results, or just that they've made some logical errors in the study, you have all the materials you need to make a detailed refutation, or to repeat their experiment and release your own findings.
And if you really believe the study is flawed then I encourage you to do so.

Re:Flawed conclusion? (Score:3, Insightful)

by barawn ( 25691 ) writes: on Monday August 15, 2005 @03:17PM (#13323671) Homepage

Or it could mean that Google has more Ispell lists in its index.

Which appears to be the case.

A search for "inabilities hydrocephalic" returns almost all dictionary lists in Google, except 2. There are only 2 results in Yahoo, one of which is a dictionary list (or equivalent).

But the official results for this? 16 for Google, 2 for Yahoo.

The reason this is a problem is because almost every search returns the same dictionary lists, so it amounts to double (or probably around 5000-fold) weighting of those sites in the results.

Without excluding results that are just dictionary lists (which is quite hard from a simple analysis like this) you heftily bias your results to mimic the "Number of Google dictionary list sites/Number of Yahoo dictionary list sites" ratio.

They probably should've only included queries that returned between 100 and 1000 results, but I'd bet that would take a ton more time, as it looks like almost all of the results they used were in the "10-50" result range.

Re:not so fast (Score:2, Insightful)

by betsywetsy ( 12592 ) writes: on Monday August 15, 2005 @03:18PM (#13323679)

Testing further, so far I've found dictionary files in G's results in all of the edge cases in which neither engine returns significant results, and a couple of times in Y's results.

centerable's heterolecithal
or's depigmentation
apprizer's expense
inabilities hydrocephalic
unobservable Oistrakh
apparentness nucleophile ...

At this point, I think that while the conclusion that you'll get more results on Google arguably stands, the methodology of the test and the idea that anything can be concluded about the relative index sizes are clearly discredited.

(Thanks, Dr. K!)

Re:Conclusion (Score:3, Insightful)

by barawn ( 25691 ) writes: on Monday August 15, 2005 @03:34PM (#13323888) Homepage

Correction to myself: the total responses to their list was ~150,000 to ~10,000 searches for Yahoo, and ~400,000 for Google. So the average is 15 results for Yahoo and 40 for Google. Given that most "dictionary list" results were between 10 and 40, that should pretty much tell you that their entire result is just a massively multiplied reflection of those searches.

As an interesting aside, though: if you dig through their log, you can see several interesting things. If you look at only results which return between 100 and 1000 results, you get things like "battening liberate", which returned 186 for Google, and 97 for Yahoo. Those aren't dictionary list results - the interesting thing is that in almost all of those results, you see an extremely similar pattern.

"battening liberate":
Ratio of Google/Yahoo for this query:
Duplicates Omitted Estimate: 0.522305
Duplicates Omitted Total: 1.917526
Duplicates Included Estimate: 0.533962
Duplicates Included Total: 2.350427

"convexity hac"
Ratio of Google/Yahoo for this query:
Duplicates Omitted Estimate: 0.573593
Duplicates Omitted Total: 3.340000
Duplicates Included Estimate: 0.583700
Duplicates Included Total: 2.490566

"meekness goatee"
Ratio of Google/Yahoo for this query:
Duplicates Omitted Estimate: 0.607053
Duplicates Omitted Total: 2.207692
Duplicates Included Estimate: 0.604010
Duplicates Included Total: 2.745562

So Yahoo claims it has 2X as much as Google, but actually only returns about 30-50%.

Interestingly, these mimic the "dictionary list" results, which is curious. So their conclusions seem right, but their methodology seems very wrong.
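
The arithmetic behind those figures can be checked with a short sketch. The 186 and 97 counts, the three "Duplicates Omitted Total" ratios, and the approximate totals are copied from the comment above; the variable names are illustrative, and the totals are rounded as posted, so the derived percentages are only approximate.

```python
# Counts quoted for "battening liberate": 186 for Google, 97 for Yahoo.
google_hits, yahoo_hits = 186, 97
ratio = google_hits / yahoo_hits
print(round(ratio, 6))  # Google/Yahoo ratio with duplicates omitted

# "Duplicates Omitted Total" ratios (Google/Yahoo) for the three sample queries.
ratios = {
    "battening liberate": 1.917526,
    "convexity hac": 3.340000,
    "meekness goatee": 2.207692,
}

# Inverting a Google/Yahoo ratio gives Yahoo's count as a share of Google's.
yahoo_share = {query: 1 / r for query, r in ratios.items()}
for query, share in yahoo_share.items():
    print(f"{query}: Yahoo returns {share:.0%} of Google's count")

# Approximate totals over roughly 10,000 searches, as stated in the comment.
searches = 10_000
yahoo_total, google_total = 150_000, 400_000
print(yahoo_total / searches, google_total / searches)  # average results per search
print(yahoo_total / google_total)                       # overall Yahoo/Google share
```

Inverting the three ratios gives shares of roughly 30% to 52%, and the rounded totals give an overall share of 0.375, close to the 37.4% figure quoted from the study elsewhere in this discussion.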

Those are estimates (Score:5, Insightful)

by mcc ( 14761 ) writes: <amcclure@purdue.edu> on Monday August 15, 2005 @03:35PM (#13323906) Homepage

Of course the study also demonstrates that on the searched terms, Yahoo's estimate numbers vastly overestimated the number of available results they actually found. So if the pages from the study are even close to representative in that regard then this would make the numbers you quote utterly meaningless.

Which is the entire reason, of course, why they kept the limits under 1,000 in the first place-- that for any number over 1,000, if the search engine says, say, "I found 2.5 million results for 'Valerie Plame'", you have no way to tell whether it's telling the truth or not.

Re:Yahoo pants down, egg on face, no WMD either. (Score:2, Insightful)

by icemann476 ( 859104 ) writes: on Monday August 15, 2005 @03:49PM (#13324048)

Actually, their decision to throw out any queries resulting in more than 1000 pages returned seems very logical to me. How many times have you typed in a search, perused through the 1000 pages and felt like you just needed more options? Yahoo may very well have more than double the total indexed pages Google has but what good is it to have 10,000 pages returned for 1 query; it becomes redundant at some point. I think the research did a good job of showing that Google produces more options (indexed pages) per search than Yahoo does, regardless of who actually has more "total pages" indexed.

With Google pages do not have to have all words (Score:2, Insightful)

by trelony ( 825975 ) writes: on Monday August 15, 2005 @04:13PM (#13324339)

With Google, a page can be found when the requested words appear only in other pages that reference it, and not in the returned page itself.

Re:Yahoo pants down, egg on face, no WMD either. (Score:3, Insightful)

by Iriel ( 810009 ) writes: on Monday August 15, 2005 @04:16PM (#13324397) Homepage

I agree on that. Based on the methods used to test a general index size, I think it leaves a lot of holes. When you're talking about millions of items, a generalization can be woefully inaccurate.

Rather than talking about indexed content, it seems like this test is actually more appropriate to use as some sort of analysis on the overall usefulness of the search engines. Even then, though, the results could be skewed to say that it's better to provide a wealth of pages (Google) or to have fine tuned and narrowed results that you're looking for (Yahoo!). Numbers matter to a program, results matter to people. This test only portrays the former, yet the latter is what we're really trying to get at.

Either way, I don't think random tests can really do justice to Google or Yahoo!. Rather than performing a randomized test upon each, I think the better gauge of each's usefulness would be something more like a practical application study. In other words, evaluate real everyday kind of searches on each site instead of an unlikely combination of two random English words like politics and truth ;)

In other words, while I commend the effort to debunk any misinformation about which search engine is better endowed, so to speak; the numbers given don't provide useful information to anyone but a spin doctor.

(As a side note, I'm actually more of a Google fan for search and applications, but I love Yahoo! as a lifestyle portal for things like movie listings and such)

Results of this study are not accurate (Score:1, Insightful)

by Anonymous Coward writes: on Monday August 15, 2005 @04:57PM (#13324928)

I did a few spot searches myself and one thing that makes a huge difference is that Google does "smart" searching: if you type in a phrase and Google suggests that you meant something else, it will search for that as well and combine the results. This would give Google a larger result set. Therefore it is impossible to determine whose index is bigger because the way they build their search results is inherently different.

Re:Yahoo pants down, egg on face, no WMD either. (Score:3, Insightful)

by dustmite ( 667870 ) writes: on Monday August 15, 2005 @05:18PM (#13325165)

"I think it is possible that Yahoo! has more items indexed than Google ... Yahoo can search subscription based content. That has got to boost their numbers considerably beyond the range of queries that typically return less than one thousand results."
If we assume that Yahoo has offered subscription-based content searching for about two years (not sure of the exact length of time), then to get even close to the difference they are citing here in their marketing (over 11 billion more items), they would have to have added over 116 subscription-based items per second, every single second since they started. This seems rather unlikely. Far far far more likely is that this is just a case of extremely "creative (ac)counting" on Yahoo's part.
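
The rate estimate can be reproduced with a few lines of arithmetic. This sketch assumes 365-day years and takes the 11 billion figure and the time span from the comment; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes 365-day years, ignoring leap days

def items_per_second(extra_items, years):
    """Average rate needed to add extra_items over the given number of years."""
    return extra_items / (years * SECONDS_PER_YEAR)

extra_items = 11_000_000_000  # "over 11 billion more items", from the comment

print(round(items_per_second(extra_items, 2)))  # two-year window
print(round(items_per_second(extra_items, 3)))  # three-year window
```

Under these assumptions a two-year window works out to about 174 items per second, while the 116 figure corresponds to roughly three years. Either way the required rate is "over 116" per second, so the comment's conclusion is unaffected.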

Re:Yahoo returns dupes... (Score:3, Insightful)

by NickFortune ( 613926 ) writes: on Monday August 15, 2005 @05:42PM (#13325401) Homepage Journal

"Could be that Google's "smaller" index is searched by a less picky search tool that gives more results because it doesn't successfully eliminate as many useless pages."
Could be. And it could be that Google's results are both more numerous and of better quality. The tests did not, as you quite rightly point out, consider the relevance of the results. As is proper, the researchers make no claims regarding relevance.
On the other hand, their findings do cast doubt upon Yahoo's claims regarding index size.

Re:What would you want them to return? (Score:2, Insightful)

by -brazil- ( 111867 ) writes: on Monday August 15, 2005 @06:15PM (#13325668) Homepage

"it is the reality of state of the Web. At least as far as Google's formula ranks/weights pages/links."

That second part is the important one. If search results can be manipulated by relatively small groups of people, this can be abused, e.g. for search engine spamming, thereby limiting the usefulness of the search engine.

Re:Yahoo returns dupes... (Score:4, Insightful)

by NickFortune ( 613926 ) writes: on Monday August 15, 2005 @06:49PM (#13325932) Homepage Journal

"On the other hand, their findings do cast doubt upon Yahoo's claims regarding index size."
"These findings don't do anything of the sort. In fact, Google could have only 999 pages in index, and if it returned all 999 for every query it would have won this test. There are too many assumptions here for the results to be useful."

'Scuse me: I said "cast doubt upon" not "conclusively disproved".
If Yahoo's indices are, as they claim, more than twice the size of Google's, then we might reasonably expect them to return more hits for an arbitrary query. That they do not do so suggests that Yahoo may well be telling fibs.
Yes, there are other explanations, like for example, Google deliberately falsifying all sub 1000 hit queries, as you point out. However, one likely, arguably the most likely explanation is that Yahoo is being a bit sparing with the truth in its press releases.
Hence "cast doubt upon".

Re:Perl Code (Score:1, Insightful)

by Anonymous Coward writes: on Monday August 15, 2005 @07:15PM (#13326130)

I know you are trying to be funny, but that code is actually not very good by any modern standard. It forks and opens wget instead of using LWP, uses lots of printf calls instead of heredocs or templates, and makes unnecessary use of C-style for loops...

Re:Perl Code (Score:3, Insightful)

by pfafrich ( 647460 ) writes: <rich@@@singsurf...org> on Monday August 15, 2005 @08:40PM (#13326672) Homepage
Readable code because:
- Well laid out and indented
- Long and meaningful function and variable names
- Good logical structure, no fancy tricks
- It looks like C!
Re:Yahoo returns dupes... (Score:3, Insightful)

by sam1am ( 753369 ) writes: on Monday August 15, 2005 @10:58PM (#13327425)

As soon as I read this, I had the following thought...
One interesting statistic would be the number of searches for which Google had over 1,000 results, compared to the number of searches for which Yahoo had more than 1,000 results.

If Yahoo caused 80% of the "over-popular result" discards, well, I'd say that would be highly relevant.

But then I read footnote [3]:

[3] In a small number of cases, one search engine (almost always Google) will return results over 1,000 while the other search engine will not. Although we discard this data, we recognize that the data is meaningful and we hope to refine our code to take this into account. However, since the frequency this occurs is small (and almost always favoring Google) we do not feel it changes our findings.

I'd still like the statistics, but this resolves one of my concerns with the methodology.
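
For what it's worth, the statistic I'm asking for is cheap to compute if the raw per-query counts are kept. A minimal sketch in Python, assuming each query yields one hit count per engine; the `QueryCounts` record, the `tally_discards` helper and the sample numbers are all made up for illustration and are not taken from the NCSA code or data:

```python
from dataclasses import dataclass

RESULT_CAP = 1000  # the 1,000-result threshold mentioned in footnote [3]


@dataclass
class QueryCounts:
    """One query with the hit count each engine reported (hypothetical record)."""
    term: str
    google: int
    yahoo: int


def tally_discards(rows, cap=RESULT_CAP):
    """Count queries where only Google, only Yahoo, or both exceeded the cap."""
    tally = {"google_only": 0, "yahoo_only": 0, "both": 0, "kept": 0}
    for row in rows:
        google_over = row.google > cap
        yahoo_over = row.yahoo > cap
        if google_over and yahoo_over:
            tally["both"] += 1
        elif google_over:
            tally["google_only"] += 1
        elif yahoo_over:
            tally["yahoo_only"] += 1
        else:
            tally["kept"] += 1
    return tally


# Invented sample data, only to show the shape of the output.
sample = [
    QueryCounts("aardvark zygote", 812, 340),
    QueryCounts("quixotic lemma", 1500, 620),
    QueryCounts("common words", 4000, 3000),
    QueryCounts("obscure pair", 90, 1200),
    QueryCounts("rare combo", 12, 5),
]

print(tally_discards(sample))
```

Comparing the `google_only` and `yahoo_only` buckets would show directly which engine is responsible for most of the one-sided discards the footnote describes.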

