Google Raises Word Limit 71
Philipp Lenssen writes "Google quietly raised their web search limit to 32 words. Previously, only up to 10 words were allowed per query, with succeeding words being ignored. This is not only important to specific approaches of advanced searching (for example, when you need to exclude many different keywords using the minus operator), but it's also of great help to certain tools using the Google API. While there doesn't seem to be any official statement from Google yet, some more details can be found at my Google blog."
I wonder.... (Score:2, Insightful)
Finally. (Score:3, Insightful)
Re:Finally. (Score:2)
Mod Parent Up (Score:2)
Great (Score:5, Interesting)
Re:Great (Score:4, Informative)
Re:Great (Score:2)
Google doesn't have wildcards... (Score:1)
very complex (Score:3, Interesting)
So, um, wow.
Re:very complex (Score:5, Insightful)
Re:very complex (Score:2)
Oh. I see. In that case the complexity only increases with the number of times the document is passed over for each word, or 3.2x which is probably over twice as high as the average number of times a document needs to be scanned for a word before finding that it doesn't contain a word...
That's not too hard. Why is there a limit?
Re:very complex (Score:1)
So say you have search terms a, b, and c. The documents that contain these are found using the index that they construct, and the documents that contain a are set A, the documents that contain b are set B, and the documents that contain c are set C. You must then find the intersection of A, B, and C. A
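The index-and-intersect idea described above can be sketched in a few lines. This is only a toy illustration, not how Google actually does it: `build_index` and `search_all` are made-up names, the documents are plain strings, and intersecting the smallest set first is a common optimization assumed here rather than something stated in the comment.

```python
def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index


def search_all(index, terms):
    """Return the ids of documents that contain every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    # Assumption: start from the rarest term so the working set stays small.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:
            break  # no document can match once the intersection is empty
    return result


docs = {1: "a b c", 2: "a b", 3: "c a"}
index = build_index(docs)
matches = search_all(index, ["a", "b", "c"])
```

With this approach each extra search word costs one more lookup and one more intersection, which is why a longer query does not have to mean rescanning documents.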
Re:very complex (Score:2, Insightful)
With 32 words you will be able to find theoretically almost any page. The difference is much more than 3.2x
With 10 words you can form about <NumberOfWords> ^ 10 queries (the number of words to the power of 10), but with 32 words this becomes <NumberOfWords> ^ 32.
Now think about the number of words in all the languages Google can support.
Fewer than a thousand of the world's 6800 languages have writing systems ( http://www.ethnologue.com/language_index.asp )
Let's assume that all languages ha
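To put rough numbers on the <NumberOfWords> ^ 10 versus <NumberOfWords> ^ 32 comparison, here is a small sketch. The vocabulary size of 100,000 is purely an assumed figure for illustration, and `possible_queries` is a hypothetical helper; it counts ordered word sequences, which is what the formula above implies.

```python
def possible_queries(vocabulary_size, query_length):
    """Number of ordered word sequences of the given length."""
    return vocabulary_size ** query_length


VOCABULARY = 100_000  # assumed vocabulary size, not a real figure

short_queries = possible_queries(VOCABULARY, 10)
long_queries = possible_queries(VOCABULARY, 32)

# How many times more queries become possible with the higher limit.
growth = long_queries // short_queries  # equals VOCABULARY ** 22
```

Whatever vocabulary size is assumed, the growth factor is the vocabulary raised to the 22nd power, not 3.2.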
Re:very complex (Score:1, Insightful)
First of all, all Google search words are required to be present on a webpage, so adding more words lowers the number of hits.
Besides that, the reasoning above is absurd. Why should the number of possible searches correspond to the number of hits?
And why does the number of languages in the world appear in the equations above, when probably 90% of all webpages are written in English?
There's nothing interesting about the sheer number of possible searches. Af
Re:very complex (Score:1)
Are you sure about that?
house - 294,000,000
house car - 24,700,000
house car boat - 6,250,000
house car boat dog - 1,570,000
house car boat dog smoke - 412,000
house car boat dog smoke funny - 163,000
house car boat dog smoke funny slashdot - 2,200
searching for non a-z characters (Score:5, Insightful)
Re:searching for non a-z characters (Score:2)
Matching MSN Search? (Score:5, Interesting)
MSN's new search [msn.com], which has sported a bigger word limit for quite some time.
Great! (Score:1, Funny)
Good for searching multiple sites (Score:3, Interesting)
Re:Good for searching multiple sites (Score:5, Insightful)
what I would hope for them to introduce would be a word blacklist that would be personal, and that you could include at least a thousand terms in it.
why? TO AVOID THOSE FUCKING LINKFARMS. they usually have the same advert links in them, so just adding the referral id of the owner of a certain farm will get a lot of meaningless sites out of the search. it's doable now if you make your own program that does the filtering (using the Google API; there are two ways, either go to the sites yourself or request the cache from google. massive traffic for you in any case, and the search will take ages to complete).
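A rough sketch of the do-it-yourself filtering described above, assuming the page text has already been fetched somehow (from the site or from a cache). `drop_link_farms` and the referral id strings are hypothetical; nothing here calls the Google API.

```python
def drop_link_farms(results, blacklist):
    """Keep only results whose URL and text contain no blacklisted string.

    results:   list of (url, page_text) pairs, fetched elsewhere
    blacklist: strings such as advert referral ids (hypothetical values)
    """
    lowered = [entry.lower() for entry in blacklist]
    kept = []
    for url, text in results:
        haystack = (url + " " + text).lower()
        if not any(entry in haystack for entry in lowered):
            kept.append((url, text))
    return kept


results = [
    ("http://example.com/review", "An honest review of movers."),
    ("http://example.net/farm", "Buy now! ref=farmowner123 cheap deals"),
]
clean = drop_link_farms(results, ["ref=farmowner123"])
```

The expensive part is not this filter but fetching every page so it can be checked, which is the traffic problem the comment mentions.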
Re:Good for searching multiple sites (Score:2)
However, I don't think that's a good solution for getting rid of link farms. Google should deal with those itself because they mess things up for everybody. They should keep tweaking their algorithms to detect link farms better and encourage people to report them [google.com].
How To Use 32 Words To Improve Your Searches... (Score:5, Informative)
1. Break your search into 2-4 principal, independent concepts- In my example, the concepts are NYC (the location) moving company (the company type) and antiques (the specialty)
2. For each concept, come up with as many terms as you can that are descriptions or examples of the concept that are very specific and won't trigger homonyms- For instance, you wouldn't want to use the word "New York" because it is too vague and could refer to the state (a company in Albany, NY won't help you). However, "NYC" "Long Island" "Brooklyn" "Queens" "New York City" are great, even if they seem overly specific- You just need one of them to cause a hit on a relevant page.
3. Put parentheses around the terms for each concept (be sure to put quotes around each compound term) and OR together the items inside the parentheses.
This is what the entire search might look like:
("NYC" OR "Long Island" OR "Brooklyn" OR "Queens" OR "Manhattan" OR "Bronx" OR "New York City" OR "Big Apple") ("moving company" OR "moving companies" OR "specialy movers" OR "professional movers" OR "u-haul" OR "apartment movers") ("fragile" OR "antiques" OR "china" OR "difficult to move")
It takes a bit of time to put together (and google will run slooooow because this kind of logic is very difficult for the search engine), but a search like this will give you the best possible results on hard queries.
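The three steps above are mechanical enough to script. Below is a minimal sketch: `build_query` and `count_words` are hypothetical helpers, and the counting rule (every word inside a phrase counts once, OR operators are free) is an assumption, since Google has not documented how it counts toward the 32-word limit.

```python
def build_query(concepts):
    """Quote each term, OR the terms of a concept together, and wrap each
    concept in parentheses; groups are separated by spaces."""
    groups = []
    for terms in concepts:
        quoted = ['"%s"' % term for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " ".join(groups)


def count_words(concepts):
    # Assumption: each whitespace-separated word counts once and the OR
    # operators are not counted; the real rule may differ.
    return sum(len(term.split()) for terms in concepts for term in terms)


concepts = [
    ["NYC", "Brooklyn", "New York City"],          # location
    ["moving company", "professional movers"],     # company type
    ["fragile", "antiques"],                       # specialty
]
query = build_query(concepts)
words = count_words(concepts)
```

Checking `words` against 32 before sending the query shows whether any trailing terms are at risk of being silently ignored.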
Re:How To Use 32 Words To Improve Your Searches... (Score:2)
Unfortunately, what this didn't tell me is which moving company isn't going to rip you off [google.com].
Re:How To Use 32 Words To Improve Your Searches... (Score:1)
Re:How To Use 32 Words To Improve Your Searches... (Score:2)
And just as pedantic as ever...
-Adam
Re:How To Use 32 Words To Improve Your Searches... (Score:2)
Re:How To Use 32 Words To Improve Your Searches... (Score:1)
Re:How To Use 32 Words To Improve Your Searches... (Score:1)
A homonym is a word that *sounds* like another word, but has a different meaning. Bear and bare are homonyms. Their and They're. Too, to and two. You get the idea.
"the same word that means different things" is called a homograph. Commonly we say that the word has several meanings or senses.
I thought so... (Score:1)
[/curiousity]
Regexp (Score:4, Insightful)
Re:Regexp (Score:2)
They probably heard this statement [granger.free.fr].
Re:Regexp (Score:2)
Classic newbie mistake. The biggest problem with search engines is that they return too many answers not too few. Adding regular expressions or stemming makes your answer set even bigger.
What we need are ways to make the answer set smaller, not larger. Hence the benefit of clustering (see for example http://vivisimo.com/search?query=search+trees&v%3Asources=Web [vivisimo.com]).
Re:Regexp (Score:2, Insightful)
The problem that annoys you is not the size of the answer set, but the lack of a proper sorting function (by relevance) to satisfy you. The fact that you find your desired answer at the 10th or the 30th position is a sign that sorting doesn't work like you'd expect it to. It has nothing to do with the size of the answer set.
I don't want a
Re:Regexp (Score:2)
Maybe you do, but most users don't. Fewer than 30% click through to the next page.
Reading on, I think what you mean to say is that you would like the answer to be selected from a larger set expanded perhaps to include stemming. In principle that sounds fine, in practice a decent answer is almost always contained in the 31 million+ pages that google returned.
The problem was that google didn't understand that in the search for "tree" the user meant binary search tree
Re:Regexp (Score:2)
And what percentage of users can write regular expressions? Probably less than 0.3%, so what's the problem?
Anyway, your thesis that regexps will lead to longer result lists is incorrect. If I really want to search for "Windows (95|98)", today my only recourse is to enter "Microsoft Windows" and then manually skip the (majority) of irrelevant hits, or to search for both "Windows 95" and "Windows 98", then manually unify the two returned l
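For illustration, this is roughly what the manual skipping amounts to when done in code: run the broad search, then keep only the hits that match the regular expression. `filter_hits` is a hypothetical helper and the titles are made up; it filters text that has already been retrieved, it is not a search engine feature.

```python
import re


def filter_hits(titles, pattern):
    """Keep only the titles that match the regular expression."""
    regex = re.compile(pattern)
    return [title for title in titles if regex.search(title)]


titles = [
    "Windows 95 tips",
    "Windows XP service pack notes",
    "Microsoft Windows 98 SE drivers",
]
wanted = filter_hits(titles, r"Windows (95|98)")
```

The filtered list is shorter than the broad result list, which is the point being argued: a regexp narrows what comes back.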
Re:Regexp (Score:2)
Re:Regexp (Score:2)
Wow, self-contradiction within the scope of a single sentence. If they're "rarely used", then they can't possibly increase workload very much,
Re:Regexp (Score:2)
Easy: if an operation is sufficiently expensive, the actual cost is noticeable, even when rare. This is known as a heavy tail, meaning that "a relatively small number of very high cost events skews a mean calculation".
No contradiction there.
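A quick worked example of that skew, with invented numbers: 999 cheap queries at 1 ms each and a single expensive one at 10,000 ms.

```python
from statistics import mean, median

# Assumed costs in milliseconds; the figures are made up for illustration.
costs = [1.0] * 999 + [10_000.0]

average_cost = mean(costs)    # dominated by the single expensive query
typical_cost = median(costs)  # what a typical query actually costs
```

Here one query in a thousand pushes the mean to about 11 ms while the median stays at 1 ms, so "rarely used" and "noticeably expensive overall" can both be true.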
Re:Regexp (Score:1)
In this example I believe "Windows 95" OR "Windows 98" [google.com] would do the trick.
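Simple alternations like this one can be rewritten into OR queries automatically. The sketch below handles only a single parenthesized group with `|` alternatives and nothing else from regexp syntax; `expand_alternation` is a hypothetical helper, not part of any Google tool.

```python
import re


def expand_alternation(pattern):
    """Turn 'Windows (95|98)' into '"Windows 95" OR "Windows 98"'.

    Only one non-nested (a|b|...) group is supported.
    """
    match = re.search(r"\(([^()]*)\)", pattern)
    if match is None:
        return '"%s"' % pattern
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    phrases = [prefix + alt + suffix for alt in match.group(1).split("|")]
    return " OR ".join('"%s"' % phrase for phrase in phrases)


query = expand_alternation("Windows (95|98)")
```

Each alternative becomes its own quoted phrase, so long alternations use up the word limit quickly.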
Of course regular expressions would be nice, but I just don't see them happening any time soon due to inherent resource requirements.
Google API? Useless. (Score:3, Insightful)
Hardly. The Google API is limited to 1000 searches per day, making it useless for any sort of web application. About the only thing I can think of that it would be useful for is a desktop program in which the user would only perform a limited number of searches.
Re:Google API? Useless. (Score:2)
Re:Google API? Useless. (Score:2)
Re:Google API? Useless. (Score:2)
Re:Google API? Useless. (Score:1)
Re:Google API? Useless. (Score:2)
Well, it appears to be useless for your web application. In my opinion, 1,000 queries a day seems like a lot for a non-commercial product. Google may add a commercial program that allows more than 1,000 queries per day (Google's answer: http://www.google.com/apis/api_faq.html#gen15 [google.com]).
Lastly, I always like to mention the API is a new, free, and beta service. My gut says that if you need more than 1,000
Re:Google API? Useless. (Score:2)
They've been saying for years now that they may open a commercial program for it; it's not going to change anytime soon.
Re:Google API? Useless. (Score:2)
Sure it could, and if all the web app did was search, then the 300 to 500 users (or more) would exceed the 1,000 queries per day. On the other hand, if all the web app did was search, why would Google want you to freely take people away from their search engine?
IMO, this API could be put to a good, supplemental use in an application (one where searching could happen, but is not the primary focus).
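For an app where search is a side feature, a small counter can keep it under the daily limit. This is a sketch under assumptions: `DailyQuota` is a hypothetical class, the 1,000 figure comes from the thread, and the real API presumably does its own counting on Google's side.

```python
import datetime


class DailyQuota:
    """Count queries per calendar day and refuse once the limit is hit."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_consume(self, today=None):
        today = today or datetime.date.today()
        if today != self.day:
            # A new day started, so the counter resets.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


# Per-user budget for the user counts mentioned above.
budget_300_users = 1000 / 300  # a little over 3 queries each
budget_500_users = 1000 / 500  # exactly 2 queries each
```

At 300 to 500 users the budget works out to two or three searches per user per day, which fits a supplemental feature but not a search-only app.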
Re:Google API? Useless. (Score:2)
And who said anything about taking people away from google freely? The problem is they don't allow you to purchase more searches.
Has Anybody Called Google? (Score:2)
Perhaps for a pure non-profit web app, but if you're collecting advertising revenue you might be able to slide some of this Google's way for a higher limit.
Has anybody actually talked to someone at Google about licensing? (i.e. not just what's on the FAQ)
Google Grid (Score:2)
What's next... Adding contacts lists to gmail? (Score:1)
Now I'll be able to search for... (Score:2)
If Google had raised its limits earlier, I could have skipped that school diploma and just gone right into I.T. support.
What I'd like to see (Score:2)
In no particular order:
* A better query language, with wildcards ("Word*") or stemming, proximity operators, parentheses, complex boolean expressions (something like what Dejanews and the pre-Yahoo AltaVista used to offer).
* Filtering out linkfarms and search-pages.
Re:What I'd like to see (Score:1)
Already in there, it seems.
"* Filtering out linkfarms and search-pages."
They're working on that, help them out. [google.com]
Re:What I'd like to see (Score:2)
>> "*
>Already in there, it seems.
I seem to have missed it. Have a pointer?
Re:What I'd like to see (Score:1)
Re:What I'd like to see (Score:2)
Parentheses (as in "(A and B) or (C and D)") don't work