Is Microsoft Crawling Google?
triplecoil writes "Jason Dowdell over at WebProNews has written a piece questioning a tactic Microsoft might be using to beef up its new search engine. He thinks Microsoft might be dipping into Google's results to supplement its own. Dowdell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."
They've been crawling like mad lately (Score:5, Interesting)
Meta-search? (Score:4, Interesting)
In the first case, it's a slimy business practice. In the second, it's fairly cunning ( and has been tried before ).
In either case, I doubt Google is in any real danger. They are to search engines what MS is to the desktop. And while MS has squandered that advantage in the desktop arena ( reader homework: 250 word essay as to why ), Google is only improving on their work.
Re:Don't concern yourself with this crap... (Score:1, Interesting)
The bot should be treated as no different from another anonymous human. If not the Googlebot, one of the other search engines is bound to find it.
Re:Try this term on MSN search (Score:2, Interesting)
Probably Not.. (Score:2, Interesting)
If they are searching Google, they haven't done it recently, or else they haven't gotten to my site yet.
Re:Try this term on MSN search (Score:2, Interesting)
They really only need to seed their crawler... (Score:5, Interesting)
I could imagine that Microsoft just needs a few thousand URLs evenly spread across the internet to seed their crawler, which they can get from Google by using a list of the most popular queries.
Once their crawler has so many starting points it can do the rest itself.
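To make the seeding idea concrete, here is a minimal sketch of a breadth-first crawl that starts from a handful of seed URLs. It is only an illustration, not anything Microsoft is known to run: `crawl_from_seeds`, `link_graph`, and the `*.example` hosts are all made-up names, and the dictionary stands in for actually fetching pages and extracting their links.

```python
from collections import deque


def crawl_from_seeds(seeds, link_graph, max_pages=1000):
    """Breadth-first discovery of pages reachable from a set of seed URLs.

    link_graph is a hypothetical stand-in for fetching a page and
    extracting its links: it maps each URL to the URLs it links to.
    """
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicates, keep order
    seen = set(unique_seeds)
    queue = deque(unique_seeds)
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in link_graph.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Toy web (assumed data): two seeds are enough to reach every page.
toy_web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "x.example": ["y.example", "a.example"],
}
discovered = crawl_from_seeds(["a.example", "x.example"], toy_web)
print(discovered)
```

The point of the toy graph is that the seeds only need to touch each well-linked region once; everything else falls out of following links.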
Re:Microsoft stealing someone else's technology??? (Score:3, Interesting)
I won't steal your oven, but I'll steal your food!
Re:Does it violate Google's Terms of Service (Score:5, Interesting)
Re:Don't concern yourself with this crap... (Score:5, Interesting)
There's more to it than that. Google caches your pages and makes that cache of your copyright material available. Arguably, if you have used your robots.txt file to tell it not to index (and therefore cache) your pages and it still does, they are breaching copyright. OK, the Google cache is the world's largest breach of copyright anyway, but if you have told its spider not to index and it does regardless, that's a different ballgame.
Putting it out there on the web does not give anyone the right to do with it as they please.
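For anyone who hasn't looked at how the robots.txt check works on the crawler's side, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `may_fetch` helper, and the example.com URLs are invented for illustration. Keep in mind that robots.txt is purely advisory: this shows what a well-behaved bot could check, not what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site; not taken from the thread.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch("Googlebot", "http://example.com/index.html"))
print(may_fetch("Googlebot", "http://example.com/private/notes.html"))
print(may_fetch("MSNBot", "http://example.com/index.html"))
```

With these rules Googlebot is let into everything except /private/, and every other user agent is asked to stay out entirely.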
Re:Difficult to do if Google doesn't want them to (Score:5, Interesting)
Re:Microsoft stealing someone else's technology??? (Score:5, Interesting)
Lo! Note how the review articles of the last few days mention the innovative NEW FEATURE of MSN search called, "Search Near Me" which stores the calculated lat/long of addresses on web pages and returns matches near you.
Note how Google's long-in-beta Google Local (http://local.google.com) [google.com] stores the calculated lat/long of addresses on pages and returns matches near you. Google Local works better.
Another Microsoft innovation! Let's hope WE remember who had it first!
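For the curious, the "matches near you" part can be sketched as a distance filter over stored lat/long pairs. This is a guess at the general technique, not how either Google Local or MSN's feature is actually implemented; the haversine formula is standard, but `haversine_km`, `matches_near`, and the place coordinates below are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def matches_near(origin, places, radius_km):
    """Return names of places within radius_km of origin, nearest first."""
    lat0, lon0 = origin
    scored = [(haversine_km(lat0, lon0, lat, lon), name)
              for name, (lat, lon) in places.items()]
    return [name for dist, name in sorted(scored) if dist <= radius_km]


# Assumed sample data: names and coordinates are hypothetical.
places = {
    "cafe": (0.0, 0.01),
    "library": (0.05, 0.0),
    "airport": (1.0, 1.0),
}
nearby = matches_near((0.0, 0.0), places, radius_km=10)
print(nearby)
```

A real service would index the coordinates spatially rather than scanning every stored address, but the filtering step is the same idea.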
Re:Don't concern yourself with this crap... (Score:3, Interesting)
Sure, I see crawlers on my site all the time, sometimes hitting the same URL over and over again. Do I understand their repetitive behavior? No.
Google gives a partial answer to this on their GoogleBot page [google.com]:
If they're playing around with new indexing algorithms then I would expect to see more of these multiple hits.
Eric
How to (gently) detect Internet Explorer [ericgiguere.com]
Re:Don't concern yourself with this crap... (Score:5, Interesting)
Full Circle (Score:5, Interesting)
It's interesting to know that Bill Gates has been forced to go back to his roots...
Arg I hate M$ (Score:4, Interesting)
Re:Try this term on MSN search (Score:5, Interesting)
Before you mod me down for that, I'd like to mention that this isn't Microsoft bashing since I am an atheist too and so are Linus [celebatheists.com] and RMS [celebatheists.com].
Re:Violates Google's TOS (Score:3, Interesting)
Ahhh. So, let's see. If you use google at work, you should be going to jail. Sounds fair.
Can anybody take your comments seriously after you say something like "you should be going to jail?" I don't know when Google became a government agency that could send officers to your door for violating a TOS. No, at best it would be a civil issue. More likely, as you say, they have that clause as a justification if they choose to block usage.
However, of all the companies out there, Google would be one of the least anal ones I could think of. Almost certainly that clause exists only for the purpose of blocking people doing what MS is (rightly or wrongly) accused of: crawling them to offer a competing service. And THAT is taking money directly out of their pockets--you can bet if it were true and could be proven, they would do more than start firewalling. They'd be suing somebody's ass off.
Frankly, I think that is a perfectly legitimate attempt to protect one's business. But hey, if you think it's moronic and crappy, that's your call.
Re:Don't concern yourself with this crap... (Score:3, Interesting)
Robot                       Visits  Bandwidth  Last visit
Googlebot (Google)              74  945.51 KB  11 Nov 2004 - 03:02
Netcraft Web Server Survey      13  0          10 Nov 2004 - 23:48
Mirago                           6  76.44 KB   02 Nov 2004 - 04:13
MSNBot                           6  76.44 KB   05 Nov 2004 - 05:58
It's interesting that Mirago and MSNBot have taken exactly the same bandwidth in the same number of visits. Are MS innov^H^H^H^H^H buying new technology again?
Bob
Re:Difficult to do if Google doesn't want them to (Score:3, Interesting)
Re:Try this term on MSN search (Score:4, Interesting)
Not anymore. They apparently hand-edited their own company out of the results about an hour ago.
Re:Difficult to do if Google doesn't want them to (Score:4, Interesting)
For Google I get: crawl-66-249-64-167.googlebot.com [66.249.64.167]
For MSN I get: fj1011.inktomisearch.com [66.196.91.16]
And for MSN beta I get: 65.54.188.83 (can't find associated domain)
So we can tell that at least this result wasn't stolen from Google.
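If you want to repeat this kind of check against your own logs, a rough sketch follows. The suffix table is built only from the two hostnames quoted above, and `operator_for_host` and `reverse_lookup` are hypothetical helper names. Note that a PTR record on its own is not proof of who runs a crawler; a stricter check would also resolve the returned name forward and confirm it maps back to the same IP.

```python
import socket

# Suffix-to-operator table built only from the hostnames quoted above;
# a real table would need many more entries (assumption for illustration).
KNOWN_SUFFIXES = {
    ".googlebot.com": "Google",
    ".inktomisearch.com": "Inktomi",
}


def operator_for_host(hostname):
    """Map a reverse-DNS hostname to a crawler operator, or None if unknown."""
    host = hostname.lower().rstrip(".")
    for suffix, operator in KNOWN_SUFFIXES.items():
        if host.endswith(suffix):
            return operator
    return None


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if there is no record.

    This performs a live DNS query, so results depend on the network.
    """
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


print(operator_for_host("crawl-66-249-64-167.googlebot.com"))
print(operator_for_host("fj1011.inktomisearch.com"))
```

An IP with no PTR record, like the MSN beta address above, would come back as None from `reverse_lookup` and has to be identified some other way.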