Is Microsoft Crawling Google?



Posted by CmdrTaco on Thursday November 11, 2004 @03:36PM from the put-on-your-foil-hat dept.

triplecoil writes "Jason Dowdell over at WebProNews has written a piece questioning a tactic Microsoft might be using to beef up its new search engine. He thinks Microsoft might be dipping into Google's results to supplement its own. Dowdell likens it to leaving your garbage on the curb--anyone could conceivably go through it and take whatever is there for their own."

This discussion has been archived. No new comments can be posted.


Msn Crawling (Score:4, Informative)

by clinko ( 232501 ) writes: on Thursday November 11, 2004 @03:46PM (#10790936) Journal

If you've been watching your site's logs lately, you'll have seen that Microsoft has been HAMMERING most servers. Most crawlers pick through pages with large lists one at a time, then come back every hour or so.

Starting last week, MSN has been pulling EVERY LINK in sequence from my site, even the larger Artist Index pages [clinko.com].

Seriously, I've had this same spider on my site for about 36 hours now.

Violates Google's TOS (Score:5, Informative)

by Anonymous Coward writes: on Thursday November 11, 2004 @03:46PM (#10790937)

From Google's Terms of Service [google.com]

Personal Use Only

The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales. You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Please contact us for more information.

Re:LOL (Score:1, Informative)

by Anonymous Coward writes: on Thursday November 11, 2004 @03:49PM (#10790960)

Habbit = What a priest wears
Habit = A regular behavior for a person/thing

Re:Try this term on MSN search (Score:2, Informative)

by fireshipjohn ( 20951 ) * writes: on Thursday November 11, 2004 @03:51PM (#10790998) Homepage

Now try it on google and you get articles about the 'more evil that....' debate.

I know which search engine I'm sticking with :)

Spike the results, then sue (Score:5, Informative)

by G4from128k ( 686170 ) writes: on Thursday November 11, 2004 @03:52PM (#10791014)

It would be easy for Google to insert a small fraction of non-sequiturs in the results, look at Microsoft's search results, and then sue for misuse. Even if MSFT uses random proxies to avoid detection, it cannot manually recheck all the hits to make sure they are correct (if it had the resources to check all the sites, it would not need to crawl Google). A few made-up sites or inappropriate search hits would be enough to establish a pattern of abuse.
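
The check described above could be sketched roughly as follows. This is only an illustration of the idea: the function name `planted_hits`, the queries, and the URLs are all made up, and nothing here reflects how Google or Microsoft actually operate.

```python
def planted_hits(planted_urls, competitor_results):
    """Return, per query, the planted URLs that appear in a competitor's results.

    planted_urls: dict mapping query -> set of made-up URLs seeded into our results
    competitor_results: dict mapping query -> list of URLs the competitor returned
    """
    leaked = {}
    for query, urls in competitor_results.items():
        # A planted URL has no legitimate way to be discovered by an honest crawl.
        found = sorted(set(urls) & planted_urls.get(query, set()))
        if found:
            leaked[query] = found
    return leaked


# Hypothetical data, for illustration only.
planted = {"blue widgets": {"http://example.invalid/made-up-page"}}
observed = {
    "blue widgets": [
        "http://example.com/widgets",
        "http://example.invalid/made-up-page",
    ],
    "red widgets": ["http://example.com/red"],
}

print(planted_hits(planted, observed))
```

Any non-empty result would be the "pattern of abuse", since a made-up URL should never surface through independent crawling.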

Re:Try this term on MSN search (Score:3, Informative)

by Garion Maki ( 791172 ) writes: on Thursday November 11, 2004 @03:57PM (#10791084)

pretty funny :)

but it seems like google started it several years ago.

http://www.cnn.com/TECH/computing/9911/15/search.engine.ms.idg/ [cnn.com]
and
http://searchenginewatch.com/sereport/article.php/2167621 [searchenginewatch.com]
btw, it doesn't seem to work on google anymore...

Re:Microsoft stealing someone elses technology??? (Score:2, Informative)

by isometrick ( 817436 ) writes: on Thursday November 11, 2004 @04:03PM (#10791181)

Google Terms of Service [google.com]

" ... You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance ..."

Re:Does it violate Google's Terms of Service (Score:5, Informative)

by nick13245 ( 681899 ) writes: on Thursday November 11, 2004 @04:22PM (#10791413)

Yes it does.
From Google's Privacy Center (http://www.google.com/terms_of_service.html):

Personal Use Only

The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales. You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google. If you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Please contact us for more information.

Re:This could be entirely natural... (Score:3, Informative)

by IIH ( 33751 ) writes: on Thursday November 11, 2004 @04:47PM (#10791679)

If MSN's bot happened upon a link to a Google search page, is it somehow wrong for the MSN bot to follow that link, and spider as normal?
Find a link, fine
Follow the link, fine
Spider the link, not fine: Google's robots.txt [google.com] does not give them permission to.

Re:Don't concern yourself with this crap... (Score:2, Informative)

by Jahf ( 21968 ) writes: on Thursday November 11, 2004 @05:31PM (#10792183) Journal

IANAL but I would see this as falling under fair use.

1) the LoC is not profiting from your works nor is it re-using them (with the exception of providing an archive to others, see next item).

2) the LoC regularly tells people requesting copies of their information to first obtain permission from the copyright holder (in other words, as with any library, you can browse but you can't copy without permission and copy permission does not equal permission to reuse in a commercial work).

3) Copy protection schemes require active protection to fall under the DMCA, even if it is so simple that anyone can defeat it. Robots.txt is -passive- protection because you have to purposefully search for the file and then purposefully utilize it. To be active protection the document should not come up without the viewer (or blocked viewer) performing some form of action. When someone/something visits an unprotected public web page there is not a way for your web server to invoke the robots.txt file, therefore it is not an active mechanism.

Re:Don't concern yourself with this crap... (Score:5, Informative)

by ad0gg ( 594412 ) writes: on Thursday November 11, 2004 @05:33PM (#10792199)

If you don't want your site indexed or cached by Google, go here and follow the directions.
Remove yourself from google [google.com]
"Note: If you believe your request is urgent and cannot wait until the next time Google crawls your site, use our automatic URL removal system. In order for this automated process to work, your webmaster must first insert the appropriate meta tags into the page's HTML code."
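
The quoted note does not say which meta tags are meant. Assuming it refers to the common `<meta name="robots" content="noindex">` convention, here is a rough sketch of how a crawler might detect such an opt-out. The class and function names (`RobotsMetaParser`, `page_opts_out`) are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Assumption: the opt-out uses the generic "robots" meta name.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def page_opts_out(html):
    """Return True if the page asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives or "none" in parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(page_opts_out(page))
```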

Re:Spike the results, then sue (Score:3, Informative)

by Dogun ( 7502 ) writes: on Thursday November 11, 2004 @05:54PM (#10792446) Homepage

Seems you don't understand how search engines work^^

What a normal spider generally does is try different IPs to see if they're running a web server. Then it does a DNS lookup, fetches http://<host>/robots.txt and reads that to decide if indexing is allowed, and where. Then it just walks through the website. A number of places on the website might not be directly accessible, but also not disallowed for indexing by robots.txt.

If some other site links into some disconnected region of that website, the crawler generally checks that link against the robots.txt and, if it's okay, indexes it.

The accusation here is that Microsoft isn't finding these addresses on its own, but is instead using Google's 'site:host.domain' results as a shortcut, which would constitute a violation of Google's terms of service.
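
For what it's worth, the robots.txt step can be sketched with Python's standard `urllib.robotparser`. The robots.txt content, the host `example.com`, and the helper name `allowed` below are assumptions made up for illustration, not a copy of any real site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps every crawler out of /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "msnbot", "http://example.com/search?q=test"))
print(allowed(ROBOTS_TXT, "msnbot", "http://example.com/about.html"))
```

A well-behaved crawler would run a check like this before every fetch, including for links it found on some other site.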

Re:More lies from cowardly trolls (Score:2, Informative)

by Asphalt ( 529464 ) writes: on Thursday November 11, 2004 @05:59PM (#10792500)

They do profit from your data. However, being that it is publicly available on an HTTP server, that's pretty much their right. That's like you handing me $5 for me to tell you which magazines you might like to buy.
And MSN crawling Google's site is really no different. As long as the Google data is on a public server, it is fair game to crawl.

Re:Try this term on MSN search (Score:4, Informative)

by StikyPad ( 445176 ) writes: on Thursday November 11, 2004 @06:19PM (#10792745) Homepage

His thought process probably started here [google.com]

google doesn't allow bots to crawl google.com... (Score:2, Informative)

by lixlpixel ( 747466 ) writes: on Thursday November 11, 2004 @06:33PM (#10792885) Homepage Journal

see their robots.txt: http://www.google.com/robots.txt [google.com]
so if the MSN bot does what they say, it isn't doing what it's supposed to do.

Re:Don't concern yourself with this crap... (Score:4, Informative)

by djcapelis ( 587616 ) writes: on Thursday November 11, 2004 @09:26PM (#10794405) Homepage

To remove all the images on your site from our index, place the following robots.txt file in your server root:
User-agent: Googlebot-Image
Disallow: /

That should work? No?
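
One way to sanity-check those two lines is to feed them to a standard robots.txt parser. This is a small sketch using Python's `urllib.robotparser`; the helper name `blocked` and the example URL are made up, and it only shows how a compliant parser reads the rules, not what any particular crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The two lines quoted above, as they would appear in robots.txt.
RULES = [
    "User-agent: Googlebot-Image",
    "Disallow: /",
]


def blocked(user_agent, url="http://example.com/images/photo.jpg"):
    """Return True if RULES forbid user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(RULES)
    return not parser.can_fetch(user_agent, url)


print(blocked("Googlebot-Image"))  # the image crawler named in the rules
print(blocked("Googlebot"))        # a different user agent, not named in the rules
```

Under this parser the rule applies only to the named image crawler, so other user agents are unaffected.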
