
Google Crawls The Deep Web

Submitted by mikkl666 on Wednesday April 16 2008, @08:23AM
google
mikkl666 writes "On its official blog, Google announces that it is experimenting with technologies to index the Deep Web, i.e. the pages hidden behind forms, in order to be 'the gateway to large volumes of data beyond the normal scope of search engines'. For that purpose, the crawler tries to get past the forms automatically: 'For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML'. Directives like 'nofollow' and 'noindex' are still respected, so sites can still be excluded from this type of search, and forms requiring personal information are promised to be a no-no: 'We omit any forms that have a password input or that use terms commonly associated with personal information such as logins, userids, contacts, etc.'"
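
To make the quoted description concrete, here is a rough Python sketch of that kind of form probing. It is not Google's code. The names `FormScanner`, `candidate_queries`, and `PERSONAL_TERMS` are made up for illustration, the term list is only a guess based on the examples in the quote, and check boxes and radio buttons are left out to keep it short.

```python
from html.parser import HTMLParser
from itertools import product

# Hypothetical term list; the quoted post only names "logins, userids, contacts, etc."
PERSONAL_TERMS = ("login", "userid", "contact")


class FormScanner(HTMLParser):
    """Collects text inputs and select options from a snippet of form HTML."""

    def __init__(self):
        super().__init__()
        self.text_fields = []      # names of free-text inputs
        self.choices = {}          # select name -> list of option values
        self.has_password = False
        self._select = None        # name of the <select> currently open

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input":
            kind = (a.get("type") or "text").lower()
            if kind == "password":
                self.has_password = True
            elif kind == "text" and a.get("name"):
                self.text_fields.append(a["name"])
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
            self.choices[self._select] = []
        elif tag == "option" and self._select and a.get("value"):
            self.choices[self._select].append(a["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def candidate_queries(form_html, site_words):
    """Return the submissions a crawler might try, or [] if the form is skipped."""
    scanner = FormScanner()
    scanner.feed(form_html)
    names = scanner.text_fields + list(scanner.choices)
    # Skip forms with a password input or field names that look personal.
    if scanner.has_password or any(
        term in name.lower() for name in names for term in PERSONAL_TERMS
    ):
        return []
    # Text boxes draw from words found on the site; selects from their own option values.
    pools = [site_words] * len(scanner.text_fields)
    pools += [scanner.choices[name] for name in scanner.choices]
    return [dict(zip(names, combo)) for combo in product(*pools)]
```

Feeding it a form with one text box `q` and a two-option select `cat`, plus two site words, gives four candidate submissions. A real crawler would presumably sample from that space rather than enumerate all of it, since the number of combinations grows quickly with every extra field.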

  • For text boxes, our computers automatically choose words from the site that has the form


    Along with some spa--- I mean, content-relevant links to paying sponsors, perhaps?
Tart words make no friends; a spoonful of honey will catch more flies than a gallon of vinegar. -- B. Franklin