
Company Offers Customizable Web Spidering

TechReviewAl writes "A company called 80legs has come up with an interesting new web business model: customized, on-demand web spidering. The company sells access to its spidering system, charging $2 for every million pages crawled, plus a fee of three cents per hour of processing used. The idea is to offer Web startups a way to build their own web indexes without requiring huge server farms. 'Many startups struggle to find the funding needed to build large data centers, but that's not the approach 80legs took to construct its Web crawling infrastructure. The company instead runs its software on a distributed network of personal computers, much like the ones used for projects such as SETI@home. The distributed computing network is put together by Plura Processing, which rents it to 80legs. Plura gets computer users to supply unused processing power in exchange for access to games, donations to charities, and other rewards.'"
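The pricing in the summary is easy to turn into a back-of-the-envelope estimate. The sketch below assumes that only the two 2009 rates quoted above apply, that both scale linearly, and that you already know how many processing hours a job will use. The function name and the sample workload are hypothetical and are not 80legs figures.

```python
# Hypothetical estimator built only from the two rates quoted in the summary.
PRICE_PER_MILLION_PAGES = 2.00    # USD per 1,000,000 pages crawled
PRICE_PER_PROCESSING_HOUR = 0.03  # USD per hour of processing used


def estimate_crawl_cost(pages: int, processing_hours: float) -> float:
    """Return the estimated bill in USD, rounded to cents."""
    if pages < 0 or processing_hours < 0:
        raise ValueError("pages and processing_hours must be non-negative")
    page_fee = pages / 1_000_000 * PRICE_PER_MILLION_PAGES
    processing_fee = processing_hours * PRICE_PER_PROCESSING_HOUR
    return round(page_fee + processing_fee, 2)


if __name__ == "__main__":
    # Hypothetical workload: 50 million pages and 1,000 processing hours.
    print(estimate_crawl_cost(50_000_000, 1_000))  # 130.0
```

At these rates the per-page fee dominates unless a job uses a lot of processing time. A billion pages alone would come to about $2,000.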


  • Buried in Digsby (Score:4, Informative)

    by Anonymous Coward on Monday September 28, 2009 @05:50PM (#29572821)

    This is apparently the service that caused a lot of controversy when people discovered it was somewhat hidden in Digsby [wikipedia.org].

  • by pburt ( 244477 ) on Monday September 28, 2009 @06:06PM (#29572995) Journal

    There is a spider crawling the web that claims to be building a free, downloadable web index for similar purposes.
    Torrent link for the index and information at http://www.dotnetdotcom.org/ [dotnetdotcom.org].

  • by mgkimsal2 ( 200677 ) on Monday September 28, 2009 @08:49PM (#29574745) Homepage

    "I mean, if you're going to be some kind of start-up search engine or 'semantic company' (whatever that means), shouldn't Web spidering be your core competency? If you're going to differentiate yourself in the market, how can you buy spidering as a commodity?"

    Raw spidering is pretty much a commodity already. You're issuing GET requests over HTTP (for the most part). The "semantic" stuff comes into play when analyzing the results and doing interesting things with the raw information you get back. If people can spend more time focused on the 'interesting bits' and less time scaling up to pull in the raw data to analyze, they'll be better off for it and more likely to create something new, interesting, or distinguishing.

    People (generally) don't write their own web servers or their own TCP/IP stacks, and they often don't write their own session-handling logic or security code. All of these things have been commoditized. Perhaps too many people are relying on 'cloud computing' these days, but hosting and storage 'in the cloud' is where all the cool kids are playing right now (I don't necessarily agree with it, and probably wouldn't put all my eggs in that basket myself, but others are doing so). Spidering may be the next frontier to get commoditized.

    Perhaps not everyone is comfortable 'partnering' with Google for everything? If someone was going to work on developing the 'next big thing', would you rather invest in something where the people had spent an inordinate amount of time building up network capacity to do drone work, or used a service like 80legs, or built the prototypes on Google's servers? Depending on the project, any of those makes sense, but I'd prefer to use a service like 80legs myself. They're small enough and hungry enough that they should give top-notch customer service at this stage, whereas Google's not going to give you a number to call for direct service (maybe they do if you're spending loads of money, but then you're back to the question of spending money wisely).

    The P2P aspect of how they're doing the spidering may be clever, but I'd rather see a more direct use of data-center resources around the globe than reliance on a SETI-like participation model.

  • by Jack9 ( 11421 ) on Monday September 28, 2009 @09:08PM (#29574927)

    Advertising uses a fair amount of spidering for things such as contextual searching (where has a user been and what are their interests). Amazon was completely apathetic toward a company that offered 50 mil for sending them crawling business. I was surprised, to say the least. When the company tried to do it piecemeal, Amazon got very upset. So there's a demand, but it's probably not very large (in terms of the number of capitalized consumers).
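mgkimsal2's point that raw spidering is mostly "issuing GET requests over HTTP" can be illustrated with a minimal breadth-first crawler that uses only Python's standard library. This is a sketch under stated assumptions, not how 80legs or Plura actually work: the names `crawl`, `extract_links`, and `http_fetch` are hypothetical, and it skips robots.txt, rate limiting, and the other politeness rules a real spider needs. The fetch function is passed in so the crawl logic can be exercised without touching the network.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url: str, html: str) -> List[str]:
    """Resolve links against base_url, drop #fragments, keep only http(s)."""
    parser = LinkExtractor()
    parser.feed(html)
    out: List[str] = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            out.append(absolute)
    return out


def crawl(seed: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, List[str]]:
    """Breadth-first crawl from seed; returns {url: outgoing links}."""
    seen = {seed}
    queue = deque([seed])
    graph: Dict[str, List[str]] = {}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # fetch failed or page was not HTML
            continue
        links = extract_links(url, html)
        graph[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph


def http_fetch(url: str) -> Optional[str]:
    """Plain HTTP GET; returns HTML text, or None on error or non-HTML."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            if "text/html" not in resp.headers.get("Content-Type", ""):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None
```

Calling `crawl("http://example.com/", http_fetch, max_pages=10)` would fetch live pages. The hard parts a paid service sells are everything this leaves out: scale, politeness, deduplication, and storage.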

