Technology

Gnutella Technology Powers New Search Engine

Matrium writes: "News.com (owned by CNet) is running an article on how the makers of Gnutella have turned their decentralized model of information swapping away from music and porn, and are now looking at search engines. InfraSearch is still in beta, but it does offer an interesting look at the evolution of the Internet." InfraSearch presently paws through only a few search sites, but as a concept it really intrigues me. For one thing, it introduces the long-overdue concept of "how long to search" right into the query dialogue.
  • by Anonymous Coward
    One of the gnutella clones, furi, has similar features, and it's GPL with all the sources and everything. It has a built-in web server that can index and serve HTML and MP3 files. Basically it lets your HTML pages participate in the search space of the gnutella net. People with gnutella clients can search your web pages in realtime. People with a web browser can go through one of those web-to-gnutella gateways to do the search. Check it out: http://www.jps.net/williamw/furi/
  • The 'net has traditionally been completely non-commercial. It was originally only for military or academic purposes. There was a considerable amount of accountability when all sites were administered by real people, not companies like CompuServe or AOL.

    It would be a real error to claim that there was an origin where it was all totally 'free' and anonymous.
  • One possible problem: each site that is searched will be quite able (given processing/storage/blahblahblah) to collect and analyze the search requests that go through it.

    Even if the software as distributed disallows this, it's quite possible for the site to tweak the code to this purpose (one of the few downsides to open source). This could be used for targeted spamming, building enemies lists, etc. Since there's no way to know what systems your search will hit, there's no way of knowing what their (stated or actual) privacy policies are.

    I'm less paranoid than some here, but this still lit my worry button.
  • One idea to deal with problem #2: what if, along with each page returned, a site also returned a relevancy rating and identified the procedure or algorithm used to reach that rating? This would allow the querying site to verify that rating with the same procedure, and provide a general trustworthiness rating for the site.

    Of course, this means that the procedure(s) used to generate the relevance rating must be publicly available. But there's nothing stopping the source site from using a different rating system internally to find the pages, as long as they use a public algorithm for the rating that's returned with the page...

    Would this help at all?
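
The verification idea in the comment above lends itself to a quick illustration. A minimal sketch, assuming a hypothetical registry of published rating procedures and a deliberately simple term-frequency rating; none of these names come from InfraSearch or Gnutella:

```python
# Hypothetical sketch: a responding site returns (page_text, claimed_rating,
# algorithm_id); the querying node recomputes the rating with the same public
# algorithm and flags sites whose claims don't check out.

def tf_rating(page_text: str, query: str) -> float:
    """A trivially simple public rating: query-term occurrences relative to page length."""
    words = page_text.lower().split()
    if not words:
        return 0.0
    terms = query.lower().split()
    hits = sum(words.count(t) for t in terms)
    return min(1.0, hits / len(words) * 10)

PUBLIC_ALGORITHMS = {"tf-v1": tf_rating}   # registry of published rating procedures

def verify_result(page_text, query, claimed_rating, algorithm_id, tolerance=0.05):
    """Return True if the site's claimed rating matches our own computation."""
    algo = PUBLIC_ALGORITHMS.get(algorithm_id)
    if algo is None:
        return False           # unknown algorithm: can't verify, don't trust
    return abs(algo(page_text, query) - claimed_rating) <= tolerance

# Example: a site claiming a perfect match for a page that barely mentions the query.
print(verify_result("cheap toner cartridges and more", "gnutella search", 1.0, "tf-v1"))  # False
```

As the commenter already concedes, this only guarantees the advertised score is honest under the public procedure, not that the site's internal ranking is.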

  • One way that it could be enforced would be if all of the ISPs followed the lead of some "always-on" suppliers who, "to protect the customer", filter all incoming TCP connections (i.e. packets with the SYN but not the ACK bit set) to their customers. That way customers will not be able to directly connect with other customers, but only with "central" sites. This would break the gnutella (and other) modes of operation. This is not common now, but who is to say what may happen in the next few years?
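
For illustration, a conceptual sketch of the filtering policy that comment describes (it is not the configuration of any real ISP or firewall product): inbound segments that try to open a new connection are dropped, so a customer can originate connections but never accept them.

```python
# Conceptual sketch only: classify TCP segments by their SYN/ACK flags and
# direction, the way an "always-on" provider applying this policy might.

def blocks_inbound_connection(syn: bool, ack: bool, inbound: bool) -> bool:
    """True if the policy would drop the segment (a new inbound connection attempt)."""
    return inbound and syn and not ack

# A peer trying to connect to a customer's Gnutella node: dropped.
print(blocks_inbound_connection(syn=True, ack=False, inbound=True))   # True
# Replies to connections the customer opened (SYN+ACK) still get through.
print(blocks_inbound_connection(syn=True, ack=True, inbound=True))    # False
```
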
  • Sorry if someone has already said this, but I do not have time to read the posts right now, BUT

    Inference Find (http://www.infind.com)
    has LONG had a time-to-search option right after the field where you do the search.
  • Yeah. A few high profile, especially pro-corporate, applications would be a very strategic move to protect this new technology. It's a shame that we'd have to go to this length, considering that many new technologies are brought into the mainstream by the fringe, e.g. pornography and video. I have been tossing around an idea (I'm sure I'm not alone, but I haven't found it yet) of using these new distributed filesystems (gnutella, freenet) as ways to expand the web. With the right XML programming and some encryption technology, we can code distributed auctions, search engines, or any type of application that benefits from a large, participatory userbase. We can make GPL versions (or whatever public license best applies) of any centralized web site.

    I'd like to see a distributed version of consumer reports.
  • As far as do it yourself DDOS:

    1) Most of the results would probably be "nope, don't got that", or "not really". Just don't bother to send the info in unless you are above a certain level of matching.

    2) Even if there were a large amount of results, why couldn't you sort of decentralize that too? You have your own search client on your computer. No centralized search site. You send out a query to several other computers. They talk among themselves, expanding at a geometric rate. The info gets collected to say 20 different nodes throughout the internet. Those 20 different nodes send you just one reference to the html/whatever pages summarizing their results. You display that in your browser. I'm not quite sure if that made sense, but you should get the general idea. Why distribute just a little bit?

    Finally, about your idea: that would keep entities from returning dynamic content. This might not be a bad thing, as other people have pointed out. I've got an idea about that too. Somewhere in the distributed computing, you'd have a computer actually search the page to see if it matches what you want. You could still have people falsely strew what you wanted across the page, along with porn and printer cartridge deals, I guess. I also like the idea of "moderation". You could even have a search for more "spamlike instances" in the pages. Search for porn, toner cartridge deals, etc (unless that's what you searched for, of course), and make these more likely to be moderated. Maybe you could also take away the ability to be in the searches if you get moderated as a spammer too much.

    Just some thoughts.
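
A rough sketch of the two ideas in this comment: a stay-silent-unless-relevant threshold, and a small set of collector nodes that merge replies into one summary for the searcher. The threshold and scores below are invented for illustration.

```python
# Hypothetical sketch: (1) a node only replies when its best match clears a
# threshold, and (2) collector nodes merge replies and hand the searcher one
# ranked summary instead of thousands of raw hits.

MATCH_THRESHOLD = 0.6   # made-up cutoff below which a node stays silent

def local_reply(node_results, threshold=MATCH_THRESHOLD):
    """node_results: list of (url, score). Return only results worth sending."""
    good = [(url, s) for url, s in node_results if s >= threshold]
    return good or None   # None means "nope, don't got that"

def collector_summary(replies, top_n=10):
    """Merge replies forwarded to one collector node into a single ranked page."""
    merged = [hit for reply in replies if reply for hit in reply]
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged[:top_n]

# Example: three nodes respond, one has nothing good enough to say.
replies = [
    local_reply([("http://a.example/doc", 0.9), ("http://a.example/junk", 0.2)]),
    local_reply([("http://b.example/page", 0.7)]),
    local_reply([("http://c.example/ad", 0.1)]),   # stays silent
]
print(collector_summary(replies))
```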

  • Try checking out my search engine AEIWI [aeiwi.com].
    It lets you filter the search results,
    a type of automatic directory.
  • "Unlike Napster, however, it allows people to search for any kind of files; a random sampling of the search terms being used at any given time ranges from MP3s to blockbuster movies to pornography."

    Unlike Napster, however, AltaVista, HotBot, Google, etc. can be used to search for pirated software, pornography and blockbuster movies...
  • The circle is complete. What short memories people have!

    At least *I* am old enough to remember that this technology has already been
    implemented and is now dead because relevancy doesn't scale to an adversarial
    Internet.

    This sort of system used to be called Archie, remember? Hello?

    It stopped being relevant because it was supplanted by something new called a
    <finger quote gesture> "search engine" </finger quote gesture>. The problem with Archie was
    anyone could claim to have just what you were looking for, so they did. Duh.

    Now Andreessen et al. say this is great because it "handles dynamic content",
    which of course "search engines can't do". The dynamic content they refer to is
    a spec sheet for a computer that Dell supposes you are looking for. So the point
    is that Dell can just *create* the spec sheet you're looking for in response to
    your search, right? Sounds spam-proof to me!

    Duh.
  • Long overdue my ass... inference find [infind.com]
  • 1) An obvious point: if a site itself decides which queries to respond to, there'll be a lot of spamming the index. Doesn't anybody remember the fate of the [meta] tags?

    They're still there, and still used correctly by some sites.

    There is already a lot of spamming of search engines. And the search engines often aren't that good at weeding out the spammers. Perhaps collaborative trust-based filtering is the way to go. Something like this: anyone can register a vote for a filter, and the individual filterings are also rated by everyone (meta-moderation, in other words). Those with the highest Karma and no reputation for censoring things - and eventually, those most knowledgeable in the area and/or those of similar political disposition to the user in question - will tend to be trusted by other users.

    (2) This search technology essentially turns a search into an advertising stream. Since the site decides what to return, it'll return a blurb instead of a context around the match. And if the site can return graphics and not just text strings... oh, my! Advertising banners as search results! Joy.

    In some cases, this could actually drive users away. But yes, it is a problem. Filtering would be a good idea here too. The server software could come with a warning attached - "You MUST provide an ALT text alternative to any images, otherwise you will drive away viewers who choose to block images in search results."

    (3) The results are going to depend on the location of the query. The same question is likely to return different results when asked from a machine in California than when asked from a machine in Germany (especially with low timeouts). This isn't horrible, but not all that good. In particular, it means that I cannot tell other people "Search for 'foo', you'll find the site I am talking about on the first page".

    Well, this already happens. You get different results depending on which search engine or directory you choose.
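
The collaborative trust-based filtering suggested earlier in this comment could look something like the following minimal sketch: votes on results are weighted by karma, and meta-moderation nudges karma up or down. All weights, names, and numbers are invented for illustration.

```python
# Hypothetical sketch of karma-weighted collaborative filtering of search results.
from collections import defaultdict

karma = defaultdict(lambda: 1.0)    # per-user trust weight, starts neutral
votes = defaultdict(list)           # result_url -> list of (user, +1/-1)

def vote(user, url, value):
    votes[url].append((user, value))

def meta_moderate(rater_agrees, user):
    """Other users rating a user's filtering nudges that user's karma."""
    karma[user] = max(0.0, karma[user] + (0.1 if rater_agrees else -0.2))

def score(url):
    """Karma-weighted sum of votes; results scoring below zero get filtered out."""
    return sum(karma[u] * v for u, v in votes[url])

vote("alice", "http://useful.example", +1)
vote("bob", "http://spam.example", +1)     # a spammer voting up their own site
meta_moderate(False, "bob")                # others flag bob's filtering as bogus
meta_moderate(False, "bob")
vote("carol", "http://spam.example", -1)
print(score("http://useful.example"), score("http://spam.example"))
```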

  • Distributing the searches is a waste of resources, IMHO you should distribute the indexing mechanism and centralise the searching.

    No, certain searches should be distributed as well. How many cluebies search for "sex" or "MP3s" on Yahoo every day? Many. What are the results going to be? More or less exactly the same (low-quality) results each time.

    Yet these results aren't cached at all, because the results of CGI scripts or servlets or other dynamic content providers aren't cached, according to the HTTP specification. This is a big waste. We need smarter protocols, and this idea might be a step in the right direction.

  • Correction to my previous post: these results aren't cached at all, except maybe on the Yahoo server itself (and when you press the Back button on some browsers). But caching them in between the server and the browser could be even more efficient.
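
A small sketch of the caching idea in this comment and its correction: popular query results cached between the engine and the browser, keyed by a normalized query and expired by a time-to-live rather than the usual "never cache dynamic content" behaviour. The TTL value and the stand-in backend below are assumptions.

```python
# Hypothetical intermediate cache for popular search queries.
import time

class QueryCache:
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}                      # normalized query -> (timestamp, results)

    @staticmethod
    def normalize(query):
        return " ".join(sorted(query.lower().split()))

    def get(self, query, fetch):
        key = self.normalize(query)
        entry = self.store.get(key)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]                  # fresh enough: serve the cached copy
        results = fetch(query)               # otherwise ask the real engine
        self.store[key] = (time.time(), results)
        return results

cache = QueryCache(ttl_seconds=3600)
fetches = []
def fake_engine(q):                          # stand-in for the real search backend
    fetches.append(q)
    return ["result for " + q]

cache.get("MP3s", fake_engine)
cache.get("mp3s", fake_engine)               # same normalized query: no second fetch
print(len(fetches))                          # 1
```
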
  • D'oh! Chances are, someone else has already done the search for what you are looking for, and the URL information is already in their computer.

    Add a mechanism or two to let users rate pages - or better, to automatically rate them according to how long they are actually displayed in browsers, whether their links are used, content saved, printed, made a part of bookmarks, et cetera...

    Run browsers through the engine, and off of URLs/pages read by the distributed search engine, so it can do the above rating, and keep a more appropriate cache, maybe whole smallish pages themselves to return instead of URLs.

    Combine the data from several sources to keep someone from skewing the ratings, and in the end-user's node rate the results from other nodes (as above, preferably automatically too) according to usefulness/cluelessness/spamishness. Share that info too!

    Then you have a distributed search engine worth something.
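
As a back-of-the-envelope illustration of the automatic rating described in this comment, the browser could report implicit signals (display time, links followed, saves, prints, bookmarks), and reports from several independent nodes could be averaged so no single source can skew the score. The signal weights below are invented.

```python
# Hypothetical implicit-feedback scoring combined across independent reports.
WEIGHTS = {"seconds_displayed": 0.01, "links_followed": 0.3,
           "saved": 1.0, "printed": 1.0, "bookmarked": 2.0}

def implicit_rating(signals):
    """signals: dict of signal name -> count or seconds from one user's browser."""
    raw = sum(WEIGHTS.get(name, 0.0) * value for name, value in signals.items())
    return min(5.0, raw)                     # clamp to a 0-5 scale

def combined_rating(reports):
    """Average reports from several independent nodes to resist skewing."""
    if not reports:
        return 0.0
    return sum(implicit_rating(r) for r in reports) / len(reports)

reports = [
    {"seconds_displayed": 120, "bookmarked": 1},   # someone read it and kept it
    {"seconds_displayed": 5},                      # someone bounced straight off
]
print(round(combined_rating(reports), 2))
```
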
  • > Without "legitimate" applications for
    > technology, they will be viewed as simply tools
    > for pirating or other illegal use.

    Puh-leeze. The reason these technologies are viewed as aids to pirates is that they *are* aids to pirates. Not that Lars is the expert, but I tend to believe him when he says of all the Napster traffic they monitored, only a negligible portion was legitimate. Likewise, gnutella turns around and says, we have a tool that does just what Napster does, but can avoid messy lawsuits through decentralization!

    Is it any wonder people are skeptical?
  • Though 'push' has gotten a pretty bad rap (mainly for poor execution IMHO), I think you're on to something here.

    Instead of a Gnutella based search engine, it's a Napster-based search engine (mind you, the offerings stay live even when the client isn't connected).
  • Well, if the programmers do actually form a company, then they could create their own daemon that would search the site. If they did this, and didn't use any of the previous code from Gnutella, then they could release it as a binary only.

    This would stop some of the potential abuse of the search engine (your points 1 & 2).

    As for your third point, you could always use email/icq/aim/irc/etc to send the person the url.

  • Why do more work to go beyond a search for music and porno? Those are the only 2 things on the net anyway! :)

    Mike Roberto (roberto@soul.apk.net [mailto]) -GAIM: MicroBerto
  • FerretSoft did an entire suite of software just like this years ago. They have software to do metasearches for webpages, auctions, news groups, irc, email addresses, phone numbers, general information, and files. If you're a Windows user, it's incredibly good software. WebFerret automatically filters out multiple hits, and the returns are generally very good. Of course, it has a banner ad at the bottom, but that's to be expected. Anyways, my point is that this concept is hardly new, and it's surprising that it's taken this long to come to Linux. I'm surprised that FerretSoft hasn't released it for Linux. We should all write to them and suggest it.

  • electronic transfer = fast; physical transfer = slow. The highways can transport the same materials as the internet (and more), but to Joe Nobody, it's all about instant gratification. When it can be had for little to no effort, then it will be had... just cuz. But when it takes time or effort, there's a certain barrier to entry that keeps a lot of people out.
    -- kwashiorkor --
    Pure speculation gets you nowhere.
  • Some relevant material from the article Gnutella and Freenet Represent True Technological Innovation [oreillynet.com]:

    The exponential spread of requests opens up the most likely source of disruption: denial-of-service attacks caused by flooding the system with requests. The developers have no solution at present, but suggest that clients keep track of the frequency of requests so that they can recognize bursts and refuse further contact with offending nodes.

    Furthermore, the time-to-live imposes a horizon on each user. I may repeatedly search a few hundred sites near me, but I will never find files stored a step beyond my horizon. In practice, information may still get around. After all, Europeans in the Middle Ages enjoyed spices from China even though they knew nothing except the vaguest myths about China. All they had to know was some sites in Asia Minor, who traded with sites in Central Asia, who traded with China.

    Some relevant material from the article The Value of Gnutella and Freenet [webreview.com]:

    The spread of MP3 files, and their centrality to Napster, skew the debate over free and copyrighted content. Lots of people are willing to download free music files from strangers, because if they find out that the sampling quality is lousy or the song breaks off halfway through, nothing has been lost. They can go back to Napster and try another site.

    Matters would be entirely different if you tried to get free software from strangers, especially in binary form. You'd never know whether a Trojan Horse was introduced that, two years later, would wipe your hard disk clean and send a photo of a naked child to the local police chief. (And you thought UCITA's self-help provision was as bad as it could get!)
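
The burst-detection suggestion quoted from the first article (clients tracking request frequency and refusing further contact with offending nodes) could be as simple as a sliding-window counter per peer. The window size and limit below are invented for illustration.

```python
# Hypothetical per-peer rate limiter for a distributed search node.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 100

recent = defaultdict(deque)    # peer_id -> timestamps of recent requests
banned = set()

def accept_request(peer_id, now=None):
    """Return True if the request should be processed, False if refused."""
    if peer_id in banned:
        return False
    now = time.time() if now is None else now
    window = recent[peer_id]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()       # drop timestamps that fell out of the window
    if len(window) > MAX_REQUESTS_PER_WINDOW:
        banned.add(peer_id)    # burst detected: refuse further contact
        return False
    return True

# A peer that floods 200 requests in one second gets cut off partway through.
accepted = sum(accept_request("flooder", now=1000.0 + i / 200) for i in range(200))
print(accepted)                # 100
```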

  • I'm a bit confused by the idea that setting the time to search is a new idea - MetaCrawler [metacrawler.com] has been doing that for years. In any event - a meta search engine for up-to-date news is way cool, and it's about time. (I'm surprised it took so long, actually.)
  • You failed to mention any of the Magic Buzzwords (B2B, E-commerce,...) so your firm will surely fail.
  • Have they released the source code for their engine yet? Or will they ever?
  • There is only one problem: the server needs to be able to tell when information has gone stale. Often, when you generate content on the fly, you cannot say whether it's going to be stale after one second or after one week (or perhaps even a year). The only possible place to cache this is in the browser (as it is nowadays), because it's up to you how stale the information you accept can be in exchange for speed. For example, I have set my limit so that pages reload after I have restarted my browser or pressed reload. I would hate it if my ISP (in case the ISP provided such a cache) decided that I don't need a new Slashdot front page more often than once a week.
    _________________________
  • Are they thinking of implementing better search abilities? Regex would be too much for the moment, but simple boolean would be fine to start with.
  • It might make it easier if they set it to let you tell the search engine how long to search each site, so it wouldn't get hung up on one site.
  • A dedicated, hard, well mirrored network, set up for external access, would alleviate these problems.
  • imagine some corporation making a script that sends bad feedback to bury a competing website...

    Ya... exactly... and also look at /. itself... half the moderators don't even read the threads or look beyond the poster's name. Personally, I'll still entrust myself to Yahoo! when looking for good content, and badasses like AltaVista and Google when searching for rare, specific phrases.
  • "We are a Linux-based, customer-focused, technology-driven program which leverages synergies between results from common search engines, which can have E-commerce applications.

    "<Insert wildly overstated profit estimates here>"

    How's that?

  • Ouch! It hurts!

    Please, is this a joke ?
    Say it isn't. Slashdot is a site for news and discussion about that news. There is no need for off-topic things. Slashdot readers are not crackers (look at http://www.2600.com [2600.com] instead). You can break code with DeCSS if you find it. No one's going to help you with that.

    The people who told you /. was a 31337 h4xOr w4r3z bunch were misinformed, or lied to you.

    Now, say it was a joke. I reply to this because there are not many comments at this time (3, including two off-topic), so there isn't a more interesting debate here for now.

  • Gnutella is changing the world!
  • I mean really... This is just a great-looking idea. Anyone get in BEFORE they decided ppl couldn't try it out? *sigh* One of the side-effects of hearing about it on Slashdot. Oh well. Oh, and umm, since it's mentioned after all... Gnutella rocks! Okay, enough gnutella advocacy... It's mostly porn and people searching for kiddie porn. Does that mean that is all this search engine will find? Oh great. If it could eliminate the 10000 popups, then maybe...
  • But all people ever search for is music and porn anyway...so what's the use?
  • Hi. I'm Allan Cox, Open Source advocate, Linux [saltire.org] advocate, and primary coder for Linux's TCP/IP stack. I hope I'm welcome in the SlashDot forums, as til this point, I've been a totally arrogant, antisocial bastard to the community which barely pays for my lifestyle.

    In regards to the TCP/IP stack in Linux and my arrogant attitude, I must apologize: as you all already knew, and I just recently admitted to myself, FreeBSD [saltire.org]'s TCP/IP stack is far superior to Linux's, and to top it off, Microsoft [saltire.org] has proven many a time that even the TCP/IP code found in Windows NT [saltire.org] functions better than the drivel I have generated myself. Boy, what a humbler that is! It was like RMS and ESR yelling at me on my own front porch (well it's not really my front porch, it's the landlady's, in front of my one-bed, half-bathroom hovel, but you get the point)!

    I'd also like to say, in regards to those who read and post in SlashDot's forums... I am sure I will be seeing Allan Cox. [note the period], Alien Cox, Allan Cocks, Allan Coox, and the like. Please, please, please, for those of you who take SlashDot posting seriously (as I do now, amen!) do not let these crank posters (heretofore to be called "trolls") ruin CmdrTaco's bountiful SlashDot experience! "Trolls" take some delight in confusing the populace and causing disparity in the community. Take the time to learn the real from the fake, as I have (re: how I admitted to myself my TCP/IP stack for Linux actually sucks)!

    Thank you.

  • This, and lots of other sorts of spamming, admit only one really good solution: collaborative filtering. You can find out more about this from Berkeley's link farm [berkeley.edu].
    --
  • Right now, the Internet could really do with some tools which empower the user. This looks like another way for big content providers to herd users where they want them. Traditional search engines have always been a bit of a battle, with content providers trying to find new ways to `stuff' the search results and make sure that their pages came out on top. Now you don't need to do that any more -- just pay these guys whatever they're asking, and you can display what you want. Including images, by the look of things, which sounds cool, but really just lets providers grab your attention.

    Of course, if this were heavily user-moderated, I guess it might just work. But don't hold your breath. I'll be sticking to Google...

  • Federated search engines aren't exactly a new idea. The reason why they haven't been used much on the web are issues related to quality of service, reliability, spamming, and revenue sharing. Those could be worked out, but web search engines seem to have found it easier to just centralize resources. Federated search has been more popular within particular communities (e.g., scientific literature, intranet sharing), and for integrating a few top search engines (MetaCrawler etc.). For more information, type "federated search" into Google.
  • [Meta tags] They're still there, and still used correctly by some sites.

    Sure, but does anybody care? Meta tags are now 'tainted' and no search engine even spares as much as a glance in their direction.

    Perhaps collaborative trust-based filtering is the way to go.

    I don't understand how this could work. It's doable for a tree-like reference site (Yahoo), but seems impossible for a pure search engine (Google). Let's say I type "support vector machines Vapnik" into the engine -- who is going to filter the results, and how, so that I don't get "naked and petrified young girls" matches? The only feasible thing seems to be locking out IP addresses which supply bogus data, but this will get extremely messy extremely fast.

    You MUST provide an ALT text alternative to any images, otherwise you will drive away viewers

    So instead of a banner I will see "Come to our site for hot chicks, naked and petrified...". I don't think this is going to help much.

    I wasn't really concerned with bandwidth. I was concerned with the fact that a search becomes a request to send targeted advertising to you, and nothing more.

    Kaa
  • I'd like to see a distributed version of consumer reports.

    www.deja.com [deja.com]

    But organization helps, so for now, stick with consumer reports.
    --
  • By reading this article I don't really get the difference between the search method used by gnutella and the Harvest web indexer [ed.ac.uk]. I have to admit that I don't know much about either of them, but to me they look nearly the same.
  • I really, really hate projects which are "open source" but who refuse to release the source until it's "done." Too many projects these days seem to be following that path, and it's a dangerous one to take. Because what if the code is never truly "finished", as no project ever really is. It's sad.

    It is not sad. You're simply parroting Linus Torvalds's "release early and often" advice. This works for an OS, because compatibility problems are the issue of the day. There are also reasons not to do so:

    Most people don't take the time to upgrade, so they'll miss out on major features that are added later.

    If you release something that is incomplete, people will try it, see that it is incomplete, and have a poor impression from that point on.

    If promising software is released at an early stage, there is likely to be much more "cloning activity," channeling effort into doing things again--the right way--instead of tweaking something that's already most of the way there.

  • Harvest still has a central Broker server to which all the Gatherers send the info they have collected, whereas Gnutella is truly decentralized, with your searches all going to what would be the Gatherers in Harvest, with no centralization.
    The reason for the Broker server may be the problems of spam etc. that many have been discussing above.
  • First time I've seen how this stuff really works, and the implications are really amazing:

    All these technologies are just DNS all over again. DNS was created to make host information available all over the internet. Here's the difference:

    When DNS was set up, it was probably just as easy to try and pirate stuff, and who knows? Maybe people did use the early internet for illicit purposes, but only the few people who were in the know could actually do so. And not much was available on the net anyway. But DNS was created for the exact same purpose as napster and freenet: to make it easy to share information.

    Nowadays, the internet is so big that there are lots of people into it only to make money. The possibility of a scam makes people run to see how they can get their share of it, and a technology like this, however innocent, will make the headlines when everyone rushes over to see what scam (and related lawsuit) they can pull off.

    All these technologies: freenet, napster-likes, all sorts of things, are incredibly valuable extensions of what already provides structure for the wired world. If someone had thought of them in 1980, we would have a much tighter, distributed internet today.

    Well, we've thought of them now. I hope they are allowed to flourish, and that people don't keep just thinking about the negative implications of them. This is the first time I've seen a concrete example of putting it to good use.

    I think we should have the right and the possibility to choose to share what we want to:

    Imagine all the information that our governments gather on us, à la Enemy of the State, for example: with this kind of peer-to-peer network idea, we could all be gathering and sharing that information already, and maybe even doing something positive with it!
  • This has been an ongoing discussion on the developers list. Some people are for it, others against. The ones who seem to be against it are the ones who want to maintain the simplicity of gnutella. Others are suffering from kitchen-sink syndrome. You have to admit that the general technology behind gnutella could be adapted to a really great real-time web search engine. That is, if they ever get around to releasing the source to it.

  • o Gnutella gets attention for being a haven for pirates.
    o Nullsoft creates a search engine based on the technology to legitimize it.
    o InfraSearch gets media attention, much fanfare etc.

    . Harvest creates a distributed search engine.
    . Had you heard of it before now?

    It's the same old story: to be heard you have to be controversial, or rich.

    Gotta love the 'net - created for war, popularized by porn and piracy.
  • I think I will go public next week, or maybe tomorrow. I am starting a small firm that will search the search results that were searched by a search engine that searches search engines.

    We're called Search.

  • Gnutella brings an interesting thought to our Internet, and it is an old one. It is an ever-present, self-expanding, responsive, searchable file system. You don't register with the search engine, you become a part of it. With the advent of major web page traffic, we got away from this very important concept. A large, on-the-fly networked filesystem and expanding network connections are what the world really needs in order to move forward. As for search time, that is necessary in such a paradigm, since the responsiveness of searches is not a function of the algorithm running a localized database, but of the responsiveness of network nodes. I.e., this is the good stuff that we left behind long ago in search of "user friendliness." It turns out that it is more useful, quicker, and friendlier! This is what we haven't been waiting for, but put on the back burner for a bit, in order to turn a profit. If you think about it, though, with the right protocols to initiate such a turn in industry, this could be even more profitable than the web. ...And maybe we could get away from identifying ourselves by someone else's product (URLs).
  • The children! We mustn't harm the children! Think of the children!

    Please Slashdotters, Think of the Children! and stop stealing the food of the innocent children with this Gnutella technology!

  • I just thought of something - remember way back when the internet first became somewhat popular? Everyone went bonkers saying it would be a haven for porn, piracy, hate literature, etc., and there were debates (maybe not as public as the current ones) on whether the internet should be censored or not.

    At the time, as I remember, one major argument against censoring the 'net was that it was nearly impossible to do - "anyone" can post "anything" on the net, and because it's so international, no one nation had control.

    Oh, where have those days of freedom gone? How did the censors get past those barriers? Easily enough, it seems - they have the money and the will to spend it on their self-interest, and that makes anything possible. In retrospect, our claims of immunity from censorship were naive.

    I believe that once systems like Gnutella become popular, they will move (like the original web) from being a geek's haven to a corporate tool, and be appropriately restricted by their needs. Maybe less so than the web today, but order will be enforced. How? I don't know, but did we foresee the DMCA and the other tactics corps are using to censor the 'net?

    Don't worry, by then I'm sure we'll think of something else. :)
  • A sort of learning search engine that receives feedback from users ("this site isn't about what it claims to be, don't show it as a possible answer", "this site is excellent and very complete", "this site is nice but unreadable without browser so-and-so", etc). That could be the future. Of course, it would need a lot of negative feedback not balanced by positives, plus some other checks, to definitely dismiss a site; otherwise we can imagine a way of making "softwar": imagine some corporation making a script that sends bad feedback to bury a competing website...

    Sidenote: IIRC, "Softwar" is the title of a novel, so this wordplay is not mine.

  • I'm definitely not an expert, but collaborative moderation doesn't seem like a bad idea. You could maybe have a separate, probably more centralised moderation server. (Or lots of them if people start running their own.) Users could rank the results they get from any given site, and when others run a search, the reliable replying sites come up first.

    There are still lots of problems though, like how to stop anyone from just moderating their site to the top, and how to make sure the responding site is exactly who they say they are.. which is one of the major problems with spam these days anyway. It could also be really tedious working out how to distinguish a good result from a bad result.

  • by Ed Avis ( 5917 ) <ed@membled.com> on Wednesday May 31, 2000 @02:09AM (#1037136) Homepage
    What's to stop people 'spamming the index'? When your site gets a query, you could respond with 'very strong match' in the hope of getting more hits.

    Who is enforcing that sites won't just lie? Maybe some sort of collaborative moderation a la Slashdot would be needed?
  • by CMU_Nort ( 73700 ) on Wednesday May 31, 2000 @03:13AM (#1037137) Homepage
    I really, really hate projects which are "open source" but who refuse to release the source until it's "done." Too many projects these days seem to be following that path, and it's a dangerous one to take. Because what if the code is never truly "finished", as no project ever really is. It's sad.

  • by Spankophile ( 78098 ) on Wednesday May 31, 2000 @02:52AM (#1037138) Homepage
    Since people can define their own content, would this mean that people running the server-end could still be distributing their MP3's, pr0n etc, but through a web interface? It's not just limited to html-page searching.

    This makes pira^H^H^H^H trading files even easier - people no longer need to install a client, there's a nice web-search interface, with direct dload URLs. Web searching for files with no broken links. Nice.

  • by ArchieBunker ( 132337 ) on Wednesday May 31, 2000 @04:16AM (#1037139)
    Go ahead and do a search for something. Within 5 minutes whatever connection you have (dsl t3 etc) will be saturated. Has anyone ever had a complete download? Getting 100bytes/sec on a 5 meg file is insane. Maybe if it reported their connection speed truthfully and people set realistic download/upload limits.
  • by dayL8 ( 184680 ) on Wednesday May 31, 2000 @02:33AM (#1037140)
    Doesn't the model imply that every search will be processed by every available server - effectively turning a single query into n queries and responses?
    Just think - you're dialled in to an ISP and want to search for something. Eventually you start getting responses, first from hosts logically closer to you then those further away (we can only hope that there's no negative response in the protocol). You may have to wait for it all to come down the line before you get a useful result. And you'll still have to wade through mountains of useless junk (since responders get to define what content they have) just that now you'll have to actually visit the site to see that it's just another boring article on internet protocols instead of the "fix your credit record" guys you were looking for. Eventually, you'll learn which hosts not to accept responses from and which ones respond better to what types of queries (just like today).

    Big search engines will still dominate the field by being able to get it right most of the time. I don't see any real advance.

    ---
  • by GrayMouser_the_MCSE ( 192605 ) on Wednesday May 31, 2000 @02:21AM (#1037141)
    The article mentions this, but not strongly enough. Without "legitimate" applications for technology, they will be viewed as simply tools for pirating or other illegal use. FTP, as an example, could be used for those purposes, but the mainstream uses came first. We need to develop as many mainstream uses for mp3 and gnutella as can be done, so the focus of the technology critics can be drawn away from the music/copyright questions, on to the other uses. As of now, they can claim that other uses are simply "vaporware". Sure, they're possible, but no one is actually doing anything with them. Once the applications come, the technology will gain the acceptance it deserves.
  • Yes, but the idea of letting clients search each other and share files, the Napster or Gnutella way, is a very good idea. It has many legitimate possibilities; it's just that it started out being used for piracy, but saying that its only use is for piracy is a bit short-sighted. Though honestly, it can be easy to see things that way. I used to believe the only good use for CD-R technology was copying games and music. But then I became a network administrator, and realized its benefits for cheap backups. Anyways, my point is that you should never abandon new ideas just because their first uses are bad (take nuclear power, for instance :)
  • by JamesSharman ( 91225 ) on Wednesday May 31, 2000 @02:21AM (#1037143)
    The internet has traditionally been free. In recent years/months we have seen an increase in attempts to control the internet via legislation, patents and lawsuits. The problem is that whilst the internet has seen a large influx of everyday Joes and suits, the real power behind the net is, as always, the people who write the software. Gnutella and software systems like it are part of the fight back. Previously, online systems have been centralized due to simplicity and the lack of reason to build them any differently. Since we are now entering a time when the freedom we used to take for granted online is under threat, new software systems that are nearly impossible to regulate are inevitable. If the various governments and organizations had paid attention to the cherished principles of the net, perhaps we could have found a way to limit the pedophiles and professional pirates that they seem so paranoid about without compromising the net's principles too much. Instead the MPAA, the RIAA and all the other control freaks decided they wanted to make a war out of it, and a war they will get.
  • by ClayJar ( 126217 ) on Wednesday May 31, 2000 @02:33AM (#1037144) Homepage

    What's to stop people from spamming the index?

    I suppose they could build in a little technology to actually check the page. On the other hand, anything you do can be circumvented.

    I suppose this is the classic downside to the entire Internet "thing". You can't enforce absolute control in a medium specifically designed against it. Of course, there are a few things you could do to help the situation.

    With a Gnutella-style model for distributed searches, any host that is consistently returning false positives could be cut off by the adjacent node(s), right? If you have tons of traffic coming through your node from a spam site, couldn't you just stop forwarding requests to them?

    Of course, this wouldn't stop all spamming on the index, but it should allow any one node to cut off a spam node "below" itself. On the other hand, since not everyone will be eternally vigilant, this much freedom could be damaging.

    You could always have something like the MAPS RBL for search nodes. Just have someone paying attention that can keep a database of hosts to ignore requests from. If anybody can create a blackhole list, it wouldn't necessarily be centralized, so it wouldn't impinge on freedom of the search. It may still have an "open relay" problem, like SMTP does now, but that doesn't necessarily make it not worthwhile.
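
A sketch of the two mechanisms in this comment: a node stops forwarding queries to a peer whose results are consistently flagged as false positives, and the resulting blocklist can be published RBL-style for other nodes to merge. The thresholds, and how results get flagged in the first place, are assumptions, not anything specified by Gnutella.

```python
# Hypothetical false-positive tracking and blocklisting for a search node.
from collections import defaultdict

FALSE_POSITIVE_THRESHOLD = 0.5   # cut a peer off when >50% of its hits are junk...
MIN_SAMPLES = 20                 # ...but only after we've seen enough results

stats = defaultdict(lambda: {"results": 0, "false_positives": 0})
local_blackhole = set()

def record_result(peer, was_false_positive):
    s = stats[peer]
    s["results"] += 1
    s["false_positives"] += int(was_false_positive)
    if (s["results"] >= MIN_SAMPLES and
            s["false_positives"] / s["results"] > FALSE_POSITIVE_THRESHOLD):
        local_blackhole.add(peer)

def should_forward(peer, shared_blackhole=frozenset()):
    """Forward queries only to peers on neither the local nor a shared RBL-style list."""
    return peer not in local_blackhole and peer not in shared_blackhole

for _ in range(25):
    record_result("spamhost.example", was_false_positive=True)
print(should_forward("spamhost.example"))   # False
print(should_forward("honest.example"))     # True
```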

  • by Pooh22 ( 145970 ) on Wednesday May 31, 2000 @03:38AM (#1037145)
    One of the problems I see with this gnutella method is the broadcasting of the searches.

    Example: If you get the results of this kind of broadcast search back from a bad search ("sex nude pictures jpg"), you'll trash your own internet connection and probably that of others (or the search-interface's if you use a web-interface).

    Imagine a network of a million hosts (a small subset of all webservers). Each of these is running a gnutella-based search engine. On one of the servers is an interface to search the network for some information. The query is forwarded onto the overlay network, to say 10 nodes at each node, assuming some mechanism is in place to avoid loops. If the network is well interconnected, it will take about 5-6 hops to reach an edge of the cloud (probably a couple of times more to reach all the nodes). As soon as the first nodes get the search request, they send back results, say limited to the first 5-10 most significant hits. Each reply has a number of tuples consisting of (URL, a description, an indication of how close the match is, a timestamp, and probably some more), maybe 1-2 kB per reply. Say 10% of servers have a match; then 100,000 hosts will at some point send back results.

    I calculate that roughly 100 MB of results will be arriving at the searching node within a few minutes, if it can even process the dataflow.

    This is only one search; both the searching nodes and the servers will have to deal with a lot of searches, if you take other search engines as a comparison.

    Centralised search-engines are a good way to limit the bandwidth-usage, but they are slow to get changes on the web.

    idea: It would be good to have a webserver keep track of an index for its own document space and, when that changes, push the change to a central search engine where it can be searched. Distributing the searches is a waste of resources, IMHO you should distribute the indexing mechanism and centralise the searching.

    And considering that for this thing to work you need an index-engine on each server anyway, it's a small step to do it like this, isn't it?
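
The roughly 100 MB figure follows directly from the comment's own assumptions; a quick calculation for the low and high ends of the stated reply size:

```python
# Reproducing the back-of-the-envelope estimate above, using the comment's own
# assumptions: a million participating hosts, 10% of them matching the query,
# and 1-2 kB per reply, all arriving back at the node that asked.

hosts = 1_000_000
match_fraction = 0.10
reply_kb_low, reply_kb_high = 1, 2

responders = int(hosts * match_fraction)            # 100,000 replying nodes
low_mb = responders * reply_kb_low / 1024           # ~98 MB
high_mb = responders * reply_kb_high / 1024         # ~195 MB
print(f"{responders} replies, roughly {low_mb:.0f}-{high_mb:.0f} MB for one search")
```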

  • by Kaa ( 21510 ) on Wednesday May 31, 2000 @03:55AM (#1037146) Homepage
    The idea is interesting, no doubt. However, there are three major problems (from my POV) with it:

    (1) An obvious point: if a site itself decides which queries to respond to, there'll be a lot of spamming the index. Doesn't anybody remember the fate of the [meta] tags?

    (2) This search technology essentially turns a search into an advertising stream. Since the site decides what to return, it'll return a blurb instead of a context around the match. And if the site can return graphics and not just text strings... oh, my! Advertising banners as search results! Joy.

    (3) The results are going to depend on the location of the query. The same question is likely to return different results when asked from a machine in California than when asked from a machine in Germany (especially with low timeouts). This isn't horrible, but not all that good. In particular, it means that I cannot tell other people "Search for 'foo', you'll find the site I am talking about on the first page".

    Out of the three, the first is so obvious, something will be done about it. I don't know what, though. It's the second that worries me most of all. Besides more advertising, there is a basic problem here -- I want to see what the site has, not necessarily what they prefer to show me. To give a trivial example, a company could have a recalls/warnings/manufacturing defects page somewhere on its site to satisfy disclosure requirements, but never return this page to any search.

    All in all, I'll stick with Google for the time being, thank you very much.

    Kaa
  • by Hard_Code ( 49548 ) on Wednesday May 31, 2000 @03:05AM (#1037147)
    "Unlike Napster, however, it allows people to search for any kind of files; a random sampling of the search terms being used at any given time ranges from MP3s to blockbuster movies to pornography."

    "The Department of Transportation released a shocking report this morning, in which it was discovered that the federal highway system, unlike rural routes, allow transportation of any kind of material. A random sampling of items being transported at any given time ranges from pirated music to pirated blockbuster movies, to pornography."
