Free Software Activists Take On Google Search
alphadogg writes "Free software activists have released a peer-to-peer search engine to take on Google, Yahoo, Bing and others. The free, distributed search engine, YaCy, takes a new approach to search. Rather than using a central server, its search results come from a network of independent 'peers,' users who have downloaded the YaCy software. The aim is that no single entity gets to decide what gets listed, or in which order results appear. 'Most of what we do on the Internet involves search. It's the vital link between us and the information we're looking for. For such an essential function, we cannot rely on a few large companies and compromise our privacy in the process,' said Michael Christen, YaCy's project leader."
Well (Score:4, Insightful)
Result: Search results will be controlled by botnets
Re:Well (Score:5, Insightful)
Result: Search results will be controlled by botnets
Yes. What's to stop me from downloading the code, modifying it to put my results on top and then joining my 1000 or so servers to the pool? You only need a small advantage to get big differences in results -- the difference between 10th and 11th place is page one vs obscurity.
Re:Well (Score:5, Informative)
This was solved in distributed computing a long time ago: you simply get more than one worker to check the results, and if anything looks fishy you chuck away everything from that worker.
Not that this makes this any better of an idea.
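To make the cross-checking idea concrete, here is a minimal Python sketch. It is not taken from YaCy; the function name `cross_check` and the peer names are made up for illustration. The same task goes to several workers, the answer returned by a strict majority is accepted, and any worker that disagrees is flagged so its work can be thrown out.

```python
from collections import Counter


def cross_check(answers):
    """Return (accepted_answer, suspect_workers) for one replicated task.

    `answers` maps a hypothetical worker id to the (hashable) result it
    returned. The answer given by a strict majority is accepted; workers
    that returned anything else are reported as suspects. If there is no
    strict majority, nothing is accepted and nobody is flagged.
    """
    if not answers:
        return None, set()
    counts = Counter(answers.values())
    top_answer, top_votes = counts.most_common(1)[0]
    if top_votes * 2 <= len(answers):
        return None, set()
    suspects = {w for w, a in answers.items() if a != top_answer}
    return top_answer, suspects


accepted, suspects = cross_check({
    "peer-a": ("example.org", "example.com"),
    "peer-b": ("example.org", "example.com"),
    "peer-c": ("spam.example", "example.org"),
})
print(accepted, suspects)
```

Note that this only helps while honest workers outnumber colluding ones on each task, which is exactly the weakness the replies below point at.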
Re: (Score:2)
Or you could get each search server to solve a small np-hard problem in real-time before serving its results.
You could call it "shitcoinfo" or "botsnot" or "captchayerknows" or "altacocker" or something.
Re: (Score:3)
...And if 10% of your workers are all part of the same botnet deliberately trying to skew the results, then there's about a 10% chance that the person re-checking the results will be giving you the same "error".
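A rough back-of-the-envelope version of that figure, assuming (this is an assumption, not something the article states) that re-checkers are drawn independently and uniformly from a large pool in which a fraction `f` of workers collude. The hypothetical helper `undetected_probability` gives the chance that `k` independent re-checkers all confirm a forged result.

```python
def undetected_probability(f, k):
    """Chance that k independently chosen re-checkers are all colluders.

    f: fraction of the worker pool controlled by the attacker (0.0 to 1.0)
    k: number of independent re-checkers asked to verify the result
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return f ** k


# One re-checker reproduces the 10% figure; each extra re-checker
# multiplies the attacker's odds by another factor of f.
for k in (1, 2, 3):
    print(k, undetected_probability(0.10, k))
```

Under these assumptions extra re-checkers shrink the odds quickly, but the independence assumption is the weak point: a botnet that can influence which peers get asked is not a random sample.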
Re:Well (Score:5, Insightful)
The great thing about centralised search engines is that they're not gamed... oh wait...
A p2p search engine will have different problems. But in the limit perhaps it'll be like a load of Google or whatever servers sitting around the Internet instead of in one or two datacentres.
Re: (Score:3, Interesting)
At least it actually is in the interest of search providers like Google, Yahoo and Microsoft to produce useful results in order to achieve / maintain a large userbase.
Not so much in the interest of somebody who simply sees a distributed search engine as his chance to drive views to his blog / ad collection / malware site.
Re:Well (Score:5, Insightful)
Re: (Score:2)
But Google has a lot of servers around the internet, not just in one or two datacenters, so basically your pie-in-the-sky best-case scenario for this alternative is that it might, if everything goes well, end up being just like Google.
Which is great, but if I want something just like Google, I can, you know, just use Google.
Re: (Score:2)
Assuming you regard Google as the best possible search engine with no room for improvement.
As was made clear at the end of the 19th century, anything that could possibly be invented already has been, so we don't need to bother trying any more.
Re:Also (Score:5, Insightful)
Google actively fought censorship in China more than any company on the planet. They put servers in Hong Kong that weren't required to censor results, and any page that was censored, Google made sure to state explicitly on the page that the content was censored so that people knew it.
In the end, China changed their laws and forced Google to comply. At that point they either had to pull out of China completely or comply with the laws. Some would contend that the high road is to pull out of China, but at the same time, you can't make inroads and try to effect change if you're not in the country at all.
Re: (Score:3)
But the blockade won't do anything. You'd just force 100% adoption of Baidu by 1.3 billion people so that everything they see would be through the filtered eyes of the government.
At the very least, now that Google is forced to comply with the laws they are still the only ones who plainly put on the page that the search results were censored. They're informing the public that the government is keeping things from them.
Got to get off my lazy butt... (Score:4, Interesting)
Re:Got to get off my lazy butt... (Score:4, Insightful)
Just goes to show ideas are a dime a dozen
Exactly, and that's the reason the patent system only works for lawyers these days.
Re:Well (Score:5, Insightful)
This system probably solves spam the same way Freenet managed to eliminate it from its boards: by adopting a(n anonymous) Web Of Trust model. In practice, you'll only see results coming from those you trust directly or indirectly. The fake results will be there, but buried.
And even if they currently don't do that due to the smallness of the network, at some point they will. It's unavoidable.
Although the problem then might become you only seeing what you like because your friends/trusted nodes all think more or less the same, hence basically shielding yourself from different views. But then, mainstream search engines already do something like this, so it won't be that different from what we already have.
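A minimal sketch of what such a web-of-trust ranking could look like, in Python. Nothing in the article says YaCy does this; the function names, the multiplicative trust rule, and the two-hop limit are all assumptions chosen for illustration. Trust in a peer is the best product of direct trust scores along a short path from you to that peer, and results reported by unknown peers sink to the bottom.

```python
def effective_trust(direct_trust, me, max_hops=2):
    """Best multiplicative trust from `me` to every reachable peer.

    direct_trust: {peer: {other_peer: score between 0 and 1}}
    Trust along a path is the product of the scores on its edges, and
    only paths of at most `max_hops` edges are considered.
    """
    best = {me: 1.0}
    frontier = {me: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for peer, trust in frontier.items():
            for other, score in direct_trust.get(peer, {}).items():
                candidate = trust * score
                if candidate > best.get(other, 0.0):
                    best[other] = candidate
                    next_frontier[other] = candidate
        frontier = next_frontier
    return best


def rank_results(results, trust):
    """Sort (url, reporting_peer) pairs by trust in the reporting peer."""
    return sorted(results, key=lambda r: trust.get(r[1], 0.0), reverse=True)


direct = {
    "me": {"alice": 0.9, "bob": 0.5},
    "alice": {"carol": 0.8},
    "bob": {"carol": 0.2},
}
trust = effective_trust(direct, "me")
ranked = rank_results(
    [("spam.example", "mallory"), ("example.org", "carol"), ("example.com", "alice")],
    trust,
)
print(ranked)
```

The filter-bubble worry above shows up directly in this sketch: a peer nobody in your neighbourhood vouches for gets a score of zero, whether it is a spammer or just someone with a different view.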
Re:Well (Score:5, Insightful)
Freenet solves the spam problem by ensuring that nobody actually uses Freenet. I think this project will apply the same solution.
This scheme has pretty slim chances of success. Which doesn't necessarily mean it shouldn't be attempted.
Re: (Score:2)
Result: Search results will be controlled by botnets
Nope, search results will be controlled by geeks. Result? 15K hits on Pikachu cosplay girl searches, zero on Project Runway.
Re:Well (Score:5, Interesting)
The whole "portal only as an afterthought demo" seems to me a huge flaw as well. You think your average person is going to install this on their computer just so they can do web searches? Not-going-to-happen. People who want to run it, will. People who don't or don't know how, won't. They're the 99.99%. They need a portal. Clients should automatically be putting themselves in the portal-switching queue.
As for the capabilities, I just tried it out [yacy.net]. The results are *extremely* few and very poor. "Dog" gets five hits, for example. You'd almost think it was a joke. Hopefully this was a load problem or a problem due to a lack of scaling in the system thus far, and not a design flaw.
At least their frontend doesn't seem designed with injection in mind. Start off a search with ' (such as 'Test) and watch what happens to the peer listed at the bottom of the page. I doubt that particular issue is exploitable, but if this is a habit of one of their coders...
Re: (Score:3)
I tried my standard search engine test (how hard is it to find the web page for the Hilton hotel in Paris?), and it failed miserably: "Paris Hilton" didn't get a single result, and neither did any other variation I tried.
Re: (Score:2)
Someone should ask Google to run a YaCy server! :-)
Re:Well (Score:5, Funny)
Instead of insight, comment contained bobcat. Would not read again.
Question (Score:4, Insightful)
Will one client be able to view the queries of its peers?
If yes, how is that an improvement?
If no, how does it work?
Re:Question (Score:4, Interesting)
Will one client be able to view the queries of its peers?
If yes, how is that an improvement? If no, how does it work?
From TFA: [yacy.net]
It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.
However, that seems to be all the information there is on the process... doesn't quite assuage the ol' paranoia circuits, does it?
Re: (Score:2)
From TFA: [yacy.net]
It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.
Provided no one modifies the open source code to log user search requests and censor queries
Re: (Score:2)
From TFA: [yacy.net]
It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.
Provided no one modifies the open source code to log user search requests and censor queries
I'd be more concerned with some people stacking search results with links to spoof sites or malware servers.
Is this proofed against someone reverse engineering it and crap-flooding the results?
Re: (Score:3)
From TFA: [yacy.net]
It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.
However, that seems to be all the information there is on the process... doesn't quite assuage the ol' paranoia circuits, does it?
The network stores everything.
Re: (Score:2)
And we all know that no one will ever modify their portion of the decentralized system to do any of these awful things....
Re: (Score:3)
Because, you know, I'm sure that YaCy is totally and absolutely 100% efficient about things. Every peer obviously has a list of URLs that it is responsible for, and every peer is capable of censoring anything on its list, and there will never be more than 1 copy of any shred of data.[/sarcasm]
Except it doesn't really work that way, as since nobody is in charge, nobody can dictate who will index what. You can censor the data on your own node and you'll certainly be successful (it's your computer, after all
Great (Score:5, Funny)
Awesome...
Re: (Score:2)
I just searched on my local YaCy install for tentacle hentai.
I somehow ended up on the front page of Daily Kos.
great stuff (Score:3)
Re: (Score:2)
It's hard to argue with "free" and "freedom", so I give it the thumbs up. But in this day and age it feels like going from a Ducati Panigale to a 1950's Triumph Bonneville.
Lots of people said basically the same thing back when the linux kernel was still numbered 0.9x.
Re: (Score:2)
It's hard to argue with "free" and "freedom"
I may differ from many readers in this opinion, but I happen to think it's very easy to argue with "free" and "freedom" if by doggedly sticking with dogmatic principles you end up taking giant leaps backward.
Ummm (Score:5, Insightful)
Re:Ummm (Score:4, Funny)
Yahoo's search engine IS Bing.
And Bing's search engine is Google [slashdot.org].
Come FLOSS Devs, We Need Better Names! (Score:5, Insightful)
Re: (Score:3)
Because most names are taken and they don't have a legal team to do research.
Re: (Score:2)
Why would you need a legal team? USPTO offers an online search for trademarks.
Re: (Score:2)
Trademarks don't need to be registered with the USPTO in order to be enforceable and actionable.
A mark need only be used in trade and -- zing! -- it's a trademark. Registering just makes it easier if/when things get ugly enough that a court gets involved, and makes it easier for others to avoid infringement in the first place.
This aspect of a trademark is a lot closer to copyright than it is to patents. Unlike patents, neither copyrights nor trademarks must be registered with a central body, although both
Re:Come FLOSS Devs, We Need Better Names! (Score:5, Insightful)
+1 Mod parent up.
Seems the geeky crowd still doesn't understand that marketing DOES play a critical role in the popularity of any type of project. "YaCy" really does suck- it is impossible to say, isn't a word, introduces strange capitalization, and it is not even easy to remember.
Re:Come FLOSS Devs, We Need Better Names! (Score:5, Insightful)
So fork it, changing only the name, and release it yourself under a more marketable moniker. The technical aspects of doing this are easy.
And if you think selecting a catchy, unencumbered name is also easy, then you really shouldn't have any problem pulling it off.
It's all GPL, so you can pretty much do what you want with it. If you really want to be in charge of marketing and distribution for a GPL project, the only thing stopping you is you.
Re: (Score:2)
I'd go with "Yucky." Self-deprecating. (What, like Yahoo! or Bing are awesome?) Or maybe Yoggyso, if they can get away with it. (What, like Google makes sense?)
Yahtzee (Score:5, Funny)
I assumed it was intended to be pronounced like Yahtzee, which is both memorable and quite descriptive of the quality of results you can expect.
Re:Come FLOSS Devs, We Need Better Names! (Score:5, Funny)
1) FreEble
2) !!_//[%%%
3) Bing
3) xkCQQT
GIMP is another example. Great program (Score:4, Insightful)
GIMP is another example. Great free graphics program, terrible name.
Cool, but what's in it for the peers? (Score:5, Interesting)
Doesn't need to be any formal system. Free software, for example, seems to be based more on the honour system than anything else, but people do develop free software because there's something in it for them - software tailored to their needs. What is the incentive for being a search peer?
Re:Cool, but what's in it for the peers? (Score:5, Interesting)
Re:Java... (Score:5, Interesting)
Ugh, yeah. Another cool project is going to be held back by Java.
Way back, this happened with Freenet. I thought it was a cool idea, but the darn thing wasn't happy with all the 256MB I could give it. Even now, Java is still a considerable load on laptops with 4GB RAM.
I think that for best adoption they should have concentrated on making it small and light. If it can be run in say, 64MB RAM then you can install it anywhere. And it's quite likely that a good part of why Freenet was so horrible when I tried it, is because it made a lot of the machines it ran on swap like crazy.
Re:Java... (Score:4, Interesting)
cool project is going to be held back by Java.
You know, I'll take "cool projects held back by Java" any time over equally cool projects written in C that need to be patched 5 times a year for the next 10 years because of sloppy programming leading to arbitrary remote code execution vulnerabilities. Please, just let software written in C die with dignity, the language had its decades of glory before everything was accessible over the 'net ...
Re:Java... (Score:5, Informative)
...instead, you have to update the JRE about that often because of sloppy programming leading to arbitrary remote code execution vulnerabilities.
The JRE is currently the #1 malware vector, even above Flash and Acrobat.
Re: (Score:3)
That is really stupid of you.
Let me see how the facts are:
Firefox with a few addons and 9 tabs: 180MB RAM. Eclipse with a lot of projects open: 200MB RAM.
At least with a Java Application I can just download it and run it on my Linux and Windows computers. It would be really nice if more applications would leave the Windows-monoculture, like from companies that owe their very existence to open source systems like Google (Google Sketchup is still not available for Linux and probably never will be).
Re: (Score:3, Insightful)
Not evil, no... but annoying as fuck, yes.
I've yet to see anything written in Java that didn't seem bloated, slow, and annoying.
No control over disk usage (Score:5, Interesting)
This whole concept seems quite fascinating/interesting. Ironically, two questions came to my mind immediately:
1) How much bandwidth does this take?
2) How much disk space does this take?
Neither question is answered on their FAQ ( http://www.yacy-websuche.de/wiki/index.php/En:FAQ [yacy-websuche.de] ), although they addressed the disk space issue thus: "Can I limit the size of the indexes on my hard-drive? For the moment no. Automatically limiting that size would mean having to delete stored indexes, which is not suitable. "
Yikes! I am not sure how many people will want to run a local YaCy client when there is no control over how much disk space it uses (or, apparently, bandwidth). It still has a lot of promise, though.
Re: (Score:2)
Disk quotas or separate file systems are a simple solution to this problem. Just takes a little more work than a line in a config file.
Re: (Score:3)
I wonder what happens when the thing runs out of space? If you can't set how much it uses, then how are we to know that it handles running out of space "gracefully"?
Also, you (presumably) and I are Linux users- so quotas, separate file systems, loopbacks, space checking, or whatever, are not rocket science. But that could be a lot more challenging for the people doing this on MS-Windows. Some users might be thinking they are "helping the world" by installing that app, then months later not understand why
Re:No control over disk usage (Score:5, Insightful)
Re:No control over disk usage (Score:5, Insightful)
Run it in a VM. limit its disk space and networking in one fell swoop.
Re: (Score:3)
3) What is to stop a malicious node in the network from getting my search history?
All of their claims about privacy seem to be implementation details of their code (which, being open source, is trivial to modify). They don't tell me how they designed the protocol to prevent someone from modifying the code to record searches or even to inject phishing sites into the top lists.
Re: (Score:2)
Yikes! I am not sure how many people will want to run a local YaCy client when there is no control over how much disk space it uses
Hasn't stopped Microsoft - Have you SEEN the size of C:\Windows\winsxs?? (AKA Windows 7's fatal flaw) And they have no plan to do anything about it, and there's nothing you can do about it. You can't move it, you can't delete obsolete files from it, it just slowly fills any partition you put it on. (Filling your boot drive is "By Design" according to Microsoft.)
Re: (Score:2)
The index you can keep on your own hard disk and that of your direct peers is always going to be tiny compared to the indexes Google, Bing et al. have. That in itself is an issue. Add to that the problem of finding and ranking results that come from a highly fragmented database, and doing so at a good speed, and I don't see it taking off any time soon.
I'm not seeing why this should be tried. (Score:4, Interesting)
Re: (Score:2)
It doesn't matter what kind of security team they have.
Even if you LOVE it, the name is so bad you can't even tell anybody about it. Hell I already forget how to spell it and I just saw it 2 seconds ago.
Re: (Score:3)
And it's likely going to be as slow, as so many servers on so many different (and often relatively slow) connections have to be queried. Sorry but I don't like waiting for search results for more than a second or so, when Google provides them almost instantly.
Google sets the standard, that's what you have to beat. So yes the bar to get into the search engine market is really high, and not many players will be able to give it a go with much chance for success.
Needs more work (Score:3)
So, I tried the portal and searched for slashdot.
1. geek.net ...
2. slashdot tags
3. ostg.com
4. slashdot.org/favicon.ico
main page nowhere to be seen.
Second try, entirely different results: ...
1. microsoft.slashdot.org
2. slashdot.org
Seems very erratic so far. Then maybe it needs some time to stabilize a bit.
Re: (Score:2)
I installed the software and went into the local administration, with a few clicks (it isn't quite as intuitive as I'd like it to be) I was able to set up the web crawling functions to bring in my favorite site. There are several limits that can be put onto that crawl, but the main point is that you can add sites to the search, and they show up when other peers are performing queries.
It will be interesting to see how this software performs. It seems about as good as Lycos was back in the early 1990's, so
Cannot find the binary download link? (Score:2)
What about people who want to join but don't run their own compilers? You know, those people exist.
Re: (Score:2)
The platform-specific stuff is there, but where's the .jar?
Re: (Score:2)
You know, those people exist.
I think the idea is that those kinds of people wouldn't be interested in this kind of project.
False assumption. (Score:2)
"As is often the case in the early stages of a new technology, results are better on some topics than on others -- mainly computer-related issues."
Uh, no. Google became a search juggernaut because it provided better results. Otherwise there would be no motivation to switch from Yahoo. And, since this solves a problem most people don't care about, it's doomed.
OMG (Score:2)
Oh my, Google is so dead. DEAD!
It'll be just like when Diaspora totally stomped Facebook!
Re: (Score:2)
It'll be just like when Diaspora totally stomped Facebook!
Or when GNU/Hurd started cutting into Linux's...
Sorry, I tried, I really did - but I can't keep from laughing.
In 1996 this was done ... (Score:5, Informative)
... by the Harvest Project, which installed several local data collectors, and which then added a search engine over all those collectors. The cache system added in between is still known today: Squid.
http://en.wikipedia.org/wiki/Harvest_project [wikipedia.org]
- Hubert
I tried... (Score:2)
I installed the server on my machine and gave it a shot. I made some very classic requests, such as the names of a couple of universities and a couple of famous websites, and made a few regular queries like "chocolate mousse recipe". None of the requests actually pointed to something even remotely close to what I was looking for. I thought it might need some bootup time, so I tried again an hour later. It was not much better. Just much slower. I'll try again in a few days. But that does not look good...
On top of that it looks
Good idea, but Yacy is basically useless trash (Score:3)
I could go on, but you get the idea. I would really like to see a usable peer to peer search engine. The Internet needs it. Yacy is not it. The idea is good, the implementation can best be described as EPIC FAIL.
Absolutely not news! (Score:2)
Meh (Score:2)
I downloaded it and gave it a try, but I'm going to stick with DuckDuckGo [duckduckgo.com]. In my experience the results have been as good as or better than Google and if I don't find what I'm looking for, it also gives links to do the search in Google or Bing.
Re: (Score:2)
YaCy
Yay-See.
Sure. Unpronounceable.
--
BMO
Re: (Score:2)
Siffy. Short for syphilis.
Re: (Score:3)
Cool. "Therefore, more complex ranking algorithms such as those used by Google (which analyze rank using a variety of contextual factors developed during webspidering) are not available in YaCy, placing severe limits on most users' means to retrieve the results they seek. For instance, none of the top 10 results returned by YaCy's public search when queried "Google" actually refer to Google's homepage."
Re: (Score:3)
Yahoo + Cyborg
Re: (Score:2)
Yaw-Sigh? Ewww...
Re: (Score:2)
Yaw-Sigh? Ewww...
Yahoo, not Yawhoo.
Re: (Score:3)
Where does "ach" come into it? "Yah" sounds exactly like "yar", as in what pirates say, which rhymes with "jar" and "far" and "ahh" and "pa", while "yaw" sounds exactly like "yore", which rhymes with "paw" and "poor" and "door" and "more". "Ah" vs "or".
At least that's how we pronounce those letters here in the Antipodes.
Re: (Score:2)
"Yaw" doesn't sound like you're trying to write out /ja/, but this word [wiktionary.org]. (And if Slashdot actually supported the thirteen-year-old technology that is UTF-8, I could just paste the IPA.)
Re: (Score:2)
"Yaw" is actually a word (motion around a vertical axis) pronounced differently from "Ya/Ja" you said. Sounds more like the common "aww" (think puppies) with a Y at the beginning.