"Understanding" Search Engine Enters Public Beta 192
religious freak sends word of the public beta of Powerset, a closely watched San Francisco startup that promises an "understanding engine" to revolutionize Web search. An article in SearchEngineLand points out that Powerset is reaching for more than mere "natural language." TechCrunch has more details and analysis. For the beta, Powerset makes available all of Wikipedia to search, not all the Web. It's said that their understanding engine required a month to grok Wikipedia's 2.5M articles. The Web is currently at least 8,000 times as large.
I'm Unimpressed (Score:5, Interesting)
But come on, that's a simple question. Let's talk stuff I get into arguments over with my coworkers:
So maybe it can't understand 'bad guy.' Well onto another question:
So you want to know what the kicker is? I put those same inputs into Google and found the name in the first or second result. Granted, PowerSet doesn't do the whole web, but I'm pretty sure that if it did, it wouldn't have the pretty results that it gave when I did what one of the articles told me to: ask it when earthquakes hit Tokyo. Just imagine the dates it would come up with if it hit a site with an HTML table of any seismic activity whatsoever in Tokyo!
I think it's a novel idea to mine Wikipedia for a search engine so long as it isn't just plain old token matching like PowerSet seems to be up to. Be inventive, try a natural language parser written in Prolog that digests all of Wikipedia into a huge network/ontology of concepts.
I find them talking about this in the articles:
So does this story actually have more than a startup looking for a sugar daddy to buy it out?
Yawn. Here is something really impressive... (Score:5, Interesting)
Re:I'm Unimpressed (Score:4, Interesting)
Heck, if Powerset is just watching what links people click on more often (Google does) then even that can help provide a training set for its algorithm. Using that kind of training set would make it vastly easier to figure out whether a change in the algorithm would be an improvement or not. That's priceless data and I hope they'll use it wisely.
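One way to picture how click data could tell you whether an algorithm change is an improvement: score each version of the ranking by how high it places the link people actually clicked. This is only a minimal sketch using mean reciprocal rank, a standard ranking metric; nothing here is known to be what Powerset or Google does, and the click log, query names and URLs are made up for illustration.

```python
def mean_reciprocal_rank(rankings, clicks):
    """Average 1/rank of the clicked URL over all logged queries.

    rankings: dict mapping query -> ordered list of result URLs
    clicks:   dict mapping query -> URL the user clicked (hypothetical log)
    A clicked URL that is missing from the ranking contributes 0.
    """
    total = 0.0
    for query, clicked in clicks.items():
        results = rankings.get(query, [])
        if clicked in results:
            total += 1.0 / (results.index(clicked) + 1)
    return total / len(clicks) if clicks else 0.0


# Made-up click log and two candidate rankings (old vs. changed algorithm).
clicks = {"die hard villain": "imdb", "abbey road organist": "beatles.ws"}
old_ranking = {
    "die hard villain": ["review-site", "imdb"],
    "abbey road organist": ["beatles.ws", "news-site"],
}
new_ranking = {
    "die hard villain": ["imdb", "review-site"],
    "abbey road organist": ["beatles.ws", "news-site"],
}

old_score = mean_reciprocal_rank(old_ranking, clicks)
new_score = mean_reciprocal_rank(new_ranking, clicks)
print(old_score, new_score)  # the higher score ranks clicked links nearer the top
```

A higher score for the new ranking suggests the change moved the clicked results up, which is the kind of quick check a click-based training set makes possible.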
But, really, just remember that this is the first in a new breed of search engines. It won't be the last, by any means:
-Search 0.9 was using the meta and description tags on a page to index (see Altavista). It broke when spammers figured out the algorithms.
-Search 1.0 was using the text of inbound links to index (see Google). It doesn't know what the text means, it just knows that it has a bunch of keywords. It's breaking as people start to game their Google search results [reputationdefender.com].
-Search 2.0 will try to find meaning in the web and understand what a page is really saying (see Powerset).
I don't know yet what Search 3.0 will be, but we're still a long way from getting Search 2.0 to work right. But we're still making progress. Just because Powerset isn't perfect doesn't mean we should give up on the whole venture.
Re:Jargon pisses me off... (Score:3, Interesting)
Re:I'm Unimpressed (Score:4, Interesting)
who is david bowie?
en.wikipedia.org/wiki/David_Bowie
en.wikipedia.org/wiki/David_Bowie_(album)
www.bowiewonderworld.com/
Result in the first three. Well done.
Who played the villain in the first Die Hard?
www.imdb.com/title/tt0095016/
www.emanuellevy.com/article.php?articleID=6136
wrestlingclassics.com/.ubb/ ultimatebb.php?ubb=get_topic;f=1;t=085316
Result in the preview of the second only. Why they include a wrestling site though is beyond me.
Who played the bad guy in the first Die Hard?
www.imdb.com/title/tt0095016/
www.imdb.com/title/tt0337978/usercomments
www.empiremovies.com/movie/live-free-or-die-hard-/13109/review/01
A lot of drivel, no name in the previews.
Who was the organist for The Beatles on Abbey Road?
paulmcgarry.com/cdcatalogue/details/5808.html
www.beatles.ws/1969.htm
www.sonicstate.com/news/shownews.cfm?newsid=4860
First two, well done.
It's interesting that Google and PowerSet are completely equivalent when your test data is available in Wikipedia. Now of course PowerSet is only searching Wikipedia, while Google has 8000(?) times more data, so it's not clear what is being tested.
But what's strange is that Wikipedia and IMDB are returned so often. With all the hype about Google's huge index, I'd expect Wikipedia or IMDB to rarely be the best source, since more authoritative data is bound to be available to Google, as in the Abbey Road example.
Re:I'm Unimpressed (Score:3, Interesting)
Not to mention non-English Wikipedias, which contain a good deal of information not available in the English one.
But it doesn't give results any differently (Score:5, Interesting)
Pathetic, and you'd hope it's got a long way to go really because at the moment it does NOTHING of merit that I can see.
Re:There is a reason query languages exists. (Score:2, Interesting)
Every so often, I find myself wanting to use natural language in Google. Like today I wanted to find out about the symptoms of a codeine histamine reaction. Sure, I could search for 'codeine', read about it and follow links (on, no doubt, Wikipedia) until I find what I want - but being able to search with "What are the symptoms of codeine histamine reactions?" is quite powerful.
Although, to be honest I'd prefer to be able to search google with regex and hashes (like search for all pages/images that have a certain MD5 hash).
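For what it's worth, the hash half of that wish is simple to sketch if you control the corpus: index every page or image by the MD5 of its bytes and look the digest up. This is only an illustration over a made-up in-memory corpus; the URLs and the `build_hash_index` helper are hypothetical, and no search engine is known to expose this.

```python
import hashlib
from collections import defaultdict


def build_hash_index(documents):
    """Map each MD5 hex digest to the list of URLs whose content has that digest."""
    index = defaultdict(list)
    for url, content in documents.items():
        index[hashlib.md5(content).hexdigest()].append(url)
    return dict(index)


# Made-up corpus: URL -> raw bytes of the page or image.
corpus = {
    "http://example.com/a.png": b"same bytes",
    "http://example.org/copy.png": b"same bytes",
    "http://example.net/other.png": b"different bytes",
}

index = build_hash_index(corpus)
wanted = hashlib.md5(b"same bytes").hexdigest()
matches = index.get(wanted, [])
print(matches)  # every URL whose content hashes to the wanted digest
```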
Natural languages are not a help. (Score:5, Interesting)
1 + 1 = 2 is a special notation/language that is both more concise and easier than writing "add one and one to make two". So is a music score, which is far easier than reading "make a high note for a bit then wait a bit and make a low note". Same with C, C++, SQL or Python: the hard bit in programming is algorithm design, not understanding the actual language itself.
Is natural language really a barrier to entry in using Google? I doubt it. My untechy wife and her friends find everything they need. Plugging natural language into Google gives reasonable results most of the time.
Re:I'm Unimpressed (Score:5, Interesting)
I don't know yet what Search 3.0 will be, but we're still a long way from getting Search 2.0 to work right. But we're still making progress.
Actually, we aren't making progress -- *at all*. What these guys are trying to do is a subset of artificial intelligence, a subject people have been banging their heads against since the 1940s, and we've made *zero* progress since then. We simply don't know how humans process information. We don't even have reasonable theories. We're at the equivalent of the "four elements make up the world" version of physics.
AI researchers always get defensive when I say this, but it's simply true. All we have are better brute-force algorithms that sort-of simulate some of the things that humans do (i.e., voice recognition, character recognition, and other yawner tricks). There is no science of AI. Any sort of human-level understanding of information is far, far away in the future.
Needs some work. (Score:3, Interesting)
So I tried to search for the person who said, "What doesn't kill you only makes you stronger." The search text was: Who said, "What doesn't kill you makes you stronger?"
Google returned the closest match, Friedrich Nietzsche, with several websites pointing to him. Powerset, however, returned only instances of people who happened to use the phrase (including one reference to Nietzsche). Google returned what I was looking for; Powerset did not.
I can't really say which one is better. Google has the entire web to its advantage, while Powerset is just growing. It seems that the search engine has a lot of potential to grow, which is great as Google and company could use another competitor in the mix.
It's about as good as Ask Jeeves. Maybe worse. (Score:5, Interesting)
I've been trying various queries, and Google is doing better than Powerset even when I type in some actual question, like "How many Japanese died in WWII?".
Question: "What is the planet closest to the sun?". First answer from Powerset: "Pluto".
I think I see how this works. It takes the question and breaks it at noise words ("closed class words" in linguistic terminology), constructing a query with both words and phrases. So "What is the planet closest to the sun" becomes "planet closest" sun. In fact, if you rewrite a natural language question in that form and use Google, it does better on question-answering than Powerset does.
Remember Ask Jeeves? It worked like that. No technical breakthrough here, move along.
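The rewriting guessed at above (break the question at closed-class words, keep single words bare, quote multi-word runs) can be sketched in a few lines. This is a guess at the behaviour described in the comment, not Powerset's actual method; the `CLOSED_CLASS` word list and the `rewrite_query` name are made up for illustration, and a real list would be much longer.

```python
# Hypothetical, deliberately tiny list of closed-class ("noise") words.
CLOSED_CLASS = {
    "what", "who", "when", "where", "how", "is", "was", "are",
    "the", "a", "an", "to", "of", "in", "on", "for",
}


def rewrite_query(question):
    """Break a question at closed-class words and quote multi-word runs."""
    words = question.strip().rstrip("?.!").split()
    chunks, current = [], []
    for word in words:
        if word.lower() in CLOSED_CLASS:
            # A noise word ends the current run of content words, if any.
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(word)
    if current:
        chunks.append(current)
    # Multi-word runs become quoted phrases; single words stay bare.
    return " ".join(
        '"' + " ".join(chunk) + '"' if len(chunk) > 1 else chunk[0]
        for chunk in chunks
    )


print(rewrite_query("What is the planet closest to the sun?"))
# prints: "planet closest" sun
```

Pasting the rewritten string into a keyword search engine is then the experiment the comment describes.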
Re:I'm Unimpressed (Score:5, Interesting)
(Background: In 1966, some MIT computer science faculty thought AI was so easy that computer vision could be solved in one summer's worth of work; it probably took 35 years to reach the milestones identified in the research abstract.)
Re:I wonder how long... (Score:5, Interesting)
Re:It's about as good as Ask Jeeves. Maybe worse. (Score:1, Interesting)
Solar System
Mercury (0.4 AU) is the closest planet to the Sun and the smallest planet (0.055 Earth masses).
Powerset _is_ actually doing parses and semantic constraints, but it's obviously not perfect.
Re:I'm Unimpressed (Score:3, Interesting)
Re:There is a reason query languages exists. (Score:2, Interesting)
Google has gone downhill (Score:1, Interesting)
Actually, am I the only one who thinks that Google's results are worse now than they were years ago? It's still the best general search engine out there, but it often gives me results I don't want now, forcing me to put plusses in front of every word or quoted phrase just to make it actually search for what I asked for.
Re:I'm Unimpressed (Score:2, Interesting)
searching (and also on the look of your website - I love that blue!). How is this different from ask.com [ask.com] though (Powerset's search didn't give me an answer to that).
Re:I'm Unimpressed (Score:2, Interesting)