Slashdot
"Understanding" Search Engine Enters Public Beta
Posted by kdawson on Mon May 12, 2008 09:53 PM
from the do-what-i-mean dept.
religious freak sends word of the public beta of Powerset, a closely watched San Francisco startup that promises an "understanding engine" to revolutionize Web search. An article in SearchEngineLand points out that Powerset is reaching higher than mere "natural language." TechCrunch has more details and analysis. For the beta, Powerset makes all of Wikipedia available to search, not the whole Web. Reportedly, its understanding engine needed a month to grok Wikipedia's 2.5M articles. The Web is currently at least 8,000 times as large.
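Taking the summary's two figures at face value (2.5M articles in about a month, and a Web at least 8,000 times larger), a back-of-the-envelope extrapolation shows what "the whole Web" would mean at the same rate. This is a sketch under a loud assumption: processing time scales linearly with page count and no hardware is added, which is certainly not how a real rollout would go.

```python
# Back-of-the-envelope extrapolation from the story's figures.
# Assumption: processing time scales linearly with the number of pages
# and the hardware stays the same.
WIKIPEDIA_ARTICLES = 2_500_000   # articles processed for the beta
MONTHS_FOR_WIKIPEDIA = 1         # "required a month"
WEB_MULTIPLIER = 8_000           # "at least 8,000 times as large"

web_pages = WIKIPEDIA_ARTICLES * WEB_MULTIPLIER
months_for_web = MONTHS_FOR_WIKIPEDIA * WEB_MULTIPLIER
years_for_web = months_for_web / 12

print(f"Web size: at least {web_pages:,} pages")
print(f"Time at the same rate: {months_for_web:,} months (~{years_for_web:.0f} years)")
```

At that rate the Web would take roughly 667 years, so finishing within a single year would need something like a 667-fold speedup.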
My first search (Score:4, Funny)
Re:My first search (Score:4, Funny)
I'm Unimpressed (Score:5, Interesting)
But come on, that's a simple question. Let's talk stuff I get into arguments over with my coworkers:
So maybe it can't understand 'bad guy.' Well onto another question:
So you want to know what the kicker is? I put those same inputs into Google and found the name in the first or second result. Granted, PowerSet doesn't do the whole web, but I'm pretty sure that if it did, it wouldn't have given the pretty results it gave when I did what one of the articles told me to: ask it when earthquakes hit Tokyo. Just imagine the dates it would come up with if it hit a site with an HTML table of any seismic activity whatsoever in Tokyo!
I think it's a novel idea to mine Wikipedia for a search engine so long as it isn't just plain old token matching like PowerSet seems to be up to. Be inventive, try a natural language parser written in Prolog that digests all of Wikipedia into a huge network/ontology of concepts
I find them talking about this in the articles:
So does this story actually have more than a startup looking for a sugar daddy to buy it out?
Re:I'm Unimpressed (Score:5, Informative)
Re: (Score:3, Insightful)
So even for the tailor-made, best-case examples, Google seems to be quite on par.
Re:I'm Unimpressed (Score:4, Interesting)
Heck, if Powerset is just watching what links people click on more often (Google does) then even that can help provide a training set for its algorithm. Using that kind of training set would make it vastly easier to figure out whether a change in the algorithm would be an improvement or not. That's priceless data and I hope they'll use it wisely.
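One common way to turn logged clicks into a yardstick for algorithm changes is to score each candidate ranking by where the clicked result lands, for example with mean reciprocal rank (MRR). The sketch below assumes a hypothetical click log of (query, clicked URL) pairs and two toy rankers; treating a click as a relevance label is itself an assumption (clicks are biased toward top positions), and nothing here describes what Powerset or Google actually do.

```python
from typing import Callable, List, Tuple

# Hypothetical click log: (query, URL the user clicked).
ClickLog = List[Tuple[str, str]]
Ranker = Callable[[str], List[str]]

def mean_reciprocal_rank(ranker: Ranker, clicks: ClickLog) -> float:
    """Average 1/rank of the clicked URL in the ranker's results (0 if absent)."""
    total = 0.0
    for query, clicked_url in clicks:
        results = ranker(query)
        if clicked_url in results:
            total += 1.0 / (results.index(clicked_url) + 1)
    return total / len(clicks) if clicks else 0.0

# Two toy rankers standing in for "before" and "after" an algorithm change.
def old_ranker(query: str) -> List[str]:
    return ["a.example", "b.example", "c.example"]

def new_ranker(query: str) -> List[str]:
    return ["b.example", "a.example", "c.example"]

clicks: ClickLog = [("q1", "b.example"), ("q2", "b.example"), ("q3", "a.example")]
old_score = mean_reciprocal_rank(old_ranker, clicks)
new_score = mean_reciprocal_rank(new_ranker, clicks)
print(f"old={old_score:.3f} new={new_score:.3f}")
```

Here the change that moves the more frequently clicked page up wins, which is the kind of yes/no answer about a change the comment has in mind.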
But, really, just remember that this is the first in a new breed of search engines. It won't be the last, by any means:
-Search 0.9 used the meta and description tags on a page to index it (see AltaVista). It broke when spammers figured out the algorithms.
-Search 1.0 uses the text of inbound links to index (see Google). It doesn't know what the text means; it just knows that it has a bunch of keywords. It's breaking as people start to game their Google search results [reputationdefender.com].
-Search 2.0 will try to find meaning in the web and understand what a page is really saying (see Powerset).
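The "Search 1.0" idea of indexing a page by the words other pages use when linking to it can be sketched as an inverted index keyed on anchor-text terms. The link data and `.example` URLs below are made up, and real engines combine this with link-based scoring, on-page text and much more; this only shows the core idea.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical links: (source page, target page, anchor text).
links: List[Tuple[str, str, str]] = [
    ("blog.example/a", "imdb.example/die-hard", "Die Hard cast list"),
    ("fan.example/b", "imdb.example/die-hard", "great Die Hard page"),
    ("news.example/c", "beatles.example/1969", "Beatles 1969 timeline"),
]

def build_anchor_index(links: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each lowercase anchor-text term to the set of pages it points at."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for _source, target, anchor in links:
        for term in anchor.lower().split():
            index[term].add(target)
    return index

def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return pages whose inbound anchor text contains every query term."""
    term_sets = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

index = build_anchor_index(links)
print(search(index, "die hard"))
```

Because the target page's own text is never consulted, whoever controls enough inbound links controls what a page is found for, which is the weakness the comment attributes to Search 1.0.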
I don't know yet what Search 3.0 will be, but we're still a long way from getting Search 2.0 to work right. But we're still making progress. Just because Powerset isn't perfect doesn't mean we should give up on the whole venture.
But it doesn't give results any differently (Score:5, Interesting)
Pathetic. You'd hope it still has a long way to go, because at the moment it does NOTHING of merit that I can see.
Re:But it doesn't give results any differently (Score:4, Funny)
Funny, when I was a boy I asked my father the same thing and he gave me a few articles with pictures of women wearing string. My conclusion: It's amazing what can be done with just a few bits of string.
Re:But it doesn't give results any differently (Score:5, Funny)
Re:But it doesn't give results any differently (Score:5, Funny)
Powerset's first response? "Fuck."
Funny, that was my response too, but at least I got 5 or 6 of them first...
No, early Google was better than anything else. (Score:5, Insightful)
Re: (Score:3, Informative)
Then Google comes around. You search for something and you find a good result (or three) on the first page, which was rare on Yahoo etc. unless you were looking for something really basic.
Re:I'm Unimpressed (Score:5, Interesting)
I don't know yet what Search 3.0 will be, but we're still a long way from getting Search 2.0 to work right. But we're still making progress.
Actually, we aren't making progress -- *at all*. What these guys are trying to do is a subset of artificial intelligence, a subject people have been banging their heads against since the 1940s, and we've made *zero* progress since then. We simply don't know how humans process information. We don't even have reasonable theories. We're at the equivalent of the "four elements make up the world" version of physics.
AI researchers always get defensive when I say this, but it's simply true. All we have are better brute-force algorithms that sort of simulate some of the things that humans do (e.g., voice recognition, character recognition, and other yawner tricks). There is no science of AI. Any sort of human-level understanding of information is far, far away in the future.
Re:I'm Unimpressed (Score:5, Interesting)
(Background: In 1966, some MIT computer science faculty thought AI was so easy that computer vision could be solved in one summer worth of work; it probably took 35 years to reach the milestones identified in the research abstract).
Re: (Score:3, Insightful)
Human brains have the computing power of a modern supercomputer, possibly a lot more, optimized for some specific applications such as data parsing/pattern matching. For the past 40 years, AI has had to create
Re: (Score:3, Interesting)
Re:I'm Unimpressed (Score:4, Insightful)
Personally, I am impressed by the arguments advanced by the likes of Penrose and Hameroff, that "intelligence" (in the sense that we use the term wrt. humans) is a quantum phenomenon.
Eh, that's just a "God in the gaps" argument. We don't know how it works, therefore, it must require something supernatural to make it work. The physicality of the brain has more than enough "throw your hands up in despair" complexity to explain intelligence.
Re:I'm Unimpressed (Score:5, Funny)
The current mayor is Jardir Silva Vidal who won the election in 2004 against Reino Martins de Oliveira
Re:I'm Unimpressed (Score:5, Insightful)
Terrorists!
Re: (Score:3, Informative)
Obviously still buggy. (Score:5, Funny)
Who is David Bowie? I trust that it came back with, "aka Ziggy Stardust, normal family guy"
Who played the villain in the first Die Hard? Well, obviously, the villain is "capitalism."
Billie Jean King and Madonna
Who was the organist for The Beatles on Abbey Road?
You had it at "organ," and it got distracted. What they need is some dev guys from Toledo to collaborate, and provide a little cognitive counterweight to the understanding engine. OK, maybe not Toledo. Maybe Atlanta.
Re:I'm Unimpressed (Score:4, Interesting)
who is david bowie?
en.wikipedia.org/wiki/David_Bowie
en.wikipedia.org/wiki/David_Bowie_(album)
www.bowiewonderworld.com/
Result in the first three. Well done.
Who played the villain in the first Die Hard?
www.imdb.com/title/tt0095016/
www.emanuellevy.com/article.php?articleID=6136
wrestlingclassics.com/.ubb/ultimatebb.php?ubb=get_topic;f=1;t=085316
Result in the preview of the second only. Why they include a wrestling site though is beyond me.
Who played the bad guy in the first Die Hard?
www.imdb.com/title/tt0095016/
www.imdb.com/title/tt0337978/usercomments
www.empiremovies.com/movie/live-free-or-die-hard-/13109/review/01
A lot of drivel, no name in the previews.
Who was the organist for The Beatles on Abbey Road?
paulmcgarry.com/cdcatalogue/details/5808.html
www.beatles.ws/1969.htm
www.sonicstate.com/news/shownews.cfm?newsid=4860
First two, well done.
It's interesting that Google and PowerSet are completely equivalent when your test data is available in Wikipedia. Now of course PowerSet is only searching Wikipedia, while Google has 8000(?) times more data, so it's not clear what is being tested.
But what's strange is that Wikipedia and IMDB are returned so often. With all the hype about their huge index, I'd expect Wikipedia or IMDB to rarely be the best source, since more authoritative data is bound to be available to Google, kind of like the Abbey Road example.
Re: (Score:3, Interesting)
Not to mention non-English Wikipedias, which contain a good deal of information not available in the English one.
Re:I'm Unimpressed (Score:5, Informative)
Powerset is not token matching. In fact, we read every sentence from every page in Wikipedia that we index. For examples of how we understand syntax, check out queries like "who did texaco acquire" vs. "who acquired texaco". Note that Powerset understands the difference between being acquired by and acquiring, that "buying" is equivalent to "acquiring", and that we are often able to highlight the actual answer to your question. Traditional search engines can do none of these things. Powerset is trying to match the meaning of your query to the meaning of a sentence in Wikipedia.
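A toy illustration of the distinction described above: store facts as (subject, relation, object) triples, fold synonyms such as "bought" into one canonical relation, and answer "who did X acquire" and "who acquired X" by reading the same triples in opposite directions. The company names, triples and regex question patterns are all hypothetical; the thread does not describe Powerset's actual parser, which is certainly far more elaborate.

```python
import re
from typing import List, Tuple

# Hypothetical extracted facts as (subject, relation, object) triples.
FACTS: List[Tuple[str, str, str]] = [
    ("alphacorp", "bought", "betacorp"),
    ("gammacorp", "acquired", "alphacorp"),
]

# Synonymous verbs folded into one canonical relation.
SYNONYMS = {"bought": "acquire", "acquired": "acquire", "purchased": "acquire"}

def canonical(verb: str) -> str:
    return SYNONYMS.get(verb, verb)

def answer(question: str) -> List[str]:
    """Answer two question shapes by reading the triples in opposite directions."""
    q = question.lower().strip(" ?")
    # "who did X acquire/buy/purchase": X is the subject, the answer is the object.
    m = re.fullmatch(r"who did (\w+) (?:acquire|buy|purchase)", q)
    if m:
        return [o for s, r, o in FACTS if s == m.group(1) and canonical(r) == "acquire"]
    # "who acquired/bought/purchased X": X is the object, the answer is the subject.
    m = re.fullmatch(r"who (?:acquired|bought|purchased) (\w+)", q)
    if m:
        return [s for s, r, o in FACTS if o == m.group(1) and canonical(r) == "acquire"]
    return []

print(answer("Who did AlphaCorp acquire?"))  # ['betacorp']
print(answer("Who acquired AlphaCorp?"))     # ['gammacorp']
```

A pure keyword engine sees both questions as roughly the same bag of words {who, acquire, alphacorp}, which is why it cannot tell them apart.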
However, Powerset is very aware that: 1) Users shouldn't be expected to use natural language and 2) We only search Wikipedia and 3) Our algorithms aren't perfect yet. Powerset's release isn't intended to replace your regular keyword search engine. But, we do hope that you come back to Powerset when you have a question that might be answered in Wikipedia.
So, try some topical queries in Powerset, like "kurt godel." In the Factz section, Powerset knows that Kurt Godel proved theorems. If you click on "theorems," you'll see all the sentences in Wikipedia from which we derived that fact (be sure to click on "more"). Note that none of these Factz come from the Kurt Godel page. Powerset's ability to aggregate Factz from across Wikipedia is unique to our technology.
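Aggregating facts "from across Wikipedia" can be pictured as grouping extracted triples by subject regardless of which article they came from, while keeping every supporting sentence so a reader can click through to the evidence, as described for "theorems". This is a sketch with invented placeholder people, articles and sentences; the data structure is hypothetical and not Powerset's.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

class Extraction(NamedTuple):
    subject: str
    relation: str
    obj: str
    source_page: str  # article the sentence was read from
    sentence: str

# Hypothetical extractions; person_a's facts come from articles
# that are not about person_a.
EXTRACTIONS: List[Extraction] = [
    Extraction("person_a", "proved", "theorems", "Article_1", "Person A proved two theorems."),
    Extraction("person_a", "proved", "theorems", "Article_2", "Later, Person A proved more theorems."),
    Extraction("person_a", "met", "person_b", "Article_3", "Person A met Person B."),
    Extraction("person_b", "wrote", "papers", "Article_3", "Person B wrote papers."),
]

FactTable = Dict[str, Dict[Tuple[str, str], List[Tuple[str, str]]]]

def aggregate(extractions: List[Extraction]) -> FactTable:
    """Group (relation, object) facts by subject, keeping every supporting sentence."""
    facts: FactTable = defaultdict(lambda: defaultdict(list))
    for e in extractions:
        facts[e.subject][(e.relation, e.obj)].append((e.source_page, e.sentence))
    return facts

facts = aggregate(EXTRACTIONS)
for (relation, obj), sources in facts["person_a"].items():
    print(f"person_a {relation} {obj}: {len(sources)} supporting sentence(s)")
    for page, sentence in sources:
        print(f"    [{page}] {sentence}")
```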
Now try a search for the Presidency of Bill Clinton and click through to the enhanced Wikipedia page (http://www.powerset.com/explore/semhtml/Presidency_of_Bill_Clinton?query=presidency+of+bill+clinton). Note that we also have Factz in the article outline, which helps to summarize long articles. Check out the second term during the Lewinsky affair: the Factz are an amazingly accurate description of the situation.
Sorry to be a bit lengthy, but I wanted to make it clear that Powerset isn't just about asking questions. We've got a video that identifies all of the features: http://vimeo.com/994819
{mark} powerset product manager
Re: (Score:3, Funny)
Next step.... (Score:5, Funny)
Re:Next step.... (Score:5, Funny)
The Web is currently at least 8,000 times as large (Score:5, Funny)
Re: (Score:3, Funny)
Re: (Score:3, Funny)
Please send this fine fellow your password for future posts.
Jargon pisses me off... (Score:4, Funny)
Re:Jargon pisses me off... (Score:5, Insightful)
Re: (Score:3, Interesting)
Re:Jargon pisses me off... (Score:5, Informative)
Re:Jargon pisses me off... (Score:5, Funny)
Grokgrokgrok.
Pics or it didn't happen.
Personally I hate all made up words (Score:4, Funny)
grok is just the beginning.
I hate all made up words. Database, modem, gigabyte, daemon, ethernet... they all suck. And the word suck sucks, too. Bring me back to the days when we all communicated with grunts, before all of this linguistic b.s. started.
Yawn. Here is something really impressive... (Score:5, Interesting)
2 out of 10 (Score:5, Informative)
First match was an obscure album, then a few "factz" that made no sense.
Let's try again, "What is the largest city in Japan?"
Tokyo doesn't feature at all on the first page! It fares just as badly with other countries.
It now seems to be slashdotted, so I better quit now.
There is a reason query languages exists. (Score:5, Insightful)
Natural languages are not a help. (Score:5, Interesting)
1 + 1 = 2 is a special notation/language that is both more concise and easier than writing "add one and one to make two". So is a music score, which is far easier than reading "make a high note for a bit, then wait a bit and make a low note". Same with C, C++, SQL or Python: the hard bit in programming is algorithm design, not understanding the actual language itself.
Is natural language really a barrier to entry in using Google? I doubt it. My untechy wife and her friends find everything they need. Plugging natural language into Google gives reasonable results most of the time.
Yeah right (Score:5, Insightful)
What a marketing pile-of-poop. All it does is pull out phrases from Wikipedia; there is no attempt to understand the information at all. When I can type in a yes/no question ("Did they have looms in the 1400s?"), I'll be impressed. When it can make calculations ("How old was Columbus when the first colony was founded?"), I'll be impressed. When it can make comparisons ("When did the Earth's population match the current population of the United States?"), I'll be impressed.
In other words, when it even attempts to answer a question that isn't already in Wikipedia as a phrase, I'll be impressed.
Needs some work. (Score:3, Interesting)
So I tried to search for the person behind the quote "What doesn't kill you only makes you stronger." The search text was: Who said, "What doesn't kill you makes you stronger?"
Google returned the closest match, Friedrich Nietzsche, with several websites pointing to him. Powerset, however, returned only instances of people who randomly said that quote. Google returned what I was looking for, while Powerset returned instances of the phrase (including one reference to Nietzsche).
I can't really say which one is better. Google has the entire web to its advantage, while Powerset is just growing. It seems that the search engine has a lot of potential to grow, which is great as Google and company could use another competitor in the mix.
It's about as good as Ask Jeeves. Maybe worse. (Score:5, Interesting)
I've been trying various queries, and Google is doing better than Powerset even when I type in some actual question, like "How many Japanese died in WWII?".
Question: "What is the planet closest to the sun?". First answer from Powerset: "Pluto".
I think I see how this works. It takes the question and breaks it at noise words ("closed class words" in linguistic terminology), constructing a query with both words and phrases. So "What is the planet closest to the sun" becomes "planet closest" sun. In fact, if you rewrite a natural language question in that form and use Google, it does better on question-answering than Powerset does.
Remember Ask Jeeves? It worked like that. No technical breakthrough here, move along.
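The commenter's guess about the rewriting step can be reproduced in a few lines: drop closed-class words and keep each run of adjacent content words together as a quoted phrase. The stop-word list is a small hypothetical sample, and whether Powerset actually works this way is the commenter's inference, not something confirmed in the thread.

```python
from typing import List

# A small, hypothetical sample of closed-class ("noise") words.
CLOSED_CLASS = {"what", "is", "the", "to", "a", "an", "of", "in", "who", "how", "did"}

def question_to_query(question: str) -> str:
    """Split a question at closed-class words; quote multi-word runs as phrases."""
    words = question.lower().strip(" ?.!").split()
    runs: List[List[str]] = [[]]
    for word in words:
        if word in CLOSED_CLASS:
            if runs[-1]:
                runs.append([])  # a noise word ends the current phrase
        else:
            runs[-1].append(word)
    parts = []
    for run in runs:
        if len(run) > 1:
            parts.append('"' + " ".join(run) + '"')
        elif run:
            parts.append(run[0])
    return " ".join(parts)

print(question_to_query("What is the planet closest to the sun?"))  # "planet closest" sun
```

Feeding the resulting string to an ordinary keyword engine is exactly the experiment the comment describes.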
I wonder how long... (Score:3, Insightful)
...it will take Google to buy out the company for an obscene amount and incorporate anything even slightly better than PageRank into their system.
Re:I wonder how long... (Score:5, Interesting)
Oh man. it's down. (Score:4, Funny)
Why this is a freaking bad idea (Score:3, Insightful)
Search and information retrieval is art and science. I work in the field, and let me tell you that if I had a cent for every "make it work like Google" statement, I would retire somewhere in Malibu. Users (in my case not end users but integrators) always want to put the responsibility on something other than themselves. Until we get people who can actually say "yes, we are responsible for this," we won't get too far with any search engine, no matter how complex and cool it is.
People are constantly asking questions about why it takes some time to insert a record into an engine that has 50 million documents and why a query *1*2*3* does not bring back any meaningful results (Google treats it like an arithmetic expression and gives you a '6' while many users expect '*' to be a wildcard). Then we have people who are not able to understand a precise query language that has a grammar and a set of rules you can't really fuck up. Now you give them an engine that can understand natural language and everybody in R&D and QA will soon go ape shit from all of the questions like, "I do know not to speak Inglish and engine is working but not corectly. Fix?" I am dead serious about this. Give people something genius and watch a handful of fools cause heart attacks across the search engine team.
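The two readings of `*1*2*3*` mentioned above can be shown side by side: as a wildcard pattern it matches any string containing 1, 2 and 3 in that order, while as arithmetic the inner part is 1 × 2 × 3 = 6. Python's standard `fnmatch` module stands in here for an engine's wildcard syntax, which is an assumption; every engine defines its own rules, and the candidate strings are made up.

```python
import fnmatch
import math

query = "*1*2*3*"

# Reading 1: '*' as a wildcard (shell-style glob via the standard fnmatch module).
candidates = ["a1b2c3", "123", "321", "order 1, 2, 3 go"]
wildcard_hits = [c for c in candidates if fnmatch.fnmatchcase(c, query)]

# Reading 2: '*' as multiplication, ignoring the dangling leading/trailing stars.
factors = [int(tok) for tok in query.strip("*").split("*")]
product = math.prod(factors)

print("wildcard matches:", wildcard_hits)
print("arithmetic result:", product)
```

Neither reading is wrong in itself; the query stays ambiguous until the engine's grammar says which one applies, which is the argument for a precise query language.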
If you want to do something for yourself and your end users, learn how to ask correct questions in order to get correct answers. In the 21st century, skills like keyboarding and being able to use a search engine are almost essential to one's survival. While I encourage all possible academic research in the field of information retrieval, I strongly suggest that people with extra money put their ideas toward usability. Make things simple, make things precise, and let users figure out the rest. Once we get to the point where everybody can make a semi-decent query, we'll move to natural language processing.
Finally, a definitive answer! (Score:3, Funny)
Thoughtpuckey (Score:5, Insightful)
Impressive (Score:4, Funny)
A: Did you mean 'What the hell is a fact?'
Quite