Businesses The Internet

"Understanding" Search Engine Enters Public Beta 192

religious freak sends word of the public beta of Powerset, a closely watched San Francisco startup that promises an "understanding engine" to revolutionize Web search. An article in SearchEngineLand points out that Powerset is reaching higher than mere "natural language." Techcrunch has more details and analysis. For the beta, Powerset makes available all of Wikipedia to search, not all the Web. It's said that their understanding engine required a month to grok Wikipedia's 2.5M articles. The Web is currently at least 8,000 times as large.

  • I'm Unimpressed (Score:5, Interesting)

    by eldavojohn ( 898314 ) * <eldavojohn@gma[ ]com ['il.' in gap]> on Monday May 12, 2008 @10:55PM (#23387190) Journal
    Ok, so I like these new search engine ideas but I am grossly underwhelmed here. I tried the input:

    Who is David Bowie?
    Which it handled quite nicely. Biography, additional links and all that Wikipedia jazz.

    But come on, that's a simple question. Let's talk stuff I get into arguments over with my coworkers:

    Who played the villain in the first Die Hard?
    Which at least put Alan Rickman at #8 [powerset.com]. But let's try mutating that to make it harder but still understood by you and me:

    Who played the bad guy in the first Die Hard?
    Which resulted in very little but drivel [powerset.com] with no mention of the great Alan Rickman whatsoever ... although it did put Billie Jean King and Madonna in there for some hilarious reason.

    So maybe it can't understand 'bad guy.' Well onto another question:

    Who was the organist for The Beatles on Abbey Road?
    Which resulted in at least the first 20 having no mention of the great & oft forgotten Billy Preston [powerset.com].

    So you want to know what the kicker is? I put those same inputs into Google and found the name in the first or second result. Granted PowerSet doesn't do the whole web, I'm pretty sure that if it did, it wouldn't have the pretty results that it gave when I did what one of the articles told me to--ask it when earthquakes hit Tokyo. Just imagine the dates it would come up with if it hit a site with an html table of any seismic activity whatsoever in Tokyo!

    I think it's a novel idea to mine Wikipedia for a search engine so long as it isn't just plain old token matching like PowerSet seems to be up to. Be inventive, try a natural language parser written in Prolog that digests all of Wikipedia into a huge network/ontology of concepts ... no matter how flawed it might be.

    I find them talking about this in the articles:

    Powerset is different. It says that its technology reads and comprehends each word on a page. It looks at each sentence. It understands the words in each sentence and how they relate to each other. It works out what that sentence really means, all the facts that are being presented. This means it knows what any page is really about.
    Yet, I'm not impressed. You can try to personify your software and convince me that Baby Alive really defecates like a human being all over so it feels like I have a real baby. But I know it's just software. You don't have to dumb it down if you're going to blog about it. What is this? A pattern matching implementation? A depth first search tree parsing implementation? An ontology builder? Could you at least drop one of the buzzwords of the natural language parsing field for me here?

    So does this story actually have more than a startup looking for a sugar daddy to buy it out?
  • by Sanity ( 1431 ) on Monday May 12, 2008 @11:04PM (#23387244) Homepage Journal
    True Knowledge [trueknowledge.com] actually interprets your question using Natural Language Processing, and then looks through a massive database of user-contributed facts, combining them using sophisticated inference rules, to give you the answer you need. Even the inference rules are user-editable.
  • Re:I'm Unimpressed (Score:4, Interesting)

    by WaltBusterkeys ( 1156557 ) * on Monday May 12, 2008 @11:10PM (#23387308)

    Yet, I'm not impressed.
    Powerset is not an instant solution, it's a step in the right direction. Early Google wasn't perfect, but it got a lot better over time as the Pagerank algorithm was refined. Hopefully Powerset will show similar improvement over time.

    Heck, if Powerset is just watching what links people click on more often (Google does) then even that can help provide a training set for its algorithm. Using that kind of training set would make it vastly easier to figure out whether a change in the algorithm would be an improvement or not. That's priceless data and I hope they'll use it wisely.
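
    A minimal sketch of that idea, assuming a click log of (query, clicked document) pairs is available. It scores two ranking functions by mean reciprocal rank of the clicked document. The log, both rankers and their results are made up for illustration; nothing here reflects how Powerset or Google actually evaluate changes, and clicks are only a noisy proxy for relevance.

```python
def mean_reciprocal_rank(ranker, click_log):
    """Average of 1/rank of the clicked document under `ranker`.

    A query whose clicked document is not returned contributes 0.
    """
    total = 0.0
    for query, clicked in click_log:
        results = ranker(query)
        if clicked in results:
            total += 1.0 / (results.index(clicked) + 1)
    return total / len(click_log)


# Hypothetical click log: (query, document the user ended up clicking).
click_log = [
    ("die hard villain", "Alan Rickman"),
    ("abbey road organist", "Billy Preston"),
]


def old_ranker(query):
    # Hypothetical results from the current algorithm.
    table = {
        "die hard villain": ["Die Hard", "Bruce Willis", "Alan Rickman"],
        "abbey road organist": ["Abbey Road", "Billy Preston"],
    }
    return table.get(query, [])


def new_ranker(query):
    # Hypothetical results from a candidate change to the algorithm.
    table = {
        "die hard villain": ["Alan Rickman", "Die Hard"],
        "abbey road organist": ["Billy Preston", "Abbey Road"],
    }
    return table.get(query, [])


old_score = mean_reciprocal_rank(old_ranker, click_log)
new_score = mean_reciprocal_rank(new_ranker, click_log)
print("change is an improvement:", new_score > old_score)
```

    A higher score means the documents people actually clicked sit nearer the top, which is the kind of before/after comparison the click data makes cheap.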

    But, really, just remember that this is the first in a new breed of search engines. It won't be the last, by any means:

    -Search 0.9 was using the meta and description tags on a page to index (see Altavista). It broke when spammers figured out the algorithms.

    -Search 1.0 was using the text of inbound links to index (see Google). It doesn't know what the text means, it just knows that it has a bunch of keywords. It's breaking as people start to game their Google search results [reputationdefender.com].

    -Search 2.0 will try to find meaning in the web and understand what a page is really saying (see Powerset).
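
    The Search 1.0 item above (indexing a page by the text of the links pointing at it) can be sketched roughly as follows. This is a toy under stated assumptions: the link triples and the example URLs are invented, scoring is a plain count of matching anchor words, and link-based ranking such as PageRank is left out entirely.

```python
from collections import defaultdict


def build_anchor_index(links):
    """Index target pages by the words other pages use when linking to them.

    links: iterable of (source_url, anchor_text, target_url) triples.
    Returns a mapping word -> {target_url: number of anchors containing it}.
    """
    index = defaultdict(lambda: defaultdict(int))
    for _source, anchor_text, target in links:
        for word in anchor_text.lower().split():
            index[word][target] += 1
    return index


def search(index, query):
    """Rank targets by how many inbound anchor words match the query."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for target, count in index.get(word, {}).items():
            scores[target] += count
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical link data.
links = [
    ("a.example", "David Bowie biography", "bowie.example"),
    ("b.example", "Bowie fan site", "bowie.example"),
    ("c.example", "Bowie album", "album.example"),
]
index = build_anchor_index(links)
print(search(index, "bowie"))
```

    Note that the index never looks at what the target page says, only at keywords other people attached to it, which is also why adding links with chosen anchor text can game it.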

    I don't know yet what Search 3.0 will be, but we're still a long way from getting Search 2.0 to work right. But we're still making progress. Just because Powerset isn't perfect doesn't mean we should give up on the whole venture.
  • by Anonymous Coward on Monday May 12, 2008 @11:29PM (#23387414)
    We need a +1 Punny.
  • Re:I'm Unimpressed (Score:4, Interesting)

    by martin-boundary ( 547041 ) on Monday May 12, 2008 @11:31PM (#23387418)
    Since you didn't give the facts on your Google search, here they are, as of this comment's posting time:

    who is david bowie?

    en.wikipedia.org/wiki/David_Bowie
    en.wikipedia.org/wiki/David_Bowie_(album)
    www.bowiewonderworld.com/

    Result in the first three. Well done.

    Who played the villain in the first Die Hard?

    www.imdb.com/title/tt0095016/
    www.emanuellevy.com/article.php?articleID=6136
    wrestlingclassics.com/.ubb/ ultimatebb.php?ubb=get_topic;f=1;t=085316

    Result in the preview of the second only. Why they include a wrestling site though is beyond me.

    Who played the bad guy in the first Die Hard?

    www.imdb.com/title/tt0095016/
    www.imdb.com/title/tt0337978/usercomments
    www.empiremovies.com/movie/live-free-or-die-hard-/13109/review/01

    A lot of drivel, no name in the previews.

    Who was the organist for The Beatles on Abbey Road?

    paulmcgarry.com/cdcatalogue/details/5808.html
    www.beatles.ws/1969.htm
    www.sonicstate.com/news/shownews.cfm?newsid=4860

    First two, well done.

    It's interesting that Google and PowerSet are completely equivalent when your test data is available in Wikipedia. Now of course PowerSet is only searching Wikipedia, while Google has 8000(?) times more data, so it's not clear what is being tested.

    But what's strange is that Wikipedia and IMDB are returned so often. With all the hype about their huge index, I'd expect Wikipedia or IMDB to be rarely the best source in most cases, since more authoritative data is bound to be available to Google, kind of like the Abbey road example.

  • Re:I'm Unimpressed (Score:3, Interesting)

    by Rocketship Underpant ( 804162 ) on Monday May 12, 2008 @11:41PM (#23387466)
    And if Powerset really did parse and "comprehend" the content of each page (which it doesn't, judging by your trial searches), how would it deal with the significant number of error-ridden and unintelligible articles in Wikipedia?

    Not to mention non-English Wikipedias, which contain a good deal of information not available in the English one.
  • by spoco2 ( 322835 ) on Monday May 12, 2008 @11:47PM (#23387488)
    I asked 'Where do babies come from' and it just gave me back a bunch of articles with that string somewhere in their text.

    Pathetic, and you'd hope it's got a long way to go really because at the moment it does NOTHING of merit that I can see.
  • by erikina ( 1112587 ) <eri.kina@gmail.com> on Tuesday May 13, 2008 @12:00AM (#23387556) Homepage
    They're very different. It's not expected that this natural language parsing will replace SQL (anytime in the foreseeable future).

    Every so often, I find myself wanting to use natural language in Google. Like today I wanted to find out about the symptoms of a codeine histamine reaction. Sure, I could search for 'codeine', read about it and follow links (on, no doubt, Wikipedia) until I find what I want - but being able to search with "What are the symptoms of codeine histamine reactions?" is quite powerful.

    Although, to be honest I'd prefer to be able to search google with regex and hashes (like search for all pages/images that have a certain MD5 hash).
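
    For the hash part, a rough sketch of what such a lookup could look like on a crawler's side, assuming the crawler keeps the raw bytes of each page or image. The URLs and contents are hypothetical, and no existing search engine is claimed to expose this.

```python
import hashlib
from collections import defaultdict


def build_hash_index(documents):
    """Map the MD5 hex digest of each document's bytes to the URLs serving it.

    documents: dict of url -> bytes.
    """
    index = defaultdict(list)
    for url, content in documents.items():
        index[hashlib.md5(content).hexdigest()].append(url)
    return {digest: sorted(urls) for digest, urls in index.items()}


def find_by_md5(index, digest):
    """Return every URL whose content has exactly this MD5 digest."""
    return index.get(digest.lower(), [])


# Hypothetical crawl: two URLs serve identical bytes, one differs.
documents = {
    "http://a.example/logo.png": b"same bytes",
    "http://b.example/copy.png": b"same bytes",
    "http://c.example/other.png": b"different bytes",
}
index = build_hash_index(documents)
wanted = hashlib.md5(b"same bytes").hexdigest()
print(find_by_md5(index, wanted))
```

    An exact hash only finds byte-identical copies; a single changed byte gives an unrelated digest, so this complements text search rather than replacing it.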
  • by EmbeddedJanitor ( 597831 ) on Tuesday May 13, 2008 @12:00AM (#23387560)
    There is a fallacy that putting a natural language on something will make it easy. There are many specialised languages that people use every day.

    1 + 1 = 2 is a special notation/language that is both more concise and easier than writing "add one and one to make two". So is a music score, which is far easier than reading "make a high note for a bit then wait a bit and make a low note". Same with C, C++, SQL or Python: the hard bit in programming is algorithm design, not understanding the actual language itself.

    Is natural language really a barrier to entry in using Google? I doubt it. My untechy wife and her friends find everything they need. Plugging natural language into Google gives reasonable results most of the time.

  • I don't know yet what Search 3.0 will be, but we're still a long way from getting Search 2.0 to work right. But we're still making progress.

    Actually, we aren't making progress -- *at all*. What these guys are trying to do is a subset of artificial intelligence. A subject people have been banging their heads against since the 1940s, and we've made *zero* progress since then. We simply don't know how humans process information. We don't even have reasonable theories. We're at the equivalent of the "four elements make up the world" version of physics.

    AI researchers always get defensive when I say this, but it's simply true. All we have are better brute-force algorithms that sort-of simulate some of the things that humans do (i.e., voice recognition, character recognition, and other yawner tricks). There is no science of AI. Any sort of human-level understanding of information is far, far away in the future.

  • Needs some work. (Score:3, Interesting)

    by MrCrassic ( 994046 ) <deprecated&ema,il> on Tuesday May 13, 2008 @12:17AM (#23387644) Journal

    So I tried to search for the person who said, "What doesn't kill you only makes you stronger." The search text was: Who said, "What doesn't kill you makes you stronger"?

    Google returned the closest match, Friedrich Nietzsche, with several websites pointing to him. Powerset, however, returned only instances of people who happened to use the phrase (including one reference to Nietzsche). Google returned what I was looking for; Powerset did not.

    I can't really say which one is better. Google has the entire web to its advantage, while Powerset is just growing. It seems that the search engine has a lot of potential to grow, which is great as Google and company could use another competitor in the mix.

  • by Animats ( 122034 ) on Tuesday May 13, 2008 @12:19AM (#23387650) Homepage

    I've been trying various queries, and Google is doing better than Powerset even when I type in some actual question, like "How many Japanese died in WWII?".

    Question: "What is the planet closest to the sun?". First answer from Powerset: "Pluto".

    I think I see how this works. It takes the question and breaks it at noise words, ("closed class words" in linguistic terminology) constructing a query with both words and phrases. So "What is the planet closest to the sun" becomes "planet closest" sun. In fact, if you rewrite a natural language question in that form and use Google, it does better on question-answering than Powerset does.
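
    A minimal sketch of the rewriting guessed at above, assuming a tiny hand-picked list of closed-class words (a real system would need a far larger one). This illustrates the guess only; it is not Powerset's actual code.

```python
# Hypothetical, deliberately small list of closed-class ("noise") words.
CLOSED_CLASS = {
    "what", "is", "the", "to", "a", "an", "of", "in", "on", "for",
    "who", "was", "when", "did", "how", "many",
}


def format_run(run):
    """Quote a run of two or more content words; leave a single word bare."""
    text = " ".join(run)
    return '"' + text + '"' if len(run) > 1 else text


def question_to_query(question):
    """Break a question at closed-class words and build a keyword query."""
    words = [w.strip("?.,!\"'") for w in question.lower().split()]
    runs, current = [], []
    for word in words:
        if not word:
            continue
        if word in CLOSED_CLASS:
            # A noise word ends the current run of content words.
            if current:
                runs.append(current)
                current = []
        else:
            current.append(word)
    if current:
        runs.append(current)
    return " ".join(format_run(run) for run in runs)


print(question_to_query("What is the planet closest to the sun?"))
# prints: "planet closest" sun
```

    The output can be pasted straight into a keyword search engine, which is the comparison being made here.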

    Remember Ask Jeeves? It worked like that. No technical breakthrough here, move along.

  • Re:I'm Unimpressed (Score:5, Interesting)

    by WaltBusterkeys ( 1156557 ) * on Tuesday May 13, 2008 @12:24AM (#23387692)
    Wait, you're saying that the MIT summer vision project [mit.edu] wasn't as easy as people thought?

    (Background: In 1966, some MIT computer science faculty thought AI was so easy that computer vision could be solved in one summer worth of work; it probably took 35 years to reach the milestones identified in the research abstract).
  • by DavidD_CA ( 750156 ) on Tuesday May 13, 2008 @12:43AM (#23387796) Homepage
    I believe you've stumbled upon this startup's business plan.
  • by Anonymous Coward on Tuesday May 13, 2008 @01:59AM (#23388138)
    Looks like the form of the question made a difference in this case. "closest planet to the sun" returns as the first result:

    Solar System
    Mercury (0.4 AU) is the closest planet to the Sun and the smallest planet (0.055 Earth masses).

    Powerset _is_ actually doing parses and semantic constraints, but it's obviously not perfect.
  • Re:I'm Unimpressed (Score:3, Interesting)

    by fsterman ( 519061 ) on Tuesday May 13, 2008 @02:10AM (#23388216) Homepage
    Uhh, 1940 and no progress? Are you nuts? Cognitive scientists didn't theorize basic semantic networks until 1966, let alone artificial neurons. And no, that isn't just more brute forcing, yeah it is a *lot* more computation, but it's a completely different angle of attack than parsing sentence structure and swapping out words.
  • by TreeLuvBurdpu ( 1288430 ) on Tuesday May 13, 2008 @02:53AM (#23388446)
    I totally agree! What is the benefit of asking a computer questions using natural language? It is just going to be making an educated guess as to what you really mean. I am thinking of the stupid little dog in MS Office or the computer on the ship the Heart of Gold in The Hitchhiker's Guide. "Perhaps you would like some tea." "Share and enjoy!" Those aren't the type of conversations we want to have with computers. That's what people are for. But really I don't think natural language works with people. I think we should get rid of it. How many times do you hear "what do you mean?" or "oh, I thought you meant..." Natural language sucks. And I have seen some very passionate poetry written in XML and Java.
  • by Anonymous Coward on Tuesday May 13, 2008 @04:28AM (#23388844)

    Early Google wasn't perfect, but it got a lot better over time as the Pagerank algorithm was refined.


    Actually, am I the only one who thinks that google's results are worse now than they were years ago? It's still the best general search engine out there, but it often gives me results I don't want now, forcing me to put plusses in front of every word or quoted phrase just to make it actually search for what I asked for.
  • Re:I'm Unimpressed (Score:2, Interesting)

    by Kugrian ( 886993 ) on Tuesday May 13, 2008 @04:47AM (#23388952) Homepage
    First of all, I congratulate you for making attempts to improve the world's searching (and also on the look of your website - I love that blue!). How is this different from ask.com [ask.com] though? (Powerset's search didn't give me an answer to that.)
  • Re:I'm Unimpressed (Score:2, Interesting)

    by Anonymous Coward on Tuesday May 13, 2008 @04:52AM (#23388976)
    When I was but a mere lad, just starting on Computer Science, I really believed in the "Hard AI" position, viz, all we need is enough computing power and sufficiently clever algorithms, and we'd have AI. Ah, the arrogance of youth (or my youth, anyway). Since then, I've come to the conclusion that the "Hard AI" position is a total non-starter. As the parent poster says, thus far we have got nowhere in AI (AI research may have led to useful stuff, but that stuff isn't really AI!). Personally, I am impressed by the arguments advanced by the likes of Penrose and Hameroff, that "intelligence" (in the sense that we use the term wrt. humans) is a quantum phenomenon. So, yes, there's a way to go yet.
