Super-Fast RDF Search Engine Developed

Posted by ScuttleMonkey on Fri May 04, 2007 09:27 AM
from the google-to-buy-ireland dept.
The Register is reporting that Irish researchers have developed a new high-speed RDF search engine capable of answering search queries with more than seven billion RDF statements in mere fractions of a second. "'The importance of this breakthrough cannot be overestimated,' said Professor Stefan Decker, director of DERI. 'These results enable us to create web search engines that really deliver answers instead of links. The technology also allows us to combine information from the web, for example the engine can list all partnerships of a company even if there is no single web page that lists all of them.'"

Related Stories

[+] Why the Semantic Web Will Fail 179 comments
Jack Action writes "A researcher at Canada's National Research Council has a provocative post on his personal blog predicting that the Semantic Web will fail. The researcher notes the rising problems with Web 2.0 — MySpace blocking outside widgets, Yahoo ending Flickr identities, rumors Google will turn off its search API — and predicts these will also cripple Web 3.0." From the post: "The Semantic Web will never work because it depends on businesses working together, on them cooperating. There is no way they: (1) would agree on web standards (hah!) (2) would adopt a common vocabulary (you don't say) (3) would reliably expose their APIs so anyone could use them (as if)."
  • by achillean (1031500) on Friday May 04 2007, @09:34AM (#18988207) Homepage
    Here's the link to the official NUIG: DERI (omgwtfbbq) website in Ireland:

    DERI [www.deri.ie]
  • This could be huge (Score:5, Interesting)

    by $RANDOMLUSER (804576) on Friday May 04 2007, @09:35AM (#18988211)
    Except for the minor little problem of getting everyone to agree on the ontologies. Being able to search quickly is important, but until somebody comes up with the Dewey Decimal System for all knowledge, it won't mean much.
    • by G4from128k (686170) on Friday May 04 2007, @09:44AM (#18988361)
      Yes, creating a consistent ontology is a challenge. But the bigger challenge is the lack of incentive for ontology truthfulness. If this type of search becomes popular, ontology spam and OSEO (Ontology Search Engine Optimization) will become a booming industry.
      • Of course you're correct. It had never occurred to me that there would be ontology spam, but of course there will be. Still, for the pure knowledge aspects (think Wikipedia on RDF) it would be a wonderful thing.
        • Of course you're correct. It had never occurred to me that there would be ontology spam, but of course there will be. Still, for the pure knowledge aspects (think Wikipedia on RDF) it would be a wonderful thing.

          For a while, yes. But as long as there is a cash-per-page-view market, the onslaught of adverspam will reach every corner of the web. It can't be stopped as long as there is money to be made there.

          Certainly the big "pure knowledge" sites will defend themselves, as Wikipedia does, but that is an ar

        • I agree with you 100% and did not mean to imply that the goal is not worthy. Being able to search semantically or to pull out just the relevant information would be hugely valuable.

          And I'm sure that next generation search engines will create clever ways of detecting and punishing ontology spam (e.g., noting the dissonance between the text and the tags)
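One way to picture the "dissonance between the text and the tags" check is the minimal Python sketch below. It is not how any real search engine works: the function names, the single-word tags, and the 0.5 threshold are all assumptions made up for illustration.

```python
import re

def tag_support(text, tags):
    """Return the fraction of declared tags that also occur in the page text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    if not tags:
        return 1.0  # nothing declared, so nothing to contradict
    supported = [t for t in tags if t.lower() in words]
    return len(supported) / len(tags)

def looks_like_tag_spam(text, tags, threshold=0.5):
    """Flag a page when fewer than `threshold` of its tags appear in its text."""
    return tag_support(text, tags) < threshold

# Hypothetical pages: one whose tags match its text, one whose tags do not.
honest = looks_like_tag_spam(
    "Cats is a musical by Andrew Lloyd Webber.", ["cats", "musical"])
spammy = looks_like_tag_spam(
    "Buy cheap pills online today.", ["cats", "musical", "wikipedia"])
print(honest, spammy)
```

A real detector would need stemming, multi-word tags and some defence against pages that simply repeat their tags in hidden text, so treat this only as a picture of the idea.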
      • I was thinking the article kinda indicates a resolution to the ontology. Most definitions are a product of synonymous/antonymous context. For instance a person cannot understand the concept of clear without simultaneously understanding opaque. This level of search would suggest that if you throw enough generic definition at a term then some logic could be used to say "if we find so many synonyms then we have an accurate definition" this is how AIML works at a basic level. RDF would be like AIML on crack and
        • Ah, but that's the rub: most things are not binary, neither this nor that. Things live on a continuum, and it's all too often a judgement call where they should lie.
          transparency ==> translucency ==> opacity

          Or, to put it in website design terms: "It's not blue enough.
      • Why do I suddenly get this mental image of spam on increasing the plumage size of birds...?
      • Yes, creating a consistent ontology is a challenge. But the bigger challenge is the lack of incentive for ontology truthfulness.

        I'd say consistent ontology is a bigger challenge (though also one that doesn't need to be anywhere near completely solved for all kinds of useful applications to exist.) Trust mechanisms built on RDF aren't really all that big of a challenge: trust relationships are fairly basic, straightforward relationships of exactly the type RDF was designed to express from the outset, after all

      • Re: (Score:3, Insightful)

        Ontology SPAM is OK, but Epistemology Spread is really yummy!
    • Ah, but the Dewey Decimal system only works because responsible people are involved in categorizing everything. They let just anyone publish information on the internet these days.
    • Re: (Score:2, Interesting)

      Actually there is a lot of research being done to get around the need for a 'Dewey Decimal System'. The idea is to analyze relations between terms (names, datatypes, etc.) in an ontology. One could also compare relationships between terms: A child of B, C child of D, and A=B does B==A ?? Please note that these are examples of how terms and ontologies *could* be matched and not necessarily how someone would match terms. http://www.ontologymatching.org/ [ontologymatching.org] Also, http://wordnet.princeton.edu/ [princeton.edu] is a project I thin
      • It was an admirable attempt in its time, but it's pretty clunky. It's also very biased towards the world-view of one man in the 1870s. While it does get updated, you'll find that there are structural issues. The classic example is religion: everything involving Buddhism, Sikhism, or Jainism is lumped together in a number space which is the same size as the number space reserved for Christian "Parish Government And Administration." Christianity itself gets 88 percent of all the top-level numbers set aside fo
  • for a Radio Direction Finder?
      • Yes. And it's always easy to find the largest RDF on the planet... It's wherever Steve Jobs is. I hear it extends into space.
  • Links! (Score:4, Insightful)

    by SolitaryMan (538416) on Friday May 04 2007, @09:37AM (#18988235) Homepage Journal

    These results enable us to create web search engines that really deliver answers instead of links.

    I need both: answers *and* links! Many times when I search the web, I don't know for sure what I am searching for, let alone being able to ask a specific question...

    • This is probably the biggest problem with searching. Google can return really good results if you know what to search for. Most people I know just type in the first word that pops into their head, and make the search way too generalized, and don't get good results. Knowing what words to type in can save you a lot of time in searching.
      • The sad thing is that it's so easy to learn how to get good results using current search engines, but people are never taught how to do it.

        RDF could do very useful things, like throwing up a disambiguation question at the top of the results page when you've not made it clear what you want, or filtering out the plague of typosquatter/content free price comparison/'be the first to write a review of this item' sites, but so could a bit more intelligence built into Google.
        • People just expect computers to do everything for them, and turn off their brains most of the time. This is why people have so many problems operating computers. Most people when searching for information about Cats (the musical) will probably just type in "cats", and look through all the results. Whereas, a person who understands the concept of feeding the right information to the search engine, will probably type in "cats musical", or if you're looking for something more specific, you may type in "cats
        • RDF could do very useful things, like throwing up a disambiguation question at the top of the results page when you've not made it clear what you want
          It looks like you're trying to search for tentacle porn. Would you like help?

          No thanks, I don't need Clippy in my search engine.
          • I'm not suggesting Clippy, I'm just suggesting disambiguation. Google already does this for typos (You searched for "kats musical song list" did you mean "cats musical song list"?). If Google noticed that the cats pages fell into 3 major categories (musical/animal/character who says 'all your base') and offered me those options in the typo line, I'd find that useful in narrowing down which of the 86,500,000 pages it found is the right one.

            In your example, I'm guessing you might find the option to filter dow
            • I know, I was just making a joke.

              However, I think contextual disambiguation questions like what you're suggesting are already served by "search within results" queries. Proposing likely criteria for narrowing down the results would be, I think, a disservice. It pigeonholes sites, but worse than that, pigeonholes searches. This leads to easy gaming of the search system -- SEO would cause pretty much every site to make sure it's associated with the typical disambiguation terms, thus removing the utility
              • I'm not so sure. I'm not suggesting linking to disambiguation pages (which could be gamed by SEO), I'm suggesting Google analyses the text and notices that pages tend to either use the words "Andrew Lloyd Webber" "Kitty-litter" or "set us up the bomb" and that these phrases tend to be mutually exclusive, so they would be good ones to offer as means of disambiguation.

                The terms wouldn't be 'typical disambiguation terms', as they would be generated freshly from the content of the pages that appear in the searc
                • Hmm. I do manually what you're suggesting when searching. Enter my search terms, and if most of the results are for something different than what I'm looking for, I'll add terms to remove the extraneous results. This is based on the couple lines of content info returned by Google (of course, those lines aren't 100% fresh with Google).

                  So, I think what you're suggesting is that the search engine prompt those terms to help people narrow their search? Didn't Ask Jeeves try this and miserably fail -- and if
  • Having solved the problem of search, and provided a breakthrough product that brings consciousness to what was previously a mere series of tubes, the National University of Ireland has now announced that it is going to solve world hunger next, maybe in three months. Other projects in the pipeline include a cure for cancer and solving the full Navier-Stokes equations.
  • Hype (Score:5, Insightful)

    by gvc (167165) on Friday May 04 2007, @09:42AM (#18988317)

    users should get more relevant results


    Yet another /. article parroting an uncritical popular press account of a press release.
    • We have a Technical Report available at http://www.deri.ie/fileadmin/documents/DERI-TR-2007-04-20.pdf [www.deri.ie] that should answer most of the technical questions. From the abstract: "We present the architecture of an end-to-end search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web. In particular, we study distributed indexing methods for graph-structured data and parallel query evaluation methods on a
  • RDF? (Score:4, Funny)

    by lancelotlink (958750) on Friday May 04 2007, @09:46AM (#18988393)
    I didn't realize Steve Jobs' Reality Distortion Field was able to be harnessed and bottled in a search engine, or any software for that matter. His abilities are boundless!
  • I'll prove him wrong (Score:4, Interesting)

    by Big Nothing (229456) <big.nothing@bigger.com> on Friday May 04 2007, @09:57AM (#18988577)
    "'The importance of this breakthrough cannot be overestimated,' said Professor Stefan Decker, director of DERI."

    This is without a doubt the greatest invention in the history of time!

    There, I just proved the professor wrong. Muahaha.

  • by stevenp (610846) on Friday May 04 2007, @09:59AM (#18988617)
    - "The importance of this breakthrough cannot be overestimated"

    The importance of any event can be overestimated and quite often is overestimated. It is called hype.
    When speaking of XML, XHTML and semantic WEB then the word "overestimated" fits just nice.
    If this was not the case then HTML should long have been dead and the whole WEB should have been based on pure XML with meaningful tags.

    -- Do not read me, I am a stupid tag
  • by Anonymous Coward on Friday May 04 2007, @10:11AM (#18988821)
    What kind of data set did they use? The structure and contents of the graph that is the data in an RDF database has a huge impact on the performance of query execution, and different applications have different structures.

    What kind of queries are they running? There are several different RDF query languages (think of SeRQL, RDQL, N3, SPARQL, etcetera) and some of them support quite complex queries. Quickly finding the answers to a simple query like

    SELECT ?name WHERE { ?name <http://xmlns.com/foaf/0.1/name> "John Smith" }
    is just a matter of an indexed lookup and not very special. But, like in SQL, much more complex expressions can be generated that require complex index operations on the query execution level. Having implemented an RDF database that supports SPARQL queries an order of magnitude faster than the software the W3C uses for their experiments (which, admittedly, doesn't have performance as a prime requirement), I know that it's possible to do simple things fast, but the interesting part is handling RDF queries that don't easily map to efficient database operations.

    Which brings me to the most important point: where is their detailed report? Can I get the software somewhere and perform my own tests? The article is too vague to draw any conclusions about what their RDF database does, and how good it is. I'd love to read up on it, but I can't seem to find the information.
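
To illustrate why a single triple pattern like the foaf:name query above amounts to an indexed lookup, here is a minimal Python sketch. It is not the SWSE implementation or any real RDF store: the `TripleIndex` class, its method names, and the example.org URIs are hypothetical, invented only for this example.

```python
from collections import defaultdict

class TripleIndex:
    """Toy in-memory RDF store with a predicate/object index (illustrative only)."""

    def __init__(self):
        # Maps (predicate, object) -> set of subjects, so a pattern such as
        # "?s <p> o" is answered by one dictionary lookup instead of a scan.
        self._po_index = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Record one statement in the index."""
        self._po_index[(predicate, obj)].add(subject)

    def subjects(self, predicate, obj):
        """Return all subjects that have the given predicate and object, sorted."""
        return sorted(self._po_index.get((predicate, obj), set()))

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

store = TripleIndex()
# Hypothetical statements; the example.org URIs are placeholders.
store.add("http://example.org/people/1", FOAF_NAME, "John Smith")
store.add("http://example.org/people/2", FOAF_NAME, "Jane Doe")
store.add("http://example.org/people/3", FOAF_NAME, "John Smith")

matches = store.subjects(FOAF_NAME, "John Smith")
print(matches)
```

The harder cases mentioned above, queries that join several patterns or filter on computed values, cannot be served by one such lookup, which is where the choice of indexes and join order starts to dominate performance.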
    • by aharth (412459) on Friday May 04 2007, @10:19AM (#18988963) Homepage
      Hello, I am one of the main developers of SWSE. True, the press release is vague, but there is only so much you can say in a press release aimed for the general public.

      We have a Technical Report available at http://www.deri.ie/fileadmin/documents/DERI-TR-2007-04-20.pdf [www.deri.ie] that should answer most of the technical questions.

      From the abstract:

      "We present the architecture of an end-to-end search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web.

      In particular, we study distributed indexing methods for graph-structured data and parallel query evaluation methods on a cluster of computers.

      We evaluate the system on a dataset with 430 million statements collected from the Web, and provide scale-up experiments on 7 billion synthetically generated statements."
  • Colonel Sandurz: Prepare ship for light speed. Dark Helmet: No, no, no. Light speed is too slow. Colonel Sandurz: Light speed is too slow? Dark Helmet: Yes. We're gonna have to go right to... SUPER speed. [everybody gasps] Colonel Sandurz: SUPER speed? Sir, we've never gone that fast before. I don't know if this ship can take it. Dark Helmet: What's the matter Colonel Sandurz? Chicken? Colonel Sandurz: [Wimpering] Prepair ship! [Calms down] Colonel Sandurz: Prepare ship, for Ludicrous speed. Fasten all seat be
    • Re: (Score:2, Informative)


      [Wimpering] Prepair ship! [Calms down] Colonel Sandurz: Prepare ship, for Ludicrous speed. Fasten all seat belts.

      If you're going to steal a joke, you need to make sure to replace all references to the original. Find / Replace works great for this.

  • First, giving the amount of time and the number of items searched means nothing. Are they doing it on a BlueGene or an Apple II?

    Second, the problem with "the semantic web" if you're relying on people providing the metadata themselves, is the reliability (trustworthiness?) of the person creating the metadata. There's a reason the meta name="keywords" tags aren't a significant factor if at all in any of the major search engines' ranking systems.
  • sounds fishy (Score:3, Interesting)

    by vga_init (589198) on Friday May 04 2007, @10:50AM (#18989453) Journal

    Of course a search based on meta data is going to be faster and more accurate, but only when the meta data is correct. We've had this since the beginning of the interweb; people would load up their pages with bogus meta data just to generate search traffic. Because of this dishonesty, search engines have had to resort to other methods of evaluating and indexing pages (for example, based on actual content).

    I don't see any difference between this new RDF and that old stuff.

  • So now we have a search engine capable of making a godzillion searches in a data domain that does not exist yet. That's all great and dandy, and we do indeed need new models and architectures for search engines once (if) the web goes all semantic. However, when (if) the semantic web ever becomes a reality, this search engine will long be retired. So, this result is great from a research point of view, but don't expect it to leave the lab.
  • by aidhog (1097699) on Friday May 04 2007, @11:18AM (#18989863)
    I am one of the developers on the project (along with user aharth), so feel free to ask any specific questions you may have here. The article is quite vague, so I refer you to a technical report at http://www.deri.ie/fileadmin/documents/DERI-TR-2007-04-20.pdf/ [www.deri.ie].
  • but why would I want to search several million statements from the Robotech Defense Force? I mean, sure I'm an Anime nerd, but there are limits...
    • I asked the RDF search engine, and that's what it told me. Maybe if we ask it the right question it could come up with an answer to do that. Now if only we could devise a machine powerful enough to tell us what that question would be...