Using the Semantic Web to Enhance Search

Posted by Zonk on Fri May 27, 2005 09:21 AM
from the does-whatever-a-spider-can dept.
RobMcCool writes "At Stanford KSL, we really like the Semantic Web. So we've taken many of our favorite web sites, scraped them, and put together a huge pile of RDF, which we'll let you download. We've used that RDF to create a search application, in the spirit of Google Q & A or Microsoft's recently announced MSN Search extensions. Our search can answer simple factual queries like the previously discussed population of Portugal, but can also answer some more complex ones. We also have a smart autocomplete system; type "tom hanks birth" slowly to see it in action (best with Firefox). We're looking for people to be a part of this search system by running their own search sites, and by putting their data on the Semantic Web. Come check it out!"
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by jason718 (634659) on Friday May 27 2005, @09:24AM (#12654785)
    Semantic-driven search engines have awesome potential. However, it does place a lot of demand on the content provider to provide metadata-rich content - or to be able to provide intelligent mining tools to create metadata from existing sites.

    This is definitely one to watch...
    • However, it does place a lot of demand on the content provider to provide metadata-rich content

      This statement is why I was wondering why this was considered such a wonderful thing. For a while now, there's been a research project at IBM called WebFountain [ieee.org] that not only does everything that the Semantic Web attempts to do, but doesn't require any special markup either. Its goal is to work with completely unstructured data of any type, including web pages, PowerPoint documents, Word docs, PDFs, etc.
    • Geographers have been waiting for over a decade for metadata to catch on. Everyone hates building metadata, even when they know it makes their data infinitely easier for other geographers to use.

      In the context of GIS data, where metadata can be incredibly useful, creation of metadata is like pulling teeth.

      Unfortunately, until and unless there are automated tools (your "intelligent mining tools"), this whole thing will never be more than a curiosity...
      • As one who has written semantic web pages, it's also rather difficult. OWL is a real pain to write, and most interpreters don't support "OWL Full", which means I'm stuck writing for either "OWL Lite" (now with only half the calories!) or "OWL DL". Forget (X)HTML, too - you need to use XML+RDF to use OWL, which means that if you want content you either need a parser or you need to code two documents for each one: One for human readability, and one that contains the metadata. There used to be a language calle
          • So, where do you find the business case that justifies web designers all over the world spending even 10 % extra time to specify the information needed by the Semantic Web???

            If it would mean that their sites would rank higher in the search results, I'd say that they all would...
  • by Anonymous Coward
    "Search on TAP was built to answer the following types of queries: There are also two actors named Harrison Ford: the one who played Han Solo, and a silent film star from the 1920's."

    That's nice and all, but who shot first, and is there a mash-up of both scenes with crazy alien bar music mixed with '20s sinister piano?

  • autocomplete (Score:5, Insightful)

    by cryptoz (878581) <cryptoz@gmail.com> on Friday May 27 2005, @09:28AM (#12654845) Homepage Journal
    Autocomplete is a useless feature that nobody wants to see when they type "a"... and see it load everything that begins with "a". The user is not interested in items starting with "a". Perhaps they're interested in terms beginning with "anon" or something, which has many fewer items to load, therefore making the load time much faster and not annoying the user in the process.

    Or, even better, never have any autocomplete turned on automatically. Do a VB-like idea, where if you want to see possibilities at a certain point, hit a specific key that will register for the list to pop down.
    • Have you tried Google Suggest [google.com]? Auto complete is very useful when it doesn't slow down the typing, and when the results are in a useful order.
    • Autocomplete is a useless feature that nobody wants to see when they type "a"... and see it load everything that begins with "a".

      That's just usability; the concept is sound. Instead of filling in results with "a", fill them in on three letters like "ast", which could have asterisk, astronaut, etc. The idea is to 1) save time by not making them type in an extra 6 letters and 2) cut down on misspellings.
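The three-letter threshold suggested here can be sketched in a few lines. This is only an illustration of the idea, not how the Stanford search or Google Suggest actually works: the `suggest` function, the `MIN_PREFIX_LEN` constant, and the term list are all made up for the example.

```python
from bisect import bisect_left

# Assumption: the three-letter threshold proposed in the comment above.
MIN_PREFIX_LEN = 3

# Hypothetical vocabulary; it must be sorted for the binary search to work.
TERMS = sorted(["anonymous", "asterisk", "asteroid", "astronaut", "atlas"])


def suggest(prefix, sorted_terms, limit=5):
    """Return up to `limit` terms starting with `prefix`.

    Prefixes shorter than MIN_PREFIX_LEN return nothing, so typing "a"
    never triggers a huge result list.
    """
    prefix = prefix.lower()
    if len(prefix) < MIN_PREFIX_LEN:
        return []
    # Binary search for the first term that could match the prefix.
    i = bisect_left(sorted_terms, prefix)
    matches = []
    while (i < len(sorted_terms)
           and len(matches) < limit
           and sorted_terms[i].startswith(prefix)):
        matches.append(sorted_terms[i])
        i += 1
    return matches


print(suggest("a", TERMS))    # too short: no suggestions
print(suggest("ast", TERMS))  # terms beginning with "ast"
```

Keeping the list sorted means each lookup is a binary search plus a short scan, so the cost depends on the number of matches shown rather than on the size of the vocabulary.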

    • I don't agree that it's completely useless. Don't we all tend to type the most important query word first?

      In any case, for Japanese/Chinese/Korean - autocomplete is almost a natural part of using a web search engine, so it's not a "useless feature that nobody wants to see."

      Those languages use alphabet-based inputs which are then converted into native text. Why bother converting if you can take the direct alphabetical input and start showing native text autocompletes?

  • Semantic Web? (Score:5, Informative)

    by DoctoRoR (865873) * on Friday May 27 2005, @09:29AM (#12654854) Homepage

    The Stanford research is interesting, but I'm still trying to make up my mind about the Semantic Web, learning about RDF, and whether I need to bake in ways of handling these kinds of assertions in my web app. The Stanford group writes, "Our hope is that our search application spurs development of the Semantic Web, and leads to sites publishing their data in this format so that we don't have to." It obviously takes more work to encode such information and getting user contributions auto-marked for the semantic web. For a counter viewpoint, take a look at some of Clay Shirky's work -- in particular:

    Will the semantic web be supported by future versions of Drupal, phpBB, and other grass-roots content management web apps? Not sure. Since a lot of the content is visitor generated, you would have to build in ways of providing easy markup. Would be interested to hear /. thoughts on the matter.

    • Until truly intelligent semantic classifying engines are available, the semantic web is best suited to things like Wikipedia where the information is (generally) of a higher quality than what you find on a more general purpose site, like the one you are viewing now!

      For example, a Slashdot story about a newly discovered type of <crab type="crustacean"/> would soon degenerate into postings about <crab type="venereal disease"/>. Marking quickie (pun intended) posts up semantically would detract f
    • I have never been fond of articles like this. Slashdot points us to something new (at least to me), and links to horribly long-winded and incomprehensible explanations of what it is. Sure, I could understand them ... if I had an extra hour or two.

      Since it's obvious that you do understand, would it be possible for you to come up with a 1-2 paragraph explanation of what the Semantic Web is and does?

      I've spent some time on the linked to web site, and read Clay Shirky's essay, and I'm still not sure what it
        • I guess what I'd like to see, instead of a vague initial paragraph and pages of formal specifications, is a concrete example of how you would code this, and then how it would be used.

          Many thanks.

          D
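In reply to the request for a concrete example, here is a rough sketch of the core idea: facts are stored as (subject, predicate, object) statements about named resources, and a query is a pattern match over those statements. It uses plain Python tuples rather than real RDF syntax, and every `ex:` identifier is invented for illustration; this is not the schema the Stanford project uses.

```python
# Each fact is a (subject, predicate, object) triple. The "ex:" names are
# hypothetical identifiers; real RDF would use full URIs.
TRIPLES = [
    ("ex:HarrisonFord_1", "rdfs:label", "Harrison Ford"),
    ("ex:HarrisonFord_1", "ex:played", "Han Solo"),
    ("ex:HarrisonFord_2", "rdfs:label", "Harrison Ford"),
    ("ex:HarrisonFord_2", "ex:knownAs", "silent film star of the 1920s"),
    ("ex:TomHanks", "rdfs:label", "Tom Hanks"),
    ("ex:TomHanks", "ex:birthDate", "1956-07-09"),
]


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def lookup(label, predicate):
    """Find every resource with this label and return its values for predicate."""
    subjects = [s for s, _, _ in match(p="rdfs:label", o=label)]
    return [o for s in subjects for _, _, o in match(s=s, p=predicate)]


# A query like "tom hanks birth" becomes a label lookup plus a predicate.
print(lookup("Tom Hanks", "ex:birthDate"))

# The same label can name two distinct resources, which is how a search
# can tell the two Harrison Fords apart instead of mixing their pages.
print(match(p="rdfs:label", o="Harrison Ford"))
```

The point of the exercise is that the name is just another property: the two Harrison Fords share a label but are different subjects, so anything asserted about one does not leak onto the other.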
  • slashdotted (Score:3, Funny)

    by maharg (182366) on Friday May 27 2005, @09:31AM (#12654869) Homepage Journal
  • by aftk2 (556992) on Friday May 27 2005, @09:32AM (#12654883) Homepage Journal
    While the idea of the semantic web has been legitimately lambasted [shirky.com], I think it's a bit far from DOA. While I agree that it's not exactly practical, I think that if you get enough sites displaying their content in such a manner, you'll eventually reach a point at which others will do the same.

    I mean, think about it this way - while laziness or inertia might initially win out, once someone's competitors start to explore the idea of the semantic web, interest will start to be shown in it, especially once it becomes profitable to do so.
    • Well, part of Shirky's point is that it is so lacking in usefulness that there will be no advantage to anybody in displaying their content that way. I think he's right. I've watched AI based on these kinds of logical rules and semantics stumble along for years without producing anything useful, and then along comes some program that takes little pieces of what other people said and 'mindlessly' strings them together in new ways and it wins a Turing contest.

      Logical reasoning of this kind, despite all the hyp

    • It gets worse: the method relies on the web site content author to know the semantic content, and to honestly report it. How would you check these things? Voting to determine if the earth revolves around the sun?
        • As far as spam goes, and mistaking popularity for correctness, yes you are right, and both of these are a big problem already.
          But there remains the problem that this technique does not find semantic connections that the authors don't know about.
  • Firstly, scraping is the same as what Google does, which is fine, but only a fool would trust the scraper not to censor their output.

    Secondly, scraping doesn't always work and you will surely have low-grade porno and get-rich-quick schemes/scams littering your semantic data.

    But let us suppose that the main benefits of a semantic web are (A) access to reference data [which may be falsified, oops], and (B) access to product availability data [which may be falsified, oops, like mail order companies that pret

  • by Crimson Dragon (809806) * on Friday May 27 2005, @09:35AM (#12654913) Homepage
    The Semantic Web appears to be a budding server-side solution to the paradigm of information glut online. Social bookmarking appears to be a client-side solution to the paradigm of information glut online.

    It is refreshing to see exciting new solutions to the problems we have at present of targeted information retrieval on the internet. I can remember years of stagnation in this field (read: early 90's), and any change from today's google-and-pray searching mentality among the majority of end-users will be welcome.
  • ...towards the future.
  • awesome! (Score:3, Funny)

    by Anonymous Coward on Friday May 27 2005, @09:42AM (#12654974)
    ...now I can finally search for "images of women with breasts larger than 36D"!
  • by Artifakt (700173) on Friday May 27 2005, @09:50AM (#12655049)
    This looks like it will broaden the volume of useful searches. Right now, there are at least two limits that show up when searching:

    1. For really popular subjects, the useful links are swamped in the noise of sites trying to make a buck off of getting you to look at their ads before directing you to somewhere else, that might have the actual content or might not.

    2. For many less popular subjects, there is some oddity, like an unusual term being borrowed by some other field, so that it is something most people have never heard of, but people in two or more specialties use it frequently, in very different ways, resulting in strangeness. (e.g. the search engine throws up 23,003 links for a search on "Sartor Resartus". 30% are esoteric literary criticism, 20% relate to apoptosis (cell biology), 20% relate to building moral inhibitions into A.I., 10% to Keith Laumer novels, and the rest are probably noise).

    (I'm sure there are more than these two limits. Someone else may want to comment on some others).

    This is likely to help with the second case, oddities in the data set grouping. (It could sort links into the larger sub-categories, query the user which one(s) seemed most applicable, and maybe even sort out a small set of links that explain, for the previous example, how a highbrow literary term got borrowed by the other fields.)
    It's not as likely it would help with the first case, though, as sites that don't have actual content are actively duplicitous. Something that is actively trying to fool humans is still likely to be very successful at fooling our tools.
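The sub-category sorting described for the second case might look something like the sketch below. It assumes each search hit already carries a semantic annotation naming its field, which is exactly the metadata the Semantic Web would have to supply; the hits, URLs, and the `group_by_field` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical search hits; the "field" value stands in for whatever
# semantic annotation a page would carry. Unannotated hits have none.
HITS = [
    {"url": "http://example.org/lit/1", "field": "literary criticism"},
    {"url": "http://example.org/bio/1", "field": "cell biology"},
    {"url": "http://example.org/lit/2", "field": "literary criticism"},
    {"url": "http://example.org/ai/1", "field": "artificial intelligence"},
    {"url": "http://example.org/misc/1"},
]


def group_by_field(hits):
    """Bucket hits by annotated field, largest bucket first."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit.get("field", "unclassified")].append(hit["url"])
    # Sort so the search UI can offer the biggest sub-categories first.
    return dict(sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True))


for field, urls in group_by_field(HITS).items():
    print(f"{field}: {len(urls)} link(s)")
```

Note that this does nothing for the first case: a site that lies about its field lands in whichever bucket it claims.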
  • My question (Score:5, Interesting)

    by News for nerds (448130) on Friday May 27 2005, @09:55AM (#12655096) Homepage
    Does it have a countermeasure against 'semantic spam'?
    • There is no such thing as semantic spam. What you refer to is disinformation or information junk. Like the actual web, the semantic web is about freedom, openness and accessibility. So, everybody can publish (I don't refer to government laws, repression, etc.). But the semantic web has a solution to this wave of information in a thing called the web of trust, which proposes giving trust rankings to information and introducing inference engines to compute which links/sites may interest you and why. But this is not for to
  • The average starting salary offer for Stanford graduate students has raised 30% in the last hour, as Microsoft, Google, and Yahoo each vied tooth and nail for their services.

    (starts filling in application)
  • One wouldn't think this would be particularly newsworthy here in supposed geek-haven, but Google has an auto-complete [google.com] feature as well.

    Of course, it's a beta feature at Google Labs. FYI...

  • by bcmm (768152) on Friday May 27 2005, @10:28AM (#12655503)
    That second link goes to http://www.google.com/url?sa=U&start=1&q=http://www.w3.org/2001/sw/&e=9707 [google.com]
    How is that different to linking to http://www.w3.org/2001/sw/ [w3.org]?

    Is Slashdot trying to improve someone's Google ranking?

    (Also, did Slashdot always linkify URLs entered as plaintext? I didn't write any "a href" for those two.)
      • They always did it, for a random number of links every few queries or so. It's so they can collect data on which sites people thought were relevant to their query. These links seem to have become more and more common though.

        • That's right and proper and everything, because that's part of how they rank pages. Your explanation was nice, because I had been noticing both direct and monitored links and wondering what was going on.
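For anyone who wants the direct link back, the wrapped URL quoted above carries the real destination in its `q` query parameter, so it can be unwrapped with the standard library. This is a small sketch based only on the one URL in this thread; the `redirect_target` helper is made up, and the meaning of the other parameters (`sa`, `start`, `e`) is not known here.

```python
from urllib.parse import parse_qs, urlsplit


def redirect_target(url):
    """Return the value of the `q` query parameter, or None if there is none."""
    params = parse_qs(urlsplit(url).query)
    values = params.get("q")
    return values[0] if values else None


# The wrapped link from the story, as quoted in the comment above.
wrapped = "http://www.google.com/url?sa=U&start=1&q=http://www.w3.org/2001/sw/&e=9707"

print(redirect_target(wrapped))
print(redirect_target("http://www.w3.org/2001/sw/"))  # a direct link has no q
```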
  • ...not only what the Semantic Web is about, but more pragmatically why this is in "Hardware." :)

  • Although I find the Semantic Web project intriguing, the idea of tagging data to define it is somewhat of a cop-out. The "meaning" of any given page is already there: in the page. Instead of spending so much time tagging pages, how about working on algorithms to derive meaning from the content? Surely those in the field of Computational Linguistics can make a real push at this: "artificial" corpora aren't needed anymore; the web offers more data than you'll ever need.

    Shameless promotion: for OS X users,
    • The Semantic Web is about describing resources, not tagging pages.

      Indeed, you might output RDF from your processing of Web pages.

      Extracting information from semi-structured text is very different to making logical assertions about resources.
      • Yes, that is a valid point. However, considering the (IMHO) substantial barriers to widespread adoption (getting authors to provide semantic descriptions, dealing with spam or purposefully misleading descriptions, etc.), I still would like to see more effort in context analysis research. The AI field has been floundering for so long, a catchy phrase such as "Semantic Web" (which has been quite a successful meme) applied towards AI applications in contextual derivation could be helpful in moving things al
        • "Webservices have no need for semantic web"

          Ignoring your grammar, I would reply: tell that to the people trying to develop Web Services standards! Specifically, I'd point you to OWL-S, and its simpler, ad-hoc cousins.

          One of the most common uses of the Semantic Web at present is describing PEOPLE (FOAF, as used by LiveJournal and countless others). Do you not see that the Semantic Web goes beyond a Web of human-readable documents into a machine-understandable Web of data? You don't find pages on the S
  • The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.


    Isn't this basically what HTML is supposed to do, kind of?
  • Maybe the Semantic Web can work someday, maybe not.

    However, anyone who thinks this is a utopia in the making should read the infamous Metacrap essay by Cory Doctorow:

    Metacrap: Putting the torch to seven straw-men of the meta-utopia. [well.com]

    After you are done reading, go to eBay and pick yourself up a cheap Palm Pilot. :)

    1. Introduction
    2. The problems
    2.1 People lie
    2.2 People are lazy
    2.3 People are stupid
    2.4 Mission: Impossible -- know thyself
    2.5 Schemas aren't neutral
    2.6 Metrics influence results
    2.7 There's more th
    • I hate to say it, but the Semantic Web blows chunks. No business is ever going to tag all their data so that anyone can use it. Businesses prefer to build specific webservices to integrate and charge customers.

      Fine with me. I don't want their information. In fact I'd like to get rid of their information (banner ads and spam).

      If I want to deal with businesses, I go to my local shop. If I can't find what I want there, I look up the yellow pages of my local phonebook. If I can't find what I want there, I loo

    • You have a valid point of view, but just one quick clarification:

      Rete scales really well as you add rules but scales really poorly with the number of items in working memory.

      I believe that rete would be a bad choice for the SW where you would have a very large data set in working memory.

      (I used to do a lot of rete hacking: commercial expert system tools for Xerox Lisp Machines and the Mac, and hacking OPS5 to support 'multiple data worlds' for in house use.)
      • by Anonymous Coward
        Nice straw man argument. How many people making their own personal site are going to dedicate 2/3 of their time to tagging their content? The only people that are going to tag their content are those looking to abuse the system. No sane individual is going to spend 3 months of time to go back and edit all their pages with tags. Even then, you still have the problem of conflicting categories (aka ontologies). There will never be a globally accepted set of ontologies. It's all a pipe dream. Why should users spend h
        • There are quite a few things where people might want to use some semantic mark-up:
          • Creative Commons: use RDF to specify copyright and licence info about a page; you can now search on this using special pages on Google and Yahoo.
          • Anyone who wants to sell something will be interested in making their content easy to find. A little bit of semantic mark-up could help them shift units.
          • Anything pulled out from a database. Here it's relatively easy to modify the code to add some extra mark-up.
          • Tagging this seems to
    • One word: Context.

      Currently keywords are used to search for relevant matches and yes, this seems to work ok for lots of things but imagine if you could add context:

      Imagine searching for the title of a piece of music that you heard in a certain film.
      Currently this could involve some digging, but a semantic search engine could very quickly narrow this search. Have a look at this [mspace.fm] (there's a demo somewhere on the site). It's a research project run by Southampton Uni. It's pretty basic but hopefully you'll g
    • Actually, I saw a preso for this project a while ago. It was pretty neat, showed a lot of promise, and I see that it's been progressing nicely. Stanford KSL actually DOES like the Semantic Web. Sure, they receive DARPA funding, but that's not why they like the Semantic Web. Also, some of the features/scrapers have been built as requested by the gov't, but it's not like the entire project is for the gov't.