Using the Semantic Web to Enhance Search
RobMcCool writes "At Stanford KSL, we really like the Semantic Web. So we've taken many of our favorite web sites, scraped them, and put together a huge pile of RDF, which we'll let you download. We've used that RDF to create a search application, in the spirit of Google Q & A or Microsoft's recently announced MSN Search extensions. Our search can answer simple factual queries like the previously discussed population of Portugal, but can also answer some more complex ones. We also have a smart autocomplete system; type "tom hanks birth" slowly to see it in action (best with Firefox). We're looking for people to be a part of this search system by running their own search sites, and by putting their data on the Semantic Web. Come check it out!"
Google watch out... (Score:5, Insightful)
This is definitely one to watch...
Re:Google watch out... (Score:1, Interesting)
Re:Google watch out... (Score:3, Informative)
Re:Google watch out... (Score:1)
It's not that most interpreters don't support OWL Full, but that there are no sound and complete decision procedures for subsumption reasoning in the logic that underpins OWL Full: it is undecidable. If you write OWL DL there are restrictions on what you can express, but you do then have decidable (if not always tractable) algorithms. It's the tradeoff between expressivity and complexity, in short.
SHOE was primarily the result of Jeff Heflin's PhD research, and he used his experiences of writing SHOE to good effect on the W3C's Web Ontology Workin
Re:Bashers watch out... (Score:2, Insightful)
If it would mean that their sites would rank higher in the search results, I'd say that they all would...
Re:Google watch out... (Score:3, Interesting)
This statement is exactly why I wonder why this is considered such a wonderful thing. For a while now, there's been a research project at IBM called WebFountain [ieee.org] that not only does everything the Semantic Web attempts to do, but doesn't require any special markup either. Its goal is to work with completely unstructured data of any type, including web pages, PowerPoint documents, Word docs, PDFs, etc.
Re:Google watch out... (Score:1)
From the IEEE Spectrum article:
WebFountain compl
Re:Google watch out... (Score:2)
In the context of GIS data, where metadata can be incredibly useful, creation of metadata is like pulling teeth.
Unfortunately, until and unless there are automated tools (your "intelligent mining tools"), this whole thing will never be more than a curiosity...
Qualify as Semantic Web ? (Score:1)
Also, given that you don't have any 'meaning' attached to nodes and links
Re:Qualify as Semantic Web ? (Score:1)
Tap doesn't appear to have an ontology (OWL or RDFS) that's published separately to the RDF data, but the RDF data files do appear to contain class definitions. In my book, that's sufficient meaning to qualify as a SW application under the rules laid down by the SW Challenge. It's certainly about as much meaning as we had in CS AKTive Space [aktors.org] when we won the first SW Challenge in 2003.
Re:Qualify as Semantic Web ? (Score:1)
Today OWL is formalized. Several OWL-based APIs and reasoners are available. Presenting such 'RDF only' applications as Semantic Web applications misguides people and the community. My only request to all you Semantic Web gurus is to preach correctly.
Loading... (Score:1)
Re:Loading... (Score:1)
From the check it out link... (Score:2, Funny)
That's nice and all, but who shot first, and is there a mashup of both scenes with crazy alien bar music mixed with '20s sinister piano?
600MB?!?!?!? (Score:1)
(oh, and I'm getting 503's for the searches)
How 'bout... (Score:1)
autocomplete (Score:5, Insightful)
Or, even better, never have any autocomplete turned on automatically. Do something VB-like: if you want to see possibilities at a certain point, hit a specific key to make the list pop down.
Re:autocomplete (Score:1)
The autocomplete list will not show up until 20 or fewer results remain.
Of course, this is just one example.
Re:autocomplete (Score:3, Insightful)
Re:autocomplete (Score:2)
That's just a usability issue; the concept is sound. Instead of filling in results after "a", fill them in after three letters like "ast", which could give asterisk, astronaut, etc. The idea is to 1) save time by not making users type in an extra 6 letters and 2) cut down on misspellings.
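The two autocomplete rules floated in this subthread, waiting for about three letters and only popping up the list once 20 or fewer candidates remain, can be combined in a few lines. This is a minimal sketch assuming a sorted in-memory word list; `suggest`, `MIN_PREFIX` and `MAX_SHOWN` are hypothetical names, and nothing here reflects how Search on TAP actually implements its dropdown.

```python
from bisect import bisect_left

MIN_PREFIX = 3   # hypothetical: no suggestions until three characters are typed
MAX_SHOWN = 20   # hypothetical: only show the list once 20 or fewer matches remain


def suggest(words, prefix):
    """Return completions of prefix from a sorted list, or [] if the list should stay hidden."""
    if len(prefix) < MIN_PREFIX:
        return []
    # Binary search to the first word >= prefix; matches form a contiguous run from there.
    start = bisect_left(words, prefix)
    matches = []
    for word in words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
        if len(matches) > MAX_SHOWN:
            return []  # still too many candidates, keep the dropdown closed
    return matches


vocabulary = sorted(["asterisk", "astronaut", "astronomy", "asteroid", "apple"])
print(suggest(vocabulary, "a"))    # []
print(suggest(vocabulary, "ast"))  # ['asterisk', 'asteroid', 'astronaut', 'astronomy']
```

Keeping the list sorted means each keystroke costs a binary search plus a short scan, rather than a pass over the whole vocabulary.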
Useless? I don't think so. (Score:2)
In any case, for Japanese, Chinese and Korean, autocomplete is almost a natural part of using a web search engine, so it's not a "useless feature that nobody wants to see."
Those languages are typically typed with alphabet-based input that is then converted into native text. Why bother converting if you can take the direct alphabetical input and start showing native-text autocompletes?
Re:autocomplete (Score:1)
As a prior post pointed out, the most important problem with the Semantic Web is getting people to generate data. Until that happens on a widespread basis, the data coverage will always be spotty compared to a keyword engine.
We added the Autocomplete dropdown in response to user feedback that they had no idea what was in the system until they hit "enter", and by then it was too late. The dropdown gives immediate feedback abo
Semantic Web? (Score:5, Informative)
The Stanford research is interesting, but I'm still trying to make up my mind about the Semantic Web, learning about RDF, and deciding whether I need to bake ways of handling these kinds of assertions into my web app. The Stanford group writes, "Our hope is that our search application spurs development of the Semantic Web, and leads to sites publishing their data in this format so that we don't have to." It obviously takes more work to encode such information, and to get user contributions marked up automatically for the Semantic Web. For a counter viewpoint, take a look at some of Clay Shirky's work, in particular:
Will the Semantic Web be supported by future versions of Drupal, phpBB, and other grass-roots content management web apps? Not sure. Since a lot of the content is visitor-generated, you would have to build in ways of providing easy markup. I'd be interested to hear /. thoughts on the matter.
Re:Semantic Web? (Score:2)
For example, a Slashdot story about a newly discovered type of <crab type="crustacean"/> would soon degenerate into postings about <crab type="venereal disease"/>. Marking quickie (pun intended) posts up semantically would detract f
Re:Semantic Web? (Score:2)
Since it's obvious that you do understand, would it be possible for you to come up with a 1-2 paragraph explanation of what the Semantic Web is and does?
I've spent some time on the linked to web site, and read Clay Shirky's essay, and I'm still not sure what it
Re:Semantic Web? (Score:1)
In short, the goal of the Semantic Web is to make the web semantically understandable to computers (by any means possible), in order to bring new possibilities and automation. For this to be possible, we need to make things explicit in a formal manner.
Re:Semantic Web? (Score:2)
Many thanks.
D
Re:Semantic Web? (Score:1)
The Semantic Web is not something you can think of as a concrete application, nor can we consider it mature. As you surely read, the Semantic Web is an extension of the current web. So I could link you to Firefox [mozilla.org] or some HTML editor. Jokes aside, it is more complicated than that, and if you want to embrace the Semantic Web you should get to know XML, RDF and OWL (in that order). In fact, if you are not working to build the SW, you should consider another approach. I
Re:Semantic Web? (Score:1)
Wine RDF [w3.org]
Look through that RDF with Emacs or Notepad. You will probably not understand all of it, but you can get the gist of it. It attempts to classify things almost categorically, so finding the context for a word is simple. For instance, the owl:Class "Wine" is a subclass of "PotableLiquid", with a couple of restrictions and properties that wine could have in real life.
Why is this useful? It dramatically increases the level at
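To make the Wine example concrete: if the ontology says Wine is a subclass of PotableLiquid, a program can conclude that anything typed as a wine is also a potable liquid by following rdfs:subClassOf links transitively. A minimal plain-Python sketch, assuming triples stored as tuples; only the Wine/PotableLiquid link comes from the example above, while `RedWine` and `MyHouseRed` are illustrative, and real OWL reasoners do far more than this.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Toy triples: Wine -> PotableLiquid mirrors the wine ontology; the rest are hypothetical.
triples = {
    ("RedWine", SUBCLASS, "Wine"),
    ("Wine", SUBCLASS, "PotableLiquid"),
    ("MyHouseRed", TYPE, "RedWine"),
}


def superclasses(cls, triples):
    """All classes reachable from cls via rdfs:subClassOf (transitive closure, cycle-safe)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def inferred_types(instance, triples):
    """Asserted rdf:type classes of an instance plus every superclass of those classes."""
    asserted = {o for s, p, o in triples if s == instance and p == TYPE}
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls, triples)
    return result


print(sorted(inferred_types("MyHouseRed", triples)))
# ['PotableLiquid', 'RedWine', 'Wine']
```

Nothing in the data says MyHouseRed is a potable liquid; that fact is derived from the class hierarchy, which is the kind of context the post is describing.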
Re:Semantic Web? (Score:1)
The smarties of the world know that metadata can be used to do all sorts of great things, but it just hasn't happened yet. The technology and the understan
Re:Semantic Web? (Score:1)
slashdotted (Score:3, Funny)
Re:slashdotted (Score:1)
I think I'll lock the door so the IS department can't find me.
There's a coral cache [nyud.net] of the static content, including screenshots, if you can't get through to my melted pile of servers.
Semantic Web Pitfalls (Score:4, Insightful)
I mean, think about it this way: while laziness or inertia might initially win out, once someone's competitors start to explore the idea of the Semantic Web, interest in it will grow, especially once doing so becomes profitable.
Re:Semantic Web Pitfalls (Score:2)
Well, part of Shirky's point is that it is so lacking in usefulness that there will be no advantage to anybody in displaying their content that way. I think he's right. I've watched AI based on these kinds of logical rules and semantics stumble along for years without producing anything useful, and then along comes some program that takes little pieces of what other people said, 'mindlessly' strings them together in new ways, and wins a Turing contest.
Logical reasoning of this kind, despite all the hyp
Re:Semantic Web Pitfalls (Score:1)
Logical reasoning is currently primitive and definitely overrated. We don't use OWL. The reasoning we do is very primitive, and is not of the sort that Clay Shirky is talking about. I actually agree with the thrust of his essay, despite the flaws that others have pointed out.
TimBL has talked about the Semantic Web as less a thing of logic and more like a giant database. I think that characterization has some problems also, but it's closer to what Search on TAP is doing.
Re:Semantic Web Pitfalls (Score:2)
Re:Semantic Web Pitfalls (Score:1)
We haven't really dealt with the spam problem because it's a problem we'd love to have. Right now there's so little content that we can afford to only pick the highest quality sites.
The automated techniques like those WebFountain uses are susceptible to the same problems, as is Wikipedia, so I'm not convinced that this is necessarily a Semantic Web problem as much as an Internet problem.
Re:Semantic Web Pitfalls (Score:2)
But there remains the problem that this technique does not find semantic connections that the authors don't know about.
This won't work (Score:2, Interesting)
Secondly, scraping doesn't always work, and you will surely have low-grade porno and get-rich-quick schemes/scams littering your semantic data.
But let us suppose that the main benefits of a semantic web are (A) access to reference data [which may be falsified, oops], and (B) access to product availability data [which may be falsified, oops, like mail order companies that pret
A similar project that I worked on at school (Score:1)
http://portal.acm.org/citation.cfm?id=544220.544228 [acm.org]
It was a really interesting project to be a part of!
Go UMass!
A tale of two technologies.... (Score:3, Interesting)
It is refreshing to see exciting new solutions to the problems we have at present of targeted information retrieval on the internet. I can remember years of stagnation in this field (read: early 90's), and any change from today's google-and-pray searching mentality among the majority of end-users will be welcome.
One more step... (Score:2)
awesome! (Score:3, Funny)
Re:awesome! (Score:2)
Just go to any hardcore site for that.
Maybe that's why sites showing non-professionals are becoming more popular. Not everyone likes the big fake ones...
Re:awesome! (Score:1)
I'm not sure our sponsors in the military and intelligence agencies would fund research in breast sizes. They're kind of a sensitive bunch. But maybe I'll stick it in a proposal and see what happens.
I think the comment that semantic web research has focused on logic such as query analysis, comparisons, and groupings is fair for the Semantic Web in general.
For Search on TAP we don't have a lot of people or resources. Despite that, I spend an awful lot of time generating data. The compressed RDF, which we
RDF? (Score:1)
Might actually help (Score:4, Insightful)
1. For really popular subjects, the useful links are swamped in the noise of sites trying to make a buck off of getting you to look at their ads before directing you to somewhere else, that might have the actual content or might not.
2. For many less popular subjects, there is some oddity, like an unusual term being borrowed by some other field, so that it is something most people have never heard of, but people in two or more specialties use it frequently, in very different ways, resulting in strangeness (e.g., the search engine throws up 23,003 links for a search on "Sator Resartus": 30% are esoteric literary criticism, 20% relate to apoptosis (cell biology), 20% relate to building moral inhibitions into A.I., 10% to Keith Laumer novels, and the rest are probably noise).
(I'm sure there are more than these two limits. Someone else may want to comment on some others).
This is likely to help with the second case, oddities in the data set grouping. It could sort links into the larger sub-categories, ask the user which one(s) seemed most applicable, and maybe even pick out a small set of links that explain, for the previous example, how a highbrow literary term got borrowed by the other fields.
It's not as likely it would help with the first case, though, as sites that don't have actual content are actively duplicitous. Something that is actively trying to fool humans is still likely to be very successful at fooling our tools.
Semantic Horse shit (Score:1, Interesting)
The best part is t
Re:Semantic Horse shit (Score:1)
Although it would be nice, no one is mandating or asking every website out there to mark up all their pages semantically. But if you want your information to be shared, a good way to start is to mark it up semantically so that more and better inform
Re:Semantic Horse shit (Score:2, Insightful)
Re:Semantic Horse shit (Score:1)
This is not about arguing over a set of standards over the ontology of how the data should be represented; this is about think
Re:Semantic Horse shit (Score:2)
Re:Semantic Horse shit (Score:2)
Fine with me. I don't want their information. In fact I'd like to get rid of their information (banner ads and spam).
If I want to deal with businesses, I go to my local shop. If I can't find what I want there, I look up the yellow pages of my local phonebook. If I can't find what I want there, I loo
Re:Semantic Horse shit (Score:2)
Rete scales really well as you add rules but scales really poorly with the number of items in working memory.
I believe that rete would be a bad choice for the SW where you would have a very large data set in working memory.
(I used to do a lot of rete hacking: commercial expert system tools for Xerox Lisp Machines and the Mac, and hacking OPS5 to support 'multiple data worlds' for in house use.)
Re:Semantic Horse shit (Score:1)
Semantic web definitely is a solution in search of a problem. It's probably naive to think organizing data is easy. The original post was
Re:Semantic Horse shit (Score:1)
People who don't have a clue about the Semantic Web tend to refer to it as semantic horse shit. It's a pity that those who don't believe in things try to demolish them rather than let it go... or let it perish, if they are so sure about its doom.
Re:RSS is not Semantic Web (Score:1)
As for RSS, it is limited, but it took off rapidly. RSS v1.0 introduced RDF. That is another step in the right direction.
BTW RDF isn't that complicated. Think of it as a triplet : Subject Verb Obje
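In RDF's own terms the three parts of a statement are subject, predicate, and object. A toy in-memory store with wildcard matching shows the idea, and is roughly the kind of lookup behind a factual query like the "tom hanks birth" example. This is a sketch only: the predicate names are hypothetical stand-ins for a real vocabulary, not TAP's schema, and real RDF uses URIs rather than bare strings.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical predicate names; the facts are just sample data.
store: list[Triple] = [
    ("TomHanks", "rdf:type", "Actor"),
    ("TomHanks", "birthPlace", "Concord, California"),
    ("TomHanks", "birthDate", "1956-07-09"),
]


def match(store, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


print(match(store, s="TomHanks", p="birthDate"))
# [('TomHanks', 'birthDate', '1956-07-09')]
print([t[0] for t in match(store, p="rdf:type", o="Actor")])
# ['TomHanks']
```

Any position can be left open, so the same store answers "when was X born?" and "who is an actor?" without a separate table per question.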
Re:RSS is not Semantic Web (Score:1, Interesting)
I don't think the evidence on the RDF mailing list supports that opinion. Look at the literature in the bookstores about the Semantic Web. If anything, it is full of confusion, and the specification is poorly written compared to the HTML and XML specifications.
Triplet does not equal (Subject verb object). What the R
Re:RSS is not Semantic Web (Score:1)
I don't know which mailing list you refer to, nor which books, but the web is an excellent source of information on the subject. Take a look at links returned by Google for RDF: here [xml.com], RDF homepage [w3.org] full spec, RDF primer [w3.org] for some graphs and there [wikipedia.org] or this [oreilly.com] e
Re: poor explanation (Score:1)
I see what you mean. Nonetheless, reasoning over hierarchies uses RDFS since hie
My question (Score:5, Interesting)
Re:My question (Score:2, Interesting)
Re:My question (Score:1)
I replied to a lower-scored post asking this question that we haven't had this problem yet, but that it's a problem with any technique, whether it's Wikipedia, an automated technique like WebFountain, or the Semantic Web. It's an Internet problem.
A followup to this post mentioned using a web of trust to counteract spam. That's something that Guha has done a lot of work with, and Paulo is working in the lab here on some prototypes based on movie data.
Spam is a problem I would love to have beca
In Related News (Score:2)
(starts filling in application)
auto-complete (Score:2)
Of course, it's a beta feature at Google Labs. FYI...
Slashdotting Google bomb? (Score:3, Interesting)
How is that different to linking to http://www.w3.org/2001/sw/ [w3.org]?
Is Slashdot trying to improve someone's Google ranking?
(Also, did Slashdot always linkify URLs entered as plaintext? I didn't write any "a href" for those two.)
Re:Slashdotting Google bomb? (Score:1)
Re:Slashdotting Google bomb? (Score:2)
They always did it, for a random number of links every few queries or so. It's so they can collect data on which sites people thought were relevant to their query. These links seem to have become more and more common though.
Re:Slashdotting Google bomb? (Score:2)
I'm still trying to figure out... (Score:3, Funny)
The semantic data is already there (Score:2)
Shameless promotion: for OS X users,
You missed the point! (Score:3, Insightful)
Indeed, you might output RDF from your processing of Web pages.
Extracting information from semi-structured text is very different to making logical assertions about resources.
Re:You missed the point! (Score:2)
Re:You missed the point! (Score:2)
Ignoring your grammar, I would reply: tell that to the people trying to develop Web Services standards! Specifically, I'd point you to OWL-S, and its simpler, ad-hoc cousins.
One of the most common uses of the Semantic Web at present is describing PEOPLE (FOAF, as used by LiveJournal and countless others). Do you not see that the Semantic Web goes beyond a Web of human-readable documents into a machine-understandable Web of data? You don't find pages on the S
Re:You missed the point! (Score:2)
Re:You missed the point! (Score:1)
You live in the past. ;)
You are referring to URLs, but the Semantic Web stands on URIs, which are a superset of URLs. If you limit yourself to URLs, we had better not talk about the Semantic Web, because it transcends the current view of resources. This concept is very significant.
Just to be a little clearer, let's take a simple example. You want to refer to yourself. How can you do this? You can't download yourself from the net (or can you? ;). What you can have is a homepage: your page
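A small sketch of the person-versus-homepage distinction, assuming the common "hash URI" convention and the FOAF `foaf:name`/`foaf:homepage` properties; the example.org identifiers and the `about` helper are hypothetical. The point is that the person and the page get two different identifiers, yet dereferencing the person's URI still leads to a document, because HTTP clients strip the fragment before making a request.

```python
from urllib.parse import urldefrag

# Hypothetical identifiers following the "hash URI" pattern.
PERSON = "http://example.org/people/alice#me"  # names Alice herself
HOMEPAGE = "http://example.org/people/alice"   # names a document about her

triples = [
    (PERSON, "foaf:name", "Alice"),
    (PERSON, "foaf:homepage", HOMEPAGE),
    (HOMEPAGE, "dc:title", "Alice's home page"),
]


def about(subject):
    """Properties asserted directly about one URI."""
    return {p: o for s, p, o in triples if s == subject}


# Distinct identifiers keep statements about the person separate from statements about the page.
print(about(PERSON))
print(about(HOMEPAGE))

# Dropping the fragment gives the URL a browser would actually fetch.
print(urldefrag(PERSON).url == HOMEPAGE)  # True
```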
Re:You missed the point! (Score:1)
It is normal that people use them interchangeably: the current web uses URLs exclusively, so URI is a new concept for most. As a matter of fact, they coul
Re:The semantic data is already there (Score:1)
An automated technique that could do better than a human tagger would have an additional feature of being able to pass the Turing Test.
I admire your faith in automated techniques, since the ones I've seen have a catastrophic error rate and can't provide particularly rich data. The state of the art there is constantly improving, though, and there's no reason why such algorithms can't generate RDF anyway. The Semantic Web is about file formats and conventions, it doesn't necessarily mean human tagging.
For
Gathering Metadata from Apple's Filesystem? (Score:1)
Does anyone know if web crawlers/gatherers (Google, Harvest, Combine, etc.) have the ability to access that information and associate it with the file?
I would love an automatic gatherer extracting my metadata from the filesystem and allowing searches on it, in combination with the full text option.
What!? (Score:1, Offtopic)
The PLO is the organization representing the Palestinian people that eventually evolved into the Palestinian Authority. It had observer status in the UN General Assembly and even special permission to participate on Security Council debates (sans voting rights). Al Fatah is a political party which was involved in guerilla activities in the 70s, but that has, since the Oslo Accords, accepte
Re:What!? (Score:1)
Re:What!? (Score:2)
Piggy Bank (by MIT) (Score:1)
It contains an RDF engine, and allows you to install "screen scrapers" for different sites, plus it knows automatically how to read FOAF and some other ontologies that have spread on the net a little bit. When you see the "Semantic web coin" icon in your status bar, you can click on it and it will
How is this different from HTML? (Score:2, Insightful)
Isn't this kind of what HTML is supposed to do?
Re:How is this different from HTML? (Score:1)
HTML is a "hypertext" document format with images, objects and links.
RDF's prime purpose is organizing resources and creating catalogues.
metacrap (Score:2)
However, anyone who thinks this is a utopia in the making should read the infamous Metacrap essay by Cory Doctorow:
Metacrap: Putting the torch to seven straw-men of the meta-utopia. [well.com]
After you are done reading, go to eBay and pick yourself up a cheap Palm Pilot.
1. Introduction
2. The problems
2.1 People lie
2.2 People are lazy
2.3 People are stupid
2.4 Mission: Impossible -- know thyself
2.5 Schemas aren't neutral
2.6 Metrics influence results
2.7 There's more th
Who'd want to see that? (Score:1)
meaning (Score:1)
Re:best with firefox (Score:5, Insightful)
Re:best with firefox (Score:1)
standards-compliant means (Score:2)
not entirely, but pretty close -- if you write compliant html/js, it has an excellent chance of working in all of {firefox, opera, safari}
Re:best with firefox (Score:2)
Re:best with firefox (Score:1)
Phrasing it that way, that it works best with any standards compliant browser, doesn't get the point across to those who think IE is a standards compliant browser.
Search on TAP has been tested with Firefox on Linux, Windows, and OS/X, and with IE on Windows. I think Andy might have tried it with Safari. I haven't tested it with Opera. With IE, I had to redo how the dynamic HTML was being generated twice to get around its limitations, and it's still ignoring my alignment tags.
Saying it works with standar
Re:best with firefox (Score:1)
Now, the cute little Firefox plush toy (have you seen it?): that's what people will remember. As long as you keep the FF designers on the straight and narrow with regard to implementing web standards, everybody gets what they want.
Course, some will argue that Firefox isn'
Re:best with firefox (Score:2)
Re:And the big deal is??? (Score:2)
Currently, keywords are used to search for relevant matches and yes, this seems to work OK for lots of things, but imagine if you could add context:
Imagine searching for the title of a piece of music that you heard in a certain film.
Currently this could involve some digging, but a semantic search engine could very quickly narrow this search. Have a look at this [mspace.fm] (there's a demo somewhere on the site). It's a research project run by Southampton Uni. It's pretty basic but hopefully you'll g
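The narrowing idea can be sketched as faceted filtering: each facet the user picks (film, composer, and so on) cuts down the candidate set. This is a minimal illustration, not how mSpace works internally; the `narrow` helper and the record layout are hypothetical, and the three tracks are just sample data.

```python
# Hypothetical facet data: each track records the films that used it.
tracks = [
    {"title": "Also sprach Zarathustra", "composer": "Richard Strauss",
     "films": {"2001: A Space Odyssey"}},
    {"title": "The Blue Danube", "composer": "Johann Strauss II",
     "films": {"2001: A Space Odyssey"}},
    {"title": "Ride of the Valkyries", "composer": "Richard Wagner",
     "films": {"Apocalypse Now"}},
]


def facet_matches(value, wanted):
    """Set-valued facets match by membership; single values match by equality."""
    return wanted in value if isinstance(value, set) else value == wanted


def narrow(items, **facets):
    """Keep only the items that satisfy every selected facet."""
    return [item for item in items
            if all(facet_matches(item.get(key), wanted) for key, wanted in facets.items())]


print([t["title"] for t in narrow(tracks, films="2001: A Space Odyssey")])
# ['Also sprach Zarathustra', 'The Blue Danube']
print([t["title"] for t in narrow(tracks, films="2001: A Space Odyssey",
                                  composer="Johann Strauss II")])
# ['The Blue Danube']
```

Each extra facet only ever shrinks the result set, which is what lets a user home in on "that piece from that film" without knowing any title keywords.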
Re:RDF, RDFS, DAML, OWL?? (Score:1)
Re:Backwards (Score:1)
I really don't know what you mean by this, nor how it can be good.
P.S.: Everyone has to "confirm you're not a script", even logged-in users.
Re:Full disclosure (Score:2)