Semantic Web Getting Real

Posted by kdawson on Sunday February 10, 2008 @09:12PM from the open-it-up-and-give-it-away dept.

BlueSalamander writes "Tim O'Reilly just did an interview with Devin Wenig, the CEO-designate of Reuters. With no great enthusiasm I started to read yet another interview on how the semantic web was going to make everything great for everybody. Wenig made some good points about the end of the latency wars in news and the beginning of the battle for automatically detecting linkages and connections in the news. Smart news, not just fast news. Great stuff — but just more words? Nope — a little searching revealed that Reuters just opened access to their corporate semantic technology crown jewels. For free. For anyone. Their Calais API lets you turn unstructured text into a formal RDF graph in about one second. I ran about 5,000 documents through it and played with a subset of them in RDF-Gravity. The results were impressive overall. Is this the start of the semantic web getting real? When big names and big money start to act, not just talk, it may be time to pay attention. Semantic applications anyone? The foundation appears to be here."

This discussion has been archived. No new comments can be posted.

Semantic Spam (Score:2, Insightful)

by Rog7 ( 182880 ) writes: on Sunday February 10, 2008 @09:16PM (#22374878)

Next up, semantic spam.

Actually, I think it's beaten the rest of the content to the punch. =(

Content? (Score:4, Insightful)

by Walzmyn ( 913748 ) writes: on Sunday February 10, 2008 @09:26PM (#22374934)

What good are fancy links if the content still sucks?

Re:Where's the Money? (Score:5, Insightful)

by QuantumG ( 50515 ) writes: <qg@biodome.org> on Sunday February 10, 2008 @09:37PM (#22375002) Homepage Journal

Yeah, it won't matter until Google starts getting in on the act. When you can search for "a website where I can get free kittens and other pets" and get exactly that, instead of just sites that have those keywords in them (like this message in a day or so), then it will be valuable for people to RDF their site and maybe even look at the mess that the translator makes and clean it up.

Re:Where's the Money? (Score:3, Insightful)

by ushering05401 ( 1086795 ) writes: on Sunday February 10, 2008 @09:41PM (#22375032) Journal

Feeding Proxies is one potentially lucrative use of semantic technology.

Here is a basic scenario for ten years down the line:

1. You build a profile probably through a combination of allowing your online activities to be profiled, filling out in-depth surveys, and rating certain types of web-content on a semi-regular basis.

2. A proxy identity is imbued with a 'personality' based on both your preferences as represented in step one, and ongoing analysis of content that causes you to register a strong reaction.

3. The proxy consumes content and delivers what it believes to be desirable content to your device of choice.

Given this business model we could see a return to the old 'portal' style of doing web business, though the portal itself would be largely invisible to the subscriber. Something as simple as changing the diction of a news item could vastly alter the interest of the proxy public.

Re:long way to go.. (Score:3, Insightful)

by QuantumG ( 50515 ) writes: <qg@biodome.org> on Sunday February 10, 2008 @10:10PM (#22375238) Homepage Journal

blah, search is great and all, but that shouldn't really be the ultimate purpose of the Semantic Web.

Asking a question and getting a sensible answer, that's the killer app.

Not the Semantic Web (Score:5, Insightful)

by timeOday ( 582209 ) writes: on Sunday February 10, 2008 @10:21PM (#22375290)

IMHO this is not the semantic web. The primary representation is still (just) natural language. Anything in addition to that is really just search engine technology under a different banner. Is that a bad thing? No! I've always said the semantic web was bound to fail because people don't want to spend a lot of extra effort tagging their information so others can slice and dice it; instead, the evolution of natural language processing in search (rather than manual tagging) will solve the problem. Maybe the Reuters idea of exposing the "inferred" metadata will be useful (as opposed to normal search engines like Google, which simply keep this metadata in their own indices), though as yet I don't see why.

Why can't AI get the semantics from the plain text (Score:3, Insightful)

by presidenteloco ( 659168 ) writes: on Sunday February 10, 2008 @10:22PM (#22375292)

When you start aggregating as much text as Google does, the semantics just start popping out, in the form of word relationship statistics.
The massive corpus size, when measured carefully, acts to filter semantic signal from expressive difference "noise".
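
As a rough illustration of "word relationship statistics", here is a minimal Python sketch that counts how often pairs of words share a document. The corpus is invented for this example and the function names are hypothetical; real latent semantic analysis goes well beyond raw counts (weighting, dimensionality reduction), so treat this only as the simplest form of the idea.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how often each unordered pair of distinct words shares a document."""
    pair_counts = Counter()
    for text in documents:
        # Sort the unique words so every pair is stored in one canonical order.
        words = sorted(set(text.lower().split()))
        pair_counts.update(combinations(words, 2))
    return pair_counts


def related_words(pair_counts, word, top_n=3):
    """Rank the words that most often appear in the same document as `word`."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == word:
            scores[b] += count
        elif b == word:
            scores[a] += count
    return scores.most_common(top_n)


# Toy corpus, invented for illustration only.
corpus = [
    "shuttle launch delayed by weather",
    "shuttle launch scheduled for march",
    "soup of the day is tomato",
    "weather delays shuttle launch again",
]
counts = cooccurrence_counts(corpus)
print(related_words(counts, "shuttle", top_n=2))
```

Even on four sentences, "launch" ranks as the closest neighbour of "shuttle". The comment's claim is that with a large enough corpus this kind of signal outweighs the noise of individual phrasing.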

Combine that kind of latent semantic analysis of global human text with conceptual knowledge representation and inference
technologies (which would use a combination of higher-order logic, bayesian probability, etc) and it should be possible to
create a software program that could start to get a basic semantic understanding of documents and document relationships
in the ordinary "dumb" web.

Could the proponents of the semantic web please tell me what it will add to this?

My basic proposition is that if an averagely intelligent human can infer the semantic essence (the gist, shall we say), of
individual documents, and relationships between documents on the web, why can't we build AI software that does
the same thing, and then reports its results out to people who ask.

Re:Semantic Spam (Score:5, Insightful)

by fonik ( 776566 ) writes: on Sunday February 10, 2008 @10:22PM (#22375298)

And this seems to be a major problem of the whole semantic web buzz. Search engines like Google can cut down on abuse because they're a third party that is unrelated to the content. The whole semantic web thing offloads categorization to the content source, the very party that is most likely to try to abuse the system.

It just doesn't seem like the best idea in the world to me.

Re:What? (Score:3, Insightful)

by STrinity ( 723872 ) writes: on Sunday February 10, 2008 @10:26PM (#22375328) Homepage

Is the semantic web supposed to be one of those Web 3.0 things?

If by that you mean "a collection of buzz-words that everyone uses without having Clue 1 what the hell they're talking about," yes.

Re:Why can't AI get the semantics from the plain t (Score:1, Insightful)

by Anonymous Coward writes: on Sunday February 10, 2008 @10:39PM (#22375388)

[...] if an averagely intelligent human can [do X], why can't we build AI software that does the same thing [...]
Because wetware is still ahead of machines in a few domains. Be thankful for that because when we can build AI software for everything, we won't be needed anymore.

Re:Yawn... (Score:4, Insightful)

by QuantumG ( 50515 ) writes: <qg@biodome.org> on Sunday February 10, 2008 @11:27PM (#22375636) Homepage Journal

How do *you* know when information is bullshit?

How does Google's pagerank algorithm?

Re:Vaporware? (Score:2, Insightful)

by smurgy ( 1126401 ) writes: on Monday February 11, 2008 @12:21AM (#22375948)

I noticed that too... I was looking at the tags to provide an example of what machine-created tagging has to go up against to beat human tagging for a rant up above. I guess I have to thank that idiot for proving my point. Humans do hostile tags; they haven't yet written a subroutine to make a machine act like a jerk.

Re:Yawn... (Score:5, Insightful)

by QuantumG ( 50515 ) writes: <qg@biodome.org> on Monday February 11, 2008 @12:25AM (#22375968) Homepage Journal

Ok, you seem to be of the belief that I'm still talking about search.. in the classical "give me a web page about" sense. I'm not.. and the Semantic Web people are not. "next" has a meaning.. everyone knows what it is. "shuttle launch" has an almost unique meaning.. although some concept of our culture and common sense is needed to disambiguate it. Asking when the next shuttle launch is has a unique answer: a date and a statement of the confidence in that date. For example "March 12, depending on weather and other things that might scrub the launch." I don't expect this to be "webpages that are kept up-to-date with information specific to the next shuttle launch"... I expect the answer to my question to be synthesized in real time from a dynamic pool of knowledge which is obtained from reading the web. I want a brain in a jar that is at my beck and call to answer every little question like this that I have throughout the day.. on everything from spacecraft launches to what the soup of the day is at the five closest restaurants to my office. There doesn't need to be some web page that is updated daily by some guy who works near me and enjoys soup.. there just needs to be information on soup and location posted by restaurants in my area.

So am I talking about search? Well, yes, but it's an algorithm that uses search to answer my questions.. instead of me having to do it.

Think about that soup question.. how would you do it now? I'd go to Google maps.. enter the location of my office, search businesses for restaurants, click on one of the top 5 to see if they have a daily updated menu, note the soup of the day, go back to Google maps, click on the next one, etc, until I had the answer I wanted. That's a pretty simple algorithm.. it's something a machine learning system could come up with.
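
The manual procedure described above can be sketched in a few lines of Python, assuming each restaurant already publishes its location and soup in machine-readable form. The `Restaurant` record, its fields, and the sample entries are all hypothetical; no real mapping or menu API is used here.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass
class Restaurant:
    name: str
    x_km: float  # distance east of the office, in km (hypothetical field)
    y_km: float  # distance north of the office, in km (hypothetical field)
    soup_of_the_day: Optional[str]  # None if no daily menu is published


def nearest_soups(restaurants, how_many=5):
    """Take the `how_many` closest restaurants and report the soups they publish."""
    by_distance = sorted(restaurants, key=lambda r: hypot(r.x_km, r.y_km))
    closest = by_distance[:how_many]
    # Restaurants without a published soup are skipped, like a dead menu link.
    return [(r.name, r.soup_of_the_day) for r in closest if r.soup_of_the_day]


# Invented sample data, for illustration only.
restaurants = [
    Restaurant("Cafe A", 0.1, 0.2, "tomato"),
    Restaurant("Diner B", 1.0, 1.0, None),
    Restaurant("Bistro C", 0.3, 0.1, "minestrone"),
    Restaurant("Grill D", 5.0, 5.0, "chowder"),
]
print(nearest_soups(restaurants, how_many=3))
```

The loop itself is trivial. The hard part, and the point of the comment, is getting the location and menu data into a form a program can read without a person clicking through each page.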

Re:Why can't AI get the semantics from the plain t (Score:3, Insightful)

by The Master Control P ( 655590 ) writes: <ejkeeverNO@SPAMnerdshack.com> on Monday February 11, 2008 @12:44AM (#22376066)

Why should I be thankful about spending my adult life working because machines aren't up to the task? I'll be thankful when machines take the work and leave us free to do what we want.

Re:Yawn... (Score:3, Insightful)

by martin-boundary ( 547041 ) writes: on Monday February 11, 2008 @01:09AM (#22376206)

You think that if we feed weak AI algorithms a lot of cleaned up, pre-tagged data, that's going to help overcome the weakness of the algorithms and produce something worthwhile?
Sorry, there's a flaw in your reasoning: Who gets to pre-tag the data? Everybody. But you can't trust everybody on the net. So you'll get a lot of data that's specifically designed to confuse and subvert the weak algorithms, and by definition such algorithms aren't strong enough to rise to the challenge.
The Semantic Web people will get a nasty shock when they realize that what they've really got is the Spamantic Web.

Not the Social Web (Score:1, Insightful)

by Anonymous Coward writes: on Monday February 11, 2008 @01:20AM (#22376254)

"No! I've always said the semantic web was bound to fail because people don't want to spend a lot of extra effort tagging their information so others can slice and dice it"

And yet we have social sites.

Re:Why can't AI get the semantics from the plain t (Score:1, Insightful)

by Anonymous Coward writes: on Monday February 11, 2008 @02:25AM (#22376604)

I really don't see that happening. The transition to this sort of economy is basically where the problem is now. As human labor is replaced by robotic arms in factories, those employees are left to find another job. Only, their entire skill set has now been replaced, so they are back to square one... They don't receive pay for the rest of their lives just because their job was replaced with a machine that does it better.

A Little too Cynical (Score:4, Insightful)

by Gregory Arenius ( 1105327 ) writes: on Monday February 11, 2008 @05:42AM (#22377296)

I understand being jaded about internet hype and buzzwords but I'm still surprised that after nearly eighty comments there doesn't seem to be anyone who has anything to say other than "vaporware" and "it won't work because of the spammers." Yes, maybe it has been overhyped and yes it is taking a while for the envisioned ideas to come to fruition but that doesn't mean that those ideas aren't worthwhile.

I'll use the following example because I recently had to do this with non-semantic tools. Let's say you wanted to see how good or bad a job a transit agency is doing in its city in comparison to other similar cities. A couple of metrics you might use to find similar cities would be population size, population density and land area. Google doesn't do a good job with something like that. You end up needing to search for cities individually and then finding their data points. Or you can find a list of cities ranked by population or population density. If you search on Google for something like that you end up at one of the Wikipedia lists. These lists are helpful but... still lacking. They don't contain all the cities you need or they don't provide a way to look at multiple data sets at the same time. The lists are also compiled by hand and aren't automatically updated when the information on the city page is changed. The data is in Wikipedia though. Every city page lists that information in a little box near the start of the article. But how do I take this data that is in Wikipedia from the form that it's in into a form that I can use to find what I need to know? Enter the semantic web.

Let's say that Wikipedia, or at least the parts dealing with geography, were semantic. Now, there are tens of thousands of pages describing countries, regions, states, counties, parishes, cities, towns and villages. Then those pages are translated into many other languages. Some of the data that these pages contain is of the same type. They all contain the name of the locality, latitude, longitude, size, population size and elevation. For data such as this it would be pretty easy to have a form to enter the data into, as opposed to using the usual markup, and the form could put the data into the proper markup for the page and the proper RDF. Once the data is in proper RDF form it would be easy to automate the process of updating translations of that page with the new data as well as updating any pertinent lists. It would also make it easier for people who want to analyze or use the data because they would be able to access it much more easily.
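
To make the transit comparison concrete, here is a small Python sketch of what becomes easy once city data is available as structured records. The `City` type, the 25% tolerance, and the placeholder cities with their figures are all made up for illustration; nothing here reads from Wikipedia or any real dataset.

```python
from dataclasses import dataclass


@dataclass
class City:
    name: str
    population: int
    area_km2: float

    @property
    def density(self) -> float:
        """People per square kilometre, derived from the two stored fields."""
        return self.population / self.area_km2


def within(value, target, tolerance):
    """True if `value` lies within a relative `tolerance` of `target`."""
    return abs(value - target) <= tolerance * target


def similar_cities(cities, reference, tolerance=0.25):
    """Names of cities close to `reference` in population, area and density."""
    return [
        c.name
        for c in cities
        if c.name != reference.name
        and within(c.population, reference.population, tolerance)
        and within(c.area_km2, reference.area_km2, tolerance)
        and within(c.density, reference.density, tolerance)
    ]


# Placeholder records with invented figures, for illustration only.
cities = [
    City("City A", 800_000, 120.0),
    City("City B", 750_000, 110.0),
    City("City C", 820_000, 900.0),
    City("City D", 2_500_000, 400.0),
]
print(similar_cities(cities, cities[0]))
```

With hand-compiled lists, each of the three comparisons means another round of searching and copying. With structured records it is a single filter.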

But nobody really wants machine-readable access to this information, you might say, except for the random geek and researcher. I would disagree. Let's say you're using a program like Marble which is similar to Google Earth in some ways but is completely open source. If they wanted to display the population of a city when you hover over it they would currently have to create and maintain their own dataset or they'd have to write a parser to extract it from Wikipedia. Neither of those options is particularly easy at the moment but if the information was in semantic form on Wikipedia it would be a piece of cake.

The strength of the semantic web isn't, in my opinion, going to be AI-like personal agents or anything like that. It'll be things that in many ways are already here. Like Yelp putting geotags on the restaurants they review and apps like Google Earth taking that data that's available in machine-readable (Semantic!) form to overlay it on a map so that you can see what's nearby. It'll be applications doing the same with the geotags from Flickr. It's really useful mashups like http://www.housingmaps.com/ [housingmaps.com]. It's the transit agency putting realtime bus data up in semantic form so you can see on your iPhone's Google map how far away the bus is. So yeah, maybe the semantic web is overhyped but that doesn't mean there isn't a lot of substance there, too.

Cheers,
Greg

Vapourware my arse (Score:4, Insightful)

by theno23 ( 27900 ) writes: on Monday February 11, 2008 @06:13AM (#22377424) Homepage

The company I work for, Garlik [garlik.com] has two products that are run off semantic web technology. DataPatrol [garlik.com] (for pay) and QDOS [qdos.com] (free, in beta).

We use RDF stores instead of databases in some places as they are very good at representing graph structures, which are a real pain to deal with in SQL. You often hear the "what can RDF do that SQL can't" type arguments, which are all just nonsense. What can SQL do that a field database, or a bunch of flat files can't? It's all about what you can do easily enough that you will be bothered to do it.

A fully normalised SQL database has many of the attributes of an RDF store, but
a) when was the last time you saw one in production use?
b) how much of a pain was it to write big queries with outer joins?

RDF + SPARQL [w3.org] makes that kind of thing trivial, and has other fringe side benefits (better standardisation, data portability) that you don't get with SQL.
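
For readers who haven't seen the graph-as-triples idea, here is a toy Python sketch. It is not SPARQL and not how any real RDF store (or the products mentioned above) works internally; the `match` and `reachable` helpers and the sample data are hypothetical. It only shows why subject/predicate/object triples make graph questions, such as following a relationship to any depth, straightforward to express.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def reachable(triples, start, predicate):
    """Follow `predicate` edges from `start` to any depth; return the nodes reached."""
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for _, _, target in match(triples, subject=node, predicate=predicate):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen


# Invented sample graph, for illustration only.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
    ("alice", "worksFor", "acme"),
    ("dave", "knows", "alice"),
]
print(match(triples, predicate="worksFor"))
print(sorted(reachable(triples, "alice", "knows")))
```

In a relational schema the second question typically needs a recursive query or a chain of self-joins. With triples it is a short traversal, which is the ease-of-use point made above.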

I guess it shouldn't be a surprise to see the comments consisting of the usual round of more-or-less irrelevant jokes and snide commentary - this is Slashdot after all - but I can't help responding.

Fallacy: Designing for Old People (Score:3, Insightful)

by EgoWumpus ( 638704 ) writes: on Monday February 11, 2008 @11:42AM (#22379564)

I try to keep my 64 year old dad educated enough to buy coffee beans on ebay, check email, look at news, etc. Every time he sees 'symantic web' or 'web 2.0' in the media, it just confuses him, and I imagine, people like him who just use the net for basics like online bill pay, ebay, etc.

I'm afraid whenever I see this argument I immediately tend to discredit all the rest that I've read in that post. Designing technology for those who are least able to uptake it is a losing proposition at best; at worst a total disaster. Technology has always been utilized by those less set in their ways first, less invested in the capital and experience of doing it the 'old way', and is only more broadly adopted once it proves out as a better way to do things. Universal acceptance tends to only come after a generation; when those who are poorly situated to utilize it have passed on.

This speaks to your other concern rather tellingly. Fry's may not put their inventory online. But if Best Buy does, and reaps more rewards, then you can bet eventually all companies will do this as standard practice. Far more likely - a company that is smaller and more mobile will do it first, and then get bought out by a larger company that will adopt its practices in order to stay potent in a changing marketplace.

But the successful online inventory app is not going to design for Best Buy first. They're going to design for Mom and Pop shop, and scale up to whatever customer they can find. When it proves out or doesn't there will be tangible evidence for others to act on - rather than meaningless hype.

Finally, I think the thing that the semantic web provides is more of the ability of the end user to control results. As we perfect our ability to parse machine language, we perfect our ability to hear clear signal amongst all the noise. I look forward to the day when we have this technology in more than a nascent stage, and think it's silly to dismiss it before then.

Also, I look forward to the day when people stop designing for me. Because presumably I'll be happy with what I have!
