Semantic Web Getting Real

Posted by kdawson on Sunday February 10, 2008 @09:12PM from the open-it-up-and-give-it-away dept.

BlueSalamander writes "Tim O'Reilly just did an interview with Devin Wenig, the CEO-designate of Reuters. With no great enthusiasm I started to read yet another interview on how the semantic web was going to make everything great for everybody. Wenig made some good points about the end of the latency wars in news and the beginning of the battle for automatically detecting linkages and connections in the news. Smart news, not just fast news. Great stuff — but just more words? Nope — a little searching revealed that Reuters just opened access to their corporate semantic technology crown jewels. For free. For anyone. Their Calais API lets you turn unstructured text into a formal RDF graph in about one second. I ran about 5,000 documents through it and played with a subset of them in RDF-Gravity. The results were impressive overall. Is this the start of the semantic web getting real? When big names and big money start to act, not just talk, it may be time to pay attention. Semantic applications anyone? The foundation appears to be here."

Re:Yawn... (Score:2, Informative)

by InsurgentGeek ( 926646 ) writes: on Sunday February 10, 2008 @09:49PM (#22375086)

You're a little unclear on the concept of an RDF graph. It's not a graph like the ones in your intro algebra class - it's an RDF (that's Resource Description Framework) representation of the semantics of a document. Check Wikipedia for Semantic Web or RDF.

In case you have no clue what they're talking abou (Score:5, Informative)

by WK2 ( 1072560 ) writes: on Sunday February 10, 2008 @10:19PM (#22375278) Homepage

If you are like me, and have absolutely positively no dang fucking clue what the summary is talking about: http://en.wikipedia.org/wiki/Semantic_Web [wikipedia.org]

According to the Wikipedia history, this concept has been around since at least 2001.

Re:Semantic Spam (Score:3, Informative)

by msuarezalvarez ( 667058 ) writes: on Sunday February 10, 2008 @11:19PM (#22375612)

This is slashdot and all, I know. But you seem not to have read even the summary: this is about someone exposing an API which lets you turn text into an RDF graph independently of the text producer. If you want, this is something like someone giving you access to a tool like the one used by Google.

Re:Why can't AI get the semantics from the plain t (Score:3, Informative)

by msuarezalvarez ( 667058 ) writes: on Sunday February 10, 2008 @11:29PM (#22375642)

Could the proponents of the semantic web please tell me what it will add to this?

Actually, the story is about a tool which does (a part of) what you are describing.

"Free" for "anyone"? Not so fast. (Score:3, Informative)

by janbjurstrom ( 652025 ) writes: <<moc.liamg> <ta> <raeenoni>> on Monday February 11, 2008 @05:41AM (#22377284)

Reuters just opened access to their corporate semantic technology crown jewels. For free. For anyone. Their Calais API lets you turn unstructured text into a formal RDF graph in about one second. ...
It's "free" for "anyone" for loose definitions of the terms. Glancing at their terms of use [opencalais.com] (emphasis added):
You understand that Reuters will retain a copy of the metadata submitted by you or that generated by the Calais service. By submitting or generating metadata through the Calais service, you grant Reuters a non-exclusive perpetual, sublicensable, royalty-free license to that metadata. From a privacy standpoint, Reuters use of this metadata is governed by the terms of the Reuters and Calais Privacy Statements.
So you pay with your metadata. One can say you're doing that with Google too. Nevertheless, that's not entirely free.

Also, it's not yet for "anyone." According to the Calais roadmap [opencalais.com], only English documents are accepted: "Calais R3 [July 2008] begins ... to incorporate a number of additional languages... Japanese, Spanish and French with additional languages coming in the future."

Because "AI" is a misnomer (Score:3, Informative)

by melted ( 227442 ) writes: on Monday February 11, 2008 @05:53AM (#22377328) Homepage

There's no more "intelligence" in AI than in a can of Campbell soup. It's basically statistics, linear algebra and (sometimes) handcoded rules for reasoning. It doesn't evolve. It doesn't build upon what it "knows". It has no self-awareness or consciousness and its reasoning capabilities, if present, are extremely weak compared to even children.

We're so early in the development of this field that no one can even define what "self awareness" or "consciousness" really is, let alone how to create it or scale it. Folks try. There's Cycorp, there's Powerset, there are a lot of people in academia who work on NLP, Machine Vision, classification, neuroscience, etc. There is, however, no unifying vision, theory or understanding of what it is we're trying to build, and the current methods have nothing in common with "intelligence" per se. They do learn, in the sense that they figure out the hidden structure of a given set of data by approximating it using a mathematical model. Even though this model sometimes closely matches what a human brain does (e.g. in multilayer neural nets), they don't come anywhere close to what one would call "intelligence". What they lack is scale (and speed), and the advanced cognitive mechanisms required to become self-learning.

It's also interesting to note that at this point humans know on a high level how their brain works. The neocortex is a six layer neural net with links going cross-layer and neurons organized into columns. Trouble is, there are a hundred billion neurons. We sorta know how vision works, too. Trouble is, we can't work with it in real time (because, naturally, you'd need a chunk of those hundred billion neurons). Heck, even human language is a pain in the ass if you don't have advanced cognition (AKA strong AI), with the ability to understand euphemisms, sarcasm and idioms, paraphrase, generalize and specialize. Heck, even anaphora resolution is not solved yet (i.e. what does he/she/it in the current sentence refer to in the previous text). It's as if you had a bunch of parts and no manual and someone asked you to assemble a spaceship out of what you have, warning you that some parts are broken and may require you to make your own replacements. Without blueprints. Blindfolded. With your hands tied behind your back.

I do believe that in 50 years we will have strong AI, though. I work in a science lab, however, and many researchers don't share my optimism.

Re:Semantic Spam (Score:5, Informative)

by SolitaryMan ( 538416 ) writes: on Monday February 11, 2008 @06:00AM (#22377354) Homepage Journal

And this seems to be a major problem of the whole semantic web buzz. Search engines like Google can cut down on abuse because they're a third party that is unrelated to the content. The whole semantic web thing offloads categorization to the content source, the very party that is most likely to try to abuse the system. It just doesn't seem like the best idea in the world to me.

I think you are missing the point of Semantic Web: you can refer or link to an object, not just a document.

The company declares its URI. Now, if you are writing an article about this company, you can uniquely identify it and every web crawler knows *exactly* what company you are talking about. If the URI for the company is a hyperlink to its web site, then it can't be abused: the company itself declares what it is. The unique URI will in fact be a link to some file with information about the company (maybe an RDF file -- doesn't really matter for the concept).

The system can (and will) be abused in the same way as the old web: irrelevant links, words, concepts -- nothing new for the crawler, and it can be defeated with existing techniques.

Again, Semantic Web = Links between concepts, not just documents, so please do not bury the good idea under a pile of misunderstanding.
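To make "links between concepts" concrete, here is a minimal sketch in plain Python, with no RDF library involved. It assumes statements are stored as (subject, predicate, object) triples, which is the basic shape of RDF. Every URI, the vocabulary names and the helper `documents_about` are hypothetical placeholders made up for illustration, not part of any real vocabulary or of Calais.

```python
# Each statement is a (subject, predicate, object) triple, the basic unit of RDF.
# All URIs below are hypothetical placeholders.
ACME = "http://acme.example.com/about#company"      # URI the company declares for itself
MENTIONS = "http://vocab.example.org/mentions"      # made-up predicate
NAME = "http://vocab.example.org/name"              # made-up predicate

triples = [
    (ACME, NAME, "Acme Corp"),
    ("http://news.example.org/article/1", MENTIONS, ACME),
    ("http://blog.example.org/post/7", MENTIONS, ACME),
    ("http://blog.example.org/post/9", MENTIONS,
     "http://other.example.com/about#company"),
]


def documents_about(entity_uri, statements):
    """Return, sorted, every document whose 'mentions' link points at entity_uri."""
    return sorted(
        subject
        for subject, predicate, obj in statements
        if predicate == MENTIONS and obj == entity_uri
    )


print(documents_about(ACME, triples))
```

The match is on the URI, not on the string "Acme Corp", so a differently spelled name or another company with the same name would not confuse the crawler. A spammy page can still assert a bogus `mentions` link, which is the same irrelevant-link problem the old web already has.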

Re:Yawn... (Score:3, Informative)

by Lally Singh ( 3427 ) writes: on Monday February 11, 2008 @12:17PM (#22379920) Journal

It's the difference between having all of your customer data in a set of text files vs a database. The database is structured, which lets the computer do more analysis on it. It can also index that data more effectively.

Here's one example: say I want to do a little semi-political research. I ask semantic Google (which, for the sake of argument, has a more advanced query language) for the relationship between the price of RAM and the price of oil.

Right now, Google could at best look for an article on that specifically.

With a semantic web, it can find data points for the price of RAM & oil in various places and give me back a table. Why? Because the pages would be marked with those data points specifically.

Or, which years have wars with total dead > some threshold. A summation query over the lifetime of the war can do that. I don't have to find a single webpage where someone's done that by hand. Or some specialized data service for it. Google (or some other search agent) could correlate that data for me from blogs, newspaper articles, UN reports, etc. Combine them all together (b/c it knows that they're all data points for the same thing), and give me a report. It could even show me a comparison of which data sources give which numbers, letting me see report bias right there.
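A rough sketch of that kind of query, assuming a crawler has already pulled the marked-up data points into tuples. The record layout, the conflict and source names, the function names and all of the numbers are invented for illustration; no real search engine exposes this interface.

```python
from collections import defaultdict

# Hypothetical data points as (conflict, year, reported deaths, source).
# All values are invented for illustration.
data_points = [
    ("conflict-a", 1990, 1200, "newspaper"),
    ("conflict-a", 1991, 800, "newspaper"),
    ("conflict-a", 1990, 1500, "un-report"),
    ("conflict-a", 1991, 900, "un-report"),
    ("conflict-b", 1995, 300, "newspaper"),
]


def totals_by_source(points):
    """Sum reported deaths over each conflict's lifetime, separately per source."""
    totals = defaultdict(int)
    for conflict, _year, deaths, source in points:
        totals[(conflict, source)] += deaths
    return dict(totals)


def conflicts_over(points, threshold):
    """Conflicts whose lifetime total exceeds threshold according to any source."""
    return sorted({
        conflict
        for (conflict, _source), total in totals_by_source(points).items()
        if total > threshold
    })


print(totals_by_source(data_points))
print(conflicts_over(data_points, 1000))
```

Keeping the totals split by source is what makes the comparison possible: the same conflict shows up once per source, so differing numbers sit side by side instead of being silently merged.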

Give you a little bit of a chubby? Definitely gives me one. Add this to a smart voice-operated query agent and you have some star-trek stuff going on.
