Semantic Web Getting Real
BlueSalamander writes "Tim O'Reilly just did an interview with Devin Wenig, the CEO-designate of Reuters. With no great enthusiasm I started to read yet another interview on how the semantic web was going to make everything great for everybody. Wenig made some good points about the end of the latency wars in news and the beginning of the battle for automatically detecting linkages and connections in the news. Smart news, not just fast news. Great stuff — but just more words? Nope — a little searching revealed that Reuters just opened access to their corporate semantic technology crown jewels. For free. For anyone. Their Calais API lets you turn unstructured text into a formal RDF graph in about one second. I ran about 5,000 documents through it and played with a subset of them in RDF-Gravity. The results were impressive overall. Is this the start of the semantic web getting real? When big names and big money start to act, not just talk, it may be time to pay attention. Semantic applications anyone? The foundation appears to be here."
Re:Yawn... (Score:2, Informative)
In case you have no clue what they're talking abou (Score:5, Informative)
According to the Wikipedia history, this concept has been around since at least 2001.
Re:Semantic Spam (Score:3, Informative)
Re:Why can't AI get the semantics from the plain t (Score:3, Informative)
Actually, the story is about a tool which does (a part of) what you are describing.
"Free" for "anyone"? Not so fast. (Score:3, Informative)
Also, it's not yet for "anyone." According to the Calais roadmap [opencalais.com], only English documents are accepted: "Calais R3 [July 2008] begins
Because "AI" is a misnomer (Score:3, Informative)
We're so early in the development of this field that no one can even define what "self awareness" or "consciousness" really is, let alone how to create it or scale it. Folks try. There's Cycorp, there's Powerset, there are a lot of people in academia who work on NLP, machine vision, classification, neuroscience, etc. There is, however, no unifying vision or theory of what it is we're trying to build, and the current methods have nothing in common with "intelligence" per se. They do learn, in the sense that they figure out the hidden structure of a given set of data by approximating it using a mathematical model. Even though this model sometimes closely matches what a human brain does (e.g. in multilayer neural nets), they don't come anywhere close to what one would call "intelligence". What they lack is scale (and speed), and the advanced cognitive mechanisms required to become self-learning.
It's also interesting to note that at this point humans know on a high level how their brain works. The neocortex is a six-layer neural net with links going cross-layer and neurons organized into columns. Trouble is, there are a hundred billion neurons. We sorta know how vision works, too. Trouble is, we can't work with it in real time (because, naturally, you'd need a chunk of those hundred billion neurons). Heck, even human language is a pain in the ass if you don't have advanced cognition (AKA strong AI), with the ability to understand euphemisms, sarcasm and idioms, and to paraphrase, generalize and specialize. Heck, even anaphora resolution is not solved yet (i.e. what does he/she/it in the current sentence refer to in the previous text). It's as if you had a bunch of parts and no manual and someone asked you to assemble a spaceship out of what you have, warning you that some parts are broken and may require you to make your own replacements. Without blueprints. Blindfolded. With your hands tied behind your back.
I do believe that in 50 years we will have strong AI, though. I work in a science lab, however, and many researchers don't share my optimism.
Re:Semantic Spam (Score:5, Informative)
I think you are missing the point of the Semantic Web: you can refer or link to an object, not just a document.
The company declares its URI. Now, if you are writing an article about this company, you can uniquely identify it, and every web crawler knows *exactly* what company you are talking about. If the URI for the company is a hyperlink to its web site, then it can't be abused: the company itself declares what it is. The unique URI will in fact be a link to some file with information about the company (maybe an RDF file -- doesn't really matter for the concept).
The system can (and will) be abused in the same way as the old web: irrelevant links, words, concepts -- nothing new for the crawler, and it can be defeated with existing techniques.
Again, Semantic Web = links between concepts, not just documents, so please do not bury a good idea under a pile of misunderstanding.
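To make the "link to an object" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the URIs, the predicate names ("name", "homepage", "mentions") and the `mentions_of` helper are made up, and a real system would use RDF with an actual vocabulary rather than plain tuples. It only shows that two documents linking to the same URI are unambiguously about the same thing, while a similarly named entity with a different URI is not confused with it.

```python
# Hypothetical URIs and predicates, invented for this illustration.
ACME = "http://example.com/company/acme#id"

# Each statement is a (subject, predicate, object) triple.
triples = [
    # The company declares itself at its own URI.
    (ACME, "name", "Acme Corporation"),
    (ACME, "homepage", "http://example.com/"),
    # Two unrelated articles refer to the company by URI, not by name.
    ("http://example.org/news/1", "mentions", ACME),
    ("http://example.net/blog/42", "mentions", ACME),
    # A third article mentions a different entity with a similar name.
    ("http://example.net/blog/43", "mentions", "http://example.org/band/acme#id"),
]


def mentions_of(uri, statements):
    """Return the documents that link to exactly this URI, sorted."""
    return sorted(s for (s, p, o) in statements if p == "mentions" and o == uri)


docs = mentions_of(ACME, triples)
print(docs)
```

A crawler matching on the string "Acme" would have to guess which entity each article means; matching on the URI needs no guessing.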
Re:Yawn... (Score:3, Informative)
Here's one example: say I want to do a little semi-political research. I ask semantic google (which, for the sake of argument, has a more advanced query language) for the relationship between the price of RAM and the price of oil.
Right now, google could at best look for an article on that specifically.
With a semantic web, it can find data points for the price of RAM & oil in various places and give me back a table. Why? Because the pages would be marked up with those data points specifically.
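Roughly what such a query would boil down to once the data points are marked up, sketched in Python. The numbers are placeholders (not real prices), and the field names, source names and the `join_on_year` helper are hypothetical. Each page contributes tagged data points, and the agent joins the two series on the year to build the table.

```python
# Placeholder data points as a crawler might collect them from marked-up pages.
# All values are invented for illustration; they are not real prices.
data_points = [
    {"quantity": "ram_price", "year": 2005, "value": 100.0, "source": "site-a"},
    {"quantity": "oil_price", "year": 2005, "value": 50.0, "source": "site-b"},
    {"quantity": "ram_price", "year": 2006, "value": 80.0, "source": "site-a"},
    {"quantity": "oil_price", "year": 2006, "value": 60.0, "source": "site-c"},
    {"quantity": "oil_price", "year": 2007, "value": 70.0, "source": "site-b"},
]


def join_on_year(points, left, right):
    """Build rows (year, left value, right value) for years that have both."""
    by_quantity = {left: {}, right: {}}
    for p in points:
        if p["quantity"] in by_quantity:
            by_quantity[p["quantity"]][p["year"]] = p["value"]
    shared_years = sorted(by_quantity[left].keys() & by_quantity[right].keys())
    return [(y, by_quantity[left][y], by_quantity[right][y]) for y in shared_years]


table = join_on_year(data_points, "ram_price", "oil_price")
for row in table:
    print(row)
```

The year 2007 drops out of the table because only one of the two quantities has a data point for it.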
Or, which years have wars with total dead > some threshold. A summation query over the lifetime of the war can do that. I don't have to find a single webpage where someone's done that by hand. Or some specialized data service for it. Google (or some other search agent) could correlate that data for me from blogs, newspaper articles, UN reports, etc. Combine them all together (b/c it knows that they're all data points for the same thing), and give me a report. It could even show me a comparison of which data sources give which numbers, letting me see report bias right there.
Give you a little bit of a chubby? Definitely gives me one. Add this to a smart voice-operated query agent and you have some star-trek stuff going on.
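The war example above could look something like this in Python, as a sketch only. The figures are invented, the source names are hypothetical, and for brevity it answers a simplified version of the question (which wars exceed a threshold, rather than which years). It sums the dead over each war's lifetime, keeps the totals separate per source, and so shows the per-source disagreement side by side.

```python
from collections import defaultdict

# Invented figures from hypothetical sources; not real casualty data.
reports = [
    {"war": "war-x", "year": 1, "dead": 1000, "source": "newspaper"},
    {"war": "war-x", "year": 2, "dead": 3000, "source": "newspaper"},
    {"war": "war-x", "year": 1, "dead": 1500, "source": "un-report"},
    {"war": "war-x", "year": 2, "dead": 3500, "source": "un-report"},
    {"war": "war-y", "year": 1, "dead": 200, "source": "newspaper"},
]


def totals_by_source(rows):
    """Sum the dead over each war's lifetime, separately for every source."""
    totals = defaultdict(lambda: defaultdict(int))
    for r in rows:
        totals[r["war"]][r["source"]] += r["dead"]
    return {war: dict(per_source) for war, per_source in totals.items()}


def wars_over(rows, threshold):
    """Wars whose total exceeds the threshold according to at least one source."""
    totals = totals_by_source(rows)
    return sorted(w for w, per_source in totals.items()
                  if max(per_source.values()) > threshold)


totals = totals_by_source(reports)
big = wars_over(reports, 2000)
print(totals)
print(big)
```

Keeping the totals per source instead of averaging them is what lets the report show which source gives which number.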